
Pandas Series - corr() function



The Pandas Series corr() function computes the correlation between a Series and another Series. The two Series are aligned on their index, and any observation where either value is missing (NaN or None) is automatically excluded from the calculation.
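The exclusion of missing values can be checked with a small sketch. The Series a and b below are made-up illustration data, not part of the tutorial's examples. The sketch compares corr() on the raw Series with corr() on a copy where incomplete pairs were dropped by hand.

```python
import numpy as np
import pandas as pd

# Hypothetical data: positions 2 and 3 each have one missing value
a = pd.Series([1.0, 2.0, np.nan, 4.0, 5.0])
b = pd.Series([2.0, 1.0, 3.0, np.nan, 6.0])

# corr() skips every pair in which either value is missing
r_with_nan = a.corr(b)

# Drop the incomplete pairs manually, then correlate what is left
complete = pd.concat([a, b], axis=1).dropna()  # columns are named 0 and 1
r_manual = complete[0].corr(complete[1])

print(len(complete))                     # number of complete pairs
print(np.isclose(r_with_nan, r_manual))  # both approaches agree
```

Only the three complete pairs contribute, so both calls should give the same coefficient.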

Syntax

Series.corr(other, method='pearson', min_periods=None)

Parameters

other Required. Specify a Series with which to compute the correlation.
method Optional. Specify method of correlation. Default is 'pearson'. Possible values are:
  • pearson : standard correlation coefficient
  • kendall : Kendall Tau correlation coefficient
  • spearman : Spearman rank correlation
  • callable : a callable that takes two 1-D ndarrays as input and returns a float. Series.corr() returns that float directly; the symmetric matrix with 1 along the diagonal arises only when a callable is passed to DataFrame.corr().
min_periods Optional. An int specifying the minimum number of observations (pairs with no missing value) required to have a valid result. If fewer are available, NaN is returned. Default is None.
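As a minimal sketch of the method and min_periods parameters, the snippet below passes a callable and then sets a min_periods value larger than the data allows. The function cosine_similarity is a hypothetical measure written only for this illustration; it is not part of pandas.

```python
import numpy as np
import pandas as pd

gdp = pd.Series([1.02, 1.03, 1.04, 0.98])
hdi = pd.Series([1.02, 1.01, 1.02, 1.03])

def cosine_similarity(a, b):
    """Hypothetical custom measure: cosine of the angle between two 1-D arrays."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Default method ('pearson')
pearson = gdp.corr(hdi)

# A callable receives the two value arrays and must return a float
custom = gdp.corr(hdi, method=cosine_similarity)

# Only 4 complete pairs exist, so requiring 5 yields NaN
too_strict = gdp.corr(hdi, min_periods=5)

print(pearson)
print(custom)
print(too_strict)
```

The callable is applied after missing pairs are removed, so it does not need to handle NaN itself.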

Return Value

Returns a float: the correlation with other.

Example: using corr() on a Series

In the example below, the corr() function is used to calculate the Pearson correlation between two Series.

import pandas as pd

GDP = pd.Series([1.02, 1.03, 1.04, 0.98])
HDI = pd.Series([1.02, 1.01, 1.02, 1.03])

print("The GDP contains:")
print(GDP, "\n")

print("The HDI contains:")
print(HDI, "\n")

# calculate the Pearson correlation between GDP and HDI
print("GDP.corr(HDI) returns:")
print(GDP.corr(HDI))

The output of the above code will be:

The GDP contains:
0    1.02
1    1.03
2    1.04
3    0.98
dtype: float64 

The HDI contains:
0    1.02
1    1.01
2    1.02
3    1.03
dtype: float64 

GDP.corr(HDI) returns:
-0.776150525706333
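The value above can be reproduced from the definition of the Pearson coefficient: the sum of products of the deviations from the mean, divided by the square root of the product of the summed squared deviations. The following sketch does this by hand; the variable names x, y and r_manual are chosen only for illustration.

```python
import numpy as np
import pandas as pd

GDP = pd.Series([1.02, 1.03, 1.04, 0.98])
HDI = pd.Series([1.02, 1.01, 1.02, 1.03])

# Deviations of each observation from its Series mean
x = GDP - GDP.mean()
y = HDI - HDI.mean()

# Pearson r = sum(x*y) / sqrt(sum(x^2) * sum(y^2))
r_manual = (x * y).sum() / np.sqrt((x ** 2).sum() * (y ** 2).sum())

print(r_manual)
print(np.isclose(r_manual, GDP.corr(HDI)))  # matches the built-in result
```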

Example: using corr() on selected series in a DataFrame

Similarly, the corr() function can be applied to selected columns of a DataFrame, since each column is a Series. Consider the following example.

import pandas as pd
import numpy as np

df = pd.DataFrame({
  "GDP": [1.02, 1.03, 1.04, 0.98],
  "GNP": [1.05, 0.99, np.nan, 1.04],
  "HDI": [1.02, 1.01, 1.02, 1.03],
  "Agriculture": [1.02, 1.02, 0.99, 0.98]},
  index= ["Q1", "Q2", "Q3", "Q4"]
)

print("The DataFrame is:")
print(df)

# correlation coefficient between the GDP and HDI columns
print("\ndf['GDP'].corr(df['HDI']) returns:")
print(df['GDP'].corr(df['HDI']))

The output of the above code will be:

The DataFrame is:
     GDP   GNP   HDI  Agriculture
Q1  1.02  1.05  1.02         1.02
Q2  1.03  0.99  1.01         1.02
Q3  1.04   NaN  1.02         0.99
Q4  0.98  1.04  1.03         0.98

df['GDP'].corr(df['HDI']) returns:
-0.776150525706333
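The GNP column of this DataFrame contains a missing value, which makes it a convenient case for a further sketch: correlating GDP with GNP skips the Q3 row. For comparison, the snippet also looks up the same pair in the matrix returned by DataFrame.corr(); the variable names pair and matrix are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
  "GDP": [1.02, 1.03, 1.04, 0.98],
  "GNP": [1.05, 0.99, np.nan, 1.04],
  "HDI": [1.02, 1.01, 1.02, 1.03],
  "Agriculture": [1.02, 1.02, 0.99, 0.98]},
  index=["Q1", "Q2", "Q3", "Q4"]
)

# Series.corr: a single coefficient; Q3 is skipped because GNP is NaN there
pair = df["GDP"].corr(df["GNP"])

# DataFrame.corr: a matrix of pairwise coefficients for the chosen columns
matrix = df[["GDP", "GNP"]].corr()

print(pair)
print(matrix.loc["GDP", "GNP"])
print(np.isclose(pair, matrix.loc["GDP", "GNP"]))
```

Series.corr() is the lighter choice when only one pair of columns is of interest, while DataFrame.corr() computes every pair at once.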
