
Pandas DataFrame - cov() function



The Pandas DataFrame cov() function computes the pairwise covariance of the columns of a DataFrame and returns the result as a covariance matrix, itself a DataFrame. NA/null values are excluded pair by pair: for each pair of columns, only the rows in which both columns have a value are used.

Syntax

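DataFrame.cov(min_periods=None, ddof=1, numeric_only=False)

Parameters

min_periods Optional. An int specifying the minimum number of observations required per pair of columns to have a valid result. Pairs with fewer observations produce NaN. Default is None.
ddof Optional. Specify Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements. Default is 1. This argument applies only when the DataFrame contains no NaN values.
numeric_only Optional. A bool. If True, only float, int and boolean columns are included. Added in pandas 1.5.0; the default is False since pandas 2.0.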
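The effect of the two parameters can be sketched as follows. This is an illustrative example rather than part of the original tutorial: the variable names strict, sample and population are chosen here for clarity. Because the pandas documentation notes that ddof is applied only when the data contains no NaN, the ddof comparison uses just the two complete columns.

```python
import pandas as pd
import numpy as np

report = pd.DataFrame({
  "GDP": [1.02, 1.03, 1.04, 0.98],
  "GNP": [1.05, 0.99, np.nan, 1.04],
  "HDI": [1.02, 1.01, 1.02, 1.03]},
  index=["Q1", "Q2", "Q3", "Q4"]
)

# min_periods=4: a pair needs 4 rows where both columns have a value.
# GNP has only 3 values, so every pair involving GNP becomes NaN.
strict = report.cov(min_periods=4)
print(strict, "\n")

# ddof comparison on the columns without missing values
complete = report[["GDP", "HDI"]]
sample = complete.cov()            # divisor N - 1 (default ddof=1)
population = complete.cov(ddof=0)  # divisor N

print(sample, "\n")
print(population)
```

With four observations, each population value is 3/4 of the corresponding sample value, since only the divisor changes from N - 1 to N.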

Return Value

Returns a DataFrame containing the covariance matrix of the columns (series) of the DataFrame.
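A few properties of the returned matrix can be checked directly. The sketch below is an illustration, with variable names chosen here: it looks up one entry by its column labels, and verifies that the matrix is symmetric and that its diagonal holds the variance of each column.

```python
import pandas as pd
import numpy as np

report = pd.DataFrame({
  "GDP": [1.02, 1.03, 1.04, 0.98],
  "GNP": [1.05, 0.99, np.nan, 1.04],
  "HDI": [1.02, 1.01, 1.02, 1.03]},
  index=["Q1", "Q2", "Q3", "Q4"]
)

cov_matrix = report.cov()

# a single covariance is looked up by its row and column labels
gdp_hdi = cov_matrix.loc["GDP", "HDI"]

# the matrix equals its own transpose
values = cov_matrix.to_numpy()
is_symmetric = np.allclose(values, values.T)

# the diagonal matches the per-column variance (NaN skipped in both)
diag_matches_var = np.allclose(np.diag(values), report.var().to_numpy())

print(gdp_hdi)
print(is_symmetric, diag_matches_var)
```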

Example: Creating a covariance matrix using the whole DataFrame

In the example below, a DataFrame named report is created. The cov() function is then used to create a covariance matrix from all of its columns, which are all numeric.

import pandas as pd
import numpy as np

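#quarterly figures; GNP has a missing value in Q3
report = pd.DataFrame({
  "GDP": [1.02, 1.03, 1.04, 0.98],
  "GNP": [1.05, 0.99, np.nan, 1.04],
  "HDI": [1.02, 1.01, 1.02, 1.03]},
  index=["Q1", "Q2", "Q3", "Q4"]
)

#displaying the DataFrame
print(report,"\n")

#covariance matrix using all columns
print(report.cov())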

The output of the above code will be:

     GDP   GNP   HDI
Q1  1.02  1.05  1.02
Q2  1.03  0.99  1.01
Q3  1.04   NaN  1.02
Q4  0.98  1.04  1.03 

          GDP       GNP       HDI
GDP  0.000692 -0.000450 -0.000167
GNP -0.000450  0.001033  0.000250
HDI -0.000167  0.000250  0.000067
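The GNP entries in the matrix above are computed from only the three quarters in which GNP has a value, while the GDP/HDI entry uses all four. The following sketch (an illustration, with variable names chosen here) reproduces the GDP/GNP entry by hand and contrasts pairwise exclusion with dropping incomplete rows from the whole DataFrame first.

```python
import pandas as pd
import numpy as np

report = pd.DataFrame({
  "GDP": [1.02, 1.03, 1.04, 0.98],
  "GNP": [1.05, 0.99, np.nan, 1.04],
  "HDI": [1.02, 1.01, 1.02, 1.03]},
  index=["Q1", "Q2", "Q3", "Q4"]
)

# keep only the rows where both GDP and GNP are present (drops Q3)
pair = report[["GDP", "GNP"]].dropna()
manual = pair["GDP"].cov(pair["GNP"])
from_matrix = report.cov().loc["GDP", "GNP"]
print(manual, from_matrix)

# pairwise exclusion: GDP/HDI still uses all four quarters
pairwise = report.cov().loc["GDP", "HDI"]

# dropping incomplete rows first removes Q3 for every pair
listwise = report.dropna().cov().loc["GDP", "HDI"]
print(pairwise, listwise)
```

The two GDP/HDI values differ because dropna() on the whole DataFrame discards Q3 even for column pairs that have no missing data in that row.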

Example: Creating a covariance matrix using selected columns

Instead of the whole DataFrame, the cov() function can be applied to selected columns. Consider the following example.

import pandas as pd
import numpy as np

report = pd.DataFrame({
  "GDP": [1.02, 1.03, 1.04, 0.98],
  "GNP": [1.05, 0.99, np.nan, 1.04],
  "HDI": [1.02, 1.01, 1.02, 1.03],
  "Agriculture": [1.02, 1.02, 0.99, 0.98]},
  index=["Q1", "Q2", "Q3", "Q4"]
)

#displaying the DataFrame
print(report,"\n")

#covariance matrix using two columns
print("report[['GDP', 'HDI']].cov() returns:")
print(report[['GDP', 'HDI']].cov(),"\n")

#covariance matrix using three columns
print("report[['GDP', 'HDI', 'Agriculture']].cov() returns:")
print(report[['GDP', 'HDI', 'Agriculture']].cov(),"\n")

The output of the above code will be:

     GDP   GNP   HDI  Agriculture
Q1  1.02  1.05  1.02         1.02
Q2  1.03  0.99  1.01         1.02
Q3  1.04   NaN  1.02         0.99
Q4  0.98  1.04  1.03         0.98 

report[['GDP', 'HDI']].cov() returns:
          GDP       HDI
GDP  0.000692 -0.000167
HDI -0.000167  0.000067 

report[['GDP', 'HDI', 'Agriculture']].cov() returns:
                  GDP       HDI  Agriculture
GDP          0.000692 -0.000167     0.000275
HDI         -0.000167  0.000067    -0.000133
Agriculture  0.000275 -0.000133     0.000425 
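Selecting columns by label is one option. When a DataFrame mixes numeric and text columns, the numeric ones can also be selected by data type before calling cov(). The sketch below assumes a hypothetical Region column that is not part of the examples above.

```python
import pandas as pd

mixed = pd.DataFrame({
  "Region": ["North", "South", "East", "West"],  # hypothetical text column
  "GDP": [1.02, 1.03, 1.04, 0.98],
  "HDI": [1.02, 1.01, 1.02, 1.03]},
  index=["Q1", "Q2", "Q3", "Q4"]
)

# keep only the numeric columns, then compute the covariance matrix
numeric_cov = mixed.select_dtypes(include="number").cov()
print(numeric_cov)
```

Selecting by data type avoids listing column names by hand and keeps text columns out of the calculation.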
