
Pandas DataFrame - var() function



The Pandas DataFrame var() function returns the variance over the specified axis. By default the result is the unbiased (sample) variance, because the divisor is N - 1 (ddof=1). The syntax for using this function is shown below:

Syntax

DataFrame.var(axis=None, skipna=None, level=None, 
              ddof=1, numeric_only=None)

Parameters

axis Optional. Specify {0 or 'index', 1 or 'columns'}. If 0 or 'index', the variance is computed for each column. If 1 or 'columns', the variance is computed for each row. Default: 0.
skipna Optional. Specify True to exclude NA/null values when computing the result. Default: True.
level Optional. Specify level (int or str). If the axis is a MultiIndex (hierarchical), the variance is computed along the given level and the result is a DataFrame. A str specifies the level name. This parameter was deprecated in pandas 1.3 and removed in pandas 2.0.
ddof Optional. Specify Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements. Default: 1.
numeric_only Optional. Specify True to include only float, int or boolean data. Default: None in the syntax above, which tries all columns and falls back to numeric data only; pandas 2.0 and later default to False.
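The effect of ddof, skipna and numeric_only can be seen in a small sketch. The DataFrame below, including the NaN in Bonus and the text column Name, is made up for illustration and is not one of the examples in this tutorial.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one missing Bonus value and one non-numeric column
df = pd.DataFrame({
    "Bonus": [5.0, 3.0, np.nan, 4.0],
    "Salary": [60, 62, 65, 59],
    "Name": ["John", "Marry", "Sam", "Jo"],
})

# numeric_only=True leaves the text column "Name" out of the calculation.
# ddof=1 (the default) divides by N - 1: the sample variance.
sample_var = df.var(numeric_only=True)

# ddof=0 divides by N: the population variance.
population_var = df.var(ddof=0, numeric_only=True)

# skipna=False propagates missing values, so "Bonus" becomes NaN.
strict_var = df.var(skipna=False, numeric_only=True)

print(sample_var)
print(population_var)
print(strict_var)
```

With skipna left at True, N counts only the non-missing values, so the Bonus column is divided by 3 - 1 rather than 4 - 1.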

Return Value

Returns a Series holding the variance along the chosen axis, or a DataFrame if a level is specified.
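The two return types can be compared in a short sketch. Because the level parameter is not available in recent pandas releases, this sketch uses groupby(level=...) to get the per-level result instead; the MultiIndex with the Dept and Name levels is an assumption made for illustration.

```python
import pandas as pd

# Hypothetical two-level index: department, then employee name
index = pd.MultiIndex.from_tuples(
    [("Sales", "John"), ("Sales", "Marry"), ("IT", "Sam"), ("IT", "Jo")],
    names=["Dept", "Name"],
)
df = pd.DataFrame(
    {"Bonus": [5, 3, 2, 4], "Salary": [60, 62, 65, 59]},
    index=index,
)

# Without a level: a Series with one variance per column
col_var = df.var()

# Per level: a DataFrame with one row per department
dept_var = df.groupby(level="Dept").var()

print(type(col_var).__name__)
print(type(dept_var).__name__)
print(dept_var)
```

Each department here has only two rows, so every per-department variance is based on a divisor of 2 - 1 = 1.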

Example: using var() column-wise on the whole DataFrame

In the example below, a DataFrame df is created. The var() function is used to get the variance for each column.

import pandas as pd

df = pd.DataFrame({
  "Bonus": [5, 3, 2, 4],
  "Salary": [60, 62, 65, 59]},
  index=["John", "Marry", "Sam", "Jo"]
)

print("The DataFrame is:")
print(df)

# variance of each column (axis=0 is the default)
print("\ndf.var() returns:")
print(df.var())

The output of the above code will be:

The DataFrame is:
       Bonus  Salary
John       5      60
Marry      3      62
Sam        2      65
Jo         4      59

df.var() returns:
Bonus     1.666667
Salary    7.000000
dtype: float64
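The Bonus value in this output can be checked by hand. The sketch below recomputes it from the definition of the sample variance, the sum of squared deviations from the mean divided by N - 1; the variable names are chosen for illustration only.

```python
import pandas as pd

bonus = pd.Series([5, 3, 2, 4], name="Bonus")

n = len(bonus)                                  # N = 4
mean = bonus.mean()                             # 3.5
squared_dev = ((bonus - mean) ** 2).sum()       # 2.25 + 0.25 + 2.25 + 0.25 = 5.0
manual_var = squared_dev / (n - 1)              # 5.0 / 3

print(manual_var)
print(bonus.var())
```

Both printed values agree up to floating-point rounding, which confirms that var() uses the N - 1 divisor by default.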

Example: using var() row-wise on the whole DataFrame

To perform the operation row-wise, the axis parameter can be set to 1.

import pandas as pd

df = pd.DataFrame({
  "Bonus": [5, 3, 2, 4],
  "Salary": [60, 62, 65, 59]},
  index=["John", "Marry", "Sam", "Jo"]
)

print("The DataFrame is:")
print(df)

# variance of each row, computed across the columns
print("\ndf.var(axis=1) returns:")
print(df.var(axis=1))

The output of the above code will be:

The DataFrame is:
       Bonus  Salary
John       5      60
Marry      3      62
Sam        2      65
Jo         4      59

df.var(axis=1) returns:
John     1512.5
Marry    1740.5
Sam      1984.5
Jo       1512.5
dtype: float64

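Example: using var() on selected columns

Instead of the whole DataFrame, the var() function can be applied to selected columns. Selecting a single column gives a Series, so var() returns a single number; selecting a list of columns gives a DataFrame, so var() returns one value per column. Consider the following example.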

import pandas as pd

df = pd.DataFrame({
  "Bonus": [5, 3, 2, 4],
  "Last Salary": [58, 60, 63, 57],
  "Salary": [60, 62, 65, 59]},
  index=["John", "Marry", "Sam", "Jo"]
)

print("The DataFrame is:")
print(df)

# variance of a single column (returns a scalar)
print("\ndf['Salary'].var() returns:")
print(df["Salary"].var())

# variance of multiple columns (returns a Series)
print("\ndf[['Salary', 'Bonus']].var() returns:")
print(df[["Salary", "Bonus"]].var())

The output of the above code will be:

The DataFrame is:
       Bonus  Last Salary  Salary
John       5           58      60
Marry      3           60      62
Sam        2           63      65
Jo         4           57      59

df['Salary'].var() returns:
7.0

df[['Salary', 'Bonus']].var() returns:
Salary    7.000000
Bonus     1.666667
dtype: float64
