
Pandas DataFrame - skew() function



The Pandas DataFrame skew() function returns the unbiased skewness of the values over the specified axis. Its syntax is shown below:

Syntax

DataFrame.skew(axis=None, skipna=None, level=None, numeric_only=None)

Parameters

axis Optional. Specify {0 or 'index', 1 or 'columns'}. If 0 or 'index', the skewness is computed for each column. If 1 or 'columns', the skewness is computed for each row. Default: 0
skipna Optional. Specify True to exclude NA/null values when computing the result. Default: True
level Optional. Specify level (int or str). If the axis is a MultiIndex (hierarchical), compute the skew along a particular level, collapsing into a Series. A str specifies the level name. This parameter was deprecated in pandas 1.3 and removed in pandas 2.0.
numeric_only Optional. Specify True to include only float, int or boolean data. Default: False
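
The effect of skipna and numeric_only can be seen in a small sketch. The DataFrame below, including its column names and values, is made up for illustration: column a contains one missing value and column label holds strings.

```python
import numpy as np
import pandas as pd

# Illustrative data (not from the original examples):
# "a" has a missing value, "label" is non-numeric.
df = pd.DataFrame({
    "a": [1.0, 2.0, 4.0, 8.0, np.nan],
    "b": [5.0, 3.0, 2.0, 1.0, 0.0],
    "label": ["u", "v", "w", "x", "y"],
})

# numeric_only=True leaves the string column out of the result.
# With the default skipna=True, the NaN in "a" is ignored.
with_skip = df.skew(numeric_only=True)
print(with_skip)

# With skipna=False, any column containing NaN yields NaN.
no_skip = df.skew(skipna=False, numeric_only=True)
print(no_skip)
```

Passing numeric_only=True explicitly is the safer choice when a DataFrame mixes numeric and non-numeric columns, because the handling of non-numeric columns has differed between pandas versions.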

Return Value

Returns a Series containing the skew of each column or row, or a DataFrame if a level is specified.
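
The "unbiased skew" returned by skew() is generally understood to be the sample-size-adjusted Fisher-Pearson coefficient of skewness. The sketch below recomputes it by hand and compares the result with pandas. The helper adjusted_skew is hypothetical (it is not part of pandas), and the sample values are made up for illustration.

```python
import numpy as np
import pandas as pd

def adjusted_skew(values):
    """Hypothetical helper: adjusted Fisher-Pearson skewness, computed by hand.

    Constant input (zero variance) is not handled in this sketch.
    """
    x = np.asarray(values, dtype=float)
    x = x[~np.isnan(x)]            # drop missing values, like skipna=True
    n = x.size
    if n < 3:
        return np.nan              # undefined for fewer than 3 observations
    dev = x - x.mean()
    m2 = np.mean(dev ** 2)         # second central moment (biased)
    m3 = np.mean(dev ** 3)         # third central moment (biased)
    g1 = m3 / m2 ** 1.5            # unadjusted skewness
    return np.sqrt(n * (n - 1)) / (n - 2) * g1   # sample-size correction

s = pd.Series([2.0, 3.0, 5.0, 8.0, 13.0])
manual = adjusted_skew(s)
builtin = s.skew()
print(manual, builtin)
```

The two printed values should agree up to floating-point rounding.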

Example: Using skew() column-wise on the whole DataFrame

In the example below, a DataFrame df is created. The skew() function is used to get the skew for each column.

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(5, 3),
   index = pd.date_range('1/1/2018', periods=5),
   columns = ['col_A', 'col_B', 'col_C'])

print("The DataFrame is:")
print(df)

#skew of all entries column-wise
print("\ndf.skew() returns:")
print(df.skew())

The output of the above code will be similar to the following (the values differ on each run because the data is random):

The DataFrame is:
               col_A     col_B     col_C
2018-01-01  1.005540  0.881717 -2.223541
2018-01-02 -0.834890  1.381463 -0.372747
2018-01-03  0.756971 -0.460319  0.675641
2018-01-04  0.592442  1.006504  0.297356
2018-01-05  0.360232 -1.351610  0.637709

df.skew() returns:
col_A   -1.653272
col_B   -0.815506
col_C   -1.643528
dtype: float64
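
Because the example above uses random data, the sign of each result is hard to interpret. The following sketch uses fixed, made-up columns (the names symmetric, right_tail and left_tail are illustrative) to show how the sign of the skew relates to the shape of the data.

```python
import pandas as pd

# Illustrative data with known shapes
df = pd.DataFrame({
    "symmetric": [1, 2, 3, 4, 5],        # balanced around the mean
    "right_tail": [1, 2, 3, 4, 100],     # one large value on the right
    "left_tail": [-100, 1, 2, 3, 4],     # one large negative value on the left
})

# Column-wise skew (axis=0 is the default)
result = df.skew()
print(result)
```

A symmetric column gives a skew of zero, a long right tail gives a positive skew, and a long left tail gives a negative skew.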

Example: Using skew() row-wise on the whole DataFrame

To get the row-wise skew, the axis parameter can be set to 1. Consider the example below.

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(5, 3),
   index = pd.date_range('1/1/2018', periods=5),
   columns = ['col_A', 'col_B', 'col_C'])

print("The DataFrame is:")
print(df)

#skew of all entries row-wise
print("\ndf.skew(axis=1) returns:")
print(df.skew(axis=1))

The output of the above code will be similar to the following (the values differ on each run because the data is random):

The DataFrame is:
               col_A     col_B     col_C
2018-01-01 -0.269205  1.119245 -0.257763
2018-01-02 -1.112495 -0.706605 -1.411835
2018-01-03  0.133028 -0.682958 -0.724209
2018-01-04 -1.022292 -0.586516 -0.368645
2018-01-05  0.936610  1.420745 -0.555835

df.skew(axis=1) returns:
2018-01-01    1.731651
2018-01-02    0.446999
2018-01-03    1.717875
2018-01-04   -0.935304
2018-01-05   -1.311800
Freq: D, dtype: float64
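
As a quick check of what axis=1 does, the sketch below (with made-up values and row labels) compares the row-wise result with the column-wise skew of the transposed DataFrame. It also shows that a row needs at least three non-missing values for the skew to be defined.

```python
import pandas as pd

# Illustrative data: three rows, three columns
df = pd.DataFrame(
    {
        "col_A": [1.0, 2.0, 0.0],
        "col_B": [2.0, 2.0, 5.0],
        "col_C": [10.0, 2.5, 10.0],
    },
    index=["r1", "r2", "r3"],
)

# Row-wise skew: one value per row, indexed by the row labels
row_skew = df.skew(axis=1)
print(row_skew)

# Same numbers via the transpose, where rows become columns
via_transpose = df.T.skew()
print(via_transpose)

# With only two values per row, the result is NaN for every row
two_cols = df[["col_A", "col_B"]].skew(axis=1)
print(two_cols)
```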

Example: Using skew() on selected columns

Instead of the whole DataFrame, the skew() function can be applied to selected columns. Consider the following example.

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(5, 3),
   index = pd.date_range('1/1/2018', periods=5),
   columns = ['col_A', 'col_B', 'col_C'])

print("The DataFrame is:")
print(df)

#skew of single column
print("\ndf['col_B'].skew() returns:")
print(df["col_B"].skew())

#skew of multiple columns
print("\ndf[['col_B', 'col_C']].skew() returns:")
print(df[["col_B", "col_C"]].skew())

The output of the above code will be similar to the following (the values differ on each run because the data is random):

The DataFrame is:
               col_A     col_B     col_C
2018-01-01  0.700766 -0.381063  0.690955
2018-01-02 -0.071808 -0.021680  0.619261
2018-01-03  0.066570 -0.253957  1.177085
2018-01-04  1.171888 -0.102689 -1.387702
2018-01-05 -0.516739 -1.335787  0.228582

df['col_B'].skew() returns:
-1.870672707435965

df[['col_B', 'col_C']].skew() returns:
col_B   -1.870673
col_C   -1.593254
dtype: float64
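
Selected columns can also be summarised with several statistics at once. The sketch below, using made-up values, passes the names 'mean' and 'skew' to agg() on a two-column selection. This is one possible approach rather than the only way to combine statistics.

```python
import pandas as pd

# Illustrative data
df = pd.DataFrame({
    "col_A": [1.0, 2.0, 3.0, 4.0, 10.0],
    "col_B": [5.0, 4.0, 3.0, 2.0, -6.0],
    "col_C": [0.5, 0.1, 0.9, 0.3, 0.7],
})

# Mean and skew of two selected columns in a single table:
# rows are the statistics, columns are the selected columns
summary = df[["col_A", "col_B"]].agg(["mean", "skew"])
print(summary)
```

The 'skew' row of the summary holds the same values that df["col_A"].skew() and df["col_B"].skew() return individually.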
