
Pandas DataFrame - rank() function



The Pandas DataFrame rank() function computes numerical data ranks (1 through n) along the specified axis. By default, tied values each receive the average of the ranks they would otherwise occupy.

Syntax

DataFrame.rank(axis=0, method='average', numeric_only=None, 
               na_option='keep', ascending=True, pct=False)

Parameters

axis Optional. Index to direct ranking. It can be {0 or 'index', 1 or 'columns'}. Default is 0.
method Optional. Specify how to rank records that share the same value (a tie):
  • average: average rank of tied group
  • min: lowest rank in the group
  • max: highest rank in the group
  • first: ranks assigned in order they appear in the array
  • dense: like 'min', but rank always increases by 1 between groups
Default is 'average'.
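numeric_only Optional. Specify True to rank only numeric columns. The default of None shown in the syntax applies to older pandas releases; pandas 2.0 and later default to False.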
na_option Optional. Specify how to rank NaN values:
  • keep: assign NaN rank to NaN values
  • top: assign lowest rank to NaN values
  • bottom: assign highest rank to NaN values
Default is 'keep'.
ascending Optional. Specify whether or not the elements should be ranked in ascending order. Default is True.
pct Optional. Specify whether or not to display the returned rankings in percentile form. Default is False.
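
The tie-breaking methods listed above are easiest to compare side by side. The following is a minimal sketch, not part of the original example: the scores frame and the ranks frame are hypothetical names chosen for illustration, and the data contains one tie (two rows with the value 10).

```python
import pandas as pd

# Hypothetical sample data: rows x2 and x3 tie at the value 10.
scores = pd.DataFrame({"values": [20, 10, 10, 30]},
                      index=["x1", "x2", "x3", "x4"])

# Rank the same column once per tie-breaking method.
methods = ["average", "min", "max", "first", "dense"]
ranks = pd.DataFrame({m: scores["values"].rank(method=m) for m in methods})

print(ranks)
```

In this sketch the tied rows receive 1.5 under 'average', 1 under 'min', 2 under 'max', and 1 and 2 (in order of appearance) under 'first'. Under 'dense' they receive 1 and the next distinct value receives 2, so no rank is skipped.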

Return Value

Returns a Series or DataFrame with data ranks as values.
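
As a rough illustration of the return type and the axis parameter, the sketch below ranks a small, hypothetical two-column frame in both directions. The names df, col_ranks, row_ranks and one_column are assumptions made for this illustration only.

```python
import pandas as pd

# Hypothetical sample data with two numeric columns.
df = pd.DataFrame({"a": [3, 1, 2], "b": [1, 3, 2]},
                  index=["r1", "r2", "r3"])

col_ranks = df.rank()         # axis=0 (default): rank the values within each column
row_ranks = df.rank(axis=1)   # axis=1: rank the values within each row
one_column = df["a"].rank()   # calling rank() on a single column returns a Series

print(type(col_ranks).__name__, type(one_column).__name__)
print(row_ranks)
```

Calling rank() on the whole frame returns a DataFrame with the same shape, index and columns as the input, while calling it on one column returns a Series.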

Example: using rank()

The example below demonstrates how this function behaves with the above parameters:

  • default_rank: The default behavior, obtained without passing any parameter. Tied values receive the average of their ranks.
  • max_rank: The result of setting method='max'. Records with the same value all receive the highest rank in the tied group (for example, 'x2' and 'x3' share the first and second positions, so both are assigned rank 2).
  • NA_bottom: The result of setting na_option='bottom'. NaN values are placed at the bottom of the ranking, so they receive the highest rank.
  • pct_rank: The result of setting pct=True. Each rank is expressed as a fraction between 0 and 1 (a percentile rank).
import pandas as pd
import numpy as np

df = pd.DataFrame({
  "values": [20, 10, 10, np.nan, 30]},
  index= ["x1", "x2", "x3", "x4", "x5"]
)

print(df,"\n")

df['default_rank'] = df['values'].rank()
df['max_rank'] = df['values'].rank(method='max')
df['NA_bottom'] = df['values'].rank(na_option='bottom')
df['pct_rank'] = df['values'].rank(pct=True)

print(df,"\n")

The output of the above code will be:

    values
x1    20.0
x2    10.0
x3    10.0
x4     NaN
x5    30.0 

    values  default_rank  max_rank  NA_bottom  pct_rank
x1    20.0           3.0       3.0        3.0     0.750
x2    10.0           1.5       2.0        1.5     0.375
x3    10.0           1.5       2.0        1.5     0.375
x4     NaN           NaN       NaN        5.0       NaN
x5    30.0           4.0       4.0        4.0     1.000 
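
The options not exercised above can be explored in the same way. The sketch below reuses the same five values in a standalone Series. The variable names are hypothetical, and the pct_manual line is only an illustration of how the pct_rank column relates to the default ranks for this data (rank divided by the number of non-NaN values).

```python
import numpy as np
import pandas as pd

values = pd.Series([20, 10, 10, np.nan, 30],
                   index=["x1", "x2", "x3", "x4", "x5"])

na_top = values.rank(na_option="top")         # NaN is ranked first (rank 1)
descending = values.rank(ascending=False)     # the largest value gets rank 1
pct_manual = values.rank() / values.count()   # count() ignores NaN, so this divides by 4

print(pd.DataFrame({
    "values": values,
    "na_top": na_top,
    "descending": descending,
    "pct_manual": pct_manual,
}))
```

With na_option='top', the NaN row takes rank 1 and every other rank shifts up by one. With ascending=False, 30 is ranked 1 and the two tied 10s share rank 3.5.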
