Pandas DataFrame - rank() function

The Pandas DataFrame rank() function computes numerical data ranks (1 through n) along the specified axis. By default, tied values each receive the average of the ranks they jointly occupy.

Syntax

DataFrame.rank(axis=0, method='average', numeric_only=None, 
               na_option='keep', ascending=True, pct=False)

Parameters

`axis`	`Optional.` Axis along which to rank. It can be {0 or 'index', 1 or 'columns'}. Default is 0.
`method`	`Optional.` Specify how to rank a group of tied records. 'average': average rank of the tied group; 'min': lowest rank in the group; 'max': highest rank in the group; 'first': ranks assigned in the order the values appear in the array; 'dense': like 'min', but the rank always increases by 1 between groups. Default is 'average'.
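`numeric_only`	`Optional.` Specify True to rank only numeric columns. The default shown in the syntax above (None) applies to older pandas versions; pandas 2.0 and later default to False.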
`na_option`	`Optional.` Specify how to rank NaN values. 'keep': assign NaN rank to NaN values; 'top': assign lowest rank to NaN values; 'bottom': assign highest rank to NaN values. Default is 'keep'.
`ascending`	`Optional.` Specify whether or not the elements should be ranked in ascending order. Default is True.
`pct`	`Optional.` Specify True to return the rankings as percentile ranks (fractions between 0 and 1) instead of ordinal ranks. Default is False.
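
To make the tie-handling options concrete, the sketch below ranks one small Series with each `method` value and shows the effect of `ascending=False` and `na_option='top'`. The data and the variable names (`s`, `by_method`, `descending`, `nan_top`) are illustrative choices, not part of the pandas API.

```python
import numpy as np
import pandas as pd

# Illustrative data: two tied values (10) and one missing value.
s = pd.Series([20, 10, 10, np.nan, 30], index=["x1", "x2", "x3", "x4", "x5"])

# One column per tie-breaking method, so the differences line up side by side.
methods = ["average", "min", "max", "first", "dense"]
by_method = pd.DataFrame({m: s.rank(method=m) for m in methods})
print(by_method, "\n")

# Largest value gets rank 1; NaN stays NaN because na_option defaults to 'keep'.
descending = s.rank(ascending=False)
print(descending, "\n")

# 'top' gives NaN the lowest rank, pushing every other rank up by one.
nan_top = s.rank(na_option="top")
print(nan_top)
```

In this data the tied pair 'x2' and 'x3' receives 1.5 under 'average', 1 under 'min', 2 under 'max', 1 and 2 under 'first', and 1 under 'dense', where 'x1' then follows with rank 2 rather than 3.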

Return Value

Returns an object of the same type as the caller (a DataFrame, or a Series when called on a single column) with data ranks as values.
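
As a minimal sketch of the return type and of the `axis` parameter, the snippet below ranks a small table column by column and then row by row. The `scores` table and its labels are hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical scores table used only for illustration.
scores = pd.DataFrame(
    {"math": [90, 70, 80], "physics": [60, 85, 80], "chemistry": [75, 95, 65]},
    index=["ann", "bob", "cal"],
)

# axis=0 (default): each column is ranked independently, top to bottom.
col_ranks = scores.rank()

# axis=1: each row is ranked independently, left to right.
row_ranks = scores.rank(axis=1)

print(type(col_ranks).__name__)               # DataFrame in, DataFrame out
print(type(scores["math"].rank()).__name__)   # Series in, Series out
print(col_ranks, "\n")
print(row_ranks)
```

With `axis=1`, the row for 'cal' contains a tie between 'math' and 'physics' (both 80), so under the default 'average' method both cells receive rank 2.5.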

Example: using rank()

The example below demonstrates how this function behaves with the above parameters:

default_rank: Default behavior, obtained without passing any parameter.
max_rank: Setting method='max'. Tied values all receive the highest rank in their group (for example, 'x2' and 'x3' tie for positions 1 and 2, so both are assigned rank 2).
NA_bottom: Setting na_option='bottom'. NaN values are ranked last, after all other values.
pct_rank: Setting pct=True. The ranking is expressed as a percentile rank.

import pandas as pd
import numpy as np

# np.nan is used because the np.NaN alias was removed in NumPy 2.0
df = pd.DataFrame({
  "values": [20, 10, 10, np.nan, 30]},
  index=["x1", "x2", "x3", "x4", "x5"]
)

print(df,"\n")

df['default_rank'] = df['values'].rank()
df['max_rank'] = df['values'].rank(method='max')
df['NA_bottom'] = df['values'].rank(na_option='bottom')
df['pct_rank'] = df['values'].rank(pct=True)

print(df,"\n")

The output of the above code will be:

    values
x1    20.0
x2    10.0
x3    10.0
x4     NaN
x5    30.0 

    values  default_rank  max_rank  NA_bottom  pct_rank
x1    20.0           3.0       3.0        3.0     0.750
x2    10.0           1.5       2.0        1.5     0.375
x3    10.0           1.5       2.0        1.5     0.375
x4     NaN           NaN       NaN        5.0       NaN
x5    30.0           4.0       4.0        4.0     1.000
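
The pct_rank values in this output can be reproduced by hand. As a sketch, assuming the default na_option='keep', dividing each average rank by the number of non-missing values (4 here) gives the same numbers, for example 1.5 / 4 = 0.375. The names `values`, `pct_rank`, and `manual` are illustrative.

```python
import numpy as np
import pandas as pd

values = pd.Series([20, 10, 10, np.nan, 30], index=["x1", "x2", "x3", "x4", "x5"])

# Percentile ranks as computed by pandas.
pct_rank = values.rank(pct=True)

# Manual equivalent for this data: average rank divided by the count
# of non-missing values (count() ignores NaN, so it is 4 here).
manual = values.rank() / values.count()

print(pd.DataFrame({"pct_rank": pct_rank, "manual": manual}))
```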
