
Pandas DataFrame - plot.hist() function



A histogram is a graphical representation of the distribution of numerical data. A histogram is constructed in two steps:

  • Bin (or bucket) the range of values, that is, divide the entire range of values into a series of intervals.
  • Count how many values fall into each interval.

The bins are usually specified as consecutive, non-overlapping intervals of a variable. The bins (intervals) must be adjacent and are often (but not required to be) of equal size.
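
The binning and counting steps described above can be sketched with NumPy's numpy.histogram function. This is only an illustration of the idea, not necessarily how pandas computes its bins internally; the sample values are hypothetical.

```python
import numpy as np

# Hypothetical sample values, chosen only for illustration
values = np.array([1, 2, 2, 3, 5, 6, 6, 6, 8, 9])

# Divide the range [1, 9] into 4 adjacent, equal-width intervals
# and count how many values fall into each interval
counts, edges = np.histogram(values, bins=4)

# Every interval is half-open (left edge included, right edge excluded),
# except the last one, which also includes its right edge
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.1f} to {right:.1f}: {count} values")
```

Because the intervals are adjacent and non-overlapping, every value is counted exactly once, so the counts add up to the number of values.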

The DataFrame.plot.hist() function groups the values of all given Series in the DataFrame into bins and draws all bins in one matplotlib.axes.Axes. This is useful when the DataFrame's Series are on a similar scale.

Syntax

DataFrame.plot.hist(by=None, bins=10, **kwargs)

Parameters

by Optional. Specify columns in the DataFrame to group by as str or sequence.
bins Optional. Specify the number of equal-width bins as int. Default is 10.
**kwargs Optional. Additional keyword arguments, such as alpha, are passed on to DataFrame.plot().
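
As a rough illustration of the bins parameter, the sketch below draws the same data with two different bin counts. The column names A and B and the random data are hypothetical, and the Agg backend is an assumption made only so that the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; an assumption so the sketch runs without a display

import numpy as np
import pandas as pd

# Hypothetical data, used only for illustration
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "A": rng.integers(1, 10, 1000),
    "B": rng.integers(5, 15, 1000),
})

# Same data, two different bin counts; each call draws on a new figure
ax_coarse = df.plot.hist(bins=5, alpha=0.5)
ax_fine = df.plot.hist(bins=20, alpha=0.5)

# One bar container per column; the bin count sets how many bars each holds
for ax in (ax_coarse, ax_fine):
    print([len(container) for container in ax.containers])
```

A small bin count gives a coarse overview of the distribution, while a larger one shows more detail at the cost of noisier bars.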

Return Value

Returns the matplotlib.axes.Axes object on which the histogram plot is drawn.
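
With the default by=None, the returned object is a matplotlib Axes, so it can be customized further after the call. The following is a minimal sketch with a hypothetical single-column DataFrame; the labels and title are illustrative, and the Agg backend is assumed only to avoid needing a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; an assumption so the sketch runs without a display

import matplotlib.axes
import numpy as np
import pandas as pd

# Hypothetical data: the integers 1 to 100 in a single column
df = pd.DataFrame({"Sample1": np.arange(1, 101)})

# Keep the return value instead of discarding it
ax = df.plot.hist(bins=10)

# Check that the result is a matplotlib Axes
print(isinstance(ax, matplotlib.axes.Axes))

# Use regular matplotlib methods on the returned Axes
ax.set_title("Distribution of Sample1")
ax.set_xlabel("Value")
ax.set_ylabel("Count")
```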

Example: Histogram example

In the example below, a DataFrame df is created with two columns, and a histogram of both columns is drawn from it:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

#providing seed for reproducibility
#of the result
np.random.seed(10)

df = pd.DataFrame( 
        np.random.randint(1, 10, 5000),
        columns = ['Sample1']
)
df['Sample2'] = df['Sample1'] + np.random.randint(1, 10, 5000)

#displaying top 10 rows of the DataFrame
print(df.head(10),"\n")
#creating the plot
df.plot.hist(bins=18, alpha=0.5)

#displaying the plot
plt.show()

The output of the above code will be:

   Sample1  Sample2
0        5       11
1        1        9
2        2        5
3        1        2
4        2        8
5        9       15
6        1        2
7        9       14
8        7       16
9        5        9 

[Figure: histogram plot of Sample1 and Sample2]
