
NumPy - Geometric Distribution



The geometric distribution is a discrete probability distribution. It describes the random variable X that counts the number of independent Bernoulli trials needed to get the first success.

The probability mass function (pmf) of the geometric distribution is defined as:

P(X = k) = (1 - p)^{k - 1} p

where k is the number of Bernoulli trials (k = 1, 2, ...) and p is the probability of success in each trial.
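As a rough check of the pmf, the sketch below evaluates P(X = k) = (1 - p)^(k - 1) p directly and compares it with the relative frequencies of simulated samples. The helper name geometric_pmf is an assumption made for this illustration and is not part of NumPy.

```python
import numpy as np

def geometric_pmf(k, p):
    """Hypothetical helper: P(X = k) = (1 - p)**(k - 1) * p for k = 1, 2, ..."""
    k = np.asarray(k)
    return (1.0 - p) ** (k - 1) * p

p = 0.5
k = np.arange(1, 6)

# theoretical values: 0.5, 0.25, 0.125, 0.0625, 0.03125
pmf_values = geometric_pmf(k, p)
print(pmf_values)

# relative frequency of each k in a large simulated sample
np.random.seed(10)
sample = np.random.geometric(p, 100000)
empirical = np.array([np.mean(sample == kk) for kk in k])
print(empirical)
```

With p = 0.5 each additional trial halves the probability, and the simulated frequencies should land close to the theoretical values.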

A geometric distribution has mean 1/p and variance (1 - p)/p^2.
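The mean 1/p and variance (1 - p)/p^2 can be checked against simulated data. The following is a minimal sketch; the choice p = 0.25 and the sample size are arbitrary values picked for the illustration.

```python
import numpy as np

p = 0.25  # arbitrary success probability for this illustration

# theoretical moments of the geometric distribution
theoretical_mean = 1.0 / p            # 4.0
theoretical_var = (1.0 - p) / p ** 2  # 12.0

# sample moments from a large number of draws
np.random.seed(10)
sample = np.random.geometric(p, 200000)
sample_mean = sample.mean()
sample_var = sample.var()

print(theoretical_mean, sample_mean)
print(theoretical_var, sample_var)
```

The sample values will not match exactly, but they should approach the theoretical ones as the number of draws grows.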

The cumulative distribution function (cdf) evaluated at k is the probability that the random variable X takes a value less than or equal to k. The cdf of the geometric distribution is defined as:

F(k) = P(X \le k) = 1 - (1 - p)^k
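The cdf of the geometric distribution has the closed form F(k) = 1 - (1 - p)^k, which is the running sum of the pmf values up to k. The sketch below checks both statements numerically; geometric_cdf is a hypothetical helper written for this illustration, not a NumPy function.

```python
import numpy as np

def geometric_cdf(k, p):
    """Hypothetical helper: F(k) = 1 - (1 - p)**k for k = 1, 2, ..."""
    k = np.asarray(k)
    return 1.0 - (1.0 - p) ** k

p = 0.2
k = np.array([1, 5, 10])
theoretical = geometric_cdf(k, p)

# fraction of simulated samples that are <= k
np.random.seed(10)
sample = np.random.geometric(p, 100000)
empirical_cdf = np.array([np.mean(sample <= kk) for kk in k])
print(theoretical)
print(empirical_cdf)

# the cumulative sum of the pmf reproduces the cdf
ks = np.arange(1, 11)
pmf = (1.0 - p) ** (ks - 1) * p
cdf_matches = np.allclose(np.cumsum(pmf), geometric_cdf(ks, p))
print(cdf_matches)
```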

The NumPy random.geometric() function returns random samples from a geometric distribution.

Syntax

numpy.random.geometric(p, size=None)

Parameters

p Required. Specify the probability of success in each trial. It must be greater than 0 and at most 1. float or array_like of floats.
size Optional. Specify the output shape. int or tuple of ints. If the given shape is (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if p is a scalar. Otherwise, np.array(p).size samples are drawn.
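The interaction between p and size is easiest to see from the shapes of the returned values. The sketch below is illustrative only; the variable names and the probabilities used are arbitrary.

```python
import numpy as np

np.random.seed(10)

# scalar p, size=None: a single value
single = np.random.geometric(0.5)

# scalar p, integer size: 1-D array with 4 samples
vector = np.random.geometric(0.5, size=4)

# scalar p, tuple size: 2 * 3 * 4 = 24 samples
grid = np.random.geometric(0.5, size=(2, 3, 4))

# array_like p, size=None: one sample per entry of p
per_p = np.random.geometric([0.2, 0.5, 0.8])

# array_like p broadcast against size: each column uses its own p
table = np.random.geometric([0.2, 0.5, 0.8], size=(4, 3))

print(np.ndim(single), vector.shape, grid.shape, per_p.shape, table.shape)
```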

Return Value

Returns samples from the parameterized geometric distribution. ndarray or scalar.
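The NumPy documentation recommends the Generator interface for new code. As a minimal sketch, the same kind of sample can be drawn with the geometric() method of a Generator created by np.random.default_rng(); the seed value 10 is an arbitrary choice for this illustration.

```python
import numpy as np

# create a Generator with a fixed seed so the draw is repeatable
rng = np.random.default_rng(seed=10)

# same parameters as numpy.random.geometric: p and size
sample = rng.geometric(p=0.5, size=(5, 3))
print(sample)
```

The Generator and the legacy np.random.geometric() function use different random streams, so the same seed does not produce the same numbers in both.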

Example: Values from geometric distribution

In the example below, the random.geometric() function is used to create a matrix of the given shape containing random values drawn from the specified geometric distribution.

import numpy as np

size = (5,3)

sample = np.random.geometric(0.5, size)
print(sample)

One possible output of the above code is:

[[2 2 4]
 [2 2 1]
 [1 1 1]
 [1 1 2]
 [5 2 1]]

Plotting geometric distribution

Example: Histogram plot

Matplotlib is a plotting library for Python. Its hist() function can be used to plot a histogram of the samples, which approximates the probability mass function (pmf) of the geometric distribution.

import matplotlib.pyplot as plt
import numpy as np

# fix the seed so the result is reproducible
np.random.seed(10)

size = 10000
# draw 10000 samples from a geometric distribution with p = 0.5
sample = np.random.geometric(0.5, size)
# bin edges 0, 1, ..., 19 (named bins to avoid shadowing the built-in bin())
bins = np.arange(0, 20, 1)

plt.hist(sample, bins=bins, edgecolor='blue')
plt.title("Geometric Distribution") 
plt.show()

The output of the above code will be:

[Figure: histogram of the 10000 samples, titled "Geometric Distribution"]
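The bar heights in such a histogram can also be inspected numerically. As a sketch, np.histogram() applied to the same samples and the same bin edges returns the count per bin; the variable names counts and edges are chosen for this illustration.

```python
import numpy as np

np.random.seed(10)

size = 10000
sample = np.random.geometric(0.5, size)

# same bin edges as in the plot: 0, 1, ..., 19
counts, edges = np.histogram(sample, bins=np.arange(0, 20, 1))

# print the left edge and count of the first few bins
for left, count in zip(edges[:-1], counts[:6]):
    print(int(left), count)
```

The bin starting at 0 is empty because a geometric sample is always at least 1, and roughly half of the samples fall in the bin starting at 1 when p = 0.5.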

Example: Comparing cdfs

Multiple cumulative distribution functions can be compared graphically using the Seaborn ecdfplot() function. In the example below, the empirical cdfs of three geometric distributions (each with a different success probability) are compared.

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# fix the seed so the result is reproducible
np.random.seed(10)

size = 1000
# plot the empirical cdf of 1000 samples from each of
# three geometric distributions with different p
sns.ecdfplot(np.random.geometric(0.2, size))
sns.ecdfplot(np.random.geometric(0.5, size))
sns.ecdfplot(np.random.geometric(0.8, size))

plt.legend(["$p = 0.2$", 
            "$p = 0.5$", 
            "$p = 0.8$"])
plt.show()

The output of the above code will be:

[Figure: empirical cdfs of the three samples, with legend p = 0.2, p = 0.5, p = 0.8]