Pandas Series - drop_duplicates() function



The Pandas Series drop_duplicates() function returns a Series with duplicate values removed. The surviving values keep their original index labels.

Syntax

Series.drop_duplicates(keep='first', inplace=False)

Parameters

keep Optional. Determines which duplicates (if any) to keep. Possible values are:
  • first : (Default) Drop duplicates except for the first occurrence
  • last : Drop duplicates except for the last occurrence
  • False : Drop every occurrence of any value that appears more than once, keeping only values that occur exactly once
inplace Optional. If True, performs operation in place and returns None.
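
To make the three keep options concrete, the short sketch below applies each one to the same sample data used in the examples on this page and prints the index labels that survive. The variable names (s, kept_first, kept_last, kept_none) are illustrative only.

```python
import pandas as pd

s = pd.Series(['UK', 'USA', 'UK', 'FRA', 'USA', 'JPN'])

# Index labels that survive under each keep setting
kept_first = s.drop_duplicates(keep='first').index.tolist()
kept_last = s.drop_duplicates(keep='last').index.tolist()
kept_none = s.drop_duplicates(keep=False).index.tolist()

print(kept_first)  # [0, 1, 3, 5]
print(kept_last)   # [2, 3, 4, 5]
print(kept_none)   # [3, 5]  (only FRA and JPN occur exactly once)
```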

Return Value

Returns a new Series with duplicates dropped, or None if inplace=True.
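
The difference between the two return values can be checked directly. The following is a minimal sketch with illustrative variable names (s, result, t, ret). It works on a copy for the in-place call so that the original Series stays available for comparison.

```python
import pandas as pd

s = pd.Series(['UK', 'USA', 'UK'])

# Default call: a new Series is returned and s itself is left unchanged
result = s.drop_duplicates()
print(len(s), len(result))

# In-place call: t is modified and the call itself returns None
t = s.copy()
ret = t.drop_duplicates(inplace=True)
print(ret, len(t))
```

Because the in-place call returns None, assigning its result back to the same variable would discard the data.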

Example: using drop_duplicates()

In the example below, the drop_duplicates() function is used to drop duplicate values from a given Series.

import pandas as pd

x = pd.Series(['UK', 'USA', 'UK', 'FRA', 'USA', 'JPN'])

print("The Series contains:")
print(x, "\n")

# remove duplicate values (the first occurrence of each value is kept)
print("x.drop_duplicates() returns:")
print(x.drop_duplicates(),"\n")

The output of the above code will be:

The Series contains:
0     UK
1    USA
2     UK
3    FRA
4    USA
5    JPN
dtype: object 

x.drop_duplicates() returns:
0     UK
1    USA
3    FRA
5    JPN
dtype: object 
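
Note that the result keeps the original index labels (0, 1, 3, 5) rather than renumbering them. If a consecutive index is wanted, one possible approach is to chain reset_index(drop=True), as in this small sketch. The names deduped and renumbered are illustrative.

```python
import pandas as pd

x = pd.Series(['UK', 'USA', 'UK', 'FRA', 'USA', 'JPN'])

# Original labels of the surviving rows are preserved
deduped = x.drop_duplicates()
print(deduped.index.tolist())      # [0, 1, 3, 5]

# Discard the old labels and renumber from 0
renumbered = deduped.reset_index(drop=True)
print(renumbered.index.tolist())   # [0, 1, 2, 3]
```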

Example: using keep parameter

Using the keep parameter, we can specify which occurrence of a duplicated value to keep. Consider the example below:

import pandas as pd

x = pd.Series(['UK', 'USA', 'UK', 'FRA', 'USA', 'JPN'])

print("The Series contains:")
print(x, "\n")

# keep the first occurrence of each duplicated value
print("x.drop_duplicates(keep='first') returns:")
print(x.drop_duplicates(keep='first'),"\n")

# keep the last occurrence of each duplicated value
print("x.drop_duplicates(keep='last') returns:")
print(x.drop_duplicates(keep='last'),"\n")

The output of the above code will be:

The Series contains:
0     UK
1    USA
2     UK
3    FRA
4    USA
5    JPN
dtype: object 

x.drop_duplicates(keep='first') returns:
0     UK
1    USA
3    FRA
5    JPN
dtype: object 

x.drop_duplicates(keep='last') returns:
2     UK
3    FRA
4    USA
5    JPN
dtype: object 

Example: using inplace parameter

Using the inplace parameter, duplicate values can be removed from the Series in place, without creating a new Series. Consider the example below:

import pandas as pd

x = pd.Series(['UK', 'USA', 'UK', 'FRA', 'USA', 'JPN'])

print("The Series contains:")
print(x, "\n")

# remove duplicates in place (x itself is modified)
x.drop_duplicates(inplace=True)

print("The Series contains:")
print(x, "\n")

The output of the above code will be:

The Series contains:
0     UK
1    USA
2     UK
3    FRA
4    USA
5    JPN
dtype: object 

The Series contains:
0     UK
1    USA
3    FRA
5    JPN
dtype: object 
