One, drop_duplicates function usage

The drop_duplicates() function in pandas can be understood by analogy with the DISTINCT keyword in SQL: it de-duplicates the data set according to the specified fields.
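To make the SQL analogy concrete, here is a minimal sketch. The `orders` table and its `city` and `product` columns are hypothetical names invented for illustration, and the SQL in the comments is only a rough equivalent.

```python
import pandas as pd

# Hypothetical data, made up for this illustration.
orders = pd.DataFrame({
    'city': ['Paris', 'Lyon', 'Paris', 'Lyon', 'Nice'],
    'product': ['A', 'B', 'A', 'C', 'A'],
})

# Roughly: SELECT DISTINCT city FROM orders
distinct_cities = orders[['city']].drop_duplicates()

# Roughly: SELECT DISTINCT city, product FROM orders
distinct_pairs = orders.drop_duplicates(subset=['city', 'product'])

print(distinct_cities)
print(distinct_pairs)
```

One difference from SQL is that the result keeps the original row index of each surviving row, so the index may have gaps.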

Two, parameters of the drop_duplicates() function

* Usage:
  DataFrame.drop_duplicates(subset=None, keep='first', inplace=False)

* Parameter description:

  Parameter   Description
  subset      Column name(s) to de-duplicate on. Defaults to None, meaning all columns are compared.
  keep        One of {'first', 'last', False}, default 'first'. 'first' keeps the first occurrence of each duplicated value and deletes the rest; 'last' keeps the last occurrence; False deletes every row that has a duplicate.
  inplace     Whether to modify the data set itself instead of returning a new one. Defaults to False.
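The effect of each parameter is easiest to see side by side. The following is a small sketch on a seven-row frame with columns `a` and `b`; the variable names `whole_rows`, `last_kept`, and `none_kept` are chosen here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 4, 3, 3, 3, 4],
    'b': [2, 3, 3, 4, 4, 5, 3],
})

# subset=None (the default): a row counts as a duplicate only if
# every column matches an earlier row.
whole_rows = df.drop_duplicates()

# keep='last': for each value of 'a', keep the last occurrence.
last_kept = df.drop_duplicates(subset=['a'], keep='last')

# keep=False: drop every row whose 'a' value occurs more than once.
none_kept = df.drop_duplicates(subset=['a'], keep=False)

print(whole_rows.index.tolist())  # [0, 1, 2, 3, 5]
print(last_kept.index.tolist())   # [0, 1, 5, 6]
print(none_kept.index.tolist())   # [0, 1]

# inplace=False (the default) returns a new DataFrame; df still has 7 rows.
print(len(df))                    # 7
```

With inplace=True the call returns None and modifies `df` directly, so its result should not be assigned back to a variable.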
Three, drop_duplicates usage examples

* De-duplicate according to the specified field, keeping the first occurrence:

    import pandas as pd

    # Create the data frame
    df = pd.DataFrame({
        'a': [1, 2, 4, 3, 3, 3, 4],
        'b': [2, 3, 3, 4, 4, 5, 3]
    })
    print('Before de-duplication:\n', df)

    # De-duplicate on field a, keeping the first occurrence of each value
    df.drop_duplicates(subset=['a'], keep='first', inplace=True)
    print('After de-duplication:\n', df)

  Output:

    Before de-duplication:
        a  b
    0  1  2
    1  2  3
    2  4  3
    3  3  4
    4  3  4
    5  3  5
    6  4  3
    After de-duplication:
        a  b
    0  1  2
    1  2  3
    2  4  3
    3  3  4
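Before dropping rows, it can help to see which ones would be removed. As a sketch, the related DataFrame.duplicated() method accepts the same subset and keep arguments and returns a boolean mask; the names `mask` and `kept` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 4, 3, 3, 3, 4],
    'b': [2, 3, 3, 4, 4, 5, 3],
})

# True marks rows that drop_duplicates(subset=['a'], keep='first') would remove.
mask = df.duplicated(subset=['a'], keep='first')
print(mask.tolist())  # [False, False, False, False, True, True, True]

# Filtering with the inverted mask gives the same rows as drop_duplicates.
kept = df[~mask]
print(kept.equals(df.drop_duplicates(subset=['a'], keep='first')))  # True
```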
