nan: not a number

inf: infinity (inf is positive infinity; -inf is negative infinity)

In numpy, both nan and inf are of type float.
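A quick check of the statements above, assuming a standard NumPy install (the array values are made up for illustration). It also shows that an integer array cannot store nan, which is why data containing nan ends up with a float dtype.

```python
import numpy as np

# np.nan and np.inf are plain Python floats
print(type(np.nan), type(np.inf))  # <class 'float'> <class 'float'>

# Putting nan into an array upcasts the whole array to a float dtype
a = np.array([1, 2, np.nan])
print(a.dtype)  # float64

# inf appears, for example, when a nonzero float is divided by zero
with np.errstate(divide="ignore"):  # silence the divide-by-zero RuntimeWarning
    b = np.array([1.0, -1.0]) / 0
print(b)  # [ inf -inf]

# An integer array has no way to represent nan, so the assignment fails
c = np.array([1, 2, 3])
nan_rejected = False
try:
    c[0] = np.nan
except ValueError:
    nan_rejected = True
print(nan_rejected)  # True
```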

 

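t != t returns an array (matrix) of type bool. Because nan is not equal to anything, including itself, the result is True exactly where t holds nan.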
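A minimal sketch of the t != t trick on a small made-up array:

```python
import numpy as np

t = np.array([[1.0, np.nan, 3.0],
              [np.nan, 5.0, 6.0]])

# nan compares unequal to everything, including itself,
# so t != t is True exactly at the nan positions
mask = t != t
print(mask)
# [[False  True False]
#  [ True False False]]

print(np.nan == np.nan)  # False
```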

np.count_nonzero() returns the number of non-zero elements in an array; for a bool array, that is the number of True values.
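Combining np.count_nonzero() with t != t counts the nan values. The sketch below (illustrative data) also shows a pitfall: called directly on a float array, nan counts as non-zero.

```python
import numpy as np

t = np.array([0.0, 1.0, np.nan, 3.0, np.nan])

# On a numeric array, only 0.0 counts as zero; nan != 0, so nan is counted
print(np.count_nonzero(t))  # 4

# On a bool array, non-zero means True, so this counts the nan entries
print(np.count_nonzero(t != t))  # 2
```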

np.isnan() also returns a bool array, with True wherever the element is nan.
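A short sketch (made-up values) showing that np.isnan() gives the same mask as t != t, and that inf is not flagged as nan:

```python
import numpy as np

t = np.array([[1.0, np.nan],
              [np.inf, 4.0]])

# np.isnan flags only nan; inf is not nan
print(np.isnan(t))
# [[False  True]
#  [False False]]

# Same mask as t != t, but easier to read
print(np.array_equal(np.isnan(t), t != t))  # True

# True counts as 1, so summing the mask also counts the nan values
print(np.isnan(t).sum())  # 1
```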

So here comes the question: in a set of data, is it appropriate to simply replace nan with 0? What is the impact?

For example, after replacing every nan with 0, the mean is bound to decrease if the mean of the original (non-nan) values was greater than 0. A more common approach is to replace missing values with the mean (or median) of the remaining values, or to delete the rows that contain missing values.
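A small sketch comparing these options on made-up numbers. It uses np.where, np.nanmean and np.nanmedian, which the post does not mention; the last two compute the mean and median while ignoring nan.

```python
import numpy as np

# Illustrative data; the 9.0 is a deliberately large value
data = np.array([1.0, 2.0, np.nan, 9.0])
missing = np.isnan(data)

# Option 1: fill with 0 -> the mean drops from 4.0 to 3.0
zero_filled = np.where(missing, 0.0, data)
print(zero_filled.mean())  # 3.0

# Option 2: fill with the mean of the non-nan values
mean_filled = np.where(missing, np.nanmean(data), data)
print(mean_filled.mean())  # 4.0

# Option 3: fill with the median of the non-nan values (less sensitive to 9.0)
median_filled = np.where(missing, np.nanmedian(data), data)
print(median_filled)  # [1. 2. 2. 9.]

# Option 4: for a 2-D table, drop every row that contains at least one nan
table = np.array([[1.0, 2.0],
                  [np.nan, 5.0],
                  [7.0, 8.0]])
complete_rows = table[~np.isnan(table).any(axis=1)]
print(complete_rows)
# [[1. 2.]
#  [7. 8.]]
```

Mean filling keeps the overall mean unchanged, while median filling is less affected by extreme values such as the 9.0 here.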

demo.py (numpy, replacing each nan with the mean of its column):
# coding=utf-8
import numpy as np


def fill_ndarray(t1):
    for i in range(t1.shape[1]):  # traverse each column; replace its nan values with the column mean
        temp_col = t1[:, i]  # current column (a view, so writing to it modifies t1)
        nan_num = np.count_nonzero(temp_col != temp_col)  # number of nan in this column
        if nan_num != 0:  # non-zero means the column contains nan
            temp_not_nan_col = temp_col[temp_col == temp_col]  # the column with nan removed
            # select the nan positions and assign them the mean of the non-nan values
            temp_col[np.isnan(temp_col)] = temp_not_nan_col.mean()  # mean() computes the average
    return t1


if __name__ == '__main__':
    t1 = np.array([[0., 1., 2., 3., 4., 5.],
                   [6., 7., np.nan, np.nan, np.nan, np.nan],
                   [12., 13., 14., 15., 16., 17.],
                   [18., 19., 20., 21., 22., 23.]])
    t1 = fill_ndarray(t1)  # replace each nan with the mean of its column
    print(t1)
    '''
    [[ 0.  1.  2.  3.  4.  5.]
     [ 6.  7. 12. 13. 14. 15.]
     [12. 13. 14. 15. 16. 17.]
     [18. 19. 20. 21. 22. 23.]]
    '''
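The same result can also be reached without a Python loop. This is an alternative sketch, not the post's method: it assumes np.nanmean with axis=0 for the column means and lets np.where broadcast them across the rows.

```python
import numpy as np

t1 = np.arange(24, dtype=float).reshape(4, 6)
t1[1, 2:] = np.nan  # same missing pattern as in demo.py

# Column means that ignore nan, shape (6,)
col_means = np.nanmean(t1, axis=0)

# Broadcast the column means over the rows; non-nan values are kept as they are
filled = np.where(np.isnan(t1), col_means, t1)
print(filled)
```

Unlike fill_ndarray, np.where returns a new array and leaves t1 untouched. If a column were entirely nan, np.nanmean would emit a RuntimeWarning and that column would stay nan.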
 

 
