How do I create a function to detect missing values in Python?

          Name     Sex  Age Ticket_No  Fare
    0   Braund    male   22   HN07681  2500
    1      NaN  female   42   HN05681  6895
    2    peter    male  NaN    KKSN55   800
    3      NaN    male   56   HN07681  2500
    4    Daisy  female   22   hf55s44   NaN
    5   Manson     NaN   48   HN07681  8564
    6   Piston    male  NaN   HN07681  5622
    7  Racline  female   42   Nh55146   NaN
    8      NaN    male   22   HN07681  4875
    9      NaN     NaN  NaN       NaN   NaN

1 answer

User

#1 · Posted 2024-09-21 00:47:40

Try this:

    import numpy as np
    import pandas as pd

    # Your original df
    print(df)

    # First drop any rows which are completely NaN
    df = df.dropna(how="all")
    # Rows for the new summary dataframe, one list per column of df
    new_data = []
    for col in df.columns:
        # Each row starts with the column name, followed by its dtype
        _list = [col, df.dtypes[col]]
        missing = df[col].isna().sum()  # Number of NaN values in the column
        if missing > 30:
            print("Max total number of missing exceeded")
            continue  # Skip this column and move on to the next one
        _list.append(missing)

        # Mean: not defined for non-numeric columns, so fall back to NaN
        try:
            mean = df[col].mean()
        except (TypeError, ValueError):
            mean = np.nan
        _list.append(mean)

        # Median: same fallback as for the mean
        try:
            median = df[col].median()
        except (TypeError, ValueError):
            median = np.nan
        _list.append(median)

        # Mode: mode() returns a Series of all modes (NaN ignored, sorted),
        # so take the first one; an empty result falls back to NaN
        try:
            mode = df[col].mode().iloc[0]
        except IndexError:
            mode = np.nan
        _list.append(mode)

        new_data.append(_list)

    columns = ["col_Name", "DType", "No_of_Missing", "Mean", "Median", "Mode"]
    new_df = pd.DataFrame(new_data, columns=columns)
    print("============================")
    print(new_df)

Output:

          Name     Sex   Age Ticket_No    Fare
    0   Braund    male  22.0   HN07681  2500.0
    1      NaN  female  42.0   HN05681  6895.0
    2    peter    male   NaN    KKSN55   800.0
    3      NaN    male  56.0   HN07681  2500.0
    4    Daisy  female  22.0   hf55s44     NaN
    5   Manson     NaN  48.0   HN07681  8564.0
    6   Piston    male   NaN   HN07681  5622.0
    7  Racline  female  42.0   Nh55146     NaN
    8      NaN    male  22.0   HN07681  4875.0
    9      NaN     NaN   NaN       NaN     NaN
    ============================
        col_Name    DType  No_of_Missing         Mean  Median     Mode
    0       Name   object              3          NaN     NaN   Braund
    1        Sex   object              1          NaN     NaN     male
    2        Age  float64              2    36.285714    42.0     22.0
    3  Ticket_No   object              0          NaN     NaN  HN07681
    4       Fare  float64              2  4536.571429  4875.0   2500.0
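
Since the question asks for a function, the same steps could be wrapped in one. The sketch below is only one possible variant, not the code from the answer itself: the name `summarize_missing` and the `max_missing` parameter are made up for illustration, numeric columns are detected with `pd.api.types.is_numeric_dtype` instead of `try`/`except`, and the first value returned by `mode()` is used as the mode. The sample frame is rebuilt from the question's data, with row 8's Name taken as NaN as in the printed output.

```python
import numpy as np
import pandas as pd


def summarize_missing(df, max_missing=30):
    """Return one row per column: dtype, NaN count, mean, median and mode.

    Hypothetical helper: the name and the max_missing parameter are
    illustrative assumptions, not part of pandas.
    """
    df = df.dropna(how="all")  # ignore rows that are entirely NaN
    rows = []
    for col in df.columns:
        series = df[col]
        missing = int(series.isna().sum())
        if missing > max_missing:
            continue  # too many gaps: leave this column out of the summary
        numeric = pd.api.types.is_numeric_dtype(series)
        modes = series.mode()  # NaN is ignored by default
        rows.append({
            "col_Name": col,
            "DType": series.dtype,
            "No_of_Missing": missing,
            "Mean": series.mean() if numeric else np.nan,
            "Median": series.median() if numeric else np.nan,
            "Mode": modes.iloc[0] if not modes.empty else np.nan,
        })
    columns = ["col_Name", "DType", "No_of_Missing", "Mean", "Median", "Mode"]
    return pd.DataFrame(rows, columns=columns)


# Sample data from the question (row 9 is entirely NaN)
df = pd.DataFrame({
    "Name": ["Braund", np.nan, "peter", np.nan, "Daisy",
             "Manson", "Piston", "Racline", np.nan, np.nan],
    "Sex": ["male", "female", "male", "male", "female",
            np.nan, "male", "female", "male", np.nan],
    "Age": [22, 42, np.nan, 56, 22, 48, np.nan, 42, 22, np.nan],
    "Ticket_No": ["HN07681", "HN05681", "KKSN55", "HN07681", "hf55s44",
                  "HN07681", "HN07681", "Nh55146", "HN07681", np.nan],
    "Fare": [2500, 6895, 800, 2500, np.nan, 8564, 5622, np.nan, 4875, np.nan],
})

summary = summarize_missing(df)
print(summary)
```

Calling `summarize_missing(df, max_missing=1)` on this sample would keep only the `Sex` and `Ticket_No` rows, since every other column has more than one missing value.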
