How do I create a function in Python to detect missing values?

Posted 2024-09-21 00:47:40

   Name      Sex       Age        Ticket_No   Fare
0  Braund    male      22         HN07681     2500
1  NaN       female    42         HN05681     6895
2  peter     male      NaN        KKSN55      800
3  NaN       male      56         HN07681     2500
4  Daisy     female    22         hf55s44     NaN
5  Manson    NaN       48         HN07681     8564
6  Piston    male      NaN        HN07681     5622
7  Racline   female    42         Nh55146     NaN
8  NaN       male      22         HN07681     4875
9  NaN       NaN       NaN        NaN         NaN
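
To reproduce the problem, the table above can be built as a DataFrame roughly as follows. This is only a sketch: it assumes pandas and NumPy are available, and that every NaN-like entry in the table (including the Name in row 8) is a genuine missing value rather than a text string.

```python
import numpy as np
import pandas as pd

# Sample data copied from the table above; np.nan marks each missing cell.
df = pd.DataFrame({
    "Name": ["Braund", np.nan, "peter", np.nan, "Daisy",
             "Manson", "Piston", "Racline", np.nan, np.nan],
    "Sex": ["male", "female", "male", "male", "female",
            np.nan, "male", "female", "male", np.nan],
    "Age": [22, 42, np.nan, 56, 22, 48, np.nan, 42, 22, np.nan],
    "Ticket_No": ["HN07681", "HN05681", "KKSN55", "HN07681", "hf55s44",
                  "HN07681", "HN07681", "Nh55146", "HN07681", np.nan],
    "Fare": [2500, 6895, 800, 2500, np.nan, 8564, 5622, np.nan, 4875, np.nan],
})
print(df)
```

Because Age and Fare contain NaN, pandas stores them as float64, so values print as 22.0, 2500.0, and so on.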

   Column_Name  Missing  Mean  Median  Mode
0  Name         3        NaN   NaN     NaN
1  Sex          1        NaN   NaN     NaN
2  Age          2        36    42      22
3  Ticket       2        4536  4875    2500


Mean, median, and mode apply only to numeric columns; for any other data type they should be null.

1 answer
User
#1 · Posted 2024-09-21 00:47:40

Try this:

import numpy as np
import pandas as pd

# Your original df
print(df)

# First drop any rows which are completely NaN
df = df.dropna(how = "all")
# Create a list to hold other lists.
# This will be used as the data for the new dataframe
new_data = []
# Loop over the columns
for col in df.columns:
    # Create a new list, which will be one row of data in the new dataframe
    # The first item contains only the column's name,
    # to correspond with the new df's first column
    _list = [col]
    _list.append(df.dtypes[col]) # dtype for that column is the second item/second column
    missing = df[col].isna().sum() # Total number of NaN values in the column
    if missing > 30:
        print("Max total number of missing exceeded")
        continue # Skip this column and continue on to the next column
    _list.append(missing)

    # Get the mean. This raises for non-numeric columns, so fall back to NaN
    try:
        mean = df[col].mean()
    except (TypeError, ValueError):
        mean = np.nan
    _list.append(mean) # Append to the proper column position

    # Get the median. This raises for non-numeric columns, so fall back to NaN
    try:
        median = df[col].median()
    except (TypeError, ValueError):
        median = np.nan
    _list.append(median)

    # Get the mode, but only for numeric columns. Series.mode() does not raise
    # for text columns, so check the dtype instead of relying on try/except.
    # mode() returns every tied value in sorted order; take the first one.
    modes = df[col].mode()
    if pd.api.types.is_numeric_dtype(df[col]) and not modes.empty:
        mode = modes.iloc[0]
    else:
        mode = np.nan
    _list.append(mode)

    new_data.append(_list)

columns = ["col_Name", "DType", "No_of_Missing", "Mean", "Median", "Mode"]
new_df = pd.DataFrame(new_data, columns = columns)
print("============================")
print(new_df)

Output:

      Name     Sex   Age Ticket_No    Fare
0   Braund    male  22.0   HN07681  2500.0
1      NaN  female  42.0   HN05681  6895.0
2    peter    male   NaN    KKSN55   800.0
3      NaN    male  56.0   HN07681  2500.0
4    Daisy  female  22.0   hf55s44     NaN
5   Manson     NaN  48.0   HN07681  8564.0
6   Piston    male   NaN   HN07681  5622.0
7  Racline  female  42.0   Nh55146     NaN
8      NaN    male  22.0   HN07681  4875.0
9      NaN     NaN   NaN       NaN     NaN
============================
    col_Name    DType  No_of_Missing         Mean  Median    Mode
0       Name   object              3          NaN     NaN     NaN
1        Sex   object              1          NaN     NaN     NaN
2        Age  float64              2    36.285714    42.0    22.0
3  Ticket_No   object              0          NaN     NaN     NaN
4       Fare  float64              2  4536.571429  4875.0  2500.0
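
Since the question asks for a function, the loop can also be wrapped into one. The following is a minimal sketch rather than the code from the answer above: the name `missing_value_summary` and the `max_missing` parameter are made up for illustration, and it assumes the rule stated in the question (mean, median, and mode only for numeric columns, null otherwise). The small `demo` frame is invented test data, not the data from the question.

```python
import numpy as np
import pandas as pd


def missing_value_summary(df, max_missing=30):
    """Summarise missing values and basic statistics for each column.

    Hypothetical helper: the name and the max_missing parameter are
    illustrative and are not part of pandas.
    """
    # Rows that are entirely NaN carry no information, so drop them first.
    df = df.dropna(how="all")
    rows = []
    for col in df.columns:
        series = df[col]
        missing = int(series.isna().sum())
        if missing > max_missing:
            # Skip columns with more missing values than the threshold allows.
            continue
        if pd.api.types.is_numeric_dtype(series):
            mean = series.mean()
            median = series.median()
            # mode() returns all tied values sorted ascending; keep the first.
            modes = series.mode()
            mode = modes.iloc[0] if not modes.empty else np.nan
        else:
            # Non-numeric columns get null statistics, as the question asks.
            mean = median = mode = np.nan
        rows.append([col, series.dtype, missing, mean, median, mode])
    columns = ["col_Name", "DType", "No_of_Missing", "Mean", "Median", "Mode"]
    return pd.DataFrame(rows, columns=columns)


# Invented example data: the last row is entirely NaN and gets dropped.
demo = pd.DataFrame({
    "Name": ["Ann", np.nan, "Cy", np.nan],
    "Age": [30.0, 30.0, np.nan, np.nan],
})
summary = missing_value_summary(demo)
print(summary)
```

Checking the dtype up front avoids relying on exceptions, which matters for the mode in particular because `Series.mode()` works on text columns without raising.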
