How do I check not-null column constraints in Python?

Posted 2024-05-18 10:16:52

df1:

   ColumnName   Nullable
0  name         True
1  Desgn        True
2  Emp_number   False
3  Salary       True

df2:

   name     Desgn     Emp_number  Salary
0  krul                125796    45000
1  arnold   lawyer     789632    25000
2  daisy    engg       256498    
3  alex                456985    65884
4  mandy    arch       456258    36958
5  krul     painter    
6  perry               789632 
7  timu     lawyer     
8  timy     lawyer     789632    69822
9  daisy    engg       
10 daisy    engg       256498    54869

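How can I count the missing values in the nullable columns of df2 (Nullable == True), raise an error if a non-nullable column has missing values, and otherwise replace the missing values with the median or mode?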


3 answers

Without a for loop:

import pandas as pd
from io import StringIO

df2 = pd.read_table(StringIO("""   name     Desgn     Emp_number  Salary
0  krul     nan           125796    45000
1  arnold   lawyer     789632    25000
2  daisy    engg       256498    nan
3  alex      nan          456985    65884
4  mandy    arch       456258    36958
5  krul     painter    nan       nan
6  perry      nan         789632    nan
7  timu     lawyer     nan     nan
8  timy     lawyer     789632    69822
9  daisy    engg       nan       nan
10 daisy    engg       256498    54869"""), sep=r'\s+')

df1 = pd.read_table(StringIO("""   ColumnName   Nullable
0  name         True
1  Desgn        True
2  Emp_number   False
3  Salary       True"""), sep=r'\s+')


# Select the nullable columns by name (no transpose needed, so dtypes are preserved)
a = df2[df1.loc[df1.Nullable==True, 'ColumnName'].tolist()]

# Replace with median (numeric columns only; text columns have no median)
df2[a.columns] = a.fillna(a.median(numeric_only=True))

# If any null in non nullable, raise ValueError
non_nullable_has_null = df2[df1.loc[df1.Nullable==False, 'ColumnName'].tolist()].isnull().any().any()
if non_nullable_has_null:
    raise ValueError('non nullable has a null')
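The question also asks for a mode fallback and for the error to take priority over the replacement. A minimal sketch of that order is below, assuming that numeric columns should get the median and text columns the mode. The helper name `enforce_nullability` and the small sample frame are hypothetical and only for illustration.

```python
import numpy as np
import pandas as pd


def enforce_nullability(schema, data):
    """Hypothetical helper: raise if a non-nullable column has gaps, else impute."""
    required = schema.loc[~schema["Nullable"], "ColumnName"].tolist()
    nullable = schema.loc[schema["Nullable"], "ColumnName"].tolist()

    # Check the non-nullable columns first so nothing is modified on failure
    missing = data[required].isnull().sum()
    bad = missing[missing > 0]
    if not bad.empty:
        raise ValueError(f"nulls in non-nullable columns: {bad.to_dict()}")

    filled = data.copy()
    for col in nullable:
        if filled[col].notna().sum() == 0:
            continue  # no observed values to derive a fill value from
        if pd.api.types.is_numeric_dtype(filled[col]):
            fill_value = filled[col].median()
        else:
            fill_value = filled[col].mode().iloc[0]  # most frequent value
        filled[col] = filled[col].fillna(fill_value)
    return filled


# Made-up sample in the same shape as df1 and df2 (assumption, not the original data)
schema = pd.DataFrame({
    "ColumnName": ["name", "Desgn", "Emp_number", "Salary"],
    "Nullable": [True, True, False, True],
})
data = pd.DataFrame({
    "name": ["krul", "arnold", "daisy", "alex"],
    "Desgn": [np.nan, "lawyer", "engg", "lawyer"],
    "Emp_number": [125796, 789632, 256498, 456985],
    "Salary": [45000.0, 25000.0, np.nan, 65884.0],
})

result = enforce_nullability(schema, data)
print(result)
```

Checking before filling means the input frame is never half-modified when the constraint fails. The function also returns a copy rather than changing `data` in place.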

You can create a new object and count the null values:

import numpy as np

# pd.np was removed in pandas 2.0, so use numpy directly
new_df = df2.replace(to_replace=[None, ''], value=np.nan)
new_df.isnull().sum()

In [424]: new_df.isnull().sum()
Out[424]: 
name          0
Desgn         3
Emp_number    3
Salary        5
dtype: int64

for _, row in df1.iterrows():
    if not row["Nullable"]:
        # Get all the rows in df2 where that column is null
        nulls = df2[df2[row["ColumnName"]].isnull()]

        # Number of rows where the column is null
        print(len(nulls))
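The same per-column counts can be lined up against the Nullable flags without iterating over rows. The sketch below is one way to do that, assuming empty strings should count as missing. The sample frame is made up for illustration, and the names `counts`, `report` and `violations` are not from the original answers.

```python
import numpy as np
import pandas as pd

# Made-up sample in the same shape as df1 and df2 (assumption, not the original data)
df1 = pd.DataFrame({
    "ColumnName": ["name", "Desgn", "Emp_number", "Salary"],
    "Nullable": [True, True, False, True],
})
df2 = pd.DataFrame({
    "name": ["krul", "arnold", "daisy", "krul"],
    "Desgn": ["", "lawyer", "engg", "painter"],
    "Emp_number": [125796, 789632, 256498, np.nan],
    "Salary": [45000, 25000, np.nan, np.nan],
})

# Treat empty strings as missing, then count nulls per column
counts = df2.replace("", np.nan).isnull().sum()

# Align the counts with the Nullable flags by column name
report = df1.set_index("ColumnName").assign(missing=counts)

# Non-nullable columns that contain at least one null
violations = report[~report["Nullable"] & (report["missing"] > 0)]

print(report)
print(violations.index.tolist())
```

`report` gives the missing count for every column next to its flag, so the nullable columns can be inspected and the non-nullable ones validated from the same table.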
