Getting valid values from cells and updating column names with pandas in Python

Posted 2024-10-03 23:20:17


I have the data frame below. The real data is huge and varies a lot.

        Date        ID      Code   Value   Value1  Value2     Value3
0 1945-12-30  H0010603  ZZZ008-2  zzz=ID   AAC=10    NaN        NaN
1 1945-12-30  H0010603  ZZZ008-2  zzz=ID   AAC=01    NaN        NaN
2 1945-12-30  H0010603  ZZZ008-2    NaN       NaN  VEC=1        NaN
3 1945-12-30  H0010603  ZZZ008-2    NaN       NaN  VEC=2    AAC= 1 A
4 1945-12-30  H0010603  ZZZ008-2    NaN       NaN  VEC=3    AAC= 1 A

This is the final expected data.

        Date        ID      Code    zzz       AAC    VEC       AAC.1
0 1945-12-30  H0010603  ZZZ008-2     ID        10    NaN        NaN
1 1945-12-30  H0010603  ZZZ008-2     ID        01    NaN        NaN
2 1945-12-30  H0010603  ZZZ008-2    NaN       NaN      1        NaN
3 1945-12-30  H0010603  ZZZ008-2    NaN       NaN      2        1 A
4 1945-12-30  H0010603  ZZZ008-2    NaN       NaN      3        1 A

I need to update the column names using the values in the cells.

df = pd.read_excel(xlPath, 0)
writer = pd.ExcelWriter(xlPath,
                        engine='xlsxwriter',
                        date_format='mm/dd/yyyy',
                        datetime_format='mm/dd/yyyy')
df = df.fillna('')
for ColumnName, values in df.iteritems():
    for index, value in enumerate(values):
        if '=' in str(value):
            df.set_value(index, ColumnName, str(value).split('=')[1])
            NewColumnName = str(value).split('=')[0]
            df.rename(columns={ColumnName: NewColumnName}, inplace=True)

df.to_excel(writer, index=False)
writer.save()

But this is wrong, because one column ends up duplicated. So I thought I could loop through df, take the first valid value in each column, and put it in a list.

AllColumns = list(df.columns.values)
NewColNameList = []
for ColumnName, values in df.iteritems():
    a = 0
    for index, value in enumerate(values):
        while a < len(values):
            if '=' in str(value):
                if value != '':
                    print(index, values)
                    NewColNameList.append(value)
                    break
                a += 1
print(NewColNameList)

But my while loop does not work the way I expected. Any help getting the desired data frame is appreciated.
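
The idea described above (collecting the first valid value of each column) does not need a nested while loop. Here is a minimal sketch, assuming the sample frame from the question is rebuilt by hand. The helper name `first_key` and the list name `new_col_names` are made up for illustration and are not from the original post.

```python
import numpy as np
import pandas as pd

# Hand-built copy of the sample data shown in the question (an assumption;
# the original post reads it from an Excel file instead).
df = pd.DataFrame({
    "Date": ["1945-12-30"] * 5,
    "ID": ["H0010603"] * 5,
    "Code": ["ZZZ008-2"] * 5,
    "Value": ["zzz=ID", "zzz=ID", np.nan, np.nan, np.nan],
    "Value1": ["AAC=10", "AAC=01", np.nan, np.nan, np.nan],
    "Value2": [np.nan, np.nan, "VEC=1", "VEC=2", "VEC=3"],
    "Value3": [np.nan, np.nan, np.nan, "AAC= 1 A", "AAC= 1 A"],
})


def first_key(col):
    """Return the text before '=' in the first non-null cell, or None."""
    idx = col.first_valid_index()  # label of the first non-NaN cell, or None
    if idx is None:
        return None
    value = str(col.loc[idx])
    return value.split("=", 1)[0] if "=" in value else None


# One entry per column: the key found in that column, or None if there is none.
new_col_names = [first_key(df[name]) for name in df.columns]
print(new_col_names)
```

Columns whose first valid cell has no `=` (here `Date`, `ID` and `Code`) yield `None`, so they can keep their original names.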


1 answer
Answer 1 · Posted 2024-10-03 23:20:17

IIUC:

Dataset:

In [314]: df
Out[314]:
         Date        ID      Code   Value  Value1 Value2    Value3
0  1945-12-30  H0010603  ZZZ008-2  zzz=ID  AAC=10    NaN       NaN
1  1945-12-30  H0010603  ZZZ008-2  zzz=ID  AAC=01    NaN       NaN
2  1945-12-30  H0010603  ZZZ008-2     NaN     NaN  VEC=1       NaN
3  1945-12-30  H0010603  ZZZ008-2     NaN     NaN  VEC=2  AAC= 1 A
4  1945-12-30  H0010603  ZZZ008-2     NaN     NaN  VEC=3  AAC= 1 A

Solution:

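def get_col_name(col):
    # Only string (object) columns can hold "key=value" cells.
    if col.dtype != object:
        return col.name
    # Pick the first cell that looks like "key=value"; na=False counts NaN as no match.
    s = col.loc[col.str.contains(r'\w+\=', na=False).idxmax()]
    # Guard against NaN or a non-matching cell when the column has no "key=value" at all.
    if isinstance(s, str) and '=' in s:
        return s.split('=')[0]
    return col.name

# Rename each Value* column after the key found in its cells, then strip the "key=" prefix.
df = (df.rename(columns=lambda x: get_col_name(df[x]) if x.startswith('Value') else x)
        .replace(r'\w+\=', '', regex=True))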

Result:

In [83]: %paste
df = (df.rename(columns=lambda x: get_col_name(df[x]) if x.startswith('Value') else x)
        .replace(r'\w+\=', '', regex=True))
## -- End pasted text --

In [84]: df
Out[84]:
         Date        ID      Code  zzz  AAC  VEC   AAC
0  1945-12-30  H0010603  ZZZ008-2   ID   10  NaN   NaN
1  1945-12-30  H0010603  ZZZ008-2   ID   01  NaN   NaN
2  1945-12-30  H0010603  ZZZ008-2  NaN  NaN    1   NaN
3  1945-12-30  H0010603  ZZZ008-2  NaN  NaN    2   1 A
4  1945-12-30  H0010603  ZZZ008-2  NaN  NaN    3   1 A
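
The result above has two columns named AAC, while the expected output in the question names the second one AAC.1. It also leaves the space after `=` in cells such as `AAC= 1 A`. Here is a rough sketch of one way to handle both. It is a variation on the approach above, not the answer's own code. The helper `key_of`, the `seen` counter and the hand-built frame are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hand-built copy of the sample data (assumption: same shape as in the question).
df = pd.DataFrame({
    "Date": ["1945-12-30"] * 5,
    "ID": ["H0010603"] * 5,
    "Code": ["ZZZ008-2"] * 5,
    "Value": ["zzz=ID", "zzz=ID", np.nan, np.nan, np.nan],
    "Value1": ["AAC=10", "AAC=01", np.nan, np.nan, np.nan],
    "Value2": [np.nan, np.nan, "VEC=1", "VEC=2", "VEC=3"],
    "Value3": [np.nan, np.nan, np.nan, "AAC= 1 A", "AAC= 1 A"],
})


def key_of(col):
    """Text before '=' in the first non-null cell, else the original column name."""
    idx = col.first_valid_index()
    if idx is None or "=" not in str(col.loc[idx]):
        return col.name
    return str(col.loc[idx]).split("=", 1)[0]


# Build the new names in column order, suffixing repeats with .1, .2, ...
seen = {}
new_names = []
for name in df.columns:
    key = key_of(df[name]) if name.startswith("Value") else name
    count = seen.get(key, 0)
    seen[key] = count + 1
    new_names.append(key if count == 0 else f"{key}.{count}")

# Strip the leading "key=" and any spaces around it.
# Assumes every Value* column holds strings (or NaN), as in the sample.
for name in df.columns:
    if name.startswith("Value"):
        df[name] = df[name].str.replace(r"^\s*\w+=\s*", "", regex=True)

df.columns = new_names
print(df)
```

The suffix scheme imitates the `.1` naming that pandas applies to duplicate headers when reading a file. It does not check whether a generated name such as `AAC.1` already exists in the frame.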
