How to fill DataFrame values for the columns named in a list?

Posted 2024-09-28 17:30:41


I have a list of lists:

labels = [['A','B','D','E'], ['G','J','H'],['C','H']]

I have a DataFrame whose cells are all NaN:

Df =
    A    B    C    D    E    G    H    J
  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN

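I want to take one list at a time and check its values against the column names of Df: if a column name matches one of the strings in the list, fill that cell with 1, otherwise fill it with 0.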

Expected output:

Df =
    A    B    C    D    E    G    H    J
    1    1    0    1    1    0    0    0
    0    0    0    0    0    1    1    1
    0    0    1    0    0    0    1    0

The first list must fill the first row of Df according to the condition above. Similarly, the second list fills the second row, and so on.
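For reference, the same 1/0 table can be produced without an explicit loop. A minimal sketch using pandas' `Series.str.join` and `Series.str.get_dummies` (a swapped-in technique, not one of the answers below), assuming Df's columns are exactly the eight shown above:

```python
import pandas as pd

labels = [['A', 'B', 'D', 'E'], ['G', 'J', 'H'], ['C', 'H']]
# Assumption: Df's columns are exactly these, in this order
columns = ['A', 'B', 'C', 'D', 'E', 'G', 'H', 'J']

# Join each list into one "|"-separated string, then let str.get_dummies
# create one 0/1 indicator column per distinct label
dummies = pd.Series(labels).str.join('|').str.get_dummies(sep='|')

# Align to Df's column order; a column that no list mentions becomes 0
df = dummies.reindex(columns=columns, fill_value=0)
print(df)
```

Row i of the result corresponds to `labels[i]`, which is the row-by-row pairing the question asks for.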


3 answers
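import pandas as pd

labels = [['A','B','D','E'], ['G','J','H'],['C','H']]
# Every distinct label across all lists; these become the columns
unique = set(x for l in labels for x in l)

data = []
for item in labels:
    # One dict per list: 1 if the column label is in this list, else 0
    row = {}
    for value in unique:
        row[value] = 1 if value in item else 0
    data.append(row)

# One DataFrame row per dict, then sort the columns alphabetically
df = pd.DataFrame(data)
df = df.reindex(sorted(df.columns), axis=1)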

Output:

   A  B  C  D  E  G  H  J
0  1  1  0  1  1  0  0  0
1  0  0  0  0  0  1  1  1
2  0  0  1  0  0  0  1  0
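This approach takes its columns from the labels themselves, so a column of Df that no list mentions would be missing from the result. A minimal sketch of aligning to Df's columns instead, assuming a hypothetical extra column `K` that appears in no list:

```python
import pandas as pd

labels = [['A', 'B', 'D', 'E'], ['G', 'J', 'H'], ['C', 'H']]
unique = set(x for l in labels for x in l)

# Same row-building logic as the answer, written as comprehensions
data = [{value: int(value in item) for value in unique} for item in labels]
df = pd.DataFrame(data)

# Hypothetical: Df also has a column 'K' that no list mentions
target_columns = ['A', 'B', 'C', 'D', 'E', 'G', 'H', 'J', 'K']
# reindex orders the columns like Df and fills the absent 'K' with 0
df = df.reindex(columns=target_columns, fill_value=0)
print(df)
```

Using `reindex` with `fill_value=0` both orders the columns and supplies zeros, so the separate `sorted(df.columns)` step is no longer needed.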

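Here is a quick-and-dirty approach, assuming `df` is the all-NaN DataFrame from the question with index 0, 1, 2: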

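# Row i gets a 1 in every column named in labels[i]
for row, cols in enumerate(labels):
    df.loc[row, cols] = 1
# Cells never assigned are still NaN: replace them with 0 and cast to int
print(df.fillna(0).astype(int))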

Output:

   A  B  C  D  E  G  H  J
0  1  1  0  1  1  0  0  0
1  0  0  0  0  0  1  1  1
2  0  0  1  0  0  0  1  0
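The snippet above relies on `df` already existing. A self-contained sketch that rebuilds the question's all-NaN frame (building it with `np.nan` is an assumption about how Df was created) and applies the same loop:

```python
import numpy as np
import pandas as pd

labels = [['A', 'B', 'D', 'E'], ['G', 'J', 'H'], ['C', 'H']]
# Assumption: Df is an all-NaN frame with one row per list, index 0..2
df = pd.DataFrame(np.nan, index=range(len(labels)),
                  columns=['A', 'B', 'C', 'D', 'E', 'G', 'H', 'J'])

# .loc[row, list_of_columns] assigns 1 to every listed column of that row
for row, cols in enumerate(labels):
    df.loc[row, cols] = 1

# Cells never assigned are still NaN: turn them into 0 and cast to int
result = df.fillna(0).astype(int)
print(result)
```

The loop works because `.loc` accepts a list of column labels, so one assignment sets several cells of a row at once.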

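You can use a list comprehension for this, iterating over the labels. `DataFrame.set_value` no longer exists (it was removed in pandas 1.0), so build one dict per list and let the DataFrame constructor align the dicts to Df's columns:

# One dict per list, mapping each label in it to 1
rows = [{x: 1 for x in item} for item in labels]
# Columns missing from a row come out as NaN; replace them with 0
df = pd.DataFrame(rows, columns=df.columns).fillna(0).astype('int8')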

Output:

    A   B   C   D   E   G   H   J
0   1   1   0   1   1   0   0   0
1   0   0   0   0   0   1   1   1
2   0   0   1   0   0   0   1   0
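A list comprehension can also avoid NaN entirely by testing column membership directly. A sketch using `pd.Index.isin` (a swapped-in technique, not part of the answer above), assuming Df's columns are exactly the eight shown in the question:

```python
import numpy as np
import pandas as pd

labels = [['A', 'B', 'D', 'E'], ['G', 'J', 'H'], ['C', 'H']]
# Assumption: these are Df's columns, in Df's order
columns = pd.Index(['A', 'B', 'C', 'D', 'E', 'G', 'H', 'J'])

# For each list, columns.isin(item) is a boolean mask over Df's columns
masks = [columns.isin(item) for item in labels]
# Stack the masks into a 2-D array and convert True/False to 1/0
df = pd.DataFrame(np.vstack(masks).astype('int8'), columns=columns)
print(df)
```

Since each row is computed from the full column index, columns that no list mentions automatically get 0 and no `fillna` step is needed.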
