Extract a value from a list of lists and add it to a new column

Posted 2024-09-25 12:26:00


I have a DataFrame in which one column holds lists of address information.

My data:

import pandas as pd

data = [['location1', [(123, 'Number'),('Main', 'Street'),('New York', 'City')]], ['location2', [('Broadway', 'Street'),('New York', 'City'),(11111, 'ZIP')]], ['location3', [(987, 'Number'),('Grand', 'Street'),('Chicago', 'City'), (55555,'ZIP')]]]

df = pd.DataFrame(data, columns = ['Location', 'Address_Info'])

This creates a DataFrame that looks like this:

    Location    Address_Info
0   location1   [(123, 'Number'), ('Main', 'Street'), ('New York', 'City')]
1   location2   [('Broadway', 'Street'), ('New York', 'City'), (11111, 'ZIP')]
2   location3   [(987, 'Number'), ('Grand', 'Street'), ('Chicago', 'City'), (55555, 'ZIP')]

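I need to extract the entry that is labelled 'Number', then add the number from that entry to the DataFrame as a new column.

The resulting DataFrame should look like this: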

    Location    Address_Info                                                                 Number
0   location1   [(123, 'Number'), ('Main', 'Street'), ('New York', 'City')]                  123
1   location2   [('Broadway', 'Street'), ('New York', 'City'), (11111, 'ZIP')]               NaN
2   location3   [(987, 'Number'), ('Grand', 'Street'), ('Chicago', 'City'), (55555, 'ZIP')]  987

One problem I run into is that some 'Address_Info' lists do not contain a 'Number' entry.


3 Answers

Explode the lists into rows, expand the tuples into columns, and keep only the rows labelled Number:

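df['Number'] = (
    df['Address_Info']
    .explode()                               # one row per (value, key) tuple
    .apply(pd.Series)                        # split each tuple into columns 0 and 1
    .rename(columns={0: 'value', 1: 'key'})
    .query('key == "Number"')['value']       # keep only the Number rows
)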
>>> df
    Location                                       Address_Info Number
0  location1  [(123, Number), (Main, Street), (New York, City)]    123
1  location2  [(Broadway, Street), (New York, City), (11111,...    NaN
2  location3  [(987, Number), (Grand, Street), (Chicago, Cit...    987
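
The chain above can also be followed one step at a time. The sketch below is a variant, not the answerer's exact code: it builds the intermediate frame with the `pd.DataFrame` constructor instead of `.apply(pd.Series)`, and the names `exploded`, `pairs`, and `numbers` are made up for illustration. It assumes a pandas version that has `Series.explode` and at most one 'Number' tuple per row.

```python
import pandas as pd

data = [
    ['location1', [(123, 'Number'), ('Main', 'Street'), ('New York', 'City')]],
    ['location2', [('Broadway', 'Street'), ('New York', 'City'), (11111, 'ZIP')]],
    ['location3', [(987, 'Number'), ('Grand', 'Street'), ('Chicago', 'City'), (55555, 'ZIP')]],
]
df = pd.DataFrame(data, columns=['Location', 'Address_Info'])

# Step 1: one row per tuple; the original row label is repeated in the index.
exploded = df['Address_Info'].explode()

# Step 2: split each (value, key) tuple into two named columns,
# keeping the repeated row labels as the index.
pairs = pd.DataFrame(exploded.tolist(), index=exploded.index,
                     columns=['value', 'key'])

# Step 3: keep only the values whose key is 'Number'.
numbers = pairs.loc[pairs['key'] == 'Number', 'value']

# Step 4: assignment aligns on the index, so rows without a
# 'Number' tuple (location2 here) are left as missing values.
df['Number'] = numbers
print(df[['Location', 'Number']])
```

If a row could contain two 'Number' tuples, the index of `numbers` would have duplicates and the assignment in step 4 would fail, so this approach only fits data where the label appears at most once per row.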

You can use a list comprehension and the str accessor:

df['Address_Info'].apply(lambda l: [i[0] for i in l if i[1] == 'Number']).str[0]

Output:

0    123.0
1      NaN
2    987.0

To save the result in a new column:

df['Number'] = (df['Address_Info']
                  .apply(lambda l: [i[0] for i in l if i[1] == 'Number'])
                  .str[0]
               )

NB. If a row may hold several numbers, omit .str[0]; you then get a list of numbers per row (empty if there are none):

df['Address_Info'].apply(lambda l: [i[0] for i in l if i[1] == 'Number'])

Output:

0    [123]
1       []
2    [987]
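
Because of the missing value, the first output above shows the numbers as floats (123.0, 987.0). If whole numbers are preferred, one possible follow-up is pandas' nullable `Int64` dtype. This is a sketch rather than part of the original answer, and it assumes every extracted number is a whole number; otherwise the cast raises an error.

```python
import pandas as pd

data = [
    ['location1', [(123, 'Number'), ('Main', 'Street'), ('New York', 'City')]],
    ['location2', [('Broadway', 'Street'), ('New York', 'City'), (11111, 'ZIP')]],
    ['location3', [(987, 'Number'), ('Grand', 'Street'), ('Chicago', 'City'), (55555, 'ZIP')]],
]
df = pd.DataFrame(data, columns=['Location', 'Address_Info'])

# Same extraction as in the answer: first value labelled 'Number', or NaN.
numbers = (df['Address_Info']
             .apply(lambda l: [i[0] for i in l if i[1] == 'Number'])
             .str[0]
          )

# Convert to a numeric dtype, then to nullable integers so the
# missing entry is kept as <NA> instead of forcing floats.
df['Number'] = pd.to_numeric(numbers).astype('Int64')
print(df['Number'])
```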

Prepare the data before creating the DataFrame:

def get_number(lst):
    for x in lst:
        if x[1] == 'Number':
            return x[0]
    return None

data = [['location1', [(123, 'Number'),('Main', 'Street'),('New York', 'City')]], ['location2', [('Broadway', 'Street'),('New York', 'City'),(11111, 'ZIP')]], ['location3', [(987, 'Number'),('Grand', 'Street'),('Chicago', 'City'), (55555,'ZIP')]]]
for entry in data:
    entry.append(get_number(entry[1]))
print(data)
# now you can create the DF 

Output:

[['location1', [(123, 'Number'), ('Main', 'Street'), ('New York', 'City')], 123], ['location2', [('Broadway', 'Street'), ('New York', 'City'), (11111, 'ZIP')], None], ['location3', [(987, 'Number'), ('Grand', 'Street'), ('Chicago', 'City'), (55555, 'ZIP')], 987]]
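
The answer stops just before building the DataFrame. A minimal sketch of that last step could look like the following; the name `rows` and the list comprehension are illustrative choices, and `get_number` is the same helper as above with the tuple unpacked for readability.

```python
import pandas as pd

def get_number(pairs):
    # Return the value of the first tuple labelled 'Number', or None.
    for value, label in pairs:
        if label == 'Number':
            return value
    return None

data = [
    ['location1', [(123, 'Number'), ('Main', 'Street'), ('New York', 'City')]],
    ['location2', [('Broadway', 'Street'), ('New York', 'City'), (11111, 'ZIP')]],
    ['location3', [(987, 'Number'), ('Grand', 'Street'), ('Chicago', 'City'), (55555, 'ZIP')]],
]

# Build new rows with the extracted number as a third field,
# leaving the original `data` list unchanged.
rows = [[location, info, get_number(info)] for location, info in data]

df = pd.DataFrame(rows, columns=['Location', 'Address_Info', 'Number'])
print(df[['Location', 'Number']])
```

Building new rows instead of appending to each entry avoids mutating `data`, which matters if the list is reused elsewhere.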
