Creating a new column using a list

import pandas as pd
import numpy as np

City_Name_List = ['Amsterdam', 'Antwerp', 'Brussels', 'Ghent', 'Asheville',
                  'Austin', 'Boston', 'Broward County', 'Cambridge', 'Chicago',
                  'Clark County Nv', 'Columbus', 'Denver', 'Hawaii', 'Jersey City',
                  'Los Angeles', 'Nashville', 'New Orleans', 'New York City',
                  'Oakland', 'Pacific Grove', 'Portland', 'Rhode Island',
                  'Salem Or', 'San Diego']

data = {'host_identity_verified': ['t', 't', 't', 't', 't', 't', 't', 't', 't', 't'],
        'neighbourhood': ['Amsterdam, North Holland, Netherlands',
                          'Amsterdam, North Holland, Netherlands',
                          'NaN',
                          'Amsterdam, North Holland, Netherlands',
                          'Amsterdam, North Holland, Netherlands',
                          'Amsterdam, North Holland, Netherlands',
                          'Amsterdam, North Holland, Netherlands',
                          'NaN',
                          'Amsterdam, North Holland, Netherlands',
                          'Amsterdam, North Holland, Netherlands'],
        'neighbourhood_cleansed': ['Oostelijk Havengebied - Indische Buurt',
                                   'Centrum-Oost', 'Centrum-West', 'Centrum-West',
                                   'Centrum-West',
                                   'Oostelijk Havengebied - Indische Buurt',
                                   'Centrum-Oost', 'Centrum-West', 'Centrum-West',
                                   'Centrum-West'],
        'neighbourhood_group_cleansed': ['NaN', 'NaN', 'NaN', 'NaN', 'NaN',
                                         'NaN', 'NaN', 'NaN', 'NaN', 'NaN'],
        'latitude': [52.36575, 52.36509, 52.37297, 52.38761, 52.36719,
                     52.36575, 52.36509, 52.37297, 52.38761, 52.36719]}

df = pd.DataFrame(data)
df['City'] = [x for x in City_Name_List if x in df.loc[:,'host_identity_verified':'latitude'].values][0]

Traceback (most recent call last):
  File "C:/Users/YAZAN/PycharmProjects/Yazan_Work/try.py", line 63, in <module>
    df['City'] = [x for x in City_Name_List if x in df.loc[:,'host_identity_verified':'latitude'].values][0]
IndexError: list index out of range

3 Answers

User

Answer 1 · edited 2024-10-06 12:16:59

Use DataFrame.apply

df['City'] = df.apply(
    # First listed city that occurs in this row's 'neighbourhood' text, or '' if none does
    lambda row: next((x for x in City_Name_List if x in row.loc['neighbourhood']), ''),
    axis=1
)

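After running the above, df['City'] will contain a city (one of those defined in City_Name_List) for every row whose 'neighbourhood' value contains one.

Refined solution

You can be more explicit and specify that City should be filled from the first substring of each row's 'neighbourhood' field, that is, the part before the first occurrence of ','. If the 'neighbourhood' column is reliably uniform in structure, this may be a good idea, because it helps mitigate unwanted behaviour caused by similarly named cities, cities that are substrings of other cities in City_Name_List, and so on.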

df['City'] = df.apply(
    # Only the text before the first comma is searched for a city name
    lambda row: next((x for x in City_Name_List if x in row.loc['neighbourhood'].split(',')[0]), ''),
    axis=1
)

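Note: the solutions above are only examples of how you might approach the problem you ran into. They do not account for proper handling of exceptions, edge cases, and so on. You should take care of those in your own code.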
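One way to cover the cases the note mentions is to move the lookup into a small helper. The following is only a minimal sketch and not part of the original answer: the helper name find_city, the shortened city list, and the sample rows are hypothetical, and it assumes missing values may appear either as real NaN or as the literal string 'NaN'.

```python
import numpy as np
import pandas as pd

# Shortened city list, for illustration only
City_Name_List = ['Amsterdam', 'Antwerp', 'Brussels', 'Ghent']

def find_city(neighbourhood, cities=City_Name_List):
    """Return the listed city equal to the text before the first comma, else ''."""
    # Real missing values (NaN/None) are not strings, so split() would fail on them
    if not isinstance(neighbourhood, str):
        return ''
    first_part = neighbourhood.split(',')[0].strip()
    # next() with a default avoids the IndexError that [...][0] raises on an empty list
    return next((city for city in cities if city == first_part), '')

# Hypothetical sample rows covering both kinds of missing value
df = pd.DataFrame({'neighbourhood': [
    'Amsterdam, North Holland, Netherlands',
    'NaN',      # literal string, as in the question's data
    np.nan,     # real missing value
    'Ghent, Flanders, Belgium',
]})

df['City'] = df['neighbourhood'].apply(find_city)
print(df['City'].tolist())  # ['Amsterdam', '', '', 'Ghent']
```

Comparing with == instead of in is a deliberate choice in this sketch: it prevents a listed city from matching merely because it is a substring of a longer place name.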

User

Answer 2 · edited 2024-10-06 12:16:59

df['City'] = df['neighbourhood'].apply(lambda x: [i for i in x.split(',') if i in City_Name_List])
df['City'] = df['City'].apply(lambda x: "" if len(x) == 0 else x[0])
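Note that splitting on ',' leaves a leading space on every part after the first (' North Holland', ' Netherlands'), so the two lines above only match a city that comes first in the string. A possible variant, sketched here under the assumption that the city may appear in any position, strips each part before the lookup. The helper name first_listed_city, the shortened city list, and the sample rows are hypothetical.

```python
import pandas as pd

# Shortened city list, for illustration only
City_Name_List = ['Amsterdam', 'Antwerp', 'Brussels', 'Ghent']

# Hypothetical sample rows; the second one has the city in second position
df = pd.DataFrame({'neighbourhood': [
    'Amsterdam, North Holland, Netherlands',
    'Centrum, Amsterdam, Netherlands',
    'NaN',
]})

def first_listed_city(text):
    # Strip whitespace so ' Amsterdam' is compared as 'Amsterdam'
    parts = [part.strip() for part in text.split(',')]
    matches = [part for part in parts if part in City_Name_List]
    return matches[0] if matches else ''

df['City'] = df['neighbourhood'].apply(first_listed_city)
print(df['City'].tolist())  # ['Amsterdam', 'Amsterdam', '']
```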

User

Answer 3 · edited 2024-10-06 12:16:59

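The problem is that when you write x in df.loc[...].values, you are not checking whether the city name occurs inside each individual string. You are checking whether the city name is itself one of the values in the whole selection, which it is not. What you need is something like this: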

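df['city'] = [parts[0] if parts[0] in City_Name_List else '' for parts in df['neighbourhood'].str.split(',')]

This splits each row of df['neighbourhood'] on commas, takes the first value, and checks whether that value is in the list of city names. If it is, the value is placed in the 'city' series; otherwise an empty string is used.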
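The difference between the two kinds of membership test can be seen in a small sketch. It is illustrative only: the two-row frame and the variable names are hypothetical, and it relies on `in` applied to a NumPy array comparing whole elements for equality.

```python
import pandas as pd

# Hypothetical two-row frame, for illustration only
df = pd.DataFrame({'neighbourhood': [
    'Amsterdam, North Holland, Netherlands',
    'NaN',
]})

values = df[['neighbourhood']].values  # 2-D NumPy array of whole cell values

# `in` on the array asks "is any cell equal to this string?", not "does any cell contain it?"
whole_cell_match = 'Amsterdam' in values
exact_match = 'Amsterdam, North Holland, Netherlands' in values

# A substring test has to be applied to each string individually
per_row_match = df['neighbourhood'].str.contains('Amsterdam', regex=False)

print(whole_cell_match)        # False
print(exact_match)             # True
print(per_row_match.tolist())  # [True, False]
```

Because no cell equals a bare city name, the list comprehension in the question comes out empty, and indexing it with [0] raises the IndexError.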
