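I have a dataframe (df) and I want to create a new column called country. It is computed from the region column: if the region value appears in the EnglandRegions list, country is set to England; otherwise it takes its value from the region column.
Here is my desired output: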
    name  salary         region B1salary  country
0  Jason   42000         London    42000  England
1  Molly   52000     South West           England
2   Tina   36000   East Midland           England
3   Jake   24000          Wales             Wales
4    Amy   73000  West Midlands           England
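You can see that every value in country is set to England, except the one for Jake's record, which is set to Wales (because Wales is not in the EnglandRegions list). My code produces the following error: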
File "C:/Users/stacey/Documents/scripts/stacey.py", line 20
df['country'] = np.where((df.loc[df['region'].isin(EnglandRegions)),'England', df['region'])
^
SyntaxError: invalid syntax
Here is the code:
import pandas as pd
import numpy as np
EnglandRegions = ["London", "South West", "East Midland", "West Midlands", "East Anglia"]
data = {'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
        'salary': [42000, 52000, 36000, 24000, 73000],
        'region': ['London', 'South West', 'East Midland', 'Wales', 'West Midlands']}
df = pd.DataFrame(data, columns = ['name', 'salary', 'region'])
df['B1salary'] = np.where((df['salary']>=40000) & (df['salary']<=50000) , df['salary'], '')
df['country'] = np.where((df.loc[df['region'].isin(EnglandRegions)),'England', df['region'])
print(df)
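The specific problem the error points to is that you are missing a ] to close the .loc[ indexer. Fixing that alone would not work anyway, because df.loc[...] returns the filtered rows rather than the boolean condition that np.where needs. Try: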
df['country'] = np.where(df['region'].isin(EnglandRegions), 'England', df['region'])
In any case, this is basically what you already have on the line above, for B1salary.
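For reference, here is a minimal self-contained sketch of the fix, using the data from the question. The names mask, filtered and country_alt are only illustrative and do not appear in the original code. It also shows why simply closing the bracket would not be enough (df.loc with a boolean mask returns fewer rows than the frame has), and one possible alternative that uses Series.mask instead of np.where.

```python
import numpy as np
import pandas as pd

EnglandRegions = ["London", "South West", "East Midland", "West Midlands", "East Anglia"]

df = pd.DataFrame({
    "name": ["Jason", "Molly", "Tina", "Jake", "Amy"],
    "salary": [42000, 52000, 36000, 24000, 73000],
    "region": ["London", "South West", "East Midland", "Wales", "West Midlands"],
})

# isin() returns a boolean Series with one entry per row of df,
# which is the kind of condition np.where expects.
mask = df["region"].isin(EnglandRegions)

# df.loc[mask] is a filtered DataFrame (only the matching rows),
# so it cannot serve as an element-wise condition.
filtered = df.loc[mask]

# Where the mask is True use 'England', otherwise keep the region value.
df["country"] = np.where(mask, "England", df["region"])

# Alternative without NumPy: Series.mask replaces values where the condition is True.
country_alt = df["region"].mask(mask, "England")

print(df)
```

Either form should give the same country column; np.where matches the style already used for B1salary, while Series.mask stays within pandas.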