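I want to create a new column named "topic" by combining two columns of a dataframe into a list. I also need to filter out the words listed in my JSON exclusion file.
Input CSV file (circuits.csv):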
location                  country
Melbourne, Australia      Australia
Kuala Lumpur, Malaysia    Malaysia
Sakhir                    Bahrain
Istanbul, Turkey          Turkey
Monte-Carlo               Monaco
import json
import pandas as pd

df = pd.read_csv("circuits.csv")

def dedup(value):
    # Remove repeated words from a ", "-separated string.
    words = set(value.split(', '))
    return ', '.join(words)

def worldplay(frame):
    #print(df.head(3))
    # Combine both columns into one ", "-separated string per row.
    df['topic'] = df['location'] + ", " + df['country']
    df["topic"] = df['topic'].str.split(', ').apply(set).str.join(', ')
    df['topic'] = df["topic"].apply(dedup)
    # Load the words to exclude.
    with open('exclude.json') as f:
        data = json.load(f)
    index = json.dumps(data["topic"])
    # Try to drop the excluded words, then split what is left.
    res = [item for item in df['topic'] if item not in index]
    res_o = [x for xs in res for x in xs.split(',')]
    df['topic'] = res_o

worldplay(df)
JSON file (exclude.json):
{
"topic": [
"Australia", "Melbourne", "Malaysia"
]
}
I want my output to be:
Every string in exclude.json should be taken out of the "topic" column.
location                  country      topic
Melbourne, Australia      Australia    []
Kuala Lumpur, Malaysia    Malaysia     ['Kuala Lumpur']
Sakhir                    Bahrain      ['Bahrain', 'Sakhir']
Istanbul, Turkey          Turkey       ['Turkey', 'Istanbul']
Monte-Carlo               Monaco       ['Monte-Carlo', 'Monaco']
Here you go:
Output:
I would make it more generic than the above... but this answers your question.
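A minimal sketch of one way to build the column, not necessarily the code this answer used. It assumes the five rows from the question are typed in directly (instead of reading circuits.csv) and that the exclusion words are hard-coded; `build_topic` is a hypothetical helper name.

```python
import pandas as pd

# Sample rows copied from the question; with the real file use pd.read_csv("circuits.csv").
df = pd.DataFrame({
    "location": ["Melbourne, Australia", "Kuala Lumpur, Malaysia", "Sakhir",
                 "Istanbul, Turkey", "Monte-Carlo"],
    "country": ["Australia", "Malaysia", "Bahrain", "Turkey", "Monaco"],
})

# Words to drop, as listed under "topic" in exclude.json.
exclude = {"Australia", "Melbourne", "Malaysia"}

def build_topic(location, country):
    # Split the location on ", " and append the country.
    words = location.split(", ") + [country]
    # dict.fromkeys removes duplicates while keeping first-seen order.
    unique = dict.fromkeys(w.strip() for w in words)
    # Keep only the words that are not in the exclusion set.
    return [w for w in unique if w not in exclude]

# One list per row, built from the two source columns.
df["topic"] = [build_topic(loc, c) for loc, c in zip(df["location"], df["country"])]
print(df)
```

The lists keep the order "location first, then country", so the order inside each list can differ from the set-based order shown in the question, while the contents are the same.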
EDIT: For topics and reading it from the JSON, you can:
My output:
{'Australia', 'Malaysia', 'Melbourne'}
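As a sketch of how such a set could be read, assuming the JSON has the shape shown in the question. The content is parsed from an inline string here so the snippet runs on its own; the variable name `topics` is taken from the answer text, everything else is illustrative.

```python
import json

# Inline copy of exclude.json from the question. With the real file, use:
#   with open("exclude.json") as f:
#       data = json.load(f)
raw = '{"topic": ["Australia", "Melbourne", "Malaysia"]}'
data = json.loads(raw)

# A set gives fast membership tests and ignores duplicate entries.
topics = set(data["topic"])
print(sorted(topics))
```

Sets are unordered, so the printed order of a raw set may vary; sorting before printing keeps the display stable.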
EDIT2: What if location is a single string containing different cities?
The output is exactly the same as in the original answer.
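One possible way to handle a location string with any number of comma-separated places is to join the two columns and split with pandas string methods. This is a sketch under the same assumptions as the question's data (", " as the separator, hard-coded exclusion words); `keep_allowed` is a hypothetical helper, and the "Sepang" example in the last line is made up for illustration.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "location": ["Melbourne, Australia", "Kuala Lumpur, Malaysia", "Sakhir",
                 "Istanbul, Turkey", "Monte-Carlo"],
    "country": ["Australia", "Malaysia", "Bahrain", "Turkey", "Monaco"],
})

# Words to drop, as listed under "topic" in exclude.json.
exclude = {"Australia", "Melbourne", "Malaysia"}

def keep_allowed(words):
    # Drop duplicates (keeping first-seen order), then drop excluded words.
    return [w for w in dict.fromkeys(words) if w not in exclude]

# Join both columns into one string, split it on ", ", then filter each list.
combined = df["location"] + ", " + df["country"]
df["topic"] = combined.str.split(", ").apply(keep_allowed)
print(df["topic"].tolist())

# Hypothetical location with several places in one string.
print(keep_allowed("Sepang, Kuala Lumpur, Malaysia".split(", ") + ["Malaysia"]))
```

Because the split does not care how many places a string contains, the rows from the question produce the same lists as before.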