Combine two columns into a list-typed column and filter out words listed in a JSON file

Posted 2024-09-24 00:33:30


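I want to create a new column named "topic" in my dataframe by combining two columns into a single list. I also need to filter out the words listed in my JSON exclusion file.

Input CSV file (circuits.csv):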

location                 country           
Melbourne, Australia     Australia   
Kuala Lumpur, Malaysia   Malaysia
Sakhir                   Bahrain
Istanbul, Turkey         Turkey 
Monte-Carlo              Monaco


import json
import pandas as pd

df = pd.read_csv("circuits.csv")

def dedup(value):
    words = set(value.split(', '))    
    return ', '.join(words)

def worldplay(frame):
    
    #print(df.head(3))
    df['topic'] = df['location'] + ", " + df['country']
    df["topic"] = df['topic'].str.split(', ').apply(set).str.join(', ')
    df['topic'] = df["topic"].apply(dedup)
    
    f = open('exclude.json',) 
    data = json.load(f) 
    index= json.dumps(data["topic"])
    res = [item for item in df['topic'] if item not in index] 
    res_o = [x for xs in res for x in xs.split(',')]
    df['topic'] = res_o   

worldplay(df)  

JSON file (exclude.json):

{
"topic": [
    "Australia", "Melbourne", "Malaysia"
]    
}

My expected output:

Every string listed in exclude.json should be removed from the "topic" column.

location                 country           topic
Melbourne, Australia     Australia         []
Kuala Lumpur, Malaysia   Malaysia          ['Kuala Lumpur']
Sakhir                   Bahrain           ['Bahrain', 'Sakhir']
Istanbul, Turkey         Turkey            ['Turkey', 'Istanbul']
Monte-Carlo              Monaco            ['Monte-Carlo', 'Monaco']

1 answer
User
#1 · Posted 2024-09-24 00:33:30

Here you go:

import pandas as pd

locations = [
    ['Melbourne', 'Australia'],
    ['Kuala Lumpur'],
    ['Sakhir'],
    ['Istanbul'], 
    ['Monte-Carlo'],
]

countries = [
    'Australia',
    'Malaysia',
    'Bahrain',
    'Turkey',
    'Monaco',
]

my_mapping = {
    'country': countries,
    'location': locations,
}

my_df = pd.DataFrame(my_mapping)

not_wanted_stuff = {'Australia', 'Melbourne', 'Malaysia'}

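def create_topic_column_on_df(df: pd.DataFrame):
    topics = []
    for country, cities in zip(df.country, df.location):
        # Merge the country and its cities into a set, which removes duplicates
        my_set = set([country, *cities])
        # Drop every excluded word
        my_set -= not_wanted_stuff
        # Sets are unordered, so element order within each list may vary
        topics.append(list(my_set))
    # Assign the whole column at once; writing a list into a single cell
    # with df.loc can raise a ValueError in pandas
    df['topic'] = pd.Series(topics, index=df.index)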
        
create_topic_column_on_df(my_df)
print(my_df)    

Output:

     country                location                  topic
0  Australia  [Melbourne, Australia]                     []
1   Malaysia          [Kuala Lumpur]         [Kuala Lumpur]
2    Bahrain                [Sakhir]      [Bahrain, Sakhir]
3     Turkey              [Istanbul]     [Istanbul, Turkey]
4     Monaco           [Monte-Carlo]  [Monte-Carlo, Monaco]

I would make it more general than the above, but this answers your question.

EDIT: To read the topics from the JSON file, you can do:

import json
with open('path_to_json', 'r') as f:
    # Fall back to an empty list so set() does not fail when 'topic' is missing
    topics = set(json.load(f).get('topic', []))
    print(topics)

My output: {'Australia', 'Malaysia', 'Melbourne'}
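A minimal sketch of wrapping that JSON read in a reusable function. The name `load_excluded_words` and its `key` parameter are hypothetical, not part of the answer above, and the temporary file only stands in for the asker's exclude.json so the example can run on its own:

```python
import json
import os
import tempfile


def load_excluded_words(path, key="topic"):
    """Return the words stored under `key` in a JSON file as a set.

    Hypothetical helper: returns an empty set if the key is missing.
    """
    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)
    return set(data.get(key, []))


# Demo: write a throwaway copy of the exclusion file from the question,
# then read it back.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "exclude.json")
    with open(demo_path, "w", encoding="utf-8") as f:
        json.dump({"topic": ["Australia", "Melbourne", "Malaysia"]}, f)
    excluded = load_excluded_words(demo_path)
    missing = load_excluded_words(demo_path, key="no_such_key")

print(sorted(excluded))  # ['Australia', 'Malaysia', 'Melbourne']
print(missing)           # set()
```

Returning a set keeps the later membership checks and set differences cheap, and the empty-list default means a file without the key simply excludes nothing.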

EDIT2: What if the location is a single string containing several cities?

import pandas as pd

locations = [
    'Melbourne, Australia',
    'Kuala Lumpur',
    'Sakhir',
    'Istanbul', 
    'Monte-Carlo',
]

countries = [
    'Australia',
    'Malaysia',
    'Bahrain',
    'Turkey',
    'Monaco',
]

my_mapping = {
    'country': countries,
    'location': locations,
}

my_df = pd.DataFrame(my_mapping)

not_wanted_stuff = {'Australia', 'Melbourne', 'Malaysia'}


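def create_topic_column_on_df(df: pd.DataFrame):
    topics = []
    for country, cities in zip(df.country, df.location):
        # Split the comma-separated location string into individual names
        cities = cities.split(', ')
        my_set = set([country, *cities])
        # Drop every excluded word
        my_set -= not_wanted_stuff
        # Sets are unordered, so element order within each list may vary
        topics.append(list(my_set))
    # Assign the whole column at once; writing a list into a single cell
    # with df.loc can raise a ValueError in pandas
    df['topic'] = pd.Series(topics, index=df.index)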
        
create_topic_column_on_df(my_df)
print(my_df)    

The output is exactly the same as in the original answer.
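One possible way to generalize this, as a sketch rather than a definitive implementation: the hypothetical helper `build_topics` below accepts any list of columns, handles cells that are either delimited strings or lists, and keeps the first-seen order instead of relying on set ordering. The sample data is taken from the question's CSV; the function name and its `sep` parameter are assumptions made for illustration.

```python
import pandas as pd


def build_topics(df, columns, excluded, sep=", "):
    """Return a Series of lists: de-duplicated words from `columns`, minus `excluded`."""
    def row_topics(row):
        seen = []
        for col in columns:
            value = row[col]
            # Accept either a delimited string or an already-split list
            parts = value.split(sep) if isinstance(value, str) else list(value)
            for part in parts:
                part = part.strip()
                # Keep a word only once, and only if it is not excluded
                if part and part not in excluded and part not in seen:
                    seen.append(part)
        return seen

    return pd.Series([row_topics(row) for _, row in df.iterrows()], index=df.index)


df = pd.DataFrame({
    "location": ["Melbourne, Australia", "Kuala Lumpur, Malaysia", "Sakhir",
                 "Istanbul, Turkey", "Monte-Carlo"],
    "country": ["Australia", "Malaysia", "Bahrain", "Turkey", "Monaco"],
})
excluded = {"Australia", "Melbourne", "Malaysia"}

df["topic"] = build_topics(df, ["location", "country"], excluded)
print(df["topic"].tolist())
# [[], ['Kuala Lumpur'], ['Sakhir', 'Bahrain'], ['Istanbul', 'Turkey'], ['Monte-Carlo', 'Monaco']]
```

Because the words are collected in a list rather than a set, the result is reproducible between runs, at the cost of a linear `in` check per word, which is negligible for rows this short.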
