A more efficient way to replace text with dictionary keys (each having multiple values) in Python

countries = list(CountryList.keys())
for country in countries:
    for i in range(len(CountryList[country])):
        lender = CountryList[country][i]
        country = str(country)
        lender = str(lender).replace("['",'',).replace("']",'')
        df['Ingredient'] = df['Ingredient'].str.replace(lender,country)

3 Answers

User

Answer 1 · edited 2024-05-19 20:27:02

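If you want to use regular expressions, join all of the values for each key in CountryList with a pipe (|), then call Series.str.replace once per key. This will be much faster than the approach you tried, which makes one pass over the column for every single value.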

joined={key: '|'.join(item[0] for item in value) for key,value in CountryList.items()}

for key in joined:
    df['Ingredient'] = df['Ingredient'].str.replace(joined[key], key, regex=True)

Output:

  Dish  Price                 Ingredient
0    A     15  FRUIT FRUIT apricot MEAT 
1    B      8        CEREAL MEAT venison
2    C     20          FRUIT MEAT guinea
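One caveat with a bare alternation: a pattern such as oat also matches inside a longer word such as goat, and any regex metacharacters in an ingredient name would be read as regex syntax. The following is a minimal sketch that guards against both with word boundaries and re.escape. It assumes the same CountryList shape as in the question (each key maps to a list of one-element lists); the sample rows are made up purely to show the substring case.

```python
import re
import pandas as pd

# Assumed shape: each key maps to a list of one-element lists.
CountryList = {'FRUIT': [['apple'], ['orange'], ['banana']],
               'CEREAL': [['oat'], ['wheat'], ['corn']]}

# Hypothetical sample rows, chosen to contain words that embed an ingredient.
df = pd.DataFrame({'Ingredient': ['goat oat', 'pineapple apple']})

# One pattern per key: escape every word and anchor the group on word boundaries.
patterns = {
    key: r'\b(?:' + '|'.join(re.escape(item[0]) for item in value) + r')\b'
    for key, value in CountryList.items()
}

# Still one pass over the column per key, as in the answer above.
for key, pattern in patterns.items():
    df['Ingredient'] = df['Ingredient'].str.replace(pattern, key, regex=True)

result = df['Ingredient'].tolist()
print(result)
```

Here goat and pineapple are left alone, while the standalone oat and apple are replaced.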

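Another approach is to invert the keys and values of the dictionary, then split each entry of the Ingredient column into words and call dict.get on every word, using the word itself as the default: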

reversedCountries = {item[0]: key for key, value in CountryList.items() for item in value}

df['Ingredient'] = df['Ingredient'].apply(lambda x: ' '.join(reversedCountries.get(y, y) for y in x.split()))

User

Answer 2 · edited 2024-05-19 20:27:02

Change the format of CountryList so that each ingredient maps directly to its category:

import itertools

CountryList2 = {}
for k, v in CountryList.items():
    for i in (itertools.chain.from_iterable(v)):
        CountryList2[i] = k

>>> CountryList2
{'apple': 'FRUIT',
 'orange': 'FRUIT',
 'banana': 'FRUIT',
 'oat': 'CEREAL',
 'wheat': 'CEREAL',
 'corn': 'CEREAL',
 'chicken': 'MEAT',
 'lamb': 'MEAT',
 'pork': 'MEAT',
 'turkey': 'MEAT',
 'duck': 'MEAT'}

Now you can use replace:

df['Ingredient'] = df['Ingredient'].replace(CountryList2, regex=True)

>>> df
  Dish  Price                 Ingredient
0    A     15   FRUIT FRUIT apricot MEAT
1    B      8        CEREAL MEAT venison
2    C     20          FRUIT MEAT guinea
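With regex=True, Series.replace treats every dictionary key as its own regular expression, so keys can also match inside longer words. As an alternative, here is a hedged sketch of a single-pass variant built on the same flattened mapping: one compiled alternation plus a lookup callback. It uses only the standard re module; the function name replace_ingredients and the sample strings are illustrative assumptions, and whether it is faster on a given dataset is worth measuring rather than assuming.

```python
import re

# Flattened mapping in the same shape as CountryList2 above.
CountryList2 = {'apple': 'FRUIT', 'orange': 'FRUIT', 'banana': 'FRUIT',
                'oat': 'CEREAL', 'wheat': 'CEREAL', 'corn': 'CEREAL',
                'chicken': 'MEAT', 'lamb': 'MEAT', 'pork': 'MEAT',
                'turkey': 'MEAT', 'duck': 'MEAT'}

# Longest entries first, so that a multi-word entry would win over a
# shorter entry it starts with.
words = sorted(CountryList2, key=len, reverse=True)
pattern = re.compile(r'\b(?:' + '|'.join(map(re.escape, words)) + r')\b')

def replace_ingredients(text):
    """Replace every known ingredient in text with its category in one pass."""
    return pattern.sub(lambda m: CountryList2[m.group(0)], text)

# Hypothetical sample strings; the second one contains 'goat' on purpose.
samples = ['apple banana apricot lamb', 'goat oat pork']
converted = [replace_ingredients(s) for s in samples]
print(converted)
```

With pandas, the same function could be applied to the column with df['Ingredient'].apply(replace_ingredients).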

User

Answer 3 · edited 2024-05-19 20:27:02

By creating a dictionary whose keys are the values in the sublists, you can build a reverse index from product to type:

product_to_type = {}
for typ, product_lists in CountryList.items():
    for product_list in product_lists:
        for product in product_list:
            product_to_type[product] = typ

A little Python magic lets you compress this step into a single dict comprehension:

product_to_type = {product:typ for typ, product_lists in CountryList.items()
   for product_list in product_lists for product in product_list}

Then you can write a function that splits the ingredients and maps each one to its type, and apply that function to the DataFrame:

import pandas as pd

CountryList = {'FRUIT': [['apple'], ['orange'],  ['banana']],
 'CEREAL': [['oat'], ['wheat'],  ['corn']],
 'MEAT': [['chicken'],  ['lamb'],  ['pork'],  ['turkey'], ['duck']]}

product_to_type = {product:typ for typ, product_lists in CountryList.items()
   for product_list in product_lists for product in product_list}

def convert_product_to_type(products):
    return " ".join(product_to_type.get(product, product) 
        for product in products.split(" "))
    
df =  pd.DataFrame({'Dish':  ['A', 'B','C'],
        'Price': [15,8,20],
         'Ingredient': ['apple banana apricot lamb ', 'wheat pork venison', 'orange lamb guinea']
        })

df["Ingredient"] = df["Ingredient"].apply(convert_product_to_type)

print(df)

Note: this solution splits the ingredient list on spaces, so it assumes that no ingredient name itself contains a space.
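How the string is split also decides what happens to extra whitespace. A small sketch using the first sample row (which ends with a space) shows the difference between split(" ") and split() with no argument; the variable names are only for illustration.

```python
text = 'apple banana apricot lamb '   # note the trailing space

# split(" ") keeps empty strings, so the spacing survives a split/join round trip.
parts_space = text.split(" ")
roundtrip_space = " ".join(parts_space)

# split() with no argument drops leading, trailing, and repeated whitespace.
parts_any = text.split()
roundtrip_any = " ".join(parts_any)

print(parts_space)
print(parts_any)
print(repr(roundtrip_space), repr(roundtrip_any))
```

So a function built on split(" ") preserves the original spacing of each row, while one built on split() normalizes it.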
