Python: the most efficient way to categorize transactions

transactions: [
  {
    "id": "20200117-16045-0",
    "date": "2020-01-17",
    "creationTime": null,
    "text": "SuperB Vesterbro T 74637",
    "originalText": "SuperB Vesterbro T 74637",
    "details": null,
    "category": null,
    "amount": { "value": -160.45, "currency": "DKK" },
    "balance": { "value": 12572.68, "currency": "DKK" },
    "type": "Card",
    "state": "Booked"
  },
  {
    "id": "20200117-4800-0",
    "date": "2020-01-17",
    "creationTime": null,
    "text": "Rent 45228",
    "originalText": "Rent 45228",
    "details": null,
    "category": null,
    "amount": { "value": -48.00, "currency": "DKK" },
    "balance": { "value": 12733.13, "currency": "DKK" },
    "type": "Card",
    "state": "Booked"
  },
  {
    "id": "20200114-1200-0",
    "date": "2020-01-14",
    "creationTime": null,
    "text": "Superbest 86125",
    "originalText": "SUPERBEST 86125",
    "details": null,
    "category": null,
    "amount": { "value": -12.00, "currency": "DKK" },
    "balance": { "value": 12781.13, "currency": "DKK" },
    "type": "Card",
    "state": "Booked"
  }
]

1 answer

User

#1 · Posted 2024-06-26 02:27:57

IIUC

We can join each category's keywords from the dictionary into a pipe-delimited regex pattern and assign the category with .loc:

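print(df)  # df is assumed to hold the transactions, one row each
for k, v in CATEGORIES.items():
    # join this category's keywords into one alternation pattern, e.g. 'a|b'
    pat = '|'.join(v)
    # label every row whose text matches any of the keywords
    df.loc[df['text'].str.contains(pat), 'category'] = k
print(df[['text', 'category']])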
                       text   category
0  SuperB Vesterbro T 74637  Groceries
1                Rent 45228    Housing
2           Superbest 86125  Groceries
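
The thread does not show how CATEGORIES or df are defined, so here is a minimal self-contained sketch of the loop above. The keyword lists in CATEGORIES are hypothetical, chosen only to match the three sample transactions. The use of re.escape, case=False and na=False is an addition to the answer's code, meant to guard against regex metacharacters, casing differences and missing text.

```python
import re

import pandas as pd

# Hypothetical keyword lists; the thread does not show the real CATEGORIES.
CATEGORIES = {
    "Groceries": ["SuperB", "Superbest"],
    "Housing": ["Rent"],
}

df = pd.DataFrame(
    {"text": ["SuperB Vesterbro T 74637", "Rent 45228", "Superbest 86125"]}
)
df["category"] = None  # start uncategorized, as in the sample transactions

for category, keywords in CATEGORIES.items():
    # Escape each keyword so characters such as "." or "+" match literally.
    pattern = "|".join(re.escape(word) for word in keywords)
    # na=False treats missing text as "no match" instead of propagating NaN.
    mask = df["text"].str.contains(pattern, case=False, na=False)
    df.loc[mask, "category"] = category

print(df[["text", "category"]])
```

Note that when a row matches keywords from several categories, the category that comes last in the dictionary wins, because each pass overwrites earlier assignments.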

A more efficient solution:

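We build one list of all keywords and, at the same time, invert the dictionary so that each keyword becomes a key pointing to its category. str.extract then pulls the first matching keyword out of each row with a single regex, and map translates that keyword into a category on the target DataFrame.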

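words = []
mapping_dict = {}
for k, v in CATEGORIES.items():
    for item in v:
        words.append(item)        # every keyword, across all categories
        mapping_dict[item] = k    # keyword -> category

# one capture group holding all keywords as alternatives
ext = df['text'].str.extract(f"({'|'.join(words)})")
# column 0 is the matched keyword; map it to its category
df['category'] = ext[0].map(mapping_dict)
print(df[['text', 'category']])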
                       text   category
0  SuperB Vesterbro T 74637  Groceries
1                Rent 45228    Housing
2           Superbest 86125  Groceries
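
As a runnable variant of the str.extract approach, the sketch below again assumes a hypothetical CATEGORIES dictionary (the real one is not shown in the thread). It differs from the answer's code in three ways, all of them additions: keywords are escaped, matching is case-insensitive so that the upper-case "SUPERBEST 86125" from originalText also resolves, and longer keywords are tried first so that "Superbest" is preferred over its prefix "SuperB". The fourth row is made-up data showing that unmatched text is left as NaN.

```python
import re

import pandas as pd

# Hypothetical keyword lists; the thread does not show the real CATEGORIES.
CATEGORIES = {
    "Groceries": ["SuperB", "Superbest"],
    "Housing": ["Rent"],
}

# Invert the dictionary: lower-cased keyword -> category.
keyword_to_category = {
    word.lower(): category
    for category, keywords in CATEGORIES.items()
    for word in keywords
}

# Try longer keywords first so "superbest" is preferred over "superb".
ordered = sorted(keyword_to_category, key=len, reverse=True)
pattern = "(" + "|".join(re.escape(word) for word in ordered) + ")"

df = pd.DataFrame(
    {
        "text": [
            "SuperB Vesterbro T 74637",
            "Rent 45228",
            "SUPERBEST 86125",
            "Unknown shop 1",  # made-up row with no matching keyword
        ]
    }
)

# expand=False returns a Series holding the single capture group.
matched = df["text"].str.extract(pattern, flags=re.IGNORECASE, expand=False)
# Normalize the matched keyword's case before looking up its category.
df["category"] = matched.str.lower().map(keyword_to_category)
print(df)
```

Because the whole column is scanned with one compiled pattern rather than once per category, this form tends to scale better as the number of categories grows, though the actual gain depends on the data.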
