I have a large list of transactions that I want to categorize. It looks like this:
transactions: [
  {
    "id": "20200117-16045-0",
    "date": "2020-01-17",
    "creationTime": null,
    "text": "SuperB Vesterbro T 74637",
    "originalText": "SuperB Vesterbro T 74637",
    "details": null,
    "category": null,
    "amount": {
      "value": -160.45,
      "currency": "DKK"
    },
    "balance": {
      "value": 12572.68,
      "currency": "DKK"
    },
    "type": "Card",
    "state": "Booked"
  },
  {
    "id": "20200117-4800-0",
    "date": "2020-01-17",
    "creationTime": null,
    "text": "Rent 45228",
    "originalText": "Rent 45228",
    "details": null,
    "category": null,
    "amount": {
      "value": -48.00,
      "currency": "DKK"
    },
    "balance": {
      "value": 12733.13,
      "currency": "DKK"
    },
    "type": "Card",
    "state": "Booked"
  },
  {
    "id": "20200114-1200-0",
    "date": "2020-01-14",
    "creationTime": null,
    "text": "Superbest 86125",
    "originalText": "SUPERBEST 86125",
    "details": null,
    "category": null,
    "amount": {
      "value": -12.00,
      "currency": "DKK"
    },
    "balance": {
      "value": 12781.13,
      "currency": "DKK"
    },
    "type": "Card",
    "state": "Booked"
  }
]
I load the data like this:
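import json
import pandas as pd

def load_transactions():  # function name assumed; the original snippet showed only the body
    with open('transactions.json') as transactions:
        file = json.load(transactions)
    data = pd.json_normalize(file)['transactions'][0]
    return pd.DataFrame(data)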
So far I have the following categories, which I want to use to group the transactions:
CATEGORIES = {
    'Groceries': ['SuperB', 'Superbest'],
    'Housing': ['Insurance', 'Rent']
}
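Now I want to loop over every row in the DataFrame and assign each transaction to a category.
I want to do this by checking whether text contains one of the values in the CATEGORIES dictionary.
If it does, the transaction should be categorized under the corresponding key of the CATEGORIES dictionary, for example Groceries.
How can I do this most efficiently?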
IIUC, we can build a pipe-delimited pattern from each list of values in the dictionary and assign the category with .loc.
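The code for this step did not survive in the text, so here is a minimal sketch of the idea. It assumes a DataFrame named `df` with a `text` column like the one in the question; the three sample rows are rebuilt inline so the snippet runs on its own. The `re.escape` call and the `na=False` argument are additions, not something the answer states. They guard against regex metacharacters in the keywords and against missing text.

```python
import re
import pandas as pd

CATEGORIES = {
    'Groceries': ['SuperB', 'Superbest'],
    'Housing': ['Insurance', 'Rent'],
}

# Sample rows rebuilt from the question; only the columns needed here.
df = pd.DataFrame({
    'text': ['SuperB Vesterbro T 74637', 'Rent 45228', 'Superbest 86125'],
    'category': [None, None, None],
})

for category, keywords in CATEGORIES.items():
    # e.g. 'SuperB|Superbest' matches a row containing either keyword.
    pattern = '|'.join(re.escape(word) for word in keywords)
    mask = df['text'].str.contains(pattern, na=False)
    # Assign the dictionary key to every row whose text matched.
    df.loc[mask, 'category'] = category
```

This runs one `str.contains` pass over the column per category, so the cost grows with the number of categories. Matching is case-sensitive by default.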
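For better performance, we can instead build a single list of all the values and extract them with str.extract, re-creating the dictionary along the way so that each value becomes a key, which we then map onto the target DataFrame.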
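The code for the str.extract approach is also missing from the text, so this is a sketch of how it might look. The names `keyword_to_category` and `matched` are illustrative and do not come from the answer. Sorting the keywords longest-first and escaping them with `re.escape` are assumptions added for safety; the DataFrame is rebuilt inline from the question's sample rows.

```python
import re
import pandas as pd

CATEGORIES = {
    'Groceries': ['SuperB', 'Superbest'],
    'Housing': ['Insurance', 'Rent'],
}

df = pd.DataFrame({
    'text': ['SuperB Vesterbro T 74637', 'Rent 45228', 'Superbest 86125'],
})

# Invert the dictionary: each keyword becomes a key pointing at its category.
keyword_to_category = {
    keyword: category
    for category, keywords in CATEGORIES.items()
    for keyword in keywords
}

# Longest keywords first, so a longer keyword is tried before a shorter one
# that happens to be its prefix.
keywords = sorted(keyword_to_category, key=len, reverse=True)
pattern = '(' + '|'.join(re.escape(k) for k in keywords) + ')'

# With one capture group and expand=False, str.extract returns a Series
# holding the first matched keyword per row (NaN where nothing matched).
matched = df['text'].str.extract(pattern, expand=False)

# Translate each matched keyword into its category.
df['category'] = matched.map(keyword_to_category)
```

Here the column is scanned once regardless of how many categories exist. Rows that match no keyword end up with NaN in `category`.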