How to label a pandas Series using dictionary keys

Posted 2024-10-01 07:35:27


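I have a pandas Series (data) taken from a column named description, and I created a new column, Label, which simply checks whether any dictionary key occurs in the description and, if so, labels the description according to the key that was found. For example: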

         description              Label
427096   alat airtime recharge    bills
1093255  alat nip transfer        transfers
549792   alat transfer            transfers
1163429  wema ussd transfer       transfers

The dictionary:

labels = {
    # transfer
    "tnf": "transfers", "trsf": "transfers", "trtr": "transfers", "trans": "transfers",

    # bills
    "otp": "bills", "fee": "bills", "charge": "bills",

    # airtime
    "recharge": "airtime", "airtime": "airtime", "top-up": "airtime",
}

Here is the code that performs the check:

labs = []
# Label each transaction according to the dictionary defined above
for i in data:
    f = 0
    # check whether key j occurs anywhere in the description i
    for j in list(labels.keys()):
        if j in i:
            labs.append(labels[j])
            f = 1
            break
    if f == 0:
        labs.append("others")
df["Label"] = pd.DataFrame(labs)

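The main problem is that the code does not check for exact matches: a description like airtime recharge should be labelled airtime, and the dictionary key trans also labels "transaction" as transfers.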


1 Answer
User
#1 · Posted 2024-10-01 07:35:27

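The problem is that you are not checking for an exact match; you are only checking whether a substring occurs in the string. So something like 'charge' in 'recharge' returns True.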
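The difference is easy to see in isolation. A small illustration using the first description from the question (the variable names are only for this sketch):

```python
description = "alat airtime recharge"

# Substring test: 'charge' is found inside the word 'recharge'
substring_hit = "charge" in description

# Whole-word test: split into words first, then test list membership
word_hit = "charge" in description.split()

print(substring_hit)  # True
print(word_hit)       # False
```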

So you can either use a regular expression, or split the description into a list of words and check whether the word is in that list.

This is not the most efficient approach, but you could do it like this:

import pandas as pd


df = pd.DataFrame([['alat airtime recharge'],
                   ['alat nip transfer'],
                   ['alat transfer'],
                   ['wema ussd transfer']], columns=['description'])


labels = {
    # transfer
    "tnf": "transfers", "trsf": "transfers", "trtr": "transfers", "trans": "transfers",

    # bills
    "otp": "bills", "fee": "bills", "charge": "bills",

    # airtime
    "recharge": "airtime", "airtime": "airtime", "top-up": "airtime",
}


labs = []
data = df['description']
# Label each transaction according to the dictionary defined above
for i in data:
    check_list = i.split()
    label = "others"  # default when no key matches any word
    for j in labels:
        # A key matches only when some word starts with it,
        # so 'charge' no longer matches inside 'recharge'
        if any(x.startswith(j) for x in check_list):
            label = labels[j]
            break
    labs.append(label)
df["Label"] = labs

Output:

print(df)
             description      Label
0  alat airtime recharge    airtime
1      alat nip transfer  transfers
2          alat transfer  transfers
3     wema ussd transfer  transfers
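
The regular-expression option can also be sketched with pandas string methods. This is only an illustration under a few assumptions, not the answerer's code: each key is anchored at a word boundary with \b (so charge cannot match inside recharge, while trans still matches transfer), and the label comes from the first key found when reading the description left to right, rather than from dictionary order. The last row, pos purchase, is a made-up description added to show the "others" fallback.

```python
import re

import pandas as pd

df = pd.DataFrame({"description": [
    "alat airtime recharge",
    "alat nip transfer",
    "alat transfer",
    "wema ussd transfer",
    "pos purchase",  # hypothetical row, not from the question
]})

labels = {
    # transfer
    "tnf": "transfers", "trsf": "transfers", "trtr": "transfers", "trans": "transfers",
    # bills
    "otp": "bills", "fee": "bills", "charge": "bills",
    # airtime
    "recharge": "airtime", "airtime": "airtime", "top-up": "airtime",
}

# \b anchors every key at the start of a word;
# re.escape protects keys that contain punctuation, such as "top-up"
pattern = r"\b(" + "|".join(re.escape(key) for key in labels) + r")"

df["Label"] = (
    df["description"]
    .str.extract(pattern, expand=False)  # first matching key, or NaN if none
    .map(labels)                         # translate the key into its label
    .fillna("others")                    # descriptions where no key matched
)
print(df)
```

Because the matching runs inside pandas rather than in nested Python loops, this version tends to scale better on large frames, though it is worth checking on your own data that left-to-right precedence gives the labels you expect.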
