How to make a txt-based keyword extractor on a pandas DataFrame more efficient, with 'other' as the fallback

Posted 2024-10-05 14:26:43

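I built a txt-based keyword extractor for a pandas DataFrame that returns 'other' as a fallback when nothing matches (or when an exception is raised), but the code seems long. Here is my dataset: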

id  description
1   description: kartu debit 20/10 indomaretcipete r
4   description: biaya adm
15  description: tarikan atm 14/10
20  description: trsf ws269b100420/home credit 0372540
22  description: kartu debit 09/10 starbuckspasaraya

Below is the txt file, named text.txt:

indomaret
starbucks
home credit

Here is my code:

# Read the keywords, one per line
with open('text.txt') as f:
    content = f.readlines()
content = [x.strip() for x in content]

def ambil(inp):
    try:
        # Collect every keyword that occurs in the description
        out = []
        for x in content:
            if x in inp:
                out.append(x)
        if len(out) == 0:
            return 'other'
        else:
            output = ' '.join(out)
            return output
    except TypeError:
        # Non-string values such as NaN cannot be searched
        return 'other'

df['keyword'] = df['description'].apply(ambil)

Here is the output:

id  description                                         keyword
1   description: kartu debit 20/10 indomaretcipete r    indomaret
4   description: biaya adm                              other
15  description: tarikan atm 14/10                      other
20  description: trsf ws269b100420/home credit 0372540  home credit
22  description: kartu debit 09/10 starbuckspasaraya    starbucks

I would like to shorten my code by using existing pandas functions. How can I do that?
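
One possible direction, as a minimal sketch rather than a definitive answer: build a single regular expression from the keywords and let the vectorized string methods do the matching. The names `keywords` and `pattern` are introduced here for illustration, and the keyword list is hard-coded as a stand-in for the stripped lines of text.txt. Note that this is not strictly equivalent to the loop version: matches come back in the order they appear in the text (not the order in the file), a keyword that occurs twice is listed twice, and overlapping keywords are matched only once.

```python
import re

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [1, 4, 15, 20, 22],
    "description": [
        "description: kartu debit 20/10 indomaretcipete r",
        "description: biaya adm",
        "description: tarikan atm 14/10",
        "description: trsf ws269b100420/home credit 0372540",
        "description: kartu debit 09/10 starbuckspasaraya",
    ],
})

# Assumption: stand-in for the stripped, non-empty lines of text.txt.
keywords = ["indomaret", "starbucks", "home credit"]

# Escape each keyword so it is matched literally, then join with '|'.
pattern = "|".join(re.escape(k) for k in keywords)

df["keyword"] = (
    df["description"]
    .str.findall(pattern)   # list of matched keywords per row (NaN stays NaN)
    .str.join(" ")          # join the matches; an empty list becomes ''
    .replace("", "other")   # rows with no match
    .fillna("other")        # rows whose description was missing
)

print(df)
```

The final `fillna` plays the role of the `except` branch in the original function: a missing description yields NaN from the string methods instead of raising, and is then labelled 'other'.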