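I have two DataFrames: one contains a column of strings that I need to categorise (df=data), and the other contains the possible categories and their search terms (df=categories). I want to add a column to the "data" DataFrame that returns the category based on the search term. For example:
Data: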
**RepairName**
A/C is not cold
flat tyre is c
the tyre needs a repair on left side
the aircon is not cold
Categories:
| **Category** | **SearchTerm** |
|---|---|
| A/C | aircon |
| A/C | A/C |
| Tyre | repair |
| Tyre | flat |
Desired result (data):
| **RepairName** | **Category** |
|---|---|
| A/C is not cold | A/C |
| flat tyre is c | Tyre |
| the tyre needs a repair on left side | Tyre |
| the aircon is not cold | A/C |
I tried the following lambda function with apply. I'm not sure whether my column references are in the right place:
data['Category'] = data['RepairName'].apply(lambda x: categories['Category'] if categories['SearchTerm'] in x else "")
data['Category'] = [categories['Category'] if categories['SearchTerm'] in data['RepairName'] else 0]
But I keep getting this error message:
TypeError: 'in <string>' requires string as left operand, not Series
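The error arises because `in` with a string on the right-hand side needs a single string on the left, whereas `categories['SearchTerm']` is a whole Series. A minimal sketch of one way around it, assuming the sample data from the question, is to test one search term at a time inside a helper passed to apply. The name `find_category` is hypothetical and used only for illustration:

```python
import pandas as pd

data = pd.DataFrame({"RepairName": [
    "A/C is not cold",
    "flat tyre is c",
    "the tyre needs a repair on left side",
    "the aircon is not cold",
]})
categories = pd.DataFrame({
    "Category": ["A/C", "A/C", "Tyre", "Tyre"],
    "SearchTerm": ["aircon", "A/C", "repair", "flat"],
})

def find_category(text):
    # Hypothetical helper: compare one search term (a plain str) at a time,
    # so the left operand of `in` is a string rather than a Series.
    for category, term in zip(categories["Category"], categories["SearchTerm"]):
        if term in text:
            return category
    return ""  # no search term matched

data["Category"] = data["RepairName"].apply(find_category)
```

The comparison here is case-sensitive and returns the first matching term in the order the rows appear in `categories`.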
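The following correctly tells me (True/False) whether any SearchTerm is present, but I can't get it to return the category associated with the matched search term: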
data['containName']=data['RepairName'].str.contains('|'.join(categories['SearchTerm']),case=False)
These two approaches work some of the time, but not always (perhaps because some of my search terms contain more than one word?):
data['Category'] = [
next((c for c, k in categories.values if k in s), None) for s in data['RepairName']]
d = dict(zip(categories['SearchTerm'], categories['Category']))
data['CategoryCheck'] = [next((d[y] for y in x.split() if y in d), None) for x in data['RepairName']]
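A likely explanation for the inconsistent results is that the word-by-word dictionary lookup can only match terms that are a single whitespace-separated token, and both approaches are case-sensitive. The sketch below uses made-up rows (a mixed-case string and a two-word search term, neither from the original data) to show the difference between a word lookup and a case-insensitive substring test:

```python
import pandas as pd

# Hypothetical data: one mixed-case row and one multi-word search term.
data = pd.DataFrame({"RepairName": ["Flat tyre is c", "the air con is not cold"]})
categories = pd.DataFrame({
    "Category": ["Tyre", "A/C"],
    "SearchTerm": ["flat", "air con"],
})

d = dict(zip(categories["SearchTerm"], categories["Category"]))

# Word-by-word lookup: "Flat" != "flat", and "air con" is never a single token.
by_word = [next((d[w] for w in s.split() if w in d), None) for s in data["RepairName"]]

# Case-insensitive substring test: lowercase both sides before comparing.
by_substring = [
    next((c for k, c in d.items() if k.lower() in s.lower()), None)
    for s in data["RepairName"]
]
```

With this data, `by_word` finds nothing for either row, while `by_substring` matches both.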
We can use str.findall and then map.
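The code for this answer is not shown above, so the following is only a sketch of how str.findall and map could be combined, assuming the sample data from the question. Escaping the terms with `re.escape` and matching case-insensitively are additions of this sketch, and `term_to_category` is an illustrative name:

```python
import re
import pandas as pd

data = pd.DataFrame({"RepairName": [
    "A/C is not cold",
    "flat tyre is c",
    "the tyre needs a repair on left side",
    "the aircon is not cold",
]})
categories = pd.DataFrame({
    "Category": ["A/C", "A/C", "Tyre", "Tyre"],
    "SearchTerm": ["aircon", "A/C", "repair", "flat"],
})

# Map each search term (lowercased) to its category.
term_to_category = dict(zip(categories["SearchTerm"].str.lower(), categories["Category"]))

# Escape the terms so that any regex metacharacters are matched literally.
pattern = "|".join(re.escape(term) for term in categories["SearchTerm"])

data["Category"] = (
    data["RepairName"]
    .str.findall(pattern, flags=re.IGNORECASE)  # list of matched terms per row
    .str[0]                                      # first match, NaN if nothing matched
    .str.lower()                                 # align with the lowercased dict keys
    .map(term_to_category)
)
```

Because the pattern is a plain alternation, multi-word search terms are matched as whole substrings. If one term is a prefix of another, sorting the terms longest-first before joining may be needed.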
This approach worked once I made sure all the columns were lowercase (for good measure, I also removed hyphens and parentheses).
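The original code for this step is not shown, and it is not clear which matching approach it was paired with. As a rough sketch, assuming the substring approach and some made-up rows, the normalisation might look like this (`normalise` is a hypothetical helper):

```python
import pandas as pd

def normalise(series):
    # Hypothetical helper: lowercase, then drop hyphens and parentheses.
    return series.str.lower().str.replace(r"[-()]", "", regex=True)

# Made-up rows with mixed case, a hyphen and parentheses.
data = pd.DataFrame({"RepairName": ["Air-Con (front) is not cold", "FLAT tyre"]})
categories = pd.DataFrame({
    "Category": ["A/C", "Tyre"],
    "SearchTerm": ["AirCon", "flat"],
})

clean_names = normalise(data["RepairName"])
clean_terms = normalise(categories["SearchTerm"])

# Match on the cleaned text, but report the original category label.
data["Category"] = [
    next((c for c, k in zip(categories["Category"], clean_terms) if k in s), None)
    for s in clean_names
]
```

Normalising both the repair names and the search terms the same way keeps the comparison consistent while leaving the original columns untouched.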