Add values to a new column based on a string contained in another column

Posted 2024-07-08 19:36:31


I have a DataFrame:

    date        descriptions           Code
1. 1/1/2020     this is aPple          6546
2. 21/8/2019    this is fan for him    4478
3. 15/3/2020    this is ball of hockey 5577
4. 12/2/2018    this is Green apple    7899
5. 13/3/2002    this is iron fan       7788
6. 14/5/2020    this ball is soft      9991

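I want to create a new column, "category". If the descriptions column contains apple, fan, or ball (in upper or lower case), the category column should get the value A001, F009, or B099 respectively. The desired DataFrame is: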

    date        descriptions           Code   category
1. 1/1/2020     this is aPple          6546   A001
2. 21/8/2019    this is fan for him    4478   F009
3. 15/3/2020    this is ball of hockey 5577   B099
4. 12/2/2018    this is Green apple    7899   A001
5. 13/3/2002    this is iron fan       7788   F009
6. 14/5/2020    this ball is soft      9991   B099
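
To try the answers below, the sample frame has to exist first. The following is a minimal sketch that rebuilds it from the table in the question; the variable name `df` is an assumption chosen to match the answers, and the dates are kept as plain strings since the question does not say how they are stored.

```python
import pandas as pd

# Sample data copied from the question; `df` is the name the answers use.
df = pd.DataFrame({
    "date": ["1/1/2020", "21/8/2019", "15/3/2020",
             "12/2/2018", "13/3/2002", "14/5/2020"],
    "descriptions": [
        "this is aPple",
        "this is fan for him",
        "this is ball of hockey",
        "this is Green apple",
        "this is iron fan",
        "this ball is soft",
    ],
    "Code": [6546, 4478, 5577, 7899, 7788, 9991],
})
print(df)
```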

2 answers

You can use numpy.select, which picks a value per row from several conditions:

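import numpy as np

content = ["apple", "fan", "ball"]
# One boolean mask per keyword, matched case-insensitively
condlist = [df.descriptions.str.lower().str.contains(letter) for letter in content]
choicelist = ["A001", "F009", "B099"]
# A string default keeps the result dtype consistent for rows that match nothing
df["category"] = np.select(condlist, choicelist, default="")
df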


    date    descriptions                Code    category
0   1/1/2020    this is aPple           6546    A001
1   21/8/2019   this is fan for him     4478    F009
2   15/3/2020   this is ball of hockey  5577    B099
3   12/2/2018   this is Green apple     7899    A001
4   13/3/2002   this is iron fan        7788    F009
5   14/5/2020   this ball is soft       9991    B099
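
Every row in the question matches a keyword, so the output above does not show what happens otherwise. Here is a self-contained sketch of the same numpy.select approach with one extra non-matching row; the "this is a pen" row and the `"unknown"` label are assumptions added for illustration and are not part of the question.

```python
import numpy as np
import pandas as pd

# Small frame; the last row is a made-up example that matches no keyword.
df = pd.DataFrame({
    "descriptions": [
        "this is aPple",
        "this is iron fan",
        "this ball is soft",
        "this is a pen",
    ]
})

keywords = ["apple", "fan", "ball"]
codes = ["A001", "F009", "B099"]

lowered = df["descriptions"].str.lower()
# regex=False treats each keyword as a literal substring, not a pattern.
condlist = [lowered.str.contains(word, regex=False) for word in keywords]

# If several conditions are true for a row, the first one in the list wins;
# `default` fills the rows where none is true.
df["category"] = np.select(condlist, codes, default="unknown")
result = df["category"].tolist()
print(df)
```

Because numpy.select takes the first true condition, the order of `keywords` acts as a priority when a description mentions more than one of them.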

Use str.extract to pull the matching substring out of the string column, then map it to its code:

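d = {'apple': 'A001', 'ball': 'B099', 'fan': 'F009'}

# Extract the first keyword found in each lowercased description, then map it to its code.
# expand=False returns a Series directly, so no squeeze() is needed.
df['category'] = (
    df.descriptions
      .str.lower()
      .str.extract('(' + '|'.join(d.keys()) + ')', expand=False)
      .map(d)
)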
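
A variant of the str.extract approach, sketched under a few assumptions: it matches case-insensitively through the `flags` argument instead of lowercasing the whole column first, and it escapes the keywords in case they ever contain regex metacharacters. The non-matching "this is a pen" row and the name `mapping` are illustrative and do not come from the question.

```python
import re

import pandas as pd

# Small frame; the last row is a made-up example that matches no keyword.
df = pd.DataFrame({
    "descriptions": [
        "this is aPple",
        "this is iron fan",
        "this ball is soft",
        "this is a pen",
    ]
})
mapping = {"apple": "A001", "ball": "B099", "fan": "F009"}

# Build "(apple|ball|fan)"; re.escape guards against special characters.
pattern = "(" + "|".join(re.escape(key) for key in mapping) + ")"

# expand=False with a single capture group returns a Series of matches
# (NaN where nothing matched).
matched = df["descriptions"].str.extract(pattern, flags=re.IGNORECASE, expand=False)

# Normalise the matched text to lower case so it lines up with the dict keys.
df["category"] = matched.str.lower().map(mapping)
print(df)
```

Rows with no keyword end up as NaN here. Unlike numpy.select, which follows the order of the condition list, str.extract returns whichever keyword appears first in the text.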
