Extracting a string from a column using a regular expression

Published 2024-10-03 23:27:37

I want to extract the following strings from the title column and put them in a new column named hazard_extract, as in the example below:

import pandas as pd

test = {'title': ['Other', 'Microbiological - Listeria', 'Extraneous Material', 'Chemical', 'Chemical - Histamine', 'Labelling, Other'], 'hazard_extract':['Other', 'Microbiological', 'Extraneous Material', 'Chemical', 'Chemical', 'Labelling']}
example = pd.DataFrame(test)
example

    title                       hazard_extract
0   Other                       Other
1   Microbiological - Listeria  Microbiological
2   Extraneous Material         Extraneous Material
3   Chemical                    Chemical
4   Chemical - Histamine        Chemical
5   Labelling, Other            Labelling

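However, with the code below, the string is not extracted correctly when it contains no - or ,. In that case, how can I extract both words in Extraneous Material and the single word in Chemical or Other?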

example['hazard_extract'] = example['title'].str.extract(r'^(.*?),? ')
    title                       hazard_extract
0   Other                       NaN
1   Microbiological - Listeria  Microbiological
2   Extraneous Material         Extraneous
3   Chemical                    NaN
4   Chemical - Histamine        Chemical
5   Labelling, Other            Labelling

Thank you very much for your help.


3 answers

The simplest way is to use split:

example['title'].str.split(r'[-,]').str[0].str.strip()
0                  Other
1        Microbiological
2    Extraneous Material
3               Chemical
4               Chemical
5              Labelling

Try this:

example['title'].str.extract(r'^(\w*\s*\w*)\s*[\,\-]?.*')
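
A note for anyone who wants to stay with str.extract: the pattern in the question, r'^(.*?),? ', requires a literal space after the captured text, so a single word like Other gets NaN and Extraneous Material is cut at its first space. The pattern above captures at most two words, and on a row like Microbiological - Listeria the capture group includes the space before the hyphen. The following is only a sketch of one possible alternative, not the answerer's method; the name pattern is my own. It lazily captures everything up to optional whitespace followed by a - or , or the end of the string:

```python
import pandas as pd

example = pd.DataFrame({'title': ['Other', 'Microbiological - Listeria',
                                  'Extraneous Material', 'Chemical',
                                  'Chemical - Histamine', 'Labelling, Other']})

# Lazily capture from the start of the string, stopping at optional
# whitespace followed by either a '-' / ',' separator or the end of the string.
pattern = r'^(.*?)\s*(?:[-,]|$)'

# expand=False returns a Series instead of a one-column DataFrame,
# so it can be assigned directly to a new column.
example['hazard_extract'] = example['title'].str.extract(pattern, expand=False)
print(example)
```

Because the alternation also accepts the end of the string, titles without any separator are returned whole.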

No complicated regular expression is needed:

import pandas as pd

test = {'title': ['Other', 'Microbiological - Listeria', 'Extraneous Material', 'Chemical', 'Chemical - Histamine', 'Labelling, Other']}
example = pd.DataFrame(test)
print(example)
print()
example['hazard_extract'] = example['title'].str.split(' -|,').str[0]
print(example)
                        title
0                       Other
1  Microbiological - Listeria
2         Extraneous Material
3                    Chemical
4        Chemical - Histamine
5            Labelling, Other

                        title       hazard_extract
0                       Other                Other
1  Microbiological - Listeria      Microbiological
2         Extraneous Material  Extraneous Material
3                    Chemical             Chemical
4        Chemical - Histamine             Chemical
5            Labelling, Other            Labelling
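
One difference between the two split patterns used in these answers may matter on real data: r'[-,]' splits on every hyphen, while ' -|,' only splits on a hyphen preceded by a space. The sketch below compares them on a made-up title, Non-compliance - Labelling, which is not in the question's data and is only there to illustrate the point:

```python
import pandas as pd

# The second title is hypothetical: it has a hyphen inside a word.
titles = pd.Series(['Chemical - Histamine', 'Non-compliance - Labelling'])

# Splitting on any '-' or ',' also cuts the hyphenated word.
any_hyphen = titles.str.split(r'[-,]').str[0].str.strip()

# Splitting on ' -' (space, then hyphen) or ',' leaves the hyphenated word intact.
spaced_hyphen = titles.str.split(' -|,').str[0]

print(any_hyphen.tolist())     # ['Chemical', 'Non']
print(spaced_hyphen.tolist())  # ['Chemical', 'Non-compliance']
```

If the titles can contain hyphenated words, the ' -|,' form is probably the safer choice of the two.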
