Search a DataFrame for partial string matches and put the matching rows into a new DataFrame with only their IDs

Posted 2024-09-28 22:34:40


I have a publications DataFrame with the following rows:

publication ID, title, author name, date
12344, Design Style, Jake Kreath, 20071208
12334, The Power of Why, Samantha Finn, 20150704

I ask the user for a string and use that string to search the titles.

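Goal: search the DataFrame to see whether each title contains the word the user entered, and return the matching rows in a new DataFrame that contains only the title and the publication ID.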

Here is my current code:

import re
import pandas as pd

publications = pd.read_csv(filepath, sep="|")

search_term = input('Enter the term you are looking for: ')

def stringDataFrame(publications, title, regex):
    # Collect the index labels of rows whose `title` column matches `regex`
    matches = []
    for idx, value in publications[title].items():  # .iteritems() was removed in pandas 2.0
        if re.search(regex, value):
            matches.append(idx)
    # Select all matching rows at once instead of concatenating inside the loop
    return publications.loc[matches]

newdf = stringDataFrame(publications, 'title', search_term)
print(newdf)

2 answers

Use a combination of .str.contains and .loc:

publications.loc[publications.title.str.contains(search_term), ['title', 'publication_ID']]

Be careful: if a title is 'nightlife' and someone searches for 'night', this will return a match. If that is not the behavior you want, you may need .str.split.
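A minimal sketch of whole-word matching with .str.split, assuming the same title and publication_ID columns and that words are separated by whitespace (a word with punctuation attached, such as 'Night,', would not match). The helper name find_whole_word and the sample titles are hypothetical.

```python
import pandas as pd

def find_whole_word(df, term):
    # Split each title on whitespace; keep rows whose word list contains `term` exactly.
    # Missing titles split to NaN rather than a list, so they count as non-matches.
    words = df['title'].str.split()
    mask = words.apply(lambda ws: isinstance(ws, list) and term in ws)
    return df.loc[mask, ['title', 'publication_ID']]

publications = pd.DataFrame({'title': ['Nightlife in Paris', 'A Night to Remember'],
                             'publication_ID': [1, 2]})
print(find_whole_word(publications, 'Night'))
```

Here only 'A Night to Remember' is returned, whereas .str.contains('Night') matches both titles. A regex word boundary, for example str.contains(r'\b' + re.escape(term) + r'\b'), is a common alternative that also tolerates adjacent punctuation.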


As jpp pointed out, str.contains is case-sensitive. A simple workaround is to make sure everything is lowercase:

title_mask = publications.title.str.lower().str.contains(search_term.lower())
pmids = publications.loc[title_mask, ['title', 'publication_ID']]

Now Lord, LoRD, lord and every other capitalization will return a valid match, and the capitalization in the original DataFrame stays unchanged.
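Series.str.contains also accepts case=False, which gives the same case-insensitive match without building lowercased copies of the column. A minimal sketch on a made-up two-row DataFrame:

```python
import pandas as pd

publications = pd.DataFrame({'title': ['The Lord of The Rings', 'Inferno'],
                             'publication_ID': [4, 3]})
search_term = 'LoRD'

# case=False makes the match case-insensitive; the stored titles are not modified
title_mask = publications['title'].str.contains(search_term, case=False)
pmids = publications.loc[title_mask, ['title', 'publication_ID']]
print(pmids)
```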

A full example, but you should accept @ALollz's answer above.

import pandas as pd
# your publications DataFrame
publications = pd.DataFrame({'title':['The Odyssey','The Canterbury Tales','Inferno','The Lord of The Rings', 'Lord of The Flies'],'publication_ID':[1,2,3,4,5]})

search_term = input('Enter the term you are looking for: ')

publications[['title','publication_ID']][publications['title'].str.contains(search_term)]


Enter the term you are looking for: Lord

       title               publication_ID
3   The Lord of The Rings      4
4   Lord of The Flies          5

Based on the error you got, you can use the new code below, which filters out all np.nan values as part of the logic:

import pandas as pd
import numpy as np

publications = pd.DataFrame({'title':['The Odyssey','The Canterbury Tales','Inferno','The Lord of The Rings', 'Lord of The Flies',np.nan],'publication_ID':[1,2,3,4,5,6]})

search_term = input('Enter the term you are looking for: ')

publications[['title','publication_ID']][publications['title'].str.contains(search_term) & ~publications['title'].isna()]

Enter the term you are looking for: Lord

    title                 publication_ID
3   The Lord of The Rings       4
4   Lord of The Flies           5
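Series.str.contains also has an na parameter: passing na=False treats missing titles as non-matches, so the separate isna() mask is not needed. Adding regex=False is optional; it matches the user's input literally, which matters if the input contains characters such as '?' or '(' that have special meaning in a regular expression. A sketch reusing the DataFrame above, with the term hard-coded in place of input():

```python
import numpy as np
import pandas as pd

publications = pd.DataFrame({'title': ['The Odyssey', 'The Canterbury Tales', 'Inferno',
                                       'The Lord of The Rings', 'Lord of The Flies', np.nan],
                             'publication_ID': [1, 2, 3, 4, 5, 6]})
search_term = 'Lord'  # hard-coded here instead of input()

# na=False: NaN titles count as "no match"; regex=False: match the term literally
mask = publications['title'].str.contains(search_term, na=False, regex=False)
result = publications.loc[mask, ['title', 'publication_ID']]
print(result)
```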
