Search a DataFrame for partial string matches and put the matching rows into a new DataFrame containing only their IDs

import re
import pandas as pd

# filepath is expected to point to a pipe-delimited file
publications = pd.read_csv(filepath, sep="|")
search_term = input('Enter the term you are looking for: ')

def stringDataFrame(publications, title, regex):
    # Collect every row whose value in the `title` column matches the regex
    newdf = pd.DataFrame()
    for idx, value in publications[title].items():
        if re.search(regex, value):
            newdf = pd.concat([publications[publications[title] == value], newdf], ignore_index=True)
    return newdf

print(stringDataFrame(publications, 'title', search_term))

2 answers

User

Answer 1 · edited 2024-09-28 22:34:40

Use a combination of .str.contains and .loc

publications.loc[publications.title.str.contains(search_term), ['title', 'publication_ID']]

Be careful: if a title is 'nightlife' and someone searches for 'night', this returns a match. If that is not the behavior you want, you may need .str.split instead.
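If whole-word matches are what you are after, one alternative to splitting is a regex with word boundaries. The following is a minimal sketch under stated assumptions: the sample data is made up, and `whole_word_mask` is a hypothetical helper name, not part of pandas.

```python
import re
import pandas as pd

# Made-up sample data, for illustration only
publications = pd.DataFrame({
    'title': ['Nightlife in Paris', 'A Night at the Opera', 'Inferno'],
    'publication_ID': [1, 2, 3],
})

def whole_word_mask(titles, term):
    """Return a boolean mask that is True where `term` occurs as a whole word."""
    # re.escape keeps characters such as '(' or '.' in the term literal;
    # \b anchors the match at word boundaries
    pattern = r'\b' + re.escape(term) + r'\b'
    return titles.str.contains(pattern, case=False, regex=True, na=False)

matches = publications.loc[whole_word_mask(publications['title'], 'night'),
                           ['title', 'publication_ID']]
print(matches)
```

Here 'Nightlife in Paris' is skipped because 'night' is followed by another word character. Note that `\b` only behaves as expected when the search term starts and ends with a letter, digit, or underscore.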

As jpp pointed out, str.contains is case-sensitive. A simple fix is to make sure everything is lowercase:

title_mask = publications.title.str.lower().str.contains(search_term.lower())
pmids = publications.loc[title_mask, ['title', 'publication_ID']]

Now Lord, LoRD, lord and every other variation will return a valid match, and the casing in the original DataFrame stays unchanged.
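Another way to get the same effect is the `case` parameter of `str.contains`, which avoids lowercasing both sides. A minimal sketch, assuming made-up sample data and a hard-coded `search_term` standing in for the user's input:

```python
import pandas as pd

# Made-up sample data, for illustration only
publications = pd.DataFrame({
    'title': ['The Lord of The Rings', 'Lord of The Flies', 'Inferno'],
    'publication_ID': [4, 5, 3],
})
search_term = 'lORd'  # stand-in for the value returned by input()

# case=False makes the match case-insensitive; the stored titles are untouched
title_mask = publications['title'].str.contains(search_term, case=False)
pmids = publications.loc[title_mask, ['title', 'publication_ID']]
print(pmids)
```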

User

Answer 2 · edited 2024-09-28 22:34:40

A complete example, but you should accept @ALollz's answer above

import pandas as pd
# your publications DataFrame
publications = pd.DataFrame({'title':['The Odyssey','The Canterbury Tales','Inferno','The Lord of The Rings', 'Lord of The Flies'],'publication_ID':[1,2,3,4,5]})

search_term = input('Enter the term you are looking for: ')

publications[['title','publication_ID']][publications['title'].str.contains(search_term)]


Enter the term you are looking for: Lord

                   title  publication_ID
3  The Lord of The Rings               4
4      Lord of The Flies               5

Based on your error, you can filter out all np.nan values as part of the logic, using the new code below:

import pandas as pd
import numpy as np

publications = pd.DataFrame({'title':['The Odyssey','The Canterbury Tales','Inferno','The Lord of The Rings', 'Lord of The Flies',np.nan],'publication_ID':[1,2,3,4,5,6]})

search_term = input('Enter the term you are looking for: ')

publications[['title','publication_ID']][publications['title'].str.contains(search_term) & ~publications['title'].isna()]

Enter the term you are looking for: Lord

                   title  publication_ID
3  The Lord of The Rings               4
4      Lord of The Flies               5
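The missing values can also be handled inside `str.contains` itself through its `na` parameter, and `regex=False` keeps raw user input from being interpreted as a pattern. A minimal sketch, assuming made-up sample data; `find_titles` is a hypothetical helper name introduced only for this illustration:

```python
import numpy as np
import pandas as pd

# Made-up sample data, including a missing title and a title with regex metacharacters
publications = pd.DataFrame({
    'title': ['The Odyssey', 'The Lord of The Rings', 'Lord of The Flies',
              'C++ (2nd ed.)', np.nan],
    'publication_ID': [1, 4, 5, 6, 7],
})

def find_titles(df, term):
    """Return the title and ID of every row whose title contains `term`."""
    # na=False: rows with a missing title count as non-matches
    # regex=False: `term` is treated as a literal substring, so input such as
    # 'C++' does not raise a regex error
    mask = df['title'].str.contains(term, na=False, regex=False)
    return df.loc[mask, ['title', 'publication_ID']]

lord_hits = find_titles(publications, 'Lord')
plus_hits = find_titles(publications, 'C++')
print(lord_hits)
print(plus_hits)
```

If only the IDs are needed, selecting `['publication_ID']` instead of both columns in the `.loc` call returns a DataFrame with just that column.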
