How to find sentences in a DataFrame that contain a single-character word at any position

Posted 2024-09-25 00:31:37


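I am trying to print the sentences in a DataFrame that contain a one-character word, regardless of whether it appears at the beginning, in the middle, or at the end of the sentence. The code I tried is: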

lookfor = '[' + re.escape("A-Za-z") + ']'

tdata = pd.read_csv(fileinput, nrows=0).columns[0]
skip = int(tdata.count(' ') == 0)
tdata = pd.read_csv(fileinput, names=['sentences'], skiprows=skip)



filtered = tdata[tdata.sentences.str.contains(lookfor, regex=True, na=False)]
print(filtered)

#a sample set
-----------------------------

#hi, how are; you z
#im  w good thanks
#How  am I
#good, what about  you
#my name is alex
#K hello, alex how are you !
#it  is a car
#great news
#thanks!
-----------------------------

expected output 

-----------------------------
#hi, how are; you z
#im  w good thanks
#How  am I
#K hello, alex how are you !
#it  is a car
-----------------------------

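Even when I write out every letter in the lookfor character class, it does not work: it prints any sentence that contains those letters anywhere, not only the sentences where a letter stands alone as a word. Any ideas?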
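A likely reason the attempt behaves this way, shown as a small sketch: `re.escape` escapes the hyphens, so the class no longer describes the ranges A-Z and a-z, and a character class on its own matches a letter anywhere inside a word anyway. The test strings below are illustrative only.

```python
import re

# Same construction as in the question.
lookfor = "[" + re.escape("A-Za-z") + "]"

# Printing the pattern shows that the hyphens were escaped, so the class
# lists the literal characters A, -, Z, a, z instead of two ranges.
print(lookfor)

# A character class has no notion of "stands alone": it matches the letter
# wherever it occurs, including inside longer words.
print(bool(re.search(lookfor, "great news")))  # matches the "a" in "great"
print(bool(re.search(lookfor, "b")))           # "b" is not in the class
```

Word boundaries (`\b`) are what restrict a match to a standalone character.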


2 answers

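Use Series.str.contains with a pattern that matches a one-character word between word boundaries, and filter the rows with boolean indexing: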

df = df[df['sentences'].str.contains(r'\b\w{1}\b')]
print (df)
                     sentences
0           hi, how are; you z
1            im  w good thanks
2                    How  am I
5  K hello, alex how are you !
6                 it  is a car
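For a version that runs without the CSV file, here is a minimal sketch that rebuilds the sample set from the question in memory. The `sentences` list and the `mask` and `single_char` names are illustrative; `na=False` is carried over from the question's code so that missing values count as non-matches.

```python
import pandas as pd

# Sample sentences copied from the question.
sentences = [
    "hi, how are; you z",
    "im  w good thanks",
    "How  am I",
    "good, what about  you",
    "my name is alex",
    "K hello, alex how are you !",
    "it  is a car",
    "great news",
    "thanks!",
]
df = pd.DataFrame({"sentences": sentences})

# \b is a word boundary, so \b\w\b matches a word made of exactly one
# word character, wherever it sits in the sentence.
mask = df["sentences"].str.contains(r"\b\w\b", regex=True, na=False)

# Boolean indexing keeps only the rows where the mask is True.
single_char = df[mask]
print(single_char)
```

`\w{1}` and `\w` are equivalent, so the quantifier can be dropped.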

Edit: to exclude the one-letter words A and I, you can remove them with replace before the check:

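# regex=True is needed in pandas 2.0+, where str.replace treats the pattern as a literal string by default
df = df[df['sentences'].str.replace(r'\b[AI]\b', '', regex=True).str.contains(r'\b\w{1}\b')]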
print (df)
                     sentences
0           hi, how are; you z
1            im  w good thanks
5  K hello, alex how are you !
6                 it  is a car

Or:

df = df[~df['sentences'].str.contains(r'\b[AI]\b') & 
         df['sentences'].str.contains(r'\b\w{1}\b')]
print (df)
                     sentences
0           hi, how are; you z
1            im  w good thanks
5  K hello, alex how are you !
6                 it  is a car
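The two variants give the same result on the sample data, but they are not equivalent in general. A small sketch with hypothetical sentences (not from the question) shows where they differ: the replace variant keeps a row that has another one-character word besides A or I, while the mask variant drops every row that contains a standalone A or I.

```python
import pandas as pd

# Hypothetical sentences, chosen only to show the difference.
df = pd.DataFrame({"sentences": ["I have a dog", "How am I", "it is a car"]})
s = df["sentences"]

# Variant 1: delete standalone A/I first, then look for any remaining
# one-character word.
kept_replace = df[
    s.str.replace(r"\b[AI]\b", "", regex=True).str.contains(r"\b\w\b")
]

# Variant 2: reject every row that contains a standalone A or I at all.
kept_mask = df[~s.str.contains(r"\b[AI]\b") & s.str.contains(r"\b\w\b")]

print(kept_replace)
print(kept_mask)
```

Which one is right depends on whether "I have a dog" should count as a match because of the standalone "a".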

Try:

# Non-capturing groups (?:...) avoid the pandas warning about match groups in str.contains
df.loc[df.sentences.str.contains(r"(?:\W|^)\w(?:\W|$)")]

Output:

                     sentences
0           hi, how are; you z
1            im  w good thanks
2                    How  am I
5  K hello, alex how are you !
6                 it  is a car
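One caveat applies to both answers: `\w` also matches digits and the underscore, so a standalone "5" counts as a one-character word. If only letters should qualify, as the A-Za-z in the question suggests, a letters-only class between word boundaries is one option. The sentences below are hypothetical.

```python
import pandas as pd

# Hypothetical sentences, chosen only to show the difference.
df = pd.DataFrame(
    {"sentences": ["room 5 is free", "plan b failed", "great news"]}
)

# \w accepts letters, digits and underscore.
any_word_char = df["sentences"].str.contains(r"\b\w\b")

# Restricting the class to ASCII letters ignores the standalone digit.
letters_only = df["sentences"].str.contains(r"\b[A-Za-z]\b")

print(df[any_word_char])
print(df[letters_only])
```

Note that an apostrophe is not a word character, so the "s" in "it's" is treated as a one-character word by all of these patterns.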
