I have an Excel sheet that contains many software names, such as Visual studio 2012, Visual studio 2013, Visual studio 2017, Adobe Reader English, Adobe Reader Deutsche, Power shell 4.0, Power shell 2.0 and Power shell 5.0.
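I want to keep only one name per software product. In this example, I want the output to be VisualStudio2013, PowerShell4.0 and AdobeReaderEnglish, and the remaining entries should be dropped. I am doing this in Python with NLP tools. I have already removed all the junk characters and version numbers, but I don't know how to proceed from here.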
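One way to frame the problem is as grouping: reduce each name to a "family key" and keep one name per key. The sketch below assumes that any whitespace-separated token containing a digit is a version number; it lower-cases and joins the remaining words, so "Power shell 4.0" and "Power shell 2.0" share the key `powershell`. The helper names `family_key` and `one_per_family` are hypothetical, and which member of a group is kept (here simply the first one seen) is an arbitrary choice.

```python
import re


def family_key(name: str) -> str:
    """Normalize a software name to a grouping key.

    Assumption: any token containing a digit is a version and is dropped;
    the remaining words are lower-cased and joined without spaces.
    """
    words = [w for w in name.split() if not re.search(r"\d", w)]
    return "".join(w.lower() for w in words)


def one_per_family(names):
    """Keep the first name seen for each family key, preserving input order."""
    kept = {}
    for name in names:
        # setdefault only stores the name if the key has not been seen yet
        kept.setdefault(family_key(name), name)
    return list(kept.values())


names = [
    "Visual studio 2012", "Visual studio 2013", "Visual studio 2017",
    "Adobe Reader English", "Adobe Reader Deutsche",
    "Power shell 4.0", "Power shell 2.0", "Power shell 5.0",
]
print(one_per_family(names))
```

This collapses the Visual studio and Power shell entries, but the two Adobe Reader entries keep distinct keys because their variant is a language rather than a number; that case needs a looser comparison such as fuzzy matching or a shared word prefix.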
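Any ideas on how to take this further? After reducing two software names to strings without any digits or junk characters, I tried sequence matching on them, but the results were neither accurate nor efficient.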
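If sequence matching is kept, it may help to use it for clustering rather than for pairwise comparison of every name against every other name. The sketch below uses the standard library's `difflib.SequenceMatcher.ratio()`: each normalized name joins the first cluster whose representative is at least `threshold` similar, otherwise it starts a new cluster. The 0.7 threshold and the digit-stripping normalization are assumptions to tune on real data, and `normalize`/`cluster` are hypothetical helper names.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    # Assumption: tokens containing a digit are versions; drop them,
    # lower-case the rest and remove spaces.
    return "".join(w.lower() for w in name.split() if not any(c.isdigit() for c in w))


def cluster(names, threshold=0.7):
    """Greedy clustering by SequenceMatcher similarity to each cluster's representative."""
    clusters = []  # list of (representative_key, [member names])
    for name in names:
        key = normalize(name)
        for rep, members in clusters:
            if SequenceMatcher(None, rep, key).ratio() >= threshold:
                members.append(name)
                break
        else:
            # no existing cluster was similar enough: start a new one
            clusters.append((key, [name]))
    return [members for _, members in clusters]


names = [
    "Visual studio 2012", "Visual studio 2013", "Visual studio 2017",
    "Adobe Reader English", "Adobe Reader Deutsche",
    "Power shell 4.0", "Power shell 2.0", "Power shell 5.0",
]
for group in cluster(names):
    print(group)
```

Comparing each name only against one representative per cluster keeps the number of comparisons roughly proportional to names × clusters instead of names × names. The trade-off is that the result depends on input order and on the threshold, so a few unrelated products with similar spellings could still be merged.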
import pandas as pd
from nltk.tokenize import wordpunct_tokenize

df = pd.read_csv('C:\\Users\\533471\\Desktop\\Book2.csv', encoding='Windows-1252')
saved_column = df.RowLabels[:]
second_column = df.RowLabels[:]
print(saved_column)
for eachcol in saved_column:
    eachword = eachcol.split()
    print(eachword)
    for secondcol in second_column:
        sentence = None
        wordo = None
        punct = None
        x = []
        copy = []
        secondword = secondcol.split()[:]
        #### proceed only if the first word is equal
        if eachword[0] in secondword[0]:
            print("true")
            sentence = eachword[:]
            sentence += secondword
            #### splitting according to punctuation
            for token in sentence:
                word = wordpunct_tokenize(token)
                if wordo is None:
                    wordo = word
                else:
                    wordo += word
            #### removing all the punctuation
            punct = [item for item in wordo if item.isalpha()]
            t = punct[:]
            t.reverse()
            for p in punct:
                print(p)
                if len(x) > 0:
                    print(x, "Appended")
                    x += [p]
                    if p == x[0]:
                        break
                else:
                    print("list is empty")
                    x += [p]
            x.pop()
            for z in t:
                print(z)
                if len(copy) > 0:
                    print(copy, "appended")
                    copy += [z]
                    if z == punct[0]:
                        break
                else:
                    print("list is empty")
                    copy += [z]
            print(copy)
        else:
            print("false")
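The code above already compares the first word of each pair of names. A related pandas-only sketch, assuming that the product family is the first two words of each name ("Visual studio", "Adobe Reader", "Power shell"), builds a key column and lets `drop_duplicates` keep one row per family, without any nested loops. The inline DataFrame stands in for the `read_csv` call; the `family` column name is hypothetical, and the two-word rule will not fit product names with one word or with more than two words.

```python
import pandas as pd

# Stand-in for the CSV; the question reads a column named RowLabels.
df = pd.DataFrame({"RowLabels": [
    "Visual studio 2012", "Visual studio 2013", "Visual studio 2017",
    "Adobe Reader English", "Adobe Reader Deutsche",
    "Power shell 4.0", "Power shell 2.0", "Power shell 5.0",
]})

# Assumption: the family is the first two words, lower-cased and joined,
# e.g. "Power shell 4.0" -> "powershell".
df["family"] = (
    df["RowLabels"]
    .str.lower()
    .str.split()
    .str[:2]
    .str.join("")
)

# Keep the first row of each family; sort beforehand or change keep= to pick another member.
result = df.drop_duplicates(subset="family", keep="first")
print(result["RowLabels"].tolist())
```

Because the key is computed once per row, this scales linearly with the number of rows, whereas comparing every row against every other row grows quadratically.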