Getting the relevant software names with Python

Posted 2024-10-05 14:31:15


I have an Excel sheet containing many software names, such as Visual studio 2012, Visual studio 2013, Visual studio 2017, Adobe Reader English, Adobe Reader Deutsche, Power shell 4.0, Power shell 2.0, and Power shell 5.0.

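I want to keep only one relevant version name for each piece of software. In this example, the output should be Visual studio 2013, Power shell 4.0, and Adobe Reader English, and the rest should be dropped. I am using Python NLP tools. I have already removed all junk characters and version numbers, but I don't know how to proceed from there.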

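Any ideas on how to build on this? After reducing two software names to forms without digits or junk characters, I tried sequence matching on them, but the results were neither accurate nor effective.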

import pandas as pd
from nltk.tokenize import wordpunct_tokenize

df = pd.read_csv('C:\\Users\\533471\\Desktop\\Book2.csv', encoding='Windows-1252')
saved_column = df.RowLabels[:]
second_column = df.RowLabels[:]

print(saved_column)

for eachcol in saved_column:
    eachword = eachcol.split()
    print(eachword)

    for secondcol in second_column:
        sentence = None
        wordo = None
        punct = None

        x = []
        copy = []
        secondword = secondcol.split()[:]

        # Proceed only if both names start with the same word.
        if eachword[0] == secondword[0]:
            print("true")
            sentence = eachword[:]
            sentence += secondword

            ####splitting according to punctuations.
            for token in sentence:
                word = wordpunct_tokenize(token)

                if wordo is None:
                    wordo = word
                else:
                    wordo += word

            # Keep only alphabetic tokens (drops punctuation and version numbers).
            punct = [item for item in wordo if item.isalpha()]
            t = punct[:]
            t.reverse()

            for p in punct:
                print(p)
                if len(x) > 0:
                    print(x, "Appended")
                    x += [p]
                    if p == x[0]:
                        break
                else:
                    print("list is empty")

                    x += [p]

            x.pop()
            for z in t:
                print(z)
                if len(copy) > 0:
                    print(copy, "appended")

                    copy += [z]
                    if z == punct[0]:
                        break
                else:
                    print("list is empty")
                    copy += [z]

                print(copy)

        else:
            print("false")
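
One possible direction, offered as a rough sketch rather than a definitive answer: instead of comparing every name against every other name, group the names by a normalized product key and then pick one entry per group. The helper names `base_name`, `group_by_product`, and `pick_one` are hypothetical, and so is the assumption that the first two non-version words identify the product. The question does not say what makes a version "relevant", so `pick_one` simply takes the first name seen in each group as a placeholder rule.

```python
from collections import defaultdict


def base_name(name):
    """Lowercase the name and drop any token that contains a digit (treated as a version)."""
    tokens = [t for t in name.split() if not any(ch.isdigit() for ch in t)]
    return " ".join(tokens).lower()


def group_by_product(names, prefix_words=2):
    """Group names by their first `prefix_words` non-version words (an assumed heuristic)."""
    groups = defaultdict(list)
    for name in names:
        key = " ".join(base_name(name).split()[:prefix_words])
        groups[key].append(name)
    return dict(groups)


def pick_one(groups):
    """Placeholder selection rule: keep the first name seen in each group."""
    return {key: members[0] for key, members in groups.items()}


names = [
    "Visual studio 2012", "Visual studio 2013", "Visual studio 2017",
    "Adobe Reader English", "Adobe Reader Deutsche",
    "Power shell 4.0", "Power shell 2.0", "Power shell 5.0",
]

groups = group_by_product(names)
picks = pick_one(groups)
for key, chosen in picks.items():
    print(key, "->", chosen)
```

To get the exact output asked for in the question, `pick_one` would need a real selection rule (for example a lookup of preferred versions), which has to come from whoever knows which version counts as relevant.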
