Isolating strings that are contained elsewhere

Posted 2024-06-25 05:25:26


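I'm setting up a script that merges PDFs based on text contained in their file names. My problem is that "Violin I" is also contained in "Violin II", and "Alto Saxophone I" is also contained in "Alto Saxophone II". How can I set it up so that tempList contains only the entries for "Violin I" and excludes those for "Violin II", and vice versa?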

pdfList = ["01 Violin I.pdf", "02 Violin I.pdf",
           "01 Violin II.pdf", "02 Violin II.pdf"]
instruments = ["Soprano", "Tenor", "Violin I", "Violin II", "Viola", "Cello",
               "Contrabass", "Alto Saxophone I", "Alto Saxophone II",
               "Tenor Saxophone", "Baritone Saxophone"]


# create arrays for each instrument that can be used for merging/organization
def organizer():
    for fileName in pdfList:
        for instrument in instruments:
            tempList = []
            if instrument in fileName:
                tempList.append(fileName)
        print(tempList)


print(pdfList)
organizer()

2 answers

Try making the following change:

...
if instrument+'.pdf' in fileName:
...

Does that cover all of your cases?
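As a rough sketch of that suggestion, the check can be written with `str.endswith` and the matches collected per instrument. This assumes every file name ends with the instrument name followed directly by `.pdf`, as in the sample list. The `group_by_instrument` helper, the dictionary layout, and the shortened instrument list are illustrative assumptions, not part of the original script.

```python
# Sample file names copied from the question; the instrument list is shortened.
pdf_list = ["01 Violin I.pdf", "02 Violin I.pdf",
            "01 Violin II.pdf", "02 Violin II.pdf"]
instruments = ["Violin I", "Violin II", "Viola"]


def group_by_instrument(pdf_list, instruments):
    """Hypothetical helper: map each instrument to the files named after it."""
    groups = {instrument: [] for instrument in instruments}
    for file_name in pdf_list:
        for instrument in instruments:
            # Assumption: the instrument name sits directly before ".pdf".
            if file_name.endswith(instrument + ".pdf"):
                groups[instrument].append(file_name)
    return groups


groups = group_by_instrument(pdf_list, instruments)
print(groups["Violin I"])
print(groups["Violin II"])
```

Because "Violin II.pdf" does not end with "Violin I.pdf", the two parts no longer end up in each other's lists. If some file names carry extra text after the instrument name, this check would miss them.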

One way to avoid matching on a contained substring is to use a regular expression, for example:

import re

pdfList = ["01 Violin I.pdf", "02 Violin I.pdf",
           "01 Violin II.pdf", "02 Violin II.pdf"]
instruments = ["Soprano", "Tenor", "Violin I", "Violin II", "Viola", "Cello",
               "Contrabass", "Alto Saxophone I", "Alto Saxophone II",
               "Tenor Saxophone", "Baritone Saxophone"]

# create arrays for each instrument that can be used for merging/organization
def organizer():
    for fileName in pdfList:
        tempList = []
        for instrument in instruments:
            # \b on both sides: match the instrument name only as whole words
            if re.search(r'\b{}\b'.format(instrument), fileName):
                tempList.append(fileName)
        print(tempList)

print(pdfList)
organizer()

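This wraps your search term in \b so that it matches only when its start and end fall on word boundaries. Also, perhaps obvious but worth pointing out: this makes your instrument names part of the regex, so be aware that if any of the characters you use are also regex metacharacters, they will be interpreted as such (right now none of them are). A more general solution would need some code to find and properly escape those characters.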
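For the escaping step, Python's standard library already provides `re.escape`, which backslash-escapes regex metacharacters in a string. The following is a minimal sketch of how it could be combined with the \b approach. The `matches_instrument` helper and the instrument name "Clarinet (Bb) I" are made up for illustration and do not appear in the original lists.

```python
import re


def matches_instrument(instrument, file_name):
    """Hypothetical helper: True if the instrument name occurs as whole words."""
    # re.escape turns characters such as "(" and ")" into literals, so the
    # name is matched as plain text rather than interpreted as regex syntax.
    pattern = r'\b{}\b'.format(re.escape(instrument))
    return re.search(pattern, file_name) is not None


# "Clarinet (Bb) I" is an invented name containing regex metacharacters.
print(matches_instrument("Clarinet (Bb) I", "01 Clarinet (Bb) I.pdf"))
print(matches_instrument("Clarinet (Bb) I", "01 Clarinet (Bb) II.pdf"))
```

One caveat: \b only marks a boundary between a word character and a non-word character, so this sketch assumes each instrument name starts and ends with a letter, digit, or underscore. A name ending in something like ")" would need a different boundary test.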
