Matching text between a pair of single quotes

label = []
txt = open("imagenet1000_clsid_to_human.txt").readlines()
# print(str(txt))
p = re.compile(r"'(.*?)'")
# print(txt)
for i in range(len(txt)):
    # print(txt[i])
    # print('\n')
    m = p.match(txt[i])
    if m:
        lis = list(m.group())[:-1]
        s = ''.join(lis)
        print(s)
        label.append(s)

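3 answers

User

Answer 1 · edited 2024-09-28 05:26:31

This works, as long as txt is a single string (for example, the result of read() rather than the list returned by readlines()):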

import re
re.findall(r"'(.*?)'", txt)

A link to this regular expression:

https://regex101.com/r/QP8omt/1
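
As a minimal sketch of the findall approach, the snippet below runs the same pattern over a short string. The contents of sample_text are a hypothetical stand-in for the file read in one piece; they are not taken from the original question.

```python
import re

# Hypothetical sample standing in for the file's contents read as ONE string
# (e.g. via f.read()); re.findall() needs a str, not the list from readlines().
sample_text = (
    "{0: 'tench, Tinca tinca',\n"
    " 1: 'goldfish, Carassius auratus',\n"
    " 2: 'great white shark'}"
)

# With a single capture group, findall returns only the captured text,
# so the surrounding quotes are already stripped.
labels = re.findall(r"'(.*?)'", sample_text)
print(labels)
# ['tench, Tinca tinca', 'goldfish, Carassius auratus', 'great white shark']
```

Because the pattern has exactly one group, no slicing or joining is needed to drop the quotes.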

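User

Answer 2 · edited 2024-09-28 05:26:31

The main problem is that you should use re.search() rather than re.match(). re.match() only matches at the very start of the string, as if the pattern began with an implicit ^.

It is also wise to use a raw string for the regex pattern, and your parentheses enclosed too much: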

import re

txt = "998: 'ear, spike, capitulum', 999: 'toilet tissue, toilet paper, bathroom tissue'"

p = re.compile(r"'(.*?)'")
m = p.search(txt)
print(m.groups())

This gives:

('ear, spike, capitulum',)
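
The difference between re.match() and re.search() described in this answer can be sketched on a single line of input. The value of line is an assumed example in the same shape as the string used above.

```python
import re

# Assumed example line; it starts with a digit, not with a quote.
line = "998: 'ear, spike, capitulum',"
pattern = re.compile(r"'(.*?)'")

# match() only tries position 0, where the character is "9", so it fails.
match_result = pattern.match(line)

# search() scans forward until the pattern fits somewhere in the string.
search_result = pattern.search(line)

print(match_result)            # None
print(search_result.group(1))  # ear, spike, capitulum
```

Note that group(1) returns the captured text without quotes, whereas group() with no argument returns the whole match including them.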

User

Answer 3 · edited 2024-09-28 05:26:31

Not everything needs to be done with a regex.

label = []

with open("imagenet1000_clsid_to_human.txt", 'r', encoding='utf8') as f:
    for line in f:
        parts = line.split("'")
        if len(parts) == 3:
            label.append(parts[1])

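Side note: always open text files with an explicit encoding. If you are not sure how the file is encoded, neither is Python. There is no magic encoding detection, and you should not rely on Python's default.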
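
To try the split-based approach together with an explicit encoding and without needing the real file, here is a small self-contained sketch. The file name labels.txt and the two sample lines are hypothetical; they only imitate the assumed layout of one quoted label per line.

```python
import os
import tempfile

# Hypothetical stand-in for the contents of imagenet1000_clsid_to_human.txt
sample = "{0: 'tench, Tinca tinca',\n 1: 'goldfish, Carassius auratus'}\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "labels.txt")

    # Write and read with the same explicit encoding instead of the default.
    with open(path, "w", encoding="utf-8") as f:
        f.write(sample)

    labels = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            parts = line.split("'")
            # Exactly two quotes on the line gives three parts:
            # text before, the label, text after.
            if len(parts) == 3:
                labels.append(parts[1])

print(labels)
# ['tench, Tinca tinca', 'goldfish, Carassius auratus']
```

One design consequence worth knowing: a line whose label itself contains an apostrophe, or that uses double quotes, does not split into exactly three parts and is silently skipped by the length check.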
