How do I extract the string that follows a word in Python?

Ref.16570 Ref. 16570 Referenz 216570 Referenz 01 733 7653 4159-07 4 26 331.12.42.51.01.002 166.0173 AB012012/BB01 Ref. 167.021 PAM00292 14000M L3.642.4.56.6 161.559.50 801 666 753 116400GV Ref.: 231.10.39.21.03.002 3233 Ref: 233.32.41.21.01.002 T081.420.97.057.01 16750 ... almost each line in the example provided contains a certain ID

3 Answers

User

Answer 1 · edited 2024-09-29 23:27:23

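I'm not entirely sure whether you need to match or to extract, but Ref\.?([ \d.]+) will capture any run of digits, spaces and dots that directly follows Ref (case-insensitively), i.e.: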

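import re
# subject is the text to search; this sample line is only for illustration
subject = "Explorer II Ref.16570 Box"
result = re.findall(r"Ref\.?([ \d.]+)", subject, re.IGNORECASE | re.MULTILINE)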
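To see what the pattern does and does not pick up, here is a minimal sketch run on a few strings from the question. The helper name extract_refs and the choice to strip surrounding whitespace are my own assumptions, not part of the answer.

```python
import re

# Pattern from the answer above. MULTILINE is left out because the
# pattern contains no ^ or $ anchors, so the flag has no effect here.
REF_RE = re.compile(r"Ref\.?([ \d.]+)", re.IGNORECASE)


def extract_refs(text):
    """Return every captured value with surrounding whitespace removed.

    extract_refs is a hypothetical helper name used only for this sketch.
    """
    return [value.strip() for value in REF_RE.findall(text)]


# Three sample lines taken from the question.
subject = "Ref.16570\nRef. 167.021\nReferenz 216570"
print(extract_refs(subject))  # ['16570', '167.021']
```

Note that the "Referenz 216570" line is skipped: after "Ref" the pattern expects an optional dot and then digits, spaces or dots, so the letters "erenz" stop the match. Lines written as "Ref:" or "Ref.:" are not handled correctly either, so this pattern only covers the plain "Ref." and "Ref" forms.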


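User

Answer 2 · edited 2024-09-29 23:27:23

Try the code below. It collects everything after Ref up to one of a set of predefined stoppers. Stoppers are used because the question does not clearly define what counts as a reference ("not always the same pattern", "might be mixed with", "for a human eye there is almost always"). I suspect the matches will need additional post-processing to extract the actual references more accurately.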

import re

# Keyword, optional spaces, then a lazy value that ends at the first stopper
# (" - ", " / ", "," or a newline). The raw string keeps \. and \n as regex escapes.
ref_re = re.compile(r'(?P<ref_keyword>Referenz|Ref\.|Ref)[ ]*(?P<ref_value>.*?)(?P<ref_stopper> - | / |,|\n)')

with open('1.txt', mode='r', encoding='UTF-8') as file:
    data = file.read()

for match in ref_re.finditer(data):
    print('key:', match.group('ref_keyword'))
    print('value:', match.group('ref_value'))
    # print('stopper:', match.group('ref_stopper'))
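The same idea can be tried without a file. The sketch below runs the pattern on an in-memory string; the helper name find_refs is hypothetical, and the ", steel" suffix on the last sample line is made up to show a comma acting as a stopper.

```python
import re

# Same pattern as in the answer, compiled from a raw string.
REF_RE = re.compile(
    r"(?P<ref_keyword>Referenz|Ref\.|Ref)[ ]*"
    r"(?P<ref_value>.*?)"
    r"(?P<ref_stopper> - | / |,|\n)"
)


def find_refs(data):
    """Return (keyword, value) pairs for every match in data.

    find_refs is a hypothetical helper name used only for this sketch.
    """
    return [(m.group("ref_keyword"), m.group("ref_value"))
            for m in REF_RE.finditer(data)]


# The first two lines come from the question; ", steel" is invented.
sample = "Ref.16570\nReferenz 216570\nRef. 167.021, steel\n"
for keyword, value in find_refs(sample):
    print("key:", keyword, "| value:", value)
```

Two caveats, as far as I can tell: a final line with no trailing newline and no other stopper is not matched, and for input such as "Ref: 233.32.41.21.01.002" the colon ends up inside the value, which is the kind of post-processing the answer mentions.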

For each match, the script prints a key: line followed by a value: line.

User

Answer 3 · edited 2024-09-29 23:27:23

This should do it:

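import re
text = 'Explorer II Ref.16570 Box'  # renamed from str to avoid shadowing the built-in
# re.search scans the whole string; re.match only tries position 0 and would find nothing here
m = re.search(r'Ref\.([0-9]+)', text)
if m:
    print(m.group(1))  # the capture group holds just the digits after "Ref."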
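The question's sample also contains the forms "Ref. ", "Ref:", "Ref.:" and "Referenz". A rough sketch that accepts all of them in one pattern could look like this; the pattern and the name extract_ref_ids are my own assumptions, not something taken from the answers.

```python
import re

# Hypothetical pattern: "Ref" or "Referenz", an optional ".", an optional ":",
# optional whitespace, then an ID made of digits and dots that starts with a digit.
REF_ID_RE = re.compile(r"\bRef(?:erenz)?\.?:?\s*(\d[\d.]*)", re.IGNORECASE)


def extract_ref_ids(text):
    """Return the IDs found in text, dropping any trailing dot."""
    return [m.group(1).rstrip(".") for m in REF_ID_RE.finditer(text)]


# Sample strings taken from the question and from the answer above.
for line in ["Explorer II Ref.16570 Box",
             "Referenz 216570",
             "Ref.: 231.10.39.21.03.002 3233",
             "Ref: 233.32.41.21.01.002"]:
    print(line, "->", extract_ref_ids(line))
```

This is deliberately narrow: an ID that contains spaces or hyphens, such as "Referenz 01 733 7653 4159-07 4 26", is cut off at the first space, and lines with no Ref keyword at all (for example "PAM00292") are not matched.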
