A line matches at position n=4; I need to extract the value from the line at position n+2

import re
import pdfplumber

with pdfplumber.open(file) as pdf:
    pages = pdf.pages
    for page in pdf.pages:
        text = page.extract_text()
        for i, line in enumerate(text.split('\n')):
            print(i, line)
            if re.match(r"Error\s*:", line):
                tot = line.split()
                # how can I get line on position i+2

3 answers

User

Answer 1 · edited 2024-09-28 05:21:16

The `.split('\n')` approach doesn't work on large files (or infinite streams), because it keeps everything in memory.

The proper way is:

import itertools

def pairwise_with_offset(iterable, offset: int):
    "s -> (s0, s<offset>), (s1, s<offset+1>), (s2, s<offset+2>), ..."
    a, b = itertools.tee(iterable)
    for _ in range(offset):
        next(b, None)  # advance the second iterator by `offset` items
    return zip(a, b)
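To make the pairing concrete, here is a small check of what the helper yields with `offset=2` on a `range`. The helper is repeated so the snippet runs on its own.

```python
import itertools

def pairwise_with_offset(iterable, offset: int):
    """s -> (s0, s<offset>), (s1, s<offset+1>), ..."""
    a, b = itertools.tee(iterable)
    for _ in range(offset):
        next(b, None)  # advance the second iterator by `offset` items
    return zip(a, b)

pairs = list(pairwise_with_offset(range(6), 2))
print(pairs)  # [(0, 2), (1, 3), (2, 4), (3, 5)]
```

Because `zip` stops at the shorter iterator, a match in the last two lines simply has no partner, so there is no `IndexError` to guard against. `itertools.tee` only buffers the `offset` items between the two iterators, which is what keeps memory bounded.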

You can find more information here: https://stackoverflow.com/a/5434936/8933502

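Learn to use the proper approach even if your PDF library doesn't benefit from it (pdfplumber's `extract_text()` already returns a whole page as one string), because you will likely reuse the same pattern again and again, and in the future you may be working with a file-like object (or any iterable).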
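Since the helper accepts any iterable, the same loop can run over a file-like object line by line without reading the whole file first. A minimal sketch, using `io.StringIO` with made-up content as a stand-in for a real file; the `values` list is only for illustration.

```python
import io
import itertools
import re

def pairwise_with_offset(iterable, offset: int):
    """s -> (s0, s<offset>), (s1, s<offset+1>), ..."""
    a, b = itertools.tee(iterable)
    for _ in range(offset):
        next(b, None)  # advance the second iterator by `offset` items
    return zip(a, b)

# Stand-in for open('file.txt'): iterating it yields one line at a time.
# The content below is invented for the example.
fake_file = io.StringIO("header\nError :\nunits\n42 kg\nfooter\n")

values = []
for line, target in pairwise_with_offset(fake_file, 2):
    if re.match(r"Error\s*:", line):
        values.append(target.strip())  # file lines keep their '\n'; strip it

print(values)  # ['42 kg']
```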

User

Answer 2 · edited 2024-09-28 05:21:16

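When you find the line containing `Error`, you know that the line holding the value is at the current line number `i` plus 2.

So store that line number in a variable and, while iterating, check whether the current line number equals it. When it does, the current line is the one containing the value: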

value_line = None  # initialize with a value that is not a valid line number

for i, line in enumerate(text.split('\n')):
    if re.match(r"Error\s*:", line):
        value_line = i + 2
    if i == value_line:  # this will happen in a later iteration
        print(line)      # this is the line containing the value
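Because this stored-line-number approach never indexes back into a list, it also works on a stream. A minimal sketch wrapping it in a generator that yields every value line rather than only printing; the name `values_after_error` and its `offset` parameter are hypothetical.

```python
import re

def values_after_error(lines, offset=2):
    """Yield the line `offset` positions after each line matching 'Error:'."""
    value_line = None  # not a valid index until a match is seen
    for i, line in enumerate(lines):
        if i == value_line:
            yield line  # this is a line containing a value
        if re.match(r"Error\s*:", line):
            value_line = i + offset  # remember where the value will be

text = "Error: A\nskip\nvalue A\nError: B\nskip\nvalue B"
print(list(values_after_error(text.split('\n'))))  # ['value A', 'value B']
```

Only one pending target is tracked, so if two `Error` lines occur closer together than `offset`, the later match replaces the earlier one.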

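Alternatively, collect all the lines into a list beforehand. Then you can access the line you need directly from the list, without tracking it across iterations: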

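lines = text.split('\n')

for i, line in enumerate(lines):
    # the bounds check avoids an IndexError when Error is on one of the last two lines
    if re.match(r"Error\s*:", line) and i + 2 < len(lines):
        print(lines[i + 2])
        break  # found the value, can stop iterating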

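Of course, instead of printing the line containing the value, you can do something else with it, such as splitting it and converting the first item to an integer.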
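A minimal sketch of that last step, assuming the value line starts with an integer token (as in the made-up sample below); the function name `first_int_after_error` is hypothetical.

```python
import re

def first_int_after_error(lines, offset=2):
    """Return the first token of the line `offset` after 'Error:' as an int, or None."""
    for i, line in enumerate(lines):
        if re.match(r"Error\s*:", line) and i + offset < len(lines):
            # split() on whitespace, then convert the first item;
            # int() raises ValueError if that token is not an integer
            return int(lines[i + offset].split()[0])
    return None

lines = "Report\nError :\ncount\n17 items\nend".split('\n')
print(first_int_after_error(lines))  # 17
```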

User

Answer 3 · edited 2024-09-28 05:21:16

Since `Lines` is a list, you can iterate over it, check whether the item is present, and from that position get the item two lines further down:

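# Using readlines(); the with-block closes the file afterwards
with open('file.txt', 'r') as file1:
    Lines = file1.readlines()

for i, line in enumerate(Lines):
    # check the bounds so a match near the end of the file doesn't raise IndexError
    if "Error" in line and i + 2 < len(Lines):
        print(Lines[i + 2].strip())  # strip() removes the trailing newline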
