Regex on text extracted with PyPDF2 does not return the expected output

# First, change the current working directory to the Desktop
import os
os.chdir('/Users/Hussein/Desktop')  # File is located on Desktop

# Second is PyPDF2
import PyPDF2
pdfFileObj = open('TEST1.pdf', 'rb')  # Opening the file
pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
pageObj = pdfReader.getPage(3)  # For the test I only need page 3
TextVersion = pageObj.extractText()
print(TextVersion)

# Third is the regular expression
import re
match = re.findall(r'math', TextVersion)
for match in TextVersion:
    print(match)

2 Answers

User

Answer 1 · edited 2024-10-01 04:45:49

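The TextVersion variable holds the extracted text as a string. When you use it in a for loop, it yields the text one character at a time, which is what you are seeing. The findall function returns a list of matches, so if you loop over that list instead, you get each matched word (in your test they are all the same).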

import re

for match in re.findall(r'math', TextVersion):
    print(match)

findall returns a result like this:

['math', 'math', 'math']

So your output will be:

math
math
math
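
To make the difference concrete, here is a minimal sketch that can be run without a PDF. The string `sample_text` is a made-up stand-in for the extracted page text, not the asker's actual data.

```python
import re

# Hypothetical sample text standing in for the extracted page text.
sample_text = "math and more math"

# Iterating over a str yields one character at a time.
chars = [ch for ch in sample_text[:4]]

# re.findall returns a list, so iterating over it yields one match per step.
matches = re.findall(r"math", sample_text)

print(chars)    # the first four characters, one per element
print(matches)  # one list element per match
```

Looping over `matches` therefore prints whole words, while looping over `sample_text` prints single characters.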

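User

Answer 2 · edited 2024-10-01 04:45:49

Your loop is actually iterating over the value of the TextVersion variable. You need to iterate over the list returned by re.findall instead.

So your for loop should be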

match = re.findall(r'math', TextVersion)
for i in match:
    print(i)
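
A related point that may be worth spelling out: in the question's code, the name `match` is used both for the findall result and as the loop variable, so the loop rebinds it and the list is never used. A small sketch of that effect, with `sample_text` and `saved` as made-up names for illustration:

```python
import re

# Hypothetical sample text standing in for the extracted page text.
sample_text = "math is fun, math is hard"

match = re.findall(r"math", sample_text)
saved = match  # keep a second reference to the list for comparison

# Same shape as the loop in the question: the loop variable reuses the
# name `match`, so each iteration rebinds it to one character of the text.
for match in sample_text:
    pass

print(saved)        # the list of matches, untouched
print(repr(match))  # now just the last character of sample_text
```

Using a different loop variable, as in the loop above with `i`, avoids the rebinding.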
