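<pre><code>import re
import PyPDF2

# Open the PDF in binary mode; the with-block closes the file afterwards.
with open('E://drive-download-20171015T225604Z-001/test_case/test2/try/xyz.pdf', 'rb') as pdfFileObj:
    # PdfReader / pages / extract_text() replace the deprecated
    # PdfFileReader / getPage() / extractText() names.
    pdfReader = PyPDF2.PdfReader(pdfFileObj)
    num = len(pdfReader.pages)
    print("Number of pages:-" + str(num))
    for i in range(num):
        # Lower-case the page text so the search is case-insensitive.
        text = pdfReader.pages[i].extract_text().lower()
        # Iterate over lines: looping over a str directly yields single characters.
        for line in text.splitlines():
            if re.search("abc", line):
                print(line)
</code></pre>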
<p>I use this to iterate over the PDF page by page, search each page for a key term, and then process the matches further.</p>
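<p>A minimal sketch of the search step, assuming each page's text has already been extracted into a string (for example with PyPDF2's <code>extract_text()</code>). The helper name <code>find_matching_lines</code> is hypothetical. Instead of printing, it collects each match together with its 1-based page number so the results can be processed further, and it matches case-insensitively with <code>re.IGNORECASE</code> rather than lower-casing the text.</p>
<pre><code>import re


def find_matching_lines(page_texts, pattern):
    """Return (page_number, line) pairs for every line matching pattern.

    Hypothetical helper: page_texts is assumed to be one string per page,
    e.g. page.extract_text() for each page of a PyPDF2 PdfReader.
    Page numbers are 1-based; matching is case-insensitive.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    matches = []
    for page_number, text in enumerate(page_texts, start=1):
        # splitlines() yields whole lines; iterating the string itself
        # would yield single characters.
        for line in text.splitlines():
            if regex.search(line):
                matches.append((page_number, line.strip()))
    return matches


if __name__ == "__main__":
    pages = ["Intro\nABC overview", "nothing here", "see abc again\nend"]
    for page_number, line in find_matching_lines(pages, "abc"):
        print(page_number, line)
</code></pre>
<p>With a real file, the page strings can come from <code>[page.extract_text() for page in PyPDF2.PdfReader(f).pages]</code>. Keeping the search separate from the PDF reading makes it easy to test without a PDF on disk.</p>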