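<p>Thanks for the information, it was really helpful. Here is my code for retrieving the content page by page (it's a bit dirty, but it works):</p>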
<blockquote>
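<pre><code>from tika import parser  # tika-python package

def get_pages(file):
    # Request XHTML output so the per-page div markers are kept.
    raw_xml = parser.from_file(file, xmlContent=True)
    body = raw_xml['content'].split('<body>')[1].split('</body>')[0]
    # Strip paragraph and plain div tags; the <div class="page"> openers survive.
    body_without_tag = body.replace("<p>", "").replace("</p>", "").replace("<div>", "").replace("</div>", "").replace("<p />", "")
    # Everything before the first page marker is dropped.
    text_pages = body_without_tag.split("""<div class="page">""")[1:]
    num_pages = len(text_pages)
    # Check that it worked: the count must match the PDF metadata.
    if num_pages == int(raw_xml['metadata']['xmpTPg:NPages']):
        return text_pages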
</code></pre>
</blockquote>
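<p>To try the splitting step without a Tika server, here is a minimal sketch that runs on a hand-written string. <code>split_pages</code>, <code>PAGE_MARKER</code> and <code>SAMPLE</code> are names made up for this illustration, and the sample only imitates the shape of the XHTML that the code above expects; it is not real Tika output. It also swaps the chained <code>replace</code> calls for a regular expression that removes every remaining tag, which is a different (and broader) cleanup than the original.</p>

```python
import re

# Marker that opens each page in the XHTML, as used in the code above.
PAGE_MARKER = '<div class="page">'


def split_pages(xhtml):
    """Return one plain-text string per page (hypothetical helper)."""
    # Keep only what sits between the body tags.
    body = xhtml.split('<body>')[1].split('</body>')[0]
    # The first chunk is whatever precedes the first page marker, so skip it.
    chunks = body.split(PAGE_MARKER)[1:]
    # Remove every remaining tag, then trim surrounding whitespace.
    return [re.sub(r'<[^>]+>', '', chunk).strip() for chunk in chunks]


# Hand-written sample, assumed to resemble the XHTML layout (not real output).
SAMPLE = (
    '<html><body>'
    '<div class="page"><p>First page</p></div>'
    '<div class="page"><p>Second page</p>\n<p>more text</p></div>'
    '</body></html>'
)

pages = split_pages(SAMPLE)
print(len(pages))
```

<p>On real documents you would probably still want to compare <code>len(pages)</code> with the <code>xmpTPg:NPages</code> metadata value, as the original code does.</p>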