擅长:python、mysql、java
<p><a href="http://infohost.nmt.edu/~shipman/soft/pylxml/web/Element-findtext.html" rel="nofollow noreferrer">This tutorial</a>帮助我完成了类似的任务:</p>
<p>每次迭代都会找到一个名为“id”或“text”的标记。如果找不到标记,则返回字符串“None”。一次迭代的结果将被追加到一个列表中,允许我们以类似于数据帧的格式打印该列表。</p>
<pre><code>import lxml
import lxml.etree as ET
# Initialise a list to append results to
list_of_results = []
# Loop through the pages to search for text
for page in root:
id = page.findtext('id', default = 'None')
text = page.findtext('text', default = 'None')
list_of_results.append([id, text])
# Print list
list_of_results
</code></pre>
<p>结果:</p>
<pre><code>[['1', 'hello world!@'], ['1', 'hello world']]
</code></pre>
<p>如果只想打印文本,只需删除id行即可。</p>