<p>This is the XML file:
<a href="http://www.diveintopython3.net/examples/feed.xml" rel="nofollow noreferrer">http://www.diveintopython3.net/examples/feed.xml</a></p>
<p>My Python code:</p>
<pre><code>from lxml import etree


def lxml():
    tree = etree.parse('feed.xml')
    NSMAP = {"nn": "http://www.w3.org/2005/Atom"}
    test = tree.xpath('//nn:category[@term="html"]/..', namespaces=NSMAP)
    for elem in tree.iter():
        print(elem.tag, '\t', elem.attrib)
    print('-------------------------------')
    test1 = tree.xpath('//nn:category', namespaces=NSMAP)
    print('++++++++++++++++++++++++++++++++')
    for node in test1:
        # xpath() returns a list of elements, so read the text of each one
        test2 = node.xpath('./../nn:summary', namespaces=NSMAP)
        print([summary.text for summary in test2])
    print('*****************************************')
    # [normalize-space()] seems to remove only the leading and trailing whitespace
    test3 = tree.xpath('//text()[normalize-space(.)]')
    print(test3)


lxml()
</code></pre>
<p>The output is:</p>
<pre><code>++++++++++++++++++++++++++++++++
['Putting an entire chapter on one page sounds\n bloated, but consider this &mdash; my longest chapter so far\n would be 75 printed pages, and it loads in under 5 seconds&hellip;\n On dialup.']
['Putting an entire chapter on one page sounds\n bloated, but consider this &mdash; my longest chapter so far\n would be 75 printed pages, and it loads in under 5 seconds&hellip;\n On dialup.']
['Putting an entire chapter on one page sounds\n bloated, but consider this &mdash; my longest chapter so far\n would be 75 printed pages, and it loads in under 5 seconds&hellip;\n On dialup.']
['The accessibility orthodoxy does not permit people to\n question the value of features that are rarely useful and rarely used.']
['These notes will eventually become part of a\n tech talk on video encoding.']
['These notes will eventually become part of a\n tech talk on video encoding.']
['These notes will eventually become part of a\n tech talk on video encoding.']
['These notes will eventually become part of a\n tech talk on video encoding.']
['These notes will eventually become part of a\n tech talk on video encoding.']
['These notes will eventually become part of a\n tech talk on video encoding.']
['These notes will eventually become part of a\n tech talk on video encoding.']
['These notes will eventually become part of a\n tech talk on video encoding.']
*****************************************
['\n ', 'dive into mark', '\n ', 'currently between addictions', '\n ', 'tag:diveintomark.org,2001-07-29:/', '\n ', '2009-03-27T21:56:07Z', '\n ', '\n ', '\n ', '\n ', '\n ', 'Mark', '\n ', 'http://diveintomark.org/', '\n ', '\n ', 'Dive into history, 2009 edition', '\n ', '\n ', 'tag:diveintomark.org,2009-03-27:/archives/20090327172042', '\n ', '2009-03-27T21:56:07Z', '\n ', '2009-03-27T17:20:42Z', '\n ', '\n ', '\n ', '\n ', 'Putting an entire chapter on one page sounds\n bloated, but consider this &mdash; my longest chapter so far\n would be 75 printed pages, and it loads in under 5 seconds&hellip;\n On dialup.', '\n ', '\n ', '\n ', '\n ', 'Mark', '\n ', 'http://diveintomark.org/', '\n ', '\n ', 'Accessibility is a harsh mistress', '\n ', '\n ', 'tag:diveintomark.org,2009-03-21:/archives/20090321200928', '\n ', '2009-03-22T01:05:37Z', '\n ', '2009-03-21T20:09:28Z', '\n ', '\n ', 'The accessibility orthodoxy does not permit people to\n question the value of features that are rarely useful and rarely used.', '\n ', '\n ', '\n ', '\n ', 'Mark', '\n ', '\n ', 'A gentle introduction to video encoding, part 1: container formats', '\n ', '\n ', 'tag:diveintomark.org,2008-12-18:/archives/20081218155422', '\n ', '2009-01-11T19:39:22Z', '\n ', '2008-12-18T15:54:22Z', '\n ', '\n ', '\n ', '\n ', '\n ', '\n ', '\n ', '\n ', '\n ', 'These notes will eventually become part of a\n tech talk on video encoding.', '\n ', '\n']..
</code></pre>
<p>My question is: why are there so many '\n' entries, and how can I remove them?</p>
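<p>A minimal sketch of one way to drop them, not a definitive answer. It assumes the '\n' entries are the indentation between tags, which the parser keeps as text nodes of their own. <code>SAMPLE</code> is a hypothetical miniature of the feed, inlined so the snippet runs without <code>feed.xml</code>, and the names <code>raw</code> and <code>clean</code> are illustrative only.</p>

```python
from lxml import etree

# Hypothetical miniature of the feed (assumption: same namespace and layout
# as feed.xml), inlined so the sketch does not need the real file.
SAMPLE = b"""<feed xmlns="http://www.w3.org/2005/Atom">
  <title>dive into mark</title>
  <entry>
    <author>
      <name>Mark</name>
    </author>
    <summary type="html">Putting an entire chapter on one page sounds
    bloated, but consider this.</summary>
  </entry>
</feed>"""

root = etree.fromstring(SAMPLE)

# The indentation between two tags is itself a text node, so a bare
# //text() returns those whitespace-only strings along with the real text.
raw = root.xpath('//text()')

# Keep only the strings with visible characters, then collapse every
# internal run of whitespace (newlines included) into a single space.
clean = [' '.join(t.split()) for t in raw if t.strip()]
print(clean)
```

<p>The filtering is done in Python rather than in XPath, so the same comprehension also removes the line breaks inside the summary text.</p>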
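<p>Another question: how can I query directly for the tag that holds a given text, for example, get the node containing "Mark" (a descendant of an entry)?</p>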
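<p>A hedged sketch of two ways this lookup might be written with lxml. It assumes the text to match is exactly "Mark" and reuses the <code>nn</code> prefix from the code above. <code>SAMPLE</code> is again a hypothetical miniature of the feed, and <code>hits</code>, <code>text_nodes</code>, and <code>owners</code> are illustrative names.</p>

```python
from lxml import etree

NSMAP = {"nn": "http://www.w3.org/2005/Atom"}

# Hypothetical miniature of the feed, inlined so the sketch runs on its own.
SAMPLE = b"""<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <author>
      <name>Mark</name>
      <uri>http://diveintomark.org/</uri>
    </author>
    <title>Dive into history, 2009 edition</title>
  </entry>
</feed>"""

root = etree.fromstring(SAMPLE)

# Option 1: select the elements themselves, using the text as a predicate.
hits = root.xpath('//nn:entry//*[text()="Mark"]', namespaces=NSMAP)
for el in hits:
    # QName splits '{namespace}name' so the local tag name can be printed.
    print(etree.QName(el).localname, '->', el.text)

# Option 2: select the text nodes, then walk back to the owning element.
# lxml returns "smart" strings from xpath(), which expose getparent().
text_nodes = root.xpath('//nn:entry//text()[.="Mark"]', namespaces=NSMAP)
owners = [t.getparent() for t in text_nodes]
```

<p>Both options should end up at the same <code>name</code> element here. The first is usually the shorter one when only the element is needed.</p>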
<p>Thanks a lot.</p>