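<p>Consider parsing all items under the same parent by iterating over the top-level <code>xpath</code> result. Where an attribute or element value does not exist, XPath's <code>concat()</code> returns a zero-length string <code>''</code> instead. The code below also uses XPath's <code>normalize-space()</code> to strip line breaks and carriage returns from the values.</p>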
<pre><code>from lxml import html

# PARSING POSTED SNIPPET AS STRING (htmlstr HOLDS THE HTML FROM THE QUESTION)
webContent = html.fromstring(htmlstr)
# INITIALIZING LISTS
acc = []; twitch = []; lastOnline = []
# ITERATING THROUGH THE FIRST NESTED &lt;span&gt; OF EACH PARENT &lt;span&gt;
for i in webContent.xpath("//span/span[1]"):
    acc.append(i.xpath("concat(normalize-space(a[contains(@href,'account/view-profile')]),'')"))
    twitch.append(i.xpath("concat(@data-twitch-user, '')"))
    lastOnline.append(i.xpath("concat(../@data-time, '')"))
# ZIP EQUAL LENGTH LISTS
xpath_list = list(zip(acc, twitch, lastOnline))
print(xpath_list)
# [('KonterA', '', '2017-02-20T22:37:42Z'), ('mardok', 'mardok_tv', '2017-02-19T11:28:20Z')]
</code></pre>
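<p>To see why <code>concat()</code> keeps the lists aligned, here is a minimal sketch on a made-up snippet. The markup in <code>sample</code> is an assumption invented for illustration, not the HTML from the question. A plain attribute query returns an empty list when the attribute is missing, while the <code>concat()</code> form always returns a string.</p>

```python
from lxml import html

# Hypothetical markup, invented for this illustration only.
sample = """
<div>
  <span data-time="2017-02-20T22:37:42Z">
    <span><a href="/account/view-profile/KonterA">
      KonterA
    </a></span>
  </span>
  <span data-time="2017-02-19T11:28:20Z">
    <span data-twitch-user="mardok_tv"><a href="/account/view-profile/mardok">mardok</a></span>
  </span>
</div>
"""

tree = html.fromstring(sample)
inner_spans = tree.xpath("//span/span[1]")

# Plain attribute query: a list per node, empty when the attribute is absent.
plain = [s.xpath("@data-twitch-user") for s in inner_spans]

# concat() forces a string result, so a missing attribute becomes ''.
safe = [s.xpath("concat(@data-twitch-user, '')") for s in inner_spans]

# normalize-space() trims the newlines and padding around the link text.
names = [s.xpath("normalize-space(a[contains(@href,'account/view-profile')])")
         for s in inner_spans]

print(plain)  # [[], ['mardok_tv']]
print(safe)   # ['', 'mardok_tv']
print(names)  # ['KonterA', 'mardok']
```

<p>Because every <code>concat()</code> call yields exactly one string per node, the lists stay the same length and <code>zip()</code> pairs up values from the same parent.</p>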