Selecting multiple values with Python and XPath

<div class="members">
  <h2>Members</h2>
  <div class="member">
    <span title="Last Online: 2017-02-20 22:37:42" data-time="2017-02-20T22:37:42Z">
      <span class="profile-link">
        <a href="/account/view-profile/KonterBolet"> <img class="achievement" src="36.png" alt="Completed 36" title="Completed 36">KonterA</a>
      </span>
      <span class="memberType">Leader</span>
    </span>
  </div>
  <div class="member">
    <span title="Last Online: 2017-02-19 11:28:20" data-time="2017-02-19T11:28:20Z">
      <span class="profile-link hasTwitch twitchOffline" data-twitch-user="mardok_tv">
        <a href="/account/view-profile/mardok"> <img class="achievement" src="35.png" alt="Completed 35" title="Completed 35">mardok</a>
        <a class="twitch" href="//www.twitch.tv/mardok_tv" target="_blank" title="Offline"></a>
      </span>
      <span class="memberType">Officer</span>
    </span>
  </div>
</div>

2 answers

User

Answer 1 · edited 2024-10-01 04:52:53

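Consider parsing all the items under the same parent by iterating over one top-level XPath and reading each value relative to it, so the lists stay the same length and line up. Wrapping each expression in XPath's concat(..., '') makes it return a string, so a missing attribute or element yields an empty string '' instead of an empty result. XPath's normalize-space() is also used below to strip line breaks and extra whitespace from the values.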

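from lxml import html

# PARSING POSTED SNIPPET AS STRING (htmlstr HOLDS THE HTML ABOVE)
webContent = html.fromstring(htmlstr)

# INITIALIZING LISTS
acc = []; twitch = []; lastOnline = []

# ITERATING OVER THE FIRST CHILD <SPAN> (THE PROFILE LINK) OF EACH MEMBER'S OUTER <SPAN>
for i in webContent.xpath("//span/span[1]"):
    # ACCOUNT NAME: TEXT OF THE PROFILE LINK, WHITESPACE NORMALIZED
    acc.append(i.xpath("concat(normalize-space(a[contains(@href,'account/view-profile')]),'')"))
    # TWITCH USER, OR '' WHEN THE ATTRIBUTE IS ABSENT
    twitch.append(i.xpath("concat(@data-twitch-user, '')"))
    # LAST-ONLINE TIMESTAMP FROM THE PARENT <SPAN>
    lastOnline.append(i.xpath("concat(../@data-time, '')"))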

# ZIP EQUAL LENGTH LISTS
xpath_list = list(zip(acc, twitch, lastOnline))

print(xpath_list)
# [('KonterA', '', '2017-02-20T22:37:42Z'), ('mardok', 'mardok_tv', '2017-02-19T11:28:20Z')]
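
For a quick check without lxml, here is a sketch of the same per-member idea using only the standard library's xml.etree.ElementTree. This swaps in a different parser and is not the answer's code: ElementTree understands only a limited XPath subset (no concat(), normalize-space() or contains()), so those steps move into Python, and it needs well-formed XML, so the trimmed copy of the snippet assumes self-closed <img> tags. SNIPPET, normalize_space and parse_members are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's markup, made well-formed XML (assumption: <img> self-closed)
SNIPPET = """
<div class="members">
  <h2>Members</h2>
  <div class="member">
    <span title="Last Online: 2017-02-20 22:37:42" data-time="2017-02-20T22:37:42Z">
      <span class="profile-link">
        <a href="/account/view-profile/KonterBolet"> <img class="achievement" src="36.png"/>KonterA</a>
      </span>
      <span class="memberType">Leader</span>
    </span>
  </div>
  <div class="member">
    <span title="Last Online: 2017-02-19 11:28:20" data-time="2017-02-19T11:28:20Z">
      <span class="profile-link hasTwitch twitchOffline" data-twitch-user="mardok_tv">
        <a href="/account/view-profile/mardok"> <img class="achievement" src="35.png"/>mardok</a>
        <a class="twitch" href="//www.twitch.tv/mardok_tv" target="_blank" title="Offline"></a>
      </span>
      <span class="memberType">Officer</span>
    </span>
  </div>
</div>
"""


def normalize_space(text):
    """Python stand-in for XPath normalize-space(): trim and collapse whitespace."""
    return " ".join(text.split())


def parse_members(xml_text):
    """Return one (account, twitch_user, last_online) tuple per member.

    Every value is read relative to the same member element, so a missing
    attribute or link becomes "" instead of shifting the other columns.
    """
    root = ET.fromstring(xml_text.strip())
    rows = []
    # Outer <span> of each member; its first <span> child is the profile link
    for outer in root.findall("./div[@class='member']/span"):
        profile = outer.find("span")
        account, twitch = "", ""
        if profile is not None:
            # No contains() in ElementTree, so filter the hrefs in Python
            for a in profile.findall("a"):
                if "account/view-profile" in a.get("href", ""):
                    # itertext() includes the text after <img>, like XPath's string-value
                    account = normalize_space("".join(a.itertext()))
                    break
            twitch = profile.get("data-twitch-user", "")
        rows.append((account, twitch, outer.get("data-time", "")))
    return rows


print(parse_members(SNIPPET))
# [('KonterA', '', '2017-02-20T22:37:42Z'), ('mardok', 'mardok_tv', '2017-02-19T11:28:20Z')]
```

The .get(name, "") default plays the role of concat(..., ''): a member without a Twitch user still produces a full row.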

User

Answer 2 · edited 2024-10-01 04:52:53

Call the three lists (account names, Twitch users and last-online times) first_list, second_list and third_list. Change second_list to:

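# One entry per account name: the Twitch user whose name minus "_tv" matches it, else "" (Python 3.9+)
second_list = [next((t for t in second_list if t.removesuffix("_tv") == name), "") for name in first_list]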

Then do the following:

print(list(zip(first_list, second_list, third_list)))

This should give you a list of tuples in the same format:

[('KonterA', '', '2017-02-20T22:37:42Z'), ('mardok', 'mardok_tv', '2017-02-19T11:28:20Z')]
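
A runnable sketch of this alignment step, assuming the three lists came from separate XPath queries so that second_list only holds entries for members who have a Twitch user. The input values are copied from the snippet, and align_twitch is a hypothetical helper; like the answer, it assumes a Twitch user is the account name plus "_tv".

```python
# Hypothetical inputs: the three lists as separate XPath queries might return them.
# Only members with a Twitch account contribute to second_list, so it is shorter.
first_list = ["KonterA", "mardok"]
second_list = ["mardok_tv"]
third_list = ["2017-02-20T22:37:42Z", "2017-02-19T11:28:20Z"]


def align_twitch(names, twitch_users, suffix="_tv"):
    """Return one entry per name: its Twitch user, or "" if none matches.

    Assumes a Twitch user is the account name followed by suffix.
    """
    # Map each account name to its Twitch user; str.removesuffix needs Python 3.9+
    by_name = {user.removesuffix(suffix): user for user in twitch_users}
    return [by_name.get(name, "") for name in names]


rows = list(zip(first_list, align_twitch(first_list, second_list), third_list))
print(rows)
# [('KonterA', '', '2017-02-20T22:37:42Z'), ('mardok', 'mardok_tv', '2017-02-19T11:28:20Z')]
```

Building a dict first keeps the lookup linear in the number of members. Plain str.strip("_tv") is not a substitute for removesuffix here: it removes any of the characters _, t and v from both ends, so "vito_tv".strip("_tv") gives "ito" rather than "vito".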
