Python 2.7, XML, BeautifulSoup4: return only the matching parent tag

from bs4 import BeautifulSoup as bs  # assumed: `bs` is an alias for BeautifulSoup

soup = bs(response, "xml")
messages = soup.find_all('Message')
for message in messages:
    hotel_code = message.get('HotelCode')
    reservations = message.find_all('HotelReservation')
    for reservation in reservations:
        uniqueid_id = reservation.UniqueID.get('ID')
        uniqueid_idcontext = reservation.UniqueID.get('ID_Context')
        roomstays = reservation.find_all('RoomStay')
        for roomstay in roomstays:
            # attribute access returns the first Total descendant of the RoomStay
            total = roomstay.Total

1 Answer

Answer #1 · posted 2024-09-28 18:50:39

There can also sometimes be multiple Rate\Rates tags, so I can't just ask it to give me the 2nd "Total" tag.

Why not just iterate over all of the Total tags and skip the ones that have no Taxes child?

reservations = message.find_all('HotelReservation')
for reservation in reservations:
    totals = reservation.find_all('Total')
    for total in totals:
        if total.find('Taxes'):
            pass  # do stuff
        else:
            pass  # these aren't the totals you're looking for

If you more generally want to eliminate nodes that have no children, you can do either of the following:

if next(total.children, None):
    pass  # it's a parent of something

if total.contents:
    pass  # it's a parent of something

Or you can use a function instead of a string as your filter:

total = reservation.find(lambda node: node.name == 'Total' and node.contents)

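Or you can locate the tag in other ways: it is a direct child of RoomStay rather than just a descendant; it is not a descendant of Rate; it is the last Taxes descendant under RoomStay; and so on. All of these are easy to do.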
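As a rough sketch of two of those alternatives, the snippet below uses `find(..., recursive=False)` to look only at direct children and `find_parent('Rate')` to discard anything nested under a Rate. The XML in `SAMPLE` is a made-up stand-in for the real response (the element nesting and the `AmountAfterTax` attribute are assumptions, not taken from the question), and the `"xml"` parser requires lxml to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document; the real response layout may differ.
SAMPLE = """
<Message HotelCode="H1">
  <HotelReservation>
    <UniqueID ID="42" ID_Context="CRS"/>
    <RoomStay>
      <Rates>
        <Rate><Total AmountAfterTax="100.00"/></Rate>
        <Rate><Total AmountAfterTax="120.00"/></Rate>
      </Rates>
      <Total AmountAfterTax="220.00">
        <Taxes Amount="20.00"/>
      </Total>
    </RoomStay>
  </HotelReservation>
</Message>
"""

soup = BeautifulSoup(SAMPLE, "xml")  # the "xml" feature needs lxml
roomstay = soup.find('RoomStay')

# Option 1: only consider direct children of RoomStay.
direct_total = roomstay.find('Total', recursive=False)

# Option 2: consider every Total, but drop the ones nested under a Rate.
outside_rate = [
    total for total in roomstay.find_all('Total')
    if total.find_parent('Rate') is None
]

print(direct_total.get('AmountAfterTax'))
print([total.get('AmountAfterTax') for total in outside_rate])
```

Both options pick the same tag in this sample. Which one is more robust depends on how the real documents vary, so it is worth checking against a few actual responses.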

That said, this seems like a perfect job for XPath, which BeautifulSoup doesn't support, but ElementTree and lxml do.
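For illustration, here is a minimal sketch using the standard library's ElementTree, which supports a limited XPath subset. The path `./Total` matches only direct children, and `.//Total[Taxes]` matches any descendant Total that has a Taxes child. The XML in `SAMPLE` is a hypothetical stand-in for the real response; its nesting and the `AmountAfterTax` attribute are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the real response layout may differ.
SAMPLE = """
<Message HotelCode="H1">
  <HotelReservation>
    <UniqueID ID="42" ID_Context="CRS"/>
    <RoomStay>
      <Rates>
        <Rate><Total AmountAfterTax="100.00"/></Rate>
        <Rate><Total AmountAfterTax="120.00"/></Rate>
      </Rates>
      <Total AmountAfterTax="220.00">
        <Taxes Amount="20.00"/>
      </Total>
    </RoomStay>
  </HotelReservation>
</Message>
"""

root = ET.fromstring(SAMPLE)

direct_totals = []  # Totals that are direct children of a RoomStay
taxed_totals = []   # Totals anywhere under a RoomStay that have a Taxes child

for roomstay in root.iter('RoomStay'):
    for total in roomstay.findall('./Total'):
        direct_totals.append(total.get('AmountAfterTax'))
    for total in roomstay.findall('.//Total[Taxes]'):
        taxed_totals.append(total.get('AmountAfterTax'))

print(direct_totals)
print(taxed_totals)
```

lxml accepts the same paths through `findall` and additionally offers full XPath 1.0 through its `xpath()` method, which helps if the selection rules get more involved than this.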
