Parsing with Beautiful Soup to get nodes at different levels

Published 2024-09-25 00:25:10


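I am trying to get the Deli title and then, under it, the two menu items Made to Order Deli Core and Turkey Chipotle Petite Wrap. I am using Beautiful Soup 4 for this, but it is not working. The same goes for Entrée: how do I get the items listed under it?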

<html>
<head>
    <title></title>
</head>

<body>
    <table class="dayinner">
        <tr class="lun">
            <td class="mealname" colspan="3">LUNCH</td>
        </tr>

        <tr class="lun">
            <td class="station">&nbsp;Deli</td>

            <td class="menuitem">
                <div class="menuitem">
                    <input class="chk" id="S1L0000010000047598_35356" onclick=
                    "rptlist(this);" onmouseout="wschk(0);" onmouseover=
                    "wschk(1);" type="checkbox"> <span class="ul" onclick=
                    "nf('0000047598_35356');" onmouseout="pcls(this);"
                    onmouseover="ws(this);">Made to Order Deli Core</span>
                </div>
            </td>

            <td class="price"></td>
        </tr>

        <tr class="lun">
            <td class="station">&nbsp;</td>

            <td class="menuitem">
                <div class="menuitem">
                    <input class="chk" id="S1L0000020000047933_06835" onclick=
                    "rptlist(this);" onmouseout="wschk(0);" onmouseover=
                    "wschk(1);" type="checkbox"> <span class="ul" onclick=
                    "nf('0000047933_06835');" onmouseout="pcls(this);"
                    onmouseover="ws(this);">Turkey Chipotle Petite Wrap</span>
                </div>
            </td>

            <td class="price"></td>
        </tr>

        <tr class="lun">
            <td colspan="3" style="height:3px;"></td>
        </tr>

        <tr class="lun">
            <td colspan="3" style="background-color:#c0c0c0; height:1px;"></td>
        </tr>

        <tr class="lun">
            <td class="station">&nbsp;Entrée</td>

            <td class="menuitem">
                <div class="menuitem"><input class="chk" id=
                "S1L0000030000044794_08943" onclick="rptlist(this);"
                onmouseout="wschk(0);" onmouseover="wschk(1);" type="checkbox">
                <span class="ul" onclick="nf('0000044794_08943');" onmouseout=
                "pcls(this);" onmouseover="ws(this);">Steamed
                Corn</span><img alt="Vegan" class="icon" src=
                "images/g_062.gif"><img alt="Mindful Item" class="icon" src=
                "images/m_051.gif"></div>
            </td>

            <td class="price"></td>
        </tr>

        <tr class="lun">
            <td class="station">&nbsp;</td>

            <td class="menuitem">
                <div class="menuitem">
                    <input class="chk" id="S1L0000040000033087_22244" onclick=
                    "rptlist(this);" onmouseout="wschk(0);" onmouseover=
                    "wschk(1);" type="checkbox"> <span class="ul" onclick=
                    "nf('0000033087_22244');" onmouseout="pcls(this);"
                    onmouseover="ws(this);">Cuban Mojo Roasted Pork Loin</span>
                </div>
            </td>

            <td class="price"></td>
        </tr>
    </table>
</body>
</html>

Alternatively, it would be fine if I could convert it into an XML format like this:

[XML example missing from the source]

Thank you very much in advance; I really appreciate you taking the time to help.


2 answers

I actually used Beautiful Soup together with ElementTree (to build the XML) to get the text of every <span> element:

# -*- coding: UTF-8 -*-

from bs4 import BeautifulSoup  # import only the name that is used
import xml.etree.ElementTree as ET

html='''<html>
<head>
    <title></title>
</head>

<body>
    <table class="dayinner">
        <tr class="lun">
            <td class="mealname" colspan="3">LUNCH</td>
        </tr>

        <tr class="lun">
            <td class="station">&nbsp;Deli</td>

            <td class="menuitem">
                <div class="menuitem">
                    <input class="chk" id="S1L0000010000047598_35356" onclick=
                    "rptlist(this);" onmouseout="wschk(0);" onmouseover=
                    "wschk(1);" type="checkbox"> <span class="ul" onclick=
                    "nf('0000047598_35356');" onmouseout="pcls(this);"
                    onmouseover="ws(this);">Made to Order Deli Core</span>
                </div>
            </td>

            <td class="price"></td>
        </tr>

        <tr class="lun">
            <td class="station">&nbsp;</td>

            <td class="menuitem">
                <div class="menuitem">
                    <input class="chk" id="S1L0000020000047933_06835" onclick=
                    "rptlist(this);" onmouseout="wschk(0);" onmouseover=
                    "wschk(1);" type="checkbox"> <span class="ul" onclick=
                    "nf('0000047933_06835');" onmouseout="pcls(this);"
                    onmouseover="ws(this);">Turkey Chipotle Petite Wrap</span>
                </div>
            </td>

            <td class="price"></td>
        </tr>

        <tr class="lun">
            <td colspan="3" style="height:3px;"></td>
        </tr>

        <tr class="lun">
            <td colspan="3" style="background-color:#c0c0c0; height:1px;"></td>
        </tr>

        <tr class="lun">
            <td class="station">&nbsp;Entrée</td>

            <td class="menuitem">
                <div class="menuitem"><input class="chk" id=
                "S1L0000030000044794_08943" onclick="rptlist(this);"
                onmouseout="wschk(0);" onmouseover="wschk(1);" type="checkbox">
                <span class="ul" onclick="nf('0000044794_08943');" onmouseout=
                "pcls(this);" onmouseover="ws(this);">Steamed
                Corn</span><img alt="Vegan" class="icon" src=
                "images/g_062.gif"><img alt="Mindful Item" class="icon" src=
                "images/m_051.gif"></div>
            </td>

            <td class="price"></td>
        </tr>

        <tr class="lun">
            <td class="station">&nbsp;</td>

            <td class="menuitem">
                <div class="menuitem">
                    <input class="chk" id="S1L0000040000033087_22244" onclick=
                    "rptlist(this);" onmouseout="wschk(0);" onmouseover=
                    "wschk(1);" type="checkbox"> <span class="ul" onclick=
                    "nf('0000033087_22244');" onmouseout="pcls(this);"
                    onmouseover="ws(this);">Cuban Mojo Roasted Pork Loin</span>
                </div>
            </td>

            <td class="price"></td>
        </tr>
    </table>
</body>
</html> '''

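# Name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(html, 'html.parser')

# Root element of the XML output; the station name is hard-coded here
counter = ET.Element('counter')
counter.set("name", "#Deli")

for i in soup.find_all('span'):
    dish = ET.SubElement(counter, 'dish')
    name = ET.SubElement(dish, 'name')
    # Collapse line breaks and indentation inside the span into single spaces
    name.text = ' '.join(i.text.split())

# ET.dump() writes the tree to stdout itself and returns None
ET.dump(counter)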
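
The text of the Steamed Corn <span> contains a line break followed by indentation, so how whitespace is handled affects the dish name that ends up in the XML. A small sketch of the difference between replacing only the newline and collapsing every whitespace run (the `raw` string below is copied from the HTML above; the variable names are only illustrative):

```python
# Text of the "Steamed Corn" <span>, which is split across two lines in the HTML.
raw = "Steamed\n                Corn"

# Replacing only the newline leaves the indentation spaces in place.
replaced = raw.replace("\n", " ")

# split() with no argument splits on any run of whitespace;
# join() then rebuilds the string with single spaces.
normalized = " ".join(raw.split())

print(repr(replaced))
print(repr(normalized))
```

The second form is usually the safer choice for scraped text, since it also handles tabs and repeated spaces.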

You can do it like this:

# -*- coding: utf-8 -*-

from bs4 import BeautifulSoup
import xml.etree.ElementTree as ET

# html is the same string as in the question
soup = BeautifulSoup(html, 'html.parser')
title = soup.find('td', class_='station').text.strip()

spans = soup.find_all('span', class_='ul')

# create the root of the XML file
root = ET.Element("counter")
root.set("name", title)

for item in spans:
    # retrieve the text inside the <td class="station"> of the same row
    text = item.find_parent('tr').find('td', class_='station').text.strip()
    if text == u'Entrée':
        # stop at the first row of the next station
        break

    dish = ET.SubElement(root, 'dish')
    name = ET.SubElement(dish, 'name')
    name.text = item.text.rstrip()

tree = ET.ElementTree(root)
tree.write("filename.xml")

This is the content of the resulting XML file:

[XML output missing from the source]

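It is very important to include the line # -*- coding: utf-8 -*- at the top of the file to avoid problems with accented characters; for details, see SyntaxError: Non-ASCII character '\xa3' in file when function returns '£'. This applies to Python 2; Python 3 source files are read as UTF-8 by default.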
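
The question also asks about Entrée, so here is a minimal sketch of one way to group every dish under its station in a single pass, rather than stopping at the first station. It assumes, as in the HTML above, that a row with a blank station cell belongs to the most recent named station. `SAMPLE_HTML` is a trimmed stand-in for the real page, and `menu_to_xml` and the `<menu>` wrapper element are hypothetical names introduced for illustration; they are not part of either answer.

```python
from bs4 import BeautifulSoup
import xml.etree.ElementTree as ET

# Trimmed stand-in for the menu table in the question (event attributes omitted).
SAMPLE_HTML = u"""
<table class="dayinner">
  <tr class="lun"><td class="mealname" colspan="3">LUNCH</td></tr>
  <tr class="lun"><td class="station">&nbsp;Deli</td>
    <td class="menuitem"><div class="menuitem">
      <span class="ul">Made to Order Deli Core</span></div></td></tr>
  <tr class="lun"><td class="station">&nbsp;</td>
    <td class="menuitem"><div class="menuitem">
      <span class="ul">Turkey Chipotle Petite Wrap</span></div></td></tr>
  <tr class="lun"><td colspan="3"></td></tr>
  <tr class="lun"><td class="station">&nbsp;Entr\u00e9e</td>
    <td class="menuitem"><div class="menuitem">
      <span class="ul">Steamed
          Corn</span></div></td></tr>
  <tr class="lun"><td class="station">&nbsp;</td>
    <td class="menuitem"><div class="menuitem">
      <span class="ul">Cuban Mojo Roasted Pork Loin</span></div></td></tr>
</table>
"""


def menu_to_xml(html):
    """Return a <menu> element holding one <counter> per named station."""
    soup = BeautifulSoup(html, "html.parser")
    root = ET.Element("menu")  # hypothetical wrapper element
    counter = None
    for row in soup.find_all("tr", class_="lun"):
        station = row.find("td", class_="station")
        span = row.find("span", class_="ul")
        if station is None or span is None:
            continue  # header and spacer rows carry no dish
        station_name = station.get_text().strip()
        if station_name:
            # A non-empty station cell starts a new group.
            counter = ET.SubElement(root, "counter", name=station_name)
        if counter is None:
            continue  # dish before any named station; skipped in this sketch
        dish = ET.SubElement(counter, "dish")
        # Collapse line breaks and indentation inside the span.
        ET.SubElement(dish, "name").text = " ".join(span.get_text().split())
    return root


if __name__ == "__main__":
    ET.dump(menu_to_xml(SAMPLE_HTML))
```

Walking the rows instead of the spans keeps the station cell and the dish in the same loop iteration, which avoids navigating back up through parents and siblings.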
