Scraping a table built from div classes via following-sibling, if the data is found

Posted 2024-05-18 16:17:50


I want to scrape an HTML table whose cells are marked up as <div class="..."> elements. I think I need something like:

if found driver.find_element_by_xpath contains(footable-row-detail-name)
get value from /following-sibling which is (class="footable-row-detail-value")

This is just one table. The site I am scraping has many tables, and some of them do not contain all of the fields (hence the "if found").

I would like to do this with Python 3. I hope I have explained it well. The HTML for one table:

<div class="footable-row-detail-inner">
<div class="footable-row-detail-row">
    <div class="footable-row-detail-name">
        Discipline(s) thérapeutique(s):
    </div>
    <div class="footable-row-detail-value">
        197. Omeopatia, 202. Linfodrenaggio manuale, 205. Massaggio classico, 664. Riflessoterapia generale
    </div>
</div>
<div class="footable-row-detail-row">
    <div class="footable-row-detail-name">
        Cognome:
    </div>
    <div class="footable-row-detail-value">
        ABBONDANZIERI Katia
    </div>
</div>
<div class="footable-row-detail-row">
    <div class="footable-row-detail-name">
        Via:
    </div>
    <div class="footable-row-detail-value">
        Place du Cirque, 2
    </div>
</div>
<div class="footable-row-detail-row">
    <div class="footable-row-detail-name">
        NPA:
    </div>
    <div class="footable-row-detail-value">
        1204
    </div>
</div>
<div class="footable-row-detail-row">
    <div class="footable-row-detail-name">
        Luogo:
    </div>
    <div class="footable-row-detail-value">
        Genève
    </div>
</div>
<div class="footable-row-detail-row">
    <div class="footable-row-detail-name">
        Tel / Cellulare:
    </div>
    <div class="footable-row-detail-value">
        022 328 23 44
    </div>
</div>
<div class="footable-row-detail-row">
    <div class="footable-row-detail-name">
        Cellulare:
    </div>
    <div class="footable-row-detail-value">
        079 601 92 75
    </div>
</div>
<div class="footable-row-detail-row">
    <div class="footable-row-detail-name">
        Discipline(s) thérapeutique(s):
    </div>
    <div class="footable-row-detail-value">
        <div class="thZone">
            <div class="zCat">
                METHODES DE MASSAGE
            </div>
            <div class="zThr">
                Linfodrenaggio manuale
            </div>
            <div class="zThr">
                Massaggio classico
            </div>
            <div class="zCat">
                METHODES PRESCRIPTIVES
            </div>
            <div class="zThr">
                Omeopatia
            </div>
            <div class="zCat">
                METHODES REFLEXES
            </div>
            <div class="zThr">
                Riflessoterapia generale
            </div>
        </div>
    </div>
</div>

Thanks for your help.


2 Answers

One solution with Python 3 is the html.parser module from the standard library.

A simple example is enough to get you started :)
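As a minimal sketch of the html.parser approach, assuming each field follows the name/value layout shown in the question: the `DetailParser` class and the shortened `SAMPLE` markup below are hypothetical, written for illustration rather than taken from the original answer. The parser pairs every `footable-row-detail-name` div with the `footable-row-detail-value` div that follows it, so a field that is absent from a table never becomes a key.

```python
from html.parser import HTMLParser


class DetailParser(HTMLParser):
    """Collect name/value pairs from footable-row-detail-* divs (illustrative)."""

    def __init__(self):
        super().__init__()
        self.pairs = {}
        self._field = None   # "name" or "value" while inside such a div
        self._depth = 0      # div nesting depth inside the current field
        self._name = None    # last field name seen, waiting for its value
        self._chunks = []    # text fragments of the current field

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._field is not None:
            # A nested div inside a value (e.g. the thZone block).
            self._depth += 1
            return
        cls = dict(attrs).get("class", "")
        if cls == "footable-row-detail-name":
            self._field = "name"
        elif cls == "footable-row-detail-value":
            self._field = "value"
        if self._field is not None:
            self._depth = 1
            self._chunks = []

    def handle_data(self, data):
        if self._field is not None and data.strip():
            self._chunks.append(data.strip())

    def handle_endtag(self, tag):
        if tag != "div" or self._field is None:
            return
        self._depth -= 1
        if self._depth:
            return
        text = ", ".join(self._chunks)
        if self._field == "name":
            self._name = text.rstrip(":")
        elif self._name is not None:
            self.pairs[self._name] = text
            self._name = None
        self._field = None


# Shortened sample in the same layout as the question's HTML.
SAMPLE = """
<div class="footable-row-detail-row">
    <div class="footable-row-detail-name">Cognome:</div>
    <div class="footable-row-detail-value">ABBONDANZIERI Katia</div>
</div>
<div class="footable-row-detail-row">
    <div class="footable-row-detail-name">NPA:</div>
    <div class="footable-row-detail-value">1204</div>
</div>
"""

parser = DetailParser()
parser.feed(SAMPLE)
parser.close()
print(parser.pairs)
print(parser.pairs.get("Via"))  # None when the field is missing
```

Using `pairs.get(...)` covers the "if found" case: a table without a given field simply yields `None` for it instead of raising an error.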

This works for me. I am using Jupyter and running it line by line. You may hit errors when an element has not loaded yet, so adjust the waits if that happens.

from io import StringIO
import time

import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

driver = webdriver.Chrome()
driver.get("http://asca.ch/Partners.aspx?lang=it")

# Pick the canton "GE" in the drop-down.
cantone = driver.find_element(By.XPATH, '//*[@id="ctl00_MainContent_ddl_cantons_Input"]')
cantone.click()
cantone.send_keys('GE')
cantone.send_keys(Keys.ENTER)

# Accept the disclaimer and submit the search.
confermo = driver.find_element(By.XPATH, '//*[@id="MainContent__chkDisclaimer"]')
confermo.click()
ricera = driver.find_element(By.XPATH, '//*[@id="MainContent_btn_submit"]')
ricera.click()

# Poll until the result rows have loaded, then expand each one.
toggle = driver.find_elements(By.CLASS_NAME, "footable-toggle")
while not toggle:
    time.sleep(.2)
    toggle = driver.find_elements(By.CLASS_NAME, "footable-toggle")

for r in toggle:
    time.sleep(.2)
    r.click()

# Poll until the expanded detail cells are present.
data = driver.find_elements(By.CLASS_NAME, "footable-row-detail-cell")
while not data:
    time.sleep(.2)
    data = driver.find_elements(By.CLASS_NAME, "footable-row-detail-cell")

list_df = []
for r in data:
    # Rewrite the div markup as a table so that pandas can parse it.
    datum = r.get_attribute('innerHTML')\
        .replace("""<div class="footable-row-detail-inner">""","<table>")\
        .replace("""<div class="footable-row-detail-row">""","<tr>")\
        .replace("""<div class="footable-row-detail-name">""","<td>")\
        .replace("""<div class="footable-row-detail-value">""","</td><td>")
    # Each parsed row is a [name, value] pair, so the table becomes one dict.
    list_df.append(dict(pd.read_html(StringIO(datum))[0].values.tolist()))

df = pd.DataFrame(list_df)
df.to_csv('data.csv')
print(df)
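
The polling loops above spin forever if the elements never appear. One possible way to bound them is sketched below, assuming a small helper named `wait_for` (hypothetical, not part of Selenium) that retries a callable until it returns something non-empty or a timeout expires. The `fake_fetch` function only stands in for a `driver.find_elements(...)` call so the sketch runs without a browser.

```python
import time


def wait_for(fetch, timeout=10.0, interval=0.2):
    """Call fetch() until it returns a non-empty result or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"nothing found within {timeout} s")
        time.sleep(interval)


# Stand-in for a Selenium lookup: empty on the first two calls, then one row.
attempts = []

def fake_fetch():
    attempts.append(1)
    return ["row"] if len(attempts) >= 3 else []

print(wait_for(fake_fetch, timeout=2.0, interval=0.01))

# With a real driver the call might look like this (untested here):
# toggle = wait_for(lambda: driver.find_elements(By.CLASS_NAME, "footable-toggle"))
```

Selenium also ships its own `WebDriverWait` class for explicit waits, which may be a better fit than a hand-written helper.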
