Extracting checkbox values from HTML with pandas

Posted 2024-10-01 17:26:46


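I wrote some crawlers with Selenium and use pandas to present the scraped data as a table.

I want to extract the value attribute of the checkbox input, but I don't know how to do that with pandas.

Any help would be appreciated. Thanks, everyone.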

<table border="0" cellspacing="0" cellpadding="0" width="800" class="box-list3">    <tbody><tr>
<tr>
<td nowrap="" class="listMT3" width="2%" align="center">select all<br><input type="checkbox" class="noinput" onclick="selectAll('selectOne', this)" name="selectAll" id="selectAll"></td>
<td nowrap="" width="3%" class="listMT3" align="center">Aviation<br>Company</td>
<td class="listMT3" align="center">departure</td>
<td class="listMT3" align="center">arrived</td>
<td nowrap="" class="listMT3" align="center">flights</td>
<td class="listMT3" align="center">Applicable<br>Class</td>
<td class="listMT3" align="center">Fare Price</td>
<td class="listMT3" align="center" nowrap="">Backpoint</td>
<td class="listMT3" align="center" nowrap="">Ration</td>
<td class="listMT3" align="center">Policy<br>Type</td>
<td class="listMT3" align="center">Itinerary<br>Type</td>
<td class="listMT3" align="center">Passengers<br>Type</td>
<td class="listMT3" align="center" nowrap="">Ticketing<br>How to</td>
<td class="listMT3" width="15%" align="center">Remarks</td>
<td class="listMT3" align="center">Action</td>
    <form action="/liantuo/manage/newManagePolicy.in" name="policy" method="post"></form>
    <input type="hidden" name="id">
    <input type="hidden" name="selectIdJson">
    <input type="hidden" name="operation">
    <input type="hidden" name="multiSelect">
    <input type="hidden" name="multiSelectOperation">
    <input type="hidden" name="auditingPass">
     
Ranch
</tr>
<tr onmouseover="DoHL(event);" onmouseout="DoLL(event);" valign="top" class="">
    <td class="listMB3" align="center" id="selectOne1">
        <input type="checkbox" class="noinput" name="selectOne" value="248110357" id="selectOne248110357" plus=""
               passengertype="0" businessunit="0" webpolicycode="" routetype="1" allsubprid="248110357"
               allsubincreaseordecreaseids="">
        <input type="hidden" id="ticketPrice_248110357" value="">
    </td>
    <td class="listMB3" align="center" id="airlinehn1">
        <input type="hidden" id="airline248110357" name="airline248110357" value="QW">
        QW
    </td>
    <td class="listMB3" style="width:85px;" align="center" id="depart1">
        TAO
    </td>
    <td class="listMB3" style="width:85px;" align="center" id="arrival1">
        NTG
    </td>
    <td class="listMB3" align="center" width="60px" id="flight1">
        &nbsp;
    </td>
    <td class="listMB3" align="center" style="width:85px;" id="seatClass1">
        <span id="seatClassSeats1">Z</span>&nbsp;
    </td>
    <td class="listMB3" align="center" style="width:85px;">
        <span id="ticketPrice1"></span>&nbsp;
    </td>
    <td class="listMB3" nowrap="">
        Local:
        <span id="distributorCommisionId1"><font color="red"> 0.0 %</font></span>
        <br>Platform:
        <span id="commonCommisionId1"><font color="red"> 7.3 %</font></span>
    </td>
    <td class="listMB3" nowrap="">
        Local:
        <span id="distributorSolid1"><font color="red">0.0</font></span>
        <br>Platform:
        <span id="commonSolid1"><font color="red">0.0</font></span>
    </td>
    <td class="listMB3" nowrap="">
            <span id="policyTypeId1">
            B2B_ET<br> <span class="red">General policy</span>                          </span>
        <input type="hidden" id="lowPolicy248110357" name="lowPolicy248110357" value="$policy.kryoMap.lowPolicy">
        <br><span class="red" id="policyLowTypeId1"></span>
        <br>
        <span id="officeNo1"></span>
    </td>
    <td class="listMB3" nowrap="">
        one way
    </td>
    <td class="listMB3" nowrap="" align="center" id="passengerTypeId1">
        adult
    </td>
    <td class="listMB3" nowrap="" align="center" id="autoId1">
        manual
    </td>
    <td class="listMB3" id="commentListId1">Valid period:2020-01-01 to 2020-03-28
        <br>Ticketing period: 2020-01-01 to2020-03-28 <br>
    </td>
    <td class="listMB3" align="left" nowrap="" id="operationId1">
        <input type="button" onclick="openEditPolicy(248110357,this,1,'',1,event );sc1()" value=""
               style="background:url(../images/Icon_5.gif) no-repeat;width:45px;height:16px;border:0;cursor:pointer">
        <input type="button" onclick="openEditPolicy(248110357,this,1,'',2,event );sc1()" value=""
               style="background:url(../images/Icon_6.gif) no-repeat;width:45px;height:16px;border:0;cursor:pointer">
        <br>
        <a href="javascript:deleteAll(248110357)" title="delete"><img src="../images/Icon_4.gif" border="0"></a>
        <a href="javascript:cancelAuditingAll(248110357)" title="取消审核"><img
                src="../images/currency-ico/path_icon3_qx.jpg" border="0"></a>
        <br>
        <div>
        </div>
        <div>
        </div>
    </td>
</tr>

This is the code I currently use to populate the model from the table:

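        # Imports this snippet relies on (not shown in the original post; JsonResponse is assumed to be Django's)
        from io import StringIO
        import pandas as pd
        from django.http import JsonResponse
        from lxml import etree

        # Parse the page rendered by Selenium and locate the flight table
        get_flight_detail_elements = etree.HTML(driver.page_source)
        flight_table = get_flight_detail_elements.xpath('//*[@id="container_box"]/div[1]/div[2]/table[6]')
        flight_table = etree.tostring(flight_table[0], encoding='utf-8').decode()
        # Wrap the markup in StringIO: recent pandas versions deprecate passing literal HTML strings
        flight_table_text = pd.read_html(StringIO(flight_table), header=0)[0]
        remove_table_text = flight_table_text.fillna('')
        #remove_table_text = remove_table_text.loc[:, ~remove_table_text.columns.str.contains('^全选')]
        # Drop the column whose header starts with '操作' ("Action")
        remove_table_text = remove_table_text.loc[:, ~remove_table_text.columns.str.contains('^操作')]
        remove_table_text = remove_table_text.to_dict(orient='records')
        return JsonResponse(data=remove_table_text, safe=False)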

The input value in the "select all" column is not extracted:

     {
         "Ticketing Method": "Manual",
         "select all": "",
         "Face Price": "",
         "Arrival": "NTG",
         "Departure": "TAO",
         "Remarks": "Valid period: 2020-01-01 to 2020-03-28 Ticketing period: 2020-01-01 to 2020-03-28",
         "Trip type": "One way",
         "Flight": "",
         "Passenger type": "Adult",
         "Airline": "QW",
         "Rebate": "Local: 0.0% Platform: 7.3%",
         "Policy Type": "B2B_ET General Policy",
         "Rated": "Local: 0.0 Platform: 0.0",
         "Applicable Class": "Z"
     },
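The empty "select all" field is expected: pd.read_html keeps only the text content of each cell, and a cell that holds nothing but input elements has no text. A minimal reproduction, using a cut-down stand-in for the table above (the two-column layout is an assumption made for brevity):

```python
from io import StringIO

import pandas as pd

# Cut-down stand-in for the table in the question: one header row, one data row.
HTML = """
<table>
  <tr><td>select all</td><td>departure</td></tr>
  <tr>
    <td><input type="checkbox" name="selectOne" value="248110357"></td>
    <td>TAO</td>
  </tr>
</table>
"""

# read_html extracts cell text only; attributes such as value are ignored,
# so the checkbox cell comes back as a missing value.
df = pd.read_html(StringIO(HTML), header=0)[0]
checkbox_column_is_empty = bool(df["select all"].isna().all())
print(df)
print("checkbox column empty:", checkbox_column_is_empty)
```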

This is the data I want:

value="248110357"
<input type="checkbox" class="noinput" name="selectOne" value="248110357" id="selectOne248110357" plus="" passengertype="0" businessunit="0" webpolicycode="" routetype="1" allsubprid="248110357" allsubincreaseordecreaseids="">
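Since the value lives in an attribute rather than in the cell text, one option is to read it from the lxml tree directly with XPath. The sketch below assumes the per-row checkboxes are identified by name="selectOne", as in the markup above; the function name checkbox_values and the second row's id are invented for illustration.

```python
from lxml import etree

# Reduced stand-in for the table in the question.
# The second data row (248110358) is invented for illustration.
HTML = """
<table>
  <tr>
    <td>select all<br><input type="checkbox" name="selectAll" id="selectAll"></td>
    <td>departure</td>
  </tr>
  <tr>
    <td><input type="checkbox" name="selectOne" value="248110357">
        <input type="hidden" id="ticketPrice_248110357" value=""></td>
    <td>TAO</td>
  </tr>
  <tr>
    <td><input type="checkbox" name="selectOne" value="248110358"></td>
    <td>TAO</td>
  </tr>
</table>
"""


def checkbox_values(html, name="selectOne"):
    """Return the value attribute of every checkbox with the given name, in document order."""
    root = etree.HTML(html)
    # $name is an XPath variable; lxml fills it from the keyword argument.
    values = root.xpath('//input[@type="checkbox"][@name=$name]/@value', name=name)
    return [str(v) for v in values]


values = checkbox_values(HTML)
print(values)
```

If every data row has exactly one such checkbox, the list lines up with the rows of the DataFrame and can be assigned as a new column, provided the lengths match.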

1 answer
User
Reply #1 · posted 2024-10-01 17:26:46
        <td class="listMB3" align="center" id="selectOne1">
            <input type="checkbox" class="noinput" name="selectOne" value="248110357" id="selectOne248110357" plus="" passengertype="0" businessunit="0" webpolicycode="" routetype="1" allsubprid="248110357" allsubincreaseordecreaseids="">
                                    <input type="hidden" id="ticketPrice_248110357" value="">
        </td>
        <td class="listMB3" align="center" id="airlinehn1">
                        <input type="hidden" id="airline248110357" name="airline248110357" value="QW">
            QW
        </td>
                <td class="listMB3" style="width:85px;" align="center" id="depart1">
        TAO                                 </td>   

                <td class="listMB3" style="width:85px;" align="center" id="arrival1">
            NTG                             </td>

pandas cannot extract the value attribute of the input for me, i.e. value="248110357":

        <td class="listMB3" align="center" id="selectOne1">
            <input type="checkbox" class="noinput" name="selectOne" value="248110357" id="selectOne248110357" plus="" passengertype="0" businessunit="0" webpolicycode="" routetype="1" allsubprid="248110357" allsubincreaseordecreaseids="">
                                    <input type="hidden" id="ticketPrice_248110357" value="">
        </td>
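One way to keep the pd.read_html flow is to copy each checkbox's value into the cell text before the table is serialized, so the text-only parser picks it up. This is a minimal sketch under the same name="selectOne" assumption; table_with_checkbox_values is a hypothetical helper, and the three-column table is a reduced stand-in for the real one.

```python
from io import StringIO

import pandas as pd
from lxml import etree

# Reduced stand-in for the table in the question.
HTML = """
<table>
  <tr><td>select all</td><td>departure</td><td>arrived</td></tr>
  <tr>
    <td><input type="checkbox" name="selectOne" value="248110357">
        <input type="hidden" id="ticketPrice_248110357" value=""></td>
    <td>TAO</td>
    <td>NTG</td>
  </tr>
</table>
"""


def table_with_checkbox_values(html):
    """Parse the first table, exposing each selectOne checkbox value as cell text."""
    root = etree.HTML(html)
    for box in root.xpath('//input[@type="checkbox"][@name="selectOne"]'):
        # Text stored in .tail is serialized right after the <input> tag,
        # so it becomes part of the surrounding cell's text.
        box.tail = (box.get("value") or "") + (box.tail or "")
    table = root.xpath("//table")[0]
    table_html = etree.tostring(table, encoding="unicode", method="html")
    # pandas may infer a numeric dtype for the id column.
    return pd.read_html(StringIO(table_html), header=0)[0]


df = table_with_checkbox_values(HTML)
print(df.to_dict(orient="records"))
```

In the original view code, the same loop could run on the parsed tree before the etree.tostring call, leaving the rest of the pipeline unchanged.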
