Python: How to parse the string fragments extracted from a table

Published 2024-09-28 21:04:59


I can't figure out how to extract the individual strings from a table.

Here is the HTML of the td cells:

<tr>
<td class="">1</td><td><a 
 href="http://www.canalturf.com/courses_fiche_cheval.php? 
 idcheval=227579&amp;idcourse=173937" target="_blank" title="La fiche du 
 cheval EQUEMAUVILLE "><strong>EQUEMAUVILLE  (H4)</strong><br> 
 <small>4s3s1s6s5h(17)2h2h3h7h</small><div class="pedigree hidden"> 
 <small>SAINT DES SAINTS - MISS ACADEMY</small></div></a>
</td>
<td><div style="width:30px; height:35px; overflow:hidden"><img 
 src="http://www.canalturf.com/interface/casaques/173937.png" 
 style="width:100%; position:relative; top:0px"></div>
</td>
<td><a href="http://www.canalturf.com/courses_fiche_jockey.php? 
idjockey=3080&amp;date=2018-08-04" target="_blank" title="La fiche du 
 jockey/driver D. GALLON"><strong>D. GALLON</strong></a><br><a 
 href="http://www.canalturf.com/courses_fiche_entraineur.php? 
 identraineur=171&amp;date=2018-08-04" target="_blank" title="La fiche de 
 l'entraineur F.NICOLLE"><small>F.NICOLLE</small></a></td><td>71.0 kg
</td>
<td 
 class="text-center bord-lft">9</td><td class="text-center bord-lft text- 
 success">-40%</td><td class="text-center bord-lft"><a 
 href="https://eule1.pmu.fr/dynclick/pmu/?eaf- 
 publisher=ACQHIPPIQUECANALTURF_CANALTURF&amp;eaf- 
 name=ACQHIPPIQUECANALTURF_CANALTURF_2010_WEB_AFF_FILROUGE&amp;eaf- 
 creative=ACQ_H_DESKTOP_ETIRELIRE_BANNIERE&amp;eaf- 
 creativetype=BANNIERE&amp;eseg-name=ia-affilie&amp;eseg- 
 item=a_Canalturfb_TEXTEc_aid&amp; 
 mediaplan=2010_WEB_AFF_FILROUGE&amp;eurl=https%3A%2F%2Fwww.Fturf%2Fouver 
 ture-compte%2Fstandard%2F%3F2%26hippique- 
 tirelire%26ns_mchannel%3DAFF%26ns_source%3DACQHIPPIQUECANALTURF_CANALTURF" 
 target="_blank" onclick="handleOutboundLinkClicks('PMUClic', 'pmuCote', 
 '1');">5.4</a></td><td class="text-center bord-lft"><a 
 href="https://www.zeturf.fr/fr/inscription?
 pid=88&amp;affutm_source=Affiliation&amp;utm_medium=Canalturf&amp;u 
 tm_campaign=ZT_FR_Affiliation_Filrouge_Logo_2018" target="_blank" 
 onclick="handleOutboundLinkClicks('ZTClic', 'ztCote', '1');">5</a></td><td 
 class="text-center bord-lft"><a 
 href="http://wlbetclicfr.adsrv.eacdn.com/C.ashx? 
 btag=a_920b_260c_&amp;affid=590&amp;siteid=920&amp;adid=260&amp;c=turf" 
 target="_blank" onclick="handleOutboundLinkClicks('BTClic', 'btCote', 
 '1');">-- 
 </a>
</td>
<td class="text-center bord-lft"><a 
 href="http://media.unibet.fr/redirect.aspx? 
 pid=32884&amp;bid=2223" target="_blank" 
 onclick="handleOutboundLinkClicks('UNClic', 'unCote', '1');">4.8</a>
</td>

Here are the values I get:

1,EQUEMAUVILLE  (H4)4s3s1s6s5h(17)2h2h3h7hSAINT DES SAINTS - MISS ACADEMY,AP,F. OUVRIEJ. FOIN,2700m,47,+28%,60,67.6,67.78,--

Here is what I want:

1,EQUEMAUVILLE,H4,4s3s1s6s5h(17)2h2h3h7h,SAINT DES SAINTS,MISS ACADEMY,AP,F. OUVRIEJ. FOIN,2700m,47,+28%,60,67.6,67.78,--
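
The two rows differ only in the horse cell: its three text fragments have to become five fields. Below is a minimal sketch of that splitting step using only the standard library. It assumes the fragments have already been separated and always follow the shapes seen in this sample (`NAME  (CODE)` and `SIRE - DAM`). `split_horse_fields` is a hypothetical helper name, not something from the original code.

```python
import re

def split_horse_fields(name_cell, form, pedigree):
    """Turn three text fragments from the horse cell into five CSV fields."""
    # "EQUEMAUVILLE  (H4)" -> name "EQUEMAUVILLE", code "H4".
    # The code is taken from the last parenthesised group in the text.
    match = re.fullmatch(r"\s*(.*?)\s*\(([^()]*)\)\s*", name_cell)
    if match:
        name, code = match.group(1), match.group(2)
    else:
        # Assumption: with no parenthesised code, keep the whole text as the name.
        name, code = name_cell.strip(), ""
    # "SAINT DES SAINTS - MISS ACADEMY" -> sire, dam (split on the first " - ").
    sire, _, dam = pedigree.partition(" - ")
    return [name, code, form.strip(), sire.strip(), dam.strip()]

fields = split_horse_fields(
    "EQUEMAUVILLE  (H4)",
    "4s3s1s6s5h(17)2h2h3h7h",
    "SAINT DES SAINTS - MISS ACADEMY",
)
print(fields)
```

For the sample values this yields the five fields that follow the leading `1` in the desired row. A horse name that itself contains " - " would need a stricter rule than splitting on the first separator.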

This is how I get the values:

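import csv

# soup2 is assumed to be the BeautifulSoup object for the already-fetched page.
table = soup2.find("table", attrs={"id": "TablePartants"})

headers = [th.text for th in table.select("tr th")]

# newline="" lets the csv module control line endings itself.
with open("out.csv", "w", newline="") as f:
    wr = csv.writer(f, lineterminator="\n")
    # wr.writerow(headers)
    # td.text concatenates every text node in a cell, which fuses the fields.
    wr.writerows([[td.text for td in row.find_all("td")] for row in table.select("tr")])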

Do I have to use regular expressions? Or can I first parse the pieces by their specific HTML tags? I'm a bit confused.
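
One possible approach to the tag-based option is sketched below. The fragments already sit in separate `<strong>` and `<small>` elements, so BeautifulSoup's `stripped_strings` can yield them one by one without any regex. The HTML here is a trimmed copy of the horse cell with the link URL replaced by a placeholder, and the variable names are illustrative.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the first two cells from the question's HTML
# (href shortened to a placeholder for readability).
html = """
<table id="TablePartants"><tr>
<td class="">1</td>
<td><a href="#"><strong>EQUEMAUVILLE  (H4)</strong><br>
<small>4s3s1s6s5h(17)2h2h3h7h</small><div class="pedigree hidden">
<small>SAINT DES SAINTS - MISS ACADEMY</small></div></a></td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", attrs={"id": "TablePartants"})

rows = []
for tr in table.select("tr"):
    row = []
    for td in tr.find_all("td"):
        # stripped_strings yields each text node separately, whitespace-trimmed,
        # so the <strong> and <small> texts are no longer glued together.
        # An empty cell still contributes one empty field to keep columns aligned.
        row.extend(list(td.stripped_strings) or [""])
    rows.append(row)

print(rows)
```

This separates the name, form and pedigree fragments. Splitting `EQUEMAUVILLE  (H4)` or `SAINT DES SAINTS - MISS ACADEMY` further still needs a string operation, since each is a single text node. Cells with several text nodes (such as the jockey and trainer cell) will also expand into several fields, which may or may not be wanted.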

I would be very grateful if someone could help me. Thanks.

