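I am trying to scrape one table from a site, but I am stuck because I only get a single row and I don't know where my problem is. I need the first two cells from every row of the table, in the form shown under "What I need" below. Also, how can I adapt this code by hand to other tables?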
from bs4 import BeautifulSoup
import requests

URL_TO = 'https://en.wikipedia.org/wiki/Rammstein_discography'
response = requests.get(URL_TO)
soup = BeautifulSoup(response.text, 'html.parser')
soup.prettify()
table = soup.find("table", {"class": "wikitable plainrowheaders"})
for row in table.findAll("tr"):
    cells = row.findAll("td")
    bells = row.findAll("th")
print(cells, bells)
My output:
[<td>
<ul><li>Released: 17 May 2019</li>
<li>Label: Universal</li>
<li>Format: CD, LP, DL</li></ul>
</td>, <td>1</td>, <td>5</td>, <td>1</td>, <td>1</td>, <td>1</td>, <td>1</td>, <td>1</td>, <td>1</td>, <td>1</td>, <td>2</td>, <td>1</td>, <td>3</td>, <td>9
</td>, <td>
<ul><li>FRA: 50,000 <sup class="reference" id="cite_ref-chartsinfrance_45-0"><a href="#cite_note-chartsinfrance-45">[45]</a></sup></li>
<li>GER: 260,000<sup class="reference" id="cite_ref-chartsinfrance_45-1"><a href="#cite_note-chartsinfrance-45">[45]</a></sup></li>
<li>US: 25,000<sup class="reference" id="cite_ref-46"><a href="#cite_note-46">[46]</a></sup></li>
<li>WW: 900,000<sup class="reference" id="cite_ref-47"><a href="#cite_note-47">[47]</a></sup></li></ul>
</td>, <td>
<ul><li>BVMI: 5× Gold<sup class="reference" id="cite_ref-musikindustrie_23-4"><a href="#cite_note-musikindustrie-23">[23]</a></sup></li>
<li>BEL: Gold<sup class="reference" id="cite_ref-48"><a href="#cite_note-48">[48]</a></sup></li>
<li>SNEP: Gold<sup class="reference" id="cite_ref-snep_44-1"><a href="#cite_note-snep-44">[44]</a></sup></li>
<li>IFPI AUT: 2× Platinum<sup class="reference" id="cite_ref-IFPIAUT_30-4"><a href="#cite_note-IFPIAUT-30">[30]</a></sup></li></ul>
</td>] [<th scope="row"><a href="/wiki/Untitled_Rammstein_album" title="Untitled Rammstein album">Untitled</a>
</th>]
What I need:
[Herzeleid]
[Released: 24 September 1995
Label: Motor, Slash
Format: CD, CS, LP, DL]
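A likely cause of the single-row output is that the print call runs only once, after the loop has finished, so only the last row's cells are shown. Below is a minimal sketch of collecting the title and the details list inside the loop instead. The function name title_and_details, the helper clean, and SAMPLE_HTML are made up for illustration, and the sample markup is only an assumption about how the Wikipedia table is laid out (a row-scoped th for the title, followed by a td holding a ul).

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page, so the sketch runs offline.
SAMPLE_HTML = """
<table class="wikitable plainrowheaders">
<tr><th scope="col">Title</th><th scope="col">Album details</th><th scope="col">GER</th></tr>
<tr><th scope="row"><a href="/wiki/Herzeleid">Herzeleid</a>
</th><td>
<ul><li>Released: 24 September 1995</li>
<li>Label: Motor, Slash</li>
<li>Format: CD, CS, LP, DL</li></ul>
</td><td>6</td></tr>
<tr><th scope="row"><a href="/wiki/Sehnsucht">Sehnsucht</a>
</th><td>
<ul><li>Released: 22 August 1997</li>
<li>Label: Motor, Slash</li></ul>
</td><td>1</td></tr>
</table>
"""


def clean(tag):
    """Return the tag's text with runs of whitespace collapsed to one space."""
    return " ".join(tag.get_text().split())


def title_and_details(html):
    """Return a list of (title, [detail lines]) tuples, one per album row."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"class": "wikitable plainrowheaders"})
    rows = []
    for row in table.find_all("tr"):
        title = row.find("th", scope="row")  # first cell: the album title
        details = row.find("td")             # second cell: the details list
        if title is None or details is None:
            continue  # skip header or footnote rows without both cells
        rows.append((clean(title), [clean(li) for li in details.find_all("li")]))
    return rows


# Work happens per row, inside the loop, rather than once after it.
for title, details in title_and_details(SAMPLE_HTML):
    print([title])
    print(details)
```

To run it against the real page, pass requests.get(URL_TO).text in place of SAMPLE_HTML. For another table, the class string given to soup.find and the two row.find calls are the parts that would need changing.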
You can use pandas to scrape the table. Indexing with 0 selects the first record, which gives:
Herzeleid
Out[26]: 'Released: 24 September 1995 Label: Motor, Slash Format: CD, CS, LP, DL'
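A minimal sketch of what the pandas approach could look like, not necessarily the exact code that produced the output above. It assumes pandas.read_html has a parser available (lxml, or bs4 with html5lib) and that the wanted table can be found by the header text "Album details". The function name first_two_columns and SAMPLE_HTML are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real page, so the sketch runs offline.
SAMPLE_HTML = """
<table class="wikitable plainrowheaders">
<tr><th>Title</th><th>Album details</th><th>GER</th></tr>
<tr><th scope="row">Herzeleid</th><td>
<ul><li>Released: 24 September 1995</li>
<li>Label: Motor, Slash</li>
<li>Format: CD, CS, LP, DL</li></ul>
</td><td>6</td></tr>
<tr><th scope="row">Sehnsucht</th><td>
<ul><li>Released: 22 August 1997</li>
<li>Label: Motor, Slash</li></ul>
</td><td>1</td></tr>
</table>
"""


def first_two_columns(html, match="Album details"):
    """Parse the first table whose text matches `match`; keep its first two columns."""
    # read_html returns a list of DataFrames, one per matching <table>.
    tables = pd.read_html(io.StringIO(html), match=match)
    # Positional slicing also works when the real page has multi-level headers.
    return tables[0].iloc[:, :2]


df = first_two_columns(SAMPLE_HTML)
print(df.iloc[0, 0])  # title of the first record (row 0, column 0)
print(df.iloc[0, 1])  # details of the first record (row 0, column 1)
```

For the live page, pass requests.get(URL_TO).text in place of SAMPLE_HTML. For a different table, changing the match string is usually enough.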