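I have parsed the string below with BeautifulSoup to extract data from it, but I cannot get some of the data, even after trying different approaches. I managed to extract the text between the <a> tags, the links, and the text outside each link.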
<html>
<body>
<p align="left">
<font face="Arial, Helvetica, sans-serif" size="2">
<b>
<font size="4">
GOVERNOR:
</font>
</b>
<br/>
</font>
<font face="Arial, Helvetica, sans-serif" size="2">
<a href="http://governor.alabama.gov/">
<strong>
Robert
Bentley (R)*
</strong>
</a>
- Ex-Morgan County Commissioner & State Correctional Officer
<strong>
<br/>
<a href="http://www.facebook.com/stacy.george.3139">
Stacy George
(R)
</a>
- Ex-Morgan County Commissioner & State Correctional Officer
<br/>
Bob Starkey (R) - Retired Businessman, '10 State Rep. Candidate & '12 Scottsboro Mayor Candidate
<br/>
<a href="http://www.bassforbama.com/">
Kevin Bass (D)
</a>
- Businessman & Ex-Pro Baseball Player
<br/>
<a href="http://www.parkergriffithforcongress.com/">
Parker Griffith
(D)
</a>
- Ex-Congressman, Ex-State Sen., Physician & Ex-Republican
</strong>
</font>
</p>
</body>
</html>
Here is what I implemented with BeautifulSoup. It prints the following:
> Robert
Bentley (R)*
http://governor.alabama.gov/
> Stacy George
(R)
http://www.facebook.com/stacy.george.3139
- Ex-Morgan County Commissioner & State Correctional Officer
> Kevin Bass (D)
http://www.bassforbama.com/
- Businessman & Ex-Pro Baseball Player
> Parker Griffith
(D)
http://www.parkergriffithforcongress.com/
- Ex-Congressman, Ex-State Sen., Physician & Ex-Republican
The third entry is missing:
Bob Starkey (R) - Retired Businessman, '10 State Rep. Candidate & '12 Scottsboro Mayor Candidate
How can I solve this with BeautifulSoup?
I tried find_all("br"), but that does not work because the br tags return NoneType.
One approach is to grab all the text nodes that lie outside each link and print them.
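A minimal sketch of the idea of collecting text nodes outside the links, not necessarily the code the answer originally used. It assumes bs4 4.4 or later (for the `string=True` argument) and the built-in `html.parser`. The `HTML` sample is a trimmed copy of the markup from the question with ampersands escaped, and the helper name `outside_link_text` is hypothetical.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question (ampersands escaped).
HTML = """
<p>
<a href="http://governor.alabama.gov/"><strong>Robert
Bentley (R)*</strong></a>
- Ex-Morgan County Commissioner &amp; State Correctional Officer
<br/>
<a href="http://www.facebook.com/stacy.george.3139">Stacy George
(R)</a>
- Ex-Morgan County Commissioner &amp; State Correctional Officer
<br/>
Bob Starkey (R) - Retired Businessman, '10 State Rep. Candidate &amp; '12 Scottsboro Mayor Candidate
<br/>
<a href="http://www.bassforbama.com/">Kevin Bass (D)</a>
- Businessman &amp; Ex-Pro Baseball Player
</p>
"""


def outside_link_text(html):
    """Return every non-empty text node that is not inside an <a> tag."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for node in soup.find_all(string=True):
        # Skip text that belongs to a link (directly or via <strong>).
        if node.find_parent("a") is not None:
            continue
        # Collapse newlines and runs of spaces; drop whitespace-only nodes.
        text = " ".join(node.split())
        if text:
            results.append(text)
    return results


if __name__ == "__main__":
    for line in outside_link_text(HTML):
        print(line)
```

Because the Bob Starkey entry has no link, it is a bare text node between two `<br/>` tags, so a loop over `<a>` tags never reaches it, while a loop over text nodes does. On the full document from the question this sketch would also return the `GOVERNOR:` heading, since that text is outside any link as well. As for the `find_all("br")` attempt: `<br/>` is an empty element, so its `.string` is `None`; the text that follows it is available as the tag's `next_sibling` instead.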