Trying to find all the text between multiple span tags using BeautifulSoup

YANGON Standing among the party seeing off Myanmar's new president as he left for Russia on Wednesday was leading businessman Htun Myint Naing, better known as Steven Law. Only the day before, the United States had added six of his companies to the Treasury's blacklist, a move that is unlikely to hamper the tycoon's business empire significantly. President Barack Obama's sanctions policy on Myanmar, updated on Tuesday, aims to strike a balance between targeting individuals without undermining development or deterring U.S. businesses eying the country as it opens up to global trade. Underlining how tricky that balance is, Law may actually gain commercially from the latest changes, even if they do make it harder for him to portray himself as an internationally accepted businessman close to the new democratic government. "Though (sanctions) are not meant to have a blanket effect on the country, their intended targets often play outsize roles ... controlling critical infrastructure impacting trade and business for ordinary citizens," said Nyantha Maw Lin, managing director at consultancy Vriens & Partners in Yangon. On Tuesday, Washington eased some restrictions on Myanmar but also strengthened measures against Law by adding six firms connected to him and his conglomerate, Asia World, to the Treasury blacklist. Yet the blacklisting, which attracted considerable attention in Myanmar, looks like a formality given that the companies were already covered by sanctions, because they were owned 50 percent or more by Law or Asia World. Law was sanctioned in 2008 for alleged ties to Myanmar's military. More important for Law was the U.S. decision to further ease restrictions on trading through his shipping port and airports, extending a temporary six month allowance set in December to an indefinite one. PORTS BACK IN FAVOR Law is one of the most powerful and well-connected businessmen in Myanmar with close ties to China. 
He is not, however, universally popular at home or abroad because of alleged ties to the military, which ruled Myanmar with an iron fist until 2011.

import requests
from bs4 import BeautifulSoup

z = requests.get("http://www.reuters.com/article/us-myanmar-usa-sanctions-idUSKCN0Y92RK/")
url2 = 'http://www.reuters.com/article/us-myanmar-usa-sanctions-idUSKCN0Y92RK'
response2 = requests.get(url2)
soup2 = BeautifulSoup(response2.content, "html.parser")
first_sentence = soup2.p.get_text()
print(first_sentence)
second_sentence = soup2.p.find_all_next()
print(second_sentence)

3 answers

User

Answer 1 · edited 2024-09-27 22:23:32

You can use the CSS selector #articleText p to return all the <p> elements inside the element whose id is "articleText":

>>> import requests
>>> from bs4 import BeautifulSoup
>>> url2 = 'http://www.reuters.com/article/us-myanmar-usa-sanctions-idUSKCN0Y92RK'
>>> response2 = requests.get(url2)
>>> soup2 = BeautifulSoup(response2.content, "html.parser")
>>> for sentence in soup2.select("#articleText p"):
...     print(sentence.get_text())
...     print()
... 
YANGON Standing among the party seeing off Myanmar's new president as he left for Russia on Wednesday was leading businessman Htun Myint Naing, better known as Steven Law.

Only the day before, the United States had added six of his companies to the Treasury's blacklist, a move that is unlikely to hamper the tycoon's business empire significantly.

President Barack Obama's sanctions policy on Myanmar, updated on Tuesday, aims to strike a balance between targeting individuals without undermining development or deterring U.S. businesses eying the country as it opens up to global trade.

Underlining how tricky that balance is, Law may actually gain commercially from the latest changes, even if they do make it harder for him to portray himself as an internationally accepted businessman close to the new democratic government.

......
......
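
To try the selector without fetching the live page, here is a minimal sketch that runs it against an inline HTML fragment. The fragment and the helper name article_paragraphs are made up for illustration (assumptions); the real article markup may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the article page (an assumption,
# not the real Reuters HTML).
SAMPLE_HTML = """
<div><p>Navigation text outside the article.</p></div>
<span id="articleText">
  <p>First paragraph.</p>
  <span class="ad"></span>
  <p>Second paragraph.</p>
</span>
<p>Footer text outside the article.</p>
"""


def article_paragraphs(html):
    """Return the text of every <p> that is a descendant of #articleText."""
    soup = BeautifulSoup(html, "html.parser")
    # "#articleText p" is a descendant selector: any <p> nested anywhere
    # inside the element with id="articleText".
    return [p.get_text(strip=True) for p in soup.select("#articleText p")]


paragraphs = article_paragraphs(SAMPLE_HTML)
for text in paragraphs:
    print(text)
```

Scoping the query to the container id is what keeps the navigation and footer paragraphs out of the result.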

User

Answer 2 · edited 2024-09-27 22:23:32

You can try: soup2.p.find_all_next(text=True)

Like this:

second_sentence = soup2.p.find_all_next(text=True)
for item in second_sentence:
    print(item.split('\n'))
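
A minimal sketch of what this call returns, using an inline fragment instead of the live page. The sample HTML and the helper name text_after_first_p are assumptions for illustration. It uses string=True, which newer Beautiful Soup releases (4.4.0 and later) accept as the current name for the older text=True argument.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment (an assumption): paragraphs separated by empty spans.
SAMPLE_HTML = (
    "<p>First.</p><span></span>"
    "<p>Second <b>bold</b> text.</p><span></span>"
    "<p>Third.</p>"
)


def text_after_first_p(html):
    """Collect every non-blank text node that follows the first <p> tag."""
    soup = BeautifulSoup(html, "html.parser")
    # string=True matches text nodes instead of tags. The walk starts at the
    # first <p>'s next element, which is its own text, so that text is included.
    pieces = soup.p.find_all_next(string=True)
    return [s.strip() for s in pieces if s.strip()]


pieces = text_after_first_p(SAMPLE_HTML)
print(pieces)
```

Note that text inside nested tags such as <b> comes back as separate pieces, so the results may need to be joined per paragraph.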

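User

Answer 3 · edited 2024-09-27 22:23:32

Your problem is probably that find_all_next() returns every match that appears after the starting element (the <p> matched earlier), and because you did not specify a tag to match, it matches everything.

If you change it to soup2.p.find_all_next("p"), you will get all the remaining <p> tags on the page. You can then iterate over them with something like the following (or assign them explicitly if you prefer):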

soup2 = BeautifulSoup(response2.content, "html.parser")
first_sentence = soup2.p.get_text()
print(first_sentence)
for sentence in soup2.p.find_all_next("p"):
    print(sentence.get_text())

It is simpler still if you just drop the extra variable and use findAll():

for sentence in soup2.findAll("p"):
    print(sentence.get_text())
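
The difference between the two calls can be checked on a small inline fragment. This is only a sketch: the sample HTML is an assumption, and it uses find_all(), of which findAll() is the older camelCase alias.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment (an assumption) with three paragraphs and one span.
SAMPLE_HTML = "<p>One.</p><span>skip</span><p>Two.</p><p>Three.</p>"
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find_all_next("p") starts after the first <p>, so that paragraph is excluded.
following = [p.get_text() for p in soup.p.find_all_next("p")]

# find_all("p") on the whole document returns every <p>, including the first.
everything = [p.get_text() for p in soup.find_all("p")]

print(following)
print(everything)
```

Searching from the document root avoids printing the first paragraph separately, which is why the extra variable is no longer needed.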
