Web scraping with BeautifulSoup

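import requests
from bs4 import BeautifulSoup

html = requests.get("http://thehill.com/…/365407-sean-diddy-combs-wants-to-buy-c…").content
news_soup = BeautifulSoup(html, "html.parser")
a_text = news_soup.find_all('p')
# This line fails: find_all() returns a list-like ResultSet, which has no .string attribute
y = a_text[1].find_all('a').string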

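1 Answer

User

#1 · Posted 2024-09-26 22:51:45

You can use a nested list comprehension to collect the links inside each paragraph tag, and call encode("ascii", 'ignore') to drop any non-ASCII characters from the text: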

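from urllib.request import urlopen  # Python 3; the original answer used Python 2's urllib.urlopen
from bs4 import BeautifulSoup as soup

url = 'http://thehill.com/blogs/blog-briefing-room/365407-sean-diddy-combs-wants-to-buy-carolina-panthers-and-sign-kaepernick'
s = soup(urlopen(url).read(), 'lxml')  # the 'lxml' parser must be installed separately
# Text of every <p>, with non-ASCII characters dropped
all_text = [i.text.encode("ascii", 'ignore').decode("ascii") for i in s.find_all('p')]
# Link texts grouped by paragraph; paragraphs without links are filtered out
all_paragraphs = list(filter(None, [[b.text.encode("ascii", 'ignore').decode("ascii") for b in i.find_all('a')] for i in s.find_all('p')]))
print(all_text)
print(all_paragraphs)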

Output:

['Hip hop mogul Sean Diddy Combs said Sunday night hes interested in buying the Carolina Panthers and signing quarterback Colin Kaepernick, who has been unemployed this season after kneeling during the national anthem in 2016.', 'Panthers owner Jerry Richardson announced Sunday he would be selling the team after the 2017 season, just hours after Sports Illustrated published accusations of sexual misconduct from former employees. Richardson also allegedly used a racial slur about a team scout.', 'Diddy took to Twitter soon after the Panthers announced the upcoming sale, declaring his desire to own a team and increase diversity among NFL ownership.', 'I would like to buy the @Panthers. Spread the word. Retweet!', 'There are no majority African American NFL owners. Lets make history.', '', 'Kaepernick respondedSundaymorning, saying I want in on the ownership group!', 'I want in on the ownership group! Lets make it happen!', 'Other athletes, including NBA starStephen Curryandformer NFL playerGreg Jennings,responded to Combs saying they were interested in part-owning the team.', "Former league MVP Cam Newton is the team's current quarterback.", 'Kaepernick has been a free agent since the end of the 2016 season, when he made headlinesfor kneeling during the national anthem before games to protest issues of racial inequality.', 'President TrumpDonald John TrumpHouse Democrat slams Donald Trump Jr. for serious case of amnesia after testimony Skier Lindsey Vonn: I dont want to represent Trump at Olympics Poll: 4 in 10 Republicans think senior Trump advisers had improper dealings with Russia MORE hascriticized Kaepernick directly, saying the NFL should have suspended him for the demonstration. He has since taken aim at other players who have knelt or sat during the anthem during the 2017 season.', '- This story was updated at 11:03 A.M. EST.', 'View the discussion thread.', 'The Hill 1625 K Street, NW Suite 900 Washington DC 20006 | 202-628-8500 tel | 202-628-8503 fax', 'The contents of this site are 2017 Capitol Hill Publishing Corp., a subsidiary of News Communications, Inc.']
[['Sports Illustrated'], ['@Panthers'], ['Stephen Curry', 'former NFL player'], ['President Trump', 'Donald John Trump', 'House Democrat slams Donald Trump Jr. for serious case of amnesia after testimony', 'Skier Lindsey Vonn: I dont want to represent Trump at Olympics', 'Poll: 4 in 10 Republicans think senior Trump advisers had improper dealings with Russia', 'MORE', 'criticized Kaepernick directly', 'knelt or sat'], ['View the discussion thread.']]
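
For anyone who wants to try the nested list comprehension without fetching the live page, here is a minimal offline sketch. The SAMPLE_HTML string and the link_texts_by_paragraph helper are made up for illustration and are not part of the original answer. It uses the built-in "html.parser" so no extra parser needs to be installed. It also reads the text from each <a> element individually, because find_all() returns a list-like ResultSet rather than a single tag, which appears to be why calling .string on its result fails in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical inline HTML standing in for the downloaded article page.
SAMPLE_HTML = """
<html><body>
<p>No links in this paragraph.</p>
<p>Reported by <a href="/si">Sports Illustrated</a> on Sunday.</p>
<p><a href="/curry">Stephen Curry</a> and <a href="/jennings">Greg Jennings</a> responded.</p>
</body></html>
"""


def link_texts_by_paragraph(html):
    """Return one list of link texts per <p> that contains at least one <a>."""
    soup = BeautifulSoup(html, "html.parser")
    # Outer loop: every paragraph; inner loop: every link inside that paragraph.
    groups = [[a.get_text() for a in p.find_all("a")] for p in soup.find_all("p")]
    # Drop the empty lists produced by paragraphs that contain no links.
    return [group for group in groups if group]


paragraph_links = link_texts_by_paragraph(SAMPLE_HTML)
print(paragraph_links)
# [['Sports Illustrated'], ['Stephen Curry', 'Greg Jennings']]
```

To get the links of a single paragraph, as the question attempts with a_text[1], the same inner comprehension can be applied to just that element: [a.get_text() for a in a_text[1].find_all('a')].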
