Taking out only the paragraph value in plain form in Python

Posted 2024-09-28 17:03:35


Can anyone help me get the data inside a <p> tag as plain text in Python? I explain the problem below. The data looks like this:

Zale\u2019s largest shareholder, TIG, is highlighting Bank of America\u2019s conflict of interest in a sale to Signet Jewelers. That and other factors may lead shareholders to vote down the deal, Steven M. Davidoff writes in the Deal Professor. Read more…</a></p>\n</div>\n</article>"}

I want output like this:

Zale largest shareholder, TIG, is highlighting Bank of America conflict of interest in a sale to Signet Jewelers. That and other factors may lead shareholders to vote down the deal, Steven M. Davidoff writes in the Deal Professor.

This is my code:

import urllib2
import re
response = urllib2.urlopen('http:')
print "Response:", response
regex = '<div class=\"entry-content\">(.*?)</div>'
pattern =re.compile(regex)
# Get all data
html = response.read()
splitsource = re.findall(pattern,html)
print "this is the",splitsource

But I get an empty list:

splitsource = []

Please help.


1 answer
Answer 1 · posted 2024-09-28 17:03:35

This gets the text from the paragraphs in the HTML:

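import requests
from bs4 import BeautifulSoup

url = "http://stackoverflow.com/questions/23715844/taking-out-only-paragraph-value-in-plain-form-in-python"
r = requests.get(url)
# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(r.text, "html.parser")
# Print the text of every <p> element, with the markup stripped
for t in soup.find_all("p"):
    print(t.get_text())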

Take a look at the BeautifulSoup docs.
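
The sample in the question ends with "} and contains \u2019 and \n escapes, so the response may be JSON with the HTML embedded as a string. If that is the case, a regex looking for an unescaped class="entry-content" would probably not match the raw text. Below is a minimal Python 3 sketch under that assumption, using only the standard library instead of BeautifulSoup. The payload is a shortened stand-in for the real response, and the "html" key and the ParagraphText class are hypothetical names chosen for illustration.

```python
import json
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collect the text inside <p> elements, skipping link text."""

    def __init__(self):
        super().__init__()
        self.p_depth = 0        # > 0 while inside a <p>
        self.a_depth = 0        # > 0 while inside an <a>
        self.paragraphs = []    # finished paragraph strings
        self._chunks = []       # text pieces of the current paragraph

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.p_depth += 1
        elif tag == "a":
            self.a_depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.a_depth:
            self.a_depth -= 1
        elif tag == "p" and self.p_depth:
            self.p_depth -= 1
            if self.p_depth == 0:
                self.paragraphs.append("".join(self._chunks).strip())
                self._chunks = []

    def handle_data(self, data):
        # Keep text only when inside a paragraph and outside a link
        if self.p_depth and not self.a_depth:
            self._chunks.append(data)


# Hypothetical, shortened payload modelled on the sample in the question.
# The "html" key is an assumption; the real response may use another name.
payload = (
    r'{"html": "<article>\n<div class=\"entry-content\">\n'
    r'<p>Zale\u2019s largest shareholder, TIG, is highlighting a conflict '
    r'of interest. <a href=\"#\">Read more\u2026</a></p>\n</div>\n</article>"}'
)

html = json.loads(payload)["html"]  # decodes the \u2019, \n and \" escapes
parser = ParagraphText()
parser.feed(html)
parser.close()
paragraphs = parser.paragraphs
print(paragraphs[0])
```

Skipping the text inside <a> drops the trailing "Read more…" link, which the desired output in the question also omits. If the response turns out to be plain HTML rather than JSON, the json.loads step is unnecessary and the text can be fed to the parser directly.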
