Remove a backslash

Posted 2024-06-26 12:57:02


I have a small regular-expression problem with text extracted by Goose.

I used Goose to extract clean text from an HTML page. The output Goose gives is good, but there is one small problem: I get the string below.

    My name is Sam\'s, I like to play \'football\'

The actual text looks like this:

    My name is Sam's, I like to play 'football'
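One thing worth ruling out first: a displayed `\'` does not by itself prove the string contains a backslash. Python's `repr()` (which is what the interactive prompt echoes) escapes single quotes with a backslash when the string also contains a double quote. A minimal check in Python 3, using a hypothetical sample string rather than the actual Goose output:

```python
# Hypothetical sample that contains both quote characters and no backslash.
sample = 'He said "hi" to Sam\'s friend'

shown = repr(sample)
print(shown)          # repr uses single quotes and escapes the inner ' as \'
print(sample)         # the plain text itself has no backslash
print("\\" in sample) # False: no U+005C character is stored in the string
```

If `"\\" in text` is `False` for the extracted text, there is nothing to remove, and the backslashes are only part of how the string is being displayed.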

I am trying to get rid of the backslashes. When I run the code below on the text extracted by Goose, it somehow doesn't work; however, if I type the text in myself, the same code works perfectly.

I tried each of the following:

    re.sub(r"\\", "", text)
    text.replace("\\", "")
    text.decode()
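For reference, here is a minimal Python 3 check, using a hypothetical sample string in place of the Goose output. It shows that the first two calls above do strip a literal backslash (U+005C) when the string really contains one. A plain `decode()` only converts bytes to text and removes no characters, so it is left out here.

```python
import re

# Hypothetical sample that really contains U+005C REVERSE SOLIDUS before each quote.
sample = "My name is Sam\\'s, I like to play \\'football\\'"

via_regex = re.sub(r"\\", "", sample)   # raw pattern \\ matches one literal backslash
via_replace = sample.replace("\\", "")  # "\\" in a normal literal is a single backslash

print(via_regex)
print(via_replace)
```

If the same calls leave the extracted text unchanged, then the visible character is probably not U+005C, or the backslash exists only in a displayed representation and not in the string itself.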

Here is my code:

    from goose import Goose
    import re

    url = 'http://economictimes.indiatimes.com/news/politics-and-nation/swach-bharat-drives-draws-inspiration-from-mahatma-gandhi/articleshow/49203355.cms'
    g = Goose()
    article = g.extract(url=url)
    text = article.cleaned_text

    print text
    # output: .....International School here on Friday, Gandhi\'s 146th birth anniversary.Gurjit Singh said that apart from Gandhi\'s birth anniversary,....

    text = re.sub(r"\\", "", text)
    print text
    # output (unchanged): .....International School here on Friday, Gandhi\'s 146th birth anniversary.Gurjit Singh said that apart from Gandhi\'s birth anniversary,....

How can I remove the backslashes?
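One way to diagnose this is to check whether the "backslash" in the extracted text is actually a different Unicode character that only looks like one. `re.sub(r"\\", ...)` matches U+005C and nothing else. The sketch below is written for Python 3 (`ascii()` escapes non-ASCII characters as `\uXXXX`; in Python 2, `repr()` of a unicode string does the same). The set of lookalike characters is an assumption and is not exhaustive. The helper names `describe_suspects` and `strip_backslash_like` and the sample string are hypothetical.

```python
import unicodedata

# Code points that render like a backslash. This set is an assumption
# covering a few common lookalikes; it is not exhaustive.
BACKSLASH_LIKE = {
    "\\",      # U+005C REVERSE SOLIDUS (the ordinary backslash)
    "\uff3c",  # U+FF3C FULLWIDTH REVERSE SOLIDUS
    "\u2216",  # U+2216 SET MINUS
    "\u29f5",  # U+29F5 REVERSE SOLIDUS OPERATOR
    "\ufe68",  # U+FE68 SMALL REVERSE SOLIDUS
}


def describe_suspects(text):
    """Return (index, code point, Unicode name) for each backslash-like character."""
    return [
        (i, "U+%04X" % ord(ch), unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(text)
        if ch in BACKSLASH_LIKE
    ]


def strip_backslash_like(text):
    """Remove every character listed in BACKSLASH_LIKE from text."""
    return "".join(ch for ch in text if ch not in BACKSLASH_LIKE)


# Hypothetical input: a fullwidth reverse solidus, which r"\\" would not match.
sample = "Gandhi\uff3c's 146th birth anniversary"
print(ascii(sample))              # non-ASCII lookalikes appear as \uXXXX escapes
print(describe_suspects(sample))  # [(6, 'U+FF3C', 'FULLWIDTH REVERSE SOLIDUS')]
print(strip_backslash_like(sample))
```

Running `describe_suspects` on `article.cleaned_text` would show which code point is really there. If the list comes back empty, the text contains no backslash-like character at all.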

