An efficient way to extract data between double quotes

atag = '<a href="Networking-denial-of-service.aspx">Next Page →</a>'
start = 0
end = 0
for i in range(len(atag)):
    if atag[i] == '"' and start==0:
        start = i
    elif atag[i] == '"' and end==0:
        end = i
nxtlink = atag[start+1:end]

2 Answers

User

Answer 1 · edited 2024-09-29 23:26:44

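My problem was exactly the one described above: how to get the data between two double quotes. I agree with the comments that an HTMLParser may be the better tool.

A regular expression can help, especially if you expect to find more than one quoted string. For example, here is one possible piece of code: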
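For what it is worth, the HTMLParser route could look roughly like the sketch below. It assumes Python 3's standard `html.parser` module; the class name `HrefCollector` is my own invention for illustration, not something from the question.

```python
from html.parser import HTMLParser  # Python 3 standard library


class HrefCollector(HTMLParser):
    """Collect the href of every <a> start tag (illustrative class name)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; the quotes are already stripped
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


atag = '<a href="Networking-denial-of-service.aspx">Next Page →</a>'
collector = HrefCollector()
collector.feed(atag)
collector.close()
print(collector.hrefs)
```

The parser only reports attribute values, so quotes that appear in the link text or elsewhere in the markup are never mistaken for the href.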

import re
string_with_quotes = 'Some "text" "with inverted commas"\n "some text \n with a line break"'

Find_double_quotes = re.compile('"([^"]*)"', re.DOTALL|re.MULTILINE|re.IGNORECASE) # Ignore case not needed here, but can be useful.

list_of_quotes = Find_double_quotes.findall(string_with_quotes)

list_of_quotes

['text', 'with inverted commas', 'some text \n with a line break']

If there is an odd number of double quotes, the last one is ignored. If no quoted text is found, the result is an empty list.
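As a quick check, the same pattern can be run against the anchor tag from the question. This is only a minimal sketch: the helper name `quoted_parts` is made up for illustration, and the flags are dropped because the pattern contains no `.`, `^`, `$` or letters for them to affect.

```python
import re

# Same pattern as above: a double quote, any run of non-quote characters, a double quote.
QUOTED = re.compile(r'"([^"]*)"')


def quoted_parts(text):
    """Return the contents of every pair of double quotes (hypothetical helper)."""
    return QUOTED.findall(text)


atag = '<a href="Networking-denial-of-service.aspx">Next Page →</a>'
print(quoted_parts(atag))         # ['Networking-denial-of-service.aspx']
print(quoted_parts('a "b" "c'))   # ['b'], the unmatched third quote is ignored
print(quoted_parts('no quotes'))  # []
```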

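Various references

http://www.regular-expressions.info/ is very useful for learning regular expressions.

"Regex - Does not contain certain Characters" showed me how to write "not this character".

https://docs.python.org/2/library/re.html#re.MULTILINE explains what re.MULTILINE and re.DOTALL (just below it) do.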

User

Answer 2 · edited 2024-09-29 23:26:44

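You tagged this beautifulsoup, so I don't see why you need a regex. If you want the href from all the anchors, you can use the CSS selector 'a[href]', which only matches anchor tags that have an href attribute: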

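from bs4 import BeautifulSoup

h = '''<a href="Networking-denial-of-service.aspx">Next Page →</a>'''

# Name the parser explicitly so the result does not depend on what is installed.
soup = BeautifulSoup(h, "html.parser")

print(soup.select_one('a[href]')["href"])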

Or with find:

print(soup.find("a", href=True)["href"])

If you have more than one:

for a in soup.select('a[href]'):
    print(a["href"])

Or:

for a in soup.find_all("a", href=True):
    print(a["href"])

You can also specify that you want the href to have a leading ":

 soup.select_one('a[href^="]')
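Note that `^=` is the CSS "attribute starts with" operator. It compares against the parsed attribute value, which no longer includes the surrounding quotes, and as far as I can tell it needs a complete quoted value after it. A runnable sketch with a concrete prefix might look like this; the second anchor and the prefix `Networking` are made up for illustration, and `html.parser` is chosen only because it ships with Python.

```python
from bs4 import BeautifulSoup

# The first anchor is from the question; the second is an invented extra for contrast.
h = '''<a href="Networking-denial-of-service.aspx">Next Page →</a>
<a href="index.aspx">Home</a>'''

soup = BeautifulSoup(h, "html.parser")

# [href^="Networking"] keeps only anchors whose href value starts with that prefix.
first = soup.select_one('a[href^="Networking"]')
print(first["href"])

# select() returns every match instead of just the first one.
all_hrefs = [a["href"] for a in soup.select("a[href]")]
print(all_hrefs)
```

The same operator works with `select()` when you want every anchor with a given prefix rather than only the first.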
