How do I parse JavaScript code embedded in HTML source with Python?

import requests
import bs4
import re

url = 'https://www.khanacademy.org/computing/computer-programming/programming/drawing-basics/pt/making-drawings-with-code'
response = requests.get(url)
# by the way, I am not sure if this is the right way to parse the link
soup = bs4.BeautifulSoup(response.text, 'html.parser')

# with this line I can get directly to the exact javascript tag that I need
item = soup.find(string=re.compile('contentId'))

# but as you can see, it's a pretty big string, and I need to parse it to get the desired data.
# But you can find that the desired data "xe7fd4c285496ab91" is in it.
print(item)

1 answer

User

#1 · Posted 2024-09-30 01:27:53

You can run a regex directly on the response text, without having to locate the script tag first.

import re
import requests

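r = requests.get('https://www.khanacademy.org/computing/computer-programming/programming/drawing-basics/pt/making-drawings-with-code')
# Capture everything after contentId":" up to, but not including, the next double quote.
p = re.compile(r'contentId":"([^"]*)')
matches = p.findall(r.text)
# findall returns an empty list when the key is absent, so guard before indexing.
i = matches[0] if matches else None
print(i)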
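
To try the regex approach without hitting the network, here is a minimal sketch that runs the same kind of pattern against an inline HTML string. The helper name extract_content_id and the SAMPLE_HTML markup are hypothetical, made up for illustration; the real page's script content may be laid out differently, so treat the pattern as a starting point rather than a guaranteed match.

```python
import re
from typing import Optional

# Matches "contentId":"<value>" (optional whitespace around the colon)
# and captures the value up to the next double quote.
CONTENT_ID_RE = re.compile(r'"contentId"\s*:\s*"([^"]*)"')


def extract_content_id(html: str) -> Optional[str]:
    """Return the first contentId value found in html, or None if absent."""
    match = CONTENT_ID_RE.search(html)
    return match.group(1) if match else None


# Hypothetical sample markup; the live page may differ.
SAMPLE_HTML = (
    '<html><body><script>'
    'var data = {"contentId":"xe7fd4c285496ab91","title":"example"};'
    '</script></body></html>'
)

print(extract_content_id(SAMPLE_HTML))
print(extract_content_id("<html></html>"))
```

Returning None instead of indexing into the result list means a page that lacks the key does not raise an IndexError. With a live request, the same helper could be called on the response text.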
