Extracting data from a <script> tag with BeautifulSoup

import json
import urllib.request

from bs4 import BeautifulSoup

url = "http://www.example.com"  # urlopen() needs a scheme such as http://
html = urllib.request.urlopen(url)
soup = BeautifulSoup(html, "html.parser")
# get the first script tag and convert it to a string
data = str(soup.find("script"))
# cut the <script> tag and some other things from the beginning and end to get valid JSON
cut = data[27:-13]
# load the data as a JSON dictionary
jsoned = json.loads(cut)

2 answers

User

#1 · edited 2024-09-30 05:23:50

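>>> import re
>>> # a regex passed as the first argument to find_all() is matched against tag names,
>>> # so search the text of the script tag instead
>>> re.search(r"\[(.*?)\]", soup.find("script").string, re.DOTALL)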

You can use a regex.

You need to write a regex pattern that matches only the text between [ and ].

Here is a link on common regex usage with BeautifulSoup.

Here is the regex to extract the text between square brackets.
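The regex idea above can be sketched roughly as follows. This is only an illustration under assumptions: `script_text` is a made-up script body standing in for whatever `soup.find("script").string` returns on the real page, and `extract_bracketed_json` is a hypothetical helper name. It uses a greedy pattern rather than the non-greedy one, since a non-greedy match would stop at the first `]` and cut off nested arrays.

```python
import json
import re

# Hypothetical script body, standing in for soup.find("script").string
script_text = 'dataLayer = [{"page": "home", "tags": ["a", "b"]}];'


def extract_bracketed_json(text):
    """Parse the JSON array spanning the first "[" to the last "]" in text."""
    # Greedy match so nested arrays stay intact; re.DOTALL lets "."
    # span line breaks in multi-line scripts.
    match = re.search(r"\[.*\]", text, re.DOTALL)
    if match is None:
        return None
    return json.loads(match.group(0))


data = extract_bracketed_json(script_text)
print(data[0]["page"])  # home
```

This only works if the brackets enclose valid JSON and no other `[` or `]` appears elsewhere in the script, so it is worth checking against the actual page.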

User

#2 · edited 2024-09-30 05:23:50

Use .text to get the content inside the <script> tag, then remove the "dataLayer =" prefix.

raw_data = soup.find("script")
raw_data = raw_data.text.replace('dataLayer =', '')
json_dict = json.loads(raw_data)
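A slightly more defensive sketch of the same approach, assuming the script body has the form `dataLayer = [...]` and may end with a semicolon. `SAMPLE_HTML` and `load_data_layer` are made up for illustration, and `.string` is used here in place of `.text`.

```python
import json

from bs4 import BeautifulSoup

# Hypothetical page content; the real page's script body may differ.
SAMPLE_HTML = (
    "<html><head>"
    '<script>dataLayer = [{"page": "home"}];</script>'
    "</head></html>"
)


def load_data_layer(html, prefix="dataLayer ="):
    """Parse the JSON assigned to dataLayer in the first <script> tag."""
    soup = BeautifulSoup(html, "html.parser")
    script = soup.find("script")
    if script is None or script.string is None:
        raise ValueError("no <script> content found")
    body = script.string.strip()
    # Drop the JavaScript assignment so only the JSON literal remains.
    if body.startswith(prefix):
        body = body[len(prefix):]
    # A trailing semicolon is valid JavaScript but not valid JSON.
    body = body.strip().rstrip(";")
    return json.loads(body)


print(load_data_layer(SAMPLE_HTML))
```

If the page has several script tags, `soup.find("script")` returns only the first one, so the right tag may need to be selected some other way.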
