Beautiful Soup cannot retrieve data from the site

import urllib
from urllib.request import urlopen
from bs4 import BeautifulSoup
import re
import json
import time

outfile = open('/Users/Luca/Desktop/test/farm_data.text','w')

my_list = list()
site = "https://farm.ewg.org/addrsearch.php?stab2=NY&fullname=A&b=1&page=0"
my_list.append(site)
site = "https://farm.ewg.org/addrsearch.php?stab2=NY&fullname=B&b=1&page=0"
my_list.append(site)
site = "https://farm.ewg.org/addrsearch.php?stab2=NY&fullname=C&b=1&page=0"
my_list.append(site)

for item in my_list:
    time.sleep( 5 )
    html = urlopen(item)
    bsObj = BeautifulSoup(html.read(), "html.parser")
    nameList = bsObj.prettify().split('.')
    count = 0
    for name in nameList:
        print (name[2:])
        outfile.write(name[2:] + ',' + item + '\n')

1 Answer

User

#1 · Posted on 2024-09-29 22:18:59

The site in question probably does not allow web scraping, which is why you get:

HTTPError: HTTP Error 403: Forbidden

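You can spoof your user agent so that the request appears to come from a browser. Here is an example using the wonderful requests module, where you pass a User-Agent header when making the request: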

import requests
from bs4 import BeautifulSoup

url = "https://farm.ewg.org/addrsearch.php?stab2=NY&fullname=A&b=1&page=0"
# Send a browser-like User-Agent header with the request
html = requests.get(url, headers={'User-Agent' : 'Mozilla/5.0'}).text
bsObj = BeautifulSoup(html, "html.parser")
print(bsObj)

Output:

<!DOCTYPE doctype html>    
<html class="no-js" lang="en" prefix="og: http://ogp.me/ns#" xmlns="http://www.w3.org/1999/xhtml" xmlns:fb="http://ogp.me/ns/fb#">
<head>
<meta charset="utf-8"/>
.
.
.

You can now work this code into your loop.
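As a rough sketch of what that loop might look like, here is a variant that keeps `urlopen` from the question and attaches the header through the standard library's `urllib.request.Request`, rather than using requests. The names `BASE`, `HEADERS`, `build_requests`, and `fetch_all` are made up for illustration, and whether the site accepts this header from `urllib` has not been verified here.

```python
import time
from urllib.request import Request, urlopen

# URL pattern taken from the question; only the fullname letter changes.
BASE = "https://farm.ewg.org/addrsearch.php?stab2=NY&fullname={letter}&b=1&page=0"
# Same User-Agent value as in the answer above.
HEADERS = {"User-Agent": "Mozilla/5.0"}


def build_requests(letters):
    """Return one Request per letter, each carrying the User-Agent header."""
    return [Request(BASE.format(letter=letter), headers=HEADERS) for letter in letters]


def fetch_all(reqs, delay=5.0):
    """Yield (url, html) pairs, pausing before each request like the question's code."""
    for req in reqs:
        time.sleep(delay)
        with urlopen(req) as response:
            yield req.full_url, response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Performs real network requests when run as a script.
    for url, html in fetch_all(build_requests("ABC")):
        print(url, len(html))
```

Each `html` string can then be passed to `BeautifulSoup(html, "html.parser")` as in the snippet above. The same loop shape should work with `requests.get(url, headers=HEADERS).text` in place of `urlopen` if you prefer requests.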
