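Here is my code. I am using Python to fetch information, and I use a proxy, headers, and a session to imitate a browser, but I keep getting a 501 response.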
# -*- coding: utf-8 -*-
import requests
from pyquery import PyQuery as pq
from goose import Goose
from goose.text import StopWordsChinese
import json
import time


class ItSlaw(object):
    def __init__(self):
        self.url = 'XXXX'
        self.headers = {'XXXX'}
        self.result = None
        self.keyword = None
        self.session = requests.Session()

    def reset(self, keyword):
        self.keyword = keyword
        self.result = None

    def fetch(self):
        # Substitute the keyword's value, not the literal string 'self.keyword'.
        url = self.url.format(keyword=self.keyword, keywordcopy=self.keyword)
        res = []
        time.sleep(3)
        proxies = {"http": "14.111.148.1"}
        r = self.session.get(url, proxies=proxies)
        print(r.status_code)
        # The 'url' string literal is kept exactly as posted.
        completed_url = 'http://www.itslaw.com/' + 'url'
        g = Goose({'stopwords_class': StopWordsChinese})
        article = g.extract(url=completed_url)
        content = article.cleaned_text
        # Collect the extracted article text.
        res.append(content)
        self.result = res
        return self.result

    def get_result(self):
        return self.result
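For reference, requests expects headers as a dict of name/value pairs and proxy entries as full URLs with a scheme and port. The following is only a minimal sketch of that setup, not a confirmed fix for the 501: the helper name `build_session`, the User-Agent string, and the proxy address in the usage line are all hypothetical and are not taken from the post.

```python
import requests


def build_session(user_agent, proxy_url):
    """Return a requests.Session with default headers and proxies set.

    Both arguments are illustrative assumptions; substitute real values.
    """
    session = requests.Session()
    # Headers set on the session are sent with every request made through it.
    session.headers.update({"User-Agent": user_agent})
    # Proxy values are full URLs: scheme, host, and port.
    session.proxies.update({"http": proxy_url, "https": proxy_url})
    return session


# Hypothetical usage (no request is sent here):
session = build_session("Mozilla/5.0", "http://127.0.0.1:8080")
```

With the defaults stored on the session, `session.get(url)` no longer needs `headers=` or `proxies=` passed on each call.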
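You can use selenium:
Install selenium for Python with pip. On Linux (Ubuntu/Debian) it looks like this:
sudo apt-get install python-pip
sudo pip install selenium
(!) For other operating systems, you will have to search Google for the equivalent installation steps.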
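Once selenium is installed, loading the page in a real browser could look roughly like the sketch below. It assumes Firefox and a matching driver are available on the machine; the helper names `build_search_url` and `fetch_rendered_html` are hypothetical, and the URL template in the usage comment is a placeholder rather than the site's real search URL.

```python
import time


def build_search_url(template, keyword):
    """Fill the {keyword} and {keywordcopy} placeholders of a URL template."""
    return template.format(keyword=keyword, keywordcopy=keyword)


def fetch_rendered_html(url, wait_seconds=3):
    """Open the URL in a browser and return the page source after a pause."""
    # Imported here so the URL helper above works without selenium installed.
    from selenium import webdriver

    driver = webdriver.Firefox()  # assumption: Firefox and its driver are installed
    try:
        driver.get(url)
        # Crude fixed wait so that scripts on the page have time to run.
        time.sleep(wait_seconds)
        return driver.page_source
    finally:
        # Always close the browser, even if loading the page fails.
        driver.quit()


# Hypothetical usage:
# url = build_search_url('http://example.com/search?q={keyword}&c={keywordcopy}', 'contract')
# html = fetch_rendered_html(url)
```

The returned HTML can then be passed to pyquery or goose in the same way as a response body from requests.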