I am trying to process this page:
https://play.google.com/store/movies/details?id=3B6EBBD94D13B4DCMV
I use the following code to read the HTML:
    from BeautifulSoup import BeautifulSoup as BS
    import requests

    def read_html(url):
        try:
            res = requests.get(url)
            if res.status_code == 200:
                html_content = res.content
                soup = BS(html_content)
                return _get_type(soup)
            else:
                print res.status_code
        except ValueError, e:
            print e

    def _get_type(soup):
        """Read Movie."""
        mydivs = soup.findAll("span", {"class": "DBzzzb"})
        if mydivs:
            return 'AVAILABLE'
        mydivs = soup.findAll("span", {"class": "DBzzzb"})
        if mydivs:
            return 'PREORDER'
        mydivs = soup.findAll("div", {"class": "Wc4pU"})
        if mydivs:
            return 'NOT_AVAILABLE'
        return 'INVALID'
This condition of mine never matches: soup.findAll("div", {"class": "Wc4pU"})
even though the HTML really is there:
<div class="Wc4pU">We'll notify you on your wishlist when movies become available</div>
Source HTML:
view-source:https://play.google.com/store/movies/details?id=3B6EBBD94D13B4DCMV
You need to specify a parser:
This also makes the process much faster.
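A minimal sketch of passing an explicit parser, under some assumptions: it uses the bs4 package (BeautifulSoup 4) rather than the older BeautifulSoup import in the question, and the built-in "html.parser", which may not be the parser this answer had in mind ("lxml" is a common alternative if it is installed). The name get_type and the SAMPLE_HTML constant are illustrative. It parses the fragment quoted in the question rather than fetching the live page.

```python
from bs4 import BeautifulSoup

# HTML fragment quoted in the question, used here instead of a live request.
SAMPLE_HTML = (
    '<div class="Wc4pU">We\'ll notify you on your wishlist '
    'when movies become available</div>'
)


def get_type(html, parser="html.parser"):
    """Classify a page using the CSS classes from the question."""
    # The second argument names the parser explicitly instead of
    # leaving the choice to BeautifulSoup.
    soup = BeautifulSoup(html, parser)
    if soup.find_all("span", {"class": "DBzzzb"}):
        return "AVAILABLE"
    if soup.find_all("div", {"class": "Wc4pU"}):
        return "NOT_AVAILABLE"
    return "INVALID"


print(get_type(SAMPLE_HTML))  # prints NOT_AVAILABLE
```

The duplicated "DBzzzb" check for 'PREORDER' is left out of this sketch, since in the question it uses the same class as 'AVAILABLE' and so can never be reached. If the div is still missing when the page is fetched with requests, it is worth checking whether the class actually appears in res.content, since the sketch only covers the parsing step.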