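I'm very new to Python and keen to learn more. The course I'm currently taking has given me a task ...
python app_fetcher.py <app_id>
and then to store the metadata in a folder in the current directory (e.g. ./<app_id>).
I've started on this, but I'm not sure how to actually go about the web-scraping part of the script. Can anyone give me some advice? I don't know which library to use or which functions to call. I've looked online, but everything seems to require installing additional packages. Here is everything I have so far; any help would be much appreciated!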
# Python 3: urllib2 was split into urllib.request and urllib.error
import os
import sys
import urllib.error
import urllib.request


# Function to crawl Google Play Store and obtain data
def web_crawl(app_id):
    try:
        # Obtain the URL for the app
        url = "https://play.google.com/store/apps/details?id=" + app_id
        # Open url for reading (the response body is not parsed yet)
        response = urllib.request.urlopen(url)
        # Get path of py file to store txt file locally
        fpath = os.path.dirname(os.path.realpath(sys.argv[0]))
        # Open file to store app metadata; os.path.join avoids the "\w" escape problem
        with open(os.path.join(fpath, "web_crawl.txt"), "w") as f:
            f.write("Google Play Store Web Crawler \n")
            f.write("Metadata for " + app_id + "\n")
            f.write("*************************************** \n")
            f.write("Icon: " + "\n")
            f.write("Title: " + "\n")
            f.write("Description: " + "\n")
            f.write("Screenshots: " + "\n")
            # Added subtitle
            f.write("Subtitle: " + "\n")
        # No explicit f.close() needed: the with block closes the file
    except urllib.error.HTTPError as e:
        print("HTTP Error: ")
        print(e.code)
    except urllib.error.URLError as e:
        print("URL Error: ")
        print(e.args)


# Call web_crawl function
web_crawl("com.cmplay.tiles2")
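I'd suggest using BeautifulSoup. First, use it to parse the downloaded page into a soup object.
With the soup object you can then use selectors to extract elements from the page.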
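Since the question mentions wanting to avoid extra packages, here is a rough sketch of the same "pick elements out of the page" idea using only the standard library's `html.parser` in place of BeautifulSoup. With BeautifulSoup the lookup would be a selector call such as `soup.select_one('meta[property="og:title"]')`. The sketch assumes the page exposes Open Graph `<meta>` tags (`og:title`, `og:description`, `og:image`), which is an assumption about the markup and may not match the real page. The names `MetaTagParser`, `extract_metadata` and `SAMPLE_HTML` are made up for illustration.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect <meta property="og:..."> values from an HTML document."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        prop = attr_map.get("property") or ""
        content = attr_map.get("content")
        # Keep only Open Graph properties that carry a value.
        if prop.startswith("og:") and content is not None:
            self.metadata[prop[3:]] = content


def extract_metadata(html):
    """Return a dict of Open Graph values found in the raw HTML string."""
    parser = MetaTagParser()
    parser.feed(html)
    return parser.metadata


# Hypothetical page fragment, used here instead of a live download.
SAMPLE_HTML = """
<html><head>
<meta property="og:title" content="Example App">
<meta property="og:description" content="A made-up description.">
<meta property="og:image" content="https://example.com/icon.png">
</head><body></body></html>
"""

if __name__ == "__main__":
    print(extract_metadata(SAMPLE_HTML))
```

In the real script you would pass the decoded response body (for example `response.read().decode("utf-8")`) to `extract_metadata` instead of `SAMPLE_HTML`.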
Read more: https://www.crummy.com/software/BeautifulSoup/bs4/doc/
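For the other half of the task (running `python app_fetcher.py <app_id>` and storing the result under `./<app_id>`), a minimal sketch could look like the following. The file name `metadata.json`, the JSON format, and the helper names `save_metadata` and `main` are my own assumptions; the task description only asks for a folder named after the app id.

```python
import json
import os
import sys


def save_metadata(app_id, metadata, base_dir="."):
    """Write metadata to <base_dir>/<app_id>/metadata.json and return the path."""
    target_dir = os.path.join(base_dir, app_id)
    # exist_ok=True means re-running the script for the same app is not an error.
    os.makedirs(target_dir, exist_ok=True)
    # "metadata.json" is a hypothetical file name chosen for this sketch.
    out_path = os.path.join(target_dir, "metadata.json")
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(metadata, f, indent=2, ensure_ascii=False)
    return out_path


def main(argv):
    # Expect exactly one argument: python app_fetcher.py <app_id>
    if len(argv) != 2:
        print("Usage: python app_fetcher.py <app_id>")
        return
    app_id = argv[1]
    # Placeholder values; a real run would fill these in from the parsed page.
    metadata = {"app_id": app_id, "title": "", "description": ""}
    print("Saved to", save_metadata(app_id, metadata))


if __name__ == "__main__":
    main(sys.argv)
```

Using `os.path.join` and `os.makedirs` keeps the path handling portable across Windows and Unix-like systems, so no hand-written slashes are needed.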