Why can't I scrape this page?

Posted 2024-06-28 20:04:19


I'm trying to scrape a table from a website and then convert it into CSV format. No matter what I do, my code shows nothing. Can you tell me where I went wrong?

URL: http://www.multiclick.co.kr/sub/gamepatch/gamerank.html

Don't worry about the language. On the calendar, pick a date one or two days earlier than today and click the magnifying glass; then you will see a table.

# Load the required modules
import urllib.request
from bs4 import BeautifulSoup
import pandas as pd

# Open up the page
url = "http://www.multiclick.co.kr/sub/gamepatch/gamerank.html"
web_page = urllib.request.Request(
        url,
        data = None, 
        headers={'User-Agent': ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) "
                                "AppleWebKit/537.36 (KHTML, like Gecko) " 
                                "Chrome/35.0.1916.47 Safari/537.36")})
# Send the request and read the response
web_page = urllib.request.urlopen(web_page)

# Parse the page
soup = BeautifulSoup(web_page, "html.parser")
print(soup)

# Get the table
    # Get the columns
    # Get the rows
    # Stack them altogether

# Save it as a csv form
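
To confirm what the download actually contains, a quick diagnostic like the sketch below (reusing the soup object from the code above; the find calls are only a check, not part of the scraper) shows whether a <table> is present in the static HTML at all. If nothing turns up, the table is being injected by JavaScript/AJAX after the page loads, so parsing the main page alone will never find it.

# Diagnostic sketch: does the fetched HTML contain the table at all?
table = soup.find("table")
print("table element found:", table is not None)
print("number of <tr> rows:", len(soup.find_all("tr")))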

1 answer

Answer #1 · Posted 2024-06-28 20:04:19

As @mx0 said, instead of fetching the main page, fetch the AJAX call it makes, for example:

import csv
import requests

link = "http://ws.api.thelog.co.kr/service/info/rank/2018-10-18"

req = requests.get(link)
content = req.json()
with open('ranks.csv', 'w', newline='') as csvfile:
    csv_writer = csv.writer(csvfile, delimiter=',', quotechar='|', quoting=csv.QUOTE_MINIMAL)
    # write column titles
    csv_writer.writerow(['gameRank', 'gameName', 'gameTypeName', 'gameShares', 'publisher', 'gameRankUpDown'])
    # write values
    for row in content["list"]:
        csv_writer.writerow(list(row.values()))
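
Since pandas is already imported in the original attempt, a minimal alternative sketch along these lines (assuming the same JSON endpoint and that each record carries the field names used for the header above) selects the columns explicitly by key, so the CSV does not depend on the order of the dictionary values:

import requests
import pandas as pd

link = "http://ws.api.thelog.co.kr/service/info/rank/2018-10-18"

# Build a DataFrame directly from the list of JSON records
records = requests.get(link).json()["list"]
df = pd.DataFrame(records)

# Select the wanted columns in a fixed order, then write the CSV without the index
columns = ["gameRank", "gameName", "gameTypeName", "gameShares", "publisher", "gameRankUpDown"]
df[columns].to_csv("ranks.csv", index=False)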
