How do I scrape a dynamic web page when bs4 and other Python libraries don't work?

2 answers

User

Answer 1 · edited 2024-10-08 18:22:06

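Yes, this is an interesting problem, and one that trips up a lot of people who scrape data from the web. The issue is that the charts are loaded by JavaScript after the document-ready event, which is worth reading up on if it is unfamiliar. In essence, the charts are rendered only after all the HTML, CSS, and JS have loaded, and the data is bound to data attributes.

I created a code sample that uses a NodeJS Express server to return the data from all the charts as JSON. Essentially, it requests the URL, targets the class that holds each chart, and then looks up the data-* attribute that contains all of the chart's data. That gives you working code to use and fork whenever you run into charts rendered by JavaScript like this.

GitHub repo with the NodeJS and Python solution: https://github.com/joehoeller/dynamic-chart-parser-for-webscraping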
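The lookup this answer describes (find the elements with the chart's class, then read their data-* attributes) can be sketched in Python using only the standard library. This is not the code from the linked repo: the class name `chart`, the attribute name `data-values`, and the sample HTML are all hypothetical placeholders, and the sketch assumes the attribute holds JSON.

```python
import json
from html.parser import HTMLParser


class DataAttrCollector(HTMLParser):
    """Collect data-* attributes from elements carrying a given CSS class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.found = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # An attribute without a value is reported as None, hence the "or".
        classes = (attr_map.get("class") or "").split()
        if self.target_class not in classes:
            return
        data_attrs = {k: v for k, v in attr_map.items() if k.startswith("data-")}
        if data_attrs:
            self.found.append(data_attrs)


def extract_chart_data(html, target_class="chart", attr="data-values"):
    """Return the decoded JSON from `attr` on every element with `target_class`."""
    collector = DataAttrCollector(target_class)
    collector.feed(html)
    return [json.loads(d[attr]) for d in collector.found if attr in d]


# Hypothetical markup; a real page will use its own class and attribute names.
SAMPLE_HTML = """
<div class="chart wide" data-values='[[2019, 15136], [2020, 24987]]'></div>
<div class="legend" data-values='[[0, 0]]'></div>
"""

charts = extract_chart_data(SAMPLE_HTML)
print(charts)
```

This only works on HTML in which the attributes are already present, so for a page that fills them in with JavaScript the HTML would have to come from a rendered browser session rather than a plain HTTP request.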

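User

Answer 2 · edited 2024-10-08 18:22:06

Each of the six charts on the page is populated with data from its own API call, and those calls can be found in the Network tab of the browser's developer tools. You can send requests to those endpoints yourself and parse the responses: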

import urllib.parse
import requests
# Headers copied from the browser's network tab; the cookie and user-agent values are specific to that session.
headers = {'authority': 'www.eafo.eu', 'pragma': 'no-cache', 'cache-control': 'no-cache', 'sec-ch-ua': '"Chromium";v="92", " Not A;Brand";v="99", "Google Chrome";v="92"', 'accept': 'application/json, text/javascript, */*; q=0.01', 'x-requested-with': 'XMLHttpRequest', 'sec-ch-ua-mobile': '?0', 'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36', 'sec-fetch-site': 'same-origin', 'sec-fetch-mode': 'cors', 'sec-fetch-dest': 'empty', 'referer': 'https://www.eafo.eu/alternative-fuels/electricity/charging-infra-stats', 'accept-language': 'en-US,en;q=0.9', 'cookie': 'yearFilter=2020; activeSubMenu=electricity; subMenuActiveItem=charging_infra_stats; fuelFilter=Electricity; _ga=GA1.2.1782486955.1628797896; _gid=GA1.2.47726291.1628797896; _gat_gtag_UA_129775638_1=1'}
# One endpoint per chart. Each URL already carries ?compare=false, so no separate params argument is passed (that would send the parameter twice).
urls = ['https://www.eafo.eu/normal-and-fast-charge-points/-1/-1/-1/false/false/nvt?compare=false', 'https://www.eafo.eu/charging-positions-per-10-evs/-1/-1/-1/false/false/nvt?compare=false', 'https://www.eafo.eu/normal-power-charging-positions/-1/-1/-1/false/false/nvt?compare=false', 'https://www.eafo.eu/fillingstations-electricity-top-5/-1/-1/-1/false/false/nvt?compare=false', 'https://www.eafo.eu/fast-charging/-1/-1/-1/false/false/nvt?compare=false', 'https://www.eafo.eu/top-5-countries-charging-positions-per-10-evs/-1/-1/-1/false/false/nvt?compare=false']
# Pair each decoded JSON response with the first path segment of its URL, which names the chart.
data = [[urllib.parse.urlparse(url).path.split('/')[1], requests.get(url, headers=headers).json()] for url in urls]
# Every row holds two cells under 'c'; keep the 'v' value of each as a [label, value] pair.
result = {a:[[i['c'][0]['v'], i['c'][1]['v']] for i in b['data']['rows']] for a, b in data}

Output:

{'normal-and-fast-charge-points': [[2008, 0], [2009, 0], [2010, 0], [2011, 13], [2012, 257], [2013, 751], [2014, 1474], [2015, 3396], [2016, 5190], [2017, 8723], [2018, 11138], [2019, 15136], [2020, 24987]], 'charging-positions-per-10-evs': [['2008', 0], ['2009', 0], ['2010', '14'], ['2011', '6'], ['2012', '3'], ['2013', '4'], ['2014', '5'], ['2015', '5'], ['2016', '5'], ['2017', '5'], ['2018', '6'], ['2019', '7'], ['2020', '9']], 'normal-power-charging-positions': [['2008', 0], ['2009', 0], ['2010', 400], ['2011', 2379], ['2012', 10250], ['2013', 17093], ['2014', 24917], ['2015', 44786], ['2016', 70012], ['2017', 97287], ['2018', 107446], ['2019', 148880], ['2020', 199250]], 'fillingstations-electricity-top-5': [['Netherlands', 66461], ['France', 45413], ['Germany', 43633], ['Sweden', 13564], ['Italy', 13214]], 'fast-charging': [['2008', 0], ['2009', 0], ['2010', 0], ['2011', 13], ['2012', 257], ['2013', 751], ['2014', 1474], ['2015', 3396], ['2016', 5190], ['2017', 8723], ['2018', 11138], ['2019', 15136], ['2020', 24987]], 'top-5-countries-charging-positions-per-10-evs': [['Latvia', '3.15'], ['Slovakia', '4.34'], ['Croatia', '5.14'], ['Estonia', '5.31'], ['Netherlands', '5.71']]}
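The output above mixes types: years are integers in one chart and strings in the others, and some values arrive as numeric strings such as '14' or '3.15'. If uniform types are wanted, a minimal sketch of a cleanup pass could look like the following. The helper names `to_number` and `normalize` are made up for illustration, and the sample is a small excerpt of the output above rather than a live response.

```python
def to_number(value):
    """Return an int or float when the value looks numeric, else return it unchanged."""
    if isinstance(value, (int, float)):
        return value
    try:
        return int(value)
    except (TypeError, ValueError):
        pass
    try:
        return float(value)
    except (TypeError, ValueError):
        return value  # e.g. a country name


def normalize(result):
    """Apply to_number to every cell of every [label, value] row."""
    return {
        name: [[to_number(cell) for cell in row] for row in rows]
        for name, rows in result.items()
    }


# Excerpt of the output shown above.
sample = {
    'charging-positions-per-10-evs': [['2008', 0], ['2010', '14']],
    'top-5-countries-charging-positions-per-10-evs': [['Latvia', '3.15']],
}

normalized = normalize(sample)
print(normalized)
```

Labels that cannot be parsed as numbers, such as country names, pass through untouched, so the same function can be applied to all six charts.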

As more readable JSON:

t = {' '.join(map(str.capitalize, a.split('-'))):b for a, b in result.items()}
print(json.dumps(t, indent=4))

Output:

{
    "Normal And Fast Charge Points": [
        [
            2008,
            0
        ],
        [
            2009,
            0
        ],
        [
            2010,
            0
        ],
        [
            2011,
            13
        ],
        [
            2012,
            257
        ],
        [
            2013,
            751
        ],
        [
            2014,
            1474
        ],
        [
            2015,
            3396
        ],
        [
            2016,
            5190
        ],
        [
            2017,
            8723
        ],
        [
            2018,
            11138
        ],
        [
            2019,
            15136
        ],
        [
            2020,
            24987
        ]
    ],
    "Charging Positions Per 10 Evs": [
        [
            "2008",
            0
        ],
        [
            "2009",
            0
        ],
        [
            "2010",
            "14"
        ],
        [
            "2011",
            "6"
        ],
        [
            "2012",
            "3"
        ],
        [
            "2013",
            "4"
        ],
        [
            "2014",
            "5"
        ],
        [
            "2015",
            "5"
        ],
        [
            "2016",
            "5"
        ],
        [
            "2017",
            "5"
        ],
        [
            "2018",
            "6"
        ],
        [
            "2019",
            "7"
        ],
        [
            "2020",
            "9"
        ]
    ],
    "Normal Power Charging Positions": [
        [
            "2008",
            0
        ],
        [
            "2009",
            0
        ],
        [
            "2010",
            400
        ],
        [
            "2011",
            2379
        ],
        [
            "2012",
            10250
        ],
        [
            "2013",
            17093
        ],
        [
            "2014",
            24917
        ],
        [
            "2015",
            44786
        ],
        [
            "2016",
            70012
        ],
        [
            "2017",
            97287
        ],
        [
            "2018",
            107446
        ],
        [
            "2019",
            148880
        ],
        [
            "2020",
            199250
        ]
    ],
    "Fillingstations Electricity Top 5": [
        [
            "Netherlands",
            66461
        ],
        [
            "France",
            45413
        ],
        [
            "Germany",
            43633
        ],
        [
            "Sweden",
            13564
        ],
        [
            "Italy",
            13214
        ]
    ],
    "Fast Charging": [
        [
            "2008",
            0
        ],
        [
            "2009",
            0
        ],
        [
            "2010",
            0
        ],
        [
            "2011",
            13
        ],
        [
            "2012",
            257
        ],
        [
            "2013",
            751
        ],
        [
            "2014",
            1474
        ],
        [
            "2015",
            3396
        ],
        [
            "2016",
            5190
        ],
        [
            "2017",
            8723
        ],
        [
            "2018",
            11138
        ],
        [
            "2019",
            15136
        ],
        [
            "2020",
            24987
        ]
    ],
    "Top 5 Countries Charging Positions Per 10 Evs": [
        [
            "Latvia",
            "3.15"
        ],
        [
            "Slovakia",
            "4.34"
        ],
        [
            "Croatia",
            "5.14"
        ],
        [
            "Estonia",
            "5.31"
        ],
        [
            "Netherlands",
            "5.71"
        ]
    ]
}
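For reuse, the fetching and parsing steps of this answer can be separated so that the parsing can be checked without network access. The sketch below is one possible arrangement, not the answerer's code: the function names are hypothetical, `urllib.request` from the standard library stands in for `requests` to avoid a third-party dependency, and the sample payload is hand-made to mirror the structure that the original comprehension indexes into (`data`, `rows`, `c`, `v`).

```python
import json
import urllib.parse
import urllib.request


def chart_slug(url):
    """Return the first path segment of the URL, used as the chart's key."""
    return urllib.parse.urlparse(url).path.split('/')[1]


def rows_from_payload(payload):
    """Extract [label, value] pairs from a payload with the assumed shape."""
    return [[row['c'][0]['v'], row['c'][1]['v']] for row in payload['data']['rows']]


def fetch_chart(url, headers=None, timeout=10):
    """Fetch one endpoint and return (slug, rows).

    Not called in this sketch because it needs network access. urlopen raises
    urllib.error.HTTPError on 4xx/5xx responses, so failures are not silent.
    """
    request = urllib.request.Request(url, headers=headers or {})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.loads(response.read().decode('utf-8'))
    return chart_slug(url), rows_from_payload(payload)


# Hand-made payload with the assumed shape; the values are taken from the output above.
sample_payload = {
    'data': {
        'rows': [
            {'c': [{'v': 2019}, {'v': 15136}]},
            {'c': [{'v': 2020}, {'v': 24987}]},
        ]
    }
}
sample_url = 'https://www.eafo.eu/fast-charging/-1/-1/-1/false/false/nvt?compare=false'

slug = chart_slug(sample_url)
rows = rows_from_payload(sample_payload)
print(slug, rows)
```

Keeping the parser independent of the HTTP call means a change in the site's response format shows up as a failure in one small function.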
