Scraping a page with an awkward layout

Posted 2024-09-28 22:30:59


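The page I'm trying to scrape is https://agco.maps.arcgis.com/apps/webappviewer/index.html?id=bef894bc0876448fba26333f1de8d370. From this page I want five pieces of data: the dispensary's name, address, postal code (e.g. N4K 5N6), status, and public notice date.

My problem is that I don't know how to get into this page. I know the data I want is under "_description", but I can't tell what the table element is or which class it has. When I inspect the page, all I get are these awkward boxed sections, which makes me wonder: is this content loaded separately? Or are the table element and its headers packed somewhere far away, which is why I can't find them? Any ideas on how to get into this page would be great.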

from bs4 import BeautifulSoup
import requests
import pandas as pd

url = "https://agco.maps.arcgis.com/apps/webappviewer/index.html?id=bef894bc0876448fba26333f1de8d370"

r = requests.get(url)
soup = BeautifulSoup(r.text, "html.parser")

# Placeholder: the table header lookup would go here, but I don't know
# which tag or class to search for.
cannabis = soup.find("table")


1 Answer
Answer #1 · Posted 2024-09-28 22:30:59
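import requests
import json
import pandas as pd

# The viewer page builds its content with JavaScript, so the HTML returned
# by requests contains no table. The records come from the ArcGIS
# FeatureServer query endpoint, which returns JSON directly.


def main(url):
    params = {
        "f": "json",  # ask for a JSON response
        "returnGeometry": "true",
        "spatialRel": "esriSpatialRelIntersects",
        # Bounding box (Web Mercator) that the returned features must intersect.
        "geometry": json.dumps({
            "xmin": -10018754.17139695,
            "ymin": 5009377.085700974,
            "xmax": -8766409.899972957,
            "ymax": 6261721.3571249675,
            "spatialReference": {
                "wkid": 102100,
                "latestWkid": 3857
            }
        }),
        "geometryType": "esriGeometryEnvelope",
        "inSR": "102100",
        "outFields": "*",  # return every attribute column
        "outSR": "102100",
        "resultType": "tile"
    }
    r = requests.get(url, params=params)
    r.raise_for_status()  # stop early on an HTTP error
    # Each feature keeps its tabular data under the 'attributes' key.
    allin = [x['attributes'] for x in r.json()['features']]
    df = pd.DataFrame(allin)
    print(df)
    df.to_csv('data.csv', index=False)


main('https://services9.arcgis.com/8LLh665FxwX7bxLB/arcgis/rest/services/AGCOCannabisActive/FeatureServer/1/query')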
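
The query above relies on one response holding every record. If the service ever caps the number of features per response, a paged request is one way around it. The following is a minimal sketch, not part of the original answer: it assumes the layer supports the standard ArcGIS REST paging parameters `resultOffset` and `resultRecordCount` and reports `exceededTransferLimit` when more records remain. The helper names `fetch_page` and `fetch_all_attributes` are hypothetical, and it uses the standard library's `urllib` instead of `requests` so that it runs without extra packages.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_page(url, params):
    """Fetch one page of query results and return the decoded JSON dict."""
    with urlopen(f"{url}?{urlencode(params)}") as resp:
        return json.load(resp)


def fetch_all_attributes(url, page_size=1000, fetch=fetch_page):
    """Collect the attributes of every feature, one page at a time.

    Assumption: the layer supports pagination. `fetch` is injectable so the
    loop can be exercised without network access.
    """
    rows = []
    offset = 0
    while True:
        params = {
            "f": "json",
            "where": "1=1",  # match every record
            "outFields": "*",
            "returnGeometry": "false",
            "resultOffset": offset,
            "resultRecordCount": page_size,
        }
        data = fetch(url, params)
        features = data.get("features", [])
        rows.extend(feature["attributes"] for feature in features)
        # Stop when the page is empty or the service reports nothing left.
        if not features or not data.get("exceededTransferLimit"):
            break
        offset += len(features)
    return rows
```

Calling `fetch_all_attributes` with the same query URL passed to `main` should yield a list of dicts that can go straight into `pd.DataFrame(...)`, provided the pagination assumption holds for this layer.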

Output:

     OBJECTID  ...                   PublicNoticeDate
0      147437  ...  From May 13, 2020 to May 27, 2020
1      148101  ...                                  .
2      147508  ...  From Mar 27, 2020 to Apr 10, 2020
3      147176  ...                                  .
4      147840  ...  From Dec 07, 2020 to Dec 21, 2020
..        ...  ...                                ...
780    147940  ...  From Jan 12, 2021 to Jan 26, 2021
781    147691  ...  From Sep 08, 2020 to Sep 22, 2020
782    147791  ...  From Aug 28, 2020 to Sep 11, 2020
783    147250  ...  From May 09, 2020 to May 23, 2020
784    147201  ...  From Mar 07, 2020 to Mar 21, 2020

[785 rows x 13 columns]
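
The `PublicNoticeDate` column in the output holds free text such as "From May 13, 2020 to May 27, 2020", or a lone "." in some rows. If start and end dates are needed separately, a small parser along these lines may help. This is a sketch under two assumptions drawn only from the sample rows shown: every real value follows the "From <date> to <date>" pattern, and "." means no notice period. The function name `parse_notice_period` is hypothetical.

```python
from datetime import datetime

DATE_FORMAT = "%b %d, %Y"  # matches values like "May 13, 2020"


def parse_notice_period(text):
    """Return (start, end) as date objects, or (None, None) if no period is given."""
    text = (text or "").strip()
    # Anything that does not look like "From X to Y" is treated as missing.
    if not text.startswith("From ") or " to " not in text:
        return None, None
    start_text, end_text = text[len("From "):].split(" to ", 1)
    start = datetime.strptime(start_text.strip(), DATE_FORMAT).date()
    end = datetime.strptime(end_text.strip(), DATE_FORMAT).date()
    return start, end
```

With the DataFrame from the answer, something like `df["PublicNoticeDate"].map(parse_notice_period)` would give one `(start, end)` tuple per row. Month abbreviations are parsed according to the current locale, so this expects an English locale.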
