Importing all rows from a dataset with sodapy in Python

Posted 2024-10-01 15:44:17


I am trying to import the following dataset and store it in a DataFrame: https://data.nasa.gov/Space-Science/Meteorite-Landings/gh4g-9sfh/data

I am using the following code:

 import pandas as pd
 import requests

 r = requests.get('https://data.nasa.gov/resource/gh4g-9sfh.json')
 meteor_data = r.json()
 df = pd.DataFrame(meteor_data)
 print(df.shape)

The resulting DataFrame has only 1,000 rows. I need it to contain all 45,716 rows. How can I do that?


2 Answers

See the docs on the $limit parameter:

The $limit parameter controls the total number of rows returned, and it defaults to 1,000 records per request.

Note: The maximum value for $limit is 50,000 records, and if you exceed that limit you'll get a 400 Bad Request response.

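So you are simply getting the default number of records back.

You cannot get more than 50,000 records in a single API call; going beyond that requires multiple calls using $limit and $offset.

Try: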

https://data.nasa.gov/resource/gh4g-9sfh.json?$limit=50000
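For datasets larger than a single call allows, the $limit/$offset paging mentioned above might look roughly like the sketch below. It uses only the standard library. The helper names (`build_page_url`, `fetch_page`, `fetch_all_rows`), the page size of 10,000, and the `$order=:id` clause (added so that pages come back in a stable order) are assumptions for illustration and not part of the original answer.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://data.nasa.gov/resource/gh4g-9sfh.json"


def build_page_url(base_url, limit, offset):
    """Build the URL for one page of results (hypothetical helper)."""
    # urlencode percent-encodes "$" and ":"; the server decodes them again.
    query = urlencode({"$limit": limit, "$offset": offset, "$order": ":id"})
    return f"{base_url}?{query}"


def fetch_page(url):
    """Download one page and parse it into a list of dicts."""
    with urlopen(url) as resp:
        return json.load(resp)


def fetch_all_rows(base_url, page_size=10000, get_page=fetch_page):
    """Request pages until one comes back shorter than page_size."""
    rows = []
    offset = 0
    while True:
        page = get_page(build_page_url(base_url, page_size, offset))
        rows.extend(page)
        if len(page) < page_size:
            # A short (or empty) page means there is nothing left to fetch.
            break
        offset += page_size
    return rows


# Usage (performs real HTTP requests):
#   rows = fetch_all_rows(BASE_URL)
#   df = pd.DataFrame(rows)   # with pandas imported as pd
```

The `get_page` argument exists only so the loop can be exercised without network access; in normal use the default is enough.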

See also: Why am I limited to 1,000 rows on SODA API when I have an App Key

Set the limit like this:

import pandas as pd
from sodapy import Socrata

# Unauthenticated client only works with public data sets. Note 'None'
# in place of application token, and no username or password:
client = Socrata("data.nasa.gov", None)

# Example authenticated client (needed for non-public datasets):
# client = Socrata("data.nasa.gov",
#                  MyAppToken,
#                  username="user@example.com",
#                  password="AFakePassword")

# First 2000 results, returned as JSON from API / converted to Python list of
# dictionaries by sodapy. Raise limit above the row count (45,716 here) to
# get every row in one call.
results = client.get("gh4g-9sfh", limit=2000)

# Convert to pandas DataFrame
results_df = pd.DataFrame.from_records(results)
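If the row count is not known in advance, or exceeds what one call returns, the same client can be called repeatedly with `limit` and `offset`. The following is a minimal sketch under that assumption: `get_all_rows` is a hypothetical helper, and the page size and the `order=":id"` argument (for stable page order) are illustrative choices, not something the answer prescribes.

```python
def get_all_rows(client, dataset_id, page_size=10000):
    """Collect every row by paging through client.get() with limit/offset."""
    rows = []
    offset = 0
    while True:
        page = client.get(dataset_id, limit=page_size, offset=offset,
                          order=":id")
        rows.extend(page)
        if len(page) < page_size:
            # A short (or empty) page means the dataset is exhausted.
            break
        offset += page_size
    return rows


# Usage with the client created above (performs real HTTP requests):
#   results = get_all_rows(client, "gh4g-9sfh")
#   results_df = pd.DataFrame.from_records(results)
```

Newer sodapy releases also appear to ship a `client.get_all(...)` generator that does this paging internally; check whether your installed version provides it before relying on it.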
