How do I handle JSON that returns a list of dict-like objects in Pandas?

Posted 2024-09-27 07:24:15


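I'm using the collegefootballdata.com API to fetch scores and betting lines. I want to use the betting lines to infer an expected win percentage and then compare it with actual results (I feel my team loses too many games in which we are favored, and I want to test that). This code retrieves a single game as an example: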

import requests

parameters = {
    "gameId": 401112435,
    "year": 2019
}
response = requests.get("https://api.collegefootballdata.com/lines", params=parameters)

The JSON output looks like this:

[
{
    "awayConference": "ACC",
    "awayScore": 28,
    "awayTeam": "Virginia Tech",
    "homeConference": "ACC",
    "homeScore": 35,
    "homeTeam": "Boston College",
    "id": 401112435,
    "lines": [
        {
            "formattedSpread": "Virginia Tech -4.5",
            "overUnder": "57.5",
            "provider": "consensus",
            "spread": "4.5"
        },
        {
            "formattedSpread": "Virginia Tech -4.5",
            "overUnder": "57",
            "provider": "Caesars",
            "spread": "4.5"
        },
        {
            "formattedSpread": "Virginia Tech -4.5",
            "overUnder": "58",
            "provider": "numberfire",
            "spread": "4.5"
        },
        {
            "formattedSpread": "Virginia Tech -4.5",
            "overUnder": "56.5",
            "provider": "teamrankings",
            "spread": "4.5"
        }
    ],
    "season": 2019,
    "seasonType": "regular",
    "week": 1
}
]

I then load it into a pandas DataFrame as follows:

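import json
from io import StringIO

import pandas as pd

def jstring(obj):
    # create a formatted string of the Python JSON object
    text = json.dumps(obj, sort_keys=True, indent=4)
    return text

json_str = jstring(response.json())
# Recent pandas versions warn when given a literal JSON string, so wrap it in StringIO
df = pd.read_json(StringIO(json_str))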

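This creates a DataFrame with a "lines" column that contains the entire lines section of the JSON as a string. Ultimately, I want to use the "spread" value from the block where "provider" = "consensus". Everything else is irrelevant to me. I tried exploding the column with: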

df = df.explode('lines')

This gives me 4 rows for each game, each with content like this (as expected):

{'formattedSpread': 'Virginia Tech -4.5', 'overUnder': '57.5', 'provider': 'consensus', 'spread': '4.5'}

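This is where I'm stuck. I want to keep only the rows where "provider" is "consensus", and I also need "spread" as a separate variable/column in my analysis. I've tried a second explode, df.split, and using df.replace to turn the { into [ and exploding it as a list, all to no avail. Any help is appreciated.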


1 answer
User
Answer 1 · posted 2024-09-27 07:24:15

This may be what you're looking for:

Edit: now handles the special case where a game has no lines (these become NaN after the explode).

import pandas as pd
import requests

params = {
    "gameId": 401112435,
    "year": 2019,
}

r = requests.get("https://api.collegefootballdata.com/lines", params=params)

# Build a DataFrame whose "lines" column holds a list of dicts per game
df = pd.DataFrame(r.json())
# Explode the DataFrame so that each line (dict) gets its own row
df = df.explode('lines')
# After the explode, rows from the same game share one index value;
# reset it so the concat below aligns row by row
df = df.reset_index(drop=True)

def fill_na_lines(lines):
    # A game with no lines yields NaN after explode; swap in a dict of Nones
    if pd.isna(lines):
        return {k: None for k in ['provider', 'spread', 'formattedSpread', 'overUnder']}
    return lines

df['lines'] = df['lines'].apply(fill_na_lines)

# A separate DataFrame built from the dicts in the "lines" column
lines_df = pd.DataFrame(df['lines'].tolist())
# Concatenate the two DataFrames side by side (axis=1 adds columns)
df = pd.concat([df, lines_df], axis=1)

# Now you can filter down to whichever rows you need.
df = df[df['provider'] == 'consensus']
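
One detail worth noting: in the sample JSON the "spread" and "overUnder" values arrive as strings ("4.5"), so they probably need converting before any arithmetic. A minimal sketch, assuming a small hand-built frame (`sample` is a hypothetical stand-in for the filtered result, not real API output):

```python
import pandas as pd

# Hypothetical stand-in for the filtered DataFrame; values copied from the sample JSON.
sample = pd.DataFrame(
    {
        "homeTeam": ["Boston College"],
        "awayTeam": ["Virginia Tech"],
        "provider": ["consensus"],
        "spread": ["4.5"],
        "overUnder": ["57.5"],
    }
)

# errors="coerce" turns values that cannot be parsed into NaN instead of raising.
for col in ["spread", "overUnder"]:
    sample[col] = pd.to_numeric(sample[col], errors="coerce")

print(sample.dtypes)
```

Using `errors="coerce"` is a design choice here: it keeps the placeholder rows for games without lines as NaN rather than failing on them.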

The documentation on joining DataFrames in different ways is probably useful.
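
As a possible alternative to explode plus concat, `pd.json_normalize` can flatten the nested "lines" list in one call. This is only a sketch: `data` below is a trimmed, hand-copied version of the sample response from the question (two of the four lines), and the choice of `meta` fields is an assumption about which game-level columns are wanted.

```python
import pandas as pd

# Trimmed copy of the sample response from the question (assumed shape).
data = [
    {
        "id": 401112435,
        "homeTeam": "Boston College",
        "homeScore": 35,
        "awayTeam": "Virginia Tech",
        "awayScore": 28,
        "lines": [
            {"formattedSpread": "Virginia Tech -4.5", "overUnder": "57.5",
             "provider": "consensus", "spread": "4.5"},
            {"formattedSpread": "Virginia Tech -4.5", "overUnder": "57",
             "provider": "Caesars", "spread": "4.5"},
        ],
    }
]

# One row per entry in "lines"; the meta fields are repeated on each row.
flat = pd.json_normalize(
    data,
    record_path="lines",
    meta=["id", "homeTeam", "homeScore", "awayTeam", "awayScore"],
)

# Keep only the consensus line for each game.
consensus = flat[flat["provider"] == "consensus"]
print(consensus[["homeTeam", "awayTeam", "provider", "spread"]])
```

With the live API, `data` would be `r.json()`. Unlike the explode approach, a game whose "lines" list is empty should simply produce no rows here, so it may be less suitable if those games need to stay in the table.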
