Extracting the "url" value from a Pandas Series

df_myposts['image_versions2.candidates'][1][0]['url']
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-64-3f0532195cb7> in <module>
----> 1 df_myposts['image_versions2.candidates'][1][0]['url']

TypeError: 'float' object is not subscriptable
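The traceback suggests that the cell at label 1 is not a list at all but NaN, which pandas stores as a float, so indexing it with [0] fails. Here is a minimal sketch that reproduces this, assuming a hypothetical two-row frame (df_demo) standing in for df_myposts, whose real contents are not shown in the question:

```python
import pandas as pd

# Hypothetical stand-in for df_myposts: one row holds a list of dicts,
# the other is missing (NaN).
df_demo = pd.DataFrame({
    'image_versions2.candidates': [
        [{'width': 750, 'height': 498, 'url': 'https:/XXX'}],
        float('nan'),
    ]
})

col = df_demo['image_versions2.candidates']

# The missing cell is a plain Python float, not a list.
cell_type = type(col[1]).__name__

# Labels of the rows that would trigger the error.
missing_labels = col[col.isna()].index.tolist()

error_message = None
try:
    col[1][0]['url']
except TypeError as exc:
    error_message = str(exc)

print(cell_type, missing_labels, error_message)
```

Checking col.isna() first is a quick way to see which rows need special handling before extracting anything.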

3 Answers

User

Answer 1 · edited 2024-09-30 04:35:35

We can use a list comprehension together with iterrows here to extract the URL value:

df.fillna('None', inplace=True)

df['image_url'] = [
    d['image_versions2.candidates']['url'] if d['image_versions2.candidates'] != 'None' else 'None' for idx, d in df.iterrows()
]

print(df)
                          image_versions2.candidates   image_url
0  {'width': 750, 'height': 498, 'url': 'https:/X...  https:/XXX
1                                               None        None
2  {'width': 750, 'height': 498, 'url': 'https:/Y...  https:/YYY
3  {'width': 750, 'height': 498, 'url': 'https:/Z...  https:/ZZZ
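As a variation on the same idea, here is a minimal sketch that iterates over the column directly instead of using iterrows, and keeps missing values as None rather than the string 'None'. It assumes, like the answer above, that each non-missing cell is a single dict; the sample frame is made up for illustration.

```python
import pandas as pd

# Made-up sample data: each cell is either a dict or missing.
df = pd.DataFrame({
    'image_versions2.candidates': [
        {'width': 750, 'height': 498, 'url': 'https:/XXX'},
        None,
        {'width': 750, 'height': 498, 'url': 'https:/YYY'},
    ]
})

# Only subscript cells that really are dicts; everything else becomes None.
df['image_url'] = [
    cell['url'] if isinstance(cell, dict) else None
    for cell in df['image_versions2.candidates']
]

print(df['image_url'])
```

Testing the type of each cell avoids comparing against a sentinel string and leaves the original column untouched.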

User

Answer 2 · edited 2024-09-30 04:35:35

Using @amanb's setup dataframe

import pandas as pd

df = pd.DataFrame({
    'a':[1,2,3],
    'b':[
        [{'width': 750, 'height': 498, 'url': 'https:/XXX'}],
        [{'width': 750, 'height': 498, 'url': 'https:/YYY'}],
        None
    ]
})

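The .str accessor can be used to pull the single element out of each one-element list. Then use to_dict and pd.DataFrame.from_dict

pd.DataFrame.from_dict(df.b.dropna().str[0].to_dict(), orient='index')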

to get

   width  height         url
0    750     498  https:/XXX
1    750     498  https:/YYY
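The .str[0] / to_dict / from_dict chain used in this answer can be unpacked step by step. The sketch below reuses the setup frame from this answer; the intermediate names (first, as_dict, wide) are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 3],
    'b': [
        [{'width': 750, 'height': 498, 'url': 'https:/XXX'}],
        [{'width': 750, 'height': 498, 'url': 'https:/YYY'}],
        None,
    ],
})

# Step 1: drop the missing row, then take element 0 of each list.
first = df.b.dropna().str[0]

# Step 2: turn the Series into {row_label: inner_dict}.
as_dict = first.to_dict()

# Step 3: build a frame with one row per label and one column per dict key.
wide = pd.DataFrame.from_dict(as_dict, orient='index')

print(wide)
```

Because the row labels survive as dictionary keys, the resulting frame still lines up with df when it is joined or assigned back.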

You can use join to add these columns to df

df.join(pd.DataFrame.from_dict(df.b.dropna().str[0].to_dict(), orient='index'))

   a                                                  b  width  height         url
0  1  [{'width': 750, 'height': 498, 'url': 'https:/...  750.0   498.0  https:/XXX
1  2  [{'width': 750, 'height': 498, 'url': 'https:/...  750.0   498.0  https:/YYY
2  3                                               None    NaN     NaN         NaN

Or you can replace the column

df.assign(b=pd.DataFrame.from_dict(df.b.dropna().str[0].to_dict(), orient='index').url)

   a           b
0  1  https:/XXX
1  2  https:/YYY
2  3         NaN

My actual recommendation

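But what I like best is using pd.io.json.json_normalize in place of the dictionary magic. (Since pandas 1.0 this function is exposed as pd.json_normalize, and the pd.io.json path is deprecated.)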

df.assign(b=pd.io.json.json_normalize(df.b.dropna().str[0]).url)

   a           b
0  1  https:/XXX
1  2  https:/YYY
2  3         NaN
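On pandas 1.0 or later, the same idea can be sketched with the top-level pd.json_normalize. When given a list, json_normalize numbers its rows from 0, so the sketch below copies the original labels back before assigning. To make that visible, it assumes a variant of the setup frame with the missing row in the middle; the names first and flat are illustrative.

```python
import pandas as pd

# Variant of the setup frame: the missing value sits at label 1.
df = pd.DataFrame({
    'a': [1, 2, 3],
    'b': [
        [{'width': 750, 'height': 498, 'url': 'https:/XXX'}],
        None,
        [{'width': 750, 'height': 498, 'url': 'https:/YYY'}],
    ],
})

# First dict of each non-missing list; labels are 0 and 2.
first = df['b'].dropna().str[0]

# Flatten the dicts into columns; rows are numbered 0..n-1 here.
flat = pd.json_normalize(first.tolist())

# Restore the original labels so the assignment aligns row by row.
flat.index = first.index

result = df.assign(b=flat['url'])
print(result)
```

Without the index step, the second URL would be aligned to label 1 instead of label 2.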

User

Answer 3 · edited 2024-09-30 04:35:35

Use:

import pandas as pd

df = pd.DataFrame({'a':[1,2,3], 'b':[[{'width': 750, 'height': 498, 'url': 'https:/XXX'}], [{'width': 750, 'height': 498, 'url': 'https:/YYY'}], None]})
# df.dropna(inplace = True) #drop rows with null values
# to preserve rows with NaN, first replace NaN values with a scalar/dict value
df.fillna('null', inplace=True)
# take the url of the first dict when the cell is a list, otherwise keep the 'null' placeholder
df['c'] = df['b'].apply(lambda x: x[0]['url'] if isinstance(x, list) else 'null')

#Output:
    a                        b                                   c
0   1   [{'width': 750, 'height': 498, 'url': 'https:/...   https:/XXX
1   2   [{'width': 750, 'height': 498, 'url': 'https:/...   https:/YYY
2   3                       null                                null
