pandas tolist() does not work when the column contains null values

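User

Answer 1 · edited 2024-07-02 19:50:13

If the tuples can have different numbers of elements, a more general solution is to write a custom function like the following: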

import pandas as pd

def create_columns_from_tuple(df, tuple_col):
    
    # get max length of tuples
    max_len = df[tuple_col].apply(lambda x: 0 if x is None else len(x)).max()
    
    # select rows with non-empty tuples
    df_full = df.loc[df[tuple_col].notna()]
    
    # create dataframe with exploded tuples
    df_full_exploded = pd.DataFrame(df_full[tuple_col].tolist(),
                                    index=df_full.index, 
                                    columns=[tuple_col + str(n) for n in range(1, max_len+1)])
    
    # merge the two dataframes by index
    result = df.merge(df_full_exploded, left_index=True, right_index=True, how='left')
    
    return result

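Pass the DataFrame and the name of the tuple column to this function. It automatically creates as many new columns as the longest tuple has elements: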

create_columns_from_tuple(df, tuple_col='b')
#      a       b   b1   b2
# 0  NaN    None  NaN  NaN
# 1  1.0  (1, 2)  1.0  2.0
# 2  2.0  (3, 4)  3.0  4.0

If the tuples have different numbers of elements:

df = pd.DataFrame({'a':[None,1, 2], 'b':[None, (1,2,42), (3,4)]}) 
create_columns_from_tuple(df, tuple_col='b')
#      a           b   b1   b2    b3
# 0  NaN        None  NaN  NaN   NaN
# 1  1.0  (1, 2, 42)  1.0  2.0  42.0
# 2  2.0      (3, 4)  3.0  4.0   NaN
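
The reason the function copes with unequal lengths is that the DataFrame constructor pads shorter tuples with missing values. A minimal sketch of just that step, where `rows` is a made-up sample and not data from the question:

```python
import pandas as pd

# Hypothetical sample: tuples of unequal length, with the nulls already removed.
rows = [(1, 2, 42), (3, 4)]

# One column per position of the longest tuple; the shorter tuple
# gets a missing value in the positions it does not have.
expanded = pd.DataFrame(rows, columns=["b1", "b2", "b3"])
print(expanded)
```

Because of the padding, the integer columns that receive a missing value are shown as floats, which matches the 42.0 and NaN in the output above.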

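User

Answer 2 · edited 2024-07-02 19:50:13

You can first drop the NaN values in column b with dropna, then build a new DataFrame from the remaining elements of column b, and assign the resulting DataFrame to columns b1 and b2: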

b = df['b'].dropna()
df[['b1', 'b2']] = pd.DataFrame(b.tolist(), index=b.index)

>>> df

     a       b   b1   b2
0  NaN    None  NaN  NaN
1  1.0  (1, 2)  1.0  2.0
2  2.0  (3, 4)  3.0  4.0
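
For anyone who wants to run this end to end, here is a self-contained sketch of the same idea. It assumes the frame from the question looks like the one printed above (the construction of `df` is my assumption), and it uses `DataFrame.join` instead of column assignment so that the original frame is left unchanged:

```python
import pandas as pd

# Assumed sample frame, reconstructed from the printed output above.
df = pd.DataFrame({"a": [None, 1, 2], "b": [None, (1, 2), (3, 4)]})

# Keep only the rows that actually hold a tuple.
b = df["b"].dropna()

# Expand the tuples, preserving the original index labels.
expanded = pd.DataFrame(b.tolist(), index=b.index, columns=["b1", "b2"])

# Left join on the index: rows without a tuple get NaN in b1 and b2.
out = df.join(expanded)
print(out)
```

Passing `index=b.index` is the important part: it lets pandas align the expanded rows with the original ones, so the row whose `b` was null ends up with NaN rather than being shifted.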

User

Answer 3 · edited 2024-07-02 19:50:13

To my surprise, this solution by piR² also works in your case:

df["x"], df["y"] = df.b.str

Output:

     a       b    x    y
0  NaN    None  NaN  NaN
1  1.0  (1, 2)  1.0  2.0
2  2.0  (3, 4)  3.0  4.0

That said, it emits a FutureWarning, so it is not a long-term solution.
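
One possible way around the warning is to index the `.str` accessor by position instead of unpacking it. This is only a sketch under the assumption that every tuple has at least two elements, and the sample frame is reconstructed from the output above:

```python
import pandas as pd

# Assumed sample frame, reconstructed from the printed output above.
df = pd.DataFrame({"a": [None, 1, 2], "b": [None, (1, 2), (3, 4)]})

# .str[i] picks the i-th element of each tuple; null rows stay null.
df["x"] = df["b"].str[0]
df["y"] = df["b"].str[1]
print(df)
```

This needs one line per position, so it is less convenient than the unpacking form when the tuples are long.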
