How do I convert lists in a column into a vertical (long) shape?

import pandas as pd

df = pd.DataFrame({
    'col1': ['fruit', 'veicle', 'animal'],
    'col2': ['apple', 'bycicle', 'cat'],
    'col3': [1, 4, 2],
    'list': [
        [10, 20],
        [1.2, 3.0, 2.75],
        ['tommy', 'tom']
    ]
})

|col1  |col2   |col3|list   |
|------|-------|----|-------|
|fruit |apple  |   1|10     |
|fruit |apple  |   1|20     |
|veicle|bycicle|   4|1.2    |
|veicle|bycicle|   4|3.0    |
|veicle|bycicle|   4|2.75   |
|animal|cat    |   2|'tommy'|
|animal|cat    |   2|'tom'  |

3 Answers

User

#1 · Edited 2024-09-30 01:19:28

Here's a cool trick I learned from piR the other day, using np.repeat and np.concatenate:

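import numpy as np

# repeat each row position once per element of that row's list
idx = np.arange(len(df)).repeat(df.list.str.len(), 0)
# take the repeated rows (all columns but the last) and attach the flattened lists
out = df.iloc[idx, :-1].assign(list=np.concatenate(df.list.values))
print(out)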

     col1     col2  col3   list
0   fruit    apple     1     10
0   fruit    apple     1     20
1  veicle  bycicle     4    1.2
1  veicle  bycicle     4    3.0
1  veicle  bycicle     4   2.75
2  animal      cat     2  tommy
2  animal      cat     2    tom
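
To make the trick above easier to follow, here is a step-by-step sketch of the same idea on the question's data. The intermediate names (`lengths`, `flat`) are only illustrative, and `itertools.chain` is swapped in for `np.concatenate` so that the mixed values keep their original Python types; that substitution is an assumption of this sketch, not part of the answer.

```python
import itertools

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': ['fruit', 'veicle', 'animal'],
    'col2': ['apple', 'bycicle', 'cat'],
    'col3': [1, 4, 2],
    'list': [[10, 20], [1.2, 3.0, 2.75], ['tommy', 'tom']],
})

# Step 1: length of each list, one entry per original row.
lengths = df['list'].str.len().to_numpy()

# Step 2: repeat each row position as many times as its list is long.
idx = np.arange(len(df)).repeat(lengths)

# Step 3: flatten all lists into one sequence, in row order.
# (Swapped in for np.concatenate, which returns a single array
# with one common dtype for all values.)
flat = list(itertools.chain.from_iterable(df['list']))

# Step 4: select the repeated rows without the 'list' column,
# then attach the flattened values as the new 'list' column.
out = df.iloc[idx, :-1].assign(list=flat)
print(out)
```

The row labels in `out` repeat (0, 0, 1, 1, 1, 2, 2); calling `out.reset_index(drop=True)` afterwards gives a fresh 0..6 index if that is preferred.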

Performance

Large

df_test = pd.concat([df] * 10000)

# Bharath
%timeit df_test.set_index(['col1','col2','col3'])['list'].apply(pd.Series).stack()\
              .reset_index().drop('level_3',axis=1)
1 loop, best of 3: 7.09 s per loop

# Mine
%%timeit 
idx = np.arange(len(df_test)).repeat(df_test.list.str.len(), 0)    
out = df_test.iloc[idx, :-1].assign(list=np.concatenate(df_test.list.values))
10 loops, best of 3: 123 ms per loop

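As a one-liner, Bharath's answer is short, but slow. Here is an improvement that uses the DataFrame constructor instead of df.apply, giving roughly a 200x speedup on large data: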

idx = df.set_index(['col1', 'col2', 'col3']).index
# axis must be passed by keyword; the positional form was removed in pandas 2.0
out = pd.DataFrame(df.list.values.tolist(), index=idx).stack()\
                .reset_index().drop('level_3', axis=1).rename(columns={0 : 'list'})

print(out)

     col1     col2  col3   list
0   fruit    apple     1     10
1   fruit    apple     1     20
2  veicle  bycicle     4    1.2
3  veicle  bycicle     4      3
4  veicle  bycicle     4   2.75
5  animal      cat     2  tommy
6  animal      cat     2    tom

Small

100 loops, best of 3: 4.7 ms per loop

Large

10 loops, best of 3: 28.9 ms per loop
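
The constructor-based version is easier to reason about when the intermediate frame is visible. The sketch below splits it into steps; the names `wide` and `long` are illustrative only. It also calls `dropna()` explicitly, on the assumption that, depending on the pandas version, `stack()` may keep the missing cells used to pad the shorter lists.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['fruit', 'veicle', 'animal'],
    'col2': ['apple', 'bycicle', 'cat'],
    'col3': [1, 4, 2],
    'list': [[10, 20], [1.2, 3.0, 2.75], ['tommy', 'tom']],
})

# Keep the three key columns as a MultiIndex for the rows.
idx = df.set_index(['col1', 'col2', 'col3']).index

# Each list becomes one row of a wide frame. The longest list has
# three items, so shorter lists are padded with missing values.
wide = pd.DataFrame(df['list'].tolist(), index=idx)
print(wide)

# stack() moves the column labels (0, 1, 2) into an inner index level;
# dropna() removes any padding cells that survived the stack.
long = wide.stack().dropna()

# Discard the position-within-list level, name the values 'list',
# and turn the key columns back into ordinary columns.
out = long.reset_index(level=-1, drop=True).rename('list').reset_index()
print(out)
```

Building `wide` in a single constructor call avoids creating one `pd.Series` per row, which is the likely reason this version is so much faster than `apply(pd.Series)`.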

User

#2 · Edited 2024-09-30 01:19:28

You can set the first three columns as the index, apply pd.Series to the list column, and then stack the result.

df.set_index(['col1','col2','col3'])['list'].apply(pd.Series).stack().reset_index().drop('level_3',axis=1)
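
For comparison, pandas 0.25 and later ship a built-in `DataFrame.explode` that performs this reshaping directly. It is not used in any of the answers here; the sketch below is an alternative that assumes a sufficiently recent pandas is installed.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['fruit', 'veicle', 'animal'],
    'col2': ['apple', 'bycicle', 'cat'],
    'col3': [1, 4, 2],
    'list': [[10, 20], [1.2, 3.0, 2.75], ['tommy', 'tom']],
})

# explode() emits one row per list element and repeats the other columns.
# The original row labels are repeated too, so reset them to 0..n-1.
out = df.explode('list').reset_index(drop=True)
print(out)
```

The resulting `list` column has object dtype because the values are of mixed types.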

User

#3 · Edited 2024-09-30 01:19:28

Here is roughly how you could do this. It is not an exact solution, but it should give you an idea of how to approach the task:

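import pandas as pd

original_df = df  # the DataFrame you start with (here, the one from the question)
rows = []  # one dict per output row
# now go through each row of the original df
for i in range(original_df.shape[0]):
    row_series = original_df.iloc[i]
    # emit one output row per item in this row's list
    for item in row_series['list']:
        rows.append({'col1': row_series['col1'],
                     'col2': row_series['col2'],
                     'col3': row_series['col3'],
                     'list': item})
# DataFrame.append was removed in pandas 2.0, so build the frame once at the end
new_df = pd.DataFrame(rows)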
