Splitting a row into multiple rows with pandas

2 answers

User
Answer #1 · edited 2024-09-24 22:29:37

You can try this:
    N = 4
    df_new = pd.DataFrame(df_original.values.reshape(-1, N))
    df_new.columns = ['slotNew{:}'.format(i + 1) for i in range(N)]
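The code extracts the data into a numpy.ndarray, reshapes it, and builds a new DataFrame with the required dimensions. Note that reshape(-1, N) only works when the total number of values is divisible by N.
Example: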
    import numpy as np
    import pandas as pd

    df0 = pd.DataFrame(np.arange(48 * 3).reshape(-1, 48))
    df0.columns = ['slot{:}'.format(i + 1) for i in range(48)]
    print(df0)
    #    slot1  slot2  slot3  slot4  ...  slot45  slot46  slot47  slot48
    # 0      0      1      2      3  ...      44      45      46      47
    # 1     48     49     50     51  ...      92      93      94      95
    # 2     96     97     98     99  ...     140     141     142     143
    #
    # [3 rows x 48 columns]

    N = 4
    df = pd.DataFrame(df0.values.reshape(-1, N))
    df.columns = ['slotNew{:}'.format(i + 1) for i in range(N)]
    print(df.head())
    #    slotNew1  slotNew2  slotNew3  slotNew4
    # 0         0         1         2         3
    # 1         4         5         6         7
    # 2         8         9        10        11
    # 3        12        13        14        15
    # 4        16        17        18        19
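The reshape above silently assumes that the column count is a multiple of N. A minimal sketch of wrapping it in a helper that checks this first; the name `split_rows` and its `prefix` parameter are hypothetical and not part of the original answer:

```python
import numpy as np
import pandas as pd

def split_rows(df, n, prefix="slotNew"):
    """Reshape every row of df into rows of n values (hypothetical helper)."""
    n_cols = df.shape[1]
    if n_cols % n:
        # reshape(-1, n) would either fail or mix values from different rows
        raise ValueError(f"{n_cols} columns cannot be split evenly into chunks of {n}")
    out = pd.DataFrame(df.to_numpy().reshape(-1, n))
    out.columns = [f"{prefix}{i + 1}" for i in range(n)]
    return out

# Same 3 x 48 frame as in the example above
df0 = pd.DataFrame(np.arange(48 * 3).reshape(-1, 48),
                   columns=[f"slot{i + 1}" for i in range(48)])
result = split_rows(df0, 4)
print(result.shape)  # (36, 4)
```

Checking the column count rather than the total size matters when the frame has several rows: a total that happens to be divisible by N would reshape without error while chunks straddle the boundary between two original rows.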
Another approach:
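    N = 4
    # Long format: one record per (original row, slot name, value)
    df1 = df0.stack().reset_index()
    # Target row and column from the 1-based slot number
    df1['i'] = df1['level_1'].str.replace('slot', '').astype(int) // N
    df1['j'] = df1['level_1'].str.replace('slot', '').astype(int) % N
    # Slots that are multiples of N belong to the previous row, in column N;
    # each original row is shifted by 48 / N new rows
    df1['i'] -= (df1['j'] == 0) - df1['level_0'] * 48 / N
    df1['j'] += (df1['j'] == 0) * N
    df1['j'] = 'slotNew' + df1['j'].astype(str)
    df1 = df1[['i', 'j', 0]]
    df = df1.pivot(index='i', columns='j', values=0)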

User
Answer #2 · edited 2024-09-24 22:29:37

Split each row into chunks, then use pandas' Series.explode. Given df:
    import numpy as np
    import pandas as pd

    df = pd.DataFrame([np.arange(1, 49)], columns=['slot%s' % i for i in range(1, 49)])
    print(df)
    #    slot1  slot2  slot3  slot4  slot5  slot6  slot7  slot8  slot9  slot10  ...  \
    # 0      1      2      3      4      5      6      7      8      9      10  ...
    #
    #    slot39  slot40  slot41  slot42  slot43  slot44  slot45  slot46  slot47  \
    # 0      39      40      41      42      43      44      45      46      47
    #
    #    slot48
    # 0      48
Split it with chunks:
    def chunks(l, n):
        """Yield successive n-sized chunks from l.

        Source: https://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenly-sized-chunks
        """
        n_items = len(l)
        if n_items % n:
            n_pads = n - n_items % n
        else:
            n_pads = 0
        # Pad with NaN so the last chunk also has n items
        l = l + [np.nan for _ in range(n_pads)]
        for i in range(0, len(l), n):
            yield l[i:i + n]

    N = 4
    new_df = pd.DataFrame(list(df.apply(lambda x: list(chunks(list(x), N)), axis=1).explode()))
    print(new_df)
Output:
        0   1   2   3
    0   1   2   3   4
    1   5   6   7   8
    2   9  10  11  12
    3  13  14  15  16
    4  17  18  19  20
    ...
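Compared with numpy.reshape, the advantage of this approach is that it also handles the case where N is not a divisor of the number of columns; the last chunk is padded with NaN: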
    N = 7
    new_df = pd.DataFrame(list(df.apply(lambda x: list(chunks(list(x), N)), axis=1).explode()))
    print(new_df)
Output:
        0   1   2   3   4   5     6
    0   1   2   3   4   5   6   7.0
    1   8   9  10  11  12  13  14.0
    2  15  16  17  18  19  20  21.0
    3  22  23  24  25  26  27  28.0
    4  29  30  31  32  33  34  35.0
    5  36  37  38  39  40  41  42.0
    6  43  44  45  46  47  48   NaN
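The same padding idea can probably be kept vectorized by padding in NumPy before reshaping. This is only a sketch under that assumption, not the answerer's method; the helper name `split_rows_padded` is hypothetical:

```python
import numpy as np
import pandas as pd

def split_rows_padded(df, n):
    """Pad each row with NaN up to a multiple of n, then reshape (hypothetical helper)."""
    values = df.to_numpy(dtype=float)   # float dtype so NaN can be stored
    n_pads = -values.shape[1] % n       # 0 when n already divides the column count
    # Pad only on the right-hand side of the column axis
    padded = np.pad(values, ((0, 0), (0, n_pads)), constant_values=np.nan)
    return pd.DataFrame(padded.reshape(-1, n))

# Same single-row frame as in this answer: values 1..48 in slot1..slot48
df = pd.DataFrame([np.arange(1, 49)], columns=[f"slot{i}" for i in range(1, 49)])
new_df = split_rows_padded(df, 7)
print(new_df.shape)  # (7, 7)
```

One trade-off: every column comes back as float here, whereas the chunks version keeps integer columns wherever no NaN was inserted.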
