Reshaping a DataFrame whose rows have different lengths

       0    1    2     3    4    5    6    7    8    9   10   11   12   13   14
A17    a    b    1  AUG)  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
nn6    c    d    2  POS)    e    f    2  Hi)
AZV  NaN  NaN  NaN   NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
JFK    a    b    4  UUI)    c    v    8  Yo)    t    f    9  po)

3 Answers

User

Answer 1 · edited 2024-06-13 20:40:09

There are other options, such as slicing the columns and appending, but this one is very simple:

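import numpy as np
import pandas as pd

output = []
for index, row in df.iterrows():
    r = row.dropna().values  # keep only the filled cells of this row
    if len(r) <= 4:
        # short (or empty) row: keep it as a single output row
        output.append([index, *r])
    else:
        # long row: cut it into chunks of 4 (its length must be a multiple of 4)
        for x in np.reshape(r, (len(r) // 4, 4)):
            output.append([index, *x])

pd.DataFrame(output).set_index(0)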
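
For reference, here is a self-contained sketch of the loop above run on the sample from the question. The way the frame is rebuilt (short rows padded with NaN up to 15 columns) and the helper name `reshape_rows` are assumptions made for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample rebuilt from the question; padding short rows with NaN is an assumption.
rows = {
    "A17": ["a", "b", 1, "AUG)"],
    "nn6": ["c", "d", 2, "POS)", "e", "f", 2, "Hi)"],
    "AZV": [],
    "JFK": ["a", "b", 4, "UUI)", "c", "v", 8, "Yo)", "t", "f", 9, "po)"],
}
n_cols = 15
df = pd.DataFrame(
    [vals + [np.nan] * (n_cols - len(vals)) for vals in rows.values()],
    index=list(rows),
)


def reshape_rows(frame, chunk=4):
    """Cut the non-NaN cells of every row into pieces of `chunk` values.

    Hypothetical helper: assumes each row holds 0, `chunk`, or a multiple
    of `chunk` filled cells.
    """
    output = []
    for index, row in frame.iterrows():
        r = row.dropna().to_numpy()
        if len(r) <= chunk:
            # Empty rows end up as [index] and are padded with NaN below
            output.append([index, *r])
        else:
            for x in r.reshape(-1, chunk):
                output.append([index, *x])
    return pd.DataFrame(output).set_index(0)


result = reshape_rows(df)
print(result)
```

The all-NaN row survives because a list holding only the index label is still appended, and the `DataFrame` constructor pads it with NaN.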

User

Answer 2 · edited 2024-06-13 20:40:09

Another option that does not require iterating over the rows (which can be very slow when there are many rows) is the following:

[ins] In [1]: df
Out[1]: 
     0    1    2     3    4    5    6    7
A17  a    b    1  AUG)  NaN  NaN  NaN  NaN
nn6  c    d    2  POS)    e    f    2  HI)
AVZ     NaN  NaN   NaN  NaN  NaN  NaN  NaN

[ins] In [2]: joined = df.apply(lambda x: ' '.join([str(xi) for xi in x]), axis=1)
[ins] In [4]: split = joined.str.split(')', expand=True).reset_index(drop=False).melt(id_vars='index')

[ins] In [6]: split.drop('variable', axis=1, inplace=True)

[ins] In [7]: split
Out[7]: 
  index                        value
0   A17                    a b 1 AUG
1   nn6                    c d 2 POS
2   AVZ  nan nan nan nan nan nan nan
3   A17              nan nan nan nan
4   nn6                     e f 2 HI
5   AVZ                         None
6   A17                         None
7   nn6                             
8   AVZ                         None

[ins] In [8]: sel = split['value'].str.strip().str.len() > 0

[ins] In [9]: split = split.loc[sel, :]

[ins] In [9]: split
Out[9]: 
  index                        value
0   A17                    a b 1 AUG
1   nn6                    c d 2 POS
2   AVZ  nan nan nan nan nan nan nan
3   A17              nan nan nan nan
4   nn6                     e f 2 HI

[ins] In [10]: out = split['value'].str.strip().str.split(' ', expand=True)

[ins] In [11]: out.index = split['index']

[ins] In [12]: out
Out[12]: 
         0    1    2    3     4     5     6
index                                      
A17      a    b    1  AUG  None  None  None
nn6      c    d    2  POS  None  None  None
AVZ    nan  nan  nan  nan   nan   nan   nan
A17    nan  nan  nan  nan  None  None  None
nn6      e    f    2   HI  None  None  None

After that, just drop columns 4 to 6, which is easy. I included some intermediate output so you can see what happens at each step.
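
The session above can be condensed into a short script. This is only a sketch under a few assumptions: every filled cell is a string, every record ends in ")", and the sample frame is a hypothetical stand-in for the one shown in the answer. It also differs from the answer in one respect: NaN cells are dropped before joining, so the 'nan' strings and the extra columns 4 to 6 never appear.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: all filled cells are strings, each record ends in ")".
df = pd.DataFrame(
    [
        ["a", "b", "1", "AUG)", np.nan, np.nan, np.nan, np.nan],
        ["c", "d", "2", "POS)", "e", "f", "2", "HI)"],
        [np.nan] * 8,
    ],
    index=["A17", "nn6", "AVZ"],
)

# One string per row, built from the filled cells only (empty rows become "").
joined = df.apply(lambda row: " ".join(row.dropna()), axis=1)

# Split into records at ")", then go from wide to long as in the answer.
parts = joined.str.split(")", expand=True)
long = parts.reset_index().melt(id_vars="index")

# Discard the empty leftovers produced by the split and the padding.
text = long["value"].fillna("").str.strip()
keep = text != ""

# One column per field, labelled with the original row label.
out = text[keep].str.split(" ", expand=True)
out.index = long.loc[keep, "index"].to_numpy()
print(out)
```

As in the answer, the closing ")" is consumed by the split. Rows with no values at all (AVZ here) disappear from `out`, so they would have to be added back separately if they are needed.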

User

Answer 3 · edited 2024-06-13 20:40:09

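I think an efficient approach is to split the DataFrame into 4 equal parts and then concatenate them back together along the index.

The catch is the column names, which we can rename on the fly inside the concat statement.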

import numpy as np
import pandas as pd

# Split the column positions into 4 groups
lst = np.array_split(np.arange(len(df.columns)), 4)

[array([0, 1, 2, 3]),
 array([4, 5, 6, 7]),
 array([ 8,  9, 10, 11]),
 array([12, 13, 14])]

dfs = pd.concat([
    # Relabel each group's columns from 0 so the groups line up when stacked
    df.iloc[:, i].rename(columns=dict(zip(df.iloc[:, i].columns, range(4))))
    for i in lst
]).dropna(how='all')

print(dfs)

     0  1    2     3
A17  a  b  1.0  AUG)
nn6  c  d  2.0  POS)
JFK  a  b  4.0  UUI)
nn6  e  f  2.0   Hi)
JFK  c  v  8.0   Yo)
JFK  t  f  9.0   po)

The only difference is that one row from your desired output is missing: the all-NaN row (AZV) was removed by dropna.

We can use combine_first to union the result with the original index, which brings back the difference between the two DataFrames:

dfs = dfs.combine_first(df.iloc[:,:0])

print(dfs)

       0    1    2     3
A17    a    b  1.0  AUG)
AZV  NaN  NaN  NaN   NaN
JFK    a    b  4.0  UUI)
JFK    c    v  8.0   Yo)
JFK    t    f  9.0   po)
nn6    c    d  2.0  POS)
nn6    e    f  2.0   Hi)
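
A variant of the same idea, offered only as a sketch: it cuts the frame into blocks of 4 columns by position instead of relying on `np.array_split` (which produces 4 groups, not groups of width 4), and it keeps all-NaN rows by dropping empty rows only from the later blocks, so no `combine_first` step is needed. The sample data (short rows padded with NaN) and all variable names are assumptions.

```python
import numpy as np
import pandas as pd

# Sample rebuilt from the question; padding short rows with NaN is an assumption.
rows = {
    "A17": ["a", "b", 1, "AUG)"],
    "nn6": ["c", "d", 2, "POS)", "e", "f", 2, "Hi)"],
    "AZV": [],
    "JFK": ["a", "b", 4, "UUI)", "c", "v", 8, "Yo)", "t", "f", 9, "po)"],
}
n_cols = 15
df = pd.DataFrame(
    [vals + [np.nan] * (n_cols - len(vals)) for vals in rows.values()],
    index=list(rows),
)

width = 4
blocks = []
for start in range(0, df.shape[1], width):
    block = df.iloc[:, start:start + width]
    # Relabel columns from 0 so every block lines up when stacked
    block = block.set_axis(range(block.shape[1]), axis=1)
    if start > 0:
        # Later blocks only contribute rows that actually hold data
        block = block.dropna(how="all")
    if not block.empty:
        blocks.append(block)

stacked = pd.concat(blocks)

# Regroup the chunks under their original row; the stable sort keeps chunk order
position = df.index.get_indexer(stacked.index)
result = stacked.iloc[np.argsort(position, kind="stable")]
print(result)
```

Unlike the `combine_first` output above, the rows keep the order of the original index rather than being sorted alphabetically. The regrouping step assumes the original index labels are unique.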
