Need to split variable-length data in a pandas DataFrame column into multiple columns

Posted 2024-09-29 17:16:31


I have a two-column DataFrame that looks like this:

ITEM        REFNUMS
1   00000299    0036701923024762922029229294652954429569295832...
2   00000655    NaN
24  00001791    00016027123076000158004563065131972
25  00001805    00016027123076000158004563065131972
26  00001813    00016027123076000158004563065131972
27  00001821    00016027123076000158004563065131972
28  00001937    0142530521316303164702509000510012201310027820...

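I want to split the REFNUMS column into its separable parts and, if possible, add them to the existing DataFrame, because I need to keep the row index and the matching ITEM. When it is not NaN, the data in REFNUMS has a length divisible by 5; for example, row 1 is 78 groups of 5.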

^{pr2}$

gives:

0         NaN
1        78.0
2         NaN

Thanks for any suggestions on how to do this.


1 Answer
User
#1 · Posted 2024-09-29 17:16:31

IIUC, you can use str.extractall to get the 5-digit groups, clean up the columns, and then join:

In [168]: r = df.REFNUMS.str.extractall(r"(\d{1,5})").unstack()

In [169]: r.columns = r.columns.droplevel(0)

In [170]: df.join(r)
Out[170]: 
    ITEM                                            REFNUMS      0      1      2      3      4      5      6      7      8     9
1    299  0036701923024762922029229294652954429569295832...  00367  01923  02476  29220  29229  29465  29544  29569  29583     2
2    655                                                NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN   NaN
24  1791                00016027123076000158004563065131972  00016  02712  30760  00158  00456  30651  31972   None   None  None
25  1805                00016027123076000158004563065131972  00016  02712  30760  00158  00456  30651  31972   None   None  None
26  1813                00016027123076000158004563065131972  00016  02712  30760  00158  00456  30651  31972   None   None  None
27  1821                00016027123076000158004563065131972  00016  02712  30760  00158  00456  30651  31972   None   None  None
28  1937  0142530521316303164702509000510012201310027820...  01425  30521  31630  31647  02509  00051  00122  01310  02782     0
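
For anyone who wants to try this outside an interactive session, here is a minimal self-contained sketch of the same extractall / unstack / join steps. The sample frame is hypothetical (shortened values modelled on the question), and the pattern is tightened to exactly five digits on the assumption that every non-NaN REFNUMS length is a multiple of 5, as the question states.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data modelled on the question (not the asker's real frame).
# Every non-NaN REFNUMS string has a length that is a multiple of 5.
df = pd.DataFrame(
    {
        "ITEM": ["00000299", "00000655", "00001791"],
        "REFNUMS": [
            "003670192302476",
            np.nan,
            "00016027123076000158004563065131972",
        ],
    },
    index=[1, 2, 24],
)

# One row per 5-digit group, indexed by (original row label, match number).
# Rows whose REFNUMS is NaN simply produce no matches.
groups = df["REFNUMS"].str.extractall(r"(\d{5})")

# Move the match number into the columns, then drop the capture-group level
# so the new columns are labelled 0, 1, 2, ...
wide = groups.unstack()
wide.columns = wide.columns.droplevel(0)

# join aligns on the original row index, so ITEM stays matched to its groups;
# rows without matches get missing values in every new column.
result = df.join(wide)
print(result)
```

With r"(\d{5})", any leftover of fewer than five digits at the end of a string is dropped, whereas the \d{1,5} pattern used above keeps it as a short final group. Which behaviour is preferable depends on whether a trailing partial group should be treated as data or as an error.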
