Concatenate and shift columns in pandas with apply

Posted 2024-09-29 00:15:02


I have a DataFrame that looks like the example below:

col_a col_b col_c col_d extra1 extra2 extra3
a     a     a     a     b      c      d 
a     a     a     a     b      c      d 
a     a     a     b     c      d      Nan 
a     a     a     b     c      d      Nan 
a     a     b     c     d      Nan    Nan
a     a     b     c     d      Nan    Nan  
a     b     c     d     Nan    Nan    Nan
a     b     c     d     Nan    Nan    Nan 

I need to turn it into this:

^{pr2}$

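So, depending on where the NaN values begin (extra1, extra2, or extra3), I always need to move the last 3 columns that come before the NaN-containing columns into col_b, col_c, and col_d, and concatenate the columns in front of them into col_a.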
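To make the question reproducible, the sample table can be rebuilt as a DataFrame. This is only a sketch: the variable names `rows`, `columns`, and `n_missing` are illustrative, and the placeholder is assumed to be the literal string 'Nan', as it is printed in the table.

```python
import pandas as pd

# Rows copied from the sample table; 'Nan' is kept as a plain string here.
rows = [
    ['a', 'a', 'a', 'a', 'b', 'c', 'd'],
    ['a', 'a', 'a', 'a', 'b', 'c', 'd'],
    ['a', 'a', 'a', 'b', 'c', 'd', 'Nan'],
    ['a', 'a', 'a', 'b', 'c', 'd', 'Nan'],
    ['a', 'a', 'b', 'c', 'd', 'Nan', 'Nan'],
    ['a', 'a', 'b', 'c', 'd', 'Nan', 'Nan'],
    ['a', 'b', 'c', 'd', 'Nan', 'Nan', 'Nan'],
    ['a', 'b', 'c', 'd', 'Nan', 'Nan', 'Nan'],
]
columns = ['col_a', 'col_b', 'col_c', 'col_d', 'extra1', 'extra2', 'extra3']
df = pd.DataFrame(rows, columns=columns)

# Number of 'Nan' placeholders per row, i.e. how many positions
# each row would have to move to the right to be right-aligned.
n_missing = df.eq('Nan').sum(axis=1)
```

The per-row count in `n_missing` is the quantity that decides how far each row has to be shifted.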


3 answers

Use:

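import numpy as np

# if necessary, convert the string 'Nan' to real missing values
df = df.replace('Nan', np.nan)

# shift each row right by its own number of missing values,
# so the non-missing values end up right-aligned
df = df.apply(lambda x: x.shift(x.isnull().sum()), axis=1)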
print (df)
  col_a col_b col_c col_d extra1 extra2 extra3
0     a     a     a     a      b      c      d
1     a     a     a     a      b      c      d
2   NaN     a     a     a      b      c      d
3   NaN     a     a     a      b      c      d
4   NaN   NaN     a     a      b      c      d
5   NaN   NaN     a     a      b      c      d
6   NaN   NaN   NaN     a      b      c      d
7   NaN   NaN   NaN     a      b      c      d
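To see why this right-aligns the data, it may help to look at a single row in isolation. The sketch below uses one row from the sample (the name `row` is just for illustration): the row has two trailing missing values, so shifting it by two positions moves them to the front.

```python
import numpy as np
import pandas as pd

cols = ['col_a', 'col_b', 'col_c', 'col_d', 'extra1', 'extra2', 'extra3']
row = pd.Series(['a', 'a', 'b', 'c', 'd', np.nan, np.nan], index=cols)

# Count the missing values in this row ...
n = row.isnull().sum()

# ... and shift the values right by that many positions.
# The index labels stay in place; only the values move.
shifted = row.shift(n)
```

After the shift, the last three positions (extra1, extra2, extra3) always hold the three values to keep, and everything before them is either a value to concatenate or NaN.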

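# keep the last 3 columns; .copy() avoids modifying a slice of df
df1 = df.iloc[:, -3:].copy()
# join the remaining columns into one string per row
# (each value gets a trailing space, NaN becomes an empty string)
df1.insert(0, 'a', df.iloc[:, :-3].add(' ').fillna('').sum(axis=1))
df1.columns = df.columns[:4]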
print (df1)
      col_a col_b col_c col_d
0  a a a a      b     c     d
1  a a a a      b     c     d
2    a a a      b     c     d
3    a a a      b     c     d
4      a a      b     c     d
5      a a      b     c     d
6        a      b     c     d
7        a      b     c     d
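The two steps above can also be wrapped into one function. This is a sketch rather than the answerer's code: `collapse_leading` is a hypothetical helper name, and it swaps the `add(' ')`/`sum` concatenation for `str.join` on the non-missing values, which avoids the trailing space. It assumes missing entries are already real NaN values.

```python
import numpy as np
import pandas as pd

def collapse_leading(df, keep=3, sep=' '):
    """Right-align each row, keep the last `keep` columns, and join
    all remaining non-missing values into the first column."""
    # Step 1: shift every row right by its number of missing values.
    shifted = df.apply(lambda r: r.shift(r.isnull().sum()), axis=1)
    head = shifted.iloc[:, :-keep]
    tail = shifted.iloc[:, -keep:].copy()
    # Step 2: join the leading values, skipping the NaN padding.
    joined = head.apply(lambda r: sep.join(r.dropna()), axis=1)
    tail.insert(0, 'joined', joined)
    tail.columns = df.columns[:keep + 1]
    return tail

cols = ['col_a', 'col_b', 'col_c', 'col_d', 'extra1', 'extra2', 'extra3']
demo = pd.DataFrame(
    [['a', 'a', 'a', 'a', 'b', 'c', 'd'],
     ['a', 'a', 'b', 'c', 'd', np.nan, np.nan],
     ['a', 'b', 'c', 'd', np.nan, np.nan, np.nan]],
    columns=cols,
)
result = collapse_leading(demo)
```

Because it only relies on the position of the missing values, this variant does not depend on what the actual cell values are.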

You need:

temp = df[['col_a','col_b','col_c','col_d']].eq("a").sum(axis=1)
print(temp)

v = []
for i in temp:
    a_col =  "a"*i
    v.append(a_col)

df['col_a'] = v
df['col_b'] = 'b'
df['col_c'] = 'c'
df['col_d'] = 'd'

# drop the extra columns (named extra1..extra3 in the question's sample)
df.drop(columns=['extra1', 'extra2', 'extra3'], inplace=True)
print(df)

Output:

^{pr2}$

You can use itertools.groupby, which is common for grouping tasks. However, this uses a loop (a comprehension), which may hurt performance.

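from itertools import groupby

# group consecutive equal values in each row; `if k` skips the empty strings
df = pd.DataFrame(
    data = [[' '.join(g) for k,g in groupby(row) if k] for row in df.fillna('').values],
    columns = df.columns[:4]
)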

Full example:

^{pr2}$

Returns:

     col_a col_b col_c col_d
0  a a a a     b     c     d
1  a a a a     b     c     d
2    a a a     b     c     d
3    a a a     b     c     d
4      a a     b     c     d
5      a a     b     c     d
6        a     b     c     d
7        a     b     c     d
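For readers who want to run the groupby approach end to end, here is a minimal self-contained sketch. The sample frame is rebuilt from the question (one row per pattern) and is an assumption, not the answerer's original data; the names `data` and `out` are illustrative.

```python
from itertools import groupby

import numpy as np
import pandas as pd

cols = ['col_a', 'col_b', 'col_c', 'col_d', 'extra1', 'extra2', 'extra3']
df = pd.DataFrame(
    [['a', 'a', 'a', 'a', 'b', 'c', 'd'],
     ['a', 'a', 'a', 'b', 'c', 'd', np.nan],
     ['a', 'a', 'b', 'c', 'd', np.nan, np.nan],
     ['a', 'b', 'c', 'd', np.nan, np.nan, np.nan]],
    columns=cols,
)

# groupby collapses runs of consecutive equal values in each row;
# `if k` drops the run of empty strings that fillna('') leaves
# at the end of the shorter rows.
data = [[' '.join(g) for k, g in groupby(row) if k]
        for row in df.fillna('').values]
out = pd.DataFrame(data, columns=df.columns[:4])
```

One limitation worth keeping in mind: groupby merges any adjacent equal values, so this only yields exactly four groups per row when the three trailing values differ from each other and from the leading value, as they do in the sample.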
