Move group header data into the rows and delete the header rows

Posted 2024-09-30 06:24:21


I have a csv that contains product data like this:

Item,Val1,Val2,Val3,Val4,Val5  
SomeProductName1,,,,,  
SomeProductDetails1,,,,,  
ProductGroupHeader1,,,,,  
ProductInfo1,39,8,6,94,112  
ProductInfo2,32,7,4,94,112  
ProductGroupHeader2,,,,,  
ProductInfo3,39,8,6,94,112  
ProductInfo4,32,7,4,94,112  
SomeProductName2,,,,,  
SomeProductDetails2,,,,,    
ProductGroupHeader21,,,,,  
ProductInfo21,39,8,6,94,112  
ProductInfo22,32,7,4,94,112  
ProductGroupHeader2,,,,,  
ProductInfo23,39,8,6,94,112  
ProductInfo24,32,7,4,94,112  

I need it to look like this:

Item,Val1,Val2,Val3,Val4,Val5  
SomeProductName1, SomeProductDetails1, ProductGroupHeader1,,,,,  
SomeProductName1, SomeProductDetails1, ProductInfo1,39,8,6,94,112  
SomeProductName1, SomeProductDetails1, ProductInfo2,32,7,4,94,112  
SomeProductName1, SomeProductDetails1, ProductGroupHeader2,,,,,  
SomeProductName1, SomeProductDetails1, ProductInfo3,39,8,6,94,112  
SomeProductName1, SomeProductDetails1, ProductInfo4,32,7,4,94,112  
SomeProductName2, SomeProductDetails2, ProductGroupHeader21,,,,,  
SomeProductName2, SomeProductDetails2, ProductInfo21,39,8,6,94,112  
SomeProductName2, SomeProductDetails2, ProductInfo22,32,7,4,94,112  
SomeProductName2, SomeProductDetails2, ProductGroupHeader2,,,,,  
SomeProductName2, SomeProductDetails2, ProductInfo23,39,8,6,94,112  
SomeProductName2, SomeProductDetails2, ProductInfo24,32,7,4,94,112  

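Essentially, I want to take SomeProductName and SomeProductDetails from their own rows, delete those rows, and add the two values as columns on the ProductInfo rows.

The csv has a few thousand rows, and my first idea was to loop over it, updating and deleting rows as needed.

After that, I plan to pivot the data on ProductName, and possibly on ProductDetails as well.

I'm new to pandas and Python, and I'm wondering whether there is a simpler or more efficient way.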


1 answer
User
#1 · posted 2024-09-30 06:24:21

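To get your expected output, you can build a mask of the rows where all values are NaN, using filter and isna. Assuming the structure is strict (a name row, then a details row, then a group header), you can use shift to find the name and details rows. Note that the name mask relies on every group header being followed by at least two data rows; otherwise a group header would also be picked up as a name. Then concat the name and details columns, created with where and ffill, onto df, and keep only the rows you need.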

import pandas as pd

# df is assumed to be the csv above loaded with pd.read_csv

# rows where every Val column is NaN (name, details and group header rows)
m = df.filter(like='Val').isna().all(axis=1)
# ProductName rows: all Val are NaN here and also two rows later
# (the GroupHeader row); fill_value=False keeps the shifted mask boolean
name = m & m.shift(-2, fill_value=False)
# ProductDetails rows: all Val are NaN here, on the previous row
# (ProductName) and on the next row (GroupHeader)
details = m & m.shift(-1, fill_value=False) & m.shift(1, fill_value=False)

# build the dataframe with concat; where + ffill carry the name and
# details down onto the following rows
df_ = (pd.concat([df['Item'].where(name).ffill().rename('Item_name'),
                  df['Item'].where(details).ffill().rename('Item_details'),
                  df],
                 axis=1)
          [~(name|details)] # drop the rows that hold only a name or details
      )

This gives:

print (df_)
           Item_name         Item_details                  Item  Val1  Val2  \
2   SomeProductName1  SomeProductDetails1   ProductGroupHeader1   NaN   NaN   
3   SomeProductName1  SomeProductDetails1          ProductInfo1  39.0   8.0   
4   SomeProductName1  SomeProductDetails1          ProductInfo2  32.0   7.0   
5   SomeProductName1  SomeProductDetails1   ProductGroupHeader2   NaN   NaN   
6   SomeProductName1  SomeProductDetails1          ProductInfo3  39.0   8.0   
7   SomeProductName1  SomeProductDetails1          ProductInfo4  32.0   7.0   
10  SomeProductName2  SomeProductDetails2  ProductGroupHeader21   NaN   NaN   
11  SomeProductName2  SomeProductDetails2         ProductInfo21  39.0   8.0   
12  SomeProductName2  SomeProductDetails2         ProductInfo22  32.0   7.0   
13  SomeProductName2  SomeProductDetails2   ProductGroupHeader2   NaN   NaN   
14  SomeProductName2  SomeProductDetails2         ProductInfo23  39.0   8.0   
15  SomeProductName2  SomeProductDetails2         ProductInfo24  32.0   7.0   

    Val3  Val4   Val5  
2    NaN   NaN    NaN  
3    6.0  94.0  112.0  
4    4.0  94.0  112.0  
5    NaN   NaN    NaN  
6    6.0  94.0  112.0  
7    4.0  94.0  112.0  
10   NaN   NaN    NaN  
11   6.0  94.0  112.0  
12   4.0  94.0  112.0  
13   NaN   NaN    NaN  
14   6.0  94.0  112.0  
15   4.0  94.0  112.0  
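
To try this end to end, here is a minimal self-contained sketch. It assumes the sample from the question is read from an in-memory string (the real file path is not given) with the trailing spaces removed, and it passes fill_value=False to shift so the masks stay boolean. Those choices are assumptions for illustration, not part of the original answer.

```python
import io
import pandas as pd

# Sample data from the question, trailing spaces removed (assumption).
csv_text = """Item,Val1,Val2,Val3,Val4,Val5
SomeProductName1,,,,,
SomeProductDetails1,,,,,
ProductGroupHeader1,,,,,
ProductInfo1,39,8,6,94,112
ProductInfo2,32,7,4,94,112
ProductGroupHeader2,,,,,
ProductInfo3,39,8,6,94,112
ProductInfo4,32,7,4,94,112
SomeProductName2,,,,,
SomeProductDetails2,,,,,
ProductGroupHeader21,,,,,
ProductInfo21,39,8,6,94,112
ProductInfo22,32,7,4,94,112
ProductGroupHeader2,,,,,
ProductInfo23,39,8,6,94,112
ProductInfo24,32,7,4,94,112
"""
df = pd.read_csv(io.StringIO(csv_text))

# True on rows where every Val column is NaN (all three kinds of header rows)
m = df.filter(like='Val').isna().all(axis=1)
# Name rows: header here and a header two rows below
name = m & m.shift(-2, fill_value=False)
# Details rows: header here, directly above and directly below
details = m & m.shift(-1, fill_value=False) & m.shift(1, fill_value=False)

df_ = pd.concat(
    [df['Item'].where(name).ffill().rename('Item_name'),
     df['Item'].where(details).ffill().rename('Item_details'),
     df],
    axis=1,
)[~(name | details)]

# Index labels of the rows that survive the filter
print(df_.index.tolist())
```

The printed index labels should match the row labels in the output above; rows 0, 1, 8 and 9 (the name and details rows) are the ones that get dropped.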

Edit: to add the group header as a column as well, you can create a similar mask and use it in the concat in the same way:

# group header rows: all Val are NaN here but not on the next row
groupHeader = m & (~m).shift(-1, fill_value=False)

df_ = (pd.concat([df['Item'].where(name).ffill().rename('Item_name'),
                  df['Item'].where(details).ffill().rename('Item_details'),
                  df['Item'].where(groupHeader).ffill().rename('Item_group'), # add this
                  df],
                 axis=1)
          [~(name|details|groupHeader)] # also drop the rows that hold only a group header
      )
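
Since the question mentions pivoting on the product name afterwards, here is a hedged sketch of how the three masks could be wrapped up and followed by a pivot. The helper name flatten_headers, the tiny sample data and the choice of sum as the aggregation are all hypothetical, picked only to keep the example short; swap in whatever aggregation your data calls for.

```python
import io
import pandas as pd

def flatten_headers(df):
    """Hypothetical helper: move name/details/group header rows into columns."""
    m = df.filter(like='Val').isna().all(axis=1)
    name = m & m.shift(-2, fill_value=False)
    details = m & m.shift(-1, fill_value=False) & m.shift(1, fill_value=False)
    group = m & (~m).shift(-1, fill_value=False)
    masks = {'Item_name': name, 'Item_details': details, 'Item_group': group}
    # One forward-filled label column per kind of header row
    cols = [df['Item'].where(mask).ffill().rename(col) for col, mask in masks.items()]
    out = pd.concat(cols + [df], axis=1)
    # Keep only the data rows
    return out[~(name | details | group)].reset_index(drop=True)

# Made-up sample with the same layout as the question's csv (assumption).
sample = """Item,Val1,Val2
NameA,,
DetailsA,,
Group1,,
Info1,1,2
Info2,3,4
Group2,,
Info3,5,6
Info4,7,8
"""
flat = flatten_headers(pd.read_csv(io.StringIO(sample)))

# Pivot on product name and group; 'sum' is an assumed aggregation.
summary = flat.pivot_table(index=['Item_name', 'Item_group'],
                           values=['Val1', 'Val2'], aggfunc='sum')
print(summary)
```

With the label columns in place, groupby on Item_name (and Item_details if needed) would work just as well as pivot_table.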
