wide_to_long method based on the input rows

Input data:

name no city tr1_0 tr2_0 tr3_0 tr1_1 tr2_1 tr3_1 tr1_2 tr2_2 tr3_2
John 11 edi  boa   51    110   cof   52    220
Rick 12 new  cof   61    100   dcu   61    750
Mat  t1 nyc

Expected output:

   name  no city  tr1  tr3  tr2
0  John  11  edi  boa  110   51
1  John  11  edi  cof  220   52
2  Rick  12  new  cof  100   61
3  Rick  12  new  dcu  750   61
4  Matt  13  wil  nan  nan  nan

import re
import pandas as pd

# inputFileName, widths and names are defined elsewhere
df1 = pd.read_fwf(inputFileName, widths=widths, names=names, dtype=str, index_col=False)
# wide columns such as tr1_0, tr2_1, ...
feature_models = [col for col in df1.columns if re.match("tr[0-9]_[0-9]", col) is not None]
# stub names without the numeric suffix (a set, so their order is arbitrary)
features = list(set([re.sub("_[0-9]", "", feature_model) for feature_model in feature_models]))
df1 = pd.wide_to_long(df1, i=['name', 'no', 'city'], j='ModelID', stubnames=features, sep="_")
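As a quick check of the stub-name step in the code above, the same regular expressions can be run on a plain list of column names. This is only a sketch: the column list is copied from the input header, and sorting the set is an addition that is not in the question's code, made so that the order of the stub names (and therefore of the output columns) is reproducible.

```python
import re

# Column names copied from the header of the input data.
columns = [
    "name", "no", "city",
    "tr1_0", "tr2_0", "tr3_0",
    "tr1_1", "tr2_1", "tr3_1",
    "tr1_2", "tr2_2", "tr3_2",
]

# Keep only the wide columns of the form tr<digit>_<digit>.
feature_models = [col for col in columns if re.match(r"tr[0-9]_[0-9]", col)]

# Strip the numeric suffix; sorted() replaces list(set(...)) so the order is stable.
features = sorted({re.sub(r"_[0-9]$", "", col) for col in feature_models})

print(features)
```

With list(set(...)) the stub names can come back in any order, which may be why the columns in the question's output appear as tr1, tr3, tr2.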

Actual output:

   name  no city  tr1  tr3  tr2
0  John  11  edi  boa  110   51
1  John  11  edi  cof  220   52
2  John  11  edi  nan  nan  nan
3  Rick  12  new  cof  100   61
4  Rick  12  new  dcu  750   61
5  Rick  12  new  nan  nan  nan
6  Matt  13  wil  nan  nan  nan

1 answer

User

#1 · Posted on 2024-09-30 14:38:26

You can use this alternative solution, which splits the column names and then stacks the resulting column level:

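# move the identifier columns into the index
df1 = df1.set_index(['name', 'no', 'city'])
# split names such as tr1_0 into ('tr1', '0'), giving two-level columns
df1.columns = df1.columns.str.split('_', expand=True)
# move the suffix level into the rows, keep all-NaN rows, then drop that level
df1 = df1.stack(1, dropna=False).reset_index(level=3, drop=True)

# True for repeated index values whose row is entirely NaN
mask = df1.index.duplicated() & df1.isnull().all(axis=1)

df1 = df1[~mask].reset_index()
print(df1)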
   name  no city  tr1   tr2    tr3
0  John  11  edi  boa  51.0  110.0
1  John  11  edi  cof  52.0  220.0
2  Rick  12  new  cof  61.0  100.0
3  Rick  12  new  dcu  61.0  750.0
4   Mat  t1  nyc  NaN   NaN    NaN

With your solution:

df1 = pd.wide_to_long(df1,i=['name', 'no', 'city'],j='ModelID',stubnames=features,sep="_")

To remove the all-NaN rows whose MultiIndex values are duplicated, filter with a boolean mask:

# remove the counting level
df1 = df1.reset_index(level=3, drop=True)
mask = df1.index.duplicated() & df1.isnull().all(axis=1)
df1 = df1[~mask].reset_index()

Details:

Check for duplicated index values with df1.index.duplicated():

print (df1.index.duplicated())
[False  True False  True False  True]

Then use isnull().all(axis=1) to find the rows in which every value is NaN:

print (df1.isnull().all(axis=1))
name  no  city
John  11  edi     False
          edi     False
Rick  12  new     False
          new     False
Mat   t1  nyc      True
          nyc      True
dtype: bool

Chain both conditions with & for bitwise AND:

mask = df1.index.duplicated() & df1.isnull().all(axis=1)
print (mask)
name  no  city
John  11  edi     False
          edi     False
Rick  12  new     False
          new     False
Mat   t1  nyc     False
          nyc      True
dtype: bool

Invert the boolean mask with ~:

print (~mask)
name  no  city
John  11  edi      True
          edi      True
Rick  12  new      True
          new      True
Mat   t1  nyc      True
          nyc     False
dtype: bool
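The steps above can be put together in a self-contained sketch. The sample frame is hypothetical: it is modelled on the question's input and built by hand rather than read with read_fwf, and the variable names long_df, is_repeat, is_empty, result and naive are illustrative. The mask is converted to a plain NumPy array here, which is a small deviation from the answer, to avoid any index alignment on the duplicated MultiIndex.

```python
import pandas as pd

# Hypothetical sample frame modelled on the question's input (values kept as strings).
df = pd.DataFrame({
    "name": ["John", "Rick", "Matt"],
    "no": ["11", "12", "13"],
    "city": ["edi", "new", "wil"],
    "tr1_0": ["boa", "cof", None],
    "tr2_0": ["51", "61", None],
    "tr3_0": ["110", "100", None],
    "tr1_1": ["cof", "dcu", None],
    "tr2_1": ["52", "61", None],
    "tr3_1": ["220", "750", None],
    "tr1_2": [None, None, None],
    "tr2_2": [None, None, None],
    "tr3_2": [None, None, None],
})

# Reshape: one row per (name, no, city, ModelID).
long_df = pd.wide_to_long(
    df,
    stubnames=["tr1", "tr2", "tr3"],
    i=["name", "no", "city"],
    j="ModelID",
    sep="_",
)

# Remove the counting level so repeated identifiers become duplicated index values.
long_df = long_df.reset_index(level="ModelID", drop=True)

# A row is dropped only if it repeats an earlier identifier AND is entirely NaN.
is_repeat = long_df.index.duplicated()
is_empty = long_df.isnull().all(axis=1).to_numpy()
mask = is_repeat & is_empty

result = long_df[~mask].reset_index()

# For comparison: dropping every all-NaN row also loses Matt completely.
naive = long_df.dropna(how="all").reset_index()

print(result)
print(naive)
```

On this sample, result keeps five rows, including a single all-NaN row for Matt, while naive keeps only four and has no row for Matt. That difference is the reason for combining the duplicated check with the all-NaN check instead of calling dropna directly.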
