Pandas DataFrame column headers to labels

Content  A  B  C  D  E
zxy      1  2     1
wvu      1     2  1
tsr      1  2        2
qpo         1  1  1
nml            2  2
kji      1     1     2
hgf            1     2
edc      1  2     1

3 Answers

User

#1 · edited 2024-09-26 17:59:36

Complete solution:

# first: strip whitespace around every string value; non-string values are left alone,
# so this is safe to run over all columns
for col in df.columns:
    df[col] = df[col].map(lambda v: v.strip() if isinstance(v, str) else v)

# fill na with 0
df = df.fillna(0)

# replace '' with 0
df = df.replace('', 0)

# convert to int, this must only be done on the specific columns with the numeric data
# this list is the column names as you've presented them, if they are different in the real data,
# replace them
df = df.astype({col: 'int16' for col in ['A', 'B', 'C', 'D', 'E']})

print(df.info())

# you should end up with something like this.
"""
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8 entries, 0 to 7
Data columns (total 6 columns):
Content    8 non-null object
A          8 non-null int16
B          8 non-null int16
C          8 non-null int16
D          8 non-null int16
E          8 non-null int16
dtypes: int16(5), object(1)
memory usage: 272.0+ bytes
"""
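For comparison, the same cleanup can be sketched with pd.to_numeric in place of the strip / replace / astype sequence. This is an alternative, not the answer's own method, and the small sample frame (padded strings, '' and None for blanks) is made up for illustration.

```python
import pandas as pd

# Hypothetical sample: numeric cells arrive as padded strings,
# blanks arrive as '' / ' ' / None.
df = pd.DataFrame({
    "Content": [" zxy", "wvu ", "tsr"],
    "A": [" 1", "1 ", ""],
    "B": ["2", None, " "],
})

numeric_cols = ["A", "B"]  # assumption: these are the columns holding numbers
for col in numeric_cols:
    stripped = df[col].str.strip()
    # errors="coerce" turns anything that is not a number into NaN,
    # which fillna then replaces with 0 before the integer cast.
    df[col] = pd.to_numeric(stripped, errors="coerce").fillna(0).astype("int16")

df["Content"] = df["Content"].str.strip()
print(df.dtypes)
```

The explicit column list keeps the integer cast away from the text column, which is the same precaution the answer takes.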

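We can do this as follows. Note that here I treat blanks as np.nan; if your data contains real blank strings, change the last line.

[code block not preserved in the source]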
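The idea described above (treat blanks as np.nan, then collect the names of the columns that hold a value) could be sketched as follows. This is not the answer's original code, which did not survive; the sample frame is a hypothetical subset of the question's table, and the commented-out replace line is only one possible way to handle real blank strings.

```python
import numpy as np
import pandas as pd

# Sample rows mirroring the question's table; blank cells are np.nan.
df = pd.DataFrame({
    "Content": ["zxy", "wvu", "tsr", "qpo"],
    "A": [1, 1, 1, np.nan],
    "B": [2, np.nan, 2, 1],
    "C": [np.nan, 2, np.nan, 1],
    "D": [1, 1, np.nan, 1],
    "E": [np.nan, np.nan, 2, np.nan],
})

values = df.set_index("Content")
# If blanks are real empty strings rather than NaN, convert them first:
# values = values.replace("", np.nan)
mask = values.notna()

# For each row, join the names of the columns that hold a value.
labels = mask.apply(lambda row: ", ".join(row.index[row.to_numpy()]), axis=1)
print(labels)
```

Setting Content as the index means the resulting Series is keyed by the row label, so it can be assigned back or joined without extra bookkeeping.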

User

#2 · edited 2024-09-26 17:59:36

You can also do it as follows:

# melt the two-dimensional representation into
# a more or less one-dimensional (long) representation
df_flat = df.melt(id_vars=['Content'])
# filter out all rows which belong to empty cells
# the following is a fail-safe method that should
# work for all datatypes you might encounter in your
# columns; the parentheses matter because & binds tighter than !=
df_flat = df_flat[~df_flat['value'].isna() & (df_flat['value'] != 0)]
df_flat = df_flat[~df_flat['value'].astype('str').str.strip().isin(['', 'nan'])]
# join the variables used per original row
df_flat.groupby(['Content']).agg({'variable': lambda ser: ', '.join(ser)})

The output looks like this:

[output not preserved in the source]

Given the following input data:

import pandas as pd
import io

raw="""idx Content  A  B  C  D  E          
0   zxy      1  2     1                    
1   wvu      1     2  1                  
2   tsr      1  2        2               
3   qpo         1  1  1                  
4   nml            2  2                      
5   kji      1     1     2               
6   hgf            1     2               
7   edc      1  2     1           """

df= pd.read_fwf(io.StringIO(raw))
df.drop(['idx'], axis='columns', inplace=True)

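Edit: I now drop 'idx' right after reading, to create a structure similar to the original DataFrame, and I added some fail-safe code that copes with different data types (the two lines below the melt call). If more were known about how the missing values are actually represented, the code could be simplified.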
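As an illustration of that last point: if the missing cells are known to be NaN and nothing else (an assumption, not something the question confirms), the two fail-safe filters could collapse into a single dropna. A minimal sketch on hypothetical sample rows:

```python
import numpy as np
import pandas as pd

# Sample rows mirroring the question's table; missing cells are NaN only.
df = pd.DataFrame({
    "Content": ["nml", "kji", "hgf", "edc"],
    "A": [np.nan, 1, np.nan, 1],
    "B": [np.nan, np.nan, np.nan, 2],
    "C": [2, 1, 1, np.nan],
    "D": [2, np.nan, np.nan, 1],
    "E": [np.nan, 2, 2, np.nan],
})

labels = (
    df.melt(id_vars="Content")        # long form: Content, variable, value
      .dropna(subset=["value"])       # drop the rows that came from empty cells
      .groupby("Content", sort=False)["variable"]
      .agg(", ".join)                 # join the column names per original row
)
print(labels)
```

melt emits the rows column by column, so the names within each label come out in the original column order.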

User

#3 · edited 2024-09-26 17:59:36

Here is another approach using np.where and groupby:

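import numpy as np
import pandas as pd
# assumes df has a single-level index and a string 'Content' column, so only the numeric columns are compared
num = df.drop(columns='Content')
r, c = np.where(num > 0)
df['Labels'] = pd.Series(num.columns[c], index=num.index[r]).groupby(level=0).agg(', '.join)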

Output:

[output not preserved in the source]
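To make the mechanics of this approach easier to follow, here is a tiny standalone sketch. np.where called with only a condition returns the row and column positions of the True cells, in row-major order. The two-row frame, its index labels, and the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric-only frame; NaN marks an empty cell.
num = pd.DataFrame(
    {"A": [1, np.nan], "B": [np.nan, 2], "C": [3, 4]},
    index=["x", "y"],
)

# Positions of the cells greater than 0 (NaN > 0 evaluates to False).
r, c = np.where((num > 0).to_numpy())

# One entry per filled cell: the value is the column name, the index is the row label.
pairs = pd.Series(num.columns[c], index=num.index[r])

# Collapse the entries that share a row label into one comma-separated string.
labels = pairs.groupby(level=0).agg(", ".join)
print(labels)
```

Because the intermediate Series reuses the original row labels as its index, the grouped result aligns with the frame when it is assigned as a new column.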
