Convert a list of arrays to a DataFrame

Published 2024-09-29 07:21:04

Hello, I have a dataset that looks like this:

array([['1;"Female";133;132;124;"118";"64.5";816932'],
       ['2;"Male";140;150;124;".";"72.5";1001121'],
       ['3;"Male";139;123;150;"143";"73.3";1038437'],
       ['4;"Male";133;129;128;"172";"68.8";965353'],
       ['5;"Female";137;132;134;"147";"65.0";951545'],
       ['6;"Female";99;90;110;"146";"69.0";928799'],
       ['7;"Female";138;136;131;"138";"64.5";991305']], dtype=object)

I want to convert it to a DataFrame with the following columns:

columns = ["Gender", "FSIQ", "VIQ", "PIQ", "Weight", "Height", "MRI_Count"]

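NB: In the list of arrays, the values within each row are separated by semicolons (;). Please help me organize this into a DataFrame with the column names above and the row values from the array.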


2 answers

Create a DataFrame, then use Series.str.split with expand=True to produce the new columns:

import numpy as np
import pandas as pd

a = np.array([['1;"Female";133;132;124;"118";"64.5";816932'],
       ['2;"Male";140;150;124;".";"72.5";1001121'],
       ['3;"Male";139;123;150;"143";"73.3";1038437'],
       ['4;"Male";133;129;128;"172";"68.8";965353'],
       ['5;"Female";137;132;134;"147";"65.0";951545'],
       ['6;"Female";99;90;110;"146";"69.0";928799'],
       ['7;"Female";138;136;131;"138";"64.5";991305']], dtype=object)

# Split the single string column on ';' into one column per field
df = pd.DataFrame(a)[0].str.split(';', expand=True)
df.columns = ['ID', "Gender", "FSIQ", "VIQ", "PIQ", "Weight", "Height", "MRI_Count"]

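Finally, some data cleaning: remove the double quotes with Series.str.strip, and convert the columns to numeric with DataFrame.apply and pandas.to_numeric: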

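# Remove the surrounding double quotes from the text column
df['Gender'] = df['Gender'].str.strip('"')
# Strip quotes and convert the remaining columns; unparseable values such as "." become NaN
c = ["ID", "FSIQ", "VIQ", "PIQ", "Weight", "Height", "MRI_Count"]
df[c] = df[c].apply(lambda x: pd.to_numeric(x.str.strip('"'), errors='coerce'))
print(df)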
  ID  Gender  FSIQ  VIQ  PIQ  Weight  Height  MRI_Count
0  1  Female   133  132  124   118.0    64.5     816932
1  2    Male   140  150  124     NaN    72.5    1001121
2  3    Male   139  123  150   143.0    73.3    1038437
3  4    Male   133  129  128   172.0    68.8     965353
4  5  Female   137  132  134   147.0    65.0     951545
5  6  Female    99   90  110   146.0    69.0     928799
6  7  Female   138  136  131   138.0    64.5     991305
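
To see in isolation what errors='coerce' does in the cleaning step, here is a minimal sketch on a small Series. The name `weights` and its three values are chosen for illustration only. Entries that cannot be parsed as numbers, such as the "." placeholder, become NaN instead of raising an error.

```python
import pandas as pd

# Illustrative values only: quoted numbers plus the "." missing-value marker
weights = pd.Series(['"118"', '"."', '"143"'])

# Strip the surrounding double quotes, then convert;
# entries that are not valid numbers are replaced with NaN
cleaned = pd.to_numeric(weights.str.strip('"'), errors='coerce')
print(cleaned)
```

Because one entry is missing, the resulting Series has a floating-point dtype, which matches the Weight column in the output above.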

Another possible solution is to use StringIO together with pd.read_csv. Just join the elements of the array with the newline character '\n':

from io import StringIO

import numpy as np
import pandas as pd

# Setup
a = np.array([['1;"Female";133;132;124;"118";"64.5";816932'],
       ['2;"Male";140;150;124;".";"72.5";1001121'],
       ['3;"Male";139;123;150;"143";"73.3";1038437'],
       ['4;"Male";133;129;128;"172";"68.8";965353'],
       ['5;"Female";137;132;134;"147";"65.0";951545'],
       ['6;"Female";99;90;110;"146";"69.0";928799'],
       ['7;"Female";138;136;131;"138";"64.5";991305']])

columns = ["Gender", "FSIQ", "VIQ", "PIQ", "Weight", "Height", "MRI_Count"]

df = pd.read_csv(StringIO('\n'.join(a.ravel())), header=None,
                 sep=';', names=columns, na_values=['.'])

[out]

   Gender  FSIQ  VIQ  PIQ  Weight  Height  MRI_Count
1  Female   133  132  124   118.0    64.5     816932
2    Male   140  150  124     NaN    72.5    1001121
3    Male   139  123  150   143.0    73.3    1038437
4    Male   133  129  128   172.0    68.8     965353
5  Female   137  132  134   147.0    65.0     951545
6  Female    99   90  110   146.0    69.0     928799
7  Female   138  136  131   138.0    64.5     991305

pandas should infer the data types correctly:

print(df.info())

<class 'pandas.core.frame.DataFrame'>
Int64Index: 7 entries, 1 to 7
Data columns (total 7 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Gender     7 non-null      object 
 1   FSIQ       7 non-null      int64  
 2   VIQ        7 non-null      int64  
 3   PIQ        7 non-null      int64  
 4   Weight     6 non-null      float64
 5   Height     7 non-null      float64
 6   MRI_Count  7 non-null      int64  
dtypes: float64(2), int64(4), object(1)
memory usage: 448.0+ bytes
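
In the output above the ID field ends up as the index, because each row has eight fields while only seven names are passed to read_csv. If the ID should be an ordinary column instead, one option is to pass eight names. A minimal sketch, assuming the column name "ID" (as used in the first answer) and a shortened two-row sample of the question's data:

```python
from io import StringIO

import pandas as pd

# Two rows from the question's data, joined with newlines as in the answer above
raw = '\n'.join([
    '1;"Female";133;132;124;"118";"64.5";816932',
    '2;"Male";140;150;124;".";"72.5";1001121',
])

# Eight names for eight fields, so "ID" is read as a regular column
columns = ["ID", "Gender", "FSIQ", "VIQ", "PIQ", "Weight", "Height", "MRI_Count"]

df = pd.read_csv(StringIO(raw), header=None, sep=';',
                 names=columns, na_values=['.'])
print(df.dtypes)
```

With this variant the DataFrame keeps the default integer index starting at 0, the same layout as in the first answer.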
