Error when splitting a DataFrame using a dict

Posted 2024-09-27 21:23:40


The training data looks like this:

p,x,s,n,t,p,f,c,n,k,e,e,s,s,w,w,p,w,o,p,k,s,u
e,x,s,y,t,a,f,c,b,k,e,c,s,s,w,w,p,w,o,p,n,n,g
e,b,s,w,t,l,f,c,b,n,e,c,s,s,w,w,p,w,o,p,n,n,m
p,x,y,w,t,p,f,c,n,n,e,e,s,s,w,w,p,w,o,p,k,s,u
e,x,s,g,f,n,f,w,b,k,t,e,s,s,w,w,p,w,o,e,n,a,g
e,x,y,y,t,a,f,c,b,n,e,c,s,s,w,w,p,w,o,p,k,n,g
e,b,s,w,t,a,f,c,b,g,e,c,s,s,w,w,p,w,o,p,k,n,m    

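The first column is the label that says whether the mushroom is edible. I want to split the data into two parts, edible and not edible. My code is as follows: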

mushdf = pd.read_csv('agaricus-lepiota.data') #load in two data for mushroom and iris
mushdf.columns = ['edible?','cap-shape','cap-surface','cap-color','bruises?','odor',
                    'gill-attachment','gill-spacing','gill-size','gill-color',
                    'stalk-shape','stalk-root','stalk-surface-above-ring','stalk-surface-below-ring',
                    'stalk-color-above-ring','stalk-color-below-ring','veil-type','veil-color',
                    'ring-number','ring-type','spore-print-color','population','habitat']
print(mushdf)
mushdic = {key: mushdf for (key, mushdf) in mushdf.groupby('edible?')}
for key in mushdic:
    print(f'mushdic[{key}]')
    print(mushdic[key])
    print('-'*50)

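The problem is that when I delete the mushdf.columns assignment on lines 2 to 6, this code works. However, when I keep the mushdf.columns assignment, the terminal returns an error message.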

The same approach works with another column. For example, mushdic = {key: mushdf for (key, mushdf) in mushdf.groupby('bruises?')} runs fine.

I have no idea why this happens.

Traceback (most recent call last):
  File "e:\Visual Studio Project\LiMing\vs2017_python\.vscode\helloworld.py", line 11, in <module>
    mushdic = {key: mushdf for (key, mushdf) in mushdf.groupby('edible?')}
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\pandas\core\generic.py", line 7894, in groupby
    **kwargs
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\pandas\core\groupby\groupby.py", line 2522, in groupby
    return klass(obj, by, **kwds)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\pandas\core\groupby\groupby.py", line 391, in __init__
    mutated=self.mutated,
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\pandas\core\groupby\grouper.py", line 621, in _get_grouper
    raise KeyError(gpr)
KeyError: 'edible?'
The terminal process terminated with exit code: 1

1 answer
User
#1 · Posted 2024-09-27 21:23:40

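By default, pd.read_csv treats the first line of the CSV file as the header. Since your CSV file has no header row, you need to say so when importing. You should also pass the column names at that point: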

mushdf = pd.read_csv('agaricus-lepiota.data', header=None, names=[
                'edible?','cap-shape','cap-surface','cap-color','bruises?','odor',
                'gill-attachment','gill-spacing','gill-size','gill-color',
                'stalk-shape','stalk-root','stalk-surface-above-ring','stalk-surface-below-ring',
                'stalk-color-above-ring','stalk-color-below-ring','veil-type','veil-color',
                'ring-number','ring-type','spore-print-color','population','habitat'])
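
For illustration, here is a minimal sketch of the difference, assuming the seven sample rows from the question stand in for the real agaricus-lepiota.data file. The names SAMPLE, COLUMNS, inferred and group are made up for this example; io.StringIO is only used so the snippet runs without the file on disk.

```python
import io

import pandas as pd

# The seven sample rows from the question (assumption: the real file has the same layout).
SAMPLE = """\
p,x,s,n,t,p,f,c,n,k,e,e,s,s,w,w,p,w,o,p,k,s,u
e,x,s,y,t,a,f,c,b,k,e,c,s,s,w,w,p,w,o,p,n,n,g
e,b,s,w,t,l,f,c,b,n,e,c,s,s,w,w,p,w,o,p,n,n,m
p,x,y,w,t,p,f,c,n,n,e,e,s,s,w,w,p,w,o,p,k,s,u
e,x,s,g,f,n,f,w,b,k,t,e,s,s,w,w,p,w,o,e,n,a,g
e,x,y,y,t,a,f,c,b,n,e,c,s,s,w,w,p,w,o,p,k,n,g
e,b,s,w,t,a,f,c,b,g,e,c,s,s,w,w,p,w,o,p,k,n,m
"""

COLUMNS = [
    'edible?', 'cap-shape', 'cap-surface', 'cap-color', 'bruises?', 'odor',
    'gill-attachment', 'gill-spacing', 'gill-size', 'gill-color',
    'stalk-shape', 'stalk-root', 'stalk-surface-above-ring', 'stalk-surface-below-ring',
    'stalk-color-above-ring', 'stalk-color-below-ring', 'veil-type', 'veil-color',
    'ring-number', 'ring-type', 'spore-print-color', 'population', 'habitat',
]

# Default behaviour: the first data row is consumed as the header, so one row is lost.
inferred = pd.read_csv(io.StringIO(SAMPLE))
print(len(inferred))  # 6

# header=None with names= keeps every row as data and labels the columns.
mushdf = pd.read_csv(io.StringIO(SAMPLE), header=None, names=COLUMNS)
print(len(mushdf))  # 7

# Split into one DataFrame per label. A separate loop variable (group) is used
# here so the name mushdf is not reused inside the comprehension.
mushdic = {key: group for key, group in mushdf.groupby('edible?')}
for key, group in mushdic.items():
    print(key, len(group))  # e 5, then p 2
```

With the real file, replacing io.StringIO(SAMPLE) with the file path should behave the same way, as long as the file really has no header row.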
