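I'm trying to build a DataFrame from data read from a GitHub URL. The new DataFrame should split the Age column of the GitHub file into two columns: AGE_12, holding Age values between 1 and 12, and AGE_TEEN, holding Age values between 13 and 19. However, when I assign the data for the age-12 and teen values to columns in the new DataFrame, I end up with NaN values for them. I tried switching the order of the columns; sometimes AGE_12 produces the correct values but the other column doesn't, and vice versa.
Here is my code: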
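import pandas as pd

# URL of the raw CSV file on GitHub
url = 'https://raw.githubusercontent.com/wesm/pydata-book/2nd-edition/datasets/titanic/train.csv'
# Create a DataFrame from the raw GitHub link.
# on_bad_lines='skip' (pandas 1.3+) replaces error_bad_lines=False, which was removed in pandas 2.0.
data = pd.read_csv(url, on_bad_lines='skip')
# Rows whose Age falls in each range (between() includes both endpoints)
AGE_12 = data[data['Age'].between(1,12)]
AGE_TEEN = data[data['Age'].between(13,19)]
# Build the new DataFrame from the class column and the two age subsets
pasUpto19 = pd.DataFrame()
pasUpto19 = pasUpto19.assign(PCLASS=data['Pclass'],AGE_12=AGE_12['Age'],AGE_TEEN=AGE_TEEN['Age'])
print(pasUpto19)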
Its output is:
     PCLASS  AGE_12  AGE_TEEN
0         3     NaN       NaN
1         1     NaN       NaN
2         3     NaN       NaN
3         1     NaN       NaN
4         3     NaN       NaN
..      ...     ...       ...
886       2     NaN       NaN
887       1     NaN      19.0
888       3     NaN       NaN
889       1     NaN       NaN
890       3     NaN       NaN
Apologies in advance if I've done something silly; I'm very new to Python and to using pandas.
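A likely explanation for the NaN values, sketched on a small made-up frame rather than the Titanic file: `assign` aligns each Series on the row index, so every row whose Age falls outside a range has no matching entry and is filled with NaN. The frame `data` below and its values are illustrative assumptions, not the real dataset.

```python
import pandas as pd

# Hypothetical stand-in for the Titanic data (values are made up)
data = pd.DataFrame({
    "Pclass": [3, 1, 3, 2, 1],
    "Age": [4.0, 38.0, 15.0, None, 19.0],
})

# Each filtered Series keeps only the index labels of the rows that matched
age_12 = data.loc[data["Age"].between(1, 12), "Age"]     # index labels: [0]
age_teen = data.loc[data["Age"].between(13, 19), "Age"]  # index labels: [2, 4]

# assign() aligns on the index; rows without a match become NaN
result = (
    data[["Pclass"]]
    .rename(columns={"Pclass": "PCLASS"})
    .assign(AGE_12=age_12, AGE_TEEN=age_teen)
)
print(result)
```

Seen this way, the mostly-NaN output is expected: only the rows whose Age lies in a given range carry a value in that column.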
pasUpto19 = pasUpto19.dropna(axis=0, how='all')
This removes the rows in which every value is NaN from the new DataFrame.
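One caveat, shown on a small made-up frame: `how='all'` only drops a row when every column checked is NaN, so a column that is always populated (such as PCLASS) keeps every row in place. Restricting the check with `subset` is one possible workaround; the frame and its values below are illustrative assumptions.

```python
import pandas as pd

# Hypothetical frame shaped like pasUpto19 (values are made up)
pasUpto19 = pd.DataFrame({
    "PCLASS": [3, 1, 3, 2, 1],
    "AGE_12": [4.0, None, None, None, None],
    "AGE_TEEN": [None, None, 15.0, None, 19.0],
})

# Checking all columns drops nothing here, because PCLASS is never NaN
unchanged = pasUpto19.dropna(axis=0, how="all")

# Checking only the two age columns drops rows where both of them are NaN
filtered = pasUpto19.dropna(axis=0, how="all", subset=["AGE_12", "AGE_TEEN"])
print(filtered)
```

The remaining rows still contain NaN in whichever age column does not apply to them, since each passenger falls into at most one of the two ranges.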