Python Pandas: Creating a DataFrame with a field computed by a function

Posted 2024-09-30 20:18:22


I'm trying to create a DataFrame in which one of the fields is computed by a function. To do this, I'm using the following code:

import pandas as pd

def didSurvive(sex):
    return int(sex == "female")


titanic_df = pd.read_csv("test.csv")
submission = pd.DataFrame({
    "PassengerId": titanic_df["PassengerId"],
    "Survived": didSurvive(titanic_df["Sex"])
})
submission.to_csv('titanic-predictions.csv', index=False)

When I run this code, I get the following error:

D:\Documents\kaggle\titanic>python predictor.py
  File "predictor.py", line 3
    def didSurvive() {
                     ^
SyntaxError: invalid syntax
D:\Documents\kaggle\titanic>python predictor.py
D:\Documents\kaggle\titanic>python predictor.py
D:\Documents\kaggle\titanic>python predictor.py
Traceback (most recent call last):
  File "predictor.py", line 10, in
    "Survived": didSurvive(titanic_df["Sex"])
  File "predictor.py", line 4, in didSurvive
    return int(sex == "female")
  File "C:\Python34\lib\site-packages\pandas\core\series.py", line 92, in wrapper
    "{0}".format(str(converter)))
TypeError: cannot convert the series to
D:\Documents\kaggle\titanic>

I think I'm calling int() on a Series of booleans rather than on a single boolean value. How can I fix this?
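The diagnosis can be checked in isolation. The sketch below uses a small hand-made Series (hypothetical values standing in for `titanic_df["Sex"]`, since `test.csv` is not available here). It shows that comparing a Series with `"female"` produces a boolean Series, that `int()` rejects a multi-element Series with a `TypeError`, and that calling the original scalar function once per element with `Series.apply` works. `Series.apply` is an alternative not taken from the original post.

```python
import pandas as pd


def didSurvive(sex):
    # Scalar version from the question: expects a single string.
    return int(sex == "female")


# Hypothetical sample data standing in for titanic_df["Sex"].
sex = pd.Series(["male", "female", "female", "male"])

# Comparing a Series with a scalar is elementwise and returns a boolean Series.
mask = sex == "female"
print(mask.dtype)  # bool

# int() needs a single value, so a multi-element Series is rejected.
try:
    int(mask)
except TypeError as exc:
    print("TypeError:", exc)

# Calling the scalar function once per element avoids the problem.
survived = sex.apply(didSurvive)
print(survived.tolist())  # [0, 1, 1, 0]
```

`apply` runs a Python-level loop, so on large frames a vectorized expression is usually faster.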


2 Answers

To change the dtype of a Series, you can use the astype() method. This should work:

def didSurvive(sex):
    return (sex == "female").astype(int)
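For reference, here is a minimal end-to-end sketch of the question's script using this vectorized `didSurvive`. It assumes a small hand-built DataFrame (hypothetical `PassengerId` and `Sex` values) in place of `pd.read_csv("test.csv")`, so it runs without the Kaggle file.

```python
import pandas as pd


def didSurvive(sex):
    # Vectorized version: sex is a whole Series, so the comparison yields a
    # boolean Series and astype(int) converts every element to 0 or 1.
    return (sex == "female").astype(int)


# Hypothetical stand-in for pd.read_csv("test.csv"); only the two columns
# used here are included.
titanic_df = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4],
    "Sex": ["male", "female", "male", "female"],
})

submission = pd.DataFrame({
    "PassengerId": titanic_df["PassengerId"],
    "Survived": didSurvive(titanic_df["Sex"]),
})
print(submission)

# Writing the file works exactly as in the question:
# submission.to_csv("titanic-predictions.csv", index=False)
```

The comparison and the cast each operate on the whole column at once, which avoids calling a Python function per row.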

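You can also transform the data while importing it from the CSV file. read_csv passes each cell to the converter as a single string, so this needs the scalar didSurvive from the question (int(sex == "female")), not the astype version: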

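titanic_df = pd.read_csv("test.csv", converters={'Sex': didSurvive})
# Rename the converted column so the submission header reads "Survived"
submission = pd.DataFrame(titanic_df, columns=['PassengerId', 'Sex']).rename(columns={'Sex': 'Survived'})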
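A runnable sketch of the converter approach, assuming a few hypothetical CSV rows read through `io.StringIO` instead of `test.csv`. The converter is the scalar `didSurvive`, and the converted `Sex` column is renamed to `Survived` so the output has the header the submission expects.

```python
import io

import pandas as pd


def didSurvive(sex):
    # Scalar version: read_csv calls the converter once per cell.
    return int(sex == "female")


# Hypothetical CSV contents standing in for test.csv.
csv_text = """PassengerId,Sex
1,male
2,female
3,female
"""

titanic_df = pd.read_csv(io.StringIO(csv_text), converters={"Sex": didSurvive})

# Keep the two columns and rename Sex so the file has the expected header.
submission = titanic_df[["PassengerId", "Sex"]].rename(columns={"Sex": "Survived"})
print(submission.to_csv(index=False))
```

This approach overwrites the original `Sex` values at load time, so it suits a throwaway prediction script better than code that still needs the raw column later.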
