Python: How do I fill in missing values in a CSV file?

Posted 2024-09-28 22:21:27


I have CSV data that I need to analyze in Python, and some values in it are missing. Here is a sample of the data:

Sample

ID,ID_TYPE,OB_DATE,VERSION_NUM,MET_DOMAIN_NAME,OB_END_CTIME,OB_DAY_CNT,SRC_ID,REC_ST_IND,PRCP_AMT,OB_DAY_CNT_Q,PRCP_AMT_Q,METO_STMP_TIME,MIDAS_STMP_ETIME,PRCP_AMT_J
90, RAIN, 2006-01-01 00:00,1, WADRAIN,900,1,24109,1011,0,0,6, 2006-01-17 09:04,0,
150, RAIN, 2006-01-01 00:00,1, DLY3208,900,1,30747,1011,0,0,6, 2006-01-09 13:21,3,
174, RAIN, 2006-01-01 00:00,1, WADRAIN,900,1,24775,1011,0.2,0,6, 2006-01-17 09:04,0,
498, RAIN, 2006-01-01 00:00,0, WADRAIN,900,1,1622,1012,0.1,0,1, 2006-01-17 09:04,0,
498, RAIN,,1, WADRAIN,900,31,1622,1022,58.3,0,22576, 2006-03-15 11:41,0,
898, RAIN, 2006-01-01 00:00,0, WADRAIN,900,6,1624,1012,18.5,0,20001,,0,
898, RAIN, 2006-01-01 00:00,1, WADRAIN,900,1,1624,1022,0.4,0,2576, 2006-03-15 11:41,0,
996, RAIN, 2006-01-01 00:00,1, WAMRAIN,900,31,24953,1011,53.5,0,6, 2006-01-31 13:51,0,
997, RAIN, 2006-01-01 00:00,1, WADRAIN,900,1,24953,1011,1.6,0,6, 2006-02-02 12:28,0,
1045, RAIN, 2006-01-01 00:00,1, WADRAIN,900,1,1628,1011,1.1,0,6, 2006-01-17 09:04,0,
1103, RAIN, 2006-01-01 00:00,1, WADRAIN,900,1,24772,1011,2.5,0,6, 2006-01-17 09:04,0,
1358, RAIN, 2006-01-01 00:00,0, WADRAIN,900,11,1633,1012,17.7,0,20001,,0,
1358, RAIN,,1, WADRAIN,900,31,1633,1022,42.5,0,22576, 2006-03-15 11:41,0,
1545, RAIN, 2006-01-01 00:00,1, WADRAIN,900,1,1636,1011,2,0,6, 2006-01-17 09:04,0,
1584, RAIN, 2006-01-01 00:00,1, WADRAIN,900,1,315,1014,2.4,0,2306, 2006-03-15 11:41,0,
1858, RAIN, 2006-01-01 00:00,1, WADRAIN,900,1,1645,1011,0.2,0,6, 2006-01-17 09:04,0,
2247, RAIN, 2006-01-01 00:00,1, WADRAIN,900,1,24781,1011,0.5,0,6, 2006-01-17 09:04,0,
3066, RAIN,,1, WADRAIN,900,1,1655,1011,0.6,0,6, 2006-02-02 12:28,0,
3067, RAIN, 2006-01-01 00:00,0, WADRAIN,900,7,1655,1012,11,0,20001, 2006-01-26 15:08,0,
3067, RAIN, 2006-01-01 00:00,1, WADRAIN,900,31,1655,1022,57.5,0,22576, 2006-03-15 11:41,0,
3507, RAIN, 2006-01-01 00:00,0, WADRAIN,900,2,1657,1012,15.8,0,20001,,0,
3507, RAIN, 2006-01-01 00:00,1, WADRAIN,900,1,1657,1022,0.9,0,2576, 2006-04-13 13:28,0,
4802, RAIN,,0, WADRAIN,900,6,1663,1012,18,0,20001, 2006-01-17 09:04,0,
4802, RAIN, 2006-01-01 00:00,1, WADRAIN,900,1,1663,1022,0.9,0,2576, 2006-03-15 11:41,0,
4941, RAIN, 2006-01-01 00:00,1, WADRAIN,900,1,1664,1011,0.5,0,6, 2006-01-17 09:04,1,
4942, RAIN, 2006-01-01 00:00,1, WADRAIN,900,1,1664,1011,1.2,0,6, 2006-02-02 12:28,0,

Some OB_DATE and METO_STMP_TIME values are missing from the data, and I want to fill in the missing values in these fields.
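Before filling anything, it helps to see exactly where the gaps are. Each field in the sample is preceded by a space, so reading with `skipinitialspace=True` normalises the values, and a missing field then comes through as an empty string. A minimal standard-library sketch, using a few rows copied from the sample; `find_missing` is a hypothetical helper name:

```python
import csv
import io

# A few rows copied from the sample above; in practice, open the real file instead.
SAMPLE = """ID,ID_TYPE,OB_DATE,VERSION_NUM,MET_DOMAIN_NAME,OB_END_CTIME,OB_DAY_CNT,SRC_ID,REC_ST_IND,PRCP_AMT,OB_DAY_CNT_Q,PRCP_AMT_Q,METO_STMP_TIME,MIDAS_STMP_ETIME,PRCP_AMT_J
90, RAIN, 2006-01-01 00:00,1, WADRAIN,900,1,24109,1011,0,0,6, 2006-01-17 09:04,0,
498, RAIN,,1, WADRAIN,900,31,1622,1022,58.3,0,22576, 2006-03-15 11:41,0,
898, RAIN, 2006-01-01 00:00,0, WADRAIN,900,6,1624,1012,18.5,0,20001,,0,
"""

def find_missing(f, columns):
    """Return {column: [1-based data row numbers where the field is empty]}."""
    # skipinitialspace drops the blank after each comma: ' 2006-...' -> '2006-...'
    reader = csv.DictReader(f, skipinitialspace=True)
    missing = {col: [] for col in columns}
    for row_no, row in enumerate(reader, start=1):
        for col in columns:
            if not row[col].strip():
                missing[col].append(row_no)
    return missing

print(find_missing(io.StringIO(SAMPLE), ["OB_DATE", "METO_STMP_TIME"]))
# {'OB_DATE': [2], 'METO_STMP_TIME': [3]}
```

The trailing comma on each data row produces one extra, unnamed field; `DictReader` stores it under the `None` key, so it does not interfere with the named columns.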

My basic questions are:

  1. What is imputation of missing values, and what methods can we use to do it?

I have searched Google a lot, but the concept of imputation is still not clear to me.
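Imputation simply means replacing a missing entry with a substitute value estimated from the data you do have, instead of dropping the whole row. Common strategies include a fixed constant, the column mean (numeric data only), the most frequent value (which also works for strings and dates), and carrying the last observed value forward. A minimal pure-Python sketch of those four, assuming missing entries are represented as `None`; the function names are illustrative and not taken from any library:

```python
from collections import Counter
from statistics import mean

# Illustrative helpers (hypothetical names); None marks a missing entry.

def fill_constant(values, constant):
    """Replace every missing entry with a fixed placeholder."""
    return [constant if v is None else v for v in values]

def fill_mean(values):
    """Numeric data only: replace gaps with the mean of the observed values."""
    m = mean(v for v in values if v is not None)
    return [m if v is None else v for v in values]

def fill_mode(values):
    """Any hashable values (numbers, strings, dates): use the most common one."""
    counts = Counter(v for v in values if v is not None)
    most_common = counts.most_common(1)[0][0]
    return [most_common if v is None else v for v in values]

def fill_forward(values):
    """Carry the last observed value forward; leading gaps stay None."""
    out, last = [], None
    for v in values:
        if v is not None:
            last = v
        out.append(last)
    return out

print(fill_constant([1, None, 3], 0))             # [1, 0, 3]
print(fill_mean([2, None, 4, 6]))                 # [2, 4, 4, 6]
print(fill_mode(["2006-01-01", None, "2006-01-01", "2006-01-02"]))
# ['2006-01-01', '2006-01-01', '2006-01-01', '2006-01-02']
print(fill_forward([None, "a", None, "b", None]))  # [None, 'a', 'a', 'b', 'b']
```

Which strategy is appropriate depends on what the column means; for date columns like OB_DATE, mean imputation does not apply to the raw strings, while mode or forward fill can.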

  2. How can we implement it in Python without using any external libraries?

Using an external library is fine, but I would like to know whether there is a way to do it without one.
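For the second question, the standard library's `csv` module and `collections.Counter` are enough. The sketch below fills each empty OB_DATE or METO_STMP_TIME field with that column's most frequent value. Mode imputation is an assumption here: it suits OB_DATE in this sample, where every observed value is `2006-01-01 00:00`, but for METO_STMP_TIME another rule may fit your analysis better. `impute_csv` is a hypothetical name, and the demo uses a trimmed three-column version of the sample:

```python
import csv
import io
from collections import Counter

def impute_csv(src, dst, columns):
    """Copy CSV rows from src to dst, filling empty fields in `columns`
    with that column's most frequent non-empty value (mode imputation)."""
    # skipinitialspace strips the space that follows each comma in the sample
    reader = csv.reader(src, skipinitialspace=True)
    header = next(reader)
    rows = [row for row in reader if row]  # ignore blank lines
    for col in columns:
        i = header.index(col)
        observed = [row[i] for row in rows if row[i].strip()]
        if not observed:
            continue  # nothing to estimate from; leave the column as-is
        mode = Counter(observed).most_common(1)[0][0]
        for row in rows:
            if not row[i].strip():
                row[i] = mode
    writer = csv.writer(dst, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)  # the trailing empty field on each row is kept

sample = io.StringIO(
    "ID,OB_DATE,METO_STMP_TIME\n"
    "90, 2006-01-01 00:00, 2006-01-17 09:04,\n"
    "498,, 2006-03-15 11:41,\n"
    "898, 2006-01-01 00:00,,\n"
    "898, 2006-01-01 00:00, 2006-03-15 11:41,\n"
)
out = io.StringIO()
impute_csv(sample, out, ["OB_DATE", "METO_STMP_TIME"])
print(out.getvalue())
# ID,OB_DATE,METO_STMP_TIME
# 90,2006-01-01 00:00,2006-01-17 09:04,
# 498,2006-01-01 00:00,2006-03-15 11:41,
# 898,2006-01-01 00:00,2006-03-15 11:41,
# 898,2006-01-01 00:00,2006-03-15 11:41,
```

For the real file, pass file objects opened with `newline=''` (as the `csv` documentation recommends) instead of the `StringIO` objects, and write to a separate output file so the original data is preserved.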


1 Answer
Anonymous user
#1 · Posted 2024-09-28 22:21:27

I'm a beginner; I hope this helps!

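import numpy as np
import pandas as pd
# sklearn.preprocessing.Imputer was removed in scikit-learn 0.22; SimpleImputer replaces it
from sklearn.impute import SimpleImputer

# The data rows end with a trailing comma: index_col=False stops pandas from turning
# ID into the index, and skipinitialspace drops the space after each comma
dataset = pd.read_csv('filename/path', index_col=False, skipinitialspace=True)

# Column 2 is OB_DATE and the third-to-last column is METO_STMP_TIME.
# Both hold date strings, so strategy='mean' (numeric only) cannot work;
# 'most_frequent' accepts strings. Empty cells are read as np.nan, not the string 'Nan'.
cols = [dataset.columns[2], dataset.columns[-3]]
imputer = SimpleImputer(missing_values=np.nan, strategy='most_frequent')
dataset[cols] = imputer.fit_transform(dataset[cols])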
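scikit-learn's `'mean'` strategy can only be used with numeric data, so it cannot be applied directly to date strings such as OB_DATE or METO_STMP_TIME. If a mean is really what you want for a timestamp column, one option is to parse the strings into `datetime` objects, average their offsets from the earliest one, and format the result back. The sketch below assumes the `YYYY-MM-DD HH:MM` format seen in the sample and treats empty strings as missing; `fill_mean_timestamp` is a hypothetical helper, and whether an average timestamp is meaningful for this data is a judgement call:

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M"  # assumed format, as in '2006-01-17 09:04'

def fill_mean_timestamp(values):
    """Replace empty strings with the mean of the non-empty timestamps."""
    present = [v.strip() for v in values if v.strip()]
    parsed = [datetime.strptime(v, FMT) for v in present]
    base = min(parsed)
    # datetimes cannot be summed directly, so average the offsets from the earliest one
    offset = sum((d - base for d in parsed), timedelta()) / len(parsed)
    filler = (base + offset).strftime(FMT)  # seconds are dropped by FMT
    return [v.strip() if v.strip() else filler for v in values]

print(fill_mean_timestamp([" 2006-01-17 09:04", "", " 2006-01-19 09:04"]))
# ['2006-01-17 09:04', '2006-01-18 09:04', '2006-01-19 09:04']
```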
