Using Pandas to parse a date and Julian day stored in separate columns

    0     1    2     3       4      5       6      7      8     9    10  11
0  42  2012  106  1200  -0.325  0.576  -0.295  31.73  14.80  1096  99.3  55
1  42  2012  106  1200  -0.372  0.499  -0.236  31.74  14.80  1097  99.3  56
2  42  2012  106  1200  -0.372  0.456  -0.212  31.74  14.80  1096  99.3  57
3  42  2012  106  1200  -0.312  0.736  -0.095  31.70  14.81  1097  99.3  58
4  42  2012  106  1200  -0.352  0.707  -0.035  31.66  14.78  1094  99.3  59
5  42  2012  106  1200  -0.518  0.662  -0.152  31.66  14.79  1094  99.3  60
6  42  2012  106  1200  -0.516  0.531  -0.249  31.78  14.79  1094  99.3  61

                0       4      5       6      7      8     9    10  11
1_2_3
2012 106 1200  42  -0.325  0.576  -0.295  31.73  14.80  1096  99.3  55
2012 106 1200  42  -0.372  0.499  -0.236  31.74  14.80  1097  99.3  56
2012 106 1200  42  -0.372  0.456  -0.212  31.74  14.80  1096  99.3  57
2012 106 1200  42  -0.312  0.736  -0.095  31.70  14.81  1097  99.3  58
2012 106 1200  42  -0.352  0.707  -0.035  31.66  14.78  1094  99.3  59
2012 106 1200  42  -0.518  0.662  -0.152  31.66  14.79  1094  99.3  60
2012 106 1200  42  -0.516  0.531  -0.249  31.78  14.79  1094  99.3  61

1 Answer

User

#1 · Posted 2024-10-05 12:16:55

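Using a date_parser like that will be very inefficient. Unless you have ISO dates, it is usually better to parse the columns after reading the file.

But here it is anyway. The key is that date_parser is called with as many arguments as the number of columns you pass to it (3 in this case).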

In [12]: dateparse = lambda a,b,c: datetime.datetime.strptime(' '.join([a,b,c]), '%Y %j %H%M')

In [13]: pd.read_csv(StringIO(data), 
     header=None, 
     parse_dates=[[2,3,4]], 
     sep=r'\s+', skiprows=1, 
     date_parser=dateparse)
Out[13]: 
                2_3_4  0   1      5      6      7      8      9    10    11  12
0 2012-04-15 12:00:00  0  42 -0.325  0.576 -0.295  31.73  14.80  1096  99.3  55
1 2012-04-15 12:00:00  1  42 -0.372  0.499 -0.236  31.74  14.80  1097  99.3  56
2 2012-04-15 12:00:00  2  42 -0.372  0.456 -0.212  31.74  14.80  1096  99.3  57
3 2012-04-15 12:00:00  3  42 -0.312  0.736 -0.095  31.70  14.81  1097  99.3  58
4 2012-04-15 12:00:00  4  42 -0.352  0.707 -0.035  31.66  14.78  1094  99.3  59
5 2012-04-15 12:00:00  5  42 -0.518  0.662 -0.152  31.66  14.79  1094  99.3  60
6 2012-04-15 12:00:00  6  42 -0.516  0.531 -0.249  31.78  14.79  1094  99.3  61
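The format string used by the parser can be checked in isolation with the standard library alone. The sketch below assumes the three fields arrive as a year, a day of the year, and an HHMM time; the helper name `parse_year_doy_hhmm` and the zero-padding step are illustrative additions, not part of the answer above.

```python
from datetime import datetime


def parse_year_doy_hhmm(year, day_of_year, hhmm):
    """Combine year, day-of-year and HHMM fields into one datetime."""
    # zfill(4) is an added safeguard (an assumption, not in the answer) so
    # that a time such as 45 is read as 00:45 instead of being mis-split.
    text = f"{year} {day_of_year} {str(hhmm).zfill(4)}"
    # %j is the day of the year (1 = 1 January), %H%M the packed time.
    return datetime.strptime(text, "%Y %j %H%M")


stamp = parse_year_doy_hhmm("2012", "106", "1200")
print(stamp)  # 2012-04-15 12:00:00
```

Day 106 of the leap year 2012 is 15 April, which agrees with the timestamps in the output above.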

Here are some other approaches.
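The snippets that follow index the frame as df['1'], df['2'] and df['3'], which suggests the file was read with its header row so that the column labels are strings. The code that originally built `df` is not available here, so the following is only a guess at a compatible setup: it assumes the whitespace-separated layout from the question and relies on pandas using the first column as the index when the header has one field fewer than the data rows.

```python
from io import StringIO

import pandas as pd

# Two sample rows copied from the question; the header has 12 labels while
# each data row has 13 fields, so the first field becomes the index.
data = """\
    0     1    2     3       4      5       6      7      8     9    10  11
0  42  2012  106  1200  -0.325  0.576  -0.295  31.73  14.80  1096  99.3  55
1  42  2012  106  1200  -0.372  0.499  -0.236  31.74  14.80  1097  99.3  56
"""

# Hypothetical reconstruction: read without any date parsing at all.
df = pd.read_csv(StringIO(data), sep=r"\s+")

print(df[["1", "2", "3"]])
```

With this setup the year, day of the year and HHMM time sit in columns '1', '2' and '3' as integers.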


Cast the columns to strings, join them, and parse the result.

In [47]: pd.to_datetime(df['1'].astype(str) + ' ' + df['2'].astype(str) + ' ' + df['3'].astype(str), format='%Y %j %H%M')
Out[47]: 
0   2012-04-15 12:00:00
1   2012-04-15 12:00:00
2   2012-04-15 12:00:00
3   2012-04-15 12:00:00
4   2012-04-15 12:00:00
5   2012-04-15 12:00:00
6   2012-04-15 12:00:00
dtype: datetime64[ns]
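A self-contained version of the string approach might look like the sketch below. The small frame and its extra rows (a 12:30 reading and a 00:45 reading on day 7) are made up for illustration, and the zero-padding with `str.zfill` is an added assumption to keep short day or time values from being mis-parsed.

```python
import pandas as pd

# Hypothetical sample: year, day of year, HHMM time as integer columns.
df = pd.DataFrame({
    "1": [2012, 2012, 2012],
    "2": [106, 106, 7],
    "3": [1200, 1230, 45],
})

# Build "YYYY DDD HHMM" strings, padding day and time to fixed widths.
text = (
    df["1"].astype(str)
    + " " + df["2"].astype(str).str.zfill(3)
    + " " + df["3"].astype(str).str.zfill(4)
)

# Parse the whole column in one vectorized call.
stamps = pd.to_datetime(text, format="%Y %j %H%M")
print(stamps)
```

Passing an explicit format keeps pandas from guessing the layout of each string.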

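Another approach: build the timestamp arithmetically from the year, the day of the year, and the hours and minutes packed into the HHMM column. One day is subtracted because day 1 is 1 January itself, and the hours use integer division so that a time such as 1230 contributes exactly 12 hours.

In [48]: pd.to_datetime(df['1'],format='%Y') + pd.to_timedelta(df['2'],unit='d') + pd.to_timedelta(df['3']//100,unit='h') + pd.to_timedelta(df['3']%100,unit='m') - pd.Timedelta('1d')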
Out[48]: 
0   2012-04-15 12:00:00
1   2012-04-15 12:00:00
2   2012-04-15 12:00:00
3   2012-04-15 12:00:00
4   2012-04-15 12:00:00
5   2012-04-15 12:00:00
6   2012-04-15 12:00:00
dtype: datetime64[ns]
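The arithmetic approach can be sketched on its own as follows. The sample frame is hypothetical, with rows chosen to include a non-zero minute value; the year is cast to a string before parsing and the unit names "D", "h" and "min" are used as a cautious choice across pandas versions.

```python
import pandas as pd

# Hypothetical sample: year, day of year, HHMM time as integer columns.
df = pd.DataFrame({
    "1": [2012, 2012, 2012],
    "2": [106, 106, 7],
    "3": [1200, 1230, 45],
})

stamps = (
    pd.to_datetime(df["1"].astype(str), format="%Y")   # 1 January of the year
    + pd.to_timedelta(df["2"] - 1, unit="D")           # day 1 adds zero days
    + pd.to_timedelta(df["3"] // 100, unit="h")        # whole hours from HHMM
    + pd.to_timedelta(df["3"] % 100, unit="min")       # remaining minutes
)
print(stamps)
```

Integer division matters here: dividing 1230 by 100 as a float would give 12.3 hours, which is 12:18 before the 30 minutes are added.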
