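So, to summarize:
I have two dataframes covering the same calendar year but with different time averaging. DF1 has one row per minute; DF2 has one row per day.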
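I am now thinking of using a "days-since" key, days = 1, 2, 3, ... I would put all the columns of DF2 into DF1, repeating that day's values on every row belonging to that day, and then move on to the next day.
Further down is a code example that associates hourly data with minute data.
Tab-delimited example: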
Date Pressure Temperature Salinity Density
9/12/2014 20.67517553 9.467564621 34.75207884 1026.945064
9/13/2014 20.50534192 9.081091137 34.77935638 1027.028736
I would like to associate it with the following:
Date seawater_pressure seawater_temperature seawater_conductivity practical_salinity density lat lon Years
9/12/2014 0:00 177.859887 4.574663842 3.307338475 34.90723316 1028.476924 59.97533 -39.48183 2014.697222
9/12/2014 0:03 214.3598333 4.397781667 3.292384278 34.89887436 1028.659543 59.97533 -39.48183 2014.697222
9/12/2014 0:06 264.5863333 4.208137222 3.276747278 34.88825043 1028.905126 59.97533 -39.48183 2014.697222
9/12/2014 0:09 314.3161111 4.1242 3.271341056 34.88661059 1029.14336 59.97533 -39.48183 2014.697222
9/13/2014 21:00 2608.358764 1.83854382 3.163967753 34.87050646 1039.841076 59.97533 -39.48183 2014.7
9/13/2014 21:03 2571.051778 2.073876111 3.1833685 34.87381988 1039.643173 59.97533 -39.48183 2014.7
9/13/2014 21:06 2520.0315 2.334582222 3.204682722 34.87920389 1039.3818 59.97533 -39.48183 2014.7
9/13/2014 21:09 2469.559944 2.569326667 3.224910833 34.89808956 1039.136967 59.97533 -39.48183 2014.7
9/13/2014 21:12 2419.662011 2.67413743 3.23247419 34.90147888 1038.90175 59.97533 -39.48183 2014.7
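The days-since idea described above could be sketched roughly as follows. This is a minimal illustration, not a tested solution for the full dataset: the two small frames are hypothetical miniatures built from a few of the sample values, day 1 is assumed to be the earliest calendar day present in either frame, and the dates are assumed to be month/day/year.

```python
import pandas as pd

# Hypothetical miniature versions of DF1 (minute data) and DF2 (daily data),
# built from a few of the sample values shown above.
df1 = pd.DataFrame({
    "Date": ["9/12/2014 0:00", "9/12/2014 0:03",
             "9/13/2014 21:00", "9/13/2014 21:03"],
    "seawater_pressure": [177.859887, 214.3598333, 2608.358764, 2571.051778],
})
df2 = pd.DataFrame({
    "Date": ["9/12/2014", "9/13/2014"],
    "Pressure": [20.67517553, 20.50534192],
    "Temperature": [9.467564621, 9.081091137],
})

# Assumption: dates are month/day/year.
df1["Date"] = pd.to_datetime(df1["Date"], format="%m/%d/%Y %H:%M")
df2["Date"] = pd.to_datetime(df2["Date"], format="%m/%d/%Y")

# Assumption: day 1 is the earliest calendar day found in either frame.
start = min(df1["Date"].min(), df2["Date"].min()).normalize()
df1["days"] = (df1["Date"].dt.normalize() - start).dt.days + 1
df2["days"] = (df2["Date"].dt.normalize() - start).dt.days + 1

# A left merge on the day number repeats each daily row on every
# minute row that falls on the same day.
merged = df1.merge(df2.drop(columns="Date"), on="days", how="left")
print(merged)
```

Because the merge key is an integer day number, the daily values are broadcast without building a minute-by-minute copy of DF2, which may matter when there is a lot of data.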
I have a lot of data. The code below splits things up in a messy way, but I think a days-since filter would work better.
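import re
import pandas as pd

# Hourly data. Columns are separated by two or more spaces so that the
# single space inside each timestamp is preserved when splitting.
data = '''Date  pressure  temperature  density
9/12/2014 9:00  170  4.0  1028
9/12/2014 10:00  368  4.2  1028.5
9/12/2014 11:00  368  4.2  1028.5'''
da = [re.split(r" {2,}", l) for l in data.split("\n")]
df2 = pd.DataFrame(da[1:], columns=da[0])

# Minute data, same layout.
data = '''Date  pressure  temperature  density
9/12/2014 9:00  177.859887  4.574663842  1028.477
9/12/2014 9:01  214.3598333  4.397781667  1028.66
9/12/2014 9:55  264.5863333  4.208137222  1028.905
9/12/2014 10:01  314.3161111  4.1242  1029.143
9/12/2014 10:02  363.8005587  4.02983352  1029.377'''
da = [re.split(r" {2,}", l) for l in data.split("\n")]
df1 = pd.DataFrame(da[1:], columns=da[0])

# The values were parsed as strings; convert the measurements to floats.
cols = ["pressure", "temperature", "density"]
df1[cols] = df1[cols].astype(float)
df2[cols] = df2[cols].astype(float)

# Dates are month/day/year (the question's data includes 9/13/2014).
df1.index = pd.to_datetime(df1.Date, format="%m/%d/%Y %H:%M", utc=True)
df2.index = pd.to_datetime(df2.Date, format="%m/%d/%Y %H:%M", utc=True)

# Upsample the hourly frame to a 1-minute grid, forward-fill each hour's
# values, then join them onto the minute rows by timestamp.
df3 = df1.join(df2.resample("1min").ffill(), rsuffix="_hourly")
df3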
This is not exactly what you asked for, but it may help.
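For daily rather than hourly data, one possible adaptation is `pandas.merge_asof`, which attaches to each minute row the most recent daily row at or before it, without first resampling the daily frame to a 1-minute grid. The sketch below is only illustrative: the frame names `minute` and `daily` are hypothetical, the values are a few taken from the question's samples, and each daily timestamp is assumed to mark the start of the day it applies to.

```python
import pandas as pd

# Hypothetical miniature frames using a few of the question's sample values.
minute = pd.DataFrame({
    "Date": pd.to_datetime(["2014-09-12 00:00", "2014-09-12 00:03",
                            "2014-09-13 21:00"]),
    "seawater_temperature": [4.574663842, 4.397781667, 1.83854382],
})
daily = pd.DataFrame({
    "Date": pd.to_datetime(["2014-09-12", "2014-09-13"]),
    "Temperature": [9.467564621, 9.081091137],
})

# merge_asof requires both frames to be sorted by the key column.
minute = minute.sort_values("Date")
daily = daily.sort_values("Date")

# Assumption: a daily timestamp at midnight marks the start of its day, so
# a backward search picks the same-day value for every minute row.
combined = pd.merge_asof(minute, daily, on="Date", direction="backward")
print(combined)
```

If some days are missing from the daily frame, passing `tolerance=pd.Timedelta("1D")` should keep values from an earlier day from being carried forward indefinitely; whether that is desirable depends on the data.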