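I have a weather data file containing high temperature, low temperature, precipitation, and so on. I need to open the file and return data based on a range of years entered by the user. The user enters a start date and an end date, I put the matching data into a list, and the user can then search the sublist for that range of years for the highest temperature (high temp), the lowest temperature (low temp), or the highest precipitation (PRCP). Right now I can search for strings, but I don't know how to identify, for example, the high temperatures: collect them from the sublist, find the highest one, and return that data. The same goes for low temperature and precipitation.
Here is what I have so far: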
def openFile():
    begin = input("Enter your starting year in this format YYYY ")
    end = input("Enter your ending year for weather data in this format YYYY ")
    # Read every line of the data file; the with-block closes the file afterwards
    with open('/Users/jasontt/test/spokaneweatherdata.txt', 'r') as f:
        lines = tuple(f)
    #print(lines)
    print("")
    #print(lines[1])
    print("")
    # Lines that contain the starting year anywhere in their text
    result = [i for i in lines if begin in i]
    #print("This is beginning data ", result)
    # Lines that contain the ending year anywhere in their text
    resultTwo = [i for i in lines if end in i]
    #print("This is end of data ", resultTwo)
    # Combined list based on years entered (flat list, not a list inside a list)
    ultimateList = result + resultTwo
    # Combined list of weather data for years selected
    print(ultimateList)
Test data:
STATION STATION_NAME ELEVATION LATITUDE LONGITUDE DATE PRCP TEMPMAX TEMPMIN
----------------- -------------------------------------------------- ---------- ---------- ---------- -------- -------- -------- --------
GHCND:USW00013741 SPOKANE REGIONAL AIRPORT WA US 366.1 37.31667 -79.96667 19490101 0.00 44 27
GHCND:USW00013741 SPOKANE REGIONAL AIRPORT WA US 366.1 37.31667 -79.96667 19490102 0.00 42 25
GHCND:USW00013741 SPOKANE REGIONAL AIRPORT WA US 366.1 37.31667 -79.96667 19490103 0.15 46 30
GHCND:USW00013741 SPOKANE REGIONAL AIRPORT WA US 366.1 37.31667 -79.96667 19490104 0.03 41 30
GHCND:USW00013741 SPOKANE REGIONAL AIRPORT WA US 366.1 37.31667 -79.96667 19490105 1.14 46 37
GHCND:USW00013741 SPOKANE REGIONAL AIRPORT WA US 366.1 37.31667 -79.96667 19490106 0.00 51 40
GHCND:USW00013741 SPOKANE REGIONAL AIRPORT WA US 366.1 37.31667 -79.96667 19490107 0.00 57 36
GHCND:USW00013741 SPOKANE REGIONAL AIRPORT WA US 366.1 37.31667 -79.96667 19490108 0.00 56 45
GHCND:USW00013741 SPOKANE REGIONAL AIRPORT WA US 366.1 37.31667 -79.96667 19490109 0.00 66 42
GHCND:USW00013741 SPOKANE REGIONAL AIRPORT WA US 366.1 37.31667 -79.96667 19490110 0.00 70 51
GHCND:USW00013741 SPOKANE REGIONAL AIRPORT WA US 366.1 37.31667 -79.96667 19490111 0.03 59 45
GHCND:USW00013741 SPOKANE REGIONAL AIRPORT WA US 366.1 37.31667 -79.96667 19490112 0.04 48 38
GHCND:USW00013741 SPOKANE REGIONAL AIRPORT WA US 366.1 37.31667 -79.96667 19490113 0.00 52 36
GHCND:USW00013741 SPOKANE REGIONAL AIRPORT WA US 366.1 37.31667 -79.96667 19490114 0.00 56 36
GHCND:USW00013741 SPOKANE REGIONAL AIRPORT WA US 366.1 37.31667 -79.96667 19490115 0.00 49 31
GHCND:USW00013741 SPOKANE REGIONAL AIRPORT WA US 366.1 37.31667 -79.96667 19490116 0.00 68 28
GHCND:USW00013741 SPOKANE REGIONAL AIRPORT WA US 366.1 37.31667 -79.96667 19490117 0.00 63 50
GHCND:USW00013741 SPOKANE REGIONAL AIRPORT WA US 366.1 37.31667 -79.96667 19490118 0.04 53 42
GHCND:USW00013741 SPOKANE REGIONAL AIRPORT WA US 366.1 37.31667 -79.96667 19490119 0.01 63 38
GHCND:USW00013741 SPOKANE REGIONAL AIRPORT WA US 366.1 37.31667 -79.96667 19490120 0.00 45 28
GHCND:USW00013741 SPOKANE REGIONAL AIRPORT WA US 366.1 37.31667 -79.96667 19490121 0.97 35 28
GHCND:USW00013741 SPOKANE REGIONAL AIRPORT WA US 366.1 37.31667 -79.96667 19490122 0.29 60 34
GHCND:USW00013741 SPOKANE REGIONAL AIRPORT WA US 366.1 37.31667 -79.96667 19490123 0.14 47 38
GHCND:USW00013741 SPOKANE REGIONAL AIRPORT WA US 366.1 37.31667 -79.96667 19490124 0.01 72 38
GHCND:USW00013741 SPOKANE REGIONAL AIRPORT WA US 366.1 37.31667 -79.96667 19490125 0.05 66 49
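It's hard to tell from the copy-pasted data sample, but it looks like your file uses a "fixed-width" line format: every column in a line starts at a given position and ends at a given position. This was a fairly common kind of "format" back in the day.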
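So what you want here is to write down each column's name together with its start and end positions, so that you can easily parse a line into fields.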
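A minimal sketch of such a column table follows. The widths are an assumption read off the dashed ruler line in the sample (17, 50, 10, 10, 10, 8, 8, 8, 8, separated by one space); check them against the real file before relying on them. `WIDTHS`, `build_columns`, and `COLUMNS` are names made up for this illustration.

```python
# (name, width) pairs. The widths are ASSUMED from the dashed ruler line
# in the sample header; verify them against the actual file.
WIDTHS = [
    ("STATION", 17), ("STATION_NAME", 50), ("ELEVATION", 10),
    ("LATITUDE", 10), ("LONGITUDE", 10), ("DATE", 8),
    ("PRCP", 8), ("TEMPMAX", 8), ("TEMPMIN", 8),
]

def build_columns(widths, gap=1):
    """Turn (name, width) pairs into (name, start, end) slice positions.

    `gap` is the number of separator characters between two columns.
    """
    columns, pos = [], 0
    for name, width in widths:
        columns.append((name, pos, pos + width))
        pos += width + gap
    return columns

COLUMNS = build_columns(WIDTHS)
```

Computing the positions from the widths avoids off-by-one mistakes that creep in when start and end offsets are typed by hand.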
With that in place, you can parse the file into a sequence of field dicts.
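One way this parsing step could look, as a sketch only: the column positions below are the same assumed positions derived from the ruler line, and `parse_line` and `parse_lines` are hypothetical helper names. The file is assumed to start with two header lines (the column names and the dashed ruler).

```python
# (name, start, end) positions; ASSUMED from the sample's ruler line.
COLUMNS = [
    ("STATION", 0, 17), ("STATION_NAME", 18, 68), ("ELEVATION", 69, 79),
    ("LATITUDE", 80, 90), ("LONGITUDE", 91, 101), ("DATE", 102, 110),
    ("PRCP", 111, 119), ("TEMPMAX", 120, 128), ("TEMPMIN", 129, 137),
]

def parse_line(line, columns=COLUMNS):
    """Slice one fixed-width line into a dict of stripped strings."""
    return {name: line[start:end].strip() for name, start, end in columns}

def parse_lines(lines, columns=COLUMNS, header_lines=2):
    """Yield one dict per data line, skipping the header and blank lines."""
    for number, line in enumerate(lines):
        if number < header_lines or not line.strip():
            continue
        yield parse_line(line.rstrip("\n"), columns)
```

Because `parse_lines` accepts any iterable of lines, an open file works directly: `with open(path) as f: rows = list(parse_lines(f))`.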
You can then filter and sort on column values, build a pandas DataFrame, and so on.
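As a sketch of the filtering the question asks for, the helper below keeps only the rows whose year falls inside the requested range and then picks the row with the largest or smallest value of one column. The three rows are copied from the sample data; `in_year_range` and `extreme` are hypothetical names, and the values are assumed to still be strings, so they are converted with `float` for the comparison.

```python
# Three rows taken from the sample data, already parsed into dicts of strings.
ROWS = [
    {"DATE": "19490102", "PRCP": "0.00", "TEMPMAX": "42", "TEMPMIN": "25"},
    {"DATE": "19490105", "PRCP": "1.14", "TEMPMAX": "46", "TEMPMIN": "37"},
    {"DATE": "19490124", "PRCP": "0.01", "TEMPMAX": "72", "TEMPMIN": "38"},
]

def in_year_range(row, begin, end):
    """True if the row's YYYYMMDD date lies in the years begin..end inclusive."""
    return begin <= int(row["DATE"][:4]) <= end

def extreme(rows, field, begin, end, pick=max):
    """Return the row with the highest (or lowest) `field` in the year range.

    Pass pick=min for the lowest value. Returns None if no row matches.
    """
    selected = [row for row in rows if in_year_range(row, begin, end)]
    if not selected:
        return None
    # Compare numerically; comparing the raw strings would sort "9" above "10".
    return pick(selected, key=lambda row: float(row[field]))

hottest = extreme(ROWS, "TEMPMAX", 1949, 1949)
coldest = extreme(ROWS, "TEMPMIN", 1949, 1949, pick=min)
wettest = extreme(ROWS, "PRCP", 1949, 1949)
```

Returning the whole row rather than just the number keeps the date and station available for display. If pandas is installed, `pandas.read_fwf` with its `colspecs` argument can read fixed-width files into a DataFrame directly.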
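Note that you can (and probably want to) convert the data to the proper types while parsing. With the above as a starting point, you should be able to do this quite easily.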
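One possible way to fold the conversion into parsing is to store a converter next to each column's positions, as sketched below. The positions are the same assumed ones as before, and the choice of types (`float` for PRCP, `int` for the temperatures, a `date` for DATE) is a guess based on the sample values. `TYPED_COLUMNS`, `parse_date`, and `parse_typed_line` are hypothetical names.

```python
from datetime import datetime

def parse_date(text):
    """Convert a YYYYMMDD string such as '19490105' into a date object."""
    return datetime.strptime(text, "%Y%m%d").date()

# (name, start, end, converter); positions and types are ASSUMPTIONS
# based on the sample data, not a confirmed description of the file.
TYPED_COLUMNS = [
    ("STATION", 0, 17, str), ("STATION_NAME", 18, 68, str),
    ("ELEVATION", 69, 79, float), ("LATITUDE", 80, 90, float),
    ("LONGITUDE", 91, 101, float), ("DATE", 102, 110, parse_date),
    ("PRCP", 111, 119, float), ("TEMPMAX", 120, 128, int),
    ("TEMPMIN", 129, 137, int),
]

def parse_typed_line(line, columns=TYPED_COLUMNS):
    """Slice one fixed-width line and convert each field to its type."""
    return {
        name: convert(line[start:end].strip())
        for name, start, end, convert in columns
    }
```

With typed rows, a year filter becomes `begin <= row["DATE"].year <= end`, and `max(rows, key=lambda row: row["TEMPMAX"])` compares numbers rather than strings.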