Creating box plots in Pandas with multiple conditions

Posted 2024-10-03 11:24:18


                         time        xco2        lon        lat  mask  front  flag          alt  type
time
2016-07-18 18:00:40  64835.00  400.345876 -77.665768  40.444690  1.00    2.0  0.00  3198.345000  warm
2016-07-18 18:00:50  64845.00  400.694926 -77.679259  40.450737  0.98    2.0  0.00  3199.400000  warm
2016-07-18 18:01:00  64855.00  401.107295 -77.692715  40.456796  0.98    2.0  0.00  3197.810000  warm
2016-07-18 18:01:10  64865.00  401.566160 -77.706165  40.462843  0.95    2.0  0.00  3196.500000  warm
2016-07-18 18:01:20  64875.00  401.752364 -77.719628  40.468837  1.00    2.0  0.00  3197.945000  warm
...                       ...         ...        ...        ...   ...    ...   ...          ...   ...
2016-07-18 18:50:30  67825.00  391.580408 -80.799363  41.847582  0.81    NaN  0.00  3158.575000  cold
2016-07-18 18:50:40  67835.00  392.728223 -80.809320  41.851846  1.00    NaN  0.00  3241.930000  cold
2016-07-18 18:50:50  67845.00  392.051042 -80.819123  41.855974  0.43    NaN  1.14  3340.510000  cold
2016-07-18 18:51:00  67855.00  392.827331 -80.828735  41.860006  1.00    NaN  0.00  3428.665000  cold
2016-07-18 18:51:10  67862.95  392.934952 -80.836415  41.863085  1.00    NaN  0.00  3483.171186  cold

304 rows × 9 columns

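I have many days of data to process, and the way I am doing it now is very time-consuming, so I need a more efficient approach. I need to split the data into cold and warm, and I already have a column that indicates which is which. Then I need one box-and-whisker for every 0.5 degrees of latitude. At the moment I manually create a new column for every half degree of data. The image shows what I have been doing, along with a snapshot of the data layout.

[Image: This is the old way of doing it, and many of the days require many more columns]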

warm=np.arange(41.367440,44.13,0.25)
cold=np.arange(44.141705,46.321997,0.25)
print(warm)
print(cold)
xco2_0=df_layer102['XCO2'].loc()[(df_layer102['lat'] > warm[0]) & (df_layer102['lat'] <= warm[1])]
xco2_1=df_layer102['XCO2'].loc()[(df_layer102['lat'] > warm[1]) & (df_layer102['lat'] <= warm[2])]
xco2_2=df_layer102['XCO2'].loc()[(df_layer102['lat'] > warm[2]) & (df_layer102['lat'] <= warm[3])]
xco2_3=df_layer102['XCO2'].loc()[(df_layer102['lat'] > warm[3]) & (df_layer102['lat'] <= warm[4])]
xco2_4=df_layer102['XCO2'].loc()[(df_layer102['lat'] > warm[4]) & (df_layer102['lat'] <= warm[5])]
xco2_5=df_layer102['XCO2'].loc()[(df_layer102['lat'] > warm[5]) & (df_layer102['lat'] <= warm[6])]
xco2_6=df_layer102['XCO2'].loc()[(df_layer102['lat'] > warm[6]) & (df_layer102['lat'] <= warm[7])]
xco2_7=df_layer102['XCO2'].loc()[(df_layer102['lat'] > warm[7]) & (df_layer102['lat'] <= warm[8])]
xco2_8=df_layer102['XCO2'].loc()[(df_layer102['lat'] > warm[8]) & (df_layer102['lat'] <= warm[9])]
xco2_9=df_layer102['XCO2'].loc()[(df_layer102['lat'] > warm[9]) & (df_layer102['lat'] <= warm[10])]
xco2_10=df_layer102['XCO2'].loc()[(df_layer102['lat'] > warm[10]) & (df_layer102['lat'] <= warm[11])]
# xco2_11=df_layer10['XCO2'].loc()[(df_layer10['lat'] > warm[11]) & (df_layer10['lat'] <= warm[12])]
# xco2_12=df_layer10['XCO2'].loc()[(df_layer10['lat'] > warm[12]) & (df_layer10['lat'] <= warm[13])]
# xco2_11=df_layer10['XCO2'].loc()[(df_layer10['lat'] > warm[11]) & (df_layer10['lat'] <= cold[0])]
xco2_11=df_layer102['XCO2'].loc()[(df_layer102['lat'] >= cold[0]) & (df_layer102['lat'] <= cold[1])]
xco2_12=df_layer102['XCO2'].loc()[(df_layer102['lat'] > cold[1]) & (df_layer102['lat'] <= cold[2])]
xco2_13=df_layer102['XCO2'].loc()[(df_layer102['lat'] > cold[2]) & (df_layer102['lat'] <= cold[3])]
xco2_14=df_layer102['XCO2'].loc()[(df_layer102['lat'] > cold[3]) & (df_layer102['lat'] <= cold[4])]
xco2_15=df_layer102['XCO2'].loc()[(df_layer102['lat'] > cold[4]) & (df_layer102['lat'] <= cold[5])]
xco2_16=df_layer102['XCO2'].loc()[(df_layer102['lat'] > cold[5]) & (df_layer102['lat'] <= cold[6])]
xco2_17=df_layer102['XCO2'].loc()[(df_layer102['lat'] > cold[6]) & (df_layer102['lat'] <= cold[7])]
xco2_18=df_layer102['XCO2'].loc()[(df_layer102['lat'] > cold[7]) & (df_layer102['lat'] <= cold[8])]
# xco2_19=df_layer10['XCO2'].loc()[(df_layer1['lat'] > cold[8]) & (df_layer10['lat'] <= cold[9])]
# xco2_19=df_avg_up05['xco2_up'].loc()[(df_avg_up05['lat_up'] > num1[5]) & (df_avg_up05['lat_up'] <= num1[6])]
# data_group_mid={'35 \u00b0':xco2_35_36, '36 \u00b0':xco2_36_37, '37 \u00b0':xco2_37_38, '38 \u00b0':xco2_38_39, '39 \u00b0':xco2_39_40, '40 \u00b0':xco2_40_41, '41 \u00b0':xco2_41_42}
data_group_front={'46.14\u00b0':xco2_18, '45.89\u00b0':xco2_17, '45.64\u00b0':xco2_16, '45.39\u00b0':xco2_15,
                  '45.14\u00b0':xco2_14, '44.89\u00b0':xco2_13, '44.69\u00b0':xco2_12, '44.39\u00b0':xco2_11,
                  '44.11\u00b0':xco2_10, '43.86\u00b0':xco2_9, '43.61\u00b0':xco2_8, '43.36\u00b0':xco2_7,
                  '43.11\u00b0':xco2_6, '42.86\u00b0':xco2_5, '42.61\u00b0':xco2_4, '42.36\u00b0':xco2_3,
                  '42.11\u00b0':xco2_2, '41.86\u00b0':xco2_1, '41.61\u00b0':xco2_0}
df_xco2_front=pd.DataFrame(data=data_group_front)
df_xco2_front.count()

1 answer
User
#1 · Posted 2024-10-03 11:24:18

Method 1:

You can create a new column that uses pd.cut to bucket 'lat':

df_layer102['lat_bucketed'] = pd.cut(df_layer102['lat'], np.append(warm, cold))

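Here the warm and cold edges are combined into a single array, because the two ranges do not overlap and a column already indicates cold or warm. You can also bucket each of them separately if you prefer.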
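
As a rough sketch of Method 1 on made-up data: `df_demo` and its values are hypothetical stand-ins for `df_layer102`, while the `warm` and `cold` edges are the ones from the question. It shows that one `pd.cut` call plus a `groupby` gives the same per-bucket counts as the hand-built `xco2_0` ... `xco2_18` columns.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for df_layer102 (values are made up for illustration).
rng = np.random.default_rng(0)
lat = np.linspace(41.4, 46.3, 200)
df_demo = pd.DataFrame({
    "lat": lat,
    "XCO2": 400 + rng.normal(0, 1, lat.size),
    "type": np.where(lat < 44.13, "warm", "cold"),
})

# Bin edges as in the question; the combined array must increase monotonically.
warm = np.arange(41.367440, 44.13, 0.25)
cold = np.arange(44.141705, 46.321997, 0.25)
edges = np.append(warm, cold)

# Each row gets the interval (left, right] containing its latitude;
# latitudes outside the outermost edges become NaN.
df_demo["lat_bucketed"] = pd.cut(df_demo["lat"], edges)

# Samples per bucket, including buckets that happen to be empty.
counts = df_demo.groupby("lat_bucketed", observed=False)["XCO2"].count()
print(counts)
```

One difference from the manual version: the small gap between the last warm edge and the first cold edge becomes its own narrow bucket here, whereas the question's code skips it. Calling `pd.cut` once with `warm` and once with `cold` would avoid that.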


Method 2:

This can also be done by hand:

df_layer102['lat_bucketed'] = ((df_layer102['lat'] - df_layer102['lat'].min())/0.5).astype(int)

This gives you a column containing the bucket index (e.g. 0, 1, 2, and so on).
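
To make the index arithmetic concrete, here is a small sketch on invented latitudes. The `df_demo` frame and the extra `lat_lower` column are assumptions added for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Made-up latitudes for illustration only.
df_demo = pd.DataFrame({"lat": [40.44, 40.46, 40.95, 41.30, 41.86, 42.10]})

width = 0.5  # bucket width in degrees
start = df_demo["lat"].min()

# Integer bucket index: 0 for [start, start + 0.5), 1 for the next half degree, ...
df_demo["lat_bucketed"] = ((df_demo["lat"] - start) / width).astype(int)

# Optional: lower edge of each bucket, which reads better as an axis label.
df_demo["lat_lower"] = (start + df_demo["lat_bucketed"] * width).round(2)

print(df_demo)
```

Note that `astype(int)` raises an error if `lat` contains NaN, so rows with a missing latitude would need to be dropped or filled first.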


Then, using seaborn, you can plot it:

import seaborn as sns
sns.set(style="whitegrid")
ax = sns.boxplot(x="lat_bucketed", y="XCO2", data=df_layer102)
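
The question also asks to separate cold from warm. One possible way, sketched here on synthetic data, is to pass the existing `type` column as `hue`. The `df_demo` frame, the 42.5 degree front position, and the axis label are all assumptions for illustration, and the `Agg` backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for df_layer102 (values are made up for illustration).
rng = np.random.default_rng(1)
lat = rng.uniform(41.0, 44.0, 300)
df_demo = pd.DataFrame({
    "lat": lat,
    "XCO2": 400 + rng.normal(0, 1, lat.size),
    "type": np.where(lat < 42.5, "warm", "cold"),  # hypothetical front at 42.5
})

# Half-degree bucket index, as in Method 2.
df_demo["lat_bucketed"] = ((df_demo["lat"] - 41.0) / 0.5).astype(int)

# One box per bucket, colored by the warm/cold column.
# dodge=False keeps boxes centered, since each bucket here holds a single type.
ax = sns.boxplot(x="lat_bucketed", y="XCO2", hue="type",
                 data=df_demo, dodge=False)
ax.set_xlabel("latitude bucket (0.5 degree steps from 41.0)")
```

If a bucket can contain both warm and cold rows, dropping `dodge=False` draws the two boxes side by side instead.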
