Answer to the question: Getting the daily maximum produces strange results



Anonymous, 1 day ago

The problem is that the last column of your data contains a value that is not numeric. Because of that one value, pandas reads the whole column as strings (dtype <code>object</code>), so <code>max</code> compares the values lexicographically and <code>'9.400'</code> sorts above <code>'10.800'</code>.

The solution is to use <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.to_numeric.html" rel="nofollow noreferrer"><code>to_numeric</code></a> to convert the bad data to <code>NaN</code>.

To make the DataFrame easier to work with, you can also pass the <code>names</code> parameter to <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html" rel="nofollow noreferrer"><code>read_csv</code></a> to set the column names.

<pre><code>import pandas as pd
# pandas.compat.StringIO was removed from recent pandas versions; use the standard library
from io import StringIO

temp=u"""02/01/2016;05:15:00;10.800
02/01/2016;05:30:00;10.300
02/01/2016;05:45:00;9.200
02/01/2016;06:00:00;9.200
02/01/2016;06:15:00;8.900
02/01/2016;06:30:00;8.900
02/01/2016;06:45:00;9.400
03/01/2016;07:00:00;9.000
03/01/2016;07:15:00;9.200
03/01/2016;07:30:00;11.100
04/01/2016;07:45:00;13.000
04/01/2016;08:00:00;14.400
04/01/2016;08:15:00;a"""
#after testing replace 'StringIO(temp)' with 'filename.csv'
df_intraday = pd.read_csv(StringIO(temp),
                          sep=";",
                          names=['date','time','val'],
                          parse_dates=[0])

print (df_intraday)
         date      time     val
0  2016-02-01  05:15:00  10.800
1  2016-02-01  05:30:00  10.300
2  2016-02-01  05:45:00   9.200
3  2016-02-01  06:00:00   9.200
4  2016-02-01  06:15:00   8.900
5  2016-02-01  06:30:00   8.900
6  2016-02-01  06:45:00   9.400
7  2016-03-01  07:00:00   9.000
8  2016-03-01  07:15:00   9.200
9  2016-03-01  07:30:00  11.100
10 2016-04-01  07:45:00  13.000
11 2016-04-01  08:00:00  14.400
12 2016-04-01  08:15:00       a
</code></pre>

<hr/>

<pre><code>df_daily = df_intraday.groupby('date', as_index=False)['val'].max()
print (df_daily)
        date    val
0 2016-02-01  9.400
1 2016-03-01  9.200
2 2016-04-01      a

#check dtypes - object is obviously string
print (df_intraday['val'].dtypes)
object

#convert to numbers; values that cannot be parsed become NaN
df_intraday['val'] = pd.to_numeric(df_intraday['val'], errors='coerce')
print (df_intraday)
         date      time   val
0  2016-02-01  05:15:00  10.8
1  2016-02-01  05:30:00  10.3
2  2016-02-01  05:45:00   9.2
3  2016-02-01  06:00:00   9.2
4  2016-02-01  06:15:00   8.9
5  2016-02-01  06:30:00   8.9
6  2016-02-01  06:45:00   9.4
7  2016-03-01  07:00:00   9.0
8  2016-03-01  07:15:00   9.2
9  2016-03-01  07:30:00  11.1
10 2016-04-01  07:45:00  13.0
11 2016-04-01  08:00:00  14.4
12 2016-04-01  08:15:00   NaN

print (df_intraday['val'].dtypes)
float64
</code></pre>

<hr/>

<pre><code>#simpler way for aggregating max
df_daily = df_intraday.groupby('date', as_index=False)['val'].max()
print (df_daily)
        date   val
0 2016-02-01  10.8
1 2016-03-01  11.1
2 2016-04-01  14.4
</code></pre>
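Before coercing, it can help to see which raw values will be turned into `NaN`, so that a typo in the source file is not silently discarded. The following is only a minimal sketch under assumptions: the series `val` is a hypothetical stand-in for the `val` column of the last day in the sample, and the names `converted`, `bad_mask` and `bad_values` are illustrative, not part of the original answer.

```python
import pandas as pd

# Hypothetical stand-in for the string-typed 'val' column of the last day
val = pd.Series(["13.000", "14.400", "a"])

# On strings, max() is lexicographic, which explains the odd daily maxima
string_max = val.max()

# Coerce to numbers; unparseable entries become NaN
converted = pd.to_numeric(val, errors="coerce")

# Rows that were present in the raw data but failed to convert
bad_mask = converted.isna() & val.notna()
bad_values = val[bad_mask].tolist()

# Numeric max skips NaN by default
numeric_max = converted.max()

print(string_max, bad_values, numeric_max)
```

Inspecting `bad_values` first lets you decide whether `errors='coerce'` is appropriate or whether the input file should be corrected instead.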
