<p>Suppose you have a DataFrame structured like this:</p>
<pre><code>import pandas as pd
import numpy as np
df = pd.DataFrame([['Alice', '2012-03-05', 23],
                   ['Fred', '2012-03-05', 23],
                   ['Bob', '2012-12-12', 0]],
                  columns=['Employee', 'Date', 'Time'])
# Here you have:
#   Employee        Date  Time
# 0    Alice  2012-03-05    23
# 1     Fred  2012-03-05    23
# 2      Bob  2012-12-12     0

# convert to a date
df['DateTime'] = pd.to_datetime(df['Date'])
# make it the index
df2 = df.set_index('DateTime')
# group by date and time
# (pd.TimeGrouper was removed in pandas 1.0; pd.Grouper(freq='D') replaces it)
g = df2.groupby([pd.Grouper(freq='D'), 'Time'])
# get counts:
print(g.count())
# Here you have:
#                  Employee  Date
# DateTime   Time
# 2012-03-05 23           2     2
# 2012-12-12 0            1     1

# to get inverted values:
print(1 / g.count())
#                  Employee  Date
# DateTime   Time
# 2012-03-05 23         0.5   0.5
# 2012-12-12 0          1.0   1.0
</code></pre>
<p>Of course, it would be better to make <code>Time</code> part of the <code>DateTime</code> column. You can try that as an exercise if you like :)</p>
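<p>One possible way to do that exercise is sketched below. It assumes that <code>Time</code> holds the hour of the day (0 to 23); if it means something else (minutes, an HHMM code), the <code>unit</code> passed to <code>pd.to_timedelta</code> would have to change. Once date and hour live in a single timestamp column, one grouping key is enough and no <code>Grouper</code> or datetime index is needed.</p>
<pre><code>import pandas as pd

df = pd.DataFrame([['Alice', '2012-03-05', 23],
                   ['Fred', '2012-03-05', 23],
                   ['Bob', '2012-12-12', 0]],
                  columns=['Employee', 'Date', 'Time'])

# Assumption: 'Time' is the hour of the day (0-23).
# Fold the hour into the date to get a single timestamp column.
df['DateTime'] = pd.to_datetime(df['Date']) + pd.to_timedelta(df['Time'], unit='h')

# Count rows per timestamp (groupby sorts the keys by default)
counts = df.groupby('DateTime').size()
print(counts)

# Inverted counts, equivalent to 1/g.count() but as a single Series
inverted = 1 / counts
print(inverted)
</code></pre>
<p>Using <code>size()</code> instead of <code>count()</code> returns one number per group rather than one per column, since <code>count()</code> only differs by skipping missing values in each column.</p>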
<p>This approach is fairly fast: on my laptop, grouping 47 million rows took about 3 minutes.</p>