Counting occurrences of a string in a DataFrame using a DatetimeIndex

timestamp            IceCreamOrder  Location
2018-01-03 02:21:16  Chocolate      South
2018-01-03 12:41:12  Vanilla        North
2018-01-03 14:32:15  Strawberry     North
2018-01-03 15:32:15  Strawberry     North
2018-01-04 02:21:16  Strawberry     North
2018-01-04 02:21:16  Rasberry       North
2018-01-04 12:41:12  Vanilla        North
2018-01-05 15:32:15  Chocolate      North

2 answers

User

Answer 1 · edited 2024-09-30 06:15:15

Use pivot_table:

df.pivot_table(
    index='timestamp', columns='IceCreamOrder', aggfunc='size'
).fillna(0).astype(int)


Or crosstab:

pd.crosstab(df.timestamp, df.IceCreamOrder)

IceCreamOrder  Chocolate  Rasberry  Strawberry  Vanilla
timestamp
2018-01-02             1         0           0        0
2018-01-03             0         0           2        1
2018-01-04             0         1           1        1
2018-01-05             1         0           0        0

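If your timestamp column includes times, strip them with dt.date before applying these operations (if you don't want to modify the column, you can create a new Series and use that for the pivot instead):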

df.timestamp = df.timestamp.dt.date
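As a minimal sketch of the "new Series" variant mentioned above: the sample rows and the name `day` are assumptions made for illustration, not part of the original question. The timestamp column keeps its time component, and only the derived date-only Series is passed to crosstab.

```python
import pandas as pd

# Illustrative sample data (assumed), using the question's column names
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2018-01-03 12:41:12",
        "2018-01-03 14:32:15",
        "2018-01-03 15:32:15",
        "2018-01-04 02:21:16",
    ]),
    "IceCreamOrder": ["Vanilla", "Strawberry", "Strawberry", "Strawberry"],
})

# Derive a date-only Series; df["timestamp"] itself is left untouched
day = df["timestamp"].dt.date.rename("day")

# Rows are calendar days, columns are order types, values are counts
counts = pd.crosstab(day, df["IceCreamOrder"])
print(counts)
```

Because `day` is a separate Series, the original timestamps remain available for any later step that needs the time of day.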

User

Answer 2 · edited 2024-09-30 06:15:15

Use eq (==) to compare the column against the string, then aggregate with sum to count the True values, since True is treated like 1:

# convert to datetimes if necessary; the format string must match how your timestamps are written
inputdf['timestamp'] = pd.to_datetime(inputdf['timestamp'], format='%m/%d/%y')
print(inputdf)
   timestamp IceCreamOrder Location
0 2018-01-02     Chocolate    South
1 2018-01-03       Vanilla    North
2 2018-01-03    Strawberry    North
3 2018-01-03    Strawberry    North
4 2018-01-04    Strawberry    North
5 2018-01-04      Rasberry    North
6 2018-01-04       Vanilla    North
7 2018-01-05     Chocolate    North

mydf = (inputdf.set_index('timestamp')['IceCreamOrder']
               .eq('Strawberry')
               .groupby(pd.Grouper(freq='D'))
               .sum())
print(mydf)
timestamp
2018-01-02    0.0
2018-01-03    2.0
2018-01-04    1.0
2018-01-05    0.0
Freq: D, Name: IceCreamOrder, dtype: float64
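The counting step relies on booleans summing as integers. A minimal sketch of that behaviour on its own, with an assumed toy Series named `orders`:

```python
import pandas as pd

# Toy data (assumed) to show the comparison and the sum separately
orders = pd.Series(["Chocolate", "Strawberry", "Strawberry", "Vanilla"])

# eq() is the method form of ==; it yields a boolean Series
mask = orders.eq("Strawberry")
print(mask.tolist())

# Summing booleans counts the True values, because True behaves like 1
n_strawberry = int(mask.sum())
print(n_strawberry)
```

Grouping the boolean Series by day before calling sum, as in the answer above, applies the same count within each day.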

To count all types, add the IceCreamOrder column to the groupby and aggregate with size:

mydf1 = (inputdf.set_index('timestamp')
               .groupby([pd.Grouper(freq='D'), 'IceCreamOrder'])
               .size()
               .unstack(fill_value=0))
print(mydf1)
IceCreamOrder  Chocolate  Rasberry  Strawberry  Vanilla
timestamp                                              
2018-01-02             1         0           0        0
2018-01-03             0         0           2        1
2018-01-04             0         1           1        1
2018-01-05             1         0           0        0
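The printed frame in this answer has no time component, so here is a small sketch of the same groupby on timestamps that do carry times. The sample rows are assumed for illustration; the point is that pd.Grouper(freq='D') puts every timestamp into its calendar-day bin.

```python
import pandas as pd

# Illustrative sample data (assumed), with times of day included
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2018-01-03 12:41:12",
        "2018-01-03 14:32:15",
        "2018-01-03 15:32:15",
        "2018-01-04 02:21:16",
    ]),
    "IceCreamOrder": ["Vanilla", "Strawberry", "Strawberry", "Strawberry"],
})

per_day = (
    df.set_index("timestamp")
      # bin the index by day, then split each day by order type
      .groupby([pd.Grouper(freq="D"), "IceCreamOrder"])
      .size()
      # move order types into columns; absent combinations become 0
      .unstack(fill_value=0)
)
print(per_day)
```

The three orders placed at different times on 2018-01-03 land in a single row, which is the behaviour the answer depends on.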

If none of the datetimes have a time component:

mydf1 = (inputdf.groupby(['timestamp', 'IceCreamOrder'])
                .size()
                .unstack(fill_value=0))
print(mydf1)
IceCreamOrder  Chocolate  Rasberry  Strawberry  Vanilla
timestamp                                              
2018-01-02             1         0           0        0
2018-01-03             0         0           2        1
2018-01-04             0         1           1        1
2018-01-05             1         0           0        0
