How to convert a for loop with an if statement over datetime data into a list comprehension

Posted 2024-10-03 11:20:29


I am trying to convert the following for loop with an if statement into a list comprehension:

# Create dictionary to hold results
trip_counts = {'AM': 0, 'PM': 0}

# Loop over all trips
for trip in onebike_datetimes:
  # Check to see if the trip starts before noon
  if trip['start'].hour < 12:
    # Increment the counter for before noon
    trip_counts["AM"] += 1
  else:
    # Increment the counter for after noon
    trip_counts["PM"] += 1

I tried:

[trip_counts["AM"]+=1 if trip['start'].hour <12 else trip_counts['PM']+= 1 for trip in onebike_datetimes] 

but I keep getting a SyntaxError.


3 Answers

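You can use a list comprehension (really, just a generator expression), but not in the way you are thinking. Build a generator of "AM"/"PM" labels, then use that generator to build a Counter instance: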

from collections import Counter


trip_counts = Counter(("AM" if trip['start'].hour < 12 else "PM") 
                       for trip in onebike_datetimes)

A self-contained demo:

from collections import Counter
from types import SimpleNamespace


onebike_datetimes = [
    {'start': SimpleNamespace(hour=9)},
    {'start': SimpleNamespace(hour=3)},
    {'start': SimpleNamespace(hour=14)},
    {'start': SimpleNamespace(hour=19)},
    {'start': SimpleNamespace(hour=7)},
    ]

trip_counts = Counter(("AM" if trip['start'].hour < 12 else "PM") 
                       for trip in onebike_datetimes)

assert trip_counts["AM"] == 3
assert trip_counts["PM"] == 2
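
As for why the attempt in the question fails: `+=` is an augmented assignment, which Python treats as a statement, while a comprehension (and a conditional expression inside it) may only contain expressions. The sketch below checks this with the built-in `compile()`; the variable names `attempt`, `fixed`, `attempt_is_valid` and `fixed_is_valid` are illustrative only and do not come from the question.

```python
# The attempted comprehension, kept as a string so it can be compiled safely.
attempt = (
    "[trip_counts['AM'] += 1 if trip['start'].hour < 12 "
    "else trip_counts['PM'] += 1 for trip in onebike_datetimes]"
)

try:
    compile(attempt, "<attempt>", "eval")
    attempt_is_valid = True
except SyntaxError:
    # Augmented assignment is a statement, so it cannot appear inside
    # a comprehension or a conditional expression.
    attempt_is_valid = False

# The expression-only version compiles without complaint. Names are not
# resolved at compile time, so onebike_datetimes need not exist here.
fixed = "['AM' if trip['start'].hour < 12 else 'PM' for trip in onebike_datetimes]"
compile(fixed, "<fixed>", "eval")
fixed_is_valid = True

print(attempt_is_valid, fixed_is_valid)  # False True
```

This is why the Counter version above only produces labels inside the generator and leaves the counting to Counter.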

If this is a pandas DataFrame you are working with, why not filter the values and count them in one go?

Something like this might work:

trip_counts['AM'] = len(trip[trip.loc[:, 'hour'] < 12].index)
trip_counts['PM'] = len(trip[trip.loc[:, 'hour'] >= 12].index)


Edit: I just ran some benchmarks on the answers given here, since some people assume that list comprehensions are automatically faster.

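As you can see, in this case the regular for loop has more or less the best performance, matched only by the comprehension combined with Counter, as described in one of the other answers here.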

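Note that I slightly modified the pandas implementation to match how I think the data is probably structured (i.e., not in a DataFrame), so converting the data to a DataFrame on every run may add some overhead.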

[Figure: benchmark plot]

The code that generated this plot is shown below:

import pandas as pd
import numpy as np
from collections import Counter
from types import SimpleNamespace

import perfplot


def gen_data(n):
    onebike_datetimes = [
        {'start': SimpleNamespace(hour=9)},
        {'start': SimpleNamespace(hour=3)},
        {'start': SimpleNamespace(hour=14)},
        {'start': SimpleNamespace(hour=19)},
        {'start': SimpleNamespace(hour=7)},
        {'start': SimpleNamespace(hour=14)},
        {'start': SimpleNamespace(hour=19)},
        {'start': SimpleNamespace(hour=2)},
        {'start': SimpleNamespace(hour=20)},
        {'start': SimpleNamespace(hour=12)},
    ] * n

    return onebike_datetimes


def use_vanilla_for(a):
#     onebike_datetimes = gen_data(n)
    onebike_datetimes = a

    trip_counts = {'AM': 0, 'PM': 0}

    for trip in onebike_datetimes:
        if trip['start'].hour < 12:
            trip_counts["AM"] += 1
        else:
            trip_counts["PM"] += 1
    return 1    
#     return trip_counts


def use_list_comp(a):
#     onebike_datetimes = gen_data(n)
    onebike_datetimes = a

    trip_counts = {'AM': 0, 'PM': 0}

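    l = ["AM" if trip["start"].hour < 12 else "PM" for trip in onebike_datetimes]
    # Note: this iterates over every element of l and calls l.count() each
    # time, so the counting step is quadratic in len(l).
    trip_counts = {i: l.count(i) for i in l}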
    return 1
#     return trip_counts


def use_counter(a):
#     onebike_datetimes = gen_data(n)
    onebike_datetimes = a

    trip_counts = {'AM': 0, 'PM': 0}

    trip_counts = Counter(("AM" if trip['start'].hour < 12 else "PM") 
                       for trip in onebike_datetimes)
    return 1
#     return trip_counts


def use_pandas(a):
#     onebike_datetimes = gen_data(n)
    onebike_datetimes = a

    trip = pd.DataFrame(list(map(lambda a: a['start'].hour, onebike_datetimes)), columns=['hrs'])

    trip_counts = {'AM': 0, 'PM': 0}

    trip_counts['AM'] = len(trip[trip['hrs'] < 12].index)
    trip_counts['PM'] = len(trip[trip['hrs'] >= 12].index)
    return 1
#     return trip_counts

perfplot.show(
    setup=lambda n: gen_data(n),
    kernels=[
        lambda a: use_vanilla_for(a),
        lambda a: use_list_comp(a),
        lambda a: use_counter(a),
        lambda a: use_pandas(a),
    ],
    labels=["vanilla_for", "list_comp", "counter", "dataframe"],
    n_range=[2 ** k for k in range(10)],
    xlabel="len(a)",
)
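
If perfplot is not installed, a rough comparison can be sketched with the standard library's `timeit` instead. This is only a minimal sketch and not the benchmark that produced the plot: the helper names `count_with_for` and `count_with_counter`, the sample size, and the repeat count are all assumptions chosen for illustration, and the timings will vary by machine.

```python
import timeit
from collections import Counter
from types import SimpleNamespace

# Hypothetical sample data in the same shape as the benchmark above.
hours = [9, 3, 14, 19, 7, 14, 19, 2, 20, 12]
trips = [{'start': SimpleNamespace(hour=h)} for h in hours] * 100


def count_with_for(data):
    """Plain for loop with an if/else, as in the question."""
    counts = {'AM': 0, 'PM': 0}
    for trip in data:
        if trip['start'].hour < 12:
            counts['AM'] += 1
        else:
            counts['PM'] += 1
    return counts


def count_with_counter(data):
    """Generator expression fed into Counter."""
    return Counter('AM' if trip['start'].hour < 12 else 'PM' for trip in data)


# Both approaches should agree before their timings are compared.
assert count_with_for(trips) == dict(count_with_counter(trips))

for func in (count_with_for, count_with_counter):
    seconds = timeit.timeit(lambda: func(trips), number=200)
    print(f"{func.__name__}: {seconds:.4f} s for 200 runs")
```

Unlike the kernels above, these helpers return the counts rather than 1, which makes it possible to confirm that the approaches agree before timing them.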

It is clearer to keep your for loop.

If you really want to use a list comprehension, you can do this:

l = ["AM" if trip["start"].hour < 12 else "PM" for trip in onebike_datetimes]
am_count = l.count("AM")
trip_counts = {"AM": am_count, "PM": len(l) - am_count}

(If you use this option, you do not need to initialize trip_counts beforehand.)
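
A variant of the same idea, sketched here with real `datetime` objects: because `True` counts as 1 and `False` as 0, `sum()` over a generator expression can tally the morning trips without building the intermediate list. The sample trips below are hypothetical, since the actual contents of `onebike_datetimes` are not shown in the question.

```python
from datetime import datetime

# Hypothetical sample trips; the real onebike_datetimes is not shown
# in the question.
onebike_datetimes = [
    {'start': datetime(2017, 10, 1, 9, 15)},
    {'start': datetime(2017, 10, 1, 15, 23)},
    {'start': datetime(2017, 10, 2, 7, 5)},
    {'start': datetime(2017, 10, 2, 12, 0)},
]

# Each comparison yields a bool; sum() adds them up as 1s and 0s.
am_count = sum(trip['start'].hour < 12 for trip in onebike_datetimes)
trip_counts = {'AM': am_count, 'PM': len(onebike_datetimes) - am_count}

print(trip_counts)  # {'AM': 2, 'PM': 2}
```

Note that a trip starting at exactly 12:00 lands in "PM", matching the `< 12` test in the original loop.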
