Avoiding iteration to get the number of occurrences in a large pandas DataFrame

for idx, row in df_stops.iterrows():
    # number of early arrivals / total number of arrivals at that stop
    at_stop = df_arrivals['StopNum'] == row['StopNum']
    df_stops.at[idx, 'PercentEarly'] = (at_stop & (df_arrivals['OnTimeStatus'] < 0)).sum() / at_stop.sum()
    # same idea for on-time and late arrivals

       RouteNumber  ScheduledUnix  StopNumber  OnTimeStatus
0               44     1511977533       40888             0
1               44     1511979273       40888             0
2               44     1511979273       40888             0
3               44     1511980353       40888             0
4               44     1511979273       40888             0
5               44     1511980353       40888             1
...            ...            ...         ...           ...
67538           85     1512005100       40900             0
67539           85     1512008700       40900             0
67540           85     1512008700       40900            -1
67541           85     1512008700       40900             0
67542           85     1512012300       40900             0

3 Answers

User

#1 · Edited 2024-06-26 11:23:00

To answer the question about counting occurrences, here is what I would do:

# Counts of early (-1), on-time (0) and late (1) arrivals for a single stop.
# If you want them per stop number, you need to group by the stop number first (see below).
# Define a specific stop number and store it as stop_num.
counts = df_arrivals[df_arrivals.StopNumber == stop_num].OnTimeStatus.value_counts()
early, ontime, late = counts.get(-1, 0), counts.get(0, 0), counts.get(1, 0)

total_arrivals = counts.sum()  # all arrivals recorded at this stop
EarlyPercent = early / total_arrivals
OntimePercent = ontime / total_arrivals
LatePercent = late / total_arrivals

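Keep in mind that this handles only one stop number at a time. I actually don't think there is a way to avoid iteration here without overly complicated code (chaining and so on).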

df_stops['PercentEarly'] = float('nan')
df_stops['PercentOnTime'] = float('nan')
df_stops['PercentLate'] = float('nan')

# visit each stop number once, not once per arrival row
for stop_num in df_arrivals.StopNumber.unique():
    counts = df_arrivals[df_arrivals.StopNumber == stop_num].OnTimeStatus.value_counts()
    total_arrivals = counts.sum()
    at_stop = df_stops.StopNumber == stop_num
    df_stops.loc[at_stop, 'PercentEarly'] = counts.get(-1, 0) / total_arrivals
    df_stops.loc[at_stop, 'PercentOnTime'] = counts.get(0, 0) / total_arrivals
    df_stops.loc[at_stop, 'PercentLate'] = counts.get(1, 0) / total_arrivals
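
For comparison, the per-stop percentages can also be computed without an explicit Python loop. The following is only a minimal sketch: it assumes df_arrivals has the StopNumber and OnTimeStatus columns shown in the question and that OnTimeStatus takes only the values -1, 0 and 1. The small sample frame and the name `shares` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; the real df_arrivals comes from the question.
df_arrivals = pd.DataFrame({
    "StopNumber":   [40888, 40888, 40888, 40888, 40900, 40900],
    "OnTimeStatus": [0, 0, 1, -1, 0, -1],
})

# Rows are stops, columns are status values; normalize="index" turns each
# row into fractions of that stop's total arrivals.
shares = pd.crosstab(
    df_arrivals["StopNumber"],
    df_arrivals["OnTimeStatus"],
    normalize="index",
)

# Make sure all three statuses exist as columns, then give them readable names.
shares = shares.reindex(columns=[-1, 0, 1], fill_value=0.0)
shares.columns = ["PercentEarly", "PercentOnTime", "PercentLate"]
print(shares)
```

Each row of `shares` sums to 1, so a stop with no late arrivals simply gets 0.0 in PercentLate rather than raising a KeyError.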

User

#2 · Edited 2024-06-26 11:23:00

You can use groupby:

for stop_num, group in df_arrivals.groupby('StopNum'):
    # number of arrivals at this stop for each OnTimeStatus value
    print(stop_num, group.groupby('OnTimeStatus').size())

Does it work as expected now?
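
The same groupby idea can be pushed one step further so that no Python-level loop is needed at all. This is a sketch under assumptions, not a definitive implementation: it uses the StopNum column name from this answer and a made-up sample frame, and the name `counts` is illustrative.

```python
import pandas as pd

# Hypothetical sample data using the column names from this answer.
df_arrivals = pd.DataFrame({
    "StopNum":      [40888, 40888, 40888, 40900, 40900],
    "OnTimeStatus": [0, 1, 0, -1, 0],
})

# Count rows per (stop, status) pair, then pivot the status level into columns.
# fill_value=0 covers combinations that never occur in the data.
counts = (
    df_arrivals.groupby(["StopNum", "OnTimeStatus"])
    .size()
    .unstack(fill_value=0)
)
print(counts)
```

Grouping on both keys at once lets pandas do the counting in a single pass, and the resulting table has one row per stop and one column per status value.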

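User

#3 · Edited 2024-06-26 11:23:00

I never did figure out how to do it without iteration. I also decided to store the number of early/on-time/late arrivals instead of percentages. Here is my solution, which seems fairly fast even with tens of thousands of entries: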

import pandas as pd

# find the number of arrivals per stop, build a DataFrame of counts, and merge it with the stops DataFrame
df_stop_counts = df_arrivals['stopNumber'].value_counts().reset_index()
df_stop_counts.columns = ['StopNumber', 'NumArrivals']
df_stops = pd.merge(df_stops, df_stop_counts, left_on='stopNumber', right_on='StopNumber')

# iterate over all the stops and find the number of early/on-time/late arrivals
for index, row in df_stops.iterrows():
    # match on the stop number stored in the row, not on the row's positional index
    at_stop = df_arrivals['stopNumber'] == row['stopNumber']
    df_stops.at[index, 'NumEarly'] = (at_stop & (df_arrivals['OnTimeStatus'] == -1)).sum()
    df_stops.at[index, 'NumOnTime'] = (at_stop & (df_arrivals['OnTimeStatus'] == 0)).sum()
    df_stops.at[index, 'NumLate'] = (at_stop & (df_arrivals['OnTimeStatus'] == 1)).sum()
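
If the loop ever becomes a bottleneck, the three count columns could probably be attached to the stops table in one merge instead. The sketch below is an illustration under assumptions: it reuses the stopNumber and Num* column names from this answer, the sample frames are invented, and `status_counts` and `count_cols` are hypothetical names.

```python
import pandas as pd

# Hypothetical sample data; column names follow this answer.
df_stops = pd.DataFrame({"stopNumber": [40888, 40900, 41000]})
df_arrivals = pd.DataFrame({
    "stopNumber":   [40888, 40888, 40888, 40900, 40900],
    "OnTimeStatus": [0, 1, 0, -1, 0],
})

# One pass over df_arrivals: a stop-by-status table of counts.
status_counts = (
    pd.crosstab(df_arrivals["stopNumber"], df_arrivals["OnTimeStatus"])
    .reindex(columns=[-1, 0, 1], fill_value=0)
    .set_axis(["NumEarly", "NumOnTime", "NumLate"], axis=1)
)
status_counts["NumArrivals"] = status_counts.sum(axis=1)

# A left merge keeps stops that have no arrivals; their counts become 0.
df_stops = df_stops.merge(status_counts.reset_index(), how="left", on="stopNumber")
count_cols = ["NumEarly", "NumOnTime", "NumLate", "NumArrivals"]
df_stops[count_cols] = df_stops[count_cols].fillna(0).astype(int)
print(df_stops)
```

Stop 41000 has no rows in the sample df_arrivals, which is why the merge uses how="left" followed by fillna(0).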
