Answers to: Counting occurrences of a pattern in a DataFrame column



1 Answer

Anonymous, 1 day ago


This is a great question. I want to highlight how much the interval between events can tell you: from the sequences you posted, you can learn a lot about behavior and predictability. With that in mind, I wrote a long answer that I hope explains some core principles of data manipulation.

1. Create a custom function to perform the calculation. (This assumes you apply it to a single list. I recommend pulling out one list while debugging or testing.)
<pre class="lang-py prettyprint-override"><code>import numpy as np

def event_metrics(my_list, look_for="Accept", exclude_zeros=True, simple=True):
    """
    Simple mode: Returns the average number of `items` before `look_for`
    Non-Simple mode: Returns a dictionary with the mean, median, and max
                     number of `items` before `look_for`

    my_list: a list of values
    look_for: An item in the list which constitutes the "event"
              Example: "accept" from a list of "accept" and "reject"
    exclude_zeros: exclude metrics for when `look_for` occurs back to back
    simple: operate in simple mode or non-simple mode
    """
    # Instantiate a counter list
    my_counter = []
    n = 0

    # Loop through the list
    for x in my_list:
        # If a match, add n to the list and reset
        if x == look_for:
            my_counter.append(n)
            n = 0
        # Otherwise, continue
        else:
            n += 1

    # Sometimes you might want to append the final n at conclusion of the loop
    # You could do that with the following code:
    # if x != look_for:
    #     my_counter.append(n)

    # You may not want to include back-to-back events
    if exclude_zeros:
        my_counter = [x for x in my_counter if x > 0]

    # You can return a specific metric such as mean
    if simple:
        return np.mean(my_counter)

    # Or you can pass several metrics as a dictionary and convert to a series
    my_metrics = {
        "mean": np.mean(my_counter),
        "median": np.median(my_counter),
        "max": np.max(my_counter),
    }
    return my_metrics
</code></pre>
2. Apply this custom function to the df:
<ul>
<li>Simple mode: returns an array of single values. Treat it as a new column.</li>
<li>Non-simple mode: returns an array of dictionaries. Use <code>.apply(pd.Series)</code> to convert it to multiple columns, then use <code>pd.merge</code> to add them to the original <code>df</code>.</li>
</ul>
<pre class="lang-py prettyprint-override"><code>import pandas as pd

# Simple Mode
df["sequence_of_selection"].apply(event_metrics, simple=True)

# Non-Simple Mode
# Parentheses replace the backslash continuations, which cannot be
# followed by a comment on the same line.
temp_df = (
    df["sequence_of_selection"]
    .apply(event_metrics, simple=False)
    .apply(pd.Series)      # Convert to its own df
    .add_prefix("rej_")    # Add a prefix to your column names
)

# merge returns a new DataFrame; assign the result if you want to keep it
df.merge(temp_df, left_index=True, right_index=True)
</code></pre>
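To see what the counting loop in step 1 collects before any statistics are taken, here is a minimal sketch that uses only the standard library. The helper name `gaps_before_event` and the sample sequence are hypothetical and were made up for illustration; they are not part of the answer above or of the question's data. The helper returns the raw counts rather than a summary metric.

```python
from statistics import mean, median


def gaps_before_event(items, look_for="Accept", exclude_zeros=True):
    """Return how many non-matching items preceded each `look_for` (hypothetical helper)."""
    gaps = []
    n = 0
    for item in items:
        if item == look_for:
            # Event found: record the run length and reset the counter
            gaps.append(n)
            n = 0
        else:
            n += 1
    # A trailing run with no closing event is intentionally not recorded
    if exclude_zeros:
        gaps = [g for g in gaps if g > 0]
    return gaps


# Made-up sample sequence (assumption, not from the question)
sample = ["Reject", "Reject", "Accept", "Accept", "Reject", "Accept", "Reject"]

gaps = gaps_before_event(sample)
all_gaps = gaps_before_event(sample, exclude_zeros=False)

print(gaps)      # [2, 1]
print(all_gaps)  # [2, 0, 1]
print(mean(gaps), median(gaps), max(gaps))
```

One thing to watch for: if a sequence never contains the event, the list of counts is empty, so it is probably worth checking for that case before computing a mean or a max on it.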
