Applying conditional exclusion to a DataFrame using counts

Posted 2024-10-04 09:21:11

Male | A programmer who enjoys writing Python code.

I have the following DataFrame in pandas:

import pandas as pd
example_data = [{'ticker': 'aapl', 'loc': 'us'}, {'ticker': 'mstf', 'loc': 'us'}, {'ticker': 'baba', 'loc': 'china'}, {'ticker': 'ibm', 'loc': 'us'}, {'ticker': 'db', 'loc': 'germany'}]
# Fix the column order explicitly so the printed frame matches the output shown
df = pd.DataFrame(example_data, columns=['loc', 'ticker'])
print(df)

       loc ticker
0       us   aapl
1       us   mstf
2    china   baba
3       us    ibm
4  germany     db

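I want to create a new DataFrame in which every row comes from the original df, but rows whose loc count exceeds 2 are excluded. In other words, build the new df by looping over the old one, counting how many rows with the same loc have been seen so far, and including or excluding each row based on that count.

The code below produces the desired output.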

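country_counts = {}
output = []
# Select the columns explicitly so the unpacking does not depend on column order
for loc, ticker in df[['loc', 'ticker']].values:
    # Running count of the rows seen so far for this loc
    country_counts[loc] = country_counts.get(loc, 0) + 1
    # Keep only the first two rows of each loc
    if country_counts[loc] <= 2:
        output.append({'loc': loc, 'ticker': ticker})
new_df = pd.DataFrame(output, columns=['loc', 'ticker'])
print(new_df)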

       loc ticker
0       us   aapl
1       us   mstf
2    china   baba
3  germany     db

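The output omits the fourth row of the original df (ibm, index 3), because its loc count is greater than 2 (it is the third us row).

Is there a better way to perform this kind of operation? Any help is much appreciated.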

Tags: df db country loc row pd count us

1 Answer

User

#1 · Posted 2024-10-04 09:21:11

How about groupby and .head:

In [90]: df.groupby('loc').head(2)
Out[90]: 
       loc ticker
0       us   aapl
1       us   mstf
2    china   baba
4  germany     db
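
Note that the result above keeps the original index labels (germany is still at 4), whereas the loop in the question produced a fresh 0 to 3 index. The following is a minimal sketch, assuming a fresh index is wanted, that rebuilds the example frame and resets the index. It also shows a second formulation with cumcount, which is closer to the "count the preceding rows" idea in the question; it is offered as an alternative, not as part of the original answer.

```python
import pandas as pd

example_data = [
    {'ticker': 'aapl', 'loc': 'us'},
    {'ticker': 'mstf', 'loc': 'us'},
    {'ticker': 'baba', 'loc': 'china'},
    {'ticker': 'ibm', 'loc': 'us'},
    {'ticker': 'db', 'loc': 'germany'},
]
df = pd.DataFrame(example_data, columns=['loc', 'ticker'])

# groupby(...).head(2) keeps the first two rows of each loc group,
# in original row order and with the original index labels.
first_two = df.groupby('loc').head(2)

# Assumption: a fresh 0..n-1 index is wanted, as in the loop's output.
new_df = first_two.reset_index(drop=True)

# Alternative: cumcount() numbers rows within each group starting at 0,
# so "< 2" keeps the rows whose running count is 1 or 2.
seen_before = df.groupby('loc').cumcount()
via_cumcount = df[seen_before < 2].reset_index(drop=True)

print(new_df)
```

The cumcount form may be handier when the count itself is needed as a column or when the threshold differs per group; for a plain "first n per group" filter, head(n) is the shorter spelling.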

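Also, be careful with the column name: loc clashes with the DataFrame's .loc indexer, so df.loc does not return the column. Use df['loc'] instead.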
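
To make the naming clash concrete, here is a small sketch on a two-row frame. The replacement column name country is hypothetical, picked only for illustration; any name that does not shadow a DataFrame attribute would do.

```python
import pandas as pd

df = pd.DataFrame({'loc': ['us', 'china'], 'ticker': ['aapl', 'baba']})

# Attribute access resolves to the .loc indexer, not the column.
indexer = df.loc

# Bracket access always returns the column as a Series.
column = df['loc']

# Hypothetical new name 'country', chosen only for illustration.
renamed = df.rename(columns={'loc': 'country'})
countries = renamed.country  # attribute access now reaches the column

print(type(indexer).__name__)
print(column.tolist())
print(countries.tolist())
```

Bracket access works regardless of the column name, so renaming is only needed if attribute-style access is preferred.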
