Pandas: compare rows of two DataFrames and return sets based on a condition

Posted 2024-06-14 19:21:07


I have two DataFrames:

[in] print(testing_df.head(n=5))
print(product_combos1.head(n=5))

[out]
                     product_id  length
transaction_id                         
001                      (P01,)       1
002                  (P01, P02)       2
003             (P01, P02, P09)       3
004                  (P01, P03)       2
005             (P01, P03, P05)       3

             product_id  count  length
0            (P06, P09)  36340       2
1  (P01, P05, P06, P09)  10085       4
2            (P01, P06)  36337       2
3            (P01, P09)  49897       2
4            (P02, P09)  11573       2

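I want to return the most frequent product_combos row (highest count) whose length is the testing_df length + 1 and which contains the testing_df product strings. For example, for transaction_id 001 I want to return product_combos[3] (though only the P09).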
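A minimal sketch of the whole task, assuming `product_id` holds tuples of product strings and `count` is the frequency used to pick the best combo. The function `best_next_product` and the `suggestion` column are hypothetical names for illustration, and the sample data reproduces only some of the rows printed above.

```python
import pandas as pd

# Sample data mirroring some of the rows printed above
testing_df = pd.DataFrame(
    {
        "product_id": [("P01",), ("P01", "P02"), ("P01", "P02", "P09")],
        "length": [1, 2, 3],
    },
    index=pd.Index(["001", "002", "003"], name="transaction_id"),
)
product_combos1 = pd.DataFrame(
    {
        "product_id": [("P06", "P09"), ("P01", "P05", "P06", "P09"),
                       ("P01", "P06"), ("P01", "P09"), ("P02", "P09")],
        "count": [36340, 10085, 36337, 49897, 11573],
        "length": [2, 4, 2, 2, 2],
    }
)


def best_next_product(basket, combos):
    """Hypothetical helper: return the extra product(s) of the most frequent
    combo that is one item longer than `basket` and contains all of it,
    or None if no combo qualifies."""
    basket_set = set(basket)
    # Combo must be exactly one product longer AND a superset of the basket
    mask = (combos["length"] == len(basket) + 1) & combos["product_id"].apply(
        basket_set.issubset
    )
    candidates = combos[mask]
    if candidates.empty:
        return None
    # The highest count wins; keep only the products not already in the basket
    best = candidates.loc[candidates["count"].idxmax(), "product_id"]
    return tuple(p for p in best if p not in basket_set)


testing_df["suggestion"] = testing_df["product_id"].apply(
    lambda basket: best_next_product(basket, product_combos1)
)
print(testing_df)
```

For transaction 001 the candidates are (P01, P06) and (P01, P09); the second has the higher count, so only `('P09',)` is returned. Transactions with no qualifying combo get `None`.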

For the first part (comparing by length only), I tried:

# Return the product combos values that are of the appropriate length and the strings match
for i in testing_df['length']:
    for k in product_combos1['length']:
        if (i)+1 == (k):
            matches = list(k) 

However, this raises the error:

TypeError: 'numpy.int64' object is not iterable
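The error comes from `list(k)`: each `k` is a single integer (the message names it as `numpy.int64`), and `list()` only accepts iterables. A minimal reproduction, assuming only NumPy and a scalar like the one named in the error:

```python
import numpy as np

k = np.int64(3)  # a single integer scalar, like one value from the 'length' column

try:
    matches = list(k)  # list() expects an iterable, so a lone integer fails
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

matches = [k]  # a list literal wraps the scalar instead of iterating over it
print(matches)
```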

2 answers

Just use the .append() method. I'd also suggest initializing `matches` as an empty list at the top, so that re-running the cell doesn't produce duplicates.

# Setup

import pandas as pd

# Build the sample frames directly from dicts; product_combos1.count = ...
# would not fill the column, because count is a DataFrame method
testing_df = pd.DataFrame({
    'product_id': [('P01',), ('P01', 'P02')],
    'length': [1, 2],
})
product_combos1 = pd.DataFrame({
    'product_id': [('P01',), ('P01', 'P02')],
    'count': [100, 5000],
    'length': [3, 1],
})

# Matching

matches = []  # start empty so re-running the cell does not duplicate results
for i in testing_df['length']:
    for k in product_combos1['length']:
        if i + 1 == k:  # combo is one product longer than the transaction
            matches.append(k)

Let me know whether this works, or if there's anything else! Good luck.
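As an alternative to the nested loop (not part of the answer above), the same length comparison can be sketched with a cross join via `DataFrame.merge(how="cross")`, which needs pandas 1.2 or later. It keeps the whole matching rows rather than just the lengths. The `_test`/`_combo` suffixes are arbitrary choices for illustration.

```python
import pandas as pd

testing_df = pd.DataFrame({
    "product_id": [("P01",), ("P01", "P02")],
    "length": [1, 2],
})
product_combos1 = pd.DataFrame({
    "product_id": [("P01",), ("P01", "P02")],
    "count": [100, 5000],
    "length": [3, 1],
})

# Pair every testing row with every combo row (pandas >= 1.2)
pairs = testing_df.merge(product_combos1, how="cross", suffixes=("_test", "_combo"))

# Same condition as the loop: the combo is one product longer
matched = pairs[pairs["length_combo"] == pairs["length_test"] + 1]
matches = matched["length_combo"].tolist()
print(matched)
```

A cross join builds every pair, so it trades memory for avoiding Python-level loops; on large frames the row count is the product of both lengths.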

You can't create a list from a non-iterable like that. Try replacing matches = list(k) with matches = [k]. Also, those parentheses are redundant: you can replace if (i)+1 == (k): with if i + 1 == k:
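Note that `matches = [k]` fixes the TypeError but rebinds `matches` on every iteration, so only the last match survives; collecting all matches needs `.append()`. A small plain-Python illustration with made-up lengths:

```python
lengths_test = [1, 2, 3]      # made-up transaction lengths
lengths_combo = [2, 4, 2, 3]  # made-up combo lengths

# Reassigning inside the loop keeps only the most recent match
matches = []
for i in lengths_test:
    for k in lengths_combo:
        if i + 1 == k:
            matches = [k]
last_only = matches

# Appending collects every match in loop order
matches = []
for i in lengths_test:
    for k in lengths_combo:
        if i + 1 == k:
            matches.append(k)
all_matches = matches

print(last_only)
print(all_matches)
```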
