Constructing a superset from the result of a groupby operation

name_region
bahia               [10, 11, 12, 1, 2, 3, 4]
distrito_federal    [9, 10, 11, 12, 1, 2, 3, 4]
goias               [9, 10, 11, 12, 1, 2, 3, 4]
maranhao            [10, 11, 12, 1, 2, 3, 4]
mato_grosso         [9, 10, 11, 12, 1, 2, 3, 4]
mato_grosso_do_sul  [8, 9, 10, 11, 12, 1, 2, 3]

3 answers

User

#1 · edited 2024-09-28 22:07:21

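I seem to have misunderstood the data structure in the question, but since this may be useful for similar cases, I will leave this answer here for future reference.

You can use numpy's unique function.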

import pandas as pd
import numpy as np

df = pd.DataFrame({"x": [1,3,5], "y": [3,4,5]})

print(np.unique(df))  # prints [1 3 4 5]
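
For the list-valued column in the question, the same idea might look like the sketch below. The `months` column name comes from the other answers; the three-row frame is only an assumed stand-in for the real groupby result. Note that `np.unique` returns sorted values, so the calendar order of the months is lost.

```python
import numpy as np
import pandas as pd

# Assumed sample: a subset of the regions shown in the question.
df = pd.DataFrame(
    {"months": [[10, 11, 12, 1, 2, 3, 4],
                [9, 10, 11, 12, 1, 2, 3, 4],
                [8, 9, 10, 11, 12, 1, 2, 3]]},
    index=pd.Index(["bahia", "goias", "mato_grosso_do_sul"], name="name_region"),
)

# Flatten the per-region lists into one 1-D array, then deduplicate.
flat = np.concatenate(df["months"].tolist())
superset = np.unique(flat)  # sorted unique values

print(superset.tolist())  # [1, 2, 3, 4, 8, 9, 10, 11, 12]
```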

User

#2 · edited 2024-09-28 22:07:21

You can use the itertools recipe unique_everseen (it preserves order) like this:

>>> list(unique_everseen(z for _, row in df.iterrows() for z in row['months']))
[9, 10, 11, 12, 1, 2, 3, 4]

Definition of unique_everseen:

import itertools as it
def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        # Python 3 name; this was it.ifilterfalse in Python 2
        for element in it.filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element
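
On Python 3.7+, where dicts keep insertion order, a shorter order-preserving variant is possible with `dict.fromkeys`. This is a sketch of an alternative, not the recipe above; `months_by_region` is a hypothetical plain dict standing in for the `months` column, filled with a subset of the question's data.

```python
from itertools import chain

# Hypothetical stand-in for the groupby result (region -> list of months).
months_by_region = {
    "bahia": [10, 11, 12, 1, 2, 3, 4],
    "distrito_federal": [9, 10, 11, 12, 1, 2, 3, 4],
    "mato_grosso_do_sul": [8, 9, 10, 11, 12, 1, 2, 3],
}

# Flatten all lists lazily, then deduplicate while keeping first-seen order.
flat = chain.from_iterable(months_by_region.values())
superset = list(dict.fromkeys(flat))

print(superset)  # [10, 11, 12, 1, 2, 3, 4, 9, 8]
```

The result follows the order in which each month is first encountered, so it depends on the row order of the input.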

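User

#3 · edited 2024-09-28 22:07:21

I don't know whether there is a cleaner way to do this in pandas, so if anyone else knows, please answer. Judging by the types, this looks like a fold over that column.

I did not see a fold operation in pandas, so it may just have to be an accumulating for loop, i.e.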

all_months = []
for _, row in df.iterrows():  # iterrows() yields (index, row) pairs
    months = row['months']
    all_months += [e for e in months if e not in all_months]

On second thought, I would use a set instead of the complicated comprehension:

all_months = set()
for _, row in df.iterrows():  # iterrows() yields (index, row) pairs
    months = set(row['months'])
    all_months = all_months.union(months)

Hmm, I just saw the other answers. I have not tested them, but they look much better! Go with one of those :). I am posting this anyway, just in case.
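
The fold described in this answer can also be written without an explicit loop. The sketch below shows two possibilities: `functools.reduce` for the set union, and `Series.explode` (available in pandas 0.25 and later) for an order-preserving result. The three-row frame is an assumed sample, not the asker's full data.

```python
from functools import reduce

import pandas as pd

# Assumed sample: a subset of the regions shown in the question.
df = pd.DataFrame(
    {"months": [[10, 11, 12, 1, 2, 3, 4],
                [9, 10, 11, 12, 1, 2, 3, 4],
                [8, 9, 10, 11, 12, 1, 2, 3]]},
    index=pd.Index(["bahia", "goias", "mato_grosso_do_sul"], name="name_region"),
)

# Fold: union each row's months into an accumulator set (order is not kept).
all_months = reduce(lambda acc, months: acc | set(months), df["months"], set())

# pandas-only alternative: one row per list element, then first-seen uniques.
all_months_ordered = df["months"].explode().unique().tolist()

print(sorted(all_months))   # [1, 2, 3, 4, 8, 9, 10, 11, 12]
print(all_months_ordered)   # [10, 11, 12, 1, 2, 3, 4, 9, 8]
```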
