What is the difference between random.sample and random.shuffle in Python?

3 Answers

User

#1 · edited 2024-05-03 01:25:20

There are two main differences between shuffle() and sample():

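1) shuffle changes the data in place, so its input must be a mutable sequence. In contrast, sample produces a new list, and its input can be much more varied (tuple, string, xrange, bytearray, set, etc.). That list reflects Python 2: in Python 3, xrange became range, and since Python 3.11 a set has to be converted to a list or tuple before it is passed to sample.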

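2) sample lets you do less work (i.e. a partial shuffle), because it only has to pick k items instead of reordering everything.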
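To make both points concrete, here is a small sketch using only the standard random module. The variable names (deck, hand, picked) are made up for illustration, and the printed order differs from run to run.

```python
import random

deck = ["A", "K", "Q", "J"]
result = random.shuffle(deck)       # reorders deck itself
print(result)                       # shuffle returns None
print(deck)                         # same four items, possibly in a new order

try:
    random.shuffle(("A", "K", "Q"))  # a tuple is immutable, so this fails
except TypeError as exc:
    print("shuffle on a tuple failed:", exc)

hand = ("A", "K", "Q", "J")
picked = random.sample(hand, 2)     # new list with 2 items; hand is untouched
print(picked, hand)
print(random.sample("abcdef", 3))   # a string works as a population
print(random.sample(range(100), 5)) # so does a range
```

Note that sample only picks as many items as you ask for, which is the "less work" mentioned in point 2.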

Interestingly, the conceptual relationship between the two can be shown by implementing shuffle() in terms of sample():

from random import sample

def shuffle(p):
    p[:] = sample(p, len(p))  # overwrite p in place with a full-length sample

Or vice versa, implementing sample() in terms of shuffle():

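from random import shuffle

def sample(p, k):
    p = list(p)      # copy, so the caller's data is untouched
    shuffle(p)
    return p[:k]     # first k items of the shuffled copy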

Neither of these is as efficient as the real implementations of shuffle() and sample(), but they do show how the two are conceptually related.

User

#2 · edited 2024-05-03 01:25:20

The source of shuffle (Python 2 version of random.py):

def shuffle(self, x, random=None, int=int):
    """x, random=random.random -> shuffle list x in place; return None.

    Optional arg random is a 0-argument function returning a random
    float in [0.0, 1.0); by default, the standard random.random.
    """

    if random is None:
        random = self.random
    for i in reversed(xrange(1, len(x))):
        # pick an element in x[:i+1] with which to exchange x[i]
        j = int(random() * (i+1))
        x[i], x[j] = x[j], x[i]

The source of sample (same Python 2 version):

def sample(self, population, k):
    """Chooses k unique random elements from a population sequence.

    Returns a new list containing elements from the population while
    leaving the original population unchanged.  The resulting list is
    in selection order so that all sub-slices will also be valid random
    samples.  This allows raffle winners (the sample) to be partitioned
    into grand prize and second place winners (the subslices).

    Members of the population need not be hashable or unique.  If the
    population contains repeats, then each occurrence is a possible
    selection in the sample.

    To choose a sample in a range of integers, use xrange as an argument.
    This is especially fast and space efficient for sampling from a
    large population:   sample(xrange(10000000), 60)
    """

    # XXX Although the documentation says `population` is "a sequence",
    # XXX attempts are made to cater to any iterable with a __len__
    # XXX method.  This has had mixed success.  Examples from both
    # XXX sides:  sets work fine, and should become officially supported;
    # XXX dicts are much harder, and have failed in various subtle
    # XXX ways across attempts.  Support for mapping types should probably
    # XXX be dropped (and users should pass mapping.keys() or .values()
    # XXX explicitly).

    # Sampling without replacement entails tracking either potential
    # selections (the pool) in a list or previous selections in a set.

    # When the number of selections is small compared to the
    # population, then tracking selections is efficient, requiring
    # only a small set and an occasional reselection.  For
    # a larger number of selections, the pool tracking method is
    # preferred since the list takes less space than the
    # set and it doesn't suffer from frequent reselections.

    n = len(population)
    if not 0 <= k <= n:
        raise ValueError, "sample larger than population"
    random = self.random
    _int = int
    result = [None] * k
    setsize = 21        # size of a small set minus size of an empty list
    if k > 5:
        setsize += 4 ** _ceil(_log(k * 3, 4)) # table size for big sets
    if n <= setsize or hasattr(population, "keys"):
        # An n-length list is smaller than a k-length set, or this is a
        # mapping type so the other algorithm wouldn't work.
        pool = list(population)
        for i in xrange(k):         # invariant:  non-selected at [0,n-i)
            j = _int(random() * (n-i))
            result[i] = pool[j]
            pool[j] = pool[n-i-1]   # move non-selected item into vacancy
    else:
        try:
            selected = set()
            selected_add = selected.add
            for i in xrange(k):
                j = _int(random() * n)
                while j in selected:
                    j = _int(random() * n)
                selected_add(j)
                result[i] = population[j]
        except (TypeError, KeyError):   # handle (at least) sets
            if isinstance(population, list):
                raise
            return self.sample(tuple(population), k)
    return result

As you can see, in both cases the randomization is essentially done by the line int(random() * n). So the underlying algorithm is essentially the same.
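For readers on Python 3, the shared core can be sketched roughly as follows. This is not the current standard library code; shuffle_sketch and sample_sketch are hypothetical names, random.randrange stands in for int(random() * n), and only the pool branch of sample is reproduced.

```python
import random

def shuffle_sketch(x):
    """Fisher-Yates shuffle of list x in place (illustrative only)."""
    for i in reversed(range(1, len(x))):
        j = random.randrange(i + 1)   # same role as int(random() * (i+1))
        x[i], x[j] = x[j], x[i]

def sample_sketch(population, k):
    """Pick k items using the pool method from the source above (illustrative only)."""
    pool = list(population)
    n = len(pool)
    if not 0 <= k <= n:
        raise ValueError("sample larger than population")
    result = []
    for i in range(k):
        j = random.randrange(n - i)   # index among the not-yet-selected items
        result.append(pool[j])
        pool[j] = pool[n - i - 1]     # move a non-selected item into the vacancy
    return result

data = list(range(10))
shuffle_sketch(data)
print(data)
print(sample_sketch(range(20), 5))
```

Both functions draw one random index per step from the part of the list that has not been settled yet; sample_sketch simply stops after k steps.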

User

#3 · edited 2024-05-03 01:25:20

random.shuffle() shuffles the given list in place. Its length stays the same.

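random.sample() picks n items from the given sequence without replacement (the sequence can also be a tuple or anything else, as long as it has a __len__()), and returns them in random order.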
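A short sketch of what "without replacement" means in practice; the votes list is a made-up example, and the outputs vary between runs.

```python
import random

# Duplicates in the population are separate candidates, so a value can
# show up more than once even though no position is picked twice.
votes = ["yes", "yes", "yes", "no"]
picked = random.sample(votes, 3)
print(picked)

# Asking for more items than the population holds raises ValueError.
try:
    random.sample(votes, 5)
except ValueError as exc:
    print("too many items requested:", exc)

# Sampling the whole population gives a shuffled copy; the original is kept.
shuffled_copy = random.sample(votes, len(votes))
print(shuffled_copy, votes)
```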
