<p>你问这个问题的原因是你不知道<a href="https://docs.mongodb.com/manual/reference/operator/aggregation/sample/" rel="nofollow">^{<cd1>}</a>运算符是如何工作的。如<a href="https://docs.mongodb.com/manual/reference/operator/aggregation/sample/#behavior" rel="nofollow">documentation</a>中所述</p>
<blockquote>
<p>In order to get N random documents:</p>
<ul>
<li><p>If N is greater than or equal to 5% of the total documents in the collection, $sample performs a collection scan, performs a sort, and then select the top N documents. As such, the $sample stage is subject to the <a href="https://docs.mongodb.com/manual/reference/operator/aggregation/sort/#sort-memory-limit" rel="nofollow">sort memory restrictions</a>.</p></li>
<li><p>If N is less than 5% of the total documents in the collection,
If using WiredTiger Storage Engine, $sample uses a pseudo-random cursor over the collection to sample N documents.
If using MMAPv1 Storage Engine, $sample uses the _id index to randomly select N documents.</p></li>
</ul>
</blockquote>
<p>所以我认为你想得到的随机文档数大于5%。您需要的是将<code>allowDiskUse</code>设置为<code>True</code>。在</p>
<pre><code>collection.aggregate(pipeline, allowDiskUse=True)
</code></pre>