How does the idx in __getitem__ work in PyTorch's DataLoader?

1 answer

User

#1 · Posted 2024-09-29 22:21:56

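The idx is defined by the sampler or the batch_sampler, as you can see in the DataLoader source (open-source projects are your friend). The code, together with its comments and docstring, shows the difference between sampler and batch_sampler. The iterator code shows how the index is chosen: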

def __next__(self):
    index = self._next_index()

# and _next_index is implemented on the base class (_BaseDataLoaderIter)
def _next_index(self):
    return next(self._sampler_iter)

# self._sampler_iter is defined in the __init__ like this:
self._sampler_iter = iter(self._index_sampler)

# and self._index_sampler is a property implemented like this (modified to one-liner for simplicity):
self._index_sampler = self.batch_sampler if self._auto_collation else self.sampler
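
The chain above can be condensed into a small pure-Python sketch. It does not import PyTorch: `SquaresDataset` and `TinyLoaderIter` are hypothetical names used only for illustration, and collation is reduced to building a plain list. It is meant to show that the idx received by `__getitem__` is simply whatever the index sampler's iterator yields.

```python
class SquaresDataset:
    """Hypothetical map-style dataset that records every idx it is asked for."""

    def __init__(self, n):
        self.n = n
        self.seen = []

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        self.seen.append(idx)
        return idx * idx


class TinyLoaderIter:
    """Simplified stand-in for a single-process loader iterator (not PyTorch code)."""

    def __init__(self, dataset, index_sampler, auto_collation):
        self._dataset = dataset
        self._auto_collation = auto_collation
        self._sampler_iter = iter(index_sampler)

    def __iter__(self):
        return self

    def _next_index(self):
        # StopIteration from the sampler ends the epoch
        return next(self._sampler_iter)

    def __next__(self):
        index = self._next_index()
        if self._auto_collation:
            # index is a list of indices produced by a batch sampler
            return [self._dataset[i] for i in index]
        # index is a single idx produced by a plain sampler
        return self._dataset[index]


dataset = SquaresDataset(5)
batches = list(TinyLoaderIter(dataset, [[0, 1], [2, 3], [4]], auto_collation=True))
print(batches)       # [[0, 1], [4, 9], [16]]
print(dataset.seen)  # [0, 1, 2, 3, 4]
```

Passing a flat iterable such as `range(5)` with `auto_collation=False` would instead hand one idx at a time to `__getitem__`.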

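Note that this is the _SingleProcessDataLoaderIter implementation; _MultiProcessingDataLoaderIter is a separate class (of course, which one is used depends on the value of num_workers). Back to the samplers: assuming your dataset is not _DatasetKind.Iterable and you did not provide a custom sampler, you are using (dataloader.py#L212-L215):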

if shuffle:
    sampler = RandomSampler(dataset)
else:
    sampler = SequentialSampler(dataset)

Let's look at how the default BatchSampler builds a batch:

def __iter__(self):
    batch = []
    for idx in self.sampler:
        batch.append(idx)
        if len(batch) == self.batch_size:
            yield batch
            batch = []
    if len(batch) > 0 and not self.drop_last:
        yield batch

Very simple: it takes indices from the sampler until the desired batch size is reached.
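
To make the batching rule concrete, here is a minimal standalone sketch of the same loop. `batch_indices` is a hypothetical helper name, and `range(10)` stands in for a sampler over a 10-item dataset; nothing here imports PyTorch.

```python
def batch_indices(sampler, batch_size, drop_last):
    """Group indices from `sampler` into lists of at most `batch_size` items."""
    batch = []
    for idx in sampler:
        batch.append(idx)
        if len(batch) == batch_size:
            yield batch
            batch = []
    # A shorter final batch is emitted only when drop_last is False
    if len(batch) > 0 and not drop_last:
        yield batch


keep_tail = list(batch_indices(range(10), batch_size=3, drop_last=False))
drop_tail = list(batch_indices(range(10), batch_size=3, drop_last=True))
print(keep_tail)  # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
print(drop_tail)  # [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
```

With 10 indices and a batch size of 3, the only difference between the two settings is whether the leftover `[9]` is yielded.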

Now the question "How does the idx in __getitem__ work in PyTorch's DataLoader?" can be answered by looking at how each default sampler works.

SequentialSampler (this is the full implementation; very simple, isn't it?):

class SequentialSampler(Sampler):
    def __init__(self, data_source):
        self.data_source = data_source

    def __iter__(self):
        return iter(range(len(self.data_source)))

    def __len__(self):
        return len(self.data_source)

RandomSampler (we only look at the __iter__ implementation):

def __iter__(self):
    n = len(self.data_source)
    if self.replacement:
        return iter(torch.randint(high=n, size=(self.num_samples,), dtype=torch.int64).tolist())
    return iter(torch.randperm(n).tolist())
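
The two branches can be mimicked without PyTorch as a rough sketch. `random_indices` is a hypothetical helper, and Python's `random` module is swapped in for `torch.randint` and `torch.randperm`, so the actual numbers drawn will differ from what PyTorch produces; only the shape of the result is the point.

```python
import random


def random_indices(n, replacement=False, num_samples=None, seed=None):
    """Return a list of indices in [0, n), mimicking the two sampling branches."""
    rng = random.Random(seed)
    if replacement:
        # Independent draws: repeats are allowed and some indices may never appear
        k = n if num_samples is None else num_samples
        return [rng.randrange(n) for _ in range(k)]
    # Permutation: every index appears exactly once, in shuffled order
    perm = list(range(n))
    rng.shuffle(perm)
    return perm


perm = random_indices(6, seed=0)
drawn = random_indices(6, replacement=True, num_samples=10, seed=0)
print(sorted(perm) == list(range(6)))               # True
print(len(drawn), all(0 <= i < 6 for i in drawn))   # 10 True
```

Without replacement, one epoch visits each idx once; with replacement, the number of draws is set by `num_samples` rather than by the dataset length.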

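So, since you did not provide any code, we can only assume one of the following:

You are using shuffle=True in your DataLoader, or
You are using a custom sampler, or
Your dataset is _DatasetKind.Iterable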
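
As an illustration of the second case, the sketch below shows how a custom sampler alone changes the order of idx values even with no shuffling. `EvenFirstSampler` is a hypothetical class written in plain Python; it follows the same `__iter__`/`__len__` shape as the SequentialSampler shown earlier but does not subclass PyTorch's Sampler, and a list stands in for the dataset.

```python
class EvenFirstSampler:
    """Hypothetical custom sampler: even indices first, then odd ones."""

    def __init__(self, data_source):
        self.data_source = data_source

    def __iter__(self):
        n = len(self.data_source)
        yield from range(0, n, 2)  # 0, 2, 4, ...
        yield from range(1, n, 2)  # 1, 3, 5, ...

    def __len__(self):
        return len(self.data_source)


data = ["a", "b", "c", "d", "e"]
order = list(EvenFirstSampler(data))
print(order)                     # [0, 2, 4, 1, 3]
print([data[i] for i in order])  # ['a', 'c', 'e', 'b', 'd']
```

If the idx values seen in `__getitem__` are neither sequential nor a random permutation, checking for a sampler like this is a reasonable first step.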
