Why does pandas_dedupe raise an IndexError on a Windows machine?

Posted 2024-09-28 23:27:51

I am using the pandas_dedupe library. When I run the code below on a Windows machine I get the error shown, but the same code runs fine on a Mac.

import pandas as pd
import pandas_dedupe as pdd

df = pd.read_csv('sample.csv')

# Cluster likely duplicate records based on these fields
df = pdd.dedupe_dataframe(df, ['firstname', 'lastname', 'gender', 'zipcode', 'address'])
df.to_csv('sample_deduped.csv')

# Keep rows with no cluster id, plus the first row of each cluster
df = df[df['cluster id'].isnull() | ~df['cluster id'].duplicated(keep='first')]

df.to_csv('sample_deuped_removed.csv')
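
The filtering step at the end of the script can be checked on its own, without running the deduplication. The following is only a minimal sketch: the helper name `drop_cluster_duplicates` and the toy data are hypothetical, and it assumes that rows which were not assigned to any cluster have an empty (NaN) `cluster id`.

```python
import numpy as np
import pandas as pd


def drop_cluster_duplicates(frame, cluster_col="cluster id"):
    """Keep unclustered rows plus the first row of each cluster.

    Hypothetical helper for illustration; not part of pandas_dedupe.
    """
    # Rows without a cluster id are always kept (assumption: NaN means "no cluster").
    unclustered = frame[cluster_col].isna()
    # Within each cluster id, only the first occurrence is kept.
    first_in_cluster = ~frame[cluster_col].duplicated(keep="first")
    return frame[unclustered | first_in_cluster]


# Hand-made stand-in for the output of the deduplication step.
toy = pd.DataFrame({
    "firstname": ["Ann", "Anne", "Bob", "Cy", "Di"],
    "cluster id": [0.0, 0.0, 1.0, np.nan, np.nan],
})

result = drop_cluster_duplicates(toy)
print(result)
```

On this toy frame the second row of cluster 0 is dropped, while both rows without a cluster id are kept.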

Here is the traceback:

Traceback (most recent call last):
  File "C:/Users/vikas.mittal/Desktop/python projects/untitled2/deduplication.py", line 10, in <module>
    df=pdd.dedupe_dataframe(df,['firstname','lastname','gender','zipcode','address'])
  File "C:\Users\vikas.mittal\Desktop\python projects\untitled2\venv\lib\site-packages\pandas_dedupe\dedupe_dataframe.py", line 213, in dedupe_dataframe
    sample_size)
  File "C:\Users\vikas.mittal\Desktop\python projects\untitled2\venv\lib\site-packages\pandas_dedupe\dedupe_dataframe.py", line 72, in _train
    dedupe.consoleLabel(deduper)
  File "C:\Users\vikas.mittal\Desktop\python projects\untitled2\venv\lib\site-packages\dedupe\convenience.py", line 36, in consoleLabel
    uncertain_pairs = deduper.uncertainPairs()
  File "C:\Users\vikas.mittal\Desktop\python projects\untitled2\venv\lib\site-packages\dedupe\api.py", line 714, in uncertainPairs
    return self.active_learner.pop()
  File "C:\Users\vikas.mittal\Desktop\python projects\untitled2\venv\lib\site-packages\dedupe\labeler.py", line 323, in pop
    raise IndexError("No more unlabeled examples to label")
IndexError: No more unlabeled examples to label

Process finished with exit code 1
