I am using the pandas_dedupe library. When I run the code below on a Windows machine I get the following error, but the same code runs fine on a Mac.
import pandas as pd
import pandas_dedupe as pdd

df = pd.read_csv('sample.csv')
# Cluster likely duplicates using the listed columns (prompts for labels on the console)
df = pdd.dedupe_dataframe(df, ['firstname', 'lastname', 'gender', 'zipcode', 'address'])
df.to_csv('sample_deduped.csv')
# Keep rows without a cluster id, plus the first row of each cluster
df = df[df['cluster id'].isnull() | ~df.duplicated(subset='cluster id', keep='first')]
df.to_csv('sample_deuped_removed.csv')
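The filtering step after deduplication is meant to keep every row that has no cluster id and only the first row of each cluster. Below is a minimal sketch of that step on made-up data, assuming only pandas is installed. The helper name keep_first_per_cluster and the toy values are illustrative and are not part of pandas_dedupe.

```python
import pandas as pd


def keep_first_per_cluster(df, cluster_col="cluster id"):
    """Keep rows with no cluster id plus the first row of every cluster.

    Hypothetical helper for illustration; not a pandas_dedupe function.
    """
    no_cluster = df[cluster_col].isnull()
    # duplicated() flags every repeat of a cluster id after its first occurrence.
    repeat = df.duplicated(subset=cluster_col, keep="first")
    return df[no_cluster | ~repeat]


# Made-up data: rows 0 and 1 share cluster 0, rows 3 and 4 have no cluster id.
toy = pd.DataFrame({
    "firstname": ["Ann", "Anne", "Bob", "Cy", "Di"],
    "cluster id": [0, 0, 1, None, None],
})
result = keep_first_per_cluster(toy)
print(result)
```

Only the second row of cluster 0 is dropped here; rows without a cluster id are all kept because the null check takes precedence over the duplicate flag.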
Here is the log, in case you want to see it:
Traceback (most recent call last):
  File "C:/Users/vikas.mittal/Desktop/python projects/untitled2/deduplication.py", line 10, in <module>
    df=pdd.dedupe_dataframe(df,['firstname','lastname','gender','zipcode','address'])
  File "C:\Users\vikas.mittal\Desktop\python projects\untitled2\venv\lib\site-packages\pandas_dedupe\dedupe_dataframe.py", line 213, in dedupe_dataframe
    sample_size)
  File "C:\Users\vikas.mittal\Desktop\python projects\untitled2\venv\lib\site-packages\pandas_dedupe\dedupe_dataframe.py", line 72, in _train
    dedupe.consoleLabel(deduper)
  File "C:\Users\vikas.mittal\Desktop\python projects\untitled2\venv\lib\site-packages\dedupe\convenience.py", line 36, in consoleLabel
    uncertain_pairs = deduper.uncertainPairs()
  File "C:\Users\vikas.mittal\Desktop\python projects\untitled2\venv\lib\site-packages\dedupe\api.py", line 714, in uncertainPairs
    return self.active_learner.pop()
  File "C:\Users\vikas.mittal\Desktop\python projects\untitled2\venv\lib\site-packages\dedupe\labeler.py", line 323, in pop
    raise IndexError("No more unlabeled examples to label")
IndexError: No more unlabeled examples to label

Process finished with exit code 1