A Python library for accurate and scalable deduplication and entity resolution

Detailed description of the dedupe Python project


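dedupe is a library that uses machine learning to perform deduplication and entity resolution quickly on structured data. dedupe is the open source engine for dedupe.io.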

dedupe will help you:

  • remove duplicate entries from a spreadsheet of names and addresses
  • link a list with customer information to another with order history, even without unique customer IDs
  • take a database of campaign contributions and figure out which ones were made by the same person, even if the names were entered slightly differently for each record

dedupe learns from human-labeled training data and derives the best matching rules for your dataset, so it can quickly and automatically find similar records, even in very large databases.
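
To make the idea of "finding similar records" concrete, here is a minimal sketch of fuzzy duplicate detection using only the Python standard library. It is not dedupe's API and not how dedupe is implemented: the function names, the sample records, the field weights, and the threshold are all assumptions chosen for illustration. In particular, the weights and threshold are hand-picked here, whereas dedupe is described above as deriving its rules from human training data.

```python
from difflib import SequenceMatcher
from itertools import combinations


def field_similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity score, ignoring case and outer whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def record_similarity(r1: dict, r2: dict, weights: dict) -> float:
    """Weighted average of per-field similarities (weights are hand-picked)."""
    total = sum(weights.values())
    return sum(w * field_similarity(r1[f], r2[f]) for f, w in weights.items()) / total


def find_duplicates(records: dict, weights: dict, threshold: float) -> list:
    """Group record ids whose pairwise similarity reaches the threshold."""
    # Union-find: every record starts in its own group.
    parent = {rid: rid for rid in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Compare every pair; this is quadratic and only suitable for small inputs.
    for a, b in combinations(records, 2):
        if record_similarity(records[a], records[b], weights) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for rid in records:
        clusters.setdefault(find(rid), []).append(rid)
    # Keep only groups that contain at least two records.
    return sorted(sorted(c) for c in clusters.values() if len(c) > 1)


# Hypothetical sample data: records 1 and 2 describe the same person.
records = {
    1: {"name": "John Smith", "address": "123 Main St"},
    2: {"name": "Jon Smith", "address": "123 Main Street"},
    3: {"name": "Alice Jones", "address": "9 Elm Ave"},
}
weights = {"name": 0.6, "address": 0.4}  # assumed, not learned
clusters = find_duplicates(records, weights, threshold=0.8)
print(clusters)
```

Comparing every pair of records does not scale to large databases, which is one reason a dedicated library is useful for this task.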
