spammy: Spam filtering at your service




Author:Tasdik Rahman
Latest version:1.0.3

1 Overview

spammy: Spam filtering at your service

2 Features

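  • Train the classifier on your own dataset to classify your emails as spam or ham
  • Dead simple to use. See usage
  • Blazingly fast once the classifier is trained (see benchmarks)
  • Custom exceptions are raised, so when you miss something, spammy tells you what went wrong in a graceful way
  • Written in pure Python
  • Built on the shoulders of the giant, nltk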
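To make "train on your own dataset, then classify as spam or ham" concrete, here is a minimal, self-contained sketch of a word-count (multinomial) naive Bayes classifier with add-one smoothing in plain Python. It is an illustration only: this README does not say which model spammy uses internally beyond building on nltk, and `tokenize`, `train_naive_bayes` and `classify_text` are hypothetical names, not part of spammy's API.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train_naive_bayes(labelled_texts):
    """Hypothetical helper: gather word and document counts from (text, label) pairs."""
    word_counts = {}        # label -> Counter of word frequencies
    doc_counts = Counter()  # label -> number of training documents
    for text, label in labelled_texts:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocabulary = set()
    for counts in word_counts.values():
        vocabulary.update(counts)
    return word_counts, doc_counts, vocabulary


def classify_text(model, text):
    """Hypothetical helper: return the label with the highest log-posterior score."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, float("-inf")
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        # log prior: share of training documents carrying this label
        score = math.log(float(doc_counts[label]) / total_docs)
        for word in tokenize(text):
            # add-one (Laplace) smoothing keeps unseen words from zeroing the score
            likelihood = float(counts[word] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training set, made up for illustration
model = train_naive_bayes([
    ("urgent transfer of dormant funds to your account", "spam"),
    ("claim your unclaimed million dollars now", "spam"),
    ("meeting notes for the project review tomorrow", "ham"),
    ("can we move lunch to friday", "ham"),
])
print(classify_text(model, "transfer the unclaimed funds urgently"))  # spam
print(classify_text(model, "project review meeting on friday"))      # ham
```

Working in log space avoids floating-point underflow when a long email multiplies many small word probabilities together.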

3 Example


>>> import os
>>> from spammy import Spammy
>>>
>>> directory = '/home/tasdik/Dropbox/projects/spamfilter/data/corpus3'
>>>
>>> # directory structure
>>> os.listdir(directory)
['spam', 'Summary.txt', 'ham']
>>> os.listdir(os.path.join(directory, 'spam'))[:5]
['4257.2005-04-06.BG.spam.txt', '0724.2004-09-21.BG.spam.txt', '2835.2005-01-19.BG.spam.txt', '2505.2005-01-03.BG.spam.txt', '3992.2005-03-19.BG.spam.txt']
>>>
>>> # Spammy object created
>>> cl = Spammy(directory, limit=100)
>>> cl.train()
>>>
>>> SPAM_TEXT = \
... """
... My Dear Friend,
...
... How are you and your family? I hope you all are fine.
...
... My dear I know that this mail will come to you as a surprise, but it's for my
... urgent need for a foreign partner that made me to contact you for your sincere
... genuine assistance My name is Mr.Herman Hirdiramani, I am a banker by
... profession currently holding the post of Director Auditing Department in
... the Islamic Development Bank(IsDB)here in Ouagadougou, Burkina Faso.
...
... I got your email information through the Burkina's Chamber of Commerce
... and industry on foreign business relations here in Ouagadougou Burkina Faso
... I haven'disclose this deal to any body I hope that you will not expose or
... betray this trust and confident that I am about to repose on you for the
... mutual benefit of our both families.
...
... I need your urgent assistance in transferring the sum of Eight Million,
... Four Hundred and Fifty Thousand United States Dollars ($8,450,000:00) into
... your account within 14 working banking days This money has been dormant for
... years in our bank without claim due to the owner of this fund died along with
... his entire family and his supposed next of kin in an underground train crash
... since years ago. For your further informations please visit
... (http://news.bbc.co.uk/2/hi/5141542.stm)
... """>>>cl.classify(SPAM_TEXT)'spam'>>>

Accuracy of the classifier

>>> from spammy import Spammy
>>> directory = '/home/tasdik/Dropbox/projects/spammy/examples/training_dataset'
>>> cl = Spammy(directory, limit=300)  # training on only 300 spam and ham files
>>> cl.train()
>>> cl.accuracy(directory='/home/tasdik/Dropbox/projects/spammy/examples/test_dataset', label='spam', limit=300)
0.9554794520547946
>>> cl.accuracy(directory='/home/tasdik/Dropbox/projects/spammy/examples/test_dataset', label='ham', limit=300)
0.9033333333333333
>>>
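The README does not define exactly what `accuracy()` computes. Assuming it reports the share of test files under the given label that the classifier assigns to that same label, the calculation looks like the sketch below. `accuracy_for_label` and `toy_classify` are hypothetical stand-ins, not spammy functions.

```python
def accuracy_for_label(classify, texts, expected_label):
    """Hypothetical helper: fraction of `texts` that `classify` assigns to `expected_label`."""
    if not texts:
        raise ValueError("need at least one text to measure accuracy")
    correct = sum(1 for text in texts if classify(text) == expected_label)
    return correct / float(len(texts))


def toy_classify(text):
    # Stand-in for a trained classifier: flags any text that mentions "transfer"
    return "spam" if "transfer" in text.lower() else "ham"


spam_texts = ["Urgent transfer needed", "Transfer $8,450,000 now", "Dear friend, hello"]
ham_texts = ["Lunch at noon?", "Wire transfer receipt attached"]
print(accuracy_for_label(toy_classify, spam_texts, "spam"))  # 2 of 3 correct
print(accuracy_for_label(toy_classify, ham_texts, "ham"))    # 1 of 2 correct -> 0.5
```

Measuring spam and ham separately, as the two `accuracy()` calls above do, shows whether errors lean toward missed spam or toward legitimate mail being flagged.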

Note: more examples can be found in the examples directory

4 Installation


Note: currently only Python 2 is supported

Install the dependencies first:

$ pip install nltk==3.2.1 beautifulsoup4==4.4.1

To install, use pip:

$ pip install spammy

Or with easy_install:

$ easy_install spammy

Or build it yourself (if you must):

$ git clone https://github.com/prodicus/spammy.git
$ cd spammy
$ python setup.py install

4.1 Upgrading

To upgrade the package:

$ pip install -U spammy

4.2 Installing behind a proxy

If you are behind a proxy, this should work:

$ pip --proxy [username:password@]domain_name:port install spammy

5 Benchmarks


Spammy is blazingly fast once trained.

Don't believe me? Have a look:

>>> import timeit
>>> from spammy import Spammy
>>>
>>> directory = '/home/tasdik/Dropbox/projects/spamfilter/data/corpus3'
>>> cl = Spammy(directory, limit=100)
>>> cl.train()
>>> SPAM_TEXT_2 = \
... """
... INTERNATIONAL MONETARY FUND (IMF)
... DEPT: WORLD DEBT RECONCILIATION AGENCIES.
... ADVISE: YOUR OUTSTANDING PAYMENT NOTIFICATION
...
... Attention
... A power of attorney was forwarded to our office this morning by two gentle men,
... one of them is an American national and he is MR DAVID DEANE by name while the
... other person is MR... JACK MORGAN by name a CANADIAN national.
... This gentleman claimed to be your representative, and this power of attorney
... stated that you are dead; they brought an account to replace your information
... in other to claim your fund of (US$9.7M) which is now lying DORMANT and UNCLAIMED,
...  below is the new account they have submitted:
...                     BANK.-HSBC CANADA
...                     Vancouver, CANADA
...                     ACCOUNT NO. 2984-0008-66
...
... Be further informed that this power of attorney also stated that you suffered.
... """>>>>>>defclassify_timeit():...result=cl.classify(SPAM_TEXT_2)...>>>timeit.repeat(classify_timeit,number=5)[0.1810469627380371,0.16121697425842285,0.16121196746826172]>>>

6 Contributing


Refer to CONTRIBUTING for details.

6.1 Roadmap

  • Include more algorithms to improve accuracy
  • Python 3 support

7 License


Spammy is built by Tasdik Rahman and licensed under GPLv3.

spammy Copyright (C) 2016 Tasdik Rahman(prodicus@outlook.com)

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.

You can find a complete copy of the license file here

8 Credits


If you'd like to give me credit somewhere on your blog or tweet a shout-out to @tasdikrahman, well hey, I'll take it.
