# AdIdentifier
[![PyPI pyversions](https://img.shields.io/pypi/pyversions/adidentifier.svg)](https://pypi.python.org/pypi/adidentifier)
[![PyPI](https://img.shields.io/pypi/v/adidentifier.svg)](https://pypi.python.org/pypi/adidentifier)

## Installation

Prerequisites:

* The [RE2 library](https://github.com/google/re2) from Google
> $ git clone https://github.com/google/re2.git && cd re2 && make && make install

* The Python development headers (for example, on Debian or Ubuntu)
> $ sudo apt-get install python-dev

* Cython 0.20+
> $ pip install cython

After the prerequisites are installed, install pyre2 as follows (use pip3 for Python 3):
> $ pip install https://github.com/andreasvc/pyre2/archive/master.zip

or
> $ git clone https://github.com/andreasvc/pyre2.git
> $ cd pyre2
> $ make install

Then install adidentifier:
> $ pip install adidentifier

## Usage

### Import
```python
from adidentifier import AdIdentifier
```
### Initialize
```python
ad = AdIdentifier()
```
## API
### is_finance(text)
Check whether the text or URL is relevant to finance.
```python
# The first sample is Chinese ad copy (roughly "Sudai Zhijia: borrow without worry, funds arrive in 2 hours").
test1 = ["速贷之家-借钱不担心_2小时到账",
         "https://www.aiqianzhan.com/html/register3_bd4.html?utm_source=bd4-pc-ss&utm_medium=bd4SEM&utm_campaign=D1-%BE%BA%C6%B7%B4%CA-YD&utm_content=%BE%BA%C6%B7%B4%CA-%C3%FB%B4%CA&utm_term=p2p%CD%F8%B4%FB"]
for text in test1:
    resu = ad.is_finance(text)
    # The format string prints the same line under Python 2 and Python 3.
    print("%s ------->> %s" % (text, resu))
```
> Output:
```
速贷之家-借钱不担心_2小时到账 ------->> True
https://www.aiqianzhan.com/html/register3_bd4.html?utm_source=bd4-pc-ss&utm_medium=bd4SEM&utm_campaign=D1-%BE%BA%C6%B7%B4%CA-YD&utm_content=%BE%BA%C6%B7%B4%CA-%C3%FB%B4%CA&utm_term=p2p%CD%F8%B4%FB ------->> True
```
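
To make the idea concrete, here is a minimal sketch of how a keyword-based finance check could work. It is not adidentifier's actual implementation: the function name `looks_like_finance` and the matching rules are assumptions made for illustration, and the keyword lists are simply the values from the sample configuration in this README. The real library evidently recognizes more than this, since it also flags the first sample text above.

```python
from urllib.parse import urlparse

# Hypothetical keyword lists, copied from the sample configuration in this README.
URI_KEYWORDS = ["qian", "dai", "cf", "wd", "jin"]
TEXT_KEYWORDS = ["网贷"]


def looks_like_finance(text):
    """Illustrative only: return True if `text` contains a finance keyword."""
    parsed = urlparse(text)
    if parsed.scheme in ("http", "https"):
        # For URLs, search the host and path for URI keywords.
        haystack = (parsed.netloc + parsed.path).lower()
        return any(keyword in haystack for keyword in URI_KEYWORDS)
    # For plain text, search for text keywords.
    return any(keyword in text for keyword in TEXT_KEYWORDS)


print(looks_like_finance("https://www.aiqianzhan.com/html/register3_bd4.html"))
print(looks_like_finance("网贷平台推荐"))
print(looks_like_finance("https://www.example.com/news"))
```

Plain substring matching like this is easy to configure but prone to false positives (short keywords such as `cf` appear in many unrelated URLs), which is presumably why the library builds on a regular-expression engine instead.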
### is_ad(url)
Check whether the URL is relevant to ads.
```python
test2 = ["https://ss3.baidu.com/-rVXeDTa2gU2pMbgoY3K/it/u=3778907493,3669893773&fm=202&mola=new&crop=v1",
         "https://ss2.bdstatic.com/8_v1bjqh_q23odcf/pacific/upload_25289207_1521622472509.png?x=0&y=0&h=150&w=242&vh=92.98&vw=150.00&oh=150.00&ow=242.00",
         "http://pagead2.googlesyndication.com/pagead/show_ads.js",
         "http://www.googletagservices.com/tag/js/gpt_mobile.js"]
for text in test2:
    resu = ad.is_ad(text)
    # Print a tuple so the output matches the format shown below.
    print((text, "------>>", resu))
```
> Output:
```
('https://ss3.baidu.com/-rVXeDTa2gU2pMbgoY3K/it/u=3778907493,3669893773&fm=202&mola=new&crop=v1', '------>>', True)
('https://ss2.bdstatic.com/8_v1bjqh_q23odcf/pacific/upload_25289207_1521622472509.png?x=0&y=0&h=150&w=242&vh=92.98&vw=150.00&oh=150.00&ow=242.00', '------>>', True)
('http://pagead2.googlesyndication.com/pagead/show_ads.js', '------>>', True)
('http://www.googletagservices.com/tag/js/gpt_mobile.js', '------>>', False)
```
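
The sample configuration in this README lists wildcard patterns such as `https://ss3.baidu.com/*`. As a rough sketch of how such patterns could be applied, the snippet below matches URLs against them with the standard-library `fnmatch` module. This is an assumption about the mechanism, not the library's code: `matches_ad_filter` is a hypothetical helper, and treating a pattern match as "this URL is an ad" is a guess based on the output above.

```python
from fnmatch import fnmatchcase

# Wildcard patterns copied from the sample configuration in this README.
AD_FILTERS = [
    "https://ss3.baidu.com/*",
    "https://ss2.bdstatic.com/*",
]


def matches_ad_filter(url, patterns=AD_FILTERS):
    """Illustrative only: True if `url` matches any wildcard pattern."""
    # fnmatchcase is case-sensitive and lets "*" span "/" characters.
    return any(fnmatchcase(url, pattern) for pattern in patterns)


print(matches_ad_filter(
    "https://ss3.baidu.com/-rVXeDTa2gU2pMbgoY3K/it/u=3778907493,3669893773&fm=202&mola=new&crop=v1"))
print(matches_ad_filter("http://www.googletagservices.com/tag/js/gpt_mobile.js"))
```

The first call prints `True` and the second prints `False`, because only the first URL falls under one of the listed prefixes.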

### get_target_from_href(href)
Extract the target URL that a hyperlink points to, e.g. https://www.baidu.com/…%asdd ---> https://www.wdzj.com/…1%e8%b4%b7

``` python
# The long opaque redirect token is abbreviated with "…"; pass the full link copied from the page.
print(ad.get_target_from_href("https://www.baidu.com/baidu.php?url=0f0000jsnOdydCYpIY2xQXFCV1h5YmZnZh_pWjXI1sMrqQiM8Y55S59-6yXvznN6gm_5K2BIwOl4qzVcr2qRUIZdYnyTM2gOTAL-ed0xhaXP7ZI4XoxPJtWsnc4vPT3Qgcpo8dLTicCsAu_tZqqn5DH0sVytFArXV5kfFxBwLN5Kyia2R0.DD_NR2Ar5Od663rj6t8ae9zC63p_jnNKtAlEuw9zsISgZsIoDgQvTVxQgzdtEZ-LTEuzk3x5I9qxo9vU_5Mvmxgv3IhOj4en5VS8ZutEOOS1j4SrZdSyZxg9tqhZden5o3OOOqhZ1tT5ot_rSEj4en5ovmxgkl32AM-…&shh=www.baidu.com&sht=baidu"))
```
> Output:
```shell
https://www.wdzj.com/zhuti/518lcj/?_pwk=n_4_1_1_1_3_5_4_s%E5%BF%85%E4%BA%89%E8%AF%8D|%E7%BD%91%E8%B4%B7|%E7%BD%91%E8%B4%B7&utm_source=baidu&utm_medium=cpc&tm_content=search&utm_campaign=%E7%BD%91%E8%B4%B7&utm_term=%E7%BD%91%E8%B4%B7
```

### get_domain_from_url(href)
Extract the domain from a URL, e.g. https://www.asdasd.com/asdasd ---> www.asdasd.com

```python
print(ad.get_domain_from_url("https://www.asdasd.com/asdasd"))
```
> Output:
```shell
www.asdasd.com
```
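
For comparison, the same kind of extraction can be done with the Python standard library alone. The sketch below uses `urllib.parse.urlparse`; `domain_of` is a hypothetical helper name, and nothing here is taken from adidentifier's source.

```python
from urllib.parse import urlparse


def domain_of(url):
    """Return the network location (host, plus port if present) of `url`."""
    return urlparse(url).netloc


print(domain_of("https://www.asdasd.com/asdasd"))
```

Note that `urlparse` only recognizes the host when the URL carries a scheme or starts with `//`; for a bare string such as `www.asdasd.com/asdasd` it returns an empty `netloc`.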



## Configuration
```ini
[custom]
uri_keywords=qian,dai,cf,wd,jin
text_keywords=网贷
ad_filter=https://ss3.baidu.com/*,https://ss2.bdstatic.com/*
```
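
A configuration of this shape can be read with Python's `configparser`. The sketch below is illustrative only: it assumes the section is named `custom` and the keys are spelled `uri_keywords`, `text_keywords`, and `ad_filter` in English, and it does not show where or how adidentifier itself loads its configuration, which this README does not state. `load_custom` is a hypothetical helper.

```python
import configparser

# Sample configuration text; section and key names are assumed spellings.
SAMPLE = """
[custom]
uri_keywords=qian,dai,cf,wd,jin
text_keywords=网贷
ad_filter=https://ss3.baidu.com/*,https://ss2.bdstatic.com/*
"""


def load_custom(text):
    """Parse the [custom] section into a dict of comma-separated lists."""
    # interpolation=None keeps "%" in URL patterns from being interpreted.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    section = parser["custom"]
    return {
        key: [item.strip() for item in value.split(",") if item.strip()]
        for key, value in section.items()
    }


settings = load_custom(SAMPLE)
print(settings["uri_keywords"])
print(settings["ad_filter"])
```

Splitting on commas keeps the file format simple, at the cost of not supporting patterns that themselves contain a comma.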

!!! These keywords apply to finance-related domains and URL paths. The section can be omitted.
