webgrep-tool Python package - PyPI

A grep-like tool for web pages, with additional features such as JavaScript deobfuscation and easy extensibility

Detailed description of the webgrep-tool Python project


Introduction

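This self-contained tool relies on the well-known grep tool to grep web pages. It binds nearly every option of the original tool and also provides additional features, such as deobfuscating JavaScript or applying OCR to images before grepping the downloaded resources.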
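
The fetch-then-grep idea can be sketched in a few lines. This is a minimal illustration and not webgrep's actual code: it swaps Python's `re` module in for the real grep binary (so the pattern syntax is Python's rather than BRE/ERE), and the names `fetch` and `grep_lines` are hypothetical.

```python
import re
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download a page and return its body as text (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def grep_lines(pattern, text, ignore_case=False):
    """Return the lines of `text` that match `pattern`, like a bare grep call."""
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    return [line for line in text.splitlines() if regex.search(line)]


if __name__ == "__main__":
    # Offline sample; replace with fetch("http://www.example.com") to go online.
    html = "<h1>Welcome home</h1>\n<p>nothing here</p>\n<p>welcome back</p>"
    for line in grep_lines("Welcome", html, ignore_case=True):
        print(line)
```

Keeping the download and the matching in separate functions mirrors the description above: any preprocessing step (deobfuscation, OCR) would sit between the two.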

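System Requirements

This script was tested on Ubuntu 16.04 with Python 2.7 and Python 3.5.

Its Python logic mainly uses standard built-in modules, but also relies on a few specific tools and preprocessor-related modules. It makes calls to grep.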

Installation

$ sudo pip install webgrep-tool

Behind a proxy?
Do not forget to add the option --proxy=http://[user]:[pwd]@[host]:[port] to your pip command.

Quick Start

Help

$ webgrep --help
usage: webgrep [OPTION]... PATTERN [URL]...

Search for PATTERN in each input URL and its related resources
(images, scripts and style sheets).
By default,
- resources are NOT downloaded
- response HTTP headers are NOT included in grepping ; use '--include-headers'
- PATTERN is a basic regular expression (BRE) ; use '-E' for extended (ERE)
Important note: webgrep does not handle recursion (in other words, it does not
               spider additional web pages).
Examples:
 webgrep example http://www.example.com     # will only grep on HTML code
 webgrep -r example http://www.example.com  # will only grep on LOCAL images, ...
 webgrep -R example http://www.example.com  # will only grep on ALL images, ...

Regexp selection and interpretation:
 -e REGEXP, --regexp REGEXP
                       use PATTERN for matching
 -f FILE, --file FILE  obtain PATTERN from FILE
 -E, --extended-regexp
                       PATTERN is an extended regular expression (ERE)
 -F, --fixed-strings   PATTERN is a set of newline-separated fixed strings
 -G, --basic-regexp    PATTERN is a basic regular expression (BRE)
 -P, --perl-regexp     PATTERN is a Perl regular expression
 -i, --ignore-case     ignore case distinctions
 -w, --word-regexp     force PATTERN to match only whole words
 -x, --line-regexp     force PATTERN to match only whole lines
 -z, --null-data       a data line ends in 0 byte, not newline

Miscellaneous:
 -s, --no-messages     suppress error messages
 -v, --invert-match    select non-matching lines
 -V, --version         print version information and exit
 --help                display this help and exit
 --verbose             verbose mode
 --keep-files          keep temporary files in the temporary directory
 --temp-dir TMP        define the temporary directory (default: /tmp/webgrep)

Output control:
 -m NUM, --max-count NUM
                       stop after NUM matches
 -b, --byte-offset     print the byte offset with output lines
 -n, --line-number     print line number with output lines
 --line-buffered       flush output on every line
 -H, --with-filename   print the file name for each match
 -h, --no-filename     suppress the file name prefix on output
 --label LABEL         use LABEL as the standard input filename prefix
 -o, --only-matching   show only the part of a line matching PATTERN
 -q, --quiet, --silent
                       suppress all normal output
 --binary-files TYPE   assume that binary files are TYPE;
                       TYPE is 'binary', 'text', or 'without-match'
 -a, --text            equivalent to --binary-files=text
 -I                    equivalent to --binary-files=without-match
 -L, --files-without-match
                       print only names of FILEs containing no match
 -l, --files-with-match
                       print only names of FILEs containing matches
 -c, --count           print only a count of matching lines per FILE
 -T, --initial-tab     make tabs line up (if needed)
 -Z, --null            print 0 byte after FILE name

Context control:
 -B NUM, --before-context NUM
                       print NUM lines of leading context
 -A NUM, --after-context NUM
                       print NUM lines of trailing context
 -C NUM, --context NUM
                       print NUM lines of output context

Web options:
 -r, --local-resources
                       also grep local resources (same-origin)
 -R, --all-resources   also grep all resources (even non-same-origin)
 --include-headers     also grep HTTP headers
 --cookie COOKIE       use a session cookie in the HTTP headers
 --referer REFERER     provide the referer in the HTTP headers

Proxy settings (by default, system proxy settings are used):
 -d, --disable-proxy   manually disable proxy
 --http-proxy HTTP     manually set the HTTP proxy
 --https-proxy HTTPS   manually set the HTTPS proxy

Please report bugs on GitHub: https://github.com/dhondta/webgrep
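
The difference between -r (same-origin resources) and -R (all resources) comes down to an origin comparison. The help text does not show how webgrep performs it, so the following is only a sketch that assumes the usual scheme/host/port definition of an origin; `origin` and `is_local_resource` are hypothetical names.

```python
from urllib.parse import urljoin, urlsplit

# Ports implied by the scheme when a URL does not state one explicitly.
_DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, host, port) triple that identifies an origin."""
    parts = urlsplit(url)
    port = parts.port or _DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def is_local_resource(page_url, resource_ref):
    """True if the resource, resolved against the page URL, is same-origin."""
    resource_url = urljoin(page_url, resource_ref)
    return origin(resource_url) == origin(page_url)


if __name__ == "__main__":
    page = "http://www.example.com/index.html"
    for ref in ("/img/logo.png", "https://cdn.example.net/lib.js"):
        print(ref, is_local_resource(page, ref))
```

Under this assumption, -r would keep only references for which the check holds, while -R would keep every reference.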

Example

$ ./webgrep -R Welcome https://github.com
      Welcome home, <br>developers
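
For scripted use, the command line can be assembled from Python. The sketch below is an illustration, not part of webgrep: it uses only flags listed in the help text above, `build_webgrep_cmd` and `run_webgrep` are hypothetical helpers, and running it assumes webgrep is installed and on the PATH.

```python
import subprocess


def build_webgrep_cmd(pattern, urls, resources=None, include_headers=False,
                      extended=False, ignore_case=False):
    """Build an argv list for webgrep using only flags from its help text.

    `resources` is None (HTML only), "local" (-r) or "all" (-R).
    """
    cmd = ["webgrep"]
    if resources == "local":
        cmd.append("-r")
    elif resources == "all":
        cmd.append("-R")
    elif resources is not None:
        raise ValueError("resources must be None, 'local' or 'all'")
    if include_headers:
        cmd.append("--include-headers")
    if extended:
        cmd.append("-E")
    if ignore_case:
        cmd.append("-i")
    cmd.append(pattern)
    cmd.extend(urls)
    return cmd


def run_webgrep(*args, **kwargs):
    """Run webgrep and return its standard output as text."""
    result = subprocess.run(build_webgrep_cmd(*args, **kwargs),
                            stdout=subprocess.PIPE, universal_newlines=True)
    return result.stdout


if __name__ == "__main__":
    # Same invocation as the example above, printed rather than executed.
    print(build_webgrep_cmd("Welcome", ["https://github.com"], resources="all"))
```

Passing the arguments as a list avoids shell quoting problems when the pattern contains characters such as `|` or `*`.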

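Design principles:

Maximum use of Python built-in modules.

For non-standard imports, one of two behaviours applies:

if not installed, trigger an exit and display the command for installing them;

if not installed, do not trigger an exit: display the command for installing them and continue execution without the related functionality.

No modularity (the principle of a self-contained tool), so that it can simply be copied into /usr/bin with no dependencies other than the non-standard imports.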
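
The two behaviours for non-standard imports can be sketched as follows. This is an assumption about how such a policy might look, not code taken from webgrep: `try_import` and `beautify` are hypothetical names, and the printed pip command is only a plausible install hint.

```python
import importlib
import sys


def try_import(module, package=None, required=False):
    """Import `module`; on failure, print how to install it.

    Returns the module, or None when an optional module is missing.
    Exits the program when a required module is missing.
    """
    try:
        return importlib.import_module(module)
    except ImportError:
        print("Missing module '{}'; install it with: pip install {}"
              .format(module, package or module), file=sys.stderr)
        if required:
            sys.exit(1)
        return None


# Optional dependency: features that need it are skipped when it is absent.
jsbeautifier = try_import("jsbeautifier")


def beautify(script):
    """Beautify JavaScript if jsbeautifier is available, else return it as-is."""
    if jsbeautifier is None:
        return script
    return jsbeautifier.beautify(script)
```

With this pattern the tool stays a single file: a missing optional module only disables the feature that depends on it.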

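Resource Handlers

Definitions:

resource (what is being processed): web page, images, JavaScript, CSS
handler (how a resource is processed): CSS unminifying, OCR, deobfuscation, EXIF data retrieval, …

Handlers are defined in the # --...-- HANDLERS SECTION --...-- of the code. Currently available handlers: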

Images

EXIF: using exiftool
Steganography: using steghide (with an empty password)
Strings: using strings
OCR: using tesseract
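
To illustrate what the strings handler contributes, here is a pure-Python approximation of the strings tool. webgrep itself calls the external strings program; `printable_strings` is a hypothetical stand-in that only handles printable ASCII, which is roughly the tool's default behaviour.

```python
import re


def printable_strings(data, min_length=4):
    """Return runs of at least `min_length` printable ASCII characters."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [match.decode("ascii") for match in pattern.findall(data)]


if __name__ == "__main__":
    # Fake image bytes with two embedded text fragments.
    blob = b"\x89PNG\r\n\x00\x01secret=hunter2\x00\xff\xfeab\x00Welcome"
    for text in printable_strings(blob):
        print(text)
```

Extracting text this way turns a binary image into lines that grep can match against.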

Scripts

JavaScript beautifying and deobfuscation: using jsbeautifier

Styles

Unminifying: using regular expressions

Note: images referenced in CSS files are also processed.
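
A regex-based style handler could look like the sketch below. It is not webgrep's own implementation: `unminify_css` and `css_image_urls` are hypothetical names, and the regular expressions are deliberately naive (a semicolon inside a quoted string or a data: URL would be split incorrectly).

```python
import re


def unminify_css(css):
    """Put each declaration and closing brace of minified CSS on its own line."""
    css = re.sub(r"\s*\{\s*", " {\n    ", css)  # open a block, indent its body
    css = re.sub(r"\s*;\s*", ";\n    ", css)     # one declaration per line
    css = re.sub(r"\s*\}\s*", "\n}\n", css)      # closing brace on its own line
    return css.strip() + "\n"


def css_image_urls(css):
    """Return the targets of url(...) references found in a style sheet."""
    return re.findall(r"url\(\s*['\"]?([^'\")\s]+)['\"]?\s*\)", css)


if __name__ == "__main__":
    minified = "a{color:red;background:url(img/bg.png)}p{margin:0;}"
    print(unminify_css(minified), end="")
    print(css_image_urls(minified))
```

Splitting rules across lines matters because grep reports whole lines: on a minified style sheet, a single match would otherwise print the entire file. The second helper shows one way the image references mentioned in the note could be collected for further processing.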

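Issues management

If you want to contribute or submit suggestions, please open an Issue.

If you want to build and submit new handlers, please open a Pull Request.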


webgrep-tool 1.13
