CLI interpreter for XPath and CSS selectors
About
parselcli is a command-line wrapper around the parsel package for evaluating CSS and XPath selectors in real time against web URLs or local HTML files.
Parsel is a library to extract data from HTML and XML using XPath and CSS selectors
Usage
$ parsel --help
Usage: parsel [OPTIONS] [URL]
Interactive shell for css and xpath selectors
Options:
  -h TEXT                         request headers, e.g. -h "user-agent=cat bot"
  -xpath                          start in xpath mode instead of css
  -p, --processors TEXT           comma separated processors: {}
  -f, --file FILENAME             input from html file instead of url
  -c TEXT                         compile css and return it
  -x TEXT                         compile xpath and return it
  --cache                         cache requests
  --config TEXT                   config file [default: /home/dex/.config/parsel.toml]
  --embed                         start in embedded python shell
  --shell [ptpython|ipython|bpython|python]
                                  preferred embedded shell; default auto resolve in order
  --help                          Show this message and exit.
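The -h option above takes a header in name=value form. As a rough sketch of how such a string could be turned into a headers mapping, the helper names parse_header and build_headers below are hypothetical and are not taken from parselcli; passing several options at once is also an assumption.

```python
from typing import Dict, Iterable, Tuple


def parse_header(option: str) -> Tuple[str, str]:
    """Split one 'name=value' string at the first '=' sign."""
    name, sep, value = option.partition("=")
    if not sep or not name.strip():
        raise ValueError(f"expected name=value, got {option!r}")
    return name.strip(), value.strip()


def build_headers(options: Iterable[str]) -> Dict[str, str]:
    """Collect several 'name=value' strings into one dict (hypothetical helper)."""
    return dict(parse_header(option) for option in options)


headers = build_headers(["user-agent=cat bot"])
print(headers)
```

Splitting at the first = only keeps values that themselves contain an = sign intact.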
parselcli reads an XML or HTML file from a URL or from disk and starts an interpreter for XPath or CSS selectors.
By default it starts in CSS interpreter mode, but it can be switched to XPath with the -xpath command and back to CSS with -css.
The interpreter also has autocompletion and suggestions for selectors [in progress].
The interpreter also supports commands and embedding the python, ptpython, ipython and bpython shells.
Commands are called with the - prefix. The list of available commands can be found by calling the -help command (see the Examples section).
Processors and commands
parselcli supports flags and commands in the shell:
$ parsel "https://github.com/granitosaurus/parsel-cli"
> -help
available commands (use -command):
help: show help
debug: show debug info
embed: start interactive python shell
open: open current url in browser tab
view: open current html in browser tab
fetch: download from new url
css: switch to css selectors
xpath: switch to xpath selectors
available flags (use +flag to enable and -flag to disable)
strip: strip every element of trailing and leading spaces
first: take first element when there's only one
collapse: collapse lists when only 1 element
absolute: convert relative urls to absolute
join: join results into one
len: return length of results
Processors can be activated with the + prefix and deactivated with the - prefix. They can be applied inline to a single selector:
> h1::text +strip
['parsel-cli']
or enabled for the whole session:
> +strip
enabled flag: strip
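To make the effect of the flags concrete, here is a small sketch of how a chain of processors could transform a list of selector results. It is an illustration of the behaviour described in the help text, not parselcli's implementation: the function names, the PROCESSORS table, the application order, and the empty-string join separator are all assumptions.

```python
from typing import Callable, Dict, List, Union

Result = Union[str, int, List[str]]


def strip(value: Result) -> Result:
    """Strip leading and trailing whitespace from every element."""
    if isinstance(value, list):
        return [item.strip() for item in value]
    if isinstance(value, str):
        return value.strip()
    return value


def collapse(value: Result) -> Result:
    """Unwrap a list that holds exactly one element."""
    if isinstance(value, list) and len(value) == 1:
        return value[0]
    return value


def join(value: Result) -> Result:
    """Join all results into one string (separator is an assumption)."""
    if isinstance(value, list):
        return "".join(value)
    return value


def length(value: Result) -> Result:
    """Return the number of results."""
    if isinstance(value, (list, str)):
        return len(value)
    return value


# Hypothetical lookup table from flag name to processor function.
PROCESSORS: Dict[str, Callable[[Result], Result]] = {
    "strip": strip,
    "collapse": collapse,
    "join": join,
    "len": length,
}


def apply_flags(results: List[str], flags: List[str]) -> Result:
    """Apply the named processors to the results, in the order given."""
    value: Result = list(results)
    for name in flags:
        value = PROCESSORS[name](value)
    return value


raw = ["\n ", "\n ", "\n\n", "parsel-cli"]
joined = apply_flags(raw, ["strip", "join"])
count = apply_flags(raw, ["len"])
single = apply_flags(["  parsel-cli  "], ["strip", "collapse"])
print(joined, count, single)
```

Keeping each processor as a plain function from result to result means flags can be combined freely, which matches how several flags can be enabled in one session.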
Commands are called in a similar way, with the - prefix, and some take a positional argument:
> -fetch "http://some-other-url.com"
downloading "http://some-other-url.com"
> -view
opening document in browser
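Because - is used both for commands and for disabling flags, an input line has to be classified before it is evaluated. The sketch below shows one possible way to do that; parse_line, ParsedLine, and the classification rules are hypothetical, and only the command and flag names come from the -help output.

```python
import shlex
from typing import List, NamedTuple

COMMANDS = {"help", "debug", "embed", "open", "view", "fetch", "css", "xpath"}
FLAGS = {"strip", "first", "collapse", "absolute", "join", "len"}


class ParsedLine(NamedTuple):
    kind: str        # "command", "flags", "selector" or "empty"
    value: str       # command name or selector text
    args: List[str]  # command arguments, flag toggles or inline flags


def parse_line(line: str) -> ParsedLine:
    """Classify one line of interpreter input (illustrative rules only)."""
    tokens = line.split()
    if not tokens:
        return ParsedLine("empty", "", [])

    first = tokens[0]
    if first.startswith("-") and first[1:] in COMMANDS:
        # shlex removes the quotes around a positional argument.
        return ParsedLine("command", first[1:], shlex.split(line)[1:])

    if all(tok[0] in "+-" and tok[1:] in FLAGS for tok in tokens):
        # A line made only of +flag / -flag toggles.
        return ParsedLine("flags", "", tokens)

    # Otherwise treat it as a selector, peeling trailing +flag tokens.
    selector = line.strip()
    inline: List[str] = []
    while True:
        head, _, tail = selector.rpartition(" ")
        if head and tail.startswith("+") and tail[1:] in FLAGS:
            inline.insert(0, tail[1:])
            selector = head.rstrip()
        else:
            break
    return ParsedLine("selector", selector, inline)


print(parse_line('-fetch "http://some-other-url.com"'))
print(parse_line("+join +strip"))
print(parse_line("h1::text +strip"))
```

Checking the command table before the flag table resolves the shared - prefix; the selector text itself is left untouched so that quotes inside CSS or XPath expressions survive.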
Examples
$ parsel "https://github.com/granitosaurus/parsel-cli"
> h1::text
['\n ', '\n ', '\n\n', 'parsel-cli']
> +join +strip
enabled flag: join
enabled flag: strip
> h1::text
parsel-cli
> h1::text +len
4
> -xpath
switched to xpath
> //h1/text()
parsel-cli
> -css
switched to css
> -embed
>>> locals()
{'sel': <Selector xpath=None data='<html lang="en">\n <head>\n <meta char'>, 'response': <Response [200]>, 'request': <PreparedRequest [GET]>, '_': {...}, '_1': {...}}
>>> response
<Response [200]>
>>>
> -debug
200-https://github.com/granitosaurus/parsel-cli
enabled processors:
Join
Strip
> -help
available commands (use -command):
help: show help
debug: show debug info
embed: start interactive python shell
open: open current url in browser tab
view: open current html in browser tab
fetch: download from new url
css: switch to css selectors
xpath: switch to xpath selectors
available flags (use +flag to enable and -flag to disable)
strip: strip every element of trailing and leading spaces
first: take first element when there's only one
collapse: collapse lists when only 1 element
absolute: convert relative urls to absolute
join: join results into one
len: return length of results
Installation
pip install parselcli
or install from GitHub:
pip install --user git+https://github.com/Granitosaurus/parsel-cli@v0.32.1
Configuration
parselcli can be configured through a TOML configuration file at $XDG_HOME/parsel.toml (usually ~/.config/parsel.toml):
# default processors (the +flags)
processors = [ "collapse", "strip",]
# where ptpython history is located
history_file_css = "/home/user/.cache/parsel/history_css"
history_file_xpath = "/home/user/.cache/parsel/history_xpath"
[requests]
# when using --cache flag for using cached responses
cache_expire = 86400
# where sqlite cache file is stored for cache
cache_dir = "/home/user/.cache/parsel/requests.cache"
[requests.headers]
# here headers can be defined for requests to avoid bot detection etc.
User-Agent = "parselcli web inspector"
# e.g. chrome on windows use
# User-Agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36"