CLI interpreter for xpath and css selectors

About

parselcli is a command-line interface wrapper for the parsel package, used to evaluate css and xpath selectors in real time against web URLs or local html files.

Parsel is a library to extract data from HTML and XML using XPath and CSS selectors

Usage

$ parsel --help                                                                                                      
Usage: parsel [OPTIONS] [URL]

  Interactive shell for css and xpath selectors

Options:
  -h TEXT                         request headers, e.g. -h "user-agent=cat
                                  bot"
  -xpath                          start in xpath mode instead of css
  -p, --processors TEXT           comma separated processors: {}
  -f, --file FILENAME             input from html file instead of url
  -c TEXT                         compile css and return it
  -x TEXT                         compile xpath and return it
  --cache                         cache requests
  --config TEXT                   config file  [default:
                                  /home/dex/.config/parsel.toml]
  --embed                         start in embedded python shell
  --shell [ptpython|ipython|bpython|python]
                                  preferred embedded shell; default auto
                                  resolve in order
  --help                          Show this message and exit.

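parselcli reads an XML or HTML document from a url or from disk and starts an interpreter for xpath or css selectors. By default it starts in css mode; the -xpath command switches to xpath and the -css command switches back to css. The interpreter also has autocompletion and offers suggestions for selectors [in progress].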

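The interpreter also supports commands and embedding a python, ptpython, ipython or bpython shell. Commands are called with the - prefix. The list of available commands can be found by calling the -help command (see the Example section).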
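The -h option in the help output above takes request headers in key=value form, e.g. "user-agent=cat bot". A minimal sketch of how such pairs could be turned into a header dictionary follows; parse_headers is a hypothetical helper written for illustration, not parselcli's own code, and splitting on the first "=" is an assumption.

```python
def parse_headers(pairs):
    """Turn ["user-agent=cat bot", ...] into a dict of request headers.

    Hypothetical helper: assumes each item has the form key=value.
    """
    headers = {}
    for pair in pairs:
        # Split on the first "=" only, so values may contain "=" themselves.
        name, sep, value = pair.partition("=")
        if not sep or not name.strip():
            raise ValueError(f"expected key=value, got {pair!r}")
        headers[name.strip()] = value.strip()
    return headers


print(parse_headers(["user-agent=cat bot", "accept=text/html"]))
```

A dictionary in this shape is what HTTP client libraries generally accept as request headers.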

Processors and commands

parselcli supports flags and commands in the shell:

$ parsel "https://github.com/granitosaurus/parsel-cli"                                                               
> -help                                                                                                              
available commands (use -command):
  help: show help
  debug: show debug info
  embed: start interactive python shell
  open: open current url in browser tab
  view: open current html in browser tab
  fetch: download from new url
  css: switch to css selectors
  xpath: switch to xpath selectors
available flags (use +flag to enable and -flag to disable)
  strip: strip every element of trailing and leading spaces
  first: take first element when there's only one
  collapse: collapse lists when only 1 element
  absolute: convert relative urls to absolute
  join: join results into one
  len: return length of results

Processors are activated with the + prefix and deactivated with the - prefix. They can be supplied inline with a selector:

> h1::text +strip
['parsel-cli']

or activated for the whole session:

> +strip 
enabled flag: strip

Commands are called the same way, sometimes with positional arguments:

> -fetch "http://some-other-url.com"
downloading "http://some-other-url.com"
> -view
opening document in browser
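The input forms shown in this section (a selector with trailing +flag tokens, or a -command with positional arguments) could be told apart roughly as follows. This is only a sketch under stated assumptions: parse_line is a hypothetical function, not parselcli's parser, and it treats every line that starts with "-" as a command, whereas a real implementation would also have to recognise a bare -flag that disables a processor.

```python
import shlex


def parse_line(line):
    """Classify one line of shell input (hypothetical helper).

    Returns {"command": name, "args": [...]} for "-command arg ..." lines,
    otherwise {"selector": text, "flags": {name: enabled}}.
    """
    line = line.strip()
    if line.startswith("-"):
        # shlex honours quotes, so -fetch "http://..." yields one argument.
        tokens = shlex.split(line[1:])
        if not tokens:
            raise ValueError("empty command")
        return {"command": tokens[0], "args": tokens[1:]}

    words = line.split()
    flags = {}
    # Peel +flag / -flag tokens off the end; the rest is the selector.
    while words and words[-1][0] in "+-" and words[-1][1:].isalpha():
        token = words.pop()
        flags[token[1:]] = token[0] == "+"
    return {"selector": " ".join(words), "flags": flags}


print(parse_line("h1::text +strip"))
print(parse_line('-fetch "http://some-other-url.com"'))
```

Peeling flags from the right keeps selectors that contain spaces intact, at the cost of misreading a selector whose last token happens to look like a flag.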

Example

$ parsel "https://github.com/granitosaurus/parsel-cli"                                                               
> h1::text                                                                                                           
['\n  ', '\n  ', '\n\n', 'parsel-cli']
> +join +strip                                                                                                       
enabled flag: join
enabled flag: strip
> h1::text                                                                                                           
parsel-cli
> h1::text +len                                                                                                      
4
> -xpath                                                                                                             
switched to xpath
> //h1/text()                                                                                                        
parsel-cli
> -css                                                                                                               
switched to css
> -embed                                                                                                             
>>> locals()                                                                                                         
{'sel': <Selector xpath=None data='<html lang="en">\n  <head>\n    <meta char'>, 'response': <Response [200]>, 'request': <PreparedRequest [GET]>, '_': {...}, '_1': {...}}


>>> response                                                                                                         
<Response [200]>


>>>                                                                                                                  
> -debug                                                                                                             
200-https://github.com/granitosaurus/parsel-cli
enabled processors:
  Join
  Strip
> -help                                                                                                              
available commands (use -command):
  help: show help
  debug: show debug info
  embed: start interactive python shell
  open: open current url in browser tab
  view: open current html in browser tab
  fetch: download from new url
  css: switch to css selectors
  xpath: switch to xpath selectors
available flags (use +flag to enable and -flag to disable)
  strip: strip every element of trailing and leading spaces
  first: take first element when there's only one
  collapse: collapse lists when only 1 element
  absolute: convert relative urls to absolute
  join: join results into one
  len: return length of results
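The effect of the flags used in the session above can be approximated in plain Python. The functions below are a sketch based only on the one-line flag descriptions in the help output; their names and exact behaviour are assumptions, not parselcli's implementation, and the order in which parselcli applies several enabled flags is not stated in this document.

```python
from urllib.parse import urljoin


def strip(values):
    """Strip leading and trailing whitespace from a string or each list item."""
    if isinstance(values, str):
        return values.strip()
    return [v.strip() for v in values]


def join(values):
    """Join a list of strings into one string."""
    return "".join(values)


def collapse(values):
    """Return the single element when the list holds exactly one item."""
    return values[0] if len(values) == 1 else values


def absolute(values, base_url):
    """Resolve relative urls against the url of the current document."""
    return [urljoin(base_url, v) for v in values]


# Raw result of "h1::text" from the session above.
results = ['\n  ', '\n  ', '\n\n', 'parsel-cli']

print(join(strip(results)))   # parsel-cli
print(strip(join(results)))   # parsel-cli
print(len(results))           # 4
print(collapse(['parsel-cli']))
print(absolute(['/granitosaurus'], 'https://github.com/granitosaurus/parsel-cli'))
```

For this input, stripping before or after joining gives the same string, which matches the output shown for +join +strip.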

Install

pip install parselcli

or install from github:

pip install --user git+https://github.com/Granitosaurus/parsel-cli@v0.32.1

Config

parselcli can be configured through a toml configuration file located at $XDG_HOME/parsel.toml (usually ~/.config/parsel.toml):

# default processors (the +flags)
processors = [ "collapse", "strip",]
# where ptpython history is located
history_file_css = "/home/user/.cache/parsel/history_css"
history_file_xpath = "/home/user/.cache/parsel/history_xpath"

[requests]
# when using --cache flag for using cached responses
cache_expire = 86400
# where sqlite cache file is stored for cache
cache_dir = "/home/user/.cache/parsel/requests.cache"

[requests.headers]
# here headers can be defined for requests to avoid bot detection etc.
User-Agent = "parselcli web inspector"
# e.g. chrome on windows use
# User-Agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36"
