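parselcli: Python package on PyPI

CLI interpreter for XPath and CSS selectors

Detailed project description for parselcli

About

parselcli is a command-line interface wrapper for the parsel package. It evaluates CSS and XPath selectors in real time against a web URL or a local HTML file.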

Parsel is a library to extract data from HTML and XML using XPath and CSS selectors

Usage

$ parsel --help                                                                                                      
Usage: parsel [OPTIONS] [URL]

  Interactive shell for css and xpath selectors

Options:
  -h TEXT                         request headers, e.g. -h "user-agent=cat
                                  bot"
  -xpath                          start in xpath mode instead of css
  -p, --processors TEXT           comma separated processors: {}
  -f, --file FILENAME             input from html file instead of url
  -c TEXT                         compile css and return it
  -x TEXT                         compile xpath and return it
  --cache                         cache requests
  --config TEXT                   config file  [default:
                                  /home/dex/.config/parsel.toml]
  --embed                         start in embedded python shell
  --shell [ptpython|ipython|bpython|python]
                                  preferred embedded shell; default auto
                                  resolve in order
  --help                          Show this message and exit.
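The -h option above takes headers in name=value form. How parselcli parses them internally is not shown here, so the following is only a sketch of one way such strings could be turned into a headers dictionary. The `parse_headers` helper is hypothetical and not part of parselcli.

```python
def parse_headers(pairs):
    """Turn "name=value" strings into a headers dict (hypothetical helper)."""
    headers = {}
    for pair in pairs:
        # Split on the first "=" only, so values may themselves contain "=".
        name, sep, value = pair.partition("=")
        if not sep or not name.strip():
            raise ValueError(f"expected name=value, got {pair!r}")
        headers[name.strip()] = value.strip()
    return headers


headers = parse_headers(["user-agent=cat bot"])
print(headers)
```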

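parselcli reads an XML or HTML document from a URL or from disk and starts an interpreter for XPath or CSS selectors. By default it starts in CSS mode; use the -xpath command to switch to XPath and -css to switch back to CSS. The interpreter also has autocompletion and selector suggestions (in progress).

The interpreter also supports commands and can embed python, ptpython, ipython and bpython shells. Commands are invoked with the - prefix. The list of available commands can be shown with the -help command (see the Example section).

Processors and commands

parselcli supports flags and commands in the shell: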

$ parsel "https://github.com/granitosaurus/parsel-cli"                                                               
> -help                                                                                                              
available commands (use -command):
  help: show help
  debug: show debug info
  embed: start interactive python shell
  open: open current url in browser tab
  view: open current html in browser tab
  fetch: download from new url
  css: switch to css selectors
  xpath: switch to xpath selectors
available flags (use +flag to enable and -flag to disable)
  strip: strip every element of trailing and leading spaces
  first: take first element when there's only one
  collapse: collapse lists when only 1 element
  absolute: convert relative urls to absolute
  join: join results into one
  len: return length of results

Processors are activated with the + prefix and deactivated with -. They can be supplied inline:

> h1::text +strip
['parsel-cli']

Or activated for the whole session:

> +strip 
enabled flag: strip
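To make the idea of processors concrete, here is a rough sketch that models a few of them as plain functions applied in order to a list of selector results. The function names and the `apply_processors` helper are assumptions for illustration; parselcli's own implementation may differ.

```python
from urllib.parse import urljoin


def strip(values):
    """Strip leading and trailing whitespace from every element."""
    return [v.strip() for v in values]


def collapse(values):
    """Unwrap a single-element list; leave other lists untouched."""
    return values[0] if len(values) == 1 else values


def make_absolute(base_url):
    """Build a processor that resolves relative URLs against base_url."""
    def absolute(values):
        return [urljoin(base_url, v) for v in values]
    return absolute


def apply_processors(values, processors):
    """Run each processor in order, feeding one's output into the next."""
    for processor in processors:
        values = processor(values)
    return values


title = apply_processors(["\n  parsel-cli  "], [strip, collapse])
links = apply_processors(
    ["/docs", "issues"], [make_absolute("https://example.com/repo/")]
)
print(title, links)
```

Order matters in such a chain: collapse must run after strip here, because it changes the result from a list to a single string.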

Commands are called in the same manner and sometimes take positional arguments:

> -fetch "http://some-other-url.com"
downloading "http://some-other-url.com"
> -view
opening document in browser
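The input conventions described above (commands with -, flags with +, inline flags after a selector) can be sketched as a small line classifier. This is an illustration under simplifying assumptions, not parselcli's parser: `parse_line` is a hypothetical helper, every line starting with - is treated as a command (so session-wide -flag disabling is not handled), and a CSS selector that ends in something like " +p" would be misread as a flag.

```python
import re
import shlex

# Trailing " +flag" / " -flag" tokens at the end of a selector line.
FLAG_SUFFIX = re.compile(r"(?:\s+[+-]\w+)+$")


def parse_line(line):
    """Classify one line of interpreter input (hypothetical helper)."""
    line = line.strip()
    if line.startswith("-"):
        # "-fetch <url>": command name followed by positional arguments.
        name, *args = shlex.split(line[1:])
        return {"kind": "command", "name": name, "args": args}
    if line.startswith("+"):
        # "+join +strip": enable flags for the rest of the session.
        return {"kind": "flags", "flags": line.split()}
    # Otherwise treat the line as a selector with optional inline flags.
    match = FLAG_SUFFIX.search(line)
    selector = line[: match.start()] if match else line
    flags = match.group().split() if match else []
    return {"kind": "selector", "selector": selector, "flags": flags}


print(parse_line('-fetch "http://some-other-url.com"'))
print(parse_line("h1::text +strip"))
```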

Example

$ parsel "https://github.com/granitosaurus/parsel-cli"                                                               
> h1::text                                                                                                           
['\n  ', '\n  ', '\n\n', 'parsel-cli']
> +join +strip                                                                                                       
enabled flag: join
enabled flag: strip
> h1::text                                                                                                           
parsel-cli
> h1::text +len                                                                                                      
4
> -xpath                                                                                                             
switched to xpath
> //h1/text()                                                                                                        
parsel-cli
> -css                                                                                                               
switched to css
> -embed                                                                                                             
>>> locals()                                                                                                         
{'sel': <Selector xpath=None data='<html lang="en">\n  <head>\n    <meta char'>, 'response': <Response [200]>, 'request': <PreparedRequest [GET]>, '_': {...}, '_1': {...}}


>>> response                                                                                                         
<Response [200]>


>>>                                                                                                                  
> -debug                                                                                                             
200-https://github.com/granitosaurus/parsel-cli
enabled processors:
  Join
  Strip
> -help                                                                                                              
available commands (use -command):
  help: show help
  debug: show debug info
  embed: start interactive python shell
  open: open current url in browser tab
  view: open current html in browser tab
  fetch: download from new url
  css: switch to css selectors
  xpath: switch to xpath selectors
available flags (use +flag to enable and -flag to disable)
  strip: strip every element of trailing and leading spaces
  first: take first element when there's only one
  collapse: collapse lists when only 1 element
  absolute: convert relative urls to absolute
  join: join results into one
  len: return length of results

Install

pip install parselcli

Or install from GitHub:

pip install --user git+https://github.com/Granitosaurus/parsel-cli@v0.32.1

Configuration

parselcli can be configured through a TOML configuration file located at $XDG_HOME/parsel.toml (usually ~/.config/parsel.toml):

# default processors (the +flags)
processors = [ "collapse", "strip",]
# where ptpython history is located
history_file_css = "/home/user/.cache/parsel/history_css"
history_file_xpath = "/home/user/.cache/parsel/history_xpath"

[requests]
# when using --cache flag for using cached responses
cache_expire = 86400
# where sqlite cache file is stored for cache
cache_dir = "/home/user/.cache/parsel/requests.cache"

[requests.headers]
# here headers can be defined for requests to avoid bot detection etc.
User-Agent = "parselcli web inspector"
# e.g. chrome on windows use
# User-Agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36"

parselcli 0.33
