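py-wasapi-client Python package - PyPI

A client for the WASAPI Data Transfer API

py-wasapi-client project description

py-wasapi-client

A client for the Archive-It WASAPI Data Transfer API. This client is being developed according to the ait-specification.

Requirements

Python 3.4-3.7

Installation

To run the latest code, download or clone the WASAPI client from GitHub. From the top level of the py-wasapi-client directory, install with: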

 $ python setup.py install

Alternatively, the most recent release (not guaranteed to be the latest code) can be installed from PyPI:

 $ pip install py-wasapi-client

Once installed, run the client at the command line with:

 $ wasapi-client --help

That prints the usage instructions:

usage: wasapi-client [-h] [-b BASE_URI] [-d DESTINATION] [-l LOG] [-n] [-v]
                     [--profile PROFILE | -u USER | -t TOKEN]
                     [-c | -m | -p PROCESSES | -s | -r]
                     [--collection COLLECTION [COLLECTION ...]]
                     [--filename FILENAME] [--crawl CRAWL]
                     [--crawl-time-after CRAWL_TIME_AFTER]
                     [--crawl-time-before CRAWL_TIME_BEFORE]
                     [--crawl-start-after CRAWL_START_AFTER]
                     [--crawl-start-before CRAWL_START_BEFORE]

        Download WARC files from a WASAPI access point.

        Acceptable date/time formats are:
         2017-01-01
         2017-01-01T12:34:56
         2017-01-01 12:34:56
         2017-01-01T12:34:56Z
         2017-01-01 12:34:56-0700
         2017
         2017-01

optional arguments:
  -h, --help            show this help message and exit
  -b BASE_URI, --base-uri BASE_URI
                        base URI for WASAPI access; default:
                        https://partner.archive-it.org/wasapi/v1/webdata
  -d DESTINATION, --destination DESTINATION
                        location for storing downloaded files
  -l LOG, --log LOG     file to which logging should be written
  -n, --no-manifest     do not generate checksum files (ignored when used in
                        combination with --manifest)
  -v, --verbose         log verbosely; -v is INFO, -vv is DEBUG
  --profile PROFILE     profile to use for API authentication
  -u USER, --user USER  username for API authentication
  -t TOKEN, --token TOKEN
                        token for API authentication
  -c, --count           print number of files for download and exit
  -m, --manifest        generate checksum files only and exit
  -p PROCESSES, --processes PROCESSES
                        number of WARC downloading processes
  -s, --size            print count and total size of files and exit
  -r, --urls            list URLs for downloadable files only and exit

query parameters:
  parameters for webdata request

  --collection COLLECTION [COLLECTION ...]
                        collection identifier
  --filename FILENAME   exact webdata filename to download
  --crawl CRAWL         crawl job identifier
  --crawl-time-after CRAWL_TIME_AFTER
                        request files created on or after this date/time
  --crawl-time-before CRAWL_TIME_BEFORE
                        request files created before this date/time
  --crawl-start-after CRAWL_START_AFTER
                        request files from crawl jobs starting on or after
                        this date/time
  --crawl-start-before CRAWL_START_BEFORE
                        request files from crawl jobs starting before this
                        date/time
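The date/time formats listed in the help text can be checked before a long-running command is launched. The following is a minimal sketch, not the client's own parser: the format list simply mirrors the seven examples above, and `matches_accepted_format` is a hypothetical helper name.

```python
from datetime import datetime

# strptime patterns mirroring the examples in the help text.
# This list is an illustrative assumption, not the client's parsing logic.
ACCEPTED_FORMATS = [
    "%Y-%m-%d",              # 2017-01-01
    "%Y-%m-%dT%H:%M:%S",     # 2017-01-01T12:34:56
    "%Y-%m-%d %H:%M:%S",     # 2017-01-01 12:34:56
    "%Y-%m-%dT%H:%M:%SZ",    # 2017-01-01T12:34:56Z
    "%Y-%m-%d %H:%M:%S%z",   # 2017-01-01 12:34:56-0700
    "%Y",                    # 2017
    "%Y-%m",                 # 2017-01
]


def matches_accepted_format(value):
    """Return True if value parses with at least one pattern above."""
    for fmt in ACCEPTED_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            continue
    return False


if __name__ == "__main__":
    for candidate in ("2017-01-01T12:34:56", "2017-01", "01/02/2017"):
        print(candidate, matches_accepted_format(candidate))
```

The client may accept more variations than this list covers, so a rejected value here is a hint to double-check rather than proof that the value is invalid.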

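Configuration

When you use the tool to query an Archive-It WASAPI endpoint, you need to supply a username and password for the API. There are three ways to provide these credentials.

Supply a username with -u, and you will be prompted for a password.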
Set an environment variable called 'WASAPI_USER' to supply a username and one called 'WASAPI_PASS' to supply a password.
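Supply the name of a profile defined in a configuration file with --profile. The configuration file should be located at ~/.wasapi-client.

Example configuration file: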

[unt]
username = exampleUser
password = examplePassword

The order of precedence is command line, environment, configuration file.
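The precedence rule can be illustrated with a short sketch. This is not the client's implementation: `resolve_credentials` and its parameters are hypothetical, the environment variable names are written in upper case as an assumption about their exact spelling, and the password for a command-line user is passed in directly, whereas the real client prompts for it.

```python
import configparser
import os


def resolve_credentials(cli_user=None, cli_password=None, profile=None,
                        config_path="~/.wasapi-client", environ=None):
    """Return (username, password), or None if no source supplies them.

    Sources are tried in the documented order:
    command line, then environment, then configuration file.
    """
    environ = os.environ if environ is None else environ

    # 1. Command line: a username given with -u wins outright.
    if cli_user:
        return cli_user, cli_password

    # 2. Environment: both variables must be set (names are an assumption).
    user = environ.get("WASAPI_USER")
    password = environ.get("WASAPI_PASS")
    if user and password:
        return user, password

    # 3. Configuration file: look up the section named by --profile.
    if profile:
        # interpolation=None keeps '%' characters in passwords literal.
        config = configparser.ConfigParser(interpolation=None)
        config.read(os.path.expanduser(config_path))
        if config.has_section(profile):
            return (config.get(profile, "username"),
                    config.get(profile, "password"))

    return None


if __name__ == "__main__":
    # With the example file above saved as ~/.wasapi-client, this
    # looks up the [unt] section unless the environment overrides it.
    print(resolve_credentials(profile="unt"))
```

Passing `environ` explicitly keeps the function easy to test without touching the real process environment.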

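Example Usage

The following command downloads the WARC files available from the crawl with crawl id 256119 and logs program output to the file /tmp/out.log. The program will prompt for the password of user myusername. Downloads are carried out by one process.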

 $ wasapi-client -u myusername --crawl 256119 --log /tmp/out.log -p 1

The following command downloads in the same way, but the user credentials are supplied by a profile in the configuration file.

 $ wasapi-client --profile unt --crawl 256119 --log out.log -p 1

You can supply an API token instead of user credentials.

 $ wasapi-client --token thisistheAPItokenIwasgiven --crawl 256119 --log out.log -p 1

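The following command downloads the WARC files available from crawls that started within the specified time range. Verbose logging is written to a file named out.log. Downloads are carried out by 4 processes and written to the directory /tmp/wasapi_warcs/.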

 $ wasapi-client --profile unt --crawl-start-after 2016-12-22T13:01:00 --crawl-start-before 2016-12-22T15:11:00  -vv --log out.log -p 4 -d /tmp/wasapi_warcs/
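To make the relationship between the command-line filters and the underlying request more concrete, here is a hedged sketch of how a query URL could be assembled. It assumes that the WASAPI query parameter names match the long option names without the leading dashes; `build_query_url` is a hypothetical helper, and the client's actual request code may differ.

```python
from urllib.parse import urlencode

# Default base URI taken from the help text above.
BASE_URI = "https://partner.archive-it.org/wasapi/v1/webdata"


def build_query_url(base_uri=BASE_URI, **params):
    """Append the non-empty query parameters to the base URI.

    Keyword names use underscores because Python identifiers cannot
    contain hyphens; they are converted to hyphens here (assumed to
    match the API's parameter names).
    """
    query = {
        name.replace("_", "-"): value
        for name, value in params.items()
        if value is not None
    }
    if not query:
        return base_uri
    # doseq=True expands list values, e.g. several collection ids.
    return base_uri + "?" + urlencode(query, doseq=True)


if __name__ == "__main__":
    print(build_query_url(
        crawl_start_after="2016-12-22T13:01:00",
        crawl_start_before="2016-12-22T15:11:00",
    ))
```

Note that `urlencode` percent-encodes the colons in the timestamps, which is expected for a query string.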

The following command reports the file count and total size of all content available to the user.

 $ wasapi-client --profile unt -s

The following command reports the number of files available for the given query parameters.

 $ wasapi-client --profile unt --crawl 256119 -c

The following command downloads the file named example.warc.gz to the current working directory.

$ wasapi-client --profile unt --filename example.warc.gz

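By default, manifest files are generated to provide checksums for the files to be downloaded. One manifest file is generated for each hash algorithm provided by the WASAPI access point. Manifest files are written to the download destination. If manifest files are not wanted, use the --no-manifest flag.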

$ wasapi-client --profile unt --crawl 256119 --log out.log --no-manifest
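When manifests are kept, they can be used to re-check downloaded files later. The sketch below assumes each manifest line has the form `<hex digest>  <file name>` (the layout produced by tools such as md5sum) and that relative names are resolved against the manifest's directory. Both are assumptions to verify against a generated manifest, and `file_digest` and `verify_manifest` are hypothetical helpers, not part of the client.

```python
import hashlib
import os


def file_digest(path, algorithm="md5", chunk_size=1024 * 1024):
    """Hash a file in chunks so large WARC files need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest_path, algorithm="md5"):
    """Return the names of listed files whose digest does not match."""
    mismatches = []
    base_dir = os.path.dirname(os.path.abspath(manifest_path))
    with open(manifest_path) as manifest:
        for line in manifest:
            if not line.strip():
                continue  # skip blank lines
            # Assumed line layout: "<hex digest>  <file name>".
            expected, name = line.split(None, 1)
            name = name.strip()
            actual = file_digest(os.path.join(base_dir, name), algorithm)
            if actual != expected.lower():
                mismatches.append(name)
    return mismatches
```

An empty list from `verify_manifest` means every listed file matched its recorded checksum; the `algorithm` argument should match the hash used to produce the manifest being checked.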

To generate manifest files for the available webdata files without actually downloading those files, use the --manifest flag.

$ wasapi-client --profile unt --crawl 256119 --manifest

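To generate a list of URLs from which your webdata files can be downloaded later by another tool (such as wget), instead of having the WASAPI client perform the download, use the --urls flag.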

$ wasapi-client --profile unt --crawl 256119 --urls

Run the tests

$ python setup.py test

or

$ pip install tox
$ tox

py-wasapi-client 1.0.0