HDX Python Utilities

The HDX Python Utilities library provides a range of helpful utilities:

Easy downloading of files with support for authentication, streaming and hashing
Loading and saving JSON and YAML (inc. with OrderedDict)
Database utilities (inc. connecting through SSH and SQLAlchemy helpers)
Dictionary and list utilities
HTML utilities (inc. BeautifulSoup helper)
Compare files (e.g. for testing)
Simple emailing
Easy logging setup
Path utilities
Text processing
Py3-like raise from for Py2
Check valid UUID
Easy building and packaging

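This library is part of the Humanitarian Data Exchange (HDX) project. If you have humanitarian-related data, please upload it to HDX.

Usage

The library has detailed API documentation, which can be found here: http://ocha-dap.github.io/hdx-python-utilities/. The code for the library is here: https://github.com/ocha-dap/hdx-python-utilities.

Downloading files

Various utilities to help with downloading files. Retrying is included by default.

For example, given the YAML file extraparams.yml: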

mykey:
    basic_auth: "XXXXXXXX"
    locale: "en"

We can create a downloader as shown below. It will use the authentication defined in basic_auth and add the parameter locale=en to each request (e.g. for the GET request http://myurl/lala?param1=p1&locale=en):

with Download(user_agent='test', extra_params_yaml='extraparams.yml', extra_params_lookup='mykey') as downloader:
    response = downloader.download(url)  # get requests library response
    json = response.json()

    # Download file to folder/filename
    f = downloader.download_file('http://myurl', post=False,
                                 parameters=OrderedDict([('b', '4'), ('d', '3')]),
                                 folder=tmpdir, filename=filename)
    filepath = abspath(f)

    # Read row by row from tabular file
    for row in downloader.get_tabular_rows('http://myurl/my.csv', dict_rows=True, headers=1):
        a = row['col']

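If we want a user agent that will be used by all relevant HDX Python Utilities methods (and by all HDX Python API methods too, if that library is included), it can be configured once and is then used automatically: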

UserAgent.set_global('test')
with Download() as downloader:
    response = downloader.download(url)  # get requests library response
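
The feature list above mentions streaming and hashing. As a rough illustration of that idea only, and not the library's own code, a digest can be computed while a body is consumed chunk by chunk. The name `hash_stream`, the chunk size and the choice of SHA-256 are assumptions made for this sketch, and an in-memory buffer stands in for a network response:

```python
import hashlib
import io


def hash_stream(stream, chunk_size=8192, algorithm='sha256'):
    """Return the hex digest of a file-like object, reading it in chunks."""
    digest = hashlib.new(algorithm)
    # iter() with a sentinel keeps calling read() until it returns b''
    for chunk in iter(lambda: stream.read(chunk_size), b''):
        digest.update(chunk)
    return digest.hexdigest()


# BytesIO stands in for a streamed HTTP response body (hypothetical data)
payload = io.BytesIO(b'col\n1\n2\n')
checksum = hash_stream(payload, chunk_size=4)
```

Reading in chunks keeps memory use flat regardless of file size, which is the usual reason to combine streaming with hashing.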

Other useful functions:

# Build get url from url and dictionary of parameters
Download.get_url_for_get('http://www.lala.com/hdfa?a=3&b=4',
                         OrderedDict([('c', 'e'), ('d', 'f')]))
# == 'http://www.lala.com/hdfa?a=3&b=4&c=e&d=f'

# Extract url and dictionary of parameters from get url
Download.get_url_params_for_post('http://www.lala.com/hdfa?a=3&b=4',
                                 OrderedDict([('c', 'e'), ('d', 'f')]))
# == ('http://www.lala.com/hdfa',
#     OrderedDict([('a', '3'), ('b', '4'), ('c', 'e'), ('d', 'f')]))
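
To make the two results above concrete, here is a minimal standard-library sketch that reproduces them for the inputs shown. It is an approximation written for illustration, not the library's implementation; the function names `build_get_url` and `split_url_for_post` are hypothetical.

```python
from collections import OrderedDict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def _merged_params(url, parameters):
    """Combine the query string already in url with extra parameters, in order."""
    parts = urlsplit(url)
    merged = OrderedDict(parse_qsl(parts.query))
    merged.update(parameters or {})
    return parts, merged


def build_get_url(url, parameters=None):
    """Return url with parameters appended to its query string."""
    parts, merged = _merged_params(url, parameters)
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(merged), parts.fragment))


def split_url_for_post(url, parameters=None):
    """Return (url without query string, all parameters as an OrderedDict)."""
    parts, merged = _merged_params(url, parameters)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, '', '')), merged


extra = OrderedDict([('c', 'e'), ('d', 'f')])
get_url = build_get_url('http://www.lala.com/hdfa?a=3&b=4', extra)
post_url, post_params = split_url_for_post('http://www.lala.com/hdfa?a=3&b=4', extra)
```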

Loading and saving JSON and YAML

Examples:

# Load YAML
mydict = load_yaml('my_yaml.yml')

# Load 2 YAMLs and merge into dictionary
mydict = load_and_merge_yaml('my_yaml1.yml', 'my_yaml2.yml')

# Load YAML into existing dictionary
mydict = load_yaml_into_existing_dict(existing_dict, 'my_yaml.yml')

# Load JSON
mydict = load_json('my_json.json')

# Load 2 JSONs and merge into dictionary
mydict = load_and_merge_json('my_json1.json', 'my_json2.json')

# Load JSON into existing dictionary
mydict = load_json_into_existing_dict(existing_dict, 'my_json.json')

# Save dictionary to YAML file in pretty format
# preserving order if it is an OrderedDict
save_yaml(mydict, 'mypath.yml', pretty=True, sortkeys=False)

# Save dictionary to JSON file in compact form
# sorting the keys
save_json(mydict, 'mypath.json', pretty=False, sortkeys=True)
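
The effect of the pretty and sortkeys options can be pictured with the standard json module. The sketch below is an assumption about what such options typically map to, not a copy of the library's code; `dumps_sketch` is a hypothetical name.

```python
import json
from collections import OrderedDict


def dumps_sketch(data, pretty=False, sortkeys=False):
    """Serialise data to a JSON string, optionally indented and/or key-sorted."""
    if pretty:
        return json.dumps(data, indent=4, sort_keys=sortkeys)
    # Compact form: no whitespace after separators
    return json.dumps(data, separators=(',', ':'), sort_keys=sortkeys)


ordered = OrderedDict([('b', 1), ('a', 2)])
compact_sorted = dumps_sketch(ordered, pretty=False, sortkeys=True)
compact_ordered = dumps_sketch(ordered, pretty=False, sortkeys=False)
```

With sortkeys off, an OrderedDict keeps its insertion order in the output, which is the behaviour the YAML example above relies on.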

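Database utilities

These are built on top of SQLAlchemy and simplify its setup.

Your SQLAlchemy database tables must inherit from Base in hdx.utilities.database, e.g.:

from sqlalchemy import Column, ForeignKey, Integer

from hdx.utilities.database import Base

class MyTable(Base):
    my_col = Column(Integer, ForeignKey(MyTable2.col2), primary_key=True)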

Examples:

# Get SQLAlchemy session object given database parameters and
# if needed SSH parameters. If database is PostgreSQL, will poll
# till it is up.
with Database(database='db', host='1.2.3.4', username='user', password='pass',
              driver='driver', ssh_host='5.6.7.8', ssh_port=2222,
              ssh_username='sshuser', ssh_private_key='path_to_key') as session:
    session.query(...)

# Extract dictionary of parameters from SQLAlchemy url
result = Database.get_params_from_sqlalchemy_url(TestDatabase.sqlalchemy_url)

# Build SQLAlchemy url from dictionary of parameters
result = Database.get_sqlalchemy_url(**TestDatabase.params)

# Wait until PostgreSQL is up
Database.wait_for_postgres('mydatabase', 'myserver', 5432, 'myuser', 'mypass')
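
As an illustration of what extracting parameters from a SQLAlchemy-style URL involves, the following standard-library sketch splits such a URL into its parts. The function name, the returned key names and the example URL are all assumptions for this sketch; the dictionary the library actually returns may be shaped differently.

```python
from urllib.parse import urlsplit


def params_from_url_sketch(url):
    """Split dialect+driver://user:pass@host:port/database into a dictionary."""
    parts = urlsplit(url)
    # The scheme may carry an optional driver after a plus sign
    dialect, _, driver = parts.scheme.partition('+')
    return {
        'dialect': dialect,
        'driver': driver or None,
        'username': parts.username,
        'password': parts.password,
        'host': parts.hostname,
        'port': parts.port,
        'database': parts.path.lstrip('/') or None,
    }


params = params_from_url_sketch(
    'postgresql+psycopg2://myuser:mypass@myserver:5432/mydatabase')
```

Passwords containing reserved characters are percent-encoded in real URLs; this sketch does not decode them.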

Dictionary and list utilities

Examples:

# Merge dictionaries
d1 = {1: 1, 2: 2, 3: 3, 4: ['a', 'b', 'c']}
d2 = {2: 6, 5: 8, 6: 9, 4: ['d', 'e']}
result = merge_dictionaries([d1, d2])
assert result == {1: 1, 2: 6, 3: 3, 4: ['d', 'e'], 5: 8, 6: 9}

# Diff dictionaries
d1 = {1: 1, 2: 2, 3: 3, 4: {'a': 1, 'b': 'c'}}
d2 = {4: {'a': 1, 'b': 'c'}, 2: 2, 3: 3, 1: 1}
diff = dict_diff(d1, d2)
assert diff == {}
d2[3] = 4
diff = dict_diff(d1, d2)
assert diff == {3: (3, 4)}

# Add element to list in dict
d = dict()
dict_of_lists_add(d, 'a', 1)
assert d == {'a': [1]}
dict_of_lists_add(d, 2, 'b')
assert d == {'a': [1], 2: ['b']}
dict_of_lists_add(d, 'a', 2)
assert d == {'a': [1, 2], 2: ['b']}

# Spread items in list so similar items are further apart
input_list = [3, 1, 1, 1, 2, 2]
result = list_distribute_contents(input_list)
assert result == [1, 2, 1, 2, 1, 3]

# Get values for the same key in all dicts in list
input_list = [{'key': 'd', 1: 5}, {'key': 'd', 1: 1}, {'key': 'g', 1: 2},
              {'key': 'a', 1: 2}, {'key': 'a', 1: 3}, {'key': 'b', 1: 5}]
result = extract_list_from_list_of_dict(input_list, 'key')
assert result == ['d', 'd', 'g', 'a', 'a', 'b']

# Cast either keys or values or both in dictionary to type
d1 = {1: 2, 2: 2.0, 3: 5, 'la': 4}
assert key_value_convert(d1, keyfn=int) == {1: 2, 2: 2.0, 3: 5, 'la': 4}
assert key_value_convert(d1, keyfn=int, dropfailedkeys=True) == {1: 2, 2: 2.0, 3: 5}
d1 = {1: 2, 2: 2.0, 3: 5, 4: 'la'}
assert key_value_convert(d1, valuefn=int) == {1: 2, 2: 2.0, 3: 5, 4: 'la'}
assert key_value_convert(d1, valuefn=int, dropfailedvalues=True) == {1: 2, 2: 2.0, 3: 5}

# Cast keys in dictionary to integer
d1 = {1: 1, 2: 1.5, 3.5: 3, '4': 4}
assert integer_key_convert(d1) == {1: 1, 2: 1.5, 3: 3, 4: 4}

# Cast values in dictionary to integer
d1 = {1: 1, 2: 1.5, 3: '3', 4: 4}
assert integer_value_convert(d1) == {1: 1, 2: 1, 3: 3, 4: 4}

# Cast values in dictionary to float
d1 = {1: 1, 2: 1.5, 3: '3', 4: 4}
assert float_value_convert(d1) == {1: 1.0, 2: 1.5, 3: 3.0, 4: 4.0}

# Average values by key in two dictionaries
d1 = {1: 1, 2: 1.0, 3: 3, 4: 4}
d2 = {1: 2, 2: 2.0, 3: 5, 4: 4, 7: 3}
assert avg_dicts(d1, d2) == {1: 1.5, 2: 1.5, 3: 4, 4: 4}

# Read and write lists to csv
l = [[1, 2, 3, 'a'],
     [4, 5, 6, 'b'],
     [7, 8, 9, 'c']]
write_list_to_csv(l, filepath, headers=['h1', 'h2', 'h3', 'h4'])
newll = read_list_from_csv(filepath)
newld = read_list_from_csv(filepath, dict_form=True, headers=1)
assert newll == [['h1', 'h2', 'h3', 'h4'], ['1', '2', '3', 'a'], ['4', '5', '6', 'b'], ['7', '8', '9', 'c']]
assert newld == [{'h1': '1', 'h2': '2', 'h4': 'a', 'h3': '3'},
                {'h1': '4', 'h2': '5', 'h4': 'b', 'h3': '6'},
                {'h1': '7', 'h2': '8', 'h4': 'c', 'h3': '9'}]

# Convert command line arguments to dictionary
args = 'a=1,big=hello,1=3'
assert args_to_dict(args) == {'a': '1', 'big': 'hello', '1': '3'}
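
For readers who want to see the idea behind a few of these helpers, the plain-Python sketches below reproduce the results shown above for dict_of_lists_add, avg_dicts and args_to_dict. They are illustrative equivalents for those inputs only, with hypothetical `_sketch` names, and make no claim about how the library implements them or handles edge cases.

```python
def dict_of_lists_add_sketch(dictionary, key, value):
    """Append value to the list stored under key, creating the list if needed."""
    dictionary.setdefault(key, []).append(value)


def avg_dicts_sketch(d1, d2):
    """Average values for keys present in both dictionaries."""
    return {key: (d1[key] + d2[key]) / 2 for key in d1 if key in d2}


def args_to_dict_sketch(args):
    """Turn 'a=1,b=2' into {'a': '1', 'b': '2'}; values stay strings."""
    return dict(pair.split('=', 1) for pair in args.split(','))


d = {}
dict_of_lists_add_sketch(d, 'a', 1)
dict_of_lists_add_sketch(d, 2, 'b')
dict_of_lists_add_sketch(d, 'a', 2)

averaged = avg_dicts_sketch({1: 1, 2: 1.0, 3: 3, 4: 4},
                            {1: 2, 2: 2.0, 3: 5, 4: 4, 7: 3})
parsed = args_to_dict_sketch('a=1,big=hello,1=3')
```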

HTML utilities

These are built on top of BeautifulSoup and simplify its setup.

Examples:

# Get soup for url with optional kwarg downloader=Download() object
soup = get_soup('http://myurl', user_agent='test')
# user agent can be set globally using:
# UserAgent.set_global('test')
tag = soup.find(id='mytag')

# Get text of tag stripped of leading and trailing whitespace
# and newlines and with &nbsp; replaced with space
result = get_text(tag)

# Extract HTML table as list of dictionaries
result = extract_table(tabletag)
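
To show what "HTML table as list of dictionaries" means in practice, here is a small sketch that uses only the standard library's html.parser instead of BeautifulSoup. It is not the library's extract_table; the class and function names and the sample markup are hypothetical, and it assumes the first row holds the headers.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag in ('td', 'th') and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ('td', 'th') and self._cell is not None:
            self._row.append(''.join(self._cell).strip())
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_dicts(html):
    """Return one dictionary per body row, keyed by the header row's cells."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    headers, *body = parser.rows
    return [dict(zip(headers, row)) for row in body]


sample = ('<table><tr><th>name</th><th>qty</th></tr>'
          '<tr><td>coal</td><td> 3 </td></tr></table>')
table = table_to_dicts(sample)
```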

Comparing files

Compare two files:

result = compare_files(testfile1, testfile2)
# Result is of the form, e.g.:
# ["- coal   ,3      ,7.4    ,'needed'\n",
#  '?         ^\n',
#  "+ coal   ,1      ,7.4    ,'notneeded'\n",
#  '?         ^                +++\n']
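
The result format above resembles the output of the standard library's difflib.ndiff. Assuming that resemblance (the library's internals are not confirmed here), a comparison of two lists of lines that keeps only the differing lines could be sketched as follows; `diff_lines` is a hypothetical name.

```python
import difflib


def diff_lines(lines1, lines2):
    """Return ndiff output for two lists of lines, dropping unchanged lines."""
    return [line for line in difflib.ndiff(lines1, lines2)
            if not line.startswith('  ')]


same = diff_lines(['a\n', 'b\n'], ['a\n', 'b\n'])
changed = diff_lines(['a\n', 'b\n'], ['a\n', 'c\n'])
```

An empty list means the inputs match, which makes the helper convenient in test assertions.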

Emailing

Example of setting up and sending an email:

smtp_initargs = {
    'host': 'localhost',
    'port': 123,
    'local_hostname': 'mycomputer.fqdn.com',
    'timeout': 3,
    'source_address': ('machine', 456),
}
username = 'user@user.com'
password = 'pass'
email_config_dict = {
    'connection_type': 'ssl',
    'username': username,
    'password': password
}
email_config_dict.update(smtp_initargs)

recipients = ['larry@gmail.com', 'moe@gmail.com', 'curly@gmail.com']
subject = 'hello'
text_body = 'hello there'
html_body = """\
<html>
  <head></head>
  <body>
    <p>Hi!<br>
       How are you?<br>
       Here is the <a href="https://www.python.org">link</a> you wanted.
    </p>
  </body>
</html>
"""
sender = 'me@gmail.com'

with Email(email_config_dict=email_config_dict) as email:
    email.send(recipients, subject, text_body, sender=sender)
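
For orientation, a message with both a plain-text and an HTML body can be assembled with the standard library alone, as sketched below. This is not the library's Email class; `build_message` is a hypothetical helper, and actually sending the message (for example with smtplib) is left out.

```python
from email.message import EmailMessage


def build_message(sender, recipients, subject, text_body, html_body=None):
    """Build a message with a plain-text body and an optional HTML alternative."""
    msg = EmailMessage()
    msg['From'] = sender
    msg['To'] = ', '.join(recipients)
    msg['Subject'] = subject
    msg.set_content(text_body)
    if html_body is not None:
        # Turns the message into multipart/alternative
        msg.add_alternative(html_body, subtype='html')
    return msg


message = build_message('me@gmail.com',
                        ['larry@gmail.com', 'moe@gmail.com'],
                        'hello', 'hello there',
                        '<html><body><p>Hi!</p></body></html>')
```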

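Configuring logging

The library provides coloured logs and a simple default setup, which should be adequate for most cases. If you wish to change the logging configuration from the defaults, you will need to call setup_logging with arguments.

import logging

from hdx.utilities.easy_logging import setup_logging
...
logger = logging.getLogger(__name__)
setup_logging(KEYWORD ARGUMENTS)

KEYWORD ARGUMENTS can be: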

Choose	Argument	Type	Value	Default
One of:	logging_config_dict	dict	Logging configuration dictionary
or	logging_config_json	str	Path to JSON Logging configuration
or	logging_config_yaml	str	Path to YAML Logging configuration	Library's internal logging_configuration.yml
One of:	smtp_config_dict	dict	Email Logging configuration dictionary
or	smtp_config_json	str	Path to JSON Email Logging configuration
or	smtp_config_yaml	str	Path to YAML Email Logging configuration

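Do not supply smtp_config_dict, smtp_config_json or smtp_config_yaml unless you are using the default logging configuration!

If you are using the default logging configuration, you can optionally have a default SMTP handler that sends an email when a CRITICAL error occurs by supplying smtp_config_dict, smtp_config_json or smtp_config_yaml. Here is a template of a YAML file that can be passed as the smtp_config_yaml parameter: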

handlers:
    error_mail_handler:
        toaddrs: EMAIL_ADDRESSES
        subject: "RUN FAILED: MY_PROJECT_NAME"

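Unless you override it, the mail server mailhost for the default SMTP handler is localhost and the from address fromaddr is <noreply@localhost>.

To use logging in your files, simply add the line below to the top of each Python file: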

logger = logging.getLogger(__name__)

Then use the logger like this:

logger.debug('DEBUG message')
logger.info('INFORMATION message')
logger.warning('WARNING message')
logger.error('ERROR message')
logger.critical('CRITICAL error message')
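
A logging configuration dictionary, where one is needed, is presumably in the schema understood by the standard logging.config.dictConfig; that is an assumption rather than something stated above. The sketch below builds such a dictionary and applies it with the standard library directly, without calling setup_logging. The formatter and handler names are illustrative.

```python
import logging
import logging.config

# Minimal dictionary in the standard dictConfig schema (names are illustrative)
logging_config_dict = {
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        'plain': {'format': '%(asctime)s %(name)s %(levelname)s %(message)s'},
    },
    'handlers': {
        'console': {
            'class': 'logging.StreamHandler',
            'formatter': 'plain',
            'level': 'DEBUG',
        },
    },
    'root': {'handlers': ['console'], 'level': 'INFO'},
}

logging.config.dictConfig(logging_config_dict)
logger = logging.getLogger(__name__)
logger.info('INFORMATION message')
```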

Path utilities

Examples:

# Gets temporary directory from environment variable
# TEMP_DIR and falls back to os function
temp_folder = get_temp_dir()

# Gets temporary directory from environment variable
# TEMP_DIR and falls back to os function,
# optionally appends the given folder, creates the
# folder and on exiting, deletes the folder
with temp_dir('papa') as tempdir:
    ...

# Get current directory of script
dir = script_dir(ANY_PYTHON_OBJECT_IN_SCRIPT)

# Get current directory of script with filename appended
path = script_dir_plus_file('myfile.txt', ANY_PYTHON_OBJECT_IN_SCRIPT)
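
The behaviour described in the comments for get_temp_dir and temp_dir (environment variable first, operating system fallback, create on entry, delete on exit) can be approximated with the standard library as follows. This is a sketch under those stated assumptions with hypothetical `_sketch` names, not the library's code.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager


def get_temp_dir_sketch():
    """Use the TEMP_DIR environment variable, else the OS temporary directory."""
    return os.getenv('TEMP_DIR', tempfile.gettempdir())


@contextmanager
def temp_dir_sketch(folder=None):
    """Yield a temporary folder path; remove the appended folder on exit."""
    path = get_temp_dir_sketch()
    if folder:
        path = os.path.join(path, folder)
    os.makedirs(path, exist_ok=True)
    try:
        yield path
    finally:
        # Only delete a folder this sketch appended, never the base temp dir
        if folder:
            shutil.rmtree(path, ignore_errors=True)


with temp_dir_sketch('papa') as tempdir:
    existed_inside = os.path.isdir(tempdir)
exists_after = os.path.exists(tempdir)
```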

Text processing

Examples:

# Replace multiple strings in a string simultaneously
a = 'The quick brown fox jumped over the lazy dog. It was so fast!'
result = multiple_replace(a, {'quick': 'slow', 'fast': 'slow', 'lazy': 'busy'})
assert result == 'The slow brown fox jumped over the busy dog. It was so slow!'

# Extract words from a string sentence into a list
result = get_words_in_sentence("Korea (Democratic People's Republic of)")
assert result == ['Korea', 'Democratic', "People's", 'Republic', 'of']

# Find matching text in strings
a = 'The quick brown fox jumped over the lazy dog. It was so fast!'
b = 'The quicker brown fox leapt over the slower fox. It was so fast!'
c = 'The quick brown fox climbed over the lazy dog. It was so fast!'
result = get_matching_text([a, b, c], match_min_size=10)
assert result == ' brown fox  over the  It was so fast!'
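
Replacing several strings "simultaneously" usually means a single pass over the text, so that one replacement cannot be re-replaced by another. A common way to do that is a regular-expression alternation, sketched below; whether the library uses this approach is not confirmed here, and `multiple_replace_sketch` is a hypothetical name.

```python
import re


def multiple_replace_sketch(text, replacements):
    """Replace every key of replacements found in text in a single pass."""
    if not replacements:
        return text
    # Longest keys first so a longer key wins over a key that is its prefix
    keys = sorted(replacements, key=len, reverse=True)
    pattern = re.compile('|'.join(re.escape(key) for key in keys))
    return pattern.sub(lambda match: replacements[match.group(0)], text)


sentence = 'The quick brown fox jumped over the lazy dog. It was so fast!'
replaced = multiple_replace_sketch(
    sentence, {'quick': 'slow', 'fast': 'slow', 'lazy': 'busy'})
```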

Raise from

Examples:

# Raise an exception from another exception on Py2 or Py3
try:
    ...
except IOError as e:
    raisefrom(IOError, 'My Error Message', e)
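
On Python 3 alone, the same chaining is available natively through the `raise ... from ...` statement, which records the original exception as `__cause__`. The sketch below shows that built-in form; `read_config` and the path are hypothetical.

```python
def read_config(path):
    """Read a file, re-raising any I/O failure with a friendlier message."""
    try:
        with open(path) as handle:
            return handle.read()
    except IOError as e:
        # Native Python 3 exception chaining
        raise IOError('My Error Message') from e


try:
    read_config('/nonexistent/path/for/illustration')
except IOError as err:
    message = str(err)
    cause = err.__cause__
```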

Valid UUID

Examples:

assert is_valid_uuid('jpsmith') is False
assert is_valid_uuid('c9bf9e57-1685-4c89-bafb-ff5af830be8a') is True
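
A check of this kind can be approximated with the standard uuid module by attempting to parse the value and comparing it with its canonical form. This is a sketch of one possible approach, not the library's implementation; `is_valid_uuid_sketch` is a hypothetical name, and the strictness about hyphenated form is a choice made for this example.

```python
import uuid


def is_valid_uuid_sketch(value):
    """Return True if value is a UUID written in canonical hyphenated form."""
    text = str(value)
    try:
        parsed = uuid.UUID(text)
    except ValueError:
        return False
    # uuid.UUID also accepts braces and unhyphenated hex; require canonical form
    return str(parsed) == text.lower()
```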

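Easy building and packaging

The clean command of setup.py has been extended to use the --all flag by default and to clean the dist folder. Two new commands have been created. package calls the new clean command and also sdist and bdist_wheel. In other words, it cleans thoroughly and builds source and wheel distributions. publish publishes to PyPI and creates a git tag, e.g.: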

python setup.py clean
python setup.py package
python setup.py publish

To use these commands, create a setup.py like this:

{夫人21}

hdx-python-utilities 1.7.8
