HDX Python Utilities

The HDX Python Utilities library provides a range of helpful utilities:

  1. Easy downloading of files with support for authentication, streaming and hashing
  2. Loading and saving JSON and YAML (inc. with OrderedDict)
  3. Database utilities (inc. connecting through SSH and SQLAlchemy helpers)
  4. Dictionary and list utilities
  5. HTML utilities (inc. BeautifulSoup helper)
  6. Compare files (eg. for testing)
  7. Simple emailing
  8. Easy logging setup
  9. Path utilities
  10. Text processing
  11. Py3-like raise from for Py2
  12. Check valid UUID
  13. Easy building and packaging

This library is part of the Humanitarian Data Exchange (HDX) project. If you have humanitarian-related data, please upload your datasets to HDX.

Usage

The library has detailed API documentation, which can be found here: http://ocha-dap.github.io/hdx-python-utilities/. The code for the library is here: https://github.com/ocha-dap/hdx-python-utilities

Downloading files

Various utilities to help with downloading files. Retrying is included by default.

For example, given the YAML file extraparams.yml:

mykey:
    basic_auth: "XXXXXXXX"
    locale: "en"

We can create a downloader as shown below that will use the authentication defined in basic_auth and add the parameter locale=en to each request (e.g. for a GET request, http://myurl/lala?param1=p1&locale=en):

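from collections import OrderedDict
from os.path import abspath

from hdx.utilities.downloader import Download

with Download(user_agent='test', extra_params_yaml='extraparams.yml', extra_params_lookup='mykey') as downloader: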
    response = downloader.download(url)  # get requests library response
    json = response.json()

    # Download file to folder/filename
    f = downloader.download_file('http://myurl', post=False,
                                 parameters=OrderedDict([('b', '4'), ('d', '3')]),
                                 folder=tmpdir, filename=filename)
    filepath = abspath(f)

    # Read row by row from tabular file
    for row in downloader.get_tabular_rows('http://myurl/my.csv', dict_rows=True, headers=1):
        a = row['col']
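get_tabular_rows handles both the download and the parsing. To illustrate only what dict_rows=True with headers=1 means for each row, here is a minimal standard-library sketch that works on CSV text already in memory. It assumes headers is the 1-based line number of the header row; it is not the library's implementation, and iter_tabular_rows_sketch is a hypothetical name.

```python
import csv
import io


def iter_tabular_rows_sketch(text, headers=1):
    """Yield each data row of CSV text as a dict keyed by the header row.

    Assumption: headers is the 1-based line number of the header row,
    modelled on the headers=1 argument in the example above.
    """
    reader = csv.reader(io.StringIO(text))
    header_row = None
    for line_number, row in enumerate(reader, start=1):
        if line_number < headers:
            continue  # skip any lines before the header row
        if line_number == headers:
            header_row = row
            continue
        yield dict(zip(header_row, row))


csv_text = "col,other\n1,a\n2,b\n"
for row in iter_tabular_rows_sketch(csv_text, headers=1):
    a = row['col']
```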

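If we want a user agent that will be used in all relevant HDX Python Utilities methods (and all HDX Python API methods if that library is included), it can be configured once and used automatically: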

UserAgent.set_global('test')
with Download() as downloader:
    response = downloader.download(url)  # get requests library response

Other useful functions:

# Build get url from url and dictionary of parameters
Download.get_url_for_get('http://www.lala.com/hdfa?a=3&b=4',
                         OrderedDict([('c', 'e'), ('d', 'f')]))
# == 'http://www.lala.com/hdfa?a=3&b=4&c=e&d=f'

# Extract url and dictionary of parameters from get url
Download.get_url_params_for_post('http://www.lala.com/hdfa?a=3&b=4',
                                 OrderedDict([('c', 'e'), ('d', 'f')]))
# == ('http://www.lala.com/hdfa',
#     OrderedDict([('a', '3'), ('b', '4'), ('c', 'e'), ('d', 'f')]))
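To show what these two helpers do with the query string, the following is a minimal sketch of equivalent behaviour using only urllib.parse. The _sketch functions are hypothetical stand-ins, not the library's code; note that urlencode also percent-encodes any special characters in the values.

```python
from collections import OrderedDict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def get_url_for_get_sketch(url, parameters=None):
    """Append parameters to whatever query string url already has."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    params = OrderedDict(parse_qsl(query))  # existing params keep their order
    if parameters:
        params.update(parameters)
    return urlunsplit((scheme, netloc, path, urlencode(params), fragment))


def get_url_params_for_post_sketch(url, parameters=None):
    """Split url into (base url, OrderedDict of query params + extra params)."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    params = OrderedDict(parse_qsl(query))
    if parameters:
        params.update(parameters)
    return urlunsplit((scheme, netloc, path, '', fragment)), params
```

With the inputs from the example above, both sketches produce the results shown in the comments there.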

Loading and saving JSON and YAML

Examples:

# Load YAML
mydict = load_yaml('my_yaml.yml')

# Load 2 YAMLs and merge into dictionary
mydict = load_and_merge_yaml('my_yaml1.yml', 'my_yaml2.yml')

# Load YAML into existing dictionary
mydict = load_yaml_into_existing_dict(existing_dict, 'my_yaml.yml')

# Load JSON
mydict = load_json('my_json.json')

# Load 2 JSONs and merge into dictionary
mydict = load_and_merge_json('my_json1.json', 'my_json2.json')

# Load JSON into existing dictionary
mydict = load_json_into_existing_dict(existing_dict, 'my_json.json')

# Save dictionary to YAML file in pretty format
# preserving order if it is an OrderedDict
save_yaml(mydict, 'mypath.yml', pretty=True, sortkeys=False)

# Save dictionary to JSON file in compact form
# sorting the keys
save_json(mydict, 'mypath.json', pretty=False, sortkeys=True)
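As a rough picture of what loading, merging and saving involve, here is a standard-library sketch for the JSON case. It assumes a shallow merge in which keys from later files override earlier ones; the library's merge may well be recursive, so treat this as an illustration only. All _sketch names are hypothetical.

```python
import json
import os
import tempfile
from collections import OrderedDict


def load_json_sketch(path):
    """Load a JSON file into an OrderedDict so key order is preserved."""
    with open(path, encoding='utf-8') as f:
        return json.load(f, object_pairs_hook=OrderedDict)


def load_and_merge_json_sketch(*paths):
    """Load JSON files in turn; later files override earlier keys (shallow merge)."""
    merged = OrderedDict()
    for path in paths:
        merged.update(load_json_sketch(path))
    return merged


def save_json_sketch(data, path, pretty=False, sortkeys=False):
    """Write JSON: indented if pretty, compact otherwise; keys sorted if sortkeys."""
    with open(path, 'w', encoding='utf-8') as f:
        if pretty:
            json.dump(data, f, indent=2, sort_keys=sortkeys)
        else:
            json.dump(data, f, separators=(',', ':'), sort_keys=sortkeys)


# Round trip through a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    path1 = os.path.join(tmp, 'my_json1.json')
    path2 = os.path.join(tmp, 'my_json2.json')
    save_json_sketch({'a': 1, 'b': 2}, path1)
    save_json_sketch({'b': 3, 'c': 4}, path2, pretty=True, sortkeys=True)
    merged = load_and_merge_json_sketch(path1, path2)
```

Here merged ends up as {'a': 1, 'b': 3, 'c': 4}, with 'b' taking its value from the second file.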

Database utilities

These are built on top of SQLAlchemy and simplify its setup.

Your SQLAlchemy database tables must inherit from Base in hdx.utilities.database, e.g.:

from sqlalchemy import Column, ForeignKey, Integer

from hdx.utilities.database import Base
class MyTable(Base):
    my_col = Column(Integer, ForeignKey(MyTable2.col2), primary_key=True)

Examples:

# Get SQLAlchemy session object given database parameters and
# if needed SSH parameters. If database is PostgreSQL, will poll
# till it is up.
with Database(database='db', host='1.2.3.4', username='user', password='pass',
              driver='driver', ssh_host='5.6.7.8', ssh_port=2222,
              ssh_username='sshuser', ssh_private_key='path_to_key') as session:
    session.query(...)

# Extract dictionary of parameters from SQLAlchemy url
result = Database.get_params_from_sqlalchemy_url(TestDatabase.sqlalchemy_url)

# Build SQLAlchemy url from dictionary of parameters
result = Database.get_sqlalchemy_url(**TestDatabase.params)

# Wait until PostgreSQL is up
Database.wait_for_postgres('mydatabase', 'myserver', 5432, 'myuser', 'mypass')
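The two URL helpers convert between a parameter dictionary and an SQLAlchemy URL of the general form dialect+driver://username:password@host:port/database. The sketch below shows that round trip with urllib.parse. The parameter names database, host, username, password and driver come from the Database example above; port, dialect and the default of 'postgresql' are assumptions, and the _sketch functions are hypothetical, not the library's code.

```python
from urllib.parse import quote_plus, unquote_plus, urlsplit


def get_sqlalchemy_url_sketch(database, host, port=None, username=None,
                              password=None, dialect='postgresql', driver=None):
    """Build dialect[+driver]://[username[:password]@]host[:port]/database."""
    scheme = f'{dialect}+{driver}' if driver else dialect
    credentials = ''
    if username:
        credentials = quote_plus(username)  # escape characters such as '@'
        if password:
            credentials += ':' + quote_plus(password)
        credentials += '@'
    hostport = host if port is None else f'{host}:{port}'
    return f'{scheme}://{credentials}{hostport}/{database}'


def get_params_from_sqlalchemy_url_sketch(url):
    """Split a URL built as above back into its parts."""
    parts = urlsplit(url)
    dialect, _, driver = parts.scheme.partition('+')
    return {
        'dialect': dialect,
        'driver': driver or None,
        'username': unquote_plus(parts.username) if parts.username else None,
        'password': unquote_plus(parts.password) if parts.password else None,
        'host': parts.hostname,  # note: urlsplit lower-cases the host name
        'port': parts.port,
        'database': parts.path.lstrip('/'),
    }
```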

Dictionary and list utilities

Examples:

# Merge dictionaries
d1 = {1: 1, 2: 2, 3: 3, 4: ['a', 'b', 'c']}
d2 = {2: 6, 5: 8, 6: 9, 4: ['d', 'e']}
result = merge_dictionaries([d1, d2])
assert result == {1: 1, 2: 6, 3: 3, 4: ['d', 'e'], 5: 8, 6: 9}

# Diff dictionaries
d1 = {1: 1, 2: 2, 3: 3, 4: {'a': 1, 'b': 'c'}}
d2 = {4: {'a': 1, 'b': 'c'}, 2: 2, 3: 3, 1: 1}
diff = dict_diff(d1, d2)
assert diff == {}
d2[3] = 4
diff = dict_diff(d1, d2)
assert diff == {3: (3, 4)}

# Add element to list in dict
d = dict()
dict_of_lists_add(d, 'a', 1)
assert d == {'a': [1]}
dict_of_lists_add(d, 2, 'b')
assert d == {'a': [1], 2: ['b']}
dict_of_lists_add(d, 'a', 2)
assert d == {'a': [1, 2], 2: ['b']}

# Spread items in list so similar items are further apart
input_list = [3, 1, 1, 1, 2, 2]
result = list_distribute_contents(input_list)
assert result == [1, 2, 1, 2, 1, 3]

# Get values for the same key in all dicts in list
input_list = [{'key': 'd', 1: 5}, {'key': 'd', 1: 1}, {'key': 'g', 1: 2},
              {'key': 'a', 1: 2}, {'key': 'a', 1: 3}, {'key': 'b', 1: 5}]
result = extract_list_from_list_of_dict(input_list, 'key')
assert result == ['d', 'd', 'g', 'a', 'a', 'b']

# Cast either keys or values or both in dictionary to type
d1 = {1: 2, 2: 2.0, 3: 5, 'la': 4}
assert key_value_convert(d1, keyfn=int) == {1: 2, 2: 2.0, 3: 5, 'la': 4}
assert key_value_convert(d1, keyfn=int, dropfailedkeys=True) == {1: 2, 2: 2.0, 3: 5}
d1 = {1: 2, 2: 2.0, 3: 5, 4: 'la'}
assert key_value_convert(d1, valuefn=int) == {1: 2, 2: 2.0, 3: 5, 4: 'la'}
assert key_value_convert(d1, valuefn=int, dropfailedvalues=True) == {1: 2, 2: 2.0, 3: 5}

# Cast keys in dictionary to integer
d1 = {1: 1, 2: 1.5, 3.5: 3, '4': 4}
assert integer_key_convert(d1) == {1: 1, 2: 1.5, 3: 3, 4: 4}

# Cast values in dictionary to integer
d1 = {1: 1, 2: 1.5, 3: '3', 4: 4}
assert integer_value_convert(d1) == {1: 1, 2: 1, 3: 3, 4: 4}

# Cast values in dictionary to float
d1 = {1: 1, 2: 1.5, 3: '3', 4: 4}
assert float_value_convert(d1) == {1: 1.0, 2: 1.5, 3: 3.0, 4: 4.0}

# Average values by key in two dictionaries
d1 = {1: 1, 2: 1.0, 3: 3, 4: 4}
d2 = {1: 2, 2: 2.0, 3: 5, 4: 4, 7: 3}
assert avg_dicts(d1, d2) == {1: 1.5, 2: 1.5, 3: 4, 4: 4}

# Read and write lists to csv
l = [[1, 2, 3, 'a'],
     [4, 5, 6, 'b'],
     [7, 8, 9, 'c']]
write_list_to_csv(l, filepath, headers=['h1', 'h2', 'h3', 'h4'])
newll = read_list_from_csv(filepath)
newld = read_list_from_csv(filepath, dict_form=True, headers=1)
assert newll == [['h1', 'h2', 'h3', 'h4'], ['1', '2', '3', 'a'], ['4', '5', '6', 'b'], ['7', '8', '9', 'c']]
assert newld == [{'h1': '1', 'h2': '2', 'h4': 'a', 'h3': '3'},
                {'h1': '4', 'h2': '5', 'h4': 'b', 'h3': '6'},
                {'h1': '7', 'h2': '8', 'h4': 'c', 'h3': '9'}]

# Convert command line arguments to dictionary
args = 'a=1,big=hello,1=3'
assert args_to_dict(args) == {'a': '1', 'big': 'hello', '1': '3'}
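Several of these helpers are small enough to sketch in plain Python, which can help when reasoning about their behaviour. The versions below reproduce the dict_of_lists_add, dict_diff and args_to_dict examples above; they are hypothetical re-implementations, not the library's code. In particular, the no_key placeholder that dict_diff_sketch uses for a key missing from one side is an assumption.

```python
def dict_of_lists_add_sketch(dictionary, key, value):
    """Append value to the list under key, creating the list on first use."""
    dictionary.setdefault(key, []).append(value)


def dict_diff_sketch(d1, d2, no_key='<KEYNOTFOUND>'):
    """Return {key: (value in d1, value in d2)} for every key whose values differ.

    Assumption: a key missing from one dictionary is reported with no_key.
    """
    keys = set(d1) | set(d2)
    return {key: (d1.get(key, no_key), d2.get(key, no_key))
            for key in keys
            if d1.get(key, no_key) != d2.get(key, no_key)}


def args_to_dict_sketch(args):
    """Turn 'a=1,big=hello' into {'a': '1', 'big': 'hello'}; values stay strings."""
    return dict(pair.split('=', 1) for pair in args.split(',') if pair)


d = {}
dict_of_lists_add_sketch(d, 'a', 1)
dict_of_lists_add_sketch(d, 2, 'b')
dict_of_lists_add_sketch(d, 'a', 2)
# d is now {'a': [1, 2], 2: ['b']}
```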

HTML utilities

These are built on top of BeautifulSoup and simplify its setup.

Examples:

# Get soup for url with optional kwarg downloader=Download() object
soup = get_soup('http://myurl', user_agent='test')
# user agent can be set globally using:
# UserAgent.set_global('test')
tag = soup.find(id='mytag')

# Get text of tag stripped of leading and trailing whitespace
# and newlines and with &nbsp; replaced with space
result = get_text(tag)

# Extract HTML table as list of dictionaries
tabletag = soup.find('table')
result = extract_table(tabletag)
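For a sense of what turning a table into a list of dictionaries involves, here is a dependency-free sketch built on the standard library's html.parser instead of BeautifulSoup. It assumes a simple, well-formed table whose first row holds the column names, and it cleans cell text roughly as get_text is described above. TableExtractorSketch and extract_table_sketch are hypothetical names, not the library's API.

```python
from html.parser import HTMLParser


class TableExtractorSketch(HTMLParser):
    """Collect the first <table> as a list of dicts keyed by its first row."""

    def __init__(self):
        # convert_charrefs=True turns &nbsp; into '\xa0' before handle_data sees it
        super().__init__(convert_charrefs=True)
        self.headers = []
        self.rows = []
        self._row = None
        self._cell = None
        self._in_table = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        if self._done:
            return
        if tag == 'table':
            self._in_table = True
        elif self._in_table and tag == 'tr':
            self._row = []
        elif self._in_table and tag in ('td', 'th'):
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if not self._in_table:
            return
        if tag in ('td', 'th') and self._cell is not None and self._row is not None:
            # Non-breaking spaces become spaces, then surrounding whitespace is trimmed
            self._row.append(''.join(self._cell).replace('\xa0', ' ').strip())
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            if not self.headers:
                self.headers = self._row  # first row supplies the keys
            else:
                self.rows.append(dict(zip(self.headers, self._row)))
            self._row = None
        elif tag == 'table':
            self._in_table = False
            self._done = True


def extract_table_sketch(html):
    parser = TableExtractorSketch()
    parser.feed(html)
    parser.close()
    return parser.rows
```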

Comparing files

Compare two files:

result = compare_files(testfile1, testfile2)
# Result is of form eg.:
# ["- coal   ,3      ,7.4    ,'needed'\n",
#  '?         ^\n',
#  "+ coal   ,1      ,7.4    ,'notneeded'\n",
#  '?         ^                +++\n']
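The result above has the shape of Python's difflib.ndiff output with the unchanged lines removed: '- ' and '+ ' lines for the differing lines, plus '? ' hint lines marking the changed characters. Assuming that is the intent, a standard-library sketch could look like this. compare_files_sketch and diff_lines_sketch are hypothetical names, and the library may differ in details such as how it reads the files.

```python
import difflib


def diff_lines_sketch(lines1, lines2):
    """ndiff two lists of lines and keep only what differs.

    Lines common to both start with two spaces and are dropped; '- ' and '+ '
    lines plus any '? ' hint lines pointing at changed characters are kept.
    """
    return [line for line in difflib.ndiff(lines1, lines2) if not line.startswith('  ')]


def compare_files_sketch(path1, path2):
    """Read two text files and diff them line by line."""
    with open(path1, encoding='utf-8') as f1, open(path2, encoding='utf-8') as f2:
        return diff_lines_sketch(f1.readlines(), f2.readlines())
```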

Emailing

Example of setting up and sending an email:

smtp_initargs = {
    'host': 'localhost',
    'port': 123,
    'local_hostname': 'mycomputer.fqdn.com',
    'timeout': 3,
    'source_address': ('machine', 456),
}
username = 'user@user.com'
password = 'pass'
email_config_dict = {
    'connection_type': 'ssl',
    'username': username,
    'password': password
}
email_config_dict.update(smtp_initargs)

recipients = ['larry@gmail.com', 'moe@gmail.com', 'curly@gmail.com']
subject = 'hello'
text_body = 'hello there'
html_body = """\
<html>
  <head></head>
  <body>
    <p>Hi!<br>
       How are you?<br>
       Here is the <a href="https://www.python.org">link</a> you wanted.
    </p>
  </body>
</html>
"""
sender = 'me@gmail.com'

with Email(email_config_dict=email_config_dict) as email:
    email.send(recipients, subject, text_body, html_body=html_body, sender=sender)
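Email takes care of the SMTP connection for you. To show what a message with both a plain-text and an HTML body looks like at the standard-library level, here is a minimal sketch using email.message.EmailMessage and smtplib.SMTP_SSL (matching connection_type 'ssl' above). Both function names are hypothetical, and this is not the library's implementation. Only build_email_sketch can be tried without a mail server.

```python
import smtplib
from email.message import EmailMessage


def build_email_sketch(recipients, subject, text_body, html_body=None, sender=None):
    """Build a message with a text/plain body and an optional text/html alternative."""
    msg = EmailMessage()
    msg['Subject'] = subject
    if sender:
        msg['From'] = sender
    msg['To'] = ', '.join(recipients)
    msg.set_content(text_body)
    if html_body:
        # Turns the message into multipart/alternative carrying both bodies
        msg.add_alternative(html_body, subtype='html')
    return msg


def send_email_sketch(msg, host, port, username, password, timeout=3):
    """Send msg over an SSL connection; requires a reachable SMTP server."""
    with smtplib.SMTP_SSL(host, port, timeout=timeout) as server:
        server.login(username, password)
        server.send_message(msg)
```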

Configuring logging

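The library provides coloured logs with a simple default setup, which should be adequate for most cases. If you wish to change the logging configuration from the defaults, you will need to call setup_logging with arguments: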

import logging

from hdx.utilities.easy_logging import setup_logging
...
logger = logging.getLogger(__name__)
setup_logging(logging_config_yaml='my_logging_config.yml')  # or any keyword arguments listed below

The keyword arguments can be:

Choose   Argument             Type  Value                                     Default
One of:  logging_config_dict  dict  Logging configuration dictionary
or       logging_config_json  str   Path to JSON Logging configuration
or       logging_config_yaml  str   Path to YAML Logging configuration        Library's internal logging_configuration.yml
One of:  smtp_config_dict     dict  Email Logging configuration dictionary
or       smtp_config_json     str   Path to JSON Email Logging configuration
or       smtp_config_yaml     str   Path to YAML Email Logging configuration

Do not supply smtp_config_dict, smtp_config_json or smtp_config_yaml unless you are using the default logging configuration!

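If you are using the default logging configuration, you have the option to have a default SMTP handler that sends an email in the event of a CRITICAL error, by supplying either smtp_config_dict, smtp_config_json or smtp_config_yaml. Here is a template of a YAML file that can be passed as the smtp_config_yaml parameter: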

handlers:
    error_mail_handler:
        toaddrs: EMAIL_ADDRESSES
        subject: "RUN FAILED: MY_PROJECT_NAME"

Unless you override them, the mail server (mailhost) for the default SMTP handler is localhost and the from address (fromaddr) is noreply@localhost.
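A setup of this kind can be approximated with the standard library's logging.config.dictConfig and logging.handlers.SMTPHandler. The sketch below is an assumption about the general shape, not the contents of the library's logging_configuration.yml: it sends INFO and above to the console and emails only CRITICAL records, using the localhost and noreply@localhost defaults described above. configure_logging_sketch is a hypothetical name.

```python
import logging
import logging.config
import logging.handlers


def configure_logging_sketch(toaddrs, subject):
    """Console logging plus an SMTP handler that fires only on CRITICAL records."""
    config = {
        'version': 1,
        'disable_existing_loggers': False,
        'formatters': {
            'simple': {'format': '%(asctime)s - %(name)s - %(levelname)s - %(message)s'},
        },
        'handlers': {
            'console': {
                'class': 'logging.StreamHandler',
                'level': 'DEBUG',
                'formatter': 'simple',
            },
            'error_mail_handler': {
                'class': 'logging.handlers.SMTPHandler',
                'level': 'CRITICAL',
                'formatter': 'simple',
                'mailhost': 'localhost',
                'fromaddr': 'noreply@localhost',
                'toaddrs': toaddrs,
                'subject': subject,
            },
        },
        'root': {'level': 'INFO', 'handlers': ['console', 'error_mail_handler']},
    }
    logging.config.dictConfig(config)
```

No connection is made when the handler is configured; SMTPHandler only contacts the mail server when a CRITICAL record is actually emitted.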

To use logging in your files, simply add the lines below to the top of each Python file:

import logging
logger = logging.getLogger(__name__)

Then use the logger like this:

logger.debug('DEBUG message')
logger.info('INFORMATION message')
logger.warning('WARNING message')
logger.error('ERROR message')
logger.critical('CRITICAL error message')

Path utilities

Examples:

# Gets temporary directory from environment variable
# TEMP_DIR and falls back to os function
temp_folder = get_temp_dir()

# Gets temporary directory from environment variable
# TEMP_DIR and falls back to os function,
# optionally appends the given folder, creates the
# folder and on exiting, deletes the folder
with temp_dir('papa') as tempdir:
    ...

# Get current directory of script
directory = script_dir(ANY_PYTHON_OBJECT_IN_SCRIPT)

# Get current directory of script with filename appended
path = script_dir_plus_file('myfile.txt', ANY_PYTHON_OBJECT_IN_SCRIPT)
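The temporary-directory behaviour described in the comments above (use TEMP_DIR if set, fall back to the OS, optionally append a folder, create it, delete it on exit) can be sketched with os, tempfile, shutil and contextlib. This is a hypothetical re-implementation for illustration, not the library's code. By design it only deletes the appended folder and never the base temporary directory.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager


def get_temp_dir_sketch():
    """TEMP_DIR environment variable if set, otherwise the OS temporary directory."""
    return os.getenv('TEMP_DIR', tempfile.gettempdir())


@contextmanager
def temp_dir_sketch(folder=None):
    """Yield the temp dir (plus folder, if given), creating it and removing folder on exit."""
    tempdir = get_temp_dir_sketch()
    if folder:
        tempdir = os.path.join(tempdir, folder)
    os.makedirs(tempdir, exist_ok=True)
    try:
        yield tempdir
    finally:
        # Only remove the folder we appended, never the shared base directory
        if folder:
            shutil.rmtree(tempdir, ignore_errors=True)
```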

Text processing

Examples:

# Replace multiple strings in a string simultaneously
a = 'The quick brown fox jumped over the lazy dog. It was so fast!'
result = multiple_replace(a, {'quick': 'slow', 'fast': 'slow', 'lazy': 'busy'})
assert result == 'The slow brown fox jumped over the busy dog. It was so slow!'

# Extract words from a string sentence into a list
result = get_words_in_sentence("Korea (Democratic People's Republic of)")
assert result == ['Korea', 'Democratic', "People's", 'Republic', 'of']

# Find matching text in strings
a = 'The quick brown fox jumped over the lazy dog. It was so fast!'
b = 'The quicker brown fox leapt over the slower fox. It was so fast!'
c = 'The quick brown fox climbed over the lazy dog. It was so fast!'
result = get_matching_text([a, b, c], match_min_size=10)
assert result == ' brown fox  over the  It was so fast!'
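"Simultaneously" matters: replacing the keys one after another could re-replace text that an earlier substitution produced. A common way to avoid that is a single regular-expression pass over an alternation of all the keys, sketched below together with a regex word splitter. Both functions are hypothetical illustrations using only the re module, not necessarily how the library does it.

```python
import re


def multiple_replace_sketch(string, replacements):
    """Replace every key of replacements with its value in a single pass."""
    if not replacements:
        return string
    # Longest keys first, so a longer key wins over a shorter key that prefixes it
    keys = sorted(replacements, key=len, reverse=True)
    pattern = re.compile('|'.join(re.escape(key) for key in keys))
    return pattern.sub(lambda match: replacements[match.group(0)], string)


def get_words_in_sentence_sketch(sentence):
    """Return runs of word characters, treating apostrophes as part of a word."""
    return re.findall(r"[\w']+", sentence)
```

For example, multiple_replace_sketch('ab', {'a': 'b', 'b': 'a'}) swaps the letters to give 'ba', which sequential str.replace calls would not.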

Py3-like raise from for Py2

Example:

# Raise an exception from another exception on Py2 or Py3
try:
    ...  # code that may raise IOError
except IOError as e:
    raisefrom(IOError, 'My Error Message', e)
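On Python 3 alone, the chaining that raisefrom emulates is built into the language as raise ... from .... The sketch below shows that native form: the new exception carries the original one as __cause__. read_file_sketch is a hypothetical example function.

```python
def read_file_sketch(path):
    """Read a file, re-raising any IOError with a clearer message and the original as cause."""
    try:
        with open(path, encoding='utf-8') as f:
            return f.read()
    except IOError as e:
        # Python 3 syntax: explicit chaining sets the new exception's __cause__ to e
        raise IOError('My Error Message') from e
```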

Valid UUID

Examples:

assert is_valid_uuid('jpsmith') is False
assert is_valid_uuid('c9bf9e57-1685-4c89-bafb-ff5af830be8a') is True
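A check like this can be written with the standard uuid module: try to parse the string and reject anything that fails. The sketch additionally assumes that only the canonical lowercase hyphenated form should count as valid, since uuid.UUID also accepts variants such as braces or missing hyphens. is_valid_uuid_sketch is a hypothetical name, not the library's function.

```python
import uuid


def is_valid_uuid_sketch(value):
    """True if value parses as a UUID and is already in canonical hyphenated form."""
    try:
        parsed = uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        return False
    # Assumption: require the canonical lowercase 8-4-4-4-12 representation
    return str(parsed) == value
```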

Easy building and packaging

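The clean command of setup.py has been extended to use the --all flag by default and to clean the dist folder. Two new commands have been created. package calls the new clean command and also sdist and bdist_wheel; in other words, it cleans thoroughly and builds source and wheel distributions. publish publishes to PyPI and creates a git tag, e.g.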

python setup.py clean
python setup.py package
python setup.py publish

To use these commands, create a setup.py that registers them.

