Export Unihan to Python, Data Package, CSV, JSON and YAML
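unihan-tabular - a tool that builds UNIHAN into tabular-friendly formats such as Python, JSON, CSV and YAML. Part of the cihai project.
UNIHAN's data is spread across multiple files, in the format: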
U+3400 kCantonese jau1
U+3400 kDefinition (same as U+4E18 丘) hillock or mound
U+3400 kMandarin qiū
U+3401 kCantonese tim2
U+3401 kDefinition to lick; to taste, a mat, bamboo bark
U+3401 kHanyuPinyin 10019.020:tiàn
U+3401 kMandarin tiàn
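In the actual Unihan .txt files the three columns (code point, field name, value) are separated by tabs, and lines starting with "#" are comments. A minimal sketch of grouping such lines into one record per code point follows; parse_unihan_lines is a hypothetical helper for illustration, not part of unihan-tabular's API.

```python
from collections import defaultdict


def parse_unihan_lines(lines):
    """Group "U+XXXX<TAB>field<TAB>value" lines into one dict per code point.

    Hypothetical helper: illustrates the idea, not unihan-tabular's code.
    """
    records = defaultdict(dict)
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and '#' comment lines found in the Unihan files.
        if not line or line.startswith("#"):
            continue
        # Split at most twice so values that contain tabs stay intact.
        ucn, field, value = line.split("\t", 2)
        records[ucn][field] = value
    return dict(records)


sample = [
    "# a comment line",
    "U+3400\tkCantonese\tjau1",
    "U+3400\tkDefinition\t(same as U+4E18 丘) hillock or mound",
    "U+3400\tkMandarin\tqiū",
    "U+3401\tkCantonese\ttim2",
]
records = parse_unihan_lines(sample)
print(records["U+3400"]["kMandarin"])  # qiū
```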
$ unihan-tabular will download Unihan.zip and build all of its files into a single tabular-friendly format.
CSV (default), $ unihan-tabular:
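The "build all files into one" step can be sketched with the standard library's zipfile module. This is a hypothetical illustration, not unihan-tabular's implementation: merge_unihan_zip is a made-up name, and it assumes every member ending in .txt is a UTF-8, tab-separated Unihan data file whose fields should be folded into one dict per code point.

```python
import io
import zipfile
from collections import defaultdict


def merge_unihan_zip(zip_file):
    """Merge the fields from every .txt member of a Unihan-style zip.

    zip_file may be a path or a binary file-like object (hypothetical helper).
    """
    records = defaultdict(dict)
    with zipfile.ZipFile(zip_file) as zf:
        for name in zf.namelist():
            if not name.endswith(".txt"):
                continue
            text = zf.read(name).decode("utf-8")
            for line in text.splitlines():
                if not line or line.startswith("#"):
                    continue
                ucn, field, value = line.split("\t", 2)
                # Fields for the same code point from different files merge here.
                records[ucn][field] = value
    return dict(records)


# Demo: an in-memory zip with two made-up members covering the same code point.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", "U+3401\tkMandarin\ttiàn\n")
    zf.writestr("b.txt", "# comment\nU+3401\tkCantonese\ttim2\n")
buf.seek(0)
merged = merge_unihan_zip(buf)
print(merged)
```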
char,ucn,kCantonese,kDefinition,kHanyuPinyin,kMandarin
㐀,U+3400,jau1,(same as U+4E18 丘) hillock or mound,,qiū
㐁,U+3401,tim2,"to lick; to taste, a mat, bamboo bark",10019.020:tiàn,tiàn
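The CSV above has one row per character, with an empty cell where a character lacks a field (U+3400 has no kHanyuPinyin). A minimal sketch of that pivot using csv.DictWriter follows; to_csv is a hypothetical name, and the "char" column is derived from the code point with chr(), which is an assumption about how the column is filled.

```python
import csv
import io


def to_csv(records, fields):
    """Write one row per code point; missing fields become empty cells."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=["char", "ucn"] + list(fields),
        restval="",              # value used for fields a record lacks
        extrasaction="ignore",   # drop fields that were not requested
        lineterminator="\n",
    )
    writer.writeheader()
    # Sort numerically by code point, e.g. "U+3400" -> 0x3400.
    for ucn in sorted(records, key=lambda u: int(u[2:], 16)):
        row = {"char": chr(int(ucn[2:], 16)), "ucn": ucn}
        row.update(records[ucn])
        writer.writerow(row)
    return out.getvalue()


records = {
    "U+3400": {"kCantonese": "jau1",
               "kDefinition": "(same as U+4E18 丘) hillock or mound",
               "kMandarin": "qiū"},
    "U+3401": {"kCantonese": "tim2",
               "kDefinition": "to lick; to taste, a mat, bamboo bark",
               "kHanyuPinyin": "10019.020:tiàn",
               "kMandarin": "tiàn"},
}
csv_text = to_csv(records, ["kCantonese", "kDefinition", "kHanyuPinyin", "kMandarin"])
print(csv_text)
```

The csv module quotes only the kDefinition value that contains commas, which matches the sample output.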
JSON, $ unihan-tabular -F json:
[{"char":"㐀","ucn":"U+3400","kCantonese":"jau1","kDefinition":"(same as U+4E18 丘) hillock or mound","kHanyuPinyin":null,"kMandarin":"qiū"},{"char":"㐁","ucn":"U+3401","kCantonese":"tim2","kDefinition":"to lick; to taste, a mat, bamboo bark","kHanyuPinyin":"10019.020:tiàn","kMandarin":"tiàn"}]
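In the JSON export a missing field appears as null. A minimal sketch with the standard json module: a record's missing fields are looked up with dict.get, which yields None and serializes as null, and ensure_ascii=False keeps characters such as 㐀 and qiū readable. to_json is a hypothetical helper, not the project's code.

```python
import json


def to_json(records, fields):
    """Serialize one object per code point; missing fields become null."""
    rows = []
    for ucn in sorted(records, key=lambda u: int(u[2:], 16)):
        row = {"char": chr(int(ucn[2:], 16)), "ucn": ucn}
        for field in fields:
            # dict.get returns None for a missing field; json writes it as null.
            row[field] = records[ucn].get(field)
        rows.append(row)
    # Compact separators and raw (non-escaped) Unicode, as in the sample above.
    return json.dumps(rows, ensure_ascii=False, separators=(",", ":"))


records = {
    "U+3400": {"kCantonese": "jau1",
               "kDefinition": "(same as U+4E18 丘) hillock or mound",
               "kMandarin": "qiū"},
    "U+3401": {"kCantonese": "tim2",
               "kDefinition": "to lick; to taste, a mat, bamboo bark",
               "kHanyuPinyin": "10019.020:tiàn",
               "kMandarin": "tiàn"},
}
json_text = to_json(records, ["kCantonese", "kDefinition", "kHanyuPinyin", "kMandarin"])
print(json_text)
```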
YAML, $ unihan-tabular -F yaml:
- char: 㐀
  kCantonese: jau1
  kDefinition: (same as U+4E18 丘) hillock or mound
  kHanyuPinyin: null
  kMandarin: qiū
  ucn: U+3400
- char: 㐁
  kCantonese: tim2
  kDefinition: to lick; to taste, a mat, bamboo bark
  kHanyuPinyin: 10019.020:tiàn
  kMandarin: tiàn
  ucn: U+3401
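Features
- Automatically downloads Unihan from the internet
- Exports to JSON, CSV and YAML (requires pyyaml) via -F
- Configurable to export specific fields via -f
- Accounts for encoding conflicts caused by Unicode-heavy content
- Designed for future CJK (Chinese, Japanese, Korean) datasets
- Core component and dependency of cihai, a CJK library
- Data package support
- Supports Python 2.7, >=3.5 and pypy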
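If you encounter a problem or have a question, please create an issue.
Usage
unihan-tabular supports command-line arguments. See unihan-tabular CLI arguments for how to specify custom columns, files, download URLs and output destinations.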
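The project's real argument parser is not reproduced here. The following hypothetical argparse sketch only mirrors the three flags this README uses (-F, -f and --destination) to show how they combine; the default of "csv" comes from this README, while the allowed choices and the dest names are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags used in this README."""
    parser = argparse.ArgumentParser(prog="unihan-tabular")
    # -F selects the export format; CSV is the default per the README.
    parser.add_argument("-F", dest="format", default="csv",
                        choices=["csv", "json", "yaml"])
    # -f takes one or more space-separated field names.
    parser.add_argument("-f", dest="fields", nargs="+")
    # --destination may contain an {ext} placeholder for the extension.
    parser.add_argument("--destination", dest="destination")
    return parser


args = build_parser().parse_args(["-F", "json", "-f", "kCantonese", "kDefinition"])
print(args.format, args.fields)  # json ['kCantonese', 'kDefinition']
```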
To download and build your own Unihan export:
$ pip install unihan-tabular
To output CSV, the default format:
$ unihan-tabular
To output JSON:
$ unihan-tabular -F json
To output YAML:
$ pip install pyyaml
$ unihan-tabular -F yaml
To output only the kDefinition field in CSV:
$ unihan-tabular -f kDefinition
To output multiple fields, separate them with spaces:
$ unihan-tabular -f kCantonese kDefinition
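Conceptually, -f narrows each record to the listed fields. A minimal sketch of that projection, assuming records are dicts keyed by code point; select_fields is a hypothetical helper, and keeping a code point with an empty dict when it has none of the requested fields is an assumption of this sketch.

```python
def select_fields(records, fields):
    """Keep only the requested Unihan fields for each code point (hypothetical)."""
    return {
        ucn: {field: rec[field] for field in fields if field in rec}
        for ucn, rec in records.items()
    }


records = {
    "U+3400": {"kCantonese": "jau1",
               "kDefinition": "(same as U+4E18 丘) hillock or mound",
               "kMandarin": "qiū"},
}
# Equivalent in spirit to: unihan-tabular -f kCantonese kDefinition
selected = select_fields(records, ["kCantonese", "kDefinition"])
print(selected)
```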
To output to a custom file:
$ unihan-tabular --destination ./exported.csv
To output to a custom file with a templated file extension:
$ unihan-tabular --destination ./exported.{ext}
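The {ext} placeholder is presumably replaced by the extension of the chosen format. A minimal sketch, assuming the extension equals the -F value; resolve_destination is hypothetical. It uses str.replace rather than str.format so that any other braces in a path are left untouched.

```python
def resolve_destination(template, fmt):
    """Substitute the export format for an "{ext}" placeholder (hypothetical)."""
    return template.replace("{ext}", fmt)


print(resolve_destination("./exported.{ext}", "json"))  # ./exported.json
print(resolve_destination("./exported.csv", "json"))    # ./exported.csv
```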
See unihan-tabular CLI arguments for advanced usage examples.
Structure
# output w/ JSON
{XDG data dir}/unihan_tabular/unihan.json

# output w/ CSV
{XDG data dir}/unihan_tabular/unihan.csv

# output w/ yaml (requires pyyaml)
{XDG data dir}/unihan_tabular/unihan.yaml

# script to download + build a SDF csv of unihan.
unihan_tabular/process.py

# unit tests to verify behavior / consistency of builder
tests/*

# python 2/3 compatibility module
unihan_tabular/_compat.py

# utility / helper functions
unihan_tabular/util.py
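On Linux, "{XDG data dir}" follows the XDG Base Directory specification: $XDG_DATA_HOME if it is set and non-empty, otherwise ~/.local/share. The sketch below resolves the default export path under that rule only; default_export_path is hypothetical, and the project may use a helper library that picks platform-specific directories instead (e.g. on macOS or Windows).

```python
import os


def default_export_path(fmt, environ=None):
    """Sketch of {XDG data dir}/unihan_tabular/unihan.<fmt> (hypothetical)."""
    environ = os.environ if environ is None else environ
    # Per the XDG spec, fall back to ~/.local/share when XDG_DATA_HOME is unset or empty.
    data_home = environ.get("XDG_DATA_HOME") or os.path.join(
        os.path.expanduser("~"), ".local", "share"
    )
    return os.path.join(data_home, "unihan_tabular", "unihan." + fmt)


print(default_export_path("json", {"XDG_DATA_HOME": "/tmp/data"}))
```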