ipapy is a Python module for working with IPA strings

ipapy is a Python module for working with International Phonetic Alphabet (IPA) strings.

Installation

$ pip install ipapy

or

$ git clone https://github.com/pettarin/ipapy.git
$ cd ipapy

Usage

As a Python module

###########
# IMPORTS #
###########
from ipapy import UNICODE_TO_IPA
from ipapy import is_valid_ipa
from ipapy.ipachar import IPAConsonant
from ipapy.ipachar import IPAVowel
from ipapy.ipastring import IPAString


###########
# IPAChar #
###########

# Def.: an IPAChar is an IPA letter or diacritic/suprasegmental/tone mark

# create IPAChar from its Unicode representation
c1 = UNICODE_TO_IPA[u"a"]                     # vowel open front unrounded
c2 = UNICODE_TO_IPA[u"e"]                     # vowel close-mid front unrounded
c3 = UNICODE_TO_IPA[u"\u03B2"]                # consonant voiced bilabial non-sibilant-fricative
tS1 = UNICODE_TO_IPA[u"t͡ʃ"]                   # consonant voiceless palato-alveolar sibilant-affricate
tS2 = UNICODE_TO_IPA[u"t͜ʃ"]                   # consonant voiceless palato-alveolar sibilant-affricate
tS3 = UNICODE_TO_IPA[u"tʃ"]                   # consonant voiceless palato-alveolar sibilant-affricate
tS4 = UNICODE_TO_IPA[u"ʧ"]                    # consonant voiceless palato-alveolar sibilant-affricate
tS5 = UNICODE_TO_IPA[u"\u0074\u0361\u0283"]   # consonant voiceless palato-alveolar sibilant-affricate
tS6 = UNICODE_TO_IPA[u"\u0074\u035C\u0283"]   # consonant voiceless palato-alveolar sibilant-affricate
tS7 = UNICODE_TO_IPA[u"\u0074\u0283"]         # consonant voiceless palato-alveolar sibilant-affricate
tS8 = UNICODE_TO_IPA[u"\u02A7"]               # consonant voiceless palato-alveolar sibilant-affricate
c1 == c2    # False
c1 == c3    # False
c1 == tS1   # False
tS1 == tS2  # True (they both point to the same IPAChar object)
tS1 == tS3  # True (idem)
tS1 == tS4  # True (idem)
tS1 == tS5  # True (idem)
tS1 == tS6  # True (idem)
tS1 == tS7  # True (idem)
tS1 == tS8  # True (idem)

# create custom IPAChars
my_a1 = IPAVowel(name="my_a_1", descriptors=u"open front unrounded", unicode_repr=u"a")
my_a2 = IPAVowel(name="my_a_2", descriptors=[u"open", "front", "unrounded"], unicode_repr=u"a")
my_a3 = IPAVowel(name="my_a_3", height=u"open", backness=u"front", roundness=u"unrounded", unicode_repr=u"a")
my_a4 = IPAVowel(name="my_a_4", descriptors=[u"low", u"fnt", "unr"], unicode_repr=u"a")
my_ee = IPAVowel(name="my_e_1", descriptors=u"close-mid front unrounded", unicode_repr=u"e")
my_b1 = IPAConsonant(name="bilabial fricative", descriptors=u"voiced bilabial non-sibilant-fricative", unicode_repr=u"\u03B2")
my_b2 = IPAConsonant(name="bf", voicing=u"voiced", place=u"bilabial", manner=u"non-sibilant-fricative", unicode_repr=u"\u03B2")
my_tS = IPAConsonant(name="tS", voicing=u"voiceless", place=u"palato-alveolar", manner=u"sibilant-affricate", unicode_repr=u"t͡ʃ")
my_a1 == my_a2                  # False (two different objects)
my_a1 == c1                     # False (two different objects)
my_a1 == UNICODE_TO_IPA["a"]    # False (two different objects)

# associate non-standard Unicode representation
my_aa = IPAVowel(name="a special", descriptors=[u"low", u"fnt", u"unr"], unicode_repr=u"a{*}")
print(my_aa)    # "a{*}"

# equality vs. equivalence
my_tS == tS1                # False (my_tS is a different object than tS1)
my_tS.is_equivalent(tS1)    # True  (my_tS is equivalent to tS1...)
tS1.is_equivalent(my_tS)    # True  (... and vice versa)

# compare IPAChar objects
my_a1.is_equivalent(my_a2)  # True
my_a1.is_equivalent(my_a3)  # True
my_a1.is_equivalent(my_a4)  # True
my_a1.is_equivalent(my_ee)  # False
my_a1.is_equivalent(my_b1)  # False
my_b1.is_equivalent(my_b2)  # True
my_b1.is_equivalent(my_tS)  # False

# compare IPAChar and a Unicode string
my_b1.is_equivalent(u"\u03B2")  # True
my_b1.is_equivalent(u"β")       # True
my_b1.is_equivalent(u"b")       # False
my_tS.is_equivalent(u"tS")      # False
my_tS.is_equivalent(u"tʃ")      # False (missing the combining diacritic)
my_tS.is_equivalent(u"t͡ʃ")      # True (has combining diacritic)

# compare IPAChar and a string listing descriptors
my_a1.is_equivalent(u"open front unrounded")                                # False (missing 'vowel')
my_a1.is_equivalent(u"open front unrounded vowel")                          # True
my_a1.is_equivalent(u"low fnt unr vwl")                                     # True (known abbreviations are good as well)
my_ee.is_equivalent(u"open front unrounded vowel")                          # False
my_b1.is_equivalent(u"voiced bilabial non-sibilant-fricative")              # False (missing 'consonant')
my_b1.is_equivalent(u"voiced bilabial non-sibilant-fricative consonant")    # True
my_b1.is_equivalent(u"consonant non-sibilant-fricative bilabial voiced")    # True (the order does not matter)
my_b1.is_equivalent(u"consonant non-sibilant-fricative bilabial voiceless") # False

# compare IPAChar and list of descriptors
my_a1.is_equivalent([u"open", u"front", u"unrounded"])              # False
my_a1.is_equivalent([u"vowel", u"open", u"front", u"unrounded"])    # True
my_a1.is_equivalent([u"open", u"unrounded", u"vowel", u"front"])    # True
my_a1.is_equivalent([u"low", u"fnt", u"unr", u"vwl"])               # True


#############
# IPAString #
#############

# Def.: an IPAString is a list of IPAChar objects

# check if Unicode string contains only IPA valid characters
s_uni = u"əˈkiːn æˌkænˈθɑ.lə.d͡ʒi"   # Unicode string of the IPA pronunciation for "achene acanthology"
is_valid_ipa(s_uni)                 # True
is_valid_ipa(u"LoL")                # False (uppercase letter L is not IPA valid)

# create IPAString from list of IPAChar objects
new_s_ipa = IPAString(ipa_chars=[c3, c2, tS1, c1])

# create IPAString from Unicode string
s_ipa = IPAString(unicode_string=s_uni)

# IPAString is similar to regular Python string object
print(s_ipa)                            # "əˈkiːn æˌkænˈθɑ.lə.d͡ʒi"
len(s_ipa)                              # 21
s_ipa[0]                                # (first IPA char)
s_ipa[5:8]                              # (6th, 7th, 8th IPA chars)
s_ipa[19:]                              # (IPA chars from the 20th)
s_ipa[-1]                               # (last IPA char)
len(new_s_ipa)                          # 4
new_s_ipa.append(UNICODE_TO_IPA[u"a"])  # (append IPA char "a")
len(new_s_ipa)                          # 5
new_s_ipa.append(UNICODE_TO_IPA[u"t͡ʃ"]) # (append IPA char "t͡ʃ")
len(new_s_ipa)                          # 6
new_s_ipa.extend(s_ipa)                 # (append s_ipa to new_s_ipa)
len(new_s_ipa)                          # 27
double = s_ipa + new_s_ipa              # (concatenate s_ipa and new_s_ipa)
len(double)                             # 48

# new IPAString objects containing only...
print(s_ipa.consonants)                 # "knknθld͡ʒ"                (consonants)
print(s_ipa.vowels)                     # "əiææɑəi"                 (vowels)
print(s_ipa.letters)                    # "əkinækænθɑləd͡ʒi"         (vowels and consonants)
print(s_ipa.cns_vwl)                    # "əkinækænθɑləd͡ʒi"         (vowels and consonants)
print(s_ipa.cns_vwl_pstr)               # "əˈkinækænˈθɑləd͡ʒi"       (  + primary stress marks)
print(s_ipa.cns_vwl_pstr_long)          # "əˈkiːnækænˈθɑləd͡ʒi"      (    + long marks)
print(s_ipa.cns_vwl_str)                # "əˈkinæˌkænˈθɑləd͡ʒi"      (  + stress marks)
print(s_ipa.cns_vwl_str_len)            # "əˈkiːnæˌkænˈθɑləd͡ʒi"     (    + length marks)
print(s_ipa.cns_vwl_str_len_wb)         # "əˈkiːn æˌkænˈθɑləd͡ʒi"    (      + word breaks)
print(s_ipa.cns_vwl_str_len_wb_sb)      # "əˈkiːn æˌkænˈθɑ.lə.d͡ʒi"  (        + syllable breaks)
cns = s_ipa.consonants                  # (store new IPA string)
cns == s_ipa.consonants                 # False (two different objects)
cns.is_equivalent(s_ipa.consonants)     # True
cns.is_equivalent(s_ipa)                # False

# print representation and name of all IPAChar objects in IPAString
for c in s_ipa:
    print(u"%s\t%s" % (c, c.name))
# ə   vowel mid central unrounded
# ˈ   suprasegmental primary-stress
# k   consonant voiceless velar plosive
# i   vowel close front unrounded
# ː   suprasegmental long
# n   consonant voiced alveolar nasal
#     suprasegmental word-break
# æ   vowel near-open front unrounded
# ˌ   suprasegmental secondary-stress
# k   consonant voiceless velar plosive
# æ   vowel near-open front unrounded
# n   consonant voiced alveolar nasal
# ˈ   suprasegmental primary-stress
# θ   consonant voiceless dental non-sibilant-fricative
# ɑ   vowel open back unrounded
# .   suprasegmental syllable-break
# l   consonant voiced alveolar lateral-approximant
# ə   vowel mid central unrounded
# .   suprasegmental syllable-break
# d͡ʒ  consonant voiced palato-alveolar sibilant-affricate
# i   vowel close front unrounded

# compare IPAString objects
s_ipa_d = IPAString(unicode_string=u"diff")
s_ipa_1 = IPAString(unicode_string=u"at͡ʃe")
s_ipa_2 = IPAString(unicode_string=u"aʧe")
s_ipa_3 = IPAString(unicode_string=u"at͡ʃe", single_char_parsing=True)
s_ipa_d == s_ipa_1              # False
s_ipa_1 == s_ipa_2              # False (different objects)
s_ipa_1 == s_ipa_3              # False (different objects)
s_ipa_2 == s_ipa_3              # False (different objects)
s_ipa_d.is_equivalent(s_ipa_1)  # False
s_ipa_1.is_equivalent(s_ipa_2)  # True
s_ipa_2.is_equivalent(s_ipa_1)  # True
s_ipa_1.is_equivalent(s_ipa_3)  # True
s_ipa_2.is_equivalent(s_ipa_3)  # True

# compare IPAString and list of IPAChar objects
s_ipa_1.is_equivalent([my_a1, my_tS, my_ee])    # True

# compare IPAString and Unicode string
s_ipa_d.is_equivalent(u"diff")                  # True
s_ipa_1.is_equivalent(u"atse")                  # False
s_ipa_1.is_equivalent(u"atSe")                  # False
s_ipa_1.is_equivalent(u"at͡ʃe")                  # True
s_ipa_1.is_equivalent(u"at͜ʃe")                  # True
s_ipa_1.is_equivalent(u"aʧe")                   # True
s_ipa_1.is_equivalent(u"at͡ʃeLOL", ignore=True)  # True (ignore chars non IPA valid)
s_ipa_1.is_equivalent(u"at͡ʃeLoL", ignore=True)  # False (ignore chars non IPA valid, note extra "o")


########################
# CONVERSION FUNCTIONS #
########################
from ipapy.kirshenbaummapper import KirshenbaumMapper
kmapper = KirshenbaumMapper()                       # mapper to Kirshenbaum ASCII IPA
s_k_ipa = kmapper.map_ipa_string(s_ipa)             # u"@'ki:n#&,k&n'TA#l@#dZi"
s_k_uni = kmapper.map_unicode_string(s_uni)         # u"@'ki:n#&,k&n'TA#l@#dZi"
s_k_ipa == s_k_uni                                  # True
s_k_lis = kmapper.map_unicode_string(s_uni, return_as_list=True)
# [u'@', u"'", u'k', u'i', u':', u'n', u'#', u'&', u',', u'k', u'&', u'n', u"'", u'T', u'A', u'#', u'l', u'@', u'#', u'dZ', u'i']

from ipapy.arpabetmapper import ARPABETMapper
amapper = ARPABETMapper()                                                       # mapper to ARPABET ASCII IPA (stress marks not supported yet)
s_a = amapper.map_unicode_string(u"pɹuːf")                                      # error: long suprasegmental not mapped
s_a = amapper.map_unicode_string(u"pɹuːf", ignore=True)                         # u"PRUWF"
s_a = amapper.map_unicode_string(u"pɹuːf", ignore=True, return_as_list=True)    # [u'P', u'R', u'UW', u'F']
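
The difference between `==` and `is_equivalent()` shown above can be illustrated without ipapy. The following is a minimal sketch, not ipapy's implementation: `ToyChar` is a hypothetical class that keeps Python's default identity-based equality and adds an order-insensitive comparison of descriptors.

```python
class ToyChar:
    """Hypothetical stand-in for an IPA character (not ipapy's IPAChar)."""

    def __init__(self, name, descriptors):
        self.name = name
        # Accept either a space-separated string or a list of descriptors.
        if isinstance(descriptors, str):
            descriptors = descriptors.split()
        self.descriptors = frozenset(descriptors)

    def is_equivalent(self, other):
        """Return True if both sides carry the same set of descriptors."""
        if isinstance(other, ToyChar):
            return self.descriptors == other.descriptors
        if isinstance(other, str):
            other = other.split()
        return self.descriptors == frozenset(other)


a1 = ToyChar("my_a_1", "open front unrounded vowel")
a2 = ToyChar("my_a_2", ["vowel", "unrounded", "front", "open"])

print(a1 == a2)              # False: default equality compares object identity
print(a1.is_equivalent(a2))  # True: same descriptors, order ignored
```

Keeping `==` as an identity check while offering a separate equivalence method lets two independently created objects coexist as distinct entries even when they describe the same sound.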

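As a command line tool

ipapy comes with a command line tool that performs operations on a given UTF-8 encoded Unicode string representing an IPA string. It is therefore recommended to run it in a shell that supports UTF-8.

Currently, the supported operations are:

  • canonize: canonize the Unicode representation of the IPA string
  • chars: list all IPA characters appearing in the IPA string
  • check: check whether the given Unicode string is IPA valid
  • clean: remove characters that are not IPA valid
  • u2a: print the corresponding ARPABET (ASCII IPA) string
  • u2k: print the corresponding Kirshenbaum (ASCII IPA) string

Run with the --help parameter to list all the available options: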

$ python -m ipapy --help

usage: __main__.py [-h] [-i] [-p] [--separator [SEPARATOR]] [-s] [-u] command string

ipapy perform a command on the given IPA/Unicode string

positional arguments:
  command               [canonize|chars|check|clean|u2a|u2k]
  string                String to canonize, check, clean, or convert

optional arguments:
  -h, --help            show this help message and exit
  -i, --ignore          Ignore Unicode characters that are not IPA valid
  -p, --print-invalid   Print Unicode characters that are not IPA valid
  --separator [SEPARATOR]
                        Print IPA chars separated by this character (default:
                        '')
  -s, --single-char-parsing
                        Perform single character parsing instead of maximal
                        parsing
  -u, --unicode         Print each Unicode character that is not IPA valid
                        with its Unicode codepoint and name

Examples:

$ python -m ipapy canonize "eʧiu"
et͡ʃiu

$ python -m ipapy canonize "eʧiu" --separator " "
e t͡ʃ i u

$ python -m ipapy chars "eʧiu"
'e'     vowel close-mid front unrounded (U+0065)
't͡ʃ'    consonant voiceless palato-alveolar sibilant-affricate (U+0074 U+0361 U+0283)
'i'     vowel close front unrounded (U+0069)
'u'     vowel close back rounded (U+0075)

$ python -m ipapy chars "et͡ʃiu"
'e'     vowel close-mid front unrounded (U+0065)
't͡ʃ'    consonant voiceless palato-alveolar sibilant-affricate (U+0074 U+0361 U+0283)
'i'     vowel close front unrounded (U+0069)
'u'     vowel close back rounded (U+0075)

$ python -m ipapy chars "et͡ʃiu" -s
'e'     vowel close-mid front unrounded (U+0065)
't'     consonant voiceless alveolar plosive (U+0074)
'͡'      diacritic tie-bar-above (U+0361)
'ʃ'     consonant voiceless palato-alveolar sibilant-fricative (U+0283)
'i'     vowel close front unrounded (U+0069)
'u'     vowel close back rounded (U+0075)
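
The two `chars` runs above differ only in how the input is split: maximal parsing keeps "t͡ʃ" as one unit, while `-s` splits it into three. A minimal sketch of that idea, assuming a greedy longest-match strategy, is shown below. `INVENTORY` and `tokenize` are hypothetical names used for illustration; this is not ipapy's actual parser or character table.

```python
# Hypothetical mini-inventory; a real IPA table is much larger.
INVENTORY = {"e", "i", "u", "t", "\u0283", "\u0361", "t\u0361\u0283"}


def tokenize(text, single_char_parsing=False):
    """Split text into known units, preferring the longest match."""
    max_len = 1 if single_char_parsing else max(len(k) for k in INVENTORY)
    tokens = []
    pos = 0
    while pos < len(text):
        # Try the longest candidate first, then shrink it.
        for size in range(min(max_len, len(text) - pos), 0, -1):
            candidate = text[pos:pos + size]
            if candidate in INVENTORY:
                tokens.append(candidate)
                pos += size
                break
        else:
            raise ValueError("unknown character at index %d" % pos)
    return tokens


word = "et\u0361\u0283iu"  # "et͡ʃiu"
print(len(tokenize(word)))                            # 4 units
print(len(tokenize(word, single_char_parsing=True)))  # 6 units
```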

$ python -m ipapy check "eʧiu"
True

$ python -m ipapy check "LoL"
False

$ python -m ipapy check "LoL" -p
False
LL

$ python -m ipapy check "LoLOL" -p -u
False
LLOL
'L' 0x4c    LATIN CAPITAL LETTER L
'O' 0x4f    LATIN CAPITAL LETTER O
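
A report like the one printed by `-u` can be produced with the standard library alone. The sketch below uses `unicodedata.name` to look up each distinct character; `describe_chars` is a hypothetical helper name, and the exact column separator used by ipapy is an assumption.

```python
import unicodedata


def describe_chars(chars):
    """Return one line per distinct character: repr, code point, Unicode name."""
    lines = []
    seen = set()
    for ch in chars:
        if ch in seen:
            continue
        seen.add(ch)
        name = unicodedata.name(ch, "UNKNOWN")
        lines.append("%r\t%s\t%s" % (ch, hex(ord(ch)), name))
    return lines


for line in describe_chars("LLOL"):
    print(line)
```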

$ python -m ipapy clean "/eʧiu/"
eʧiu

$ python -m ipapy u2k "eʧiu"
etSiu

$ python -m ipapy u2k "eTa"
The given string contains characters not IPA valid. Use the 'ignore' option to ignore them.

$ python -m ipapy u2k "eTa" -i
ea
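
The effect of `-i` in the two `u2k` runs above can be sketched with a plain lookup table. `TOY_TABLE` is a tiny illustrative excerpt covering only the characters in these examples, and `to_ascii` is a hypothetical function; neither reflects ipapy's internal data or API.

```python
# Toy excerpt of a Unicode -> Kirshenbaum table (illustrative only).
TOY_TABLE = {"e": "e", "i": "i", "u": "u", "a": "a", "\u02a7": "tS"}


def to_ascii(text, ignore=False):
    """Map each character through TOY_TABLE.

    Unmapped characters raise ValueError unless ignore is True,
    in which case they are dropped.
    """
    out = []
    for ch in text:
        if ch in TOY_TABLE:
            out.append(TOY_TABLE[ch])
        elif not ignore:
            raise ValueError("character %r is not mapped" % ch)
    return "".join(out)


print(to_ascii("e\u02a7iu"))         # etSiu
print(to_ascii("eTa", ignore=True))  # ea
```

Failing by default and dropping characters only on request makes silent data loss an explicit choice of the caller.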

$ python -m ipapy u2a "eʧiu" --separator " "
EH CH IH UW

Unit testing

$ python run_all_unit_tests.py

License

ipapy is released under the MIT License.

Acknowledgments

  • Bram Vanroy provided a setup.py fix for Windows users
