Unicode category database

Detailed description of the unicategories Python project


unicategories

Unicode category database, generated at install time.

This module exposes a categories dictionary containing RangeGroup instances.

Example

    from unicategories import categories

    upperchars = categories['Lu'].characters()  # iterator
    print('Unicode uppercase characters are "%s"' % ''.join(upperchars))
    # Unicode uppercase characters are "ABCDEF..."

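RangeGroup

An immutable iterable (based on tuple, with some helpful methods) of (start, end) tuples which, like Python's range, are open at the end: start is included and end is excluded.

This approach was chosen for storage efficiency, since storing every character individually would take up a lot of memory.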
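To illustrate why half-open ranges are compact, here is a minimal sketch that builds such ranges from the standard library's unicodedata module. The function name category_ranges and its limit parameter are hypothetical and chosen for this example only. This is not necessarily how unicategories generates its database.

```python
import sys
import unicodedata


def category_ranges(category, limit=sys.maxunicode + 1):
    """Return half-open (start, end) ranges of code points in `category`.

    Hypothetical helper for illustration; not part of unicategories.
    """
    ranges = []
    start = None  # first code point of the run currently being collected
    for code in range(limit):
        if unicodedata.category(chr(code)) == category:
            if start is None:
                start = code
        elif start is not None:
            ranges.append((start, code))  # end is exclusive, like range()
            start = None
    if start is not None:
        # The last run reaches the scan limit.
        ranges.append((start, limit))
    return tuple(ranges)


# Restricted to ASCII for a quick check; the default scans every code point.
ascii_upper = category_ranges('Lu', limit=128)
print(ascii_upper)  # ((65, 91),)
```

The 26 ASCII uppercase letters collapse into a single (65, 91) pair, which is the saving the paragraph above describes.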

The RangeGroup class provides the following methods:

range_group.characters()

Get iterator with all characters on this range group.

:yields: iterator of characters (str of size 1)
:ytype: str

range_group.codes()

Get iterator for all unicode code points contained in this range group.

:yields: iterator of character index (int)
:ytype: int

range_group.has(character)

Check whether character (or character code point) is contained in any range of
this range group.

:param character: character or unicode code point to look for
:type character: str or int
:returns: True if character is contained by any range, False otherwise
:rtype: bool
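The three methods above can be pictured with a small stand-in class. MiniRangeGroup is a hypothetical name used only for this sketch; it mimics the documented behaviour on top of a plain tuple and should not be read as the library's actual implementation.

```python
class MiniRangeGroup(tuple):
    """Hypothetical stand-in: an immutable tuple of half-open (start, end) ranges."""

    def codes(self):
        # Yield every code point covered by the ranges, in stored order.
        for start, end in self:
            yield from range(start, end)

    def characters(self):
        # Lazily convert each code point to a one-character str.
        return map(chr, self.codes())

    def has(self, character):
        # Accept either a one-character str or an int code point.
        code = character if isinstance(character, int) else ord(character)
        return any(start <= code < end for start, end in self)


letters = MiniRangeGroup([(65, 91), (97, 123)])  # ASCII A-Z and a-z
print(''.join(letters.characters()))
# ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
print(letters.has('Q'), letters.has(0x61), letters.has('['))
# True True False
```

Because the end of each range is exclusive, '[' (code point 91) is not matched by the (65, 91) range.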

Unicode categories


Taken from Wikipedia

| Value | Category (Major, minor) | Basic type | Character assigned | Fixed | Remarks |
|-------|-------------------------|------------|--------------------|-------|---------|
| Lu | Letter, uppercase | Graphic | Character | | |
| Ll | Letter, lowercase | Graphic | Character | | |
| Lt | Letter, titlecase | Graphic | Character | | Ligatures containing uppercase followed by lowercase letters (e.g., ǅ, ǈ, ǋ, and ǲ) |
| Lm | Letter, modifier | Graphic | Character | | |
| Lo | Letter, other | Graphic | Character | | |
| Mn | Mark, nonspacing | Graphic | Character | | |
| Mc | Mark, spacing combining | Graphic | Character | | |
| Me | Mark, enclosing | Graphic | Character | | |
| Nd | Number, decimal digit | Graphic | Character | | All these, and only these, have Numeric Type = De |
| Nl | Number, letter | Graphic | Character | | Numerals composed of letters or letterlike symbols (e.g., Roman numerals) |
| No | Number, other | Graphic | Character | | E.g., vulgar fractions, superscript and subscript digits |
| Pc | Punctuation, connector | Graphic | Character | | Includes "_" underscore |
| Pd | Punctuation, dash | Graphic | Character | | Includes several hyphen characters |
| Ps | Punctuation, open | Graphic | Character | | Opening bracket characters |
| Pe | Punctuation, close | Graphic | Character | | Closing bracket characters |
| Pi | Punctuation, initial quote | Graphic | Character | | Opening quotation mark. Does not include the ASCII "neutral" quotation mark. May behave like Ps or Pe depending on usage |
| Pf | Punctuation, final quote | Graphic | Character | | Closing quotation mark. May behave like Ps or Pe depending on usage |
| Po | Punctuation, other | Graphic | Character | | |
| Sm | Symbol, math | Graphic | Character | | |
| Sc | Symbol, currency | Graphic | Character | | |
| Sk | Symbol, modifier | Graphic | Character | | |
| So | Symbol, other | Graphic | Character | | |
| Zs | Separator, space | Graphic | Character | | Includes the space, but not TAB, CR, or LF, which are Cc |
| Zl | Separator, line | Format | Character | | Only U+2028 LINE SEPARATOR (LSEP) |
| Zp | Separator, paragraph | Format | Character | | Only U+2029 PARAGRAPH SEPARATOR (PSEP) |
| Cc | Other, control | Control | Character | Fixed 65 | No name |
| Cf | Other, format | Format | Character | | Includes the soft hyphen, control characters to support bi-directional text, and language tag characters |
| Cs | Other, surrogate | Surrogate | Not (but abstract) | Fixed 2,048 | No name |
| Co | Other, private use | Private-use | Not (but abstract) | Fixed 137,468 total: 6,400 in BMP, 131,068 in Planes 15–16 | No name |
| Cn | Other, not assigned | Noncharacter | Not | Fixed 66 | No name |
| Cn | Other, not assigned | Reserved | Not | Not fixed | No name |

In addition to these, unicategories also provides the general categories L, M, N, P, S, Z and C.
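Each one-letter general category covers all two-letter categories that start with that letter (L covers Lu, Ll, Lt, Lm and Lo, for example). How unicategories combines them is not stated here, so the following is only a sketch of one plausible way to merge several groups of half-open ranges into one. The helper name merge_ranges is hypothetical.

```python
def merge_ranges(*groups):
    """Merge several collections of half-open (start, end) ranges into one.

    Hypothetical helper for illustration; not part of unicategories.
    """
    merged = []
    for start, end in sorted(r for group in groups for r in group):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return tuple(merged)


ascii_lu = ((65, 91),)   # A-Z
ascii_ll = ((97, 123),)  # a-z
print(merge_ranges(ascii_lu, ascii_ll))    # ((65, 91), (97, 123))
print(merge_ranges(((0, 5),), ((5, 9),)))  # ((0, 9),)
```

Ranges that touch, such as (0, 5) and (5, 9), are coalesced into one pair, which keeps the merged group as small as possible.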
