<p>Take a look at the <code>unicodedata</code> <a href="https://docs.python.org/3/library/unicodedata.html" rel="nofollow noreferrer">module</a>:</p>
<pre><code>>>> import unicodedata
>>> word = 'कुरुक्षेत्र'
</code></pre>
<p>The name assigned to each character:</p>
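<pre><code>>>> for ch in word:
...     print(unicodedata.name(ch))
...
DEVANAGARI LETTER KA
DEVANAGARI VOWEL SIGN U
DEVANAGARI LETTER RA
DEVANAGARI VOWEL SIGN U
DEVANAGARI LETTER KA
DEVANAGARI SIGN VIRAMA
DEVANAGARI LETTER SSA
DEVANAGARI VOWEL SIGN E
DEVANAGARI LETTER TA
DEVANAGARI SIGN VIRAMA
DEVANAGARI LETTER RA
</code></pre>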
<p>The general category assigned to each character:</p>
<pre><code>>>> for ch in word:
...     print(unicodedata.category(ch))
...
Lo
Mn
Lo
Mn
Lo
Mn
Lo
Mn
Lo
Mn
Lo
</code></pre>
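<p>Printing the code point next to the category and name makes it easy to spot the virama (U+094D), which is the character <code>split_clusters</code> below relies on. A minimal sketch; the <code>rows</code> name and the output layout are only illustrative choices:</p>
<pre><code>import unicodedata

# Same word as above, written as escapes so no character is lost in copy-paste:
# 'कुरुक्षेत्र'
word = '\u0915\u0941\u0930\u0941\u0915\u094d\u0937\u0947\u0924\u094d\u0930'

# One row per code point: hex code point, general category, Unicode name.
rows = [(f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))
        for ch in word]

for row in rows:
    print(*row, sep="  ")
</code></pre>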
<p><a href="http://www.fileformat.info/info/unicode/category/index.htm" rel="nofollow noreferrer">FileFormat.info</a> has a list of the Unicode character categories.</p>
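<p>The codes that matter for Devanagari are mostly <code>Lo</code> (Letter, other), <code>Mn</code> (Mark, nonspacing) and <code>Mc</code> (Mark, spacing combining). The word above happens to contain only <code>Lo</code> and <code>Mn</code>, but several common vowel signs, such as ा (U+093E) and ि (U+093F), are <code>Mc</code>, so a check that accepts only <code>Mn</code> would treat them as the start of a new cluster. A quick way to inspect a few characters; the sample string and the <code>CATEGORY_NAMES</code> table are illustrative, not part of <code>unicodedata</code>:</p>
<pre><code>import unicodedata

# Hypothetical lookup table: long names for the category codes seen here.
CATEGORY_NAMES = {
    'Lo': 'Letter, other',
    'Mn': 'Mark, nonspacing',
    'Mc': 'Mark, spacing combining',
}

# Illustrative sample: KA, vowel sign U, vowel sign AA, vowel sign I,
# anusvara, visarga.
samples = '\u0915\u0941\u093e\u093f\u0902\u0903'

for ch in samples:
    cat = unicodedata.category(ch)
    print(f"U+{ord(ch):04X}  {cat}  {CATEGORY_NAMES.get(cat, '?')}  {unicodedata.name(ch)}")
</code></pre>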
<p>See if this is what you are trying to achieve:</p>
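<pre><code>import unicodedata

def split_clusters(txt):
    """Generate grapheme clusters for Devanagari text."""
    stop = '्'  # DEVANAGARI SIGN VIRAMA (U+094D)
    cluster = ''
    end = None  # the previous character seen
    for char in txt:
        category = unicodedata.category(char)
        # Combining marks (Mn, plus spacing vowel signs such as ा, which are Mc)
        # always join the current cluster; a letter (Lo) joins it only when the
        # previous character was a virama, forming a conjunct such as क्ष.
        if category.startswith('M') or (category == 'Lo' and end == stop):
            cluster += char
        else:
            if cluster:
                yield cluster
            cluster = char
        end = char
    if cluster:
        yield cluster
</code></pre>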
<p>Testing the function:</p>
<pre><code>>>> list(split_clusters('धर्मक्षेत्रे'))
['ध', 'र्म', 'क्षे', 'त्रे']
>>> list(split_clusters('कुरुक्षेत्र'))
['कु', 'रु', 'क्षे', 'त्र']
</code></pre>