TEI Reader
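A Python 3 library for reading the text content and metadata of TEI P5 (Lite) files.
The library focuses on extracting the main text content from a file and on providing whatever metadata about the text is available.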
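The sketch below shows what "main text content" and "metadata" mean for a TEI P5 file. It uses only Python's standard library and is not how tei-reader works internally. Two assumptions apply: the title is read from `teiHeader/fileDesc/titleStmt/title`, and the main text is taken to be all character data inside `<text>`. The `extract` function and the `SAMPLE` document are hypothetical illustrations.

```python
import xml.etree.ElementTree as ET

# TEI P5 elements live in this namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical minimal TEI document for illustration.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Example</title></titleStmt>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>Hello <hi>TEI</hi> world.</p>
    </body>
  </text>
</TEI>"""


def extract(xml_string):
    """Return (title, main_text) from a TEI P5 document string (sketch)."""
    root = ET.fromstring(xml_string)
    # Metadata: the title from the header's title statement (assumed location).
    title = root.findtext(
        "tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title",
        default="",
        namespaces=TEI_NS,
    )
    # Main text: all character data inside <text>, in document order.
    text_el = root.find("tei:text", TEI_NS)
    raw = "".join(text_el.itertext()) if text_el is not None else ""
    # Collapse layout whitespace (newlines, indentation) into single spaces.
    return title, " ".join(raw.split())


print(extract(SAMPLE))  # prints ('Example', 'Hello TEI world.')
```

Normalising whitespace is a design choice made for readability. A real reader may need to keep line breaks or other significant spacing.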
tl;dr
```shell
pip install tei-reader
```

```python
from tei_reader import TeiReader

reader = TeiReader()
corpora = reader.read_file('example-tei.xml')  # or read_string
print(corpora.text)

# show element attributes before the actual element text
print(corpora.tostring(lambda x, text: str(list(a.key + '=' + a.text for a in x.attributes)) + text))
```
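The `tostring` call above passes a callback that receives an element `x` and its rendered `text`, and returns the string to use in its place. The sketch below is a hypothetical model of how such a formatter could be applied to a tree. Children are rendered first, and then the callback decorates the current element. `StubElement` and `StubAttribute` are stand-ins, and tei-reader's actual traversal order may differ.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class StubAttribute:
    # Hypothetical stand-in exposing .key and .text, as used in the lambda above.
    key: str
    text: str


@dataclass
class StubElement:
    # Hypothetical stand-in for an element: own text, attributes, children.
    text: str = ""
    attributes: List[StubAttribute] = field(default_factory=list)
    children: List["StubElement"] = field(default_factory=list)

    def tostring(self, formatter: Callable[["StubElement", str], str]) -> str:
        # Render this element's own text followed by its rendered children,
        # then let the formatter decorate the combined string.
        inner = self.text + "".join(c.tostring(formatter) for c in self.children)
        return formatter(self, inner)


doc = StubElement(children=[StubElement("Hello", [StubAttribute("lang", "en")])])
print(doc.tostring(lambda x, text: str([a.key + "=" + a.text for a in x.attributes]) + text))
# prints []['lang=en']Hello
```

The root has no attributes, so it contributes `[]`. The child contributes `['lang=en']` before its own text.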
More explanation
The reader can be opened using `TeiReader()`. You can then call `read_file(file_name)` or `read_string(str)`. Both return a `Corpora` object with the following properties:
| Property | Description |
|---|---|
| … | A `Corpora` object can contain sub-corpora. |
| … | The … |
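A `Corpora` object can nest sub-corpora, so collecting every document is a recursive walk. The sketch below illustrates that walk with hypothetical stand-in classes. The names `Corpus`, `Doc`, `sub_corpora` and `documents` are assumptions, because the property names in the table above did not survive extraction.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Doc:
    # Hypothetical stand-in for a document.
    name: str


@dataclass
class Corpus:
    # Hypothetical stand-in: these attribute names are assumptions,
    # not necessarily those used by tei-reader.
    sub_corpora: List["Corpus"] = field(default_factory=list)
    documents: List[Doc] = field(default_factory=list)


def all_documents(corpus: Corpus) -> Iterator[Doc]:
    """Yield every document depth-first, descending into sub-corpora."""
    yield from corpus.documents
    for sub in corpus.sub_corpora:
        yield from all_documents(sub)


tree = Corpus(
    sub_corpora=[Corpus(documents=[Doc("b"), Doc("c")])],
    documents=[Doc("a")],
)
print([d.name for d in all_documents(tree)])  # prints ['a', 'b', 'c']
```

A generator keeps the walk lazy, so large corpora need not be materialised as one list.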
`Corpora` and `Document` both inherit from `Element`. Every object derived from `Element` provides the following properties:
| Property | Description |
|---|---|
| … | Contains the attributes applicable to this element. If an attribute itself has attributes, these are also returned (e.g. …) |
| … | Gets the entire text content as … |
| … | Recursively gets all the text divisions in document order. If an element contains parts or text without a tag, those are returned in order and wrapped with a … |
| … | Recursively gets the parts that make up the entire text, in document order: for example, text that has emphasis, is a footnote, or is marked as foreign. Text without a container element is returned in order and wrapped with a … |
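The table describes how text with no enclosing element is wrapped in a placeholder so that it can sit in document order alongside real elements. The sketch below uses the standard library's `ElementTree` to show the one-level version of that idea. An element's `.text` holds the loose text before its first child, and each child's `.tail` holds the loose text after that child. `Placeholder` and `direct_parts` are hypothetical names, and tei-reader's own recursive implementation and placeholder classes are not reproduced here.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Placeholder:
    # Hypothetical wrapper for text that has no container element of its own.
    text: str


def direct_parts(element: ET.Element) -> List[Union[ET.Element, Placeholder]]:
    """Return the direct parts of `element` in document order (one level).

    Child elements are returned as-is; loose text before the first child
    and after each child is wrapped in a Placeholder. Whitespace-only text
    is skipped (an assumption made for this sketch).
    """
    result: List[Union[ET.Element, Placeholder]] = []
    if element.text and element.text.strip():
        result.append(Placeholder(element.text))
    for child in element:
        result.append(child)
        # A child's tail is the untagged text that follows it.
        if child.tail and child.tail.strip():
            result.append(Placeholder(child.tail))
    return result


p = ET.fromstring("<p>Hello <hi>TEI</hi> and <foreign>mundo</foreign>!</p>")
for part in direct_parts(p):
    if isinstance(part, Placeholder):
        print("placeholder:", repr(part.text))
    else:
        print("element:", part.tag, repr(part.text))
```

A recursive version would apply the same step to each child element in turn.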
`Attribute`, `PlaceholderDivision` and `PlaceholderPart` all support the same properties as `Element`.
Uploading to PyPI
```shell
python setup.py sdist
twine upload dist/*
```