# rdflib-hdt

A Store back-end for rdflib, allowing to read and query HDT documents.
## Requirements
- Python 3.6.4 or higher
- gcc/clang with C++11 support
- Python development headers
You should have the `Python.h` header available on your system. For example, for Python 3.6, install the `python3.6-dev` package on Debian/Ubuntu systems.
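As a quick sanity check of these prerequisites, you can run the following (a sketch assuming a POSIX shell with `python3` on the PATH):

```shell
# Check the Python version (needs 3.6.4 or higher)
python3 --version

# Print the directory that should contain the Python.h development header
python3 -c "import sysconfig; print(sysconfig.get_paths()['include'])"
```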
## Installation
Installation in a virtualenv is strongly advised!
### PyPi installation (recommended)
```bash
# you can install using pip
pip install rdflib_hdt

# or you can use pipenv
pipenv install rdflib_hdt
```
### Manual installation
Requirement: pipenv
## Getting started
You can use the rdflib-hdt library in two modes: as an rdflib Graph or as a raw HDT document.
### Graph usage (recommended)
```python
from rdflib_hdt import HDTStore

# Load an HDT file. Missing indexes are generated automatically.
# You can provide the index files by putting them in the same directory as the HDT file.
store = HDTStore("test.hdt")

# Display some metadata about the HDT document itself
print(f"Number of RDF triples: {len(store)}")
print(f"Number of subjects: {store.nb_subjects}")
print(f"Number of predicates: {store.nb_predicates}")
print(f"Number of objects: {store.nb_objects}")
print(f"Number of shared subject-object: {store.nb_shared}")
```
Using the rdflib API, you can also execute SPARQL queries over an HDT document. If you do so, we recommend that you first call the `optimize_sparql` function, which optimizes the rdflib SPARQL query engine in the context of HDT documents.
```python
from rdflib import Graph
from rdflib_hdt import HDTStore, optimize_sparql

# Calling this function optimizes the RDFlib SPARQL engine for HDT documents
optimize_sparql()

graph = Graph(store=HDTStore("test.hdt"))

# You can execute SPARQL queries using the regular RDFlib API
qres = graph.query("""
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?friend WHERE {
  ?a foaf:knows ?b.
  ?a foaf:name ?name.
  ?b foaf:name ?friend.
}""")

for row in qres:
    print(f"{row.name} knows {row.friend}")
```
### HDT Document usage
```python
from rdflib_hdt import HDTDocument
from rdflib.namespace import FOAF

# Load an HDT file. Missing indexes are generated automatically.
# You can provide the index files by putting them in the same directory as the HDT file.
document = HDTDocument("test.hdt")

# Display some metadata about the HDT document itself
print(f"Number of RDF triples: {document.total_triples}")
print(f"Number of subjects: {document.nb_subjects}")
print(f"Number of predicates: {document.nb_predicates}")
print(f"Number of objects: {document.nb_objects}")
print(f"Number of shared subject-object: {document.nb_shared}")

# Fetch all triples that match { ?s foaf:name ?o }
# Use None to indicate variables
triples, cardinality = document.search_triples((None, FOAF("name"), None))
print(f"Cardinality of (?s foaf:name ?o): {cardinality}")
for s, p, o in triples:
    print(s, p, o)

# The search also supports limit and offset
triples, cardinality = document.search_triples((None, FOAF("name"), None), limit=10, offset=100)
# etc ...
```
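The `None`-as-wildcard semantics of `search_triples`, including its `limit`/`offset` pagination, can be illustrated with a small self-contained sketch that needs no HDT file (the in-memory triples and the helper function are purely illustrative, not part of the rdflib-hdt API):

```python
# Sketch of None-as-wildcard triple-pattern matching,
# mimicking the search_triples interface on an in-memory list.
triples = [
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:name", "Bob"),
]

def search_triples(pattern, limit=None, offset=0):
    """Return (iterator, cardinality) for an (s, p, o) pattern; None matches anything."""
    matches = [t for t in triples
               if all(term is None or term == value
                      for term, value in zip(pattern, t))]
    end = None if limit is None else offset + limit
    return iter(matches[offset:end]), len(matches)

results, cardinality = search_triples((None, "foaf:name", None))
print(cardinality)    # 2
print(list(results))  # the two foaf:name triples
```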
An HDT document also supports evaluating joins over a set of triple patterns.
```python
from rdflib import Variable
from rdflib.namespace import FOAF
from rdflib_hdt import HDTDocument

document = HDTDocument("test.hdt")

# Find the names of two entities that know each other
tp_a = (Variable("a"), FOAF("knows"), Variable("b"))
tp_b = (Variable("a"), FOAF("name"), Variable("name"))
tp_c = (Variable("b"), FOAF("name"), Variable("friend"))
query = set([tp_a, tp_b, tp_c])

iterator = document.search_join(query)
print(f"Estimated join cardinality: {len(iterator)}")

# Join results are produced as ResultRow, like in the RDFlib SPARQL API
for row in iterator:
    print(f"{row.name} knows {row.friend}")
```
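To make the join semantics concrete, here is a tiny nested-loop join over two triple patterns in plain Python; the dataset and the `match` helper are illustrative only, not part of the rdflib-hdt API:

```python
# Nested-loop join over triple patterns on an in-memory dataset.
triples = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:bob", "foaf:name", "Bob"),
]

def match(pattern):
    """Yield variable bindings for one (s, p, o) pattern; '?'-prefixed terms are variables."""
    for triple in triples:
        binding = {}
        if all(term == value
               or (term.startswith("?") and binding.setdefault(term, value) == value)
               for term, value in zip(pattern, triple)):
            yield binding

# Join { ?a foaf:knows ?b } with { ?b foaf:name ?friend } on the shared variable ?b
left = list(match(("?a", "foaf:knows", "?b")))
right = list(match(("?b", "foaf:name", "?friend")))
results = [{**lb, **rb} for lb in left for rb in right if lb["?b"] == rb["?b"]]
print(results)  # [{'?a': 'ex:alice', '?b': 'ex:bob', '?friend': 'Bob'}]
```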
### Handling non-UTF-8 strings in Python
If the HDT document was encoded with a non-UTF-8 encoding, the previous code will not work properly and will raise a `UnicodeDecodeError` (see the pybind11 documentation for more details on how strings are converted between C++ and Python). To fix this issue, the HDT document API is doubled with byte-based variants:
- `search_triples_bytes(...)`: returns an iterator of triples as `(py::bytes, py::bytes, py::bytes)`
- `search_join_bytes(...)`: returns an iterator of sets of mappings as `py::set(py::bytes, py::bytes)`
- `convert_tripleid_bytes(...)`: returns a triple as `(py::bytes, py::bytes, py::bytes)`
- `convert_id_bytes(...)`: returns a `py::bytes`
Their arguments and documentation are the same as the standard versions.
```python
from rdflib_hdt import HDTDocument

document = HDTDocument("test.hdt")

triples, cardinality = document.search_triples_bytes((None, None, None))
for s, p, o in triples:
    print(s, p, o)  # prints b'...', b'...', b'...'
    # now decode it, or handle any error
    try:
        s, p, o = s.decode('UTF-8'), p.decode('UTF-8'), o.decode('UTF-8')
    except UnicodeDecodeError as err:
        # try other codecs, ignore the error, etc.
        pass
```
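The decode-with-fallback pattern itself can be tried without any HDT file. This standalone sketch uses Latin-1 as the fallback codec, which is purely an example choice:

```python
# Standalone sketch: RDF terms arrive as raw bytes; try UTF-8 first,
# then fall back to another codec on failure.
raw_terms = [b'http://example.org/Alice', b'caf\xe9']  # second term is Latin-1 encoded

decoded = []
for term in raw_terms:
    try:
        decoded.append(term.decode('utf-8'))
    except UnicodeDecodeError:
        decoded.append(term.decode('latin-1'))  # fallback codec (example choice)

print(decoded)  # ['http://example.org/Alice', 'café']
```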