Supposedly correct SPARQL query (Wikidata) yields no results in Python

Posted 2024-05-19 17:03:35

Edit: when I use

print("Result", result)

…I get this output:

Result <sparql._ResultsParser object at 0x7f05adbc9668>

…but I don't know whether this means something is wrong with the format.
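
The output above looks like Python's default representation of an object, which by itself says nothing about whether the data inside is well-formed. The following is a minimal sketch of that behaviour using a hypothetical stand-in class (`ResultStandIn` is not part of any SPARQL library, and the row is made up): the default `repr` only shows the type and memory address, while the rows appear once the object is iterated.

```python
class ResultStandIn:
    """Hypothetical stand-in for a result object that defines no __repr__."""

    def __init__(self, rows):
        self._rows = list(rows)

    def __iter__(self):
        # Rows only become visible when the object is iterated.
        return iter(self._rows)


# Made-up row, for illustration only.
result = ResultStandIn([("P000", "example label")])

default_repr = repr(result)  # type name and memory address, no data
rows = list(result)          # the actual content

print("Result", result)
print("Rows", rows)
```

Whether the real result object is empty can therefore only be judged by iterating it, not by printing it.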

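Edit 2: After another request to Wikidata, and thanks to the comments in this thread, I concluded that querying Wikidata for every single relation is not feasible. I therefore ended up downloading a list of all properties, including their English labels, descriptions and altLabels, and performing the search "offline". If needed, an inverted index would improve performance further. The number of properties in Wikidata is relatively small. Here is the query, which you can run in the official SPARQL API to see the results: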

SELECT ?property ?propertyLabel ?propertyDescription (GROUP_CONCAT(DISTINCT(?altLabel); separator = ", ") AS ?altLabel_list) WHERE {
    ?property a wikibase:Property .
    OPTIONAL { ?property skos:altLabel ?altLabel . FILTER (lang(?altLabel) = "en") }
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en" .}
 }
GROUP BY ?property ?propertyLabel ?propertyDescription

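Here is what it looks like inside my Python program, including the parsing that suits my needs. I know that most of the prefixes in the query are unnecessary, but they do no harm either: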

from SPARQLWrapper import SPARQLWrapper, JSON
from datetime import datetime

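# "w+" creates the file if it is missing and empties it first, while still allowing reads later on
File_object = open(r"/home/YOUR_NAME/PycharmProjects/Proj/data_files/wikidata_relation_labels.txt", "w+", encoding="utf-8")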

# https://stackoverflow.com/questions/30755625/urlerror-with-sparqlwrapper-at-sparql-query-convert
sparql = SPARQLWrapper("https://query.wikidata.org/sparql", agent="Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 "
                                                                  "(KHTML, like Gecko) Chrome/23.0.1271.64 "
                                                                  "Safari/537.11")
sparql.setQuery("""PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wds: <http://www.wikidata.org/entity/statement/>
PREFIX wdv: <http://www.wikidata.org/value/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX ps: <http://www.wikidata.org/prop/statement/>
PREFIX pq: <http://www.wikidata.org/prop/qualifier/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX bd: <http://www.bigdata.com/rdf#>
PREFIX skos:  <http://www.w3.org/2004/02/skos/core#>
PREFIX rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?property ?propertyLabel ?propertyDescription (GROUP_CONCAT(DISTINCT(?altLabel); separator = ", ") AS ?altLabel_list) WHERE {
    ?property a wikibase:Property .
    OPTIONAL { ?property skos:altLabel ?altLabel . FILTER (lang(?altLabel) = "en") }
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en" .}
 }
GROUP BY ?property ?propertyLabel ?propertyDescription
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
dateTimeObj = datetime.now()
print("timestamp: ", dateTimeObj)
for result in results["results"]["bindings"]:
    p_id = p_label = p_description = p_alt_labels = ""
    if result["property"]["value"]:
        p_id = result["property"]["value"].rsplit('/', 1)[1]
    if result["propertyLabel"]["value"]:
        p_label = result['propertyLabel']['value']
    # why all these "if"s? Because some properties have no description.
    if "propertyDescription" in result:
        if result["propertyDescription"]["value"]:
            p_description = result['propertyDescription']['value']
    if result["altLabel_list"]["value"]:
        p_alt_labels = result["altLabel_list"]["value"]
    File_object.write(p_id + " | " + p_label + " | " + p_description + " | " + p_alt_labels + "\n")

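# Rewind before reading: after the writes above the file position is at the end
File_object.seek(0)
lines = File_object.readlines()

# simple way to check if Wikidata decided to include a pipe somewhere:
# every line has 4 fields, so exactly 3 separators are expected
for line in lines:
    if line.count('|') > 3:
        print("Too many pipes: ", line)

# Sort the lines and write them back in place
lines.sort()
File_object.seek(0)
File_object.writelines(lines)
File_object.truncate()

File_object.close()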

I use the pipe character as the delimiter. In many cases this may be considered bad practice.
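
One way to keep the pipe delimiter while staying safe against pipes inside a value is Python's standard `csv` module, which quotes such fields automatically. This is a minimal sketch, not the code used above: the rows are made up, an in-memory buffer stands in for the real file, and the delimiter is a bare `|` rather than `" | "`.

```python
import csv
import io

# Made-up rows for illustration; the second description contains a pipe.
rows = [
    ["P000", "example label", "plain description", "alias one, alias two"],
    ["P001", "other label", "description with a | inside", ""],
]

buffer = io.StringIO()  # stands in for the real output file
writer = csv.writer(buffer, delimiter="|", quoting=csv.QUOTE_MINIMAL,
                    lineterminator="\n")
writer.writerows(rows)

# Reading back with the same delimiter restores the original fields,
# including the one that contains a pipe.
buffer.seek(0)
parsed = list(csv.reader(buffer, delimiter="|"))

print(parsed == rows)
```

With this approach the "too many pipes" check becomes unnecessary, because a field containing the delimiter is written in quotes and read back as a single field.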

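I am trying to get the ids and labels of all Wikidata properties where the property's label, or one of its (optional) "also known as" labels, equals or contains a given string (relation.label).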
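
Once the property list is available locally, this kind of lookup can be approximated with a small inverted index. The sketch below is only one possible design under stated assumptions: the records are made up, `build_index` and `search` are hypothetical helper names, alt labels are assumed to be joined with commas, and matching is done on whole lowercase words rather than arbitrary substrings.

```python
from collections import defaultdict

# Made-up records in the file's field order: id, label, description, alt labels.
records = [
    ("P000", "has effect", "effect of this item", "causes, leads to"),
    ("P001", "has cause", "underlying cause of this item", "caused by, result of"),
]


def build_index(records):
    """Map each lowercase word from a label or alias to the ids that use it."""
    index = defaultdict(set)
    for p_id, label, _description, alt_labels in records:
        names = [label] + [a.strip() for a in alt_labels.split(",") if a.strip()]
        for name in names:
            for word in name.lower().split():
                index[word].add(p_id)
    return index


def search(index, text):
    """Return ids whose label and aliases together contain every word of `text`."""
    words = text.lower().split()
    if not words:
        return set()
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        matches &= index.get(word, set())
    return matches


index = build_index(records)
print(search(index, "has effect"))
```

Because the index is keyed on words, a query can match words that come from different names of the same property; an exact-phrase check against the stored label and aliases could be added as a second step if that matters.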

I am using this SPARQL client/API (its descriptions are somewhat contradictory) with Python 3.x.

Here is my code snippet:

import sparql

endpoint = 'https://query.wikidata.org/sparql'

def is_simple_relation(relation):
    s = sparql.Service(endpoint, "utf-8", "GET")
    q = """SELECT DISTINCT ?property ?propertyLabel WHERE {
         ?property rdf:type wikibase:Property;
         rdfs:label ?propertyLabel;
         skos:altLabel ?altLabel.
         FILTER(LANG(?propertyLabel) = "[AUTO_LANGUAGE]").
         FILTER(CONTAINS(?propertyLabel, "replace_me") || CONTAINS(?altLabel, "replace_me")).
         }
         LIMIT 100"""
    q = q.replace('replace_me', relation.label)
    print("Query: ", q)
    print("Querying")
    result = sparql.query(endpoint, q)
    print("Finished query")
    for row in result.fetchone():
        print("row: ", row)

My output is:

Query:  SELECT DISTINCT ?property ?propertyLabel WHERE {
         ?property rdf:type wikibase:Property;
         rdfs:label ?propertyLabel;
         skos:altLabel ?altLabel.
         FILTER(LANG(?propertyLabel) = "[AUTO_LANGUAGE]").
         FILTER(CONTAINS(?propertyLabel, "has effect") || CONTAINS(?altLabel, "has effect")).
         }
         LIMIT 100
Querying
Finished query

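In other words, I get no results at all. I have tried running the query here, and it works as expected, so the query itself is fine. I have also run a sample query from within my program, and it works as expected, printing several rows.

The only explanation I can think of is that the query takes longer when run from my program and hits the timeout, whereas it completes in time through the second link. But I get no warning or anything of the kind. Is my assumption correct? If so, can my query be improved? There may be a performance killer that I am not aware of.

Thanks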
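
One possible explanation other than a timeout, offered as an assumption rather than a confirmed diagnosis: `[AUTO_LANGUAGE]` appears to be a placeholder that the query service's web interface fills in before sending the query. If that is the case, a program sends it literally and the `LANG(...)` filter matches no label. The sketch below builds the same kind of query with an explicit language tag. `build_property_query` is a hypothetical helper, the `OPTIONAL` around `skos:altLabel` is an added design choice so that properties without aliases can still match, and prefixes are left out as in the snippet above.

```python
def build_property_query(text, language="en", limit=100):
    """Build a label/alias search query with an explicit language tag."""
    # Escape backslashes and double quotes so the text stays inside the literal.
    safe = text.replace("\\", "\\\\").replace('"', '\\"')
    return f"""SELECT DISTINCT ?property ?propertyLabel WHERE {{
  ?property rdf:type wikibase:Property ;
            rdfs:label ?propertyLabel .
  OPTIONAL {{ ?property skos:altLabel ?altLabel . FILTER(LANG(?altLabel) = "{language}") }}
  FILTER(LANG(?propertyLabel) = "{language}")
  FILTER(CONTAINS(?propertyLabel, "{safe}") || CONTAINS(?altLabel, "{safe}"))
}}
LIMIT {limit}"""


query = build_property_query("has effect")
print(query)
```

The resulting string can be passed to whichever client is in use. Escaping the search text also avoids breaking the query when a relation label happens to contain a double quote.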
