Using the wikipedia module in Python

Published 2024-06-25 23:50:46


I'm using the wikipedia module in my Python code. I want to take a search term from the user, look it up on Wikipedia, and print two sentences of the page summary. Since many topics can share the same name, I did it this way:

import wikipedia

value = input("Enter what you want to search: ")
m = wikipedia.search(value, 3)  # top 3 search results
print(wikipedia.summary(m[0], sentences=2))

When I run this, it throws an exception about three pages long. What's wrong here? Edit: following @Ruperto's suggestion, I changed the code to this:

import wikipedia
import random

value = input("Enter the words: ")
try:
    p = wikipedia.page(value)
    print(p)
except wikipedia.exceptions.DisambiguationError as e:
    # Pick one of the suggested disambiguation options at random
    s = random.choice(e.options)
    p = wikipedia.summary(s, sentences=2)
    print(p)

Now the error I get is:

Traceback (most recent call last):
  File "C:\Users\vdhan\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\connection.py", line 160, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw
  File "C:\Users\vdhan\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\util\connection.py", line 84, in create_connection
    raise err
  File "C:\Users\vdhan\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\util\connection.py", line 74, in create_connection
    sock.connect(sa)
TimeoutError: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\vdhan\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\connectionpool.py", line 677, in urlopen
    chunked=chunked,
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x03AEEAF0>: Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond

What should I do now?


3 Answers

I ran into a similar problem, and after a lot of head-scratching and googling I found this solution:

import wikipediaapi as api
import wikipedia as wk

# Wikipediaapi 'initialization'
wiki_wiki = api.Wikipedia('en')


# Getting fixed number of sentences from summary
def summary(pg, sentences=5):
    summ = pg.summary.split('. ')
    summ = '. '.join(summ[:sentences])
    summ += '.'
    return summ


s_term = 'apple'  # Any term, ambiguous or not
wk_res = wk.search(s_term)
page = wiki_wiki.page(wk_res[0])
print("Page summary", summary(page))

Basically, from what I've seen, the wikipedia module alone doesn't give a good solution. For example, if I search for "India", I can never get the page for India the country, which is exactly what I want. That's because the title of the India (country) Wikipedia page is simply "India", and since that title could refer to so many things, the lookup fails. The same happens for many other queries.

However, wiki_wiki.page can fetch pages whose titles are ambiguous, and that is what this code relies on.
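Since wikipedia.page() runs the query through search and auto-suggestion, one workaround is to scan the search results yourself and prefer an exact title match. A minimal, offline sketch of just that selection step (the results list is hard-coded to mimic what wikipedia.search("India") might return; the real call needs network access):

```python
def pick_best_title(query, results):
    """Prefer an exact (case-insensitive) title match; fall back to the first result."""
    for title in results:
        if title.lower() == query.lower():
            return title
    return results[0] if results else None

# Hard-coded stand-in for results from wikipedia.search("India")
results = ["India national cricket team", "India", "History of India"]
print(pick_best_title("india", results))  # exact match beats the first result
```

The wikipedia package also exposes an auto_suggest flag, i.e. wikipedia.page(title, auto_suggest=False), which skips the suggestion step and looks up the exact title directly.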

As your error says, this is likely caused by no (or a bad) internet connection:

A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond

You can check or change your internet connection and try again. If neither helps, it's a problem with your Python environment. My implementation is:

import warnings
warnings.filterwarnings("ignore")

import wikipedia
import random

value = input("Enter the words: ")
try:
    m = wikipedia.search(value, 3)
    print(wikipedia.summary(m[0], sentences=2))
except wikipedia.exceptions.DisambiguationError as e:
    s = random.choice(e.options)
    print(wikipedia.summary(s, sentences=2))

Output:

Enter the words: programming
Program management or programme management is the process of managing several related projects, often with the intention of improving an organization's performance. In practice and in its aims, program management is often closely related to systems engineering, industrial engineering, change management, and business transformation.

It works fine in Google Colab; my Colab notebook with this implementation can be found here
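If the connection is flaky rather than completely down, wrapping the lookup in a retry loop with backoff can smooth over transient timeouts like the WinError 10060 above. A stdlib-only sketch (flaky_fetch is a hypothetical stand-in for the real wikipedia.summary call, which needs network access):

```python
import time

def retry(fn, attempts=3, delay=0.01):
    """Call fn(), retrying on ConnectionError with exponential backoff."""
    for i in range(attempts):
        try:
            return fn()
        except ConnectionError:
            if i == attempts - 1:
                raise  # out of attempts: re-raise the last error
            time.sleep(delay * (2 ** i))  # back off: delay, 2*delay, ...

# Stand-in for a network call that fails twice, then succeeds
calls = {"n": 0}
def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("WinError 10060")
    return "summary text"

print(retry(flaky_fetch))  # succeeds on the third attempt
```

In real code the except clause would also cover requests.exceptions.ConnectionError, since the wikipedia package makes its HTTP calls through requests.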

The error above is caused by an internet connection problem. However, the following code works:

import wikipedia
import random

value = input("Enter the words: ")
try:
    m = wikipedia.search(value, 3)
    print(wikipedia.summary(m[0], sentences=2))
except wikipedia.exceptions.DisambiguationError as e:
    s = random.choice(e.options)
    print(wikipedia.summary(s, sentences=2))

One caveat: since this is part of a larger code block, it may be better to use an NLP library for abstractive or extractive summarization, because the wikipedia package just scrapes the page with beautifulsoup and soupsieve and returns the first few lines verbatim rather than a true summary. Content on Wikipedia can also change as often as every couple of hours.
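To illustrate the extractive-summarization idea mentioned above, here is a toy frequency-based sentence ranker in pure Python. It is a minimal sketch, not a substitute for a real NLP library, and the scoring scheme (sum of word frequencies per sentence) is my own simplification:

```python
import re
from collections import Counter

def extractive_summary(text, n=2):
    """Score sentences by total word frequency and return the top n in original order."""
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    freq = Counter(re.findall(r'\w+', text.lower()))
    # Rank sentence indices by descending score (stable sort keeps ties in order)
    ranked = sorted(range(len(sentences)),
                    key=lambda i: -sum(freq[w] for w in re.findall(r'\w+', sentences[i].lower())))
    keep = sorted(ranked[:n])  # restore original sentence order
    return ' '.join(sentences[i] for i in keep)

text = ("Python is a programming language. "
        "It is widely used. "
        "Cats sleep a lot.")
print(extractive_summary(text, 2))
```

Libraries such as sumy or gensim implement much stronger versions of this idea (TextRank, LSA), but the sketch shows the basic mechanism: split, score, select, and reassemble in document order.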
