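<p>I am doing this for a project in which I need to do some web scraping, specifically from Wikipedia. Something that used to work has suddenly stopped working. The program needs to tell me the profession of the person the user enters, based on their Wikipedia article, and this is the method I use:</p>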
<pre><code>#Finding their profession
#Declaring keywords for each profession
sportspersonKeywords = ['Sportsperson', 'Sportsman', 'Sportsman', 'Sports', 'Sport', 'Coach', 'Game', 'Olympics', 'Paralympics', 'Medal', 'Bronze', 'Silver', 'Gold', 'Player', 'sportsperson', 'sportsman', 'sportsman', 'sports', 'sport', 'coach', 'game', 'olympics', 'paralympics', 'medal', 'bronze', 'silver', 'gold', 'player', 'footballer', 'Footballer']
scientistKeywords = ['Scientist', 'Mathematician', 'Chemistry', 'Biology', 'Physics', 'Nobel Prize', 'Invention', 'Discovery', 'Invented', 'Discovered', 'science', 'scientist', 'mathematician', 'chemistry', 'biology', 'physics', 'nobel prize', 'invention', 'discovery', 'invented', 'discovered', 'science', 'Physicist', 'physicist', 'chemist', 'Chemist', 'Biologist', 'biologist']
politicianKeywords = ['Politician', 'Politics', 'Election', 'President', 'Vice-President', 'Vice President', 'Senate', 'Senator', 'Representative', 'Democracy', 'politician', 'politics', 'election', 'president', 'vice-president', 'vice president', 'senate', 'senator', 'representative', 'democracy']
#Declaring the first sentence (from the summary)
firstSentence = summary.split('.')[0]
profession = ['Scientist', 'Sportsperson', 'Politician']
professionFinal = ''
#Splitting the first sentence of the summary into separate words
firstSentenceList = firstSentence.split()
#Replacing each other character in the first sentence
counter = 0
print(firstSentenceList)
for i in firstSentenceList:
    x = [',', '.']
    if x[0] in i:
        firstSentenceList = firstSentenceList[counter].replace(',', '')
        counter += 1
    elif x[1] in i:
        i = i.replace('.', '')
        counter += 1
    else:
        counter += 1
        continue
print(firstSentenceList)
#Checking each word in the first sentence against the keywords in each profession to try to get a match
for i in firstSentenceList:
    if i in sportspersonKeywords:
        professionFinal = profession[1]
        break
    elif i in scientistKeywords:
        professionFinal = profession[0]
        break
    elif i in politicianKeywords:
        professionFinal = profession[2]
        break
#if a match is found, then that person has that profession, if not, then their profession is not in our parameters
if professionFinal == '':
    print('[PROFESSION]: NOT A SPORTPERSON, SCIENTIST, OR POLITICIAN')
else:
    print('[PROFESSION]: ' + professionFinal)
</code></pre>
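<p>This all worked fine for Albert Einstein, Serena Williams, Donald Trump, and others, but it failed when I searched for <a href="https://en.wikipedia.org/wiki/James_Watson" rel="nofollow noreferrer" title="James Watson's Wikipedia Page">James Watson</a>. To clarify, I only need to find their profession among the categories above. If they are not a scientist, sportsperson, or politician, the program does not have to go any further; it just has to say they are none of those. Unfortunately, I am using Repl.it, which does not allow breakpoints and many other things, so I have to debug manually by adding <code>print()</code> statements to check how everything is going. When I printed the <strong><code>firstSentenceList</code></strong> variable, which stores the words of the first sentence (the one I check against the keywords), I found that it should have recognized "biologist" but did not, because the word is followed by a comma. It is stored as <em><strong><code>'biologist,'</code></strong></em>, which breaks the keyword search. This code:</p>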
<pre><code>#Replacing each other character in the first sentence
counter = 0
print(firstSentenceList)
for i in firstSentenceList:
    x = [',', '.']
    if x[0] in i:
        firstSentenceList = firstSentenceList[counter].replace(',', '')
        counter += 1
    elif x[1] in i:
        i = i.replace('.', '')
        counter += 1
    else:
        counter += 1
        continue
print(firstSentenceList)
</code></pre>
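<p>is something I just added to try to remove the commas and periods from the strings in the list. I tried running it and, voilà, errors. One of them was:</p>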
<p><a href="https://i.stack.imgur.com/vfUec.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/vfUec.png" alt="Error message 1 - &quot;string index out of range&quot;"/></a></p>
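<p>A minimal reproduction of that error, as far as I can tell: the assignment inside the loop rebinds <code>firstSentenceList</code> from a list to a single string, so a later <code>firstSentenceList[counter]</code> indexes into that short string instead of the list. The word list below is a hypothetical stand-in for the real first sentence, and <code>strip_in_loop</code> is a helper name made up for this sketch.</p>

```python
# Hypothetical word list standing in for the split first sentence
words = ['James', 'Dewey', 'Watson', '(born', 'April', '6,', '1928)', 'is',
         'an', 'American', 'molecular', 'biologist,', 'geneticist,', 'and',
         'zoologist']


def strip_in_loop(first_sentence_list):
    """Mimic the loop from the question; return the error message, if any."""
    counter = 0
    try:
        # The for loop keeps iterating over the ORIGINAL list object
        for i in first_sentence_list:
            if ',' in i:
                # Rebinds the name to ONE string (e.g. '6'), not a list
                first_sentence_list = first_sentence_list[counter].replace(',', '')
            counter += 1
    except IndexError as err:
        return str(err)
    return None


print(strip_in_loop(words))
```

<p>Here the first word containing a comma turns the variable into the string <code>'6'</code>, and the next one then tries to read index 11 of that one-character string.</p>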
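<p>So, in short, I don't know how to remove those characters from each string in the list. Can anyone show me how to do this? Once again, to those who have seen my other post and are amazed at how long I manage to make these, I apologize.</p>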
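<p>For reference, a sketch of the kind of result I am after, assuming it is enough to remove punctuation only at the start and end of each word. It builds a new list rather than modifying the list while looping over it. <code>strip_punctuation</code> is a hypothetical helper name and the sample words are made up; <code>str.strip</code> and <code>string.punctuation</code> are from the standard library.</p>

```python
import string


def strip_punctuation(words):
    """Return a new list with leading/trailing punctuation removed from each word."""
    return [word.strip(string.punctuation) for word in words]


# Made-up sample resembling the end of a first sentence
words = ['molecular', 'biologist,', 'geneticist,', 'and', 'zoologist.']
cleaned = strip_punctuation(words)
print(cleaned)  # ['molecular', 'biologist', 'geneticist', 'and', 'zoologist']
print('biologist' in cleaned)  # True
```

<p>Because only the ends of each word are stripped, a hyphenated keyword such as <code>'vice-president'</code> would keep its hyphen.</p>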
<p>Link to my Repl.it: <a href="https://repl.it/@BrightBulb123/WikiPedia-Web-scraping-Project-Coding-Competition#main.py" rel="nofollow noreferrer">Wikipedia Web-scraping Project - Brightbulb123 - Repl.it</a></p>