HTTP 404 Not Found error with Apache Solr

Published 2024-10-01 11:35:12


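I am trying to crawl a set of web pages and index them with Apache Solr. To crawl the pages I am using Python with BeautifulSoup and urllib2. I can successfully retrieve the URLs and the HTML data.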

Now I am trying to get Solr to index them through solrpy (http://code.google.com/p/solrpy/). I keep getting an HTTP 404 Not Found error.

I have not modified the default schema.xml, and I am using the example server that ships with Apache Solr.

My code is:

import sys 
import urllib2
import solr
from bs4 import BeautifulSoup
from lxml import etree
import hashlib
solrUrl = 'http://localhost:8983/solr/'
solrInstance = solr.SolrConnection(solrUrl)
conn = urllib2.urlopen('http://seekingalpha.com/market_currents.xml')   
root = etree.fromstring(conn.read())
links = root.findall(".//link")
counter = 0
for link in links:
    counter=counter+1
    url = link.text 
    url_md5 = hashlib.md5(url).hexdigest()
    conn = urllib2.urlopen(link.text)
    soup = BeautifulSoup(conn.read())
    title_page = soup.html.head.title.string  # bs4 already returns unicode text, so no decode is needed
    print title_page
    try: # Add to the Solr instance
        solrInstance.add(id=str(url_md5),url_s=url,text=str(title_page),title=str(title_page))
    except Exception as inst:
        print "Error adding URL: "+url
        print "\tWith Message: "+str(inst)
    else:
        print "Added Page \""+title_page+"\" with URL "+url
try:
    solrInstance.commit()
except Exception:
    print "Could not Commit Changes to Solr Instance - check logs"
else:
    print "Success. "+str(counter)+" documents added to index"

Here is the error:

[error traceback not preserved]

How can I fix this? Thanks in advance.


1 answer

#1 · Posted 2024-10-01 11:35:12

I had not used solrpy before, but after experimenting with it, it seems you have to remove the trailing / from the Solr URL. Change it to:

solrUrl = 'http://localhost:8983/solr'
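A minimal sketch of why the trailing slash might matter, written for Python 3 rather than the Python 2 used in the question. It assumes the client builds the update handler path by appending '/update' to the path of the base URL; that is a guess about solrpy's behavior, not something confirmed here. The helper names normalize_solr_url and update_path are hypothetical and are not part of solrpy.

```python
from urllib.parse import urlsplit


def normalize_solr_url(url):
    """Hypothetical helper: return the base URL without trailing slashes."""
    return url.rstrip('/')


def update_path(base_url):
    """Assumed client behavior: append '/update' to the path of the base URL."""
    return urlsplit(base_url).path + '/update'


raw_url = 'http://localhost:8983/solr/'

# With the trailing slash the assumed path has a double slash.
print(update_path(raw_url))                      # /solr//update

# After normalizing, the path is the one the example server is expected to serve.
print(update_path(normalize_solr_url(raw_url)))  # /solr/update
```

If that assumption holds, passing normalize_solr_url(solrUrl) to solr.SolrConnection would keep the script working whether or not the configured URL ends in a slash.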
