
Integrating Nutch 1.17 with Eclipse (Ubuntu 18.04)

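I don't know whether the guide is out of date or whether I'm doing something wrong. I'm new to Nutch; I have already integrated it with Solr and crawled/indexed a few websites from the terminal. Now I'm trying to use both from a Java application, so I have been following this tutorial: https://cwiki.apache.org/confluence/display/NUTCH/RunNutchInEclipse#RunNutchInEclipse-RunningNutchinEclipse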

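I installed Subclipse, IvyDE, and m2e through Eclipse and also installed ant, so I should have all the prerequisites. The m2e link in the tutorial is broken, so I found it elsewhere; as it turns out, Eclipse already ships with it.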

When I run "ant eclipse" in a terminal, I get a large number of error messages. Because of the character limit, I have put the full error output in a pastebin, linked here.

I'm really not sure what I'm doing wrong. The directions aren't that complicated, so I don't know where I went wrong.

Just in case, here is the nutch-site.xml that the tutorial asks us to modify:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>
   <name>plugin.folders</name>
   <value>/home/user/trunk/build/plugins</value>
</property>

<!-- HTTP properties -->

<property>
  <name>http.agent.name</name>
  <value>MarketDataCrawler</value>
  <description>HTTP 'User-Agent' request header. MUST NOT be empty - 
  please set this to a single word uniquely related to your organization.

  NOTE: You should also check other related properties:

    http.robots.agents
    http.agent.description
    http.agent.url
    http.agent.email
    http.agent.version

  and set their values appropriately.

  </description>
</property>

<property>
  <name>http.robots.agents</name>
  <value></value>
  <description>Any other agents, apart from 'http.agent.name', that the robots
  parser would look for in robots.txt. Multiple agents can be provided using 
  comma as a delimiter. eg. mybot,foo-spider,bar-crawler
  
  The ordering of agents does NOT matter and the robots parser would make 
  decision based on the agent which matches first to the robots rules.  
  Also, there is NO need to add a wildcard (ie. "*") to this string as the 
  robots parser would smartly take care of a no-match situation. 
    
  If no value is specified, by default HTTP agent (ie. 'http.agent.name') 
  would be used for user agent matching by the robots parser. 
  </description>
</property>

</configuration>

Many of the errors involve Ivy, so I wonder whether the Ivy version that Nutch uses is incompatible with the Ivy plugin installed in Eclipse.


1 answer

  1. # Answer 1

    Going by what the log file says:

    [ivy:resolve]   SERVER ERROR: HTTPS Required url=http://repo1.maven.org/maven2/org/slf4j/slf4j-api/1.6.1/slf4j-api-1.6.1.pom
    [ivy:resolve]   SERVER ERROR: HTTPS Required url=http://repo1.maven.org/maven2/org/slf4j/slf4j-api/1.6.1/slf4j-api-1.6.1.jar
    [ivy:resolve]   SERVER ERROR: HTTPS Required url=http://repo1.maven.org/maven2/org/slf4j/slf4j-log4j12/1.6.1/slf4j-log4j12-1.6.1.pom
    

    You should use updated repository URLs in ivy/ivy.xml. One option is to change every URL in ivy.xml from http to https.

    I suspect you are using an older version; otherwise this issue should already have been fixed.
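
    The http-to-https change described above might look like the following. This is a minimal sketch under stated assumptions: in Nutch source trees the Maven Central root is usually defined once as a property in an Ivy settings file (typically ivy/ivysettings.xml rather than ivy/ivy.xml), and the property name `repo.maven.org` shown here is an assumption that may differ in your checkout. Search your ivy/ directory for `http://repo1.maven.org` to find the actual line.

    ```xml
    <!-- Assumed location: ivy/ivysettings.xml (the file and property name may differ in your tree). -->
    <ivysettings>
      <!-- Before: Maven Central rejects plain HTTP with "HTTPS Required".
      <property name="repo.maven.org" value="http://repo1.maven.org/maven2/" override="false"/>
      -->
      <!-- After: same repository root, fetched over HTTPS. -->
      <property name="repo.maven.org" value="https://repo1.maven.org/maven2/" override="false"/>
    </ivysettings>
    ```

    Only the changed property is shown; the rest of the settings file stays as it is. After editing, rerunning "ant eclipse" should let Ivy resolve the slf4j artifacts listed in the log, provided no other http:// repository URLs remain.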