Java: extracting the relevant content from a URL with jsoup

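I am looking into how to use Jsoup to extract the content of news articles from sites such as CNN or The New York Times.

So far I have tried the following code: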

Document document = Jsoup.connect("http://edition.cnn.com/2013/11/10/world/asia/philippines-typhoon-haiyan/index.html").get();

Element contents = document.select("#content").first();

System.out.println(contents.html()); 

System.out.println(contents.text());

I got this error:

Exception in thread "main" java.lang.NullPointerException
at com.clearforest.Test.main(Test.java:36)

Do you know how I can extract the article's text?

1 answer

# Answer 1

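Your contents Element is null after the select(...).first() call. The selector you specified, #content, matches nothing in the document downloaded from CNN, so first() returns null and the later calls on contents throw the NullPointerException. Try something like document.select("div.cnn_strycntntlft") instead, which returns the div holding the story content.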
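
The null check this implies can be sketched as below. This is a minimal illustration rather than a definitive fix. The class name ArticleExtractor, the helper extractText, and the inline HTML string are hypothetical and stand in for the downloaded page. The selector div.cnn_strycntntlft is the one suggested above and may no longer match CNN's current markup.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ArticleExtractor {

    /**
     * Returns the text of the first element matching cssQuery,
     * or an empty string when the selector matches nothing.
     * (Hypothetical helper, not part of jsoup.)
     */
    public static String extractText(Document document, String cssQuery) {
        // Elements.first() returns null when the selection is empty,
        // which is what caused the NullPointerException in the question.
        Element contents = document.select(cssQuery).first();
        return contents == null ? "" : contents.text();
    }

    public static void main(String[] args) {
        // Assumption: a small inline page replaces the real download so the
        // sketch runs without network access.
        String html = "<html><body>"
                + "<div class=\"cnn_strycntntlft\"><p>Story text goes here.</p></div>"
                + "</body></html>";
        Document document = Jsoup.parse(html);

        // No element has id "content", so this prints an empty line.
        System.out.println(extractText(document, "#content"));
        // The story div matches, so this prints its text.
        System.out.println(extractText(document, "div.cnn_strycntntlft"));
    }
}
```

To run it against a live page, replace Jsoup.parse(html) with Jsoup.connect(url).get() as in the question, and inspect the page source first to find a selector that actually matches the article body.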
