Java: get URLs from HTML data into a list (Android Studio)

I'm very close to what I want, but I'm stuck. I have this HTML data in my String contentString, logged with Log.i(TAG, "ALL URL : " + contentString):

<p><b>14th April</b></p>
<p>The wind is south west with 4 to 5 foot of swell at the peak. Streedagh will probably be the best beach break.</p>
<p><span id="more-113"></span></p>
<p>High tide: 1250  3.1m    <span style="color: #ff0000;"> <a href="http://www.bundoransurfco.com/webcam/"><strong>CLICK HERE FOR LIVE PEAK WEBCAM</strong></a></span></p>
<p>Low Tide: 1854 1.4m</p>
<p></p>
<p></p>
<style type='text/css'>
#gallery-1 {
margin: auto;
}
#gallery-1 .gallery-item {
float: left;
margin-top: 10px;
text-align: center;
width: 50%;
}
#gallery-1 img {
border: 2px solid #cfcfcf;
}
#gallery-1 .gallery-caption {
margin-left: 0;
}
/* see gallery_shortcode() in wp-includes/media.php */
</style>
<div id='gallery-1' class='gallery galleryid-113 gallery-columns-2 gallery-size-thumbnail'><dl class='gallery-item'>
<dt class='gallery-icon portrait'>
<a rel="prettyPhoto[gallery-113]" href='http://www.bundoransurfco.com/wp-content/uploads/2014/11/11149460_10152656389992000_7842452340110509403_n.jpg'><img width="67" height="68" src="http://www.bundoransurfco.com/wp-content/uploads/2014/11/11149460_10152656389992000_7842452340110509403_n-67x68.jpg" class="attachment-thumbnail colorbox-113 " alt="11149460_10152656389992000_7842452340110509403_n" /></a>
</dt></dl><dl class='gallery-item'>
<dt class='gallery-icon portrait'>
<a rel="prettyPhoto[gallery-113]" href='http://www.bundoransurfco.com/wp-content/uploads/2014/11/14th-April.jpg'><img width="67" height="68" src="http://www.bundoransurfco.com/wp-content/uploads/2014/11/14th-April-67x68.jpg" class="attachment-thumbnail colorbox-113 " alt="14th April" /></a>
</dt></dl><br style="clear: both" />
</div>
<p></p>
<p><b>3 day forecast to April 13th</b></p>
<p>Solid swell and onshore winds for the weekend. Best spots will be Rossnowlagh and Streedagh. Bundoran beaches and reefs will be blown out.</p>
<h1> Wind Charts</h1>
<p><a href="http://www.windguru.cz/int/index.php?sc=103244"><img class="size-thumbnail wp-image-747 alignleft" title="wind guru" src="http://www.bundoransurfco.com/wp-content/uploads/2010/12/wind-guru-67x68.jpg" alt="" width="67" height="68" /></a> <a href="http://www.xcweather.co.uk/"><img class="alignnone size-thumbnail wp-image-749" title="xcweathersmall" src="http://www.bundoransurfco.com/wp-content/uploads/2010/12/xcweathersmall2-67x68.jpg" alt="" width="67" height="68" /></a>       <a href="http://www.buoyweather.com/wxnav6.jsp?region=UK&program=nww3BW1&grb=nww3&latitude=55.0&longitude=-8.75&zone=0&units=e"><img class="alignnone size-thumbnail wp-image-750" title="buoy weather" src="http://www.bundoransurfco.com/wp-content/uploads/2010/12/buoy-weather-67x68.jpg" alt="" width="67" height="68" /></a> <a href="http://www.windguru.cz/int/index.php?sc=103244">Wind Guru</a>       <a href="http://www.xcweather.co.uk/">XC Weather</a>       <a href="http://www.buoyweather.com/wxnav6.jsp?region=UK&program=nww3BW1&grb=nww3&latitude=55.0&longitude=-8.75&zone=0&units=e">Buoy Weather</a></p>

I only want to get the href URLs of the <a rel="prettyPhoto[gallery-113]" ...> tags (there are two in my example).

To do that, I used this pattern:

Pattern pattern = Pattern.compile("<a rel=\"prettyPhoto\\[gallery-113\\]\"[^>]*>");
Matcher matcher = pattern.matcher(contentString);
List<String> urlWithRel = new ArrayList<String>();
String lastString;
List<String> imagesUrl = null;
while (matcher.find()) {
    urlWithRel.add(matcher.group());
    lastString = urlWithRel.toString();
}
Log.i(TAG, "url with rel : " + urlWithRel);
Log.i(TAG, "final url : " + imagesUrl);
Log.i(TAG, "List size : " + imagesUrl.size());

With this first regex I get the two tags I need:

<a rel="prettyPhoto[gallery-113]" href='http://www.bundoransurfco.com/wp-content/uploads/2014/11/11149460_10152656389992000_7842452340110509403_n.jpg'>, <a rel="prettyPhoto[gallery-113]" href='http://www.bundoransurfco.com/wp-content/uploads/2014/11/14th-April.jpg'>

Now I only want to store the href URLs. I found a regex that extracts just the URL: (?<=href=).*(?=>)

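But the problem is that I can't apply a second regex to the list. If I turn the list into a String to run the regex on, the regex only handles the first object.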
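One way to apply a second regex to each entry of the list, instead of to the single string produced by urlWithRel.toString(), is to loop over the list and match each tag on its own. This is a minimal sketch: the class name PerTagHrefExtractor, the helper hrefsFromTags and the shortened example.com URLs are hypothetical, and the href pattern assumes the value is wrapped in single or double quotes, as it is in the sample HTML.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PerTagHrefExtractor {
    // Group 1 remembers the opening quote, group 2 is the href value;
    // the lazy (.*?) stops at the matching closing quote.
    private static final Pattern HREF = Pattern.compile("href=(['\"])(.*?)\\1");

    // Hypothetical helper: runs the href regex on each tag separately.
    public static List<String> hrefsFromTags(List<String> tags) {
        List<String> imagesUrl = new ArrayList<String>(); // created once, before the loop
        for (String tag : tags) {
            Matcher m = HREF.matcher(tag);
            if (m.find()) {
                imagesUrl.add(m.group(2)); // URL without the surrounding quotes
            }
        }
        return imagesUrl;
    }

    public static void main(String[] args) {
        // Stand-in for the urlWithRel list built by the first regex.
        List<String> urlWithRel = Arrays.asList(
                "<a rel=\"prettyPhoto[gallery-113]\" href='http://example.com/first.jpg'>",
                "<a rel=\"prettyPhoto[gallery-113]\" href='http://example.com/second.jpg'>");
        List<String> imagesUrl = hrefsFromTags(urlWithRel);
        System.out.println(imagesUrl);        // [http://example.com/first.jpg, http://example.com/second.jpg]
        System.out.println(imagesUrl.size()); // 2
    }
}
```

The key changes compared with running one regex over the whole list are that each tag is matched independently and that the result list is created only once.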

Here is my final code (it doesn't work):

Pattern pattern = Pattern.compile("<a rel=\"prettyPhoto\\[gallery-113\\]\"[^>]*>");
Matcher matcher = pattern.matcher(contentString);
List<String> urlWithRel = new ArrayList<String>();
String lastString;
List<String> imagesUrl = null;
while (matcher.find()) {
    urlWithRel.add(matcher.group());
    lastString = urlWithRel.toString();
    Pattern lastPattern = Pattern.compile("(?<=href=).*(?=>)");
    Matcher lastMatcher = lastPattern.matcher(lastString);
    imagesUrl = new ArrayList<String>();
    while (lastMatcher.find()) {
        imagesUrl.add(lastMatcher.group());
    }
}
Log.i(TAG, "url with rel : " + urlWithRel);
Log.i(TAG, "final url : " + imagesUrl);
Log.i(TAG, "List size : " + imagesUrl.size());

It returns:

final url : ['http://www.bundoransurfco.com/wp-content/uploads/2014/11/11149460_10152656389992000_7842452340110509403_n.jpg'>, <a rel="prettyPhoto[gallery-113]" href='http://www.bundoransurfco.com/wp-content/uploads/2014/11/14th-April.jpg']
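That output comes from two things in the loop: the lookaround regex runs on urlWithRel.toString(), which joins both tags into one string, and the greedy .* then stretches from the first href= to the last > in that string. Because imagesUrl is also recreated on every iteration, only the result of the last pass (when the list already holds both tags) survives. A single pass over contentString with a capturing group avoids both problems. This is a minimal sketch, assuming rel appears before href in each tag and that the href value is quoted; the class and method names and the shortened sample input are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GalleryUrlExtractor {
    // Matches <a> tags whose rel is exactly prettyPhoto[gallery-113] and captures
    // the quoted href value in group 2 (group 1 is the quote character).
    // [^>]*? keeps the search for href inside the same tag.
    private static final Pattern GALLERY_HREF = Pattern.compile(
            "<a\\s+rel=\"prettyPhoto\\[gallery-113\\]\"[^>]*?\\bhref=(['\"])(.*?)\\1");

    public static List<String> extractGalleryUrls(String contentString) {
        List<String> imagesUrl = new ArrayList<String>();
        Matcher matcher = GALLERY_HREF.matcher(contentString);
        while (matcher.find()) {
            imagesUrl.add(matcher.group(2)); // only the URL, without quotes
        }
        return imagesUrl;
    }

    public static void main(String[] args) {
        String contentString =
                "<a rel=\"prettyPhoto[gallery-113]\" href='http://example.com/one.jpg'><img src=\"t1.jpg\" /></a>"
                + "<a href=\"http://www.xcweather.co.uk/\">XC Weather</a>"
                + "<a rel=\"prettyPhoto[gallery-113]\" href='http://example.com/two.jpg'><img src=\"t2.jpg\" /></a>";
        System.out.println(extractGalleryUrls(contentString));
        // [http://example.com/one.jpg, http://example.com/two.jpg]
    }
}
```

The link without the prettyPhoto rel attribute is ignored because the pattern requires rel="prettyPhoto[gallery-113]" right after <a. Regex over HTML stays fragile if the attribute order changes, which is one reason to prefer an HTML parser.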

1 answer

  1. Answer #1

    If you're willing to use the jsoup library, this is the snippet you should use:

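    // Needs org.jsoup.{Jsoup, nodes.Document, nodes.Element, select.Elements},
    // java.net.{URL, MalformedURLException} and java.util.{List, ArrayList}
    List<URL> urls = new ArrayList<URL>();
    Document doc = Jsoup.parse(contentString);
    Elements els = doc.select("a[href]"); // every <a> that has an href attribute
    for (Element el : els) {
        if (el.attr("rel").equals("prettyPhoto[gallery-113]")) {
            try {
                urls.add(new URL(el.attr("href")));
            } catch (MalformedURLException e) {
                // skip hrefs that are not valid absolute URLs
            }
        }
    }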

    And remember to handle MalformedURLException when creating the URL objects.
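    The MalformedURLException handling can also be kept in one place with a small helper that converts the extracted href strings into java.net.URL objects and skips any value the constructor rejects. This is a minimal sketch using only the JDK; the class name UrlConverter, the method toUrls and the choice to log and skip bad values (rather than rethrow) are assumptions.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class UrlConverter {
    // Hypothetical helper: turns href strings into java.net.URL objects,
    // skipping values that the URL constructor rejects.
    public static List<URL> toUrls(List<String> hrefs) {
        List<URL> urls = new ArrayList<URL>();
        for (String href : hrefs) {
            try {
                urls.add(new URL(href));
            } catch (MalformedURLException e) {
                // e.g. a relative href such as "/webcam/" has no protocol
                System.err.println("Skipping malformed URL: " + href);
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        List<URL> urls = toUrls(Arrays.asList(
                "http://www.bundoransurfco.com/wp-content/uploads/2014/11/14th-April.jpg",
                "/webcam/"));
        System.out.println(urls.size()); // 1
    }
}
```

    Skipping keeps one bad href from discarding the whole list; if a malformed gallery link should be treated as an error instead, rethrowing from the catch block is the alternative.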