How to parse a page with multiple tables in Java

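Do you know how to scrape a web page that contains multiple tables? I am connecting to the web page.

This is one of the tables, but there are several tables on the same page.

I also don't know how to read this table.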

HTML:

    <p><a href="/fantasy_news/feature/?ID=49818"><strong>Top 300 Overall Fantasy Rankings</strong></a></p>
    <div class="storyStats">
    <table>
    <thead>
    <tr>
    <th>RANK</th>
    <th>CENTRES</th>
    <th>TEAM</th>
    <th>POS</th>
    <th>GP</th>
    <th>G</th>
    <th>A</th>
    <th>PTS</th>
    <th>+/-</th>
    <th>PIM</th>
    <th>PPP</th>
    </tr>
    </thead>
    <tbody>
    <tr class="bg1">
    <td>1.</td>
    <td><a href="/nhl/teams/players/?name=steven+stamkos">Steven&nbsp;Stamkos</a></td>
    <td>Tampa Bay</td>
    <td>C</td>
    <td align="right">81</td>
    <td align="right">50</td>
    <td align="right">51</td>
    <td align="right">101</td>
    <td align="right">-2</td>
    <td align="right">56</td>
    <td align="right">38</td>
    </tr>


Java:

    Iterator<Element> trSIter = doc.select("table").iterator();
    while (trSIter.hasNext()) {
        Element trEl = trSIter.next().child(0);
        Elements tdEls = trEl.children();
        Iterator<Element> tdIter = tdEls.select("tr").iterator();
        System.out.println("><1><><"+tdIter);
        boolean firstRow = true;
        while (tdIter.hasNext()) {

            Element tr = (Element) tdIter.next();


            while (tdIter.hasNext()) {
                int tdCount = 1;
                Element tdEl = tdIter.next();
                //name = tdEl.getElementsByClass("playertablePlayerName").get(0).text();

                Elements tdsEls = tdEl.select("td");
                System.out.println("><2><><"+tdsEls);
                Iterator<Element> columnIt = tdsEls.iterator();

                while (columnIt.hasNext()) {

                    Element column = columnIt.next();
                    switch (tdCount++) {
                    case 1:
                        name =column.select("a").first().text();

                        break;
                    case 2:
                        stat2 = Double.parseDouble(column.text());
                        break;
                    case 3:
                        stat3 = Double.parseDouble(column.text());
                        break;
                    case 4:
                        stat4 = Double.parseDouble(column.text());
                        break;
                    case 5:
                        stat5 = Double.parseDouble(column.text());
                        break;
                    case 6:
                        stat6 = Double.parseDouble(column.text());
                        break;
                    case 7:
                        stat7 = Double.parseDouble(column.text());
                        break;
                    case 8:
                        stat8 = Double.parseDouble(column.text());
                        break;

2 answers

  1. # Answer 1

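    This should get you started. Each table has a blank record that you have to account for; the size check in the code below skips rows that lack the cells being read. You also need to work out which data you want and where it sits in the table. You can read individual stats with tds.get(). Let me know how it works for you.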

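        // Fetch and parse the page (get() throws IOException on failure)
        Document doc = Jsoup.connect("http://www.tsn.ca/fantasy_news/feature/?ID=49815").get();

        // Visit every table inside the div.storyStats containers
        for (Element table : doc.select("div.storyStats table")) {
            for (Element row : table.select("tr")) {
                Elements tds = row.select("td");
                // Skip header and blank rows, which lack the cells read below
                if (tds.size() > 5) {
                    System.out.println(tds.get(1).text() + ":" + tds.get(5).text());
                }
            }
        }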
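
As a follow-up to the tds.get() approach above, the cell texts of one row can be converted into typed values, with header and blank rows skipped. This is only a minimal sketch using the standard library. It assumes the cell texts were already extracted as strings (for example with jsoup's `Elements.eachText()`) and that data rows have the 11 columns shown in the question's HTML. `PlayerRowParser`, `PlayerStats`, and `parseRow` are hypothetical names, not part of jsoup.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical helper (not part of jsoup): turns one row's cell texts into typed values.
class PlayerRowParser {

    // Column layout assumed from the question's header row:
    // RANK, name, TEAM, POS, GP, G, A, PTS, +/-, PIM, PPP
    static final int EXPECTED_CELLS = 11;

    static final class PlayerStats {
        final int rank;
        final String name;
        final String team;
        final String position;
        final int[] numbers; // GP, G, A, PTS, +/-, PIM, PPP in table order

        PlayerStats(int rank, String name, String team, String position, int[] numbers) {
            this.rank = rank;
            this.name = name;
            this.team = team;
            this.position = position;
            this.numbers = numbers;
        }
    }

    // Returns an empty Optional for header rows, blank rows, and rows with non-numeric stats.
    static Optional<PlayerStats> parseRow(List<String> cells) {
        if (cells.size() < EXPECTED_CELLS) {
            return Optional.empty();
        }
        try {
            // The rank cell looks like "1.", so drop the trailing dot before parsing.
            int rank = Integer.parseInt(clean(cells.get(0)).replace(".", ""));
            int[] numbers = new int[7];
            for (int i = 0; i < numbers.length; i++) {
                numbers[i] = Integer.parseInt(clean(cells.get(4 + i)));
            }
            return Optional.of(new PlayerStats(rank, clean(cells.get(1)),
                    clean(cells.get(2)), clean(cells.get(3)), numbers));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    // Replaces non-breaking spaces (from &nbsp;) with plain spaces and trims the result.
    static String clean(String s) {
        return s.replace('\u00a0', ' ').trim();
    }

    public static void main(String[] args) {
        List<String> row = Arrays.asList("1.", "Steven\u00a0Stamkos", "Tampa Bay", "C",
                "81", "50", "51", "101", "-2", "56", "38");
        // Prints "Steven Stamkos: 101 pts"
        parseRow(row).ifPresent(p -> System.out.println(p.name + ": " + p.numbers[3] + " pts"));
    }
}
```

Returning an `Optional` keeps the blank-row handling in one place, so the loop over rows only has to deal with rows that parsed cleanly.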
    
  2. # Answer 2

    With the code below, parsing the tables out of the HTML seems to work without problems.

    public class JsoupActivity extends Activity {
        Document doc;
        myHttpGet _myGet;
        @Override
        public void onCreate(Bundle savedInstanceState) {
            super.onCreate(savedInstanceState);
            setContentView(R.layout.main);
            final TextView tv = (TextView)findViewById(R.id.tv1);
            _myGet = new myHttpGet();
            try {
                doc = _myGet.doHttpGet();
                Elements tdsEls = doc.getElementsByClass("storyStats");
                //tv.setText(tdsEls.get(0).child(0).text());
                tv.setText(String.valueOf(tdsEls.first().children().size()));
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    
        private class myHttpGet {
            Document myDom;
            Connection myConnection;
            Response myResponse;
            public Document doHttpGet() {
                myConnection = Jsoup.connect("http://www.tsn.ca/fantasy_news/feature/?ID=49815");
                try {
                    myResponse = myConnection.execute();
                    try {
                        myDom = myResponse.parse();
                        return myDom;
                    } catch (IOException e) {
                        Log.e("napster","Parse Error");
                    }
                } catch (IOException e) {
                    Log.e("napster","HTTP Error");
                }
                return myDom;
            }
        }
    
    }
    

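    The code displays 5 in the TextView, which is the number of tables under the storyStats class in the HTML. If you need to go on and parse the tables' contents, you can assign the tables to another Elements object and continue parsing from there: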

    Elements es = tdsEls.first().children();
    

    Anderson's answer shows how to parse the data. Hope this helps.
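
When a page holds several tables, one way to avoid hard-coded positions such as tds.get(5) is to look cells up by header text. The sketch below illustrates that idea with the standard library only. It assumes the header texts and cell texts of a table have already been extracted as strings (for example with jsoup's `Elements.eachText()`), and `HeaderIndex`, `indexByHeader`, and `cell` are hypothetical names, not part of jsoup.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper (not part of jsoup): looks up row cells by header text.
class HeaderIndex {

    // Maps each header text (trimmed, upper-cased) to its column position.
    // If a header text repeats, the first occurrence wins.
    static Map<String, Integer> indexByHeader(List<String> headers) {
        Map<String, Integer> index = new HashMap<>();
        for (int i = 0; i < headers.size(); i++) {
            index.putIfAbsent(headers.get(i).trim().toUpperCase(Locale.ROOT), i);
        }
        return index;
    }

    // Returns the cell under the given header, or null when the header
    // is unknown or the row is too short.
    static String cell(Map<String, Integer> index, List<String> cells, String header) {
        Integer i = index.get(header.trim().toUpperCase(Locale.ROOT));
        if (i == null || i >= cells.size()) {
            return null;
        }
        return cells.get(i).trim();
    }

    public static void main(String[] args) {
        // Header and cell texts copied from the question's HTML sample.
        List<String> headers = Arrays.asList("RANK", "CENTRES", "TEAM", "POS",
                "GP", "G", "A", "PTS", "+/-", "PIM", "PPP");
        List<String> row = Arrays.asList("1.", "Steven Stamkos", "Tampa Bay", "C",
                "81", "50", "51", "101", "-2", "56", "38");

        Map<String, Integer> index = indexByHeader(headers);
        // Prints "Steven Stamkos PTS=101 +/-=-2"
        System.out.println(cell(index, row, "CENTRES")
                + " PTS=" + cell(index, row, "PTS")
                + " +/-=" + cell(index, row, "+/-"));
    }
}
```

Building one index per table means each table can be read by header name even if its columns appear in a different order than in the others.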