How to download a ZIP file by clicking an anchor with HtmlUnit in Java

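I am trying to download a ZIP file with HtmlUnit 2.32 using the code below.

The resulting "myfile.zip" is larger than the file downloaded with a regular browser (179 KB vs. 79 KB), and it is corrupted.

How should one click an anchor and download a file with HtmlUnit?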

        WebClient wc = new WebClient(BrowserVersion.CHROME);

        final String HREF_SCARICA_CONSOLIDATI = "/web/area-pubblica/quotate?viewId=export_quotate";

        final String CONSOBBase = "http://www.consob.it";

        HtmlPage page = wc.getPage(CONSOBBase + HREF_SCARICA_CONSOLIDATI);

        final String downloadButtonXpath = "//a[contains(@href, 'javascript:downloadAzionariato()')]";
        List<HtmlAnchor> downloadAnchors = page.getByXPath(downloadButtonXpath);
        HtmlAnchor downloadAnchor = downloadAnchors.get(0);

        UnexpectedPage downloadedFile = downloadAnchor.click();

        InputStream contentAsStream = downloadedFile.getWebResponse().getContentAsStream();
        File destFile = new File("/tmp", "myfile.zip");
        Writer out = new OutputStreamWriter(new FileOutputStream(destFile));
        IOUtils.copy(contentAsStream, out);
        out.close();

2 answers

  1. # Answer 1

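    I have made a few updates to your code snippet so that it works. Hopefully the inline comments help to explain what is going on (this uses the latest HtmlUnit snapshot, 2.34-SNAPSHOT 2018/11/03).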

    final String HREF_SCARICA_CONSOLIDATI = "http://www.consob.it/web/area-pubblica/quotate?viewId=export_quotate";

    try (final WebClient webClient = new WebClient(BrowserVersion.FIREFOX_60)) {
        HtmlPage page = webClient.getPage(HREF_SCARICA_CONSOLIDATI);

        final String downloadButtonXpath = "//a[contains(@href, 'javascript:downloadAzionariato()')]";
        List<HtmlAnchor> downloadAnchors = page.getByXPath(downloadButtonXpath);
        HtmlAnchor downloadAnchor = downloadAnchors.get(0);

        // click() triggers some JavaScript - have a look at what your browser does;
        // it seems to open a new window with the content as the response,
        // so we can ignore the page returned from click()
        downloadAnchor.click();
        // instead, we wait a bit until the JavaScript is done
        webClient.waitForBackgroundJavaScript(1000);

        // now we pick up the window/page that was opened as a result of the download
        Page downloadPage = webClient.getCurrentWindow().getEnclosedPage();

        // and finally we can save the content
        File destFile = new File("/tmp", "myfile.zip");
        try (InputStream contentAsStream = downloadPage.getWebResponse().getContentAsStream()) {
            try (OutputStream out = new FileOutputStream(destFile)) {
                IOUtils.copy(contentAsStream, out);
            }
        }

        System.out.println("Output written to " + destFile.getAbsolutePath());
    }
    
  2. # Answer 2

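    While RBRi's considerations are interesting, I found that my code works with HtmlUnit 2.32 without any modification to the HtmlUnit part; what was wrong was the way I wrote the file.

    I had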

    Writer out = new OutputStreamWriter(new FileOutputStream(destFile));
    IOUtils.copy(contentAsStream, out);
    

    whereas it should have been (no OutputStreamWriter)

    FileOutputStream out = new FileOutputStream(destFile);
    IOUtils.copy(contentAsStream, out);
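
A likely explanation for the larger, corrupted file: copying into a `Writer` decodes the downloaded bytes as characters and encodes them again, and byte sequences that are not valid in the charset get replaced on the way. The following is a minimal JDK-only sketch of that effect, not the code from the question. The class and method names (`BinaryCopyDemo`, `copyAsBytes`, `copyAsText`) are made up for illustration, and UTF-8 is assumed as the charset, whereas the original code relied on the platform default.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; not part of HtmlUnit or commons-io.
class BinaryCopyDemo {

    // Byte-for-byte copy: safe for binary content such as ZIP archives.
    static byte[] copyAsBytes(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    // Copy through a Reader/Writer pair: bytes are decoded to chars and
    // re-encoded, so invalid sequences are replaced (UTF-8 assumed here).
    static byte[] copyAsText(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
        try (Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            char[] buffer = new char[8192];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, n);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // ZIP local-file-header signature followed by two bytes that are not valid UTF-8.
        byte[] data = {0x50, 0x4B, 0x03, 0x04, (byte) 0x80, (byte) 0xFF};

        byte[] viaBytes = copyAsBytes(new ByteArrayInputStream(data));
        byte[] viaText = copyAsText(new ByteArrayInputStream(data));

        System.out.println("byte copy identical: " + Arrays.equals(data, viaBytes));
        System.out.println("text copy identical: " + Arrays.equals(data, viaText));
        System.out.println("sizes: " + data.length + " / " + viaBytes.length + " / " + viaText.length);
    }
}
```

The text copy comes out different from the input because each invalid byte is replaced by a multi-byte replacement character, which would be consistent with the size difference reported in the question. For binary downloads, keeping everything as `InputStream`/`OutputStream` avoids the problem.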