I am trying to scrape a website with infinite scrolling

Posted 2024-09-28 01:23:06

I tried this in R, but I could not get infinite scrolling to work.

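This reference link gave me some ideas about handling infinite scrolling with the Selenium package in Python. I am not very good at Python, but I tried to adapt the code from the referenced article.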

Here is the R code I use for scraping:

library(rvest)
library(xml2)  # xml_find_first(), xml_find_all(), xml_text()
library(plyr)  # llply(), ldply()

# Each URL must stay on a single line; a line break inside the string breaks the request.
uuu_df2 <- data.frame(x = c('http://www.magicbricks.com/property-for-sale/residential-real-estate?bedroom=1&proptype=Multistorey-Apartment,Builder-Floor-Apartment,Penthouse,Studio-Apartment&cityName=Thane&BudgetMin=5-Lacs&BudgetMax=5-Lacs',
                            'http://www.magicbricks.com/property-for-sale/residential-real-estate?bedroom=1&proptype=Multistorey-Apartment,Builder-Floor-Apartment,Penthouse,Studio-Apartment&cityName=Thane&BudgetMin=5-Lacs&BudgetMax=10-Lacs',
                            'http://www.magicbricks.com/property-for-sale/residential-real-estate?bedroom=1&proptype=Multistorey-Apartment,Builder-Floor-Apartment,Penthouse,Studio-Apartment&cityName=Thane&BudgetMin=5-Lacs&BudgetMax=10-Lacs'),
                      stringsAsFactors = FALSE)  # keep URLs as character, not factor

    urlList <- llply(uuu_df2[,1], function(url){     

      this_pg <- read_html(url)

      results_count <- this_pg %>% 
        xml_find_first(".//span[@id='resultCount']") %>% 
        xml_text() %>%
        as.integer()

      # && short-circuits, so the comparison is skipped when the count is NA
      if (!is.na(results_count) && results_count > 0) {

        cards <- this_pg %>% 
          xml_find_all('//div[@class="SRCard"]')

        df <- ldply(cards, .fun=function(x){
          y <- data.frame(wine = x %>% xml_find_first('.//span[@class="agentNameh"]') %>% xml_text(),
                          excerpt = x %>% xml_find_first('.//div[@class="postedOn"]') %>% xml_text(),
                          locality = x %>% xml_find_first('.//span[@class="localityFirst"]') %>% xml_text(),
                          society = x %>% xml_find_first('.//div[@class="labValu"]') %>% xml_text() %>% gsub('\\n', '', .))
          return(y)
        })

      } else {
        df <- NULL
      }

      return(df)   
    }, .progress = 'text')
    names(urlList) <- uuu_df2[,1]

Here is the Python code for infinite scrolling, which I tried to adapt from the original post:

[Python code block missing from this copy of the post]

But it gives me an error:

execfile(filename, namespace)
  File "C:\Users\user\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "D:/Deepesh/All files/test_forCSVData.py", line 27
    self.driver.execute_script(".//span[@class="agentNameh;")
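The traceback stops at a `compile` step, which suggests a syntax error rather than a runtime failure: the double quotes around `agentNameh` end the outer string early, and the closing `"]` is missing. A minimal sketch of that check follows. It uses only Python's built-in `compile`; the helper name `compiles` and the corrected `xpath` line are my own illustration, not the original script.

```python
# The line from the traceback, kept verbatim as a string so it can be checked.
broken = 'self.driver.execute_script(".//span[@class="agentNameh;")'

# A hypothetical corrected line: single quotes outside, double quotes inside,
# with the attribute value and the bracket closed properly.
fixed = "xpath = './/span[@class=\"agentNameh\"]'"


def compiles(source):
    """Return True if the given source text is syntactically valid Python."""
    try:
        compile(source, "<snippet>", "exec")
        return True
    except SyntaxError:
        return False


if __name__ == "__main__":
    print("broken line compiles:", compiles(broken))
    print("fixed line compiles:", compiles(fixed))
```

Separately from the quoting, `execute_script` expects JavaScript, so an XPath expression probably belongs in an element lookup instead.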

Any suggestions on how to edit my Python or R code so that it handles the infinite scrolling?
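For reference, the approach I am aiming for looks roughly like the sketch below: scroll to the bottom, wait, and stop once the page height no longer grows. It assumes Selenium 4 with a Chrome driver installed, and it assumes the site loads more cards when the window reaches the bottom, which I have not confirmed. The function names, the pause length, and the `max_rounds` cap are placeholders of mine; the XPath is the one from my R code.

```python
import time


def scroll_to_bottom(get_height, scroll_down, pause=2.0, max_rounds=50):
    """Scroll until the reported page height stops changing.

    get_height and scroll_down are callables, so the loop itself does not
    depend on Selenium. Returns the number of scrolls performed.
    """
    last_height = get_height()
    rounds = 0
    while rounds < max_rounds:
        scroll_down()
        time.sleep(pause)  # give the page time to load more results
        rounds += 1
        new_height = get_height()
        if new_height == last_height:
            break  # nothing new was loaded, assume we reached the end
        last_height = new_height
    return rounds


def scrape_agent_names(url, pause=2.0):
    """Open the page, scroll to the end, and return the agent-name texts."""
    # Imported here so the scrolling helper can be used without Selenium.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        scroll_to_bottom(
            lambda: driver.execute_script("return document.body.scrollHeight"),
            lambda: driver.execute_script(
                "window.scrollTo(0, document.body.scrollHeight);"
            ),
            pause=pause,
        )
        # XPath goes to find_elements, not execute_script.
        spans = driver.find_elements(By.XPATH, './/span[@class="agentNameh"]')
        return [span.text for span in spans]
    finally:
        driver.quit()
```

Calling `scrape_agent_names(...)` with one of the URLs above would be the intended usage, but whether the height check is the right stopping condition for this site is exactly what I am unsure about.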

