Web scraping: fields come back empty/NA/null when running the spider, but the correct entries are returned in the scrapy shell

import scrapy


class NaughtySpider(scrapy.Spider):
    name = "naughtyspider"
    allowed_domains = ["pornhub.com"]
    max_pages = 3

    # Start request
    def start_requests(self):
        # Note: range(1, max_pages) requests pages 1 .. max_pages - 1
        for i in range(1, self.max_pages):
            yield scrapy.Request(
                'https://www.pornhub.com/video?o=cm&page=%s' % i,
                callback=self.parse_video,
            )

    # First parsing method
    def parse_video(self, response):
        self.log('F i n i s h e d s c r a p i n g ' + response.url)
        # Correct path: chooses 32 videos from the page, ignoring the links coming from ads
        video_links = (
            response.css('ul#videoCategory')
            .css('li.videoBox')
            .css('div.thumbnail-info-wrapper')
            .css('span.title > a')
            .css('::attr(href)')
        )
        links_to_follow = video_links.extract()
        for url in links_to_follow:
            yield response.follow(url=url, callback=self.parse_metadata)

    # Second parsing method
    def parse_metadata(self, response):
        # Create a SelectorList of the video title text
        video_title = response.css('div.title-container > h1.title > span.inlineFree::text')
        # Extract the text (first match, or None if nothing matched)
        video_title_ext = video_title.extract_first()
        # Extract views
        video_views = response.css('span.count::text').extract_first()
        # Extract tags
        video_tags = response.css('div.tagsWrapper a::text').extract()
        # Extract categories
        video_categories = response.css('div.categoriesWrapper a::text').extract()
        # Fill in the dictionary
        yield {
            'title': video_title_ext,
            'views': video_views,
            'tags': video_tags,
            'categories': video_categories,
        }

In [4]: fetch('https://www.pornhub.com/view_video.php?viewkey=ph5d594b093f8d6')
[scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.pornhub.com/view_video.php?viewkey=ph5d594b093f8d6> (referer: None)

In [5]: response.css('div.tagsWrapper a::text').extract()
Out[5]: ['alday', '559', '+ ']

In [6]: response.css('span.count::text').extract_first()
Out[6]: '6'

1 answer

User

#1 · Posted 2024-06-26 13:47:25

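Data such as views, duration, and so on seems to be held in HTML variable elements, <var> DATA </var>. For example, if you enter the following line in the scrapy shell, you should get the corresponding element back (the class used here is the one for the duration).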

response.xpath(".//var[@class='duration']")

Not sure whether it works, but it's worth a try.
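To make the <var> idea above easier to check without hitting the site, here is a minimal offline sketch. It uses Python's standard-library html.parser as a stand-in for Scrapy's selectors, and the sample markup, the VarTextExtractor class and the extract_var_text helper are all hypothetical names made up for illustration; the real page structure may well differ.

```python
from html.parser import HTMLParser

# Hypothetical sample markup; the real page structure is an assumption.
SAMPLE_HTML = """
<div class="video-info">
  <span class="count">6</span>
  <var class="duration">12:34</var>
  <var class="other">ignored</var>
</div>
"""


class VarTextExtractor(HTMLParser):
    """Collect the text inside <var> elements that carry a given class."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self._inside = False  # True while we are inside a matching <var>
        self.values = []

    def handle_starttag(self, tag, attrs):
        # The class attribute may hold several space-separated names.
        classes = (dict(attrs).get("class") or "").split()
        if tag == "var" and self.wanted_class in classes:
            self._inside = True
            self.values.append("")

    def handle_endtag(self, tag):
        if tag == "var":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.values[-1] += data


def extract_var_text(html, wanted_class):
    """Return the stripped text of every <var class="wanted_class"> in html."""
    parser = VarTextExtractor(wanted_class)
    parser.feed(html)
    parser.close()
    return [value.strip() for value in parser.values]


print(extract_var_text(SAMPLE_HTML, "duration"))
```

In the scrapy shell the rough counterpart would be response.xpath("//var[@class='duration']/text()").get(), which returns None when the response the spider actually received contains no such element. That is one way to see whether the spider and the shell are looking at the same HTML.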

By the way, I had to tell my wife it was for educational purposes.
