为政治家准备的碎Python

2024-06-28 11:07:17 发布

您现在位置:Python中文网/ 问答频道 /正文

这是我的蜘蛛网.py文件:

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from scinews_com.items import scinews_item
import codecs
import sys 

class sci_news_com(BaseSpider):
   sys.stdout = codecs.getwriter(sys.stdout.encoding)(sys.stdout, errors='replace')
   name = "scinews"
   allowed_domains = ["sci-news.com"]
   start_urls = [
       "http://www.sci-news.com/news/astronomy"
   ]

   def parse(self, response):
       hxs = HtmlXPathSelector(response)
       bottomposts = hxs.select('//div[@class="bottom-recentpost-wrapper-cat"]')      
       items = []
       for bottompost in bottomposts:
           item = scinews_item()
           item['Article_Title'] = bottompost.select('div[@class="bottom-archive"]/div[@class="post-content-holder"]/div[@class="bottom-content-heading-0"]/h2/a/text()').extract()
           item['Article_Desc'] = bottompost.select('div[@class="bottom-archive"]/div[@class="post-content-holder"]/div[@class="post-content"]/p/text()').extract()
           item['Article_Date'] = bottompost.select('div[@class="bottom-archive"]/div[@class="post-content-holder"]/div[@class="recentpost-dateauthor"]/text()').extract()
           item['Article_Author'] = bottompost.select('div[@class="bottom-archive"]/div[@class="post-content-holder"]/div[@class="recentpost-dateauthor"]/a/text()').extract()
           item['Article_Link'] = bottompost.select('div[@class="bottom-archive"]/div[@class="post-content-holder"]/div[@class="bottom-content-heading-0"]/h2/a/@href').extract()
           item['Article_Image'] = bottompost.select('div[@class="bottom-archive"]/div[@class="bottom-recentpost-image-0"]/a/img/@src').extract()
           items.append(item)
       return items

还有我的项目.py公司名称:

^{pr2}$

由于某些原因,它没有将每个select语句放入它自己的“Field()”,而是将所有语句一起输出(在CSV和JSON输出中):

Article_Title,Article_Image,Article_Desc,Article_Date,Article_Link,Article_Author
" HIP 102152: Astronomers Find Oldest Solar Twin 250 Light-Years Away , ALMA Sees Spectacular Newborn Star 1400 Light-Years Away , Astronomers Discover New Earth-Sized Exoplanet Kepler 78b , Hubble Zooms in on Galaxies in Early Universe , Chandra Sees Multimillion-Degree Gas Cloud in NGC 1232 , Pulsar Helps Astronomers Measure Magnetic Field around Milky Way’s Central Black Hole , Astronomers Discover First Ever Six-Image Lensed Quasar , Astronomers See Bizarre Pair in Large Magellanic Cloud ","http://cdn4.sci-news.com/images/2013/08/image_1344_1-HIP102152-195x110.jpg,http://cdn4.sci-news.com/images/2013/08/image_1325-Herbig-Haro-195x110.jpg,http://cdn4.sci-news.com/images/2013/08/image_1324f-Kepler-78b-195x110.jpg,http://cdn4.sci-news.com/images/2013/08/image_1318-galaxies-195x110.jpg,http://cdn4.sci-news.com/images/2013/08/image_1314-NGC-1232-195x110.jpg,http://cdn4.sci-news.com/images/2013/08/image_1312-pulsar-195x110.jpg,http://cdn4.sci-news.com/images/2013/08/image_1304f-quasar-195x110.jpg,http://cdn4.sci-news.com/images/2013/08/image_1301-LMC-195x110.jpg","Astronomers using ESO’s Very Large Telescope have identified the oldest solar twin known to date.
This image shows the Sun-like star HIP 102152. Credit:...,A team of astronomers using the Atacama Large Millimeter/submillimeter Array (ALMA) has captured a beautiful close-up view of an object named Herbig-Haro...,An international team of astronomers reporting in the Astrophysical Journal (arXiv.org) has discovered an Earth-sized exoplanet called Kepler 78b that...,Astronomers using NASA’s Hubble Space Telescope have established that mature-looking galaxies existed much earlier than previously known, when the...,U.S. astronomer using NASA’s Chandra X-ray Observatory has discovered a massive cloud of hot gas likely caused by a collision between a dwarf galaxy...,An international team of astronomers has used observations of the newly discovered pulsar PSR J1745-2900 to measure the magnetic field emanating from a...,A team of scientists at the University of Copenhagen’s Niels Bohr Institute, Denmark, has reported the discovery of a six-image lensed quasar named...,A team of astronomers using the Very Large Telescope (VLT) at ESO’s Paranal Observatory in Chile has captured an image of two distinctive glowing..."," Aug 29, 2013   by 
, Aug 21, 2013   by 
, Aug 21, 2013   by 
, Aug 16, 2013   by 
, Aug 15, 2013   by 
, Aug 15, 2013   by 
, Aug 12, 2013   by 
, Aug 9, 2013   by 
","http://www.sci-news.com/astronomy/science-oldest-solar-twin-01344.html,http://www.sci-news.com/astronomy/science-alma-newborn-star-01325.html,http://www.sci-news.com/astronomy/science-new-earth-sized-exoplanet-kepler78b-01324.html,http://www.sci-news.com/astronomy/science-hubble-galaxies-early-universe-01318.html,http://www.sci-news.com/astronomy/science-chandra-gas-cloud-ngc1232-01314.html,http://www.sci-news.com/astronomy/science-pulsar-magnetic-field-milky-way-black-hole-01312.html,http://www.sci-news.com/astronomy/science-first-ever-six-image-lensed-quasar-01304.html,http://www.sci-news.com/astronomy/science-large-magellanic-cloud-01301.html","Sci-News.com,Enrico de Lazaro,Sergio Prostak,Enrico de Lazaro,Sci-News.com,Sci-News.com,Sergio Prostak,Sci-News.com"

提前谢谢你!在


Tags: oftheimagedivcomhttpbywww
1条回答
网友
1楼 · 发布于 2024-06-28 11:07:17

正在查看http://www.sci-news.com/news/astronomy的HTML源代码

<div id="content" class="single-wrapper" role="main">
    <section>
        <div class="post-wrapper-archive">
            <div class="related-post-wrapper-block">...
            <div class="bottom-recentpost-wrapper-cat">
                <div class="bottom-archive">
                    <div class="bottom-recentpost-image-0">...
                    <div class="post-content-holder">...
                    <div class="cb"></div>
                </div>
                <div class="bottom-archive">...
                <div class="bottom-archive">...
                <div class="bottom-archive">...
                <div class="bottom-archive">...
                <div class="bottom-archive">...
                <div class="bottom-archive">...
                <div class="bottom-archive">...
                <div class="cb"></div>
            </div>
            <div id="pag">
        </div>
    </section>
</div>

我建议您将div[@class="bottom-archive"]步骤移动到bottomposts选择器中:

^{pr2}$

相关问题 更多 >