这是我的蜘蛛网.py文件:
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from scinews_com.items import scinews_item
import codecs
import sys
class sci_news_com(BaseSpider):
sys.stdout = codecs.getwriter(sys.stdout.encoding)(sys.stdout, errors='replace')
name = "scinews"
allowed_domains = ["sci-news.com"]
start_urls = [
"http://www.sci-news.com/news/astronomy"
]
def parse(self, response):
hxs = HtmlXPathSelector(response)
bottomposts = hxs.select('//div[@class="bottom-recentpost-wrapper-cat"]')
items = []
for bottompost in bottomposts:
item = scinews_item()
item['Article_Title'] = bottompost.select('div[@class="bottom-archive"]/div[@class="post-content-holder"]/div[@class="bottom-content-heading-0"]/h2/a/text()').extract()
item['Article_Desc'] = bottompost.select('div[@class="bottom-archive"]/div[@class="post-content-holder"]/div[@class="post-content"]/p/text()').extract()
item['Article_Date'] = bottompost.select('div[@class="bottom-archive"]/div[@class="post-content-holder"]/div[@class="recentpost-dateauthor"]/text()').extract()
item['Article_Author'] = bottompost.select('div[@class="bottom-archive"]/div[@class="post-content-holder"]/div[@class="recentpost-dateauthor"]/a/text()').extract()
item['Article_Link'] = bottompost.select('div[@class="bottom-archive"]/div[@class="post-content-holder"]/div[@class="bottom-content-heading-0"]/h2/a/@href').extract()
item['Article_Image'] = bottompost.select('div[@class="bottom-archive"]/div[@class="bottom-recentpost-image-0"]/a/img/@src').extract()
items.append(item)
return items
还有我的项目.py公司名称:
^{pr2}$由于某些原因,它没有将每个select语句放入它自己的“Field()”,而是将所有语句一起输出(在CSV和JSON输出中):
Article_Title,Article_Image,Article_Desc,Article_Date,Article_Link,Article_Author
" HIP 102152: Astronomers Find Oldest Solar Twin 250 Light-Years Away , ALMA Sees Spectacular Newborn Star 1400 Light-Years Away , Astronomers Discover New Earth-Sized Exoplanet Kepler 78b , Hubble Zooms in on Galaxies in Early Universe , Chandra Sees Multimillion-Degree Gas Cloud in NGC 1232 , Pulsar Helps Astronomers Measure Magnetic Field around Milky Way’s Central Black Hole , Astronomers Discover First Ever Six-Image Lensed Quasar , Astronomers See Bizarre Pair in Large Magellanic Cloud ","http://cdn4.sci-news.com/images/2013/08/image_1344_1-HIP102152-195x110.jpg,http://cdn4.sci-news.com/images/2013/08/image_1325-Herbig-Haro-195x110.jpg,http://cdn4.sci-news.com/images/2013/08/image_1324f-Kepler-78b-195x110.jpg,http://cdn4.sci-news.com/images/2013/08/image_1318-galaxies-195x110.jpg,http://cdn4.sci-news.com/images/2013/08/image_1314-NGC-1232-195x110.jpg,http://cdn4.sci-news.com/images/2013/08/image_1312-pulsar-195x110.jpg,http://cdn4.sci-news.com/images/2013/08/image_1304f-quasar-195x110.jpg,http://cdn4.sci-news.com/images/2013/08/image_1301-LMC-195x110.jpg","Astronomers using ESO’s Very Large Telescope have identified the oldest solar twin known to date.
This image shows the Sun-like star HIP 102152. Credit:...,A team of astronomers using the Atacama Large Millimeter/submillimeter Array (ALMA) has captured a beautiful close-up view of an object named Herbig-Haro...,An international team of astronomers reporting in the Astrophysical Journal (arXiv.org) has discovered an Earth-sized exoplanet called Kepler 78b that...,Astronomers using NASA’s Hubble Space Telescope have established that mature-looking galaxies existed much earlier than previously known, when the...,U.S. astronomer using NASA’s Chandra X-ray Observatory has discovered a massive cloud of hot gas likely caused by a collision between a dwarf galaxy...,An international team of astronomers has used observations of the newly discovered pulsar PSR J1745-2900 to measure the magnetic field emanating from a...,A team of scientists at the University of Copenhagen’s Niels Bohr Institute, Denmark, has reported the discovery of a six-image lensed quasar named...,A team of astronomers using the Very Large Telescope (VLT) at ESO’s Paranal Observatory in Chile has captured an image of two distinctive glowing..."," Aug 29, 2013 by
, Aug 21, 2013 by
, Aug 21, 2013 by
, Aug 16, 2013 by
, Aug 15, 2013 by
, Aug 15, 2013 by
, Aug 12, 2013 by
, Aug 9, 2013 by
","http://www.sci-news.com/astronomy/science-oldest-solar-twin-01344.html,http://www.sci-news.com/astronomy/science-alma-newborn-star-01325.html,http://www.sci-news.com/astronomy/science-new-earth-sized-exoplanet-kepler78b-01324.html,http://www.sci-news.com/astronomy/science-hubble-galaxies-early-universe-01318.html,http://www.sci-news.com/astronomy/science-chandra-gas-cloud-ngc1232-01314.html,http://www.sci-news.com/astronomy/science-pulsar-magnetic-field-milky-way-black-hole-01312.html,http://www.sci-news.com/astronomy/science-first-ever-six-image-lensed-quasar-01304.html,http://www.sci-news.com/astronomy/science-large-magellanic-cloud-01301.html","Sci-News.com,Enrico de Lazaro,Sergio Prostak,Enrico de Lazaro,Sci-News.com,Sci-News.com,Sergio Prostak,Sci-News.com"
提前谢谢你!在
正在查看http://www.sci-news.com/news/astronomy的HTML源代码
我建议您将
^{pr2}$div[@class="bottom-archive"]
步骤移动到bottomposts
选择器中:相关问题 更多 >
编程相关推荐