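<p>My goal is to scrape <a href="https://www.1177.se/hitta-vard/?caretype=&cs=false&location=&q=&s=name&st=69fb5a17-fd0d-4fb5-b0c4-32a385536bb7" rel="nofollow noreferrer">this URL</a>.</p>
<p>Each item in the list links to a page with more information about it. I want to scrape all 17,000 or so linked pages. Only 10 results are displayed at first, and the <em>Show more</em> button sends a request that appends 10 more results to the list via JSON. I tried modifying that request by changing <em>batchsize</em>, the parameter that sets how many results are returned, but it had no effect. I also tried this code (from a <a href="https://blog.scrapinghub.com/2016/06/22/scrapy-tips-from-the-pros-june-2016" rel="nofollow noreferrer">tutorial</a>), but could not adapt it to my task:</p>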
<pre><code>import json

import scrapy


class SpidyQuotesSpider(scrapy.Spider):
    name = 'spidyquotes'
    quotes_base_url = 'http://spidyquotes.herokuapp.com/api/quotes?page=%s'
    start_urls = [quotes_base_url % 1]
    download_delay = 1.5

    def parse(self, response):
        # The API returns JSON, so decode the body instead of using selectors.
        data = json.loads(response.body)
        for item in data.get('quotes', []):
            yield {
                'text': item.get('text'),
                'author': item.get('author', {}).get('name'),
                'tags': item.get('tags'),
            }
        # Follow the next page for as long as the API reports one.
        if data['has_next']:
            next_page = data['page'] + 1
            yield scrapy.Request(self.quotes_base_url % next_page)
</code></pre>
<p>I have looked at the examples <a href="https://stackoverflow.com/questions/47104156/scraping-infinite-scrolling-pages-with-load-more-button-using-scrapy">here</a>, <a href="https://stackoverflow.com/questions/48477688/scrape-page-with-load-more-results-button">here</a> and <a href="https://blog.scrapinghub.com/2016/06/22/scrapy-tips-from-the-pros-june-2016" rel="nofollow noreferrer">here</a>. After two days of trying, I still cannot work out how to solve this: the request URLs on the site I want to scrape differ from those in all of the examples, and they seem to make scraping harder.</p>
<p>Clicking <em>Show more</em> sends the following request:</p>
<blockquote>
<p>Request URL:
<a href="https://www.1177.se/api/hjv/search?batchsize=10&caretype=&componentname&cs=false&location=&p=2&q=&s=name&sortorder=name&st=4af2ed43-1154-4363-ae6b-718f9b84d23a" rel="nofollow noreferrer">https://www.1177.se/api/hjv/search?batchsize=10&caretype=&componentname&cs=false&location=&p=2&q=&s=name&sortorder=name&st=4af2ed43-1154-4363-ae6b-718f9b84d23a</a></p>
</blockquote>
<p>The <strong>p=</strong> parameter is incremented each time <em>Show more</em> is clicked:
<a href="https://i.stack.imgur.com/crBDw.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/crBDw.png" alt="enter image description here"/></a></p>
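<p>As a minimal sketch of how such a paged request URL could be built, the helper below fills in the query parameters seen in the request above. The function name <code>build_search_url</code> is hypothetical. Two things are assumptions: that the server accepts <code>componentname</code> sent with an empty value (the observed request sends it with no <code>=</code> at all), and that the <code>st</code> value has to be supplied by the caller, since it differs between the page URL and the API request shown in this question.</p>

```python
from urllib.parse import urlencode

# Endpoint taken from the "Show more" request shown above.
API_URL = "https://www.1177.se/api/hjv/search"


def build_search_url(page, search_token, batch_size=10):
    """Return the search API URL for one page of results.

    `search_token` is the `st` value copied from an observed request;
    whether it stays valid across sessions is not known.
    """
    params = {
        "batchsize": batch_size,
        "caretype": "",
        "componentname": "",  # assumption: an empty value is accepted
        "cs": "false",
        "location": "",
        "p": page,  # the parameter that increments on each click
        "q": "",
        "s": "name",
        "sortorder": "name",
        "st": search_token,
    }
    return API_URL + "?" + urlencode(params)
```

<p>Calling <code>build_search_url(2, "4af2ed43-1154-4363-ae6b-718f9b84d23a")</code> yields a URL with the same parameters as the request quoted above, apart from the trailing <code>=</code> on <code>componentname</code>.</p>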
<p>The returned JSON has the following format:</p>
<blockquote>
<p>{"Heading":"<strong>17952</strong> träffar på <strong>Alla
mottagningar</strong>","Query":"","Region":null,"NextPage":3,"Page":2,"BatchSize":10,"BatchText":"Visa
10
till","TotalHits":17952,"SortOrder":"name","Latitude":0.0,"Longitude":0.0,"Bounds":null,"SearchHits":[{"HsaId":"SE162321000255-O23228","FriendlyUrl":"/hitta-vard/kontaktkort/A5-Psykoterapi-Katia-Karlsson-Carli-AB-Lund/","DisplayName":"A5
Psykoterapi Katia Karlsson Carli AB, Lund","Address":"Stortorget 1,
Lund","PhoneNumber":"073-046 26
68","HasMvkServices":true,"VaccinatesForFlu":false,"VaccinatesForHpv":false,"Distance":0.0,"Latitude":55.703161529482479,"Longitude":13.193039057187006},{"HsaId":"SE162321000255-O22542","FriendlyUrl":"/hitta-vard/kontaktkort/A5Psykoterapi-Gunilla-Lundqvist-Lund/","DisplayName":"A5Psykoterapi
- Gunilla Lundqvist, Lund","Address":"Stortorget 1 5:e vån, Lund","PhoneNumber":"070-624 13
97","HasMvkServices":true,"VaccinatesForFlu":false,"VaccinatesForHpv":false,"Distance":0.0,"Latitude":55.703161529482479,"Longitude":13.193039057187006},{"HsaId":"SE2321000057-6SV4","FriendlyUrl":"/hitta-vard/kontaktkort/A6-Ogonklinik-AB/","DisplayName":"A6
Ögonklinik AB","Address":"Batterigatan 9 NB,
Jönköping","PhoneNumber":"036-860 20
30","HasMvkServices":true,"VaccinatesForFlu":false,"VaccinatesForHpv":false,"Distance":0.0,"Latitude":57.768032303027383,"Longitude":14.202798620555548},{"HsaId":"SE162321000024-0059892","FriendlyUrl":"/hitta-vard/kontaktkort/AB-Evelina-Linder-KBT/","DisplayName":"AB
Evelina Linder KBT","Address":"Drottninggatan 1A,
Uppsala","PhoneNumber":"073-593 00
73","HasMvkServices":false,"VaccinatesForFlu":false,"VaccinatesForHpv":false,"Distance":0.0,"Latitude":59.858328320441558,"Longitude":17.638292776307694},{"HsaId":"SE162321000024-0052597","FriendlyUrl":"/hitta-vard/kontaktkort/AB-Forsberg-KBT-konsult/","DisplayName":"AB
Forsberg KBT-konsult","Address":"Trädgårdsgatan 5A,
Uppsala","PhoneNumber":"070-818 17
11","HasMvkServices":false,"VaccinatesForFlu":false,"VaccinatesForHpv":false,"Distance":0.0,"Latitude":59.856845411620185,"Longitude":17.635819529969204},{"HsaId":"SE2321000016-C7H4","FriendlyUrl":"/hitta-vard/kontaktkort/AB-Lyhord-Ostermalmstorg/","DisplayName":"AB
Lyhörd - Östermalmstorg","Address":"Östermalmstorg
1,STOCKHOLM","PhoneNumber":"08-425 004
00","HasMvkServices":false,"VaccinatesForFlu":false,"VaccinatesForHpv":false,"Distance":0.0,"Latitude":59.336237708592563,"Longitude":18.079317099784653},{"HsaId":"SE2321000016-BH0B","FriendlyUrl":"/hitta-vard/kontaktkort/AB-Suavis-horsel-Solna-Business-park/","DisplayName":"AB
Suavis hörsel, Solna Business park","Address":"Svetsarvägen 15,2
tr,SOLNA","PhoneNumber":"010-207 11
77","HasMvkServices":false,"VaccinatesForFlu":false,"VaccinatesForHpv":false,"Distance":0.0,"Latitude":59.35928477168008,"Longitude":17.980058512140353},{"HsaId":"SE2321000016-56DM","FriendlyUrl":"/hitta-vard/kontaktkort/AB-Vackra-Tander-Annette-Goransson/","DisplayName":"AB
Vackra Tänder Annette Göransson","Address":"Drottninggatan
71A,STOCKHOLM","PhoneNumber":"08-21 52
62","HasMvkServices":false,"VaccinatesForFlu":false,"VaccinatesForHpv":false,"Distance":0.0,"Latitude":59.33592153903674,"Longitude":18.059258535271329},{"HsaId":"SE5564844115-106Q","FriendlyUrl":"/hitta-vard/kontaktkort/AB-Vackra-Tander-Norrmalm/","DisplayName":"AB
Vackra Tänder, Norrmalm","Address":"Drottninggatan 71 A, 3
tr,","PhoneNumber":"08-21 52
62","HasMvkServices":false,"VaccinatesForFlu":false,"VaccinatesForHpv":false,"Distance":0.0,"Latitude":59.33592396728109,"Longitude":18.059118082991937},{"HsaId":"SE2321000016-97P2","FriendlyUrl":"/hitta-vard/kontaktkort/ABA-Ogonklinik-i-Alvik/","DisplayName":"ABA
Ögonklinik i Alvik","Address":"Tranebergsplan
3,,BROMMA","PhoneNumber":"08-124 440
10","HasMvkServices":true,"VaccinatesForFlu":false,"VaccinatesForHpv":false,"Distance":0.0,"Latitude":59.33516807973394,"Longitude":17.978288641135208}],"HasZeroHits":false}</p>
</blockquote>
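<p>To illustrate what could be pulled out of a response like this, here is a rough sketch that decodes one page, turns each <code>FriendlyUrl</code> into an absolute link, and decides which page to request next. The function name <code>parse_search_page</code> is hypothetical, and the stop condition is an assumption: I do not know what <code>NextPage</code> contains on the last page, so the sketch stops once <code>Page * BatchSize</code> reaches <code>TotalHits</code> or a page comes back with no hits.</p>

```python
import json
from urllib.parse import urljoin

BASE_URL = "https://www.1177.se"


def parse_search_page(body):
    """Return (detail_page_urls, next_page_or_None) for one JSON response."""
    data = json.loads(body)

    # Each hit carries a site-relative link to its detail page.
    links = [
        urljoin(BASE_URL, hit["FriendlyUrl"])
        for hit in data.get("SearchHits", [])
        if hit.get("FriendlyUrl")
    ]

    # Assumed stop condition: all hits have been listed, or the page is empty.
    listed_so_far = data.get("Page", 0) * data.get("BatchSize", 0)
    has_more = bool(links) and listed_so_far < data.get("TotalHits", 0)
    next_page = data.get("NextPage") if has_more else None

    return links, next_page
```

<p>In a Scrapy <code>parse</code> callback this could be called as <code>parse_search_page(response.text)</code>, yielding one <code>scrapy.Request</code> per returned link (with a separate callback for the detail pages) and, when the second value is not <code>None</code>, one more request to the same API URL with <code>p</code> set to that page number.</p>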
<p>I would appreciate a few initial lines of code to get me started.</p>