<p>Before this gets marked as a duplicate: I have searched SO and tried the other solutions I found there, including:</p>
<ol>
<li><a href="https://stackoverflow.com/questions/40985060/scrapy-css-selector-get-text-of-all-inner-tags/40985082">scrapy css selector: get text of all inner tags</a></li>
<li><a href="https://stackoverflow.com/questions/26631196/how-to-get-the-text-from-child-nodes-if-it-is-parents-to-other-node-in-scrapy-us">How to get the text from child nodes if it is parents to other node in Scrapy using XPath</a></li>
<li><a href="https://stackoverflow.com/questions/26564843/scrapy-get-the-entire-text-including-children">scrapy get the entire text including children</a></li>
</ol>
<p>The HTML I want to extract from is:</p>
<pre><code><span class="location">
Mandarin Oriental Hotel
<a class="" href="/search-results/Jalan+Pinang%252C+Kuala+Lumpur+City+Centre%252C+50088+Kuala+Lumpur%252C+Wilayah+Persekutuan./?state=Kuala+Lumpur" itemprop="addressRegion" title="Jalan Pinang, Kuala Lumpur City Centre, 50088 Kuala Lumpur, Wilayah Persekutuan.">
Jalan Pinang, Kuala Lumpur City Centre, 50088 Kuala Lumpur, Wilayah Persekutuan.
</a>
,
<a class="" href="/search-results/?neighbourhood=Kuala+Lumpur&state=Kuala+Lumpur" title="Kuala Lumpur">
Kuala Lumpur
</a>
,
<a class="" href="/search-results/?state=Kuala+Lumpur" title="Kuala Lumpur">
Kuala Lumpur
</a>
<span class="" itemprop="postalCode">
50088
</span>
</span>
</code></pre>
<p>I want to get all of the text inside //span[@class='location'].</p>
<p>I have tried:</p>
<ol>
<li><code>response.xpath("//span[@class='location']//text()").extract_first()</code></li>
<li><code>response.css("span.location *::text").extract_first()</code></li>
<li><code>response.css("span.location ::text").extract_first()</code></li>
</ol>
<p>All of them return only <code>Mandarin Oriental Hotel</code>, not the full address.</p>
<p>Edit:
The extracted text should be</p>
<blockquote>
<p>Mandarin Oriental Hotel Jalan Pinang, Kuala Lumpur City Centre, 50088 Kuala Lumpur, Wilayah Persekutuan., Kuala Lumpur, Kuala Lumpur 50088</p>
</blockquote>
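<p>For reference, a minimal sketch of how the target string could be assembled. In Scrapy, <code>extract_first()</code> returns only the first matching text node, while <code>extract()</code> returns all of them as a list. The helper name <code>join_text_nodes</code> and the <code>sample_nodes</code> list are hypothetical: the list only approximates what <code>extract()</code> might return for the HTML above, so that the snippet runs without a live response.</p>

```python
import re


def join_text_nodes(nodes):
    """Collapse a list of text nodes into one single-spaced string."""
    # Concatenate the nodes, then split on any whitespace run so that
    # newlines and indentation between tags disappear.
    text = " ".join("".join(nodes).split())
    # Commas that were standalone text nodes end up preceded by a space;
    # pull them back against the preceding word.
    return re.sub(r"\s+,", ",", text)


# In a spider, the list would come from extract() rather than extract_first():
#   nodes = response.xpath("//span[@class='location']//text()").extract()
# Hypothetical approximation of that list for the HTML in the question:
sample_nodes = [
    "\nMandarin Oriental Hotel\n",
    "\nJalan Pinang, Kuala Lumpur City Centre, 50088 Kuala Lumpur, Wilayah Persekutuan.\n",
    "\n,\n",
    "\nKuala Lumpur\n",
    "\n,\n",
    "\nKuala Lumpur\n",
    "\n",
    "\n50088\n",
    "\n",
]

print(join_text_nodes(sample_nodes))
```

<p>The whitespace handling is a design choice for this particular markup. Other pages may need different rules for punctuation.</p>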