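<p>You can use <code>BeautifulSoup</code> to extract the <code>src</code> attribute of HTML <code>img</code> tags. In my example, <code>htmlText</code> contains the <code>img</code> tags themselves, but the same approach works for a URL when combined with <code>urllib2</code> (<code>urllib.request</code> in Python 3).</p>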
<p><strong>For a URL</strong></p>
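<pre><code>from bs4 import BeautifulSoup           # BeautifulSoup 4 (pip install beautifulsoup4)
from urllib.request import urlopen      # Python 3 replacement for urllib2

page = urlopen('http://www.youtube.com/')
soup = BeautifulSoup(page, 'html.parser')
images = soup.find_all('img')
for image in images:
    # print the image source (None if the tag has no src attribute)
    print(image.get('src'))
    # print the alternate text (None if the tag has no alt attribute)
    print(image.get('alt'))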
</code></pre>
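<p>On real pages the <code>src</code> values printed above are often relative (for example <code>/logo.png</code>). A minimal sketch for turning them into absolute URLs, assuming the page's own address is the right base (a page that declares <code>&lt;base href&gt;</code> would need that value instead). <code>absolute_sources</code> is a hypothetical helper built on the standard library's <code>urllib.parse.urljoin</code>:</p>

```python
from urllib.parse import urljoin


def absolute_sources(srcs, page_url):
    """Resolve each img src against the page URL, dropping empty or missing values."""
    # urljoin leaves absolute URLs unchanged and resolves relative ones
    # ("/a.png", "img/b.png", "//cdn.example.com/c.png") against page_url
    return [urljoin(page_url, src) for src in srcs if src]


srcs = ["/logo.png", "img/banner.jpg", "//cdn.example.com/c.png", None,
        "https://other.example.org/d.gif"]
print(absolute_sources(srcs, "https://www.example.com/videos/index.html"))
# ['https://www.example.com/logo.png', 'https://www.example.com/videos/img/banner.jpg',
#  'https://cdn.example.com/c.png', 'https://other.example.org/d.gif']
```

<p>With the BeautifulSoup loop it could be called as <code>absolute_sources([image.get('src') for image in images], 'http://www.youtube.com/')</code>; the <code>None</code> returned by <code>image.get('src')</code> for tags without a <code>src</code> is simply skipped.</p>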
<p><strong>For text containing img tags</strong></p>
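<pre><code>from bs4 import BeautifulSoup           # BeautifulSoup 4 (pip install beautifulsoup4)

htmlText = """&lt;img src="https://src1.com/" /&gt; &lt;img src="https://src2.com/" /&gt;"""
soup = BeautifulSoup(htmlText, 'html.parser')
images = soup.find_all('img')
for image in images:
    # every img tag in htmlText has a src attribute, so indexing is safe here
    print(image['src'])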
</code></pre>
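<p>If installing BeautifulSoup is not an option, the standard library's <code>html.parser</code> can collect the same attribute for simple cases. This is a swapped-in technique, not BeautifulSoup: it builds no document tree and only reacts to start tags. The class <code>ImgSrcCollector</code> and function <code>img_sources</code> are hypothetical names for this sketch:</p>

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag using only the standard library."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # tag and attribute names arrive lowercased; attrs is a list of (name, value) pairs
        if tag == "img":
            src = dict(attrs).get("src")
            if src is not None:  # skip <img> tags that have no src attribute
                self.sources.append(src)

    # Self-closing tags such as <img ... /> are reported to handle_startendtag,
    # whose default implementation calls handle_starttag, so they are covered too.


def img_sources(html_text):
    collector = ImgSrcCollector()
    collector.feed(html_text)
    collector.close()
    return collector.sources


htmlText = '<img src="https://src1.com/" /> <img src="https://src2.com/" />'
print(img_sources(htmlText))  # ['https://src1.com/', 'https://src2.com/']
```

<p>For messy real-world markup BeautifulSoup is more forgiving, which is why it remains the main suggestion here.</p>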