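<p><code>Beautiful Soup</code> is all about how you process the extracted data, but first things first:</p>
<p>Here <code>test.html</code> is the content you posted. The lookups are wrapped in <code>try</code>/<code>except</code> blocks so that, if a lookup fails, the script prints a blank line instead of an error.</p>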
<pre><code>from bs4 import BeautifulSoup

# Parse the saved page; naming the parser explicitly avoids bs4's parser warning.
with open(r'd:\test.html', 'r') as f:
    soup = BeautifulSoup(f, 'html.parser')
# print(soup.prettify())
items = soup.find_all("meta")

try:
    print("#How can I find all of the instances of property?")
    for all_prop in items:
        # Indexing raises KeyError if a tag has no 'property' attribute.
        if all_prop['property']:
            print(all_prop)
except KeyError:
    print("")

try:
    print("#How can I then extract tall and wide?")
    for properties in items:
        print(properties['property'])
except KeyError:
    print("")

# find_all() returns an empty list when nothing matches, so no try/except is needed.
print("#all of the instances of tall")
print(soup.find_all('meta', attrs={'property': 'tall'}))
print(soup.find_all('meta', attrs={'name': 'tall'}))
print("")

# .get() returns None for a missing attribute instead of raising.
print("#How can I then extract tall?")
for just_tall in items:
    if just_tall.get('property') == 'tall':
        print(just_tall.get('property'))
    if just_tall.get('name') == 'tall':
        print(just_tall.get('name'))
</code></pre>
<p>Output:</p>
^{pr2}$
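<p>The rest is just playing around, but the above should get you started. Parts of the question are still ambiguous, so I have included several variations above to help.</p>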
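<p>If you would rather not rely on exception handling at all, here is a minimal sketch that uses <code>Tag.get()</code> and an attribute filter instead. The markup in <code>SAMPLE_HTML</code> is made up for illustration, since the original <code>test.html</code> is not shown here, and the <code>content</code> attribute is an assumption about where the values live.</p>

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; stands in for the test.html that is not shown here.
SAMPLE_HTML = """
<html><head>
<meta property="tall" content="20">
<meta property="wide" content="10">
<meta name="tall" content="30">
<meta name="description" content="demo">
</head></html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
metas = soup.find_all("meta")

# Tag.get() returns None for a missing attribute, so nothing can raise here.
properties = [m.get("property") for m in metas if m.get("property") is not None]

# attrs={"property": True} matches every tag that has the attribute at all.
with_property = soup.find_all("meta", attrs={"property": True})

# Read the (assumed) content attribute of the tags whose property is "tall".
tall_content = [m.get("content")
                for m in soup.find_all("meta", attrs={"property": "tall"})]

print(properties)
print(len(with_property))
print(tall_content)
```

<p>Unlike a <code>try</code> wrapped around a whole loop, this keeps going after a tag that lacks the attribute, so later matches are not skipped.</p>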
<p>Tutorial and more examples: <a href="http://www.nyu.edu/projects/politicsdatalab/workshops/BeautifulSoup.pdf" rel="nofollow">Link to docs</a></p>