<p>Use selenium to fetch the page source (that way you get the content actually rendered by js/ajax), then parse it with something like <a href="http://www.crummy.com/software/BeautifulSoup/" rel="nofollow">BeautifulSoup</a>.</p>
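<p>A minimal sketch of the selenium step, assuming Chrome and a placeholder URL (swap in whichever driver and page you actually use):</p>

```python
from selenium import webdriver
from bs4 import BeautifulSoup

driver = webdriver.Chrome()
driver.get("http://example.com/your-page")  # placeholder URL
html = driver.page_source  # source after js/ajax has run
driver.quit()

soup = BeautifulSoup(html, "html.parser")
```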
<pre><code>from bs4 import BeautifulSoup

soup = BeautifulSoup("""<div class="general_table">
<div class="general_s">
<div class="general_text1">Name</div>
<div class="general_text2">Abhishek</div>
</div>
<div class="general_m">
<div class="general_text1">Last Name</div>
<div class="general_text2">Kulkarni</div>
</div>
<div class="general_s">
<div class="general_text1">Phone</div>
<div class="general_text2"> 13613123</div>
</div>
<div class="general_m">
<div class="general_text1">Cell Phone</div>
<div class="general_text2">82928091</div>
</div>
<div class="general_s">
<div class="general_text1">City</div>
<div class="general_text2"></div>
</div>
<div class="general_m">
<div class="general_text1">Model</div>
<div class="general_text2"> DELL PERC H700</div>
</div>
</div>""", "html.parser")

def tags(iterable):
    # Skip the whitespace text nodes between tags; in bs4 on
    # Python 3, NavigableString is a str subclass.
    return (x for x in iterable if not isinstance(x, str))

for table in soup.find_all('div', {'class': 'general_table'}):
    for line in tags(table.contents):
        for i, column in enumerate(tags(line.contents)):
            if column.string:
                print(column.string.strip(), end=' ')
                print(',' if i else ':', end=' ')
        print()
</code></pre>
<p>Result:</p>
<pre><code>Name : Abhishek ,
Last Name : Kulkarni ,
Phone : 13613123 ,
Cell Phone : 82928091 ,
City :
Model : DELL PERC H700 ,
</code></pre>
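<p>If you would rather collect the pairs into a dict than print them, the same label/value structure can be walked row by row. A sketch, using a shortened sample of the HTML above (the function name <code>table_to_dict</code> is mine, not from any library):</p>

```python
from bs4 import BeautifulSoup

HTML = """<div class="general_table">
<div class="general_s">
<div class="general_text1">Name</div>
<div class="general_text2">Abhishek</div>
</div>
<div class="general_m">
<div class="general_text1">City</div>
<div class="general_text2"></div>
</div>
</div>"""

def table_to_dict(html):
    # Pair each general_text1 label with its general_text2 value.
    soup = BeautifulSoup(html, "html.parser")
    result = {}
    # A list value matches rows with either class.
    for row in soup.find_all('div', {'class': ['general_s', 'general_m']}):
        label = row.find('div', {'class': 'general_text1'})
        value = row.find('div', {'class': 'general_text2'})
        if label is not None and value is not None:
            result[label.get_text(strip=True)] = value.get_text(strip=True)
    return result

print(table_to_dict(HTML))  # -> {'Name': 'Abhishek', 'City': ''}
```

<p>Empty cells (like City above) come through as empty strings rather than being silently dropped, which makes the missing data visible downstream.</p>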