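<p>You could parse the <code>href</code> with a regex, but I'm too lazy to write one. See <code>href_parse</code> below for the proper way to parse the query string once you have retrieved the URI:</p>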
<pre><code>from urllib.parse import urlparse, parse_qs  # Python 2: from urlparse import urlparse, parse_qs

PREFIX = "javascript: OpenWindow('"
SUFFIX = "')"


def href_parse(value):
    """Return the list of fileId values from an OpenWindow href, or None."""
    if value.startswith(PREFIX) and value.endswith(SUFFIX):
        # Strip the JavaScript wrapper, leaving only the URL
        file_location = value[len(PREFIX):-len(SUFFIX)]
        # Extract the query string and parse it; parse_qs maps each key to a list
        query_dict = parse_qs(urlparse(file_location).query)
        return query_dict.get('fileId')
    return None


# rows: the tr tags from your BeautifulSoup parse (as in the question)
href_data = [[href_parse(td.find('a', attrs={'class': 'blue'})['href'])
              for td in tr.find_all("td")]
             for tr in rows]
print(href_data)
</code></pre>
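<p>For comparison, here is a rough sketch of the regex route mentioned above. It assumes the <code>href</code> values look like <code>javascript: OpenWindow('...')</code>; the pattern name <code>OPEN_WINDOW_RE</code> and the function <code>href_parse_re</code> are hypothetical. The regex only pulls the URL out of the JavaScript call. The query string is still handed to <code>parse_qs</code>, because hand-parsing it with a regex is easy to get wrong.</p>
<pre><code>import re
from urllib.parse import urlparse, parse_qs

# Hypothetical pattern: captures the URL between OpenWindow(' and '),
# allowing any amount of whitespace after "javascript:"
OPEN_WINDOW_RE = re.compile(r"^javascript:\s*OpenWindow\('([^']*)'\)$")


def href_parse_re(value):
    """Regex variant of href_parse: return the fileId list, or None."""
    match = OPEN_WINDOW_RE.match(value)
    if match is None:
        return None
    # group(1) is the URL inside the JavaScript call
    query_dict = parse_qs(urlparse(match.group(1)).query)
    return query_dict.get('fileId')


print(href_parse_re("javascript: OpenWindow('/docs/view.aspx?fileId=42')"))  # ['42']
print(href_parse_re("javascript:OpenWindow('/docs/view.aspx?fileId=42')"))   # ['42']
print(href_parse_re("/plain/link"))                                          # None
</code></pre>
<p>Unlike the fixed prefix/suffix slicing in <code>href_parse</code>, this version still matches if the page omits the space after <code>javascript:</code> or adds extra spaces.</p>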