擅长:python、mysql、java
<p>与其花时间在正则表达式上,不如使用为任务设计的东西。我喜欢<a href="http://www.crummy.com/software/BeautifulSoup/" rel="nofollow">BeautifulSoup</a>:</p>
<pre><code>>>> s = """
... <tr class="classo">
... <td>text1</td>
... <td class="dot">text2 </td>
... <td>text3</td>
... <td class="dot"> text4</td>
... <td class="dot">text4</td>
... </tr>
... """
>>>
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(s)
>>> soup.find_all("td")
[<td>text1</td>, <td class="dot">text2 </td>, <td>text3</td>, <td class="dot"> text4</td>, <td class="dot">text4</td>]
>>> [tag.text for tag in soup.find_all("td")]
[u'text1', u'text2 ', u'text3', u' text4', u'text4']
</code></pre>