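<p>You don't need a regex. Just parse the anchor tags to get the names, and call <code>next_sibling</code> on each one, then split and strip that text to get the age. Finding the <code>course_name</code> anchor that precedes each <code>coursestudent_name</code> anchor will also give you the associated course:</p>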
<pre><code>h = """<div class="overview">
<span class="course_titles">Courses:</span>
<a href="/schools/courses/173/" class="course_name">Math101</a> (Math; Monday; Room 10);
<a href="/schools/student/1388/" class="coursestudent_name">Mark</a> 17,
<a href="/schools/student/1401/" class="coursestudent_name">Alex</a> 18, ),
<a href="/schools/courses/2693/" class="course_name">English101</a> (English; Thursdays; Room 12);
<a href="/schools/student/1403/" class="coursestudent_name">Sarah</a> 16,
<a href="/schools/student/1411/" class="coursestudent_name">Nancy</a> 17,
<a href="/schools/student/1390/" class="coursestudent_name">Casey</a> 17 ),
</div>"""
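from bs4 import BeautifulSoup

# Name the parser explicitly; otherwise bs4 guesses one and emits a warning.
soup = BeautifulSoup(h, "html.parser")

# For each student anchor: the nearest preceding course anchor, the student's
# name, and the age taken from the text node right after the anchor.
data = [
    [a.find_previous("a", "course_name").text,  # course this student is in
     a.text,                                    # student name
     a.next_sibling.split()[0].strip(",")]      # age: " 18, )," gives "18"
    for a in soup.select("div.overview a.coursestudent_name")
]
print(data)
# [['Math101', 'Mark', '17'], ['Math101', 'Alex', '18'], ['English101', 'Sarah', '16'], ['English101', 'Nancy', '17'], ['English101', 'Casey', '17']]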
</code></pre>
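<p>If you would rather have the students grouped per course than as flat rows, you can post-process the rows with the standard library. This is a minimal sketch, assuming the <code>[course, student, age]</code> row shape shown above; <code>group_by_course</code> is a hypothetical helper name, not part of BeautifulSoup, and the rows are hard-coded here so the snippet runs on its own:</p>
<pre><code>from collections import defaultdict

# Rows in the same [course, student, age] shape produced by the list
# comprehension above (ages are still strings at this point).
data = [
    ["Math101", "Mark", "17"],
    ["Math101", "Alex", "18"],
    ["English101", "Sarah", "16"],
    ["English101", "Nancy", "17"],
    ["English101", "Casey", "17"],
]


def group_by_course(rows):
    """Hypothetical helper: map each course to a list of (student, age) tuples, age as int."""
    grouped = defaultdict(list)
    for course, student, age in rows:
        grouped[course].append((student, int(age)))
    # Plain dict keeps first-seen course order (Python 3.7+).
    return dict(grouped)


courses = group_by_course(data)
print(courses)
# {'Math101': [('Mark', 17), ('Alex', 18)], 'English101': [('Sarah', 16), ('Nancy', 17), ('Casey', 17)]}
</code></pre>
<p>Converting the age to <code>int</code> at this stage keeps the scraping step simple and makes the grouped result easier to sort or filter.</p>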