<p>The students' ages are not inside any HTML tag, so you can use a bit of regex to get them:</p>
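<pre><code>import re
from bs4 import BeautifulSoup

# html holds the page source as a string
soup = BeautifulSoup(html, "html.parser")
allA = soup.find("div", {"class": "overview"}).find_all("a")
classInfo = {}
currentClass = None
for item in allA:
    if item.get("class") == ["course_name"]:
        # A course link starts a new group of students
        classInfo[item.text] = []
        currentClass = item.text
    else:
        # The age is bare text right after the student's </a>, so find it with a regex
        # (re.escape keeps names containing characters like "." or "+" from breaking the pattern)
        age = int(re.search(re.escape(item.text) + r"</a> (\d+)", html).group(1))
        classInfo[currentClass].append((item.text, age))
print(classInfo)
</code></pre>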
<p>This prints:</p>
<pre><code>{'English101': [('Sarah', 16), ('Nancy', 17), ('Casey', 17)], 'Math101': [('Mark', 17), ('Alex', 18)]}
</code></pre>
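<p>For a self-contained check, here is a sketch that swaps BeautifulSoup for the standard library's <code>html.parser.HTMLParser</code>, so it runs without third-party packages. The markup in <code>SAMPLE_HTML</code> and the class name <code>OverviewParser</code> are hypothetical; the markup is reconstructed from the output above, and the real page may differ. It illustrates the same point: the age arrives as bare text after a student's closing <code>&lt;/a&gt;</code>, not inside any tag.</p>

```python
import re
from html.parser import HTMLParser

# Hypothetical markup reconstructed from the output above; the real page may differ.
SAMPLE_HTML = """
<div class="overview">
  <a class="course_name">English101</a>
  <a>Sarah</a> 16
  <a>Nancy</a> 17
  <a>Casey</a> 17
  <a class="course_name">Math101</a>
  <a>Mark</a> 17
  <a>Alex</a> 18
</div>
"""


class OverviewParser(HTMLParser):
    """Hypothetical parser: groups (student, age) pairs under the preceding course link."""

    def __init__(self):
        super().__init__()
        self.class_info = {}
        self.current_class = None
        self.in_a = False          # currently inside an <a> tag?
        self.a_is_course = False   # is the open <a> a course link?
        self.last_student = None   # student whose age has not been read yet

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.in_a = True
            # attrs is a list of (name, value) tuples
            self.a_is_course = ("class", "course_name") in attrs

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_a = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return  # skip whitespace between tags
        if self.in_a:
            if self.a_is_course:
                self.current_class = text
                self.class_info[text] = []
            else:
                self.last_student = text
        elif self.last_student is not None:
            # Bare text right after a student's </a>: this is the age
            match = re.match(r"\d+", text)
            if match:
                self.class_info[self.current_class].append(
                    (self.last_student, int(match.group()))
                )
            self.last_student = None


parser = OverviewParser()
parser.feed(SAMPLE_HTML)
parser.close()
print(parser.class_info)
```

<p>On this sample markup it produces the same dictionary as the BeautifulSoup version. Tracking the "last student seen" avoids searching the whole page again for each name, so a name that appears twice on the page cannot pick up the wrong age.</p>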