如何在Python中解析具有行跨度的HTML表?

2024-05-09 02:22:50 发布

您现在位置:Python中文网/ 问答频道 /正文

问题

我试图解析一个包含rowspan的HTML表,就像在分析我的大学日程安排一样。在

我遇到了这样一个问题:如果最后一行包含一个rowspan,那么下一行将缺少一个TD,其中rowspan现在是缺少的TD。在

我不知道如何解释这一点,我希望能够解析这个时间表。在

我尝试了什么

几乎所有我能想到的。在

我得到的结果

[
    {
        'blok_eind': 4,
        'blok_start': 3,
        'dag': 4, # Should be 5
        'leraar': 'DOODF000',
        'lokaal': 'ALK C212',
        'vak': 'PROJ-T',
    },
]

如您所见,在上面的输出片段中有一个值为PROJ-Tvak键,dag是{},而它应该是5(又称星期五/Vrijdag),如下所示:

Table

我想要的结果

一个Python dict(),看起来像上面发布的,但是值正确

其中:

  • day/dag是1~5的整数,表示星期一到星期五
  • block_start/blok_start是一个表示课程何时开始的int(时间块,表的左侧)
  • block_end/blok_eind是一个表示课程结束的块的整数
  • classroom/lokaal是课程所在的教室代码
  • teacher/leraar是教师的ID
  • course/vak是课程的ID

上述数据的基本HTML结构

^{pr2}$

代码

HTML

<CENTER><font size="3" face="Arial" color="#000000">
<BR></font>
  <font size="6" face="Arial" color="#0000FF">
16AO4EIO1B
&nbsp;</font> <font size="4" face="Arial">
IO1B
</font>
  <BR>
  <TABLE border="3" rules="all" cellpadding="1" cellspacing="1">
    <TR>
      <TD align="center">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD align="center" nowrap=1><font size="2" face="Arial" color="#000000">
Maandag 29-08
</font> </TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD align="center" nowrap=1><font size="2" face="Arial">
Dinsdag 30-08
</font> </TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD align="center" nowrap=1><font size="2" face="Arial">
Woensdag 31-08
</font> </TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD align="center" nowrap=1><font size="2" face="Arial">
Donderdag 01-09
</font> </TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD align="center" nowrap=1><font size="2" face="Arial">
Vrijdag 02-09
</font> </TD>
          </TR>
        </TABLE>
      </TD>
    </TR>
    <TR>
      <TD rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD align="center" rowspan="2" nowrap=1><font size="3" face="Arial">
<B>1</B>
</font> </TD>
            <TD align="center" nowrap=1><font size="2" face="Arial">
8:30
</font> </TD>
          </TR>
          <TR>
            <TD align="center" nowrap=1><font size="2" face="Arial">
9:20
</font> </TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=4 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD width="50%" nowrap=1><font size="2" face="Arial">
BLEEJ002
</font> </TD>
            <TD width="50%" nowrap=1><font size="2" face="Arial">
<B>ALK B021</B>
</font> </TD>
          </TR>
          <TR>
            <TD colspan="2" width="50%" nowrap=1><font size="2" face="Arial">
WEBD
</font> </TD>
          </TR>
        </TABLE>
      </TD>
    </TR>
    <TR>
    </TR>
    <TR>
      <TD rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD align="center" rowspan="2" nowrap=1><font size="3" face="Arial">
<B>2</B>
</font> </TD>
            <TD align="center" nowrap=1><font size="2" face="Arial">
9:20
</font> </TD>
          </TR>
          <TR>
            <TD align="center" nowrap=1><font size="2" face="Arial">
10:10
</font> </TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=4 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD width="50%" nowrap=1><font size="2" face="Arial">
BLEEJ002
</font> </TD>
            <TD width="50%" nowrap=1><font size="2" face="Arial">
<B>ALK B021B</B>
</font> </TD>
          </TR>
          <TR>
            <TD colspan="2" width="50%" nowrap=1><font size="2" face="Arial">
WEBD
</font> </TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
    </TR>
    <TR>
    </TR>
    <TR>
      <TD rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD align="center" rowspan="2" nowrap=1><font size="3" face="Arial">
<B>3</B>
</font> </TD>
            <TD align="center" nowrap=1><font size="2" face="Arial">
10:25
</font> </TD>
          </TR>
          <TR>
            <TD align="center" nowrap=1><font size="2" face="Arial">
11:15
</font> </TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=4 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD width="50%" nowrap=1><font size="2" face="Arial">
DOODF000
</font> </TD>
            <TD width="50%" nowrap=1><font size="2" face="Arial">
<B>ALK C212</B>
</font> </TD>
          </TR>
          <TR>
            <TD colspan="2" width="50%" nowrap=1><font size="2" face="Arial">
PROJ-T
</font> </TD>
          </TR>
        </TABLE>
      </TD>
    </TR>
    <TR>
    </TR>
    <TR>
      <TD rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD align="center" rowspan="2" nowrap=1><font size="3" face="Arial">
<B>4</B>
</font> </TD>
            <TD align="center" nowrap=1><font size="2" face="Arial">
11:15
</font> </TD>
          </TR>
          <TR>
            <TD align="center" nowrap=1><font size="2" face="Arial">
12:05
</font> </TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=4 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD width="50%" nowrap=1><font size="2" face="Arial">
BLEEJ002
</font> </TD>
            <TD width="50%" nowrap=1><font size="2" face="Arial">
<B>ALK B021B</B>
</font> </TD>
          </TR>
          <TR>
            <TD colspan="2" width="50%" nowrap=1><font size="2" face="Arial">
MENT
</font> </TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
    </TR>
    <TR>
    </TR>
    <TR>
      <TD rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD align="center" rowspan="2" nowrap=1><font size="3" face="Arial">
<B>5</B>
</font> </TD>
            <TD align="center" nowrap=1><font size="2" face="Arial">
12:05
</font> </TD>
          </TR>
          <TR>
            <TD align="center" nowrap=1><font size="2" face="Arial">
12:55
</font> </TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
    </TR>
    <TR>
    </TR>
    <TR>
      <TD rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD align="center" rowspan="2" nowrap=1><font size="3" face="Arial">
<B>6</B>
</font> </TD>
            <TD align="center" nowrap=1><font size="2" face="Arial">
12:55
</font> </TD>
          </TR>
          <TR>
            <TD align="center" nowrap=1><font size="2" face="Arial">
13:45
</font> </TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=4 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD width="50%" nowrap=1><font size="2" face="Arial">
JONGJ003
</font> </TD>
            <TD width="50%" nowrap=1><font size="2" face="Arial">
<B>ALK B008</B>
</font> </TD>
          </TR>
          <TR>
            <TD colspan="2" width="50%" nowrap=1><font size="2" face="Arial">
BURG
</font> </TD>
          </TR>
        </TABLE>
      </TD>
    </TR>
    <TR>
    </TR>
    <TR>
      <TD rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD align="center" rowspan="2" nowrap=1><font size="3" face="Arial">
<B>7</B>
</font> </TD>
            <TD align="center" nowrap=1><font size="2" face="Arial">
13:45
</font> </TD>
          </TR>
          <TR>
            <TD align="center" nowrap=1><font size="2" face="Arial">
14:35
</font> </TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=4 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD width="50%" nowrap=1><font size="2" face="Arial">
FLUIP000
</font> </TD>
            <TD width="50%" nowrap=1><font size="2" face="Arial">
<B>ALK B004</B>
</font> </TD>
          </TR>
          <TR>
            <TD colspan="2" width="50%" nowrap=1><font size="2" face="Arial">
ICT algemeen  Prakti
</font> </TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
    </TR>
    <TR>
    </TR>
    <TR>
      <TD rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD align="center" rowspan="2" nowrap=1><font size="3" face="Arial">
<B>8</B>
</font> </TD>
            <TD align="center" nowrap=1><font size="2" face="Arial">
14:50
</font> </TD>
          </TR>
          <TR>
            <TD align="center" nowrap=1><font size="2" face="Arial">
15:40
</font> </TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=4 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD width="50%" nowrap=1><font size="2" face="Arial">
KOOLE000
</font> </TD>
            <TD width="50%" nowrap=1><font size="2" face="Arial">
<B>ALK B008</B>
</font> </TD>
          </TR>
          <TR>
            <TD colspan="2" width="50%" nowrap=1><font size="2" face="Arial">
NED
</font> </TD>
          </TR>
        </TABLE>
      </TD>
    </TR>
    <TR>
    </TR>
    <TR>
      <TD rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD align="center" rowspan="2" nowrap=1><font size="3" face="Arial">
<B>9</B>
</font> </TD>
            <TD align="center" nowrap=1><font size="2" face="Arial">
15:40
</font> </TD>
          </TR>
          <TR>
            <TD align="center" nowrap=1><font size="2" face="Arial">
16:30
</font> </TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
    </TR>
    <TR>
    </TR>
    <TR>
      <TD rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD align="center" rowspan="2" nowrap=1><font size="3" face="Arial">
<B>10</B>
</font> </TD>
            <TD align="center" nowrap=1><font size="2" face="Arial">
16:30
</font> </TD>
          </TR>
          <TR>
            <TD align="center" nowrap=1><font size="2" face="Arial">
17:20
</font> </TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
      <TD colspan=12 rowspan=2 align="center" nowrap="1">
        <TABLE>
          <TR>
            <TD></TD>
          </TR>
        </TABLE>
      </TD>
    </TR>
    <TR>
    </TR>
  </TABLE>
  <TABLE cellspacing="1" cellpadding="1">
    <TR>
      <TD valign=bottom> <font size="4" face="Arial" color="#0000FF"></TR></TABLE><font size="3" face="Arial">
Periode1   29-08-2016 (35) - 04-09-2016 (35)   G r u b e r  &amp;  P e t t e r s   S o f t w a r e
</font></CENTER>

Python

from pprint import pprint
from bs4 import BeautifulSoup
import requests

r = requests.get("http://rooster.horizoncollege.nl/rstr/ECO/AMR/400-ECO/Roosters/36"
                 "/c/c00025.htm")
daytable = {
    1: "Maandag",
    2: "Dinsdag",
    3: "Woensdag",
    4: "Donderdag",
    5: "Vrijdag"
}
timetable = {
    1: ("8:30", "9:20"),
    2: ("9:20", "10:10"),
    3: ("10:25", "11:15"),
    4: ("11:15", "12:05"),
    5: ("12:05", "12:55"),
    6: ("12:55", "13:45"),
    7: ("13:45", "14:35"),
    8: ("14:50", "15:40"),
    9: ("15:40", "16:30"),
    10: ("16:30", "17:20"),
}

page = BeautifulSoup(r.content, "lxml")

roster = []
big_rows = 2
last_row_big = False
# There are 10 blocks, each made up out of 2 TR's, run through them
for block_count in range(2, 22, 2):
    # There are 5 days, first column is not data we want
    for day in range(2, 7):
        dayroster = {
            "dag": 0,
            "blok_start": 0,
            "blok_eind": 0,
            "lokaal": "",
            "leraar": "",
            "vak": ""
        }
        # This selector provides the classroom
        table_bold = page.select(
            "html > body > center > table > tr:nth-of-type(" + str(block_count) + ") > td:nth-of-type(" + str(
                day) + ") > table > tr > td > font > b")

        # This selector provides the teacher's code and the course ID
        table = page.select(
            "html > body > center > table > tr:nth-of-type(" + str(block_count) + ") > td:nth-of-type(" + str(
                day) + ") > table > tr > td > font")

        # This gets the rowspan on the current row and column
        rowspan = page.select(
            "html > body > center > table > tr:nth-of-type(" + str(block_count) + ") > td:nth-of-type(" + str(
                day) + ")")

        try:
            if table or table_bold and rowspan[0].attrs.get("rowspan") == "4":
                last_row_big = True
                # Setting end of class
                dayroster["blok_eind"] = (block_count // 2) + 1
            else:
                last_row_big = False
                # Setting end of class
                dayroster["blok_eind"] = (block_count // 2)
        except IndexError:
            pass

        if table_bold:
            x = table_bold[0]
            # Classroom ID
            dayroster["lokaal"] = x.contents[0]

        if table:
            iter = 0
            for x in table:
                content = x.contents[0].lstrip("\r\n").rstrip("\r\n")
                # Cell has data
                if content != "":
                    # Set start of class
                    dayroster["blok_start"] = block_count // 2
                    # Set day of class
                    dayroster["dag"] = day - 1
                    if iter == 0:
                        # Teacher ID
                        dayroster["leraar"] = content
                    elif iter == 1:
                        # Course ID
                        dayroster["vak"] = content
                    iter += 1

        if table or table_bold:
            # Store the data
            roster.append(dayroster)

# Remove duplicates
seen = set()
new_l = []
for d in roster:
    t = tuple(d.items())
    if t not in seen:
        seen.add(t)
        new_l.append(d)
pprint(new_l)

Tags: ofsizetablewidthtrtdfacecenter
2条回答

也许最好使用bs4内置函数来解析表,比如“findAll”。在

您可以使用以下代码:

from pprint import pprint
from bs4 import BeautifulSoup
import requests

r = requests.get("http://rooster.horizoncollege.nl/rstr/ECO/AMR/400-ECO/Roosters/36"
                 "/c/c00025.htm")

content=r.content
page = BeautifulSoup(content, "html")
table=page.find('table')
trs=table.findAll("tr", {},recursive=False)
tr_count=0
trs.pop(0)
final_table={}

for tr in trs:
    tds=tr.findAll("td", {},recursive=False)
    if tds:
        td_count=0
        tds.pop(0)
        for td in tds:
            if td.has_attr('rowspan'):                              
                final_table[str(tr_count)+"-"+str(td_count)]=td.text.strip()
                if int(td.attrs['rowspan'])==4:
                    final_table[str(tr_count+1)+"-"+str(td_count)]=td.text.strip()
                if final_table.has_key(str(tr_count)+"-"+str(td_count+1)):
                    td_count=td_count+1         
            td_count=td_count+1
        tr_count=tr_count+1

roster=[]
for i in range(0,10): #iterate over time
    for j in range(0,5): #iterate over day
        item=final_table[str(i)+"-"+str(j)]
        if len(item)!=0:    
            block_eind=i+1          

            try:
                if final_table[str(i+1)+"-"+str(j)]==final_table[str(i)+"-"+str(j)]:
                        block_eind=i+2
            except:
                pass

            try:
                lokaal=item.split('\r\n \n\n')[0]
                leraar=item.split('\r\n \n\n')[1].split('\n \n\r\n')[0]
                vak=item.split('\n \n\r\n')[1]
            except:
                lokaal=leraar=vak=" -"

            dayroster = {
                "dag": j+1,
                "blok_start": i+1,
                "blok_eind": block_eind,
                "lokaal": lokaal,
                "leraar": leraar,
                "vak": vak
            }


            dayroster_double = {
                "dag": j+1,
                "blok_start": i,
                "blok_eind": block_eind,
                "lokaal": lokaal,
                "leraar": leraar,
                "vak": vak
            }

            #use to prevent double dict for same event
            if dayroster_double not in roster:
                roster.append(dayroster)

print (roster)

您必须跟踪前几行的行跨距,每列一行。在

您可以简单地将rowspan的整数值复制到字典中,随后的行将rowspan值递减,直到它降到1(或者,为了便于编码,我们可以将整数值减1并放到0)。然后,可以根据前面的行跨度调整后续表计数。在

您的表使用了一个默认的范围大小2,每两步递增,但可以很容易地将其除以2,从而使这一点变得更加复杂。在

不要使用大量的CSS选择器,只选择表行,我们将迭代这些行:

roster = []
rowspans = {}  # track rowspanning cells
# every second row in the table
rows = page.select('html > body > center > table > tr')[1:21:2]
for block, row in enumerate(rows, 1):
    # take direct child td cells, but skip the first cell:
    daycells = row.select('> td')[1:]
    rowspan_offset = 0
    for daynum, daycell in enumerate(daycells, 1):
        # rowspan handling; if there is a rowspan here, adjust to find correct position
        daynum += rowspan_offset
        while rowspans.get(daynum, 0):
            rowspan_offset += 1
            rowspans[daynum] -= 1
            daynum += 1
        # now we have a correct day number for this cell, adjusted for
        # rowspanning cells.
        # update the rowspan accounting for this cell
        rowspan = (int(daycell.get('rowspan', 2)) // 2) - 1
        if rowspan:
            rowspans[daynum] = rowspan

        texts = daycell.select("table > tr > td > font")
        if texts:
            # class info found
            teacher, classroom, course = (c.get_text(strip=True) for c in texts)
            roster.append({
                'blok_start': block,
                'blok_eind': block + rowspan,
                'dag': daynum,
                'leraar': teacher,
                'lokaal': classroom,
                'vak': course
            })

    # days that were skipped at the end due to a rowspan
    while daynum < 5:
        daynum += 1
        if rowspans.get(daynum, 0):
            rowspans[daynum] -= 1

这将产生正确的输出:

^{pr2}$

此外,即使课程的范围超过2个块,或只有一个块,此代码仍将继续工作;支持任何行范围大小。在

相关问题 更多 >