Automatic timetable-optimizing scraper?

3 answers

User

Reply #1 · Edited 2024-07-03 06:21:06

BeautifulSoup has been mentioned here a few times, e.g. get-list-of-xml-attribute-values-in-python.

Beautiful Soup is a Python HTML/XML parser designed for quick turnaround projects like screen-scraping. Three features make it powerful:
Beautiful Soup won't choke if you give it bad markup. It yields a parse tree that makes approximately as much sense as your original document. This is usually good enough to collect the data you need and run away.
Beautiful Soup provides a few simple methods and Pythonic idioms for navigating, searching, and modifying a parse tree: a toolkit for dissecting a document and extracting what you need. You don't have to create a custom parser for each application.
Beautiful Soup automatically converts incoming documents to Unicode and outgoing documents to UTF-8. You don't have to think about encodings, unless the document doesn't specify an encoding and Beautiful Soup can't autodetect one. Then you just have to specify the original encoding.
Beautiful Soup parses anything you give it, and does the tree traversal stuff for you. You can tell it "Find all the links", or "Find all the links of class externalLink", or "Find all the links whose URLs match 'foo.com'", or "Find the table heading that's got bold text, then give me that text."
Valuable data that was once locked up in poorly-designed websites is now within your reach. Projects that would have taken hours take only minutes with Beautiful Soup.
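As a rough illustration of the four queries quoted above, here is a minimal sketch using bs4's `find_all`. The HTML snippet, its URLs and the `externalLink` class are made-up sample data, not anything from the asker's site; the markup deliberately leaves a `<p>` unclosed, and it uses Python's built-in `"html.parser"` backend so no extra parser needs installing.

```python
import re

from bs4 import BeautifulSoup

# Made-up, slightly sloppy markup (the <p> is never closed) standing in for a timetable page
HTML = """
<table>
  <tr><th><b>Timetable</b></th><th>Room</th></tr>
</table>
<p>Links:
<a href="http://foo.com/lectures" class="externalLink">Lectures</a>
<a href="/local/tutorials">Tutorials</a>
<a href="http://bar.org/seminars" class="externalLink">Seminars</a>
"""

soup = BeautifulSoup(HTML, "html.parser")

# "Find all the links"
all_links = [a["href"] for a in soup.find_all("a")]

# "Find all the links of class externalLink" (class_ avoids clashing with the keyword)
external = [a["href"] for a in soup.find_all("a", class_="externalLink")]

# "Find all the links whose URLs match foo.com" (a compiled regex is searched in the attribute)
foo_links = [a["href"] for a in soup.find_all("a", href=re.compile(r"foo\.com"))]

# "Find the table heading that's got bold text, then give me that text"
bold_heading = next(th.get_text(strip=True) for th in soup.find_all("th") if th.find("b"))

print(all_links)
print(external)
print(foo_links)
print(bold_heading)  # Timetable
```

In a real scraper the `HTML` string would be the page body fetched after logging in; the queries themselves stay the same.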

User

Reply #2 · Edited 2024-07-03 06:21:06

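Depending on how far you plan to take #6, and how large the dataset is, it could be non-trivial; to me it certainly has the flavour of NP-hard global optimization...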

However, if you're talking about tens (rather than hundreds) of nodes, then a fairly dumb algorithm should give good enough performance.

So, you have two constraints:

A total ordering of classes by score; this is flexible.
Class clashes; this is inflexible.

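By "flexible" I mean that you can take classes that are more spread out (a lower score), but you can't be in two classes at once. Interestingly, there is probably a positive correlation between score and clashes: higher-scoring classes are more likely to clash.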
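The score is left abstract here. As one hypothetical way to pin it down, the sketch below scores a whole candidate timetable by penalizing each extra day on campus and every idle minute between consecutive classes on the same day, so higher is better. The `(day, start, end)` tuple layout, the function name `score` and the weights `DAY_PENALTY` and `GAP_PENALTY` are all assumptions for illustration, not a definition from the question.

```python
from collections import defaultdict

# Hypothetical weights: one extra day on campus costs as much as a two-hour gap
DAY_PENALTY = 120
GAP_PENALTY = 1


def score(timetable):
    """Score a list of (day, start, end) sessions, times in minutes since midnight.

    Higher is better: fewer distinct days and shorter idle gaps between classes.
    """
    by_day = defaultdict(list)
    for day, start, end in timetable:
        by_day[day].append((start, end))

    penalty = DAY_PENALTY * len(by_day)
    for sessions in by_day.values():
        sessions.sort()
        # Idle time between each class and the next one on the same day
        for (_, prev_end), (next_start, _) in zip(sessions, sessions[1:]):
            penalty += GAP_PENALTY * max(0, next_start - prev_end)
    return -penalty


back_to_back = [("Mon", 540, 600), ("Mon", 600, 660)]
two_days = [("Mon", 540, 600), ("Wed", 900, 960)]
long_gap = [("Mon", 540, 600), ("Mon", 780, 840)]
print(score(back_to_back), score(two_days), score(long_gap))  # -120 -240 -300
```

Whether a three-hour gap should really rank below an extra day is exactly the kind of choice the weights encode; they are placeholders to tune.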

My first pass at an algorithm:

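selected_classes = []
# Consider the highest-scoring classes first
classes = sorted(classes, key=lambda c: c.score, reverse=True)
for clas in classes:
    # Keep a class only if it doesn't clash with any class already selected
    if not clas.clashes_with(selected_classes):
        selected_classes.append(clas)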
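To make that first pass concrete, here is a self-contained, runnable sketch. The `TimetableClass` record, its fields and its `clashes_with` method are hypothetical stand-ins for whatever the scraper actually produces. It visits the highest-scoring classes first and treats each class as a half-open interval, so a class that starts exactly when another ends does not clash.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimetableClass:
    """Hypothetical class record; start/end are minutes since midnight."""
    name: str
    day: str
    start: int
    end: int
    score: float

    def overlaps(self, other):
        # Half-open intervals [start, end) on the same day overlap
        # when each one starts before the other ends
        return self.day == other.day and self.start < other.end and other.start < self.end

    def clashes_with(self, selected):
        return any(self.overlaps(other) for other in selected)


def greedy_timetable(classes):
    selected = []
    for clas in sorted(classes, key=lambda c: c.score, reverse=True):
        if not clas.clashes_with(selected):
            selected.append(clas)
    return selected


classes = [
    TimetableClass("Lecture A", "Mon", 540, 600, 0.9),
    TimetableClass("Prac B", "Mon", 570, 630, 0.8),     # overlaps Lecture A
    TimetableClass("Seminar C", "Mon", 600, 660, 0.7),  # starts as Lecture A ends
    TimetableClass("Lecture D", "Tue", 540, 600, 0.5),
]
print([c.name for c in greedy_timetable(classes)])
# ['Lecture A', 'Seminar C', 'Lecture D']
```

Being greedy, this never revisits a choice: if dropping Lecture A would let two better-combined classes in, it won't find that, which is the global-optimization concern raised above.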

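If classes are of uneven length, start at odd times and so on, computing clashes can be awkward. Mapping the start and end times onto a simplified representation of time "blocks" (every 15 or 30 minutes, or whatever granularity you need) makes it easier to find overlaps between the starts and ends of different classes.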
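A minimal sketch of the block idea, assuming 30-minute blocks and times given in minutes since midnight (both assumptions; `to_blocks`, `clash` and `BLOCK_MINUTES` are hypothetical names). Each class becomes a set of `(day, block)` pairs, and two classes clash if their sets intersect.

```python
# Hypothetical granularity; set to 15 for quarter-hour blocks
BLOCK_MINUTES = 30


def to_blocks(day, start, end):
    """Return the set of (day, block index) pairs a class touches."""
    first = start // BLOCK_MINUTES
    # Ceiling division: a partially used block still counts as busy
    last = -(-end // BLOCK_MINUTES)
    return {(day, block) for block in range(first, last)}


def clash(a, b):
    """a and b are (day, start, end) tuples."""
    return bool(to_blocks(*a) & to_blocks(*b))


print(clash(("Mon", 540, 600), ("Mon", 600, 660)))  # False: 09:00-10:00 vs 10:00-11:00
print(clash(("Mon", 540, 590), ("Mon", 590, 640)))  # True: both touch the 09:30 block
print(clash(("Mon", 540, 600), ("Tue", 540, 600)))  # False: different days
```

The second case shows the trade-off: classes ending and starting at 09:50 don't strictly overlap, but both use part of the 09:30 block, so coarse blocks report a clash. Smaller blocks reduce these false positives at the cost of more blocks per class.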

User

Reply #3 · Edited 2024-07-03 06:21:06

There are too many questions here.

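Please break this into several topic areas and ask a specific question about each. Please focus on one of them. Please define your terms: "best" means nothing without a specific metric to optimize.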

Here is what I see in your list of topics.

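Scraping HTML
1. Log in to the website using the enterprise login engine
2. Find my current semester and its related subjects (pre-setup)
3. Navigate to the correct page and grab the data from each relevant subject (lecture, practical and workshop times)
4. Strip the useless information out of the data
Some algorithm that "ranks" things to find a "best time". Since these terms are undefined, it's almost impossible to help with this.
5. Rank classes that are close to each other higher, and classes on random days lower
6. Solve for the best timetable
Output something.
7. Output a detailed list of the best-case information to me
8. Output a detailed list of the possible class information to me (e.g. some may be full)
Optimize something, looking for "best". Another undefined term.
9. Have the program automatically select the best classes
Keep checking to see whether it can achieve 7.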

By the way, Python has "lists". Whether or not they are "linked" doesn't really come into it.
