用Python建立一个网络爬虫垃圾邮件列表

>>> spam_dict = {} >>> spam('http://reed.cs.depaul.edu/lperkovic/csc242/test1.html',0) >>> spam_dict.keys() dict_keys([]) >>> spam_dict = {} >>> spam('http://reed.cs.depaul.edu/lperkovic/csc242/test1.html',1) >>> spam_dict.keys() dict_keys(['lperkovic@cs.depaul.edu', 'nobody@xyz.com'])

from re import findall def emails(url): links = str(links3(url)) # how do I construct pattern? pattern='[A-Za-z0-9_.]+\@[A-Za-z0-9_.]+.com\.edu\.org lst = findall(pattern,links) print(lst)

2条回答

网友

1楼 · 编辑于 2024-09-30 02:20:00

想想递归是如何工作的。你想要的是你的函数在某些情况下能够调用它自己。在这种情况下，您需要为您的函数添加递归级别的参数，然后您需要确定它在各种情况下应该做什么？在

在最基本的层面上，它应该如何处理n=0？（提示：你已经准备好了）

如果n=1，它应该怎么做？您可能希望在n=0的现有列表的每个元素上再次调用函数。在

如果n大于1呢？您需要再次调用函数，对到目前为止得到的每个元素使用n=n-1。在

网友

2楼 · 编辑于 2024-09-30 02:20:00

正如问题所述，n将通过将递归限制到最大“调用深度”来发挥作用。在

这样做的想法是，由于您递归地调用扫描已在运行的扫描中的电子邮件，所以您将构建一个调用堆栈，该堆栈将在您继续递归调用扫描程序时变得越来越深。在

你不希望它永远持续下去，所以作为一个参数，你传递一个整数，你每次调用时都要递减。当它达到0时，您将停止执行递归调用，并让递归序列自行展开。在

call 1 (args...., n=3)
   call 2a (args...., n=2)
       call 3 (args...., n=1)
            call 4a (args..., n=0) <  these calls won't call more scans
            call 4b (args..., n=0) <  because n=0, so this is max depth
   call 2b (args...., n=2)

作业：

注：

用法：

添加：

相关问题更多 >

编程相关推荐

热门问题

热门文章