<p>You can use the <code>curl</code> command to fetch the robots.txt file, split the response into lines, and check each line for allowed and disallowed URLs.</p>
<pre><code>import os

# Fetch robots.txt with curl; -s suppresses the progress meter
result = os.popen("curl -s https://fortune.com/robots.txt").read()
result_data_set = {"Disallowed": [], "Allowed": []}
for line in result.splitlines():
    if ': ' not in line:
        continue  # skip blank lines, comments, and rules with no path
    # Keep only the path; drop trailing comments or other junk
    path = line.split(': ')[1].split(' ')[0]
    if line.startswith('Allow'):  # allowed URL
        result_data_set["Allowed"].append(path)
    elif line.startswith('Disallow'):  # disallowed URL
        result_data_set["Disallowed"].append(path)
print(result_data_set)
</code></pre>
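<p>If the goal is to check whether a specific URL may be fetched, rather than to list every rule, the standard library's <code>urllib.robotparser</code> module is a different technique that may be worth considering. The sketch below is only an illustration: the sample robots.txt text, the <code>example.com</code> URLs, and the helper name <code>build_parser</code> are made up for the example and do not come from fortune.com.</p>

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content, used so the example runs without network access
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
Allow: /public/
"""


def build_parser(robots_text):
    """Hypothetical helper: build a RobotFileParser from robots.txt text."""
    parser = RobotFileParser()
    # parse() expects an iterable of lines, not one big string
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS)

# can_fetch(useragent, url) reports whether the rules permit fetching the URL
print(parser.can_fetch("*", "https://example.com/public/page"))
print(parser.can_fetch("*", "https://example.com/private/page"))
```

<p>For a live file, <code>RobotFileParser</code> also provides <code>set_url()</code> and <code>read()</code>, which download and parse robots.txt without shelling out to <code>curl</code>.</p>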