Using Python to append a list of names to a single URL while crawling

Posted 2024-05-06 14:29:26


I'm looking to create a script that will crawl my URL and append a name to the end of the URL being searched, for example 192.168.1.100/map/foo.

I then want to parse the response: if the status code is 200 and the content length is 83, I want to write the URL out to a text file. If both conditions are not met, I skip that URL.

This is my rough idea; I'm looking for some general direction.

I start with the URL and, depending on how many parameters there are, I can read them from an array or a list. I then parse the response and check the conditions. If they are true, I write the URL to a text file; otherwise I continue the loop.

Any thoughts? I'm not asking you to write my code for me, just to point me in the right direction.

Thanks.

Here is the latest version, based on the suggestions in the answer below.

import requests

url = "http://192.168.1.2/map/ShowPage.ashx?="
names = ["admin", "backup", "contact", "index", "logs", "news", "reboot",
         "register", "test", "users"]

with open("/root/Desktop/urls.txt", 'a') as urls:
    for name in names:
        newUrl = url + name
        r = requests.get(newUrl)
        c = r.text  # decoded response body
        # Keep the URL only when the page exists and is not the "Unknown" error page
        if r.status_code == 200 and "Unknown" not in c:
            urls.write(newUrl + '\n')

Whoops!!! I made a silly mistake. It helps if I add http:// to the URL... Also, there is no need for urlparse.urljoin, since it was cutting off my full URL.
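
For context on why urljoin was not giving the full URL: urljoin resolves the name against the /map/ directory of the base, so the ShowPage.ashx?= part is replaced rather than appended, and a base without a scheme stays without one. A small sketch of the difference, using Python 2's urlparse as in the rest of this post (on Python 3 the same function is urllib.parse.urljoin):

import urlparse  # Python 3: from urllib.parse import urljoin

base = "192.168.1.2/map/ShowPage.ashx?="

# urljoin resolves "admin" against the /map/ directory, so the
# ShowPage.ashx?= suffix is dropped, with or without a scheme.
print urlparse.urljoin(base, "admin")              # 192.168.1.2/map/admin
print urlparse.urljoin("http://" + base, "admin")  # http://192.168.1.2/map/admin

# Plain string concatenation keeps the full URL, which is what was wanted here.
print "http://" + base + "admin"                   # http://192.168.1.2/map/ShowPage.ashx?=admin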

That's why I love my super-simple C IDE :-D


2 Answers

For the following, you must have requests installed: http://docs.python-requests.org/en/latest/user/install/#install

First:

import requests
import urlparse

baseurl = "192.168.1.2/map/ShowPage.ashx?="
names = ["admin", "backup", "contact", "index", "logs", "news", "reboot",
         "register", "test", "users"]

for name in names:
    print urlparse.urljoin(baseurl, name)

which gives the following output:

192.168.1.2/map/admin
192.168.1.2/map/backup
192.168.1.2/map/contact
192.168.1.2/map/index
192.168.1.2/map/logs
192.168.1.2/map/news
192.168.1.2/map/reboot
192.168.1.2/map/register
192.168.1.2/map/test
192.168.1.2/map/users

You can then update the code with the get call:

import requests
import urlparse

baseurl = "192.168.1.2/map/ShowPage.ashx?="
names = ["admin", "backup", "contact", "index", "logs", "news", "reboot",
         "register", "test", "users"]

with open("C:\\urls.txt", 'a') as urls:
    for name in names:
        url = urlparse.urljoin(baseurl, name)
        r = requests.get(url)
        if r.status_code == 200 and int(r.headers['content-length']) > 73:
            urls.write(url)  # Write it to the file
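
One thing to be careful with in this check: not every server sends a Content-Length header, so r.headers['content-length'] can raise a KeyError, and the question actually asked for an exact length of 83 rather than "greater than 73". A defensive variant is sketched below; the helper name, the 83 default, and the fallback to len(r.content) are illustrative, not part of the original answer:

import requests

def page_matches(url, expected_length=83):
    # True when the page responds with 200 and its body has exactly the
    # expected length. Falls back to measuring the body if the server
    # does not send a Content-Length header.
    r = requests.get(url)
    header = r.headers.get('content-length')
    body_length = int(header) if header is not None else len(r.content)
    return r.status_code == 200 and body_length == expected_length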

Thanks to the answerer above, who pointed me in the right direction.

import requests

url = "http://192.168.1.2/map/ShowPage.ashx?="
names = ["admin", "backup", "contact", "index", "logs", "news", "reboot",
         "register", "test", "users"]

with open("/root/Desktop/urls.txt", 'a') as urls:
    for name in names:
        newUrl = url + name
        r = requests.get(newUrl)
        c = r.text  # decoded response body
        # Keep the URL only when the page exists and is not the "Unknown" error page
        if r.status_code == 200 and "Unknown" not in c:
            urls.write(newUrl + '\n')
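
If this scan is ever pointed at hosts that may be slow or unreachable, a single hung request will stall the whole loop. A rough sketch of a hardened version of the check follows; the 5-second timeout and the helper name are arbitrary choices, not from the original post:

import requests

def url_is_live(url, marker="Unknown", timeout=5):
    # True when the page answers with 200 and the body does not contain
    # the error marker; unreachable hosts and timeouts simply return False.
    try:
        r = requests.get(url, timeout=timeout)
    except requests.RequestException:
        return False
    return r.status_code == 200 and marker not in r.text

The body of the for loop above could then call url_is_live(newUrl) before writing the URL out.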
