I want to check an endless number of self-generated URLs for validity and, when one is valid, save the response body to a file. The URLs look like this: https://mydomain.com/ plus a random string (e.g. https://mydomain.com/ake3t). I want to generate them from the alphabet "abcdefghijklmnopqrstuvwxyz0123456789" and brute-force all possible combinations.
I wrote a script in Python, but since I am an absolute beginner it is very slow! Because I need something much faster, I tried a different tool, since I thought it was made for exactly this kind of job.
The problem now is that I don't know how to generate the URLs on the fly; I cannot pre-generate them, because it is not a fixed number of URLs.
Can anybody show me how to do this, or recommend another tool or library better suited for this job?
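For what it's worth, the candidate strings can be produced lazily, so nothing ever has to be pre-generated; here is a minimal sketch using `itertools.product` (the helper name `candidates` is my own, not from the question):

```python
import itertools

def candidates(charset):
    """Yield every string over charset, shortest first: a, b, ..., aa, ab, ..."""
    for length in itertools.count(1):        # 1, 2, 3, ... without end
        for combo in itertools.product(charset, repeat=length):
            yield ''.join(combo)

# pull candidates one at a time; the infinite sequence is never materialised
gen = candidates('ab')
print([next(gen) for _ in range(6)])   # ['a', 'b', 'aa', 'ab', 'ba', 'bb']
```

Each URL is then just `'https://mydomain.com/' + part`, built from one candidate at a time as the generator is consumed.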
UPDATE: This is the script I am using, but I think it is very slow. What worries me most is that it gets even slower when I use more than one thread (set via threadsNr).
import threading, os
import urllib.request, urllib.parse, urllib.error

threadsNr = 1
dumpFolder = '/tmp/urls/'
charSet = 'abcdefghijklmnopqrstuvwxyz0123456789_-'
Url_pre = 'http://vorratsraum.com/'
Url_post = 'alwaysTheSameTail'

# class that generates the words
class wordGenerator():
    def __init__(self, word, charSet):
        self.currentWord = word
        self.charSet = charSet

    # generate the next word, store it as currentWord and return it
    def nextWord(self):
        self.currentWord = self._incWord(self.currentWord)
        return self.currentWord

    # increment the word like an odometer over the character set
    def _incWord(self, word):
        word = str(word)                        # convert to string
        if word == '':                          # if word is empty
            return self.charSet[0]              # return the first char of the set
        wordLastChar = word[-1]                 # get the last char
        wordLeftSide = word[:-1]                # get the word without its last char
        lastCharPos = self.charSet.find(wordLastChar)   # position of last char in set
        if lastCharPos + 1 < len(self.charSet): # not yet at the end of the char set
            wordLastChar = self.charSet[lastCharPos + 1]    # take the next char
        else:                                   # it is the last char
            wordLastChar = self.charSet[0]      # wrap around to the first char
            wordLeftSide = self._incWord(wordLeftSide)      # carry into the left side
        return wordLeftSide + wordLastChar      # return the next word

class newThread(threading.Thread):
    def run(self):
        global exitThread
        global wordsTried
        global newWord
        while not exitThread:
            with lock:                          # generator and counter are shared state
                part = newWord.nextWord()       # generate the next word to try
                wordsTried = wordsTried + 1
                if wordsTried == 1000:          # just for testing how fast it is
                    exitThread = True
            url = Url_pre + part + Url_post
            print('trying ' + part)             # display the word
            print('At URL ' + url)
            try:
                req = urllib.request.Request(url)
                req.add_header('User-agent', 'Mozilla/5.0')
                resp = urllib.request.urlopen(req)
                result = resp.read()
                found(part, result)
            except urllib.error.HTTPError as err:   # HTTP errors carry a status code
                if err.code == 404:
                    print('Page not found!')
                elif err.code == 403:
                    print('Access denied!')
                else:
                    print('Something happened! Error code', err.code)
            except urllib.error.URLError as err:    # lower-level network failures
                print('Some other error happened:', err.reason)

def found(part, result):
    global resultFile
    with lock:                                  # the result file is shared, too
        resultFile.write(part + "\n")
    if not os.path.isdir(dumpFolder + part):
        os.makedirs(dumpFolder + part)
    print('Found Part = ' + part)

wordsTried = 0
exitThread = False                      # flag to stop all threads
lock = threading.Lock()                 # protects the generator, counter and result file
newWord = wordGenerator('', charSet)    # word generator
if not os.path.isdir(dumpFolder):
    os.makedirs(dumpFolder)
resultFile = open(dumpFolder + 'parts.txt', 'a')    # open file for appending
threads = [newThread() for i in range(threadsNr)]
for t in threads:
    t.start()
for t in threads:
    t.join()
resultFile.close()                      # close only after all threads are done
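A thread pool from the standard library would replace the hand-rolled thread and lock management above, letting the network waits of many requests overlap. This is only a sketch assuming the same base URL and tail as the script; `words` and `check` are hypothetical helper names:

```python
import itertools
import urllib.request, urllib.error
from concurrent.futures import ThreadPoolExecutor

CHARSET = 'abcdefghijklmnopqrstuvwxyz0123456789_-'

def words(charset, limit):
    """The first `limit` strings over charset, shortest first."""
    gen = (''.join(c) for n in itertools.count(1)
           for c in itertools.product(charset, repeat=n))
    return itertools.islice(gen, limit)

def check(part):
    """Return (part, body) if the URL answers successfully, else (part, None)."""
    url = 'http://vorratsraum.com/' + part + 'alwaysTheSameTail'
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return part, resp.read()
    except urllib.error.URLError:
        return part, None

if __name__ == '__main__':
    # demonstrate the pool pattern on a harmless local function first
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(list(pool.map(len, ['a', 'bb', 'ccc'])))   # [1, 2, 3]
    # the real (network) run would be:
    #     pool.map(check, words(CHARSET, 1000))
```

`pool.map` keeps results in input order and shuts the workers down cleanly when the `with` block ends, so no global exit flag is needed.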
Do you want brute force or random? Below is a sequential brute-force method with repeating characters. How fast it runs will depend heavily on your server's response times. Also note that this will most likely create a denial-of-service condition very quickly.
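Such a sequential brute force with repeating characters can be sketched as follows; `brute_force`, `probe`, and `http_probe` are hypothetical names, and the URL check is injected as a callback so the loop can be exercised without real traffic:

```python
import itertools
import urllib.request, urllib.error

def brute_force(charset, max_len, probe):
    """Sequentially try every string over charset up to max_len.

    probe(part) must return True if the URL built from part is valid.
    """
    hits = []
    for length in range(1, max_len + 1):
        # product(..., repeat=n) allows characters to repeat within a word
        for combo in itertools.product(charset, repeat=length):
            part = ''.join(combo)
            if probe(part):
                hits.append(part)
    return hits

def http_probe(part, base='https://mydomain.com/'):
    """One possible probe: a real request that treats any HTTP error as a miss."""
    try:
        with urllib.request.urlopen(base + part, timeout=10):
            return True
    except urllib.error.URLError:
        return False

# dry run with a fake probe instead of real requests
print(brute_force('ab', 2, lambda p: p in ('b', 'ab')))   # ['b', 'ab']
```

Every candidate still costs one full network round trip, which is why the caveat about server response time applies unchanged.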
You cannot check an "endless number of URLs" without it being "very slow", beginner or not.
The time your scraper takes is almost certainly dominated by the response time of the server you are accessing, not by the efficiency of your script.
What exactly are you trying to do?