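I am trying to implement in Python the Office Open XML documentProtection hash protection of an MS Word (2019) document, in order to test the hashing algorithm. I created a Word document and protected it against editing with this password: johnjohn. Then, opening the document as ZIP/XML, I see the following in the documentProtection section: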
<w:documentProtection w:edit="readOnly" w:enforcement="1" w:cryptProviderType="rsaAES" w:cryptAlgorithmClass="hash" w:cryptAlgorithmType="typeAny" w:cryptAlgorithmSid="14" w:cryptSpinCount="100000" w:hash="pVjR9ktO9vlxijXcMPlH+4PLwD4Xwy1aqbNQOFmWaSpvBjipNh//T8S3nBhq6HRoRVfWL6s/+NdUCPTxUr0vZw==" w:salt="pH1TDVHSfGBxkd3Q88UNhQ==" />
According to the Office Open XML documentation (ECMA-376-1:2016 #17.15.1.29):
cryptAlgorithmSid="14" points to the SHA-512 algorithm.
cryptSpinCount="100000" means that the hashing must be done in 100k rounds, using the following algorithm (quoting the above standard):
Specifies the number of times the hashing function shall be iteratively run (runs using each iteration's result plus a 4 byte value (0-based, little endian) containing the number of the iteration as the input for the next iteration) when attempting to compare a user-supplied password with the value stored in the hashValue attribute.
The BASE64-encoded salt used for hashing ("pH1TDVHSfGBxkd3Q88UNhQ==") is prepended to the original password. The target BASE64-encoded hash must be "pVjR9ktO9vlxijXcMPlH+4PLwD4Xwy1aqbNQOFmWaSpvBjipNh//T8S3nBhq6HRoRVfWL6s/+NdUCPTxUr0vZw==".
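For reference, the attribute values above can be pulled out and decoded roughly as follows. This is only a sketch: it assumes the element sits under a w:settings root with the standard WordprocessingML namespace, and read_protection is a hypothetical helper name, not part of any library. It shows that the salt decodes to 16 bytes and the hash to 64 bytes, which is the digest size of SHA-512.

```python
import base64
import xml.etree.ElementTree as ET

# WordprocessingML namespace that the "w" prefix is bound to (assumed wrapper below).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# The documentProtection element from above, wrapped so that "w" is declared.
SNIPPET = (
    f'<w:settings xmlns:w="{W_NS}">'
    '<w:documentProtection w:edit="readOnly" w:enforcement="1" '
    'w:cryptAlgorithmSid="14" w:cryptSpinCount="100000" '
    'w:hash="pVjR9ktO9vlxijXcMPlH+4PLwD4Xwy1aqbNQOFmWaSpvBjipNh//'
    'T8S3nBhq6HRoRVfWL6s/+NdUCPTxUr0vZw==" '
    'w:salt="pH1TDVHSfGBxkd3Q88UNhQ=="/>'
    '</w:settings>'
)

def read_protection(xml_text):
    """Hypothetical helper: return the decoded protection parameters."""
    root = ET.fromstring(xml_text)
    elem = root.find(f"{{{W_NS}}}documentProtection")

    def attr(name):
        # Namespaced attributes are stored by ElementTree as "{namespace}name".
        return elem.get(f"{{{W_NS}}}{name}")

    return {
        "sid": int(attr("cryptAlgorithmSid")),
        "spin_count": int(attr("cryptSpinCount")),
        "salt": base64.b64decode(attr("salt")),
        "hash": base64.b64decode(attr("hash")),
    }

info = read_protection(SNIPPET)
print(len(info["salt"]), len(info["hash"]), info["spin_count"])
```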
So my Python script tries to generate the same hash using the algorithm described above:
import hashlib
import base64
import struct

TARGET_HASH = 'pVjR9ktO9vlxijXcMPlH+4PLwD4Xwy1aqbNQOFmWaSpvBjipNh//T8S3nBhq6HRoRVfWL6s/+NdUCPTxUr0vZw=='
TARGET_SALT = 'pH1TDVHSfGBxkd3Q88UNhQ=='
bsalt = base64.b64decode(TARGET_SALT)

def hashit(what, alg='sha512', **kwargs):
    if alg == 'sha1':
        return hashlib.sha1(what)
    elif alg == 'sha512':
        return hashlib.sha512(what)
    # etc...
    else:
        raise Exception(f'Unsupported hash algorithm: {alg}')

def gethash(data, salt=None, alg='sha512', iters=100000, base64result=True, returnstring=True):
    # encode password in UTF-16LE
    # ECMA-376-1:2016 17.15.1.29 (p. 1026)
    if isinstance(data, str):
        data = data.encode('utf-16-le')
    # prepend salt if provided
    if salt is not None:
        if isinstance(salt, str):
            salt = salt.encode('utf-16-le')
        ghash = salt + data
    else:
        ghash = data
    # hash iteratively for 'iters' rounds
    for i in range(iters):
        try:
            # next hash = hash(previous data) + 4-byte integer (previous round number) with LE byte ordering
            # ECMA-376-1:2016 17.15.1.29 (p. 1020)
            ghash = hashit(ghash, alg).digest() + struct.pack('<I', i)
        except Exception as err:
            print(err)
            break
    # remove trailing round number bytes
    ghash = ghash[:-4]
    # BASE64 encode if requested
    if base64result:
        ghash = base64.b64encode(ghash)
    # return as an ASCII string if requested
    if returnstring:
        ghash = ghash.decode()
    return ghash
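But when I run
print(gethash('johnjohn', bsalt))
I get the following hash, which is not equal to the target one:
G47RT4/+JdE6pnrP6MqUKa3JyL8abeYSCX+E4+9J+6shiZqImBJ8M6bb+IMKEdvKd6+9dVnQ3oeOsgQz/aCdcQ==
Could my implementation be wrong somewhere, or do you think the low-level hash function implementations differ (Python's hashlib vs. Office Open XML)?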
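One way to rule out the second possibility is to check hashlib against the published SHA-512 test vector for the message "abc". The sketch below assumes the reference digest quoted from the FIPS 180 examples; hashlib_matches_reference is a hypothetical name used only for illustration. If the check passes, the mismatch is more likely in how the input to the hash is built than in the hash function itself.

```python
import hashlib

# Published SHA-512 digest of the 3-byte message "abc" (FIPS 180 example).
ABC_SHA512 = (
    "ddaf35a193617abacc417349ae204131"
    "12e6fa4e89a97ea20a9eeee64b55d39a"
    "2192992a274fc1a836ba3c23a3feebbd"
    "454d4423643ce80e2a9ac94fa54ca49f"
)

def hashlib_matches_reference():
    """Return True if hashlib's SHA-512 reproduces the reference digest."""
    return hashlib.sha512(b"abc").hexdigest() == ABC_SHA512

print(hashlib_matches_reference())
```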
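I have since realized that Word uses a legacy algorithm to pre-process the password (for compatibility with older versions). This algorithm is described in detail in ECMA-376-1:2016 Part 4 (Transitional Migration Features, #14.8.1 "Legacy Password Hash Algorithm"). So I managed to write a script that reproduces the official ECMA example: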
def strtobytes(s, trunc=15):
    b = s.encode('utf-16-le')
    # 'utf-16-le' never emits a BOM, so there is nothing to strip here
    pwdlen = min(trunc, len(s))
    if pwdlen < 1:
        return None
    # per UTF-16 code unit: keep the low byte, or the high byte if the low byte is zero
    return bytes([b[i] or b[i+1] for i in range(0, pwdlen * 2, 2)])

def process_pwd(pwd):
    # InitialCodeArray and EncryptionMatrix are the lookup tables of the
    # legacy password hash algorithm (not shown here)
    # 1. PREPARE PWD STRING (TRUNCATE, CONVERT TO BYTES)
    pw = strtobytes(pwd) if isinstance(pwd, str) else pwd[:15]
    if not pw:
        # empty password: let the caller handle it
        return None
    pwdlen = len(pw)
    # 2. HIGH WORD CALC
    HW = InitialCodeArray[pwdlen - 1]
    for i in range(pwdlen):
        r = 15 - pwdlen + i
        for ibit in range(7):
            if (pw[i] & (0x0001 << ibit)):
                HW ^= EncryptionMatrix[r][ibit]
    # 3. LO WORD CALC
    LW = 0
    for i in reversed(range(pwdlen)):
        LW = (((LW >> 14) & 0x0001) | ((LW << 1) & 0x7FFF)) ^ pw[i]
    LW = (((LW >> 14) & 0x0001) | ((LW << 1) & 0x7FFF)) ^ pwdlen ^ 0xCE4B
    # 4. COMBINE AND REVERSE
    return bytes([LW & 0xff, LW >> 8, HW & 0xff, HW >> 8])
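So when I call process_pwd('Example'), I get what ECMA says (0x7EEDCE64). The hash function was also modified (as I found on a forum, the initial salt + password hash should be computed outside the main iteration loop):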
def gethash(data, salt=None, alg='sha512', iters=100000, base64result=True, returnstring=True):
    def hashit(what, alg='sha512'):
        return getattr(hashlib, alg)(what)
    # encode password with legacy algorithm if a string is given
    if isinstance(data, str):
        data = process_pwd(data)
        if data is None:
            print('WRONG PASSWORD STRING!')
            return None
    # prepend salt if provided
    if salt is not None:
        if isinstance(salt, str):
            salt = process_pwd(salt)
            if salt is None:
                print('WRONG SALT STRING!')
                return None
        ghash = salt + data
    else:
        ghash = data
    # initial hash (salted)
    ghash = hashit(ghash, alg).digest()
    # hash iteratively for 'iters' rounds
    for i in range(iters):
        try:
            # next hash = hash(previous data + 4-byte integer (previous round number) with LE byte ordering)
            # ECMA-376-1:2016 17.15.1.29 (p. 1020)
            ghash = hashit(ghash + struct.pack('<I', i), alg).digest()
        except Exception as err:
            print(err)
            return None
    # BASE64 encode if requested
    if base64result:
        ghash = base64.b64encode(ghash)
    # return as an ASCII string if requested
    if returnstring:
        ghash = ghash.decode()
    return ghash
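No matter how many times I re-check this code, I cannot find any more errors. But I still cannot reproduce the target hash from the test Word document: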
myhash = gethash('johnjohn', base64.b64decode('pH1TDVHSfGBxkd3Q88UNhQ=='))
print(myhash)
print(TARGET_HASH == myhash)
I get:
wut2VOpT+X8pKXky6u/+YtwRX2inDv1WVC8FtZcdxKsyX0gHNBJGYwBgV8xzq7Rke/hWMfWe9JVvqDQAZ11A5w==
False
We had to look into this today as well and managed to reverse-engineer it.
In short, the steps are:
low-order word = (((low-order word >> 14) & 0x0001) | ((low-order word << 1) & 0x7FFF)) ^ character (byte)
(<< and >> are the bitwise left-shift and right-shift operators, respectively. |, & and ^ are bitwise OR, AND and XOR, respectively.)
low-order word = (((low-order word >> 14) & 0x0001) | ((low-order word << 1) & 0x7FFF)) ^ password length ^ 0xCE4B
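The two low-order word formulas can be sketched in Python as below. This is an illustration only, assuming the characters are processed from the last to the first, as in the question's process_pwd, and that the password has already been reduced to single-byte characters; rotate15 and low_order_word are hypothetical names. For the ECMA example password 'Example', whose reversed key the question gives as 0x7EEDCE64, the low-order word works out to 0xED7E.

```python
def rotate15(value):
    """Rotate a 15-bit value left by one bit (bit 14 wraps around to bit 0)."""
    return ((value >> 14) & 0x0001) | ((value << 1) & 0x7FFF)

def low_order_word(pw):
    """Low-order word of the legacy hash.

    pw: password as single-byte characters (bytes), at most 15 of them.
    """
    low = 0
    # Assumption: iterate from the last character to the first.
    for ch in reversed(pw):
        low = rotate15(low) ^ ch
    # Final step mixes in the password length and the constant 0xCE4B.
    return rotate15(low) ^ len(pw) ^ 0xCE4B

print(hex(low_order_word(b"Example")))  # 0xed7e
```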
Here is my implementation in C# (NuGet):
I tested it with your example hash and verified that it passes: