Grouping lines of text: if A=B and B=C, then A=C

Published 2024-09-27 20:18:25


I am currently working on parsing the output of the dig command. It prints the canonical names first and then, in the last record, the actual IP.

For example, resolving mail.yahoo.com with dig gives the following:

borrajax@borrajax.kom /tmp/ $ dig @8.8.8.8 @4.2.2.2 +nocomments \
     +noquestion +noauthority +noadditional \
     +nostats +nocmd mail.yahoo.com

mail.yahoo.com.     0   IN  CNAME   login.yahoo.com.
login.yahoo.com.    0   IN  CNAME   ats.login.lgg1.b.yahoo.com.
ats.login.lgg1.b.yahoo.com. 0   IN  CNAME   ats.member.g02.yahoodns.net.
ats.member.g02.yahoodns.net. 0  IN  CNAME   any-ats.member.a02.yahoodns.net.
any-ats.member.a02.yahoodns.net. 49 IN  A   98.139.21.169

So I want to say that mail.yahoo.com resolves to 98.139.21.169. To do that, I need to merge mail.yahoo.com into login.yahoo.com, then login.yahoo.com into ats.login.lgg1.b.yahoo.com, and so on, until the final A record is reached.

In another question I already got a good regexp for parsing dig's output, so I can clean up the lines and store them in a list:

[
    ('mail.yahoo.com', 'CNAME', 'login.yahoo.com'),
    ('login.yahoo.com', 'CNAME', 'ats.login.lgg1.b.yahoo.com'),
    ('ats.login.lgg1.b.yahoo.com', 'CNAME', 'ats.member.g02.yahoodns.net'),
    ('ats.member.g02.yahoodns.net', 'CNAME', 'any-ats.member.a02.yahoodns.net'),
    ('any-ats.member.a02.yahoodns.net', 'A', '98.139.21.169')
]
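
For readers without the regexp from that other question, here is a minimal sketch of how dig answer lines could be turned into such tuples. The pattern `RECORD_RE` and the function `parse_dig` are hypothetical names, not the regexp the question refers to, and the sketch assumes the simple whitespace-separated layout shown above (name, TTL, class `IN`, record type, value).

```python
import re

# Hypothetical pattern: name, TTL, class IN, record type, value.
# Trailing dots on host names are dropped.
RECORD_RE = re.compile(r"^(\S+?)\.?\s+\d+\s+IN\s+(\S+)\s+(\S+?)\.?$")

def parse_dig(output):
    """Turn dig answer lines into (name, record_type, value) tuples."""
    records = []
    for line in output.splitlines():
        match = RECORD_RE.match(line.strip())
        if match:  # silently skip blank or unrecognized lines
            records.append(match.groups())
    return records

sample = """\
mail.yahoo.com.     0   IN  CNAME   login.yahoo.com.
any-ats.member.a02.yahoodns.net. 49 IN  A   98.139.21.169
"""
parsed = parse_dig(sample)
```

Lines that do not match the assumed layout are ignored rather than reported, which may or may not be what you want for real dig output.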

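The question is: how can I do this efficiently and in a general way, so that it also works if there are other, unrelated lines in between the CNAMEs: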

[
    ('mail.yahoo.com', 'CNAME', 'login.yahoo.com'),
    ('foo.com', 'CNAME', 'baz.com'),    # Wooops, watch out!
    ('login.yahoo.com', 'CNAME', 'ats.login.lgg1.b.yahoo.com'),
    ('ats.login.lgg1.b.yahoo.com', 'CNAME', 'ats.member.g02.yahoodns.net'),
    ('baz.com', 'A', '204.236.134.199'), # Wooops, watch out!
    ('ats.member.g02.yahoodns.net', 'CNAME', 'any-ats.member.a02.yahoodns.net'),
    ('any-ats.member.a02.yahoodns.net', 'A', '98.139.21.169')
]

The desired output is:

  • mail.yahoo.com resolves to 98.139.21.169
  • foo.com resolves to 204.236.134.199

Of course, I could go through all the CNAMEs and, each time I find one, look up its actual resolution, but that would be O(n^2)... awful.

I'm sure there is a better way, but I can't come up with one. Thanks in advance for any suggestions.
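
For contrast, the quadratic approach described above might look roughly like the sketch below. The name `resolve_naive` is hypothetical, and the hop limit is an added assumption to keep a CNAME loop from running forever. Every hop rescans the whole list, which is where the O(n^2) comes from. Note that it also reports intermediate names such as login.yahoo.com, not only the starting points.

```python
def resolve_naive(records):
    """Resolve every CNAME by rescanning the full list at each hop."""
    resolved = {}
    for name, record_type, target in records:
        if record_type != 'CNAME':
            continue
        # Bound the number of hops so a CNAME loop cannot spin forever.
        for _ in range(len(records)):
            # Linear scan for the record whose name is the current target.
            match = next((r for r in records if r[0] == target), None)
            if match is None:
                break
            target = match[2]
            if match[1] == 'A':
                break
        resolved[name] = target
    return resolved

records = [
    ('mail.yahoo.com', 'CNAME', 'login.yahoo.com'),
    ('foo.com', 'CNAME', 'baz.com'),
    ('login.yahoo.com', 'CNAME', 'any-ats.member.a02.yahoodns.net'),
    ('baz.com', 'A', '204.236.134.199'),
    ('any-ats.member.a02.yahoodns.net', 'A', '98.139.21.169'),
]
naive = resolve_naive(records)
```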


2 Answers

Here is my solution (see the comments for more about the algorithm):

import copy

def resolve(arr):
    # create an index for easy access of the urls
    index = {item[0]: item[2] for item in arr}
    # copy the index 
    mapping = copy.copy(index)

    # loop through the index
    for index_key in index: 
        # get the current value
        value = index[index_key]
        # follow the chain until the final IP address is reached;
        # hosts already merged into another chain are no longer in mapping
        while value in mapping:
            # remember the new key (so it can be deleted afterwards)
            key = value
            # get the new value
            value = mapping[key]
            # save the found value as the new value (for later use)
            # this reduces the complexity (-> better performance)
            mapping[index_key] = value
            # delete the "one in the middle" from the mapping dict
            # so that later items don't have to walk through it again
            # (its final target has already been found)
            del mapping[key]

    return mapping

With this script you can see that it produces the same output no matter how the list is ordered:

import random

data = [
    ('mail.yahoo.com', 'CNAME', 'login.yahoo.com'),
    ('foo.com', 'CNAME', 'baz.com'),    # Wooops, watch out!
    ('login.yahoo.com', 'CNAME', 'ats.login.lgg1.b.yahoo.com'),
    ('ats.login.lgg1.b.yahoo.com', 'CNAME', 'ats.member.g02.yahoodns.net'),
    ('baz.com', 'A', '204.236.134.199'), # Wooops, watch out!
    ('ats.member.g02.yahoodns.net', 'CNAME', 'any-ats.member.a02.yahoodns.net'),
    ('any-ats.member.a02.yahoodns.net', 'A', '98.139.21.169')
]

# test 50 times
for x in range(50):
    # shuffle the data list
    random.shuffle(data)

    print(resolve(data))

I would build a dict and resolve the chains from there:

data = [
    ('mail.yahoo.com', 'CNAME', 'login.yahoo.com'),
    ('foo.com', 'CNAME', 'baz.com'),    # Wooops, watch out!
    ('login.yahoo.com', 'CNAME', 'ats.login.lgg1.b.yahoo.com'),
    ('ats.login.lgg1.b.yahoo.com', 'CNAME', 'ats.member.g02.yahoodns.net'),
    ('baz.com', 'A', '204.236.134.199'), # Wooops, watch out!
    ('ats.member.g02.yahoodns.net', 'CNAME', 'any-ats.member.a02.yahoodns.net'),
    ('any-ats.member.a02.yahoodns.net', 'A', '98.139.21.169')
]

data = { t[0]:t[1:] for t in data }

def lookup(host):
    record_type = None
    while record_type != 'A':
        record_type, host = data[host]
    return host

assert lookup('mail.yahoo.com') == '98.139.21.169'
assert lookup('foo.com') == lookup('baz.com') == '204.236.134.199'
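
The lookup above raises a KeyError if a chain has no A record and never returns if the CNAMEs form a loop. As a hedged variation on the same dict idea, the sketch below only reports the starting names (those no CNAME points at), which matches the desired output in the question, and it silently skips chains that are broken or circular. The name `resolve_roots` is hypothetical.

```python
def resolve_roots(records):
    """Map each chain's starting name to its final A-record address."""
    table = {name: (record_type, value) for name, record_type, value in records}
    # Names that some CNAME points at are intermediate links, not starting points.
    pointed_at = {value for _, record_type, value in records if record_type == 'CNAME'}

    result = {}
    for name in table:
        if name in pointed_at:
            continue
        seen = set()  # guards against CNAME loops
        host = name
        while host in table and host not in seen:
            seen.add(host)
            record_type, value = table[host]
            if record_type == 'A':
                result[name] = value
                break
            host = value
    return result

records = [
    ('mail.yahoo.com', 'CNAME', 'login.yahoo.com'),
    ('foo.com', 'CNAME', 'baz.com'),
    ('login.yahoo.com', 'CNAME', 'ats.login.lgg1.b.yahoo.com'),
    ('ats.login.lgg1.b.yahoo.com', 'CNAME', 'ats.member.g02.yahoodns.net'),
    ('baz.com', 'A', '204.236.134.199'),
    ('ats.member.g02.yahoodns.net', 'CNAME', 'any-ats.member.a02.yahoodns.net'),
    ('any-ats.member.a02.yahoodns.net', 'A', '98.139.21.169'),
]
roots = resolve_roots(records)
```

When chains do not share links, each record is visited once, so the work is roughly linear in the number of records.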
