In Python, is there a better way to iterate over two lists to find the relationship between their items?

Posted 2024-10-04 11:32:21


I mock up an IP list and a subnet dict as input:

# ip address list
ip_list = [
'192.168.1.151', '192.168.10.191', '192.168.6.127', 
'192.168.2.227', '192.168.2.5', '192.168.3.237', 
'192.168.6.188', '192.168.7.209', '192.168.9.10',
# Edited: add some /28, /16 case
'192.168.12.39', '192.168.12.58', '10.63.11.1', '10.63.102.69',
]

# subnet dict
netsets = {
'192.168.1.0/24': 'subnet-A',     # {subnet: subnet's name} 
'192.168.10.0/24': 'subnet-B', 
'192.168.2.0/24': 'subnet-C', 
'192.168.3.0/24': 'subnet-C',
'192.168.6.0/24': 'subnet-D', 
'192.168.7.0/24': 'subnet-D', 
'192.168.9.0/24': 'subnet-E',
# Edited: add some /28, /16 case
'192.168.12.32/28': 'subnet-F',
'192.168.12.48/28': 'subnet-G',
'10.63.0.0/16': 'subnet-I',
}

Then, for each IP address in ip_list, I need to find the name of the subnet it belongs to.

We assume that every IP address has a matching subnet in netsets.

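The expected output looks like this:

192.168.1.151    subnet-A
192.168.10.191   subnet-B
192.168.6.127    subnet-D
192.168.2.227    subnet-C
192.168.2.5      subnet-C
192.168.3.237    subnet-C
192.168.6.188    subnet-D
192.168.7.209    subnet-D
192.168.9.10     subnet-E
192.168.12.39    subnet-F
192.168.12.58    subnet-G
10.63.11.1       subnet-I
10.63.102.69     subnet-I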

I use netaddr to do the CIDR calculation. Here is my code:

from netaddr import IPAddress, IPNetwork

def netaddr_test(ips, netsets):
    for ip in ips:
        for subnet, name in netsets.iteritems():
            if IPAddress(ip) in IPNetwork(subnet):
                print ip, '\t',  name
                break

netaddr_test(ip_list, netsets)

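But this code is too slow because it iterates too much. The time complexity is O(n**2).

Once there are tens of thousands of IPs to iterate over, this code takes far too long.

Is there a better way to solve this problem?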


3 Answers

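I suggest avoiding the creation of new instances inside the for loop. This does not reduce the complexity (if anything, it adds to it), but it does speed up netaddr_test, especially when it is called many times. Example: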

from netaddr import IPAddress, IPNetwork

def _init(ips, netsets):
    """Build all IPAddress and IPNetwork objects once, up front"""
    new_ips = []
    new_subs = {}
    for ip in ips:
        new_ips.append(IPAddress(ip))

    for subnet, info in netsets.iteritems():
        new_subs[subnet] = {'name': info, 'subnet': IPNetwork(subnet)}

    return new_ips, new_subs

def netaddr_test(ips, netsets):
    for ip in ips:
        for stringnet, info in netsets.iteritems():
            if ip in info['subnet']:
                print ip, '\t',  info['name']
                break

ni, ns = _init(ip_list, netsets)
netaddr_test(ni, ns)

Update: I tested the code above with:

^{pr2}$

Results:

# Original
$ time python /tmp/test.py > /dev/null

real    0m0.357s
user    0m0.345s
sys     0m0.012s

# Modified
$ time python /tmp/test2.py > /dev/null

real    0m0.126s
user    0m0.122s
sys     0m0.005s

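Now, I have never used netaddr, so I am not sure how it handles subnets internally. In your case, you can treat a subnet as a range of IPs, and each IP is a uint_32, so you can convert everything to integers: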

# IPs are now integers
ip_list_int = [3232235927, 3232238271, ...]

netsets_expanded = {
    '192.168.1.0/24': {'name': 'subnet-A', 'start': 3232235776, 'end': 3232236031},
    # ...
}

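netaddr can be used to convert your data into the format above. Once it is in that form, your netaddr_test becomes the following (and works with integer comparisons only):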

def netaddr_test(ips, netsets):
    for ip in ips:
        for subnet, subinfo in netsets.iteritems():
            # 'end' is the last address of the subnet, so the bound is inclusive
            if ip >= subinfo['start'] and ip <= subinfo['end']:
                print ip, '\t',  subinfo['name']
                break
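
As a rough sketch of that conversion, the standard-library ipaddress module can produce the integer form shown above. It is swapped in here for netaddr only so the example has no third-party dependency. The helper names expand_netsets and find_subnet are hypothetical, and the code is written for Python 3, unlike the Python 2 snippets in this answer.

```python
from ipaddress import ip_address, ip_network

def expand_netsets(netsets):
    """Map each CIDR string to its name and inclusive integer bounds."""
    expanded = {}
    for cidr, name in netsets.items():
        net = ip_network(cidr)
        expanded[cidr] = {
            'name': name,
            'start': int(net.network_address),
            'end': int(net.broadcast_address),  # inclusive upper bound
        }
    return expanded

def find_subnet(ip_int, expanded):
    """Linear scan using integer comparisons only; None if nothing matches."""
    for info in expanded.values():
        if info['start'] <= ip_int <= info['end']:
            return info['name']
    return None

# Small subset of the question's data, for illustration
netsets = {'192.168.1.0/24': 'subnet-A', '192.168.12.32/28': 'subnet-F'}
expanded = expand_netsets(netsets)
ip_int = int(ip_address('192.168.1.151'))
print(ip_int, find_subnet(ip_int, expanded))
```

The scan is still linear in the number of subnets for each IP. What this version saves is the per-comparison cost of building address and network objects.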

I can recommend the intervaltree module, which is specifically optimized for fast searching. With it, the task can be done in O(m*log n) time. For example:

from intervaltree import Interval, IntervalTree
from ipaddress import ip_network, ip_address

# Build the tree of subnets. Interval ends are exclusive in intervaltree,
# so use broadcast address + 1 to keep the last address of each subnet.
netstree = IntervalTree(
    Interval(
        int(ip_network(net).network_address),
        int(ip_network(net).broadcast_address) + 1,
        name,
    )
    for net, name in netsets.items()
)

# Now look up each IP address in the tree
for i in ip_list:
    ip = int(ip_address(i))
    nets = netstree[ip]
    if nets:   # set is not empty
        netdata = list(nets)[0]
        print(netdata.data)   # the subnet name

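
If installing intervaltree is not an option, a similar logarithmic lookup can be sketched with the standard-library bisect module. This is a different technique from the interval tree above, and it assumes the subnets do not overlap (true for the netsets in the question). The names build_index and lookup_bisect are hypothetical.

```python
from bisect import bisect_right
from ipaddress import ip_address, ip_network

def build_index(netsets):
    """Return parallel lists (starts, ends, names) sorted by range start."""
    ranges = sorted(
        (int(ip_network(cidr).network_address),
         int(ip_network(cidr).broadcast_address),
         name)
        for cidr, name in netsets.items()
    )
    starts = [r[0] for r in ranges]
    ends = [r[1] for r in ranges]
    names = [r[2] for r in ranges]
    return starts, ends, names

def lookup_bisect(ip, index):
    """Binary-search for the last range that starts at or before ip."""
    starts, ends, names = index
    ip_int = int(ip_address(ip))
    pos = bisect_right(starts, ip_int) - 1
    # Assumes non-overlapping subnets: only one candidate range to check
    if pos >= 0 and ip_int <= ends[pos]:
        return names[pos]
    return None

netsets = {
    '192.168.1.0/24': 'subnet-A',
    '192.168.12.32/28': 'subnet-F',
    '192.168.12.48/28': 'subnet-G',
    '10.63.0.0/16': 'subnet-I',
}
index = build_index(netsets)
for ip in ['192.168.1.151', '192.168.12.58', '10.63.102.69']:
    print(ip, lookup_bisect(ip, index))
```

Building the index sorts the subnets once, and each lookup is then a single binary search over the sorted range starts.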
# ip address list
ip_list = [
'192.168.1.151', '192.168.10.191', '192.168.6.127',
'192.168.2.227', '192.168.2.5', '192.168.3.237',
'192.168.6.188', '192.168.7.209', '192.168.9.10'
]

# subnet dict
netsets = {
'192.168.1.0/24': 'subnet-A',     # {subnet: subnet's name} 
'192.168.10.0/24': 'subnet-B',
'192.168.2.0/24': 'subnet-C',
'192.168.3.0/24': 'subnet-C',
'192.168.6.0/24': 'subnet-D',
'192.168.7.0/24': 'subnet-D',
'192.168.9.0/24': 'subnet-E',
}
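# Key each subnet by its first three octets (this only works for /24 networks)
new_netsets = {}
for k, v in netsets.items():
    new_netsets['.'.join(k.split('.')[:3])] = v

# Look up each IP by its first three octets
for IP in ip_list:
    newIP = '.'.join(IP.split('.')[:3])
    print IP, new_netsets[newIP]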

Hope this helps.
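
The dictionary idea above depends on every subnet being a /24. One way it might be extended to the /28 and /16 cases from the question is to key the table by prefix length and masked address, then try each prefix length that occurs. This is only a sketch for IPv4 in Python 3, and build_prefix_table and lookup_prefix are hypothetical names.

```python
from ipaddress import ip_address, ip_network

def build_prefix_table(netsets):
    """Map (prefix length, network address as int) to the subnet name."""
    table = {}
    for cidr, name in netsets.items():
        net = ip_network(cidr)
        table[(net.prefixlen, int(net.network_address))] = name
    # Try longer (more specific) prefixes first
    prefix_lengths = sorted({plen for plen, _ in table}, reverse=True)
    return table, prefix_lengths

def lookup_prefix(ip, table, prefix_lengths):
    """Mask the IP with each known prefix length and do a dict lookup."""
    ip_int = int(ip_address(ip))
    for plen in prefix_lengths:
        mask = (0xFFFFFFFF << (32 - plen)) & 0xFFFFFFFF  # IPv4 netmask
        name = table.get((plen, ip_int & mask))
        if name is not None:
            return name
    return None

netsets = {
    '192.168.2.0/24': 'subnet-C',
    '192.168.12.32/28': 'subnet-F',
    '192.168.12.48/28': 'subnet-G',
    '10.63.0.0/16': 'subnet-I',
}
table, prefix_lengths = build_prefix_table(netsets)
for ip in ['192.168.2.5', '192.168.12.58', '10.63.102.69']:
    print(ip, lookup_prefix(ip, table, prefix_lengths))
```

Each lookup costs one dictionary probe per distinct prefix length (three in this data), regardless of how many subnets there are.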
