How to convert a list of tuples into a raw list in a more optimized manner

Posted 2024-10-01 00:27:07


This is my text file; each line contains one tuple:

(1, 2)
(3, 4)
(5, 6)

What is the simplest and most optimized way to read the above file and produce a list with the following structure?

[[1,2],[3,4],[5,6]]

This is my current approach, which is not what I actually want:

with open("agentListFile.txt") as f:
    # This only strips the newline; each element is still a string such as "(1, 2)"
    agentList = [line.rstrip('\n') for line in f]

3 Answers

This is the fastest solution I have been able to come up with so far:

import re

def re_sol1():
    ''' re.findall on whole file w/ capture groups '''
    with open('agentListFile.txt') as f:
        numpairs = [[int(numstr)
            for numstr in numpair]
            for numpair in re.findall(r'(\d+), (\d+)', f.read())]
        return numpairs

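It takes advantage of the fact that all values are positive integers. Combining capture groups in the regular expression with re.findall efficiently extracts each pair of positive-integer strings, which are then mapped to integers in the list.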
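To see what re.findall returns when the pattern contains capture groups, here is a small sketch that applies the same pattern to an in-memory string. The sample text is an assumption standing in for the contents of agentListFile.txt.

```python
import re

# Assumed sample text in the question's format, kept in memory instead of a file.
text = "(1, 2)\n(3, 4)\n(5, 6)\n"

# With two capture groups, re.findall returns one tuple of strings per match.
pairs = re.findall(r'(\d+), (\d+)', text)
print(pairs)  # [('1', '2'), ('3', '4'), ('5', '6')]

# Convert each tuple of digit strings into a list of ints.
numpairs = [[int(s) for s in pair] for pair in pairs]
print(numpairs)  # [[1, 2], [3, 4], [5, 6]]
```

Because the regex never looks at the parentheses, any surrounding characters on the line are simply skipped.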

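To handle negative integers as well, allow an optional minus sign, e.g. use r'-?\d+' as the regular expression (or r'(-?\d+), (-?\d+)' in the capture-group version).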
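A minimal sketch of the negative-number variant, assuming a hypothetical sample that contains minus signs; it shows both the capture-group form and the flat "find every integer, then pair them up" form:

```python
import re

# Hypothetical sample containing negative values.
text = "(1, -2)\n(-3, 4)\n(5, 6)\n"

# Capture-group version: the minus sign is optional in each group.
numpairs = [[int(s) for s in pair]
            for pair in re.findall(r'(-?\d+), (-?\d+)', text)]
print(numpairs)  # [[1, -2], [-3, 4], [5, 6]]

# Flat version: find every integer, then group them two at a time.
nums = [int(s) for s in re.findall(r'-?\d+', text)]
flat_pairs = [nums[i:i + 2] for i in range(0, len(nums), 2)]
print(flat_pairs)  # [[1, -2], [-3, 4], [5, 6]]
```

The flat version relies on every line holding exactly two numbers; the capture-group version ties each pair to one match.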


Running the following code with the default Python 2.7.6 on Linux suggests that re_sol1 is the fastest:

with open('agentListFile.txt', 'w') as f:
    for tup in zip(range(1, 1001), range(1, 1001)):
        f.write('{}\n'.format(tup))

funcs = []
def test(func):
    funcs.append(func)
    return func

import re, ast

@test
def re_sol1():
    ''' re.findall on whole file w/ capture groups '''
    with open('agentListFile.txt') as f:
        numpairs = [[int(numstr)
            for numstr in numpair]
            for numpair in re.findall(r'(\d+), (\d+)', f.read())]
        return numpairs

@test
def re_sol2():
    ''' naive re.findall on whole file '''
    with open('agentListFile.txt') as f:
        nums = [int(numstr) for numstr in re.findall(r'\d+', f.read())]
        numpairs = [nums[i:i+2] for i in range(0, len(nums), 2)]
        return numpairs

@test
def re_sol3():
    ''' re.findall on whole file w/ str.split '''
    with open('agentListFile.txt') as f:
        numpairs = [[int(numstr) 
            for numstr in numpair.split(', ')] 
            for numpair in re.findall(r'\d+, \d+', f.read())]
        return numpairs

@test
def re_sol4():
    ''' re.finditer on whole file '''
    with open('agentListFile.txt') as f:
        match_iterator = re.finditer(r'(\d+), (\d+)', f.read())
        numpairs = [[int(ns) for ns in m.groups()] for m in match_iterator]
        return numpairs

@test
def re_sol5():
    ''' re.match line by line '''
    with open('agentListFile.txt') as f:
        numpairs = [[int(ns) 
            for ns in re.match(r'\((\d+), (\d+)', line).groups()] 
            for line in f]
        return numpairs

@test
def re_sol6():
    ''' re.search line by line '''
    with open('agentListFile.txt') as f:
        numpairs = [[int(ns) 
            for ns in re.search(r'(\d+), (\d+)', line).groups()] 
            for line in f]
        return numpairs

@test
def sss_sol1():
    ''' strip, slice, split line by line '''
    with open("agentListFile.txt") as f:
        # list() keeps this correct on Python 3, where map() returns an iterator
        agentList = [list(map(int, line.strip()[1:-1].split(', '))) for line in f]
        return agentList

@test
def ast_sol1():
    ''' ast.literal_eval line by line '''
    with open("agentListFile.txt") as f:
        agent_list = [list(ast.literal_eval(line)) for line in f]
        return agent_list

### Begin tests ###

def all_equal(iterable):
    try:
        iterator = iter(iterable)
        first = next(iterator)
        return all(first == rest for rest in iterator)
    except StopIteration:
        return True

if all_equal(func() for func in funcs):
    from timeit import Timer

    def print_timeit(func, cnfg={'number': 1000}):
        print('{}{}'.format(Timer(func).timeit(**cnfg), func.__doc__))

    for func in funcs:
        print_timeit(func)
else:
    print('At least one of the solutions is incorrect.')

Sample output from a single run:

1.50156712532 re.findall on whole file w/ capture groups 
1.53699707985 naive re.findall on whole file 
1.71362090111 re.findall on whole file w/ str.split 
1.97333717346 re.finditer on whole file 
3.36241197586 re.match line by line 
3.59856200218 re.search line by line 
1.71777415276 strip, slice, split line by line 
12.8218641281 ast.literal_eval line by line 
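The benchmark above targets Python 2.7.6. A minimal Python 3 sketch of the same comparison, trimmed to three of the approaches, might look like the following. The temporary-file location and the function names re_findall_groups, strip_slice_split and literal_eval_lines are hypothetical, and the timings it prints will depend on your machine and Python version.

```python
import ast
import os
import re
import shutil
import tempfile
import timeit

# Write a 1000-line sample file like the one generated above, but into a
# temporary directory (an assumption) rather than the working directory.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, 'agentListFile.txt')
with open(path, 'w') as f:
    for n in range(1, 1001):
        f.write('({}, {})\n'.format(n, n))

def re_findall_groups():
    # Regex with two capture groups over the whole file contents.
    with open(path) as f:
        return [[int(s) for s in pair]
                for pair in re.findall(r'(\d+), (\d+)', f.read())]

def strip_slice_split():
    # map() is lazy on Python 3, so wrap it in list().
    with open(path) as f:
        return [list(map(int, line.strip()[1:-1].split(', '))) for line in f]

def literal_eval_lines():
    # Safe literal parsing, one line at a time.
    with open(path) as f:
        return [list(ast.literal_eval(line)) for line in f]

funcs = [re_findall_groups, strip_slice_split, literal_eval_lines]

# Check that all approaches agree before timing them.
results = [func() for func in funcs]
assert all(r == results[0] for r in results)

for func in funcs:
    seconds = timeit.timeit(func, number=20)
    print('{:.4f}s  {}'.format(seconds, func.__name__))

shutil.rmtree(tmpdir)  # remove the temporary sample file
```

timeit.timeit accepts a callable directly, so no setup string is needed.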

You can use ast.literal_eval to safely evaluate each tuple and convert it to a list inside a list comprehension, for example:

import ast
with open("agentListFile.txt") as f:
    agent_list = [list(ast.literal_eval(line)) for line in f]
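The "safely" part matters: unlike eval, ast.literal_eval only accepts Python literals. A small sketch, using an assumed single line and an assumed malicious-looking input, shows it parsing a tuple and rejecting a function call with ValueError:

```python
import ast

line = "(1, 2)\n"

# literal_eval parses literals such as tuples, numbers and strings.
parsed = list(ast.literal_eval(line))
print(parsed)  # [1, 2]

# Anything that is not a plain literal, such as a function call, is rejected.
rejected = False
try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError as exc:
    rejected = True
    print('rejected:', exc)
```

This safety check is also part of why it is slower than the regex or string-slicing approaches.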

For more information, read the relevant documentation and this thread.

The code below assumes that every line follows the same format, (number1, number2):

def strip_slice_split_solution():
    with open("agentListFile.txt") as f:
        # list() keeps this correct on Python 3, where map() returns an iterator
        agentList = [list(map(int, line.strip()[1:-1].split(', '))) for line in f]
        return agentList

s[1:-1] omits the first and last characters of s (here, the parentheses).
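Here is the strip/slice/split pipeline traced step by step on one assumed line, followed by a slightly more tolerant variant (our own assumption, not part of the answer) that splits on ',' alone so that spacing after the comma does not matter:

```python
line = "(1, 2)\n"               # one raw line from the file

stripped = line.strip()          # "(1, 2)": trailing newline removed
inner = stripped[1:-1]           # "1, 2": parentheses removed
parts = inner.split(', ')        # ['1', '2']
pair = list(map(int, parts))     # [1, 2]
print(stripped, inner, parts, pair)

# Tolerant variant: int() ignores surrounding whitespace,
# so "(1,2)" and "(1,  2)" both parse.
loose = [int(s) for s in "(1,  2)".strip()[1:-1].split(',')]
print(loose)  # [1, 2]
```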

I put Shashank's solution (with the import moved out of the function), Jon's solution and mine into one file and ran some tests on generated files with 5000-1000 lines.

Test excerpt:

In [3]: %timeit re_solution()
100 loops, best of 3: 2.3 ms per loop

In [4]: %timeit strip_slice_split_solution()
100 loops, best of 3: 2.28 ms per loop

In [5]: %timeit ast_solution()
100 loops, best of 3: 14.1 ms per loop

All 3 functions produce the same result:

In [6]: ast_solution() == re_solution() == strip_slice_split_solution()
Out[6]: True
