Import large Tecplot block files in Python as fast as possible

import re
from numpy import zeros, array, prod

def vectorr(I, J, K):
    """Return the [i, j, k] index of every grid point, with i varying fastest."""
    vect = []
    for k in range(0, K):
        for j in range(0, J):
            for i in range(0, I):
                vect.append([i, j, k])
    return vect

# Raw string: in Python 3, 'E:\u.dat' is a SyntaxError because \u starts a Unicode escape.
with open(r'E:\u.dat') as a:
    filelist = a.readlines()
NumberCol = 6  # number of values per data line
count = 0
data = dict()
leng = len(filelist)
countzone = 0
while count < leng:
    # VARIABLES line: collect the quoted variable names
    strVARIABLES = re.findall('VARIABLES', filelist[count])
    variables = re.findall(r'"(.*?)"', filelist[count])
    countzone = countzone+1
    data[countzone] = {key:[] for key in variables}
    count = count+1
    # ZONE line: extract the I, J, K dimensions
    strI = re.findall('I=....', filelist[count])
    strI = re.findall(r'\d+', strI[0])
    I = int(strI[0])
    ##
    strJ = re.findall('J=....', filelist[count])
    strJ = re.findall(r'\d+', strJ[0])
    J = int(strJ[0])
    ##
    strK = re.findall('K=....', filelist[count])
    strK = re.findall(r'\d+', strK[0])
    K = int(strK[0])
    data[countzone]['indmax'] = array([I, J, K])
    pr = prod(data[countzone]['indmax'])
    # number of data lines per variable (the last line may be partly filled)
    lin = pr // NumberCol
    if pr%NumberCol != 0:
        lin = lin+1
    vect = vectorr(I, J, K)
    for key in variables:
        init = zeros((I, J, K))
        for ii in range(0, lin):
            count = count+1
            temp = list(map(float, filelist[count].split()))  # list() is needed in Python 3
            for iii in range(0, len(temp)):
                # ndarray.itemset was removed in NumPy 2.0; there use init[tuple(...)] = temp[iii]
                init.itemset(tuple(vect[ii*NumberCol+iii]), temp[iii])
        data[countzone][key] = init
    count = count+1

2 Answers

User

Answer 1 · edited 2024-10-01 15:35:52

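Converting a large string to numbers is always going to be somewhat slow, but assuming the triple-nested for loop is the bottleneck here, changing it to the following may give you enough of a speedup: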

# add this line to your imports
from numpy import fromstring

# replace everything from 'for key in variables:' down to the final count = count+1 with:
count += 1
for key in variables:
    str_vector = ' '.join(filelist[count:count+lin])
    ar = fromstring(str_vector, sep=' ')
    ar = ar.reshape((I, J, K), order='F')  # order='F': i varies fastest, as in vectorr()

    data[countzone][key] = ar 
    count += lin
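To see why `reshape(..., order='F')` reproduces the question's fill order, the sketch below fills a small array both ways and compares them. `fill_nested` and `fill_fromstring` are hypothetical helper names, and the data (6 values per line, last line partly filled) is made up for illustration. The nested version uses plain indexing instead of `itemset`, because `itemset` no longer exists in NumPy 2.0.

```python
import numpy as np

def fill_nested(values, I, J, K):
    """Fill an (I, J, K) array like the question's loop: i fastest, then j, then k."""
    out = np.zeros((I, J, K))
    n = 0
    for k in range(K):
        for j in range(J):
            for i in range(I):
                out[i, j, k] = values[n]
                n += 1
    return out

def fill_fromstring(lines, I, J, K):
    """Parse all whitespace-separated numbers in one call, then reshape."""
    flat = np.fromstring(' '.join(lines), sep=' ')
    # order='F': the first index (i) varies fastest, matching fill_nested
    return flat.reshape((I, J, K), order='F')

I, J, K = 3, 4, 2
values = np.arange(I * J * K, dtype=float)
# hypothetical block layout: 6 values per line, newline-terminated like readlines()
lines = [' '.join(str(v) for v in values[s:s + 6]) + '\n'
         for s in range(0, values.size, 6)]

a = fill_nested(values, I, J, K)
b = fill_fromstring(lines, I, J, K)
print(np.array_equal(a, b))  # True
```

With the default C order, the reshape would make k vary fastest, and the values would be scattered differently from the original loop.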

Unfortunately, at the moment I only have my smartphone (no computer), so I can't test how fast this is, or even whether it works at all!

Update

I finally got around to doing some tests:

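My code contained a small bug, but it now seems to work correctly.
The modified code runs about 4 times faster than the original.
Your code spends most of its time in ndarray.itemset, and probably also in loop overhead and float conversion. Unfortunately, cProfile doesn't break this down in detail.
The improved code spends about 70% of its time in numpy.fromstring, which in my view suggests that this approach is about as fast as you can get with Python/NumPy.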
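A profile like the one described above can be produced with the standard `cProfile` and `pstats` modules. The sketch below is an assumption-laden stand-in, not the answerer's actual benchmark: `parse_loop` and `parse_vectorized` are hypothetical names, the grid size and random data are invented, and the loop version uses indexing rather than `itemset` so it also runs on NumPy 2.

```python
import cProfile
import io
import pstats

import numpy as np

def parse_loop(lines, I, J, K):
    """Per-value float conversion and assignment, similar in spirit to the question."""
    out = np.zeros((I, J, K))
    n = 0
    for line in lines:
        for tok in line.split():
            i, rest = n % I, n // I  # i fastest, then j, then k
            out[i, rest % J, rest // J] = float(tok)
            n += 1
    return out

def parse_vectorized(lines, I, J, K):
    """One fromstring call on the joined text, then a Fortran-order reshape."""
    return np.fromstring(' '.join(lines), sep=' ').reshape((I, J, K), order='F')

# hypothetical test data: 40*30*20 values written 6 per line
I, J, K = 40, 30, 20
vals = np.random.default_rng(0).random(I * J * K)
lines = [' '.join(f'{v:.8e}' for v in vals[s:s + 6]) + '\n'
         for s in range(0, vals.size, 6)]

profiler = cProfile.Profile()
a = profiler.runcall(parse_loop, lines, I, J, K)
b = profiler.runcall(parse_vectorized, lines, I, J, K)

stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats('cumulative').print_stats(8)
report = stream.getvalue()
print(report)
print(np.allclose(a, b))
```

The printed table lists cumulative time per function. Absolute numbers depend on the machine, so only the relative split between the two parsers is meaningful.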

Update 2

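Of course, a better approach is to iterate over the file instead of loading it all at once. In this case that is slightly faster (I tried it) and significantly reduces memory usage. You could also try using multiple CPU cores to load and convert to float, but then it becomes hard to collect all the data under one variable. A final word of warning: the fromstring method I used does not scale well with the length of the string. Beyond a certain string length, it becomes more efficient to use something like np.fromiter(itertools.imap(float, str_vector.split()), dtype=float). (itertools.imap exists only in Python 2; in Python 3, the built-in map is already lazy and can be used instead.)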
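Here is one possible shape for the "iterate over the file" idea combined with `np.fromiter`, assuming the same layout the question's parser expects: a VARIABLES line, a ZONE line with `I=`, `J=`, `K=`, then each variable's values in block order, 6 per line. `iter_zones`, `VAR_RE`, `DIM_RE` and the sample text are hypothetical. It reads only as many lines as each variable needs, rather than calling `readlines()` on the whole file. In Python 3, the built-in `map` replaces `itertools.imap`.

```python
import io
import re
from itertools import islice

import numpy as np

VAR_RE = re.compile(r'"(.*?)"')                   # quoted variable names
DIM_RE = re.compile(r'\b([IJK])\s*=\s*(\d+)')     # I=..., J=..., K=... on the ZONE line

def iter_zones(f, values_per_line=6):
    """Yield one dict per zone, streaming the file instead of loading it all."""
    for header in f:
        if 'VARIABLES' not in header:
            continue                               # skip TITLE or other lines
        variables = VAR_RE.findall(header)
        dims = dict(DIM_RE.findall(next(f)))       # the ZONE line follows VARIABLES
        I, J, K = int(dims['I']), int(dims['J']), int(dims['K'])
        n = I * J * K
        lin = -(-n // values_per_line)             # ceiling division: lines per variable
        zone = {'indmax': np.array([I, J, K])}
        for key in variables:
            tokens = (tok for line in islice(f, lin) for tok in line.split())
            # fromiter converts lazily; count=n preallocates and checks the length
            flat = np.fromiter(map(float, tokens), dtype=float, count=n)
            zone[key] = flat.reshape((I, J, K), order='F')
        yield zone

# hypothetical two-variable zone with I = J = K = 2 (8 values, so 2 lines each)
sample = io.StringIO(
    'VARIABLES = "X" "U"\n'
    'ZONE I=2, J=2, K=2, F=BLOCK\n'
    '0 1 2 3 4 5\n'
    '6 7\n'
    '10 11 12 13 14 15\n'
    '16 17\n'
)
zones = list(iter_zones(sample))
print(zones[0]['U'][1, 1, 1])  # 17.0
```

With a real file, pass the open file object, e.g. `with open(path) as f: for zone in iter_zones(f): ...`, so each zone is processed and can be discarded before the next is read.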

User

Answer 2 · edited 2024-10-01 15:35:52

If you use regular expressions here, there are two things I would change:

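Compile the REs you use often (I guess that applies to all the REs in your example). Do regex = re.compile("<pattern>") for each one and use the resulting object, e.g. match = regex.match(), as described in the Python documentation. Note that match() only matches at the start of the string; use regex.search() when the pattern can appear anywhere in the line, as I= does on the ZONE line.
For the I, J, K REs, consider using groups (also covered in the documentation) to reduce each pair of REs to one: search for a pattern of the form "I=(\d+)" and grab the part matched inside the parentheses with regex.group(1). Going further, you could define a single regex that captures all three variables in one step.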
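Both suggestions can be sketched as follows. The ZONE line format (`I=..., J=..., K=...` in that order) is an assumption based on the question's code, and `I_RE`, `ZONE_DIMS` and `zone_dims` are hypothetical names. `search()` is used instead of `match()` because `I=` does not start the line.

```python
import re

# One group per pattern: replaces the findall('I=....') + findall(r'\d+') pair.
I_RE = re.compile(r'I=(\d+)')

# One pattern capturing all three dimensions in a single step
# (assumes they appear in the order I, J, K).
ZONE_DIMS = re.compile(r'I\s*=\s*(\d+)\s*,\s*J\s*=\s*(\d+)\s*,\s*K\s*=\s*(\d+)')

def zone_dims(line):
    """Return (I, J, K) as ints from a ZONE line, or raise ValueError."""
    m = ZONE_DIMS.search(line)  # search(): the pattern is not at the start of the line
    if m is None:
        raise ValueError('no I, J, K found in: %r' % line)
    return tuple(int(g) for g in m.groups())

line = 'ZONE I=120, J=80, K=4, F=BLOCK'
print(I_RE.search(line).group(1))  # 120
print(zone_dims(line))             # (120, 80, 4)
```

Compiling at module level means the pattern is parsed once, not on every line of a large file. The `re` module does cache recently used patterns, so the gain is mostly in skipping the cache lookup.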

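At least for the first part, REs seem like overkill: the string you are looking for never varies, so str.find() (or simply 'VARIABLES' in line) is enough and probably faster in this case.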
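A minimal sketch of that idea: a plain substring test detects the header, and a regex is kept only for pulling out the quoted names. `parse_variables` and the sample line are hypothetical. The `timeit` comparison prints timings without claiming a specific result, since numbers vary by machine.

```python
import re
import timeit

VAR_NAMES = re.compile(r'"(.*?)"')  # grouped pattern for the quoted variable names

def parse_variables(line):
    """Return the variable names if line is a VARIABLES header, else None."""
    # A fixed substring needs no regex: `in` (or str.find) is enough.
    if 'VARIABLES' not in line:
        return None
    return VAR_NAMES.findall(line)

line = 'VARIABLES = "X" "Y" "Z" "U" "V" "W"\n'
print(parse_variables(line))

t_re = timeit.timeit(lambda: re.findall('VARIABLES', line), number=100000)
t_in = timeit.timeit(lambda: 'VARIABLES' in line, number=100000)
print(f're.findall: {t_re:.3f}s   in: {t_in:.3f}s')
```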

Edit: I just saw that you already use grouping for the variables...
