Reading a sparse matrix from an ASCII file into Python

2 answers

User

Answer 1 · edited 2024-09-27 22:32:15

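I'm not familiar with PETSc or its matrix format, but given the example ASCII format, it is certainly possible to convert it to any other matrix format in Python. I assume each line in the file holds one matrix row, and that each number pair on a line is a column index and the corresponding value. Is that right?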

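What counts as an "elegant way" is a matter of opinion and not really a valid Stack Overflow question, but I can try to point you in the right direction toward a working solution.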

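First, without knowing all the details, it seems to me the real question should be "Why are the binary output from Fortran and the binary input of petsc4py incompatible?" If you can solve that, it is probably the best solution. If I remember correctly, Fortran code supports different byte orders and may use big-endian by default, while Python uses the machine's native order, which is little-endian on typical x86 hardware. Perhaps you can specify the byte order in one of the library functions, or convert it manually if necessary. That is something you may want to investigate first.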
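A minimal sketch of the byte-order point, assuming the binary data is just a flat sequence of 8-byte IEEE-754 doubles. Real Fortran unformatted files and PETSc binary files usually carry extra headers or record markers, which this ignores. The helper `read_doubles` is hypothetical; it uses only the standard `struct` module, where the `>` and `<` prefixes select big- and little-endian. With NumPy, the equivalent is to pass an explicit dtype such as `'>f8'`.

```python
import struct


def read_doubles(raw: bytes, big_endian: bool) -> list[float]:
    """Decode a flat buffer of 8-byte IEEE-754 doubles with an explicit byte order.

    Hypothetical helper: assumes `raw` contains nothing but doubles, with no
    file header and no Fortran record markers.
    """
    if len(raw) % 8:
        raise ValueError("buffer length is not a multiple of 8 bytes")
    # '>' = big-endian, '<' = little-endian; 'd' = one 8-byte double.
    fmt = ('>' if big_endian else '<') + 'd' * (len(raw) // 8)
    return list(struct.unpack(fmt, raw))


# Simulate data written in big-endian order by some other program.
raw = struct.pack('>3d', 0.934865, 0.00582401, -0.00125881)
print(read_doubles(raw, big_endian=True))   # [0.934865, 0.00582401, -0.00125881]
print(read_doubles(raw, big_endian=False))  # same bytes, wrong order: meaningless numbers
```

Reading with the wrong byte order does not raise an error; it silently produces nonsense values, which is why a mismatch between writer and reader can be hard to spot.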

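As a workaround, you can parse the ASCII format in Python for further processing. I assume you have already searched for existing libraries and found none, so some custom code is needed. Depending on your needs, a "proper" solution would use regular expressions, but a quick-and-dirty approach is to use standard string methods and the eval() function, because the ASCII format already closely resembles Python syntax :-)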

Note: only use eval() if you trust the input file, because it is vulnerable to code injection! For personal use this is usually not a problem.
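If the input file cannot be fully trusted, the same "it already looks like Python syntax" idea works with `ast.literal_eval` from the standard library, which only accepts literals (numbers, tuples, strings, and so on) and raises `ValueError` for anything else, such as function calls. A minimal sketch, assuming each data line looks like `row 0: (0, 0.934865)  (1, 0.00582401)` as in the example; `parse_row` is a hypothetical name.

```python
import ast
import re


def parse_row(line: str) -> tuple[int, list[tuple[int, float]]]:
    """Parse one 'row N: (col, value)  (col, value) ...' line without eval()."""
    head, _, body = line.partition(':')
    row_index = int(head.split()[1])  # "row 0" -> 0
    body = body.strip()
    if not body:
        return row_index, []  # a row with no stored entries
    # Put commas between the pairs and add a trailing comma, so that even a
    # single pair becomes a tuple of tuples: "(0, 1.5)," -> ((0, 1.5),)
    literal = re.sub(r'\)\s*\(', '), (', body) + ','
    pairs = ast.literal_eval(literal)  # raises ValueError on non-literals
    return row_index, [(int(col), float(value)) for col, value in pairs]


print(parse_row("row 0: (0, 0.934865)  (1, 0.00582401)"))
# (0, [(0, 0.934865), (1, 0.00582401)])
```

Unlike eval(), a line such as `row 1: (__import__('os').getcwd(), 1.0)` is rejected with `ValueError` instead of being executed.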

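Below is some example code. It handles the basic input processing; what you do with the data is up to you, so you will need to finish the code yourself. This example only prints the numbers.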

def read_mpiaij(file):
    lines = file.read().splitlines()
    assert 'Mat Object: ' in lines[0]
    assert lines[1] == '  type: mpiaij'
    for line in lines[2:]:
        parts = line.split(': ')
        assert len(parts) == 2
        assert parts[0].startswith('row ')

        row_index = int(parts[0][4:])
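        # Strip trailing whitespace and append ',' so that a row with a single
        # entry still evaluates to a tuple of pairs instead of one bare pair.
        row_contents = eval(parts[1].strip().replace(')  (', '), (') + ',')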

        # Here you have the row_index and a tuple of (column_index, value)
        # pairs that specify the non-zero contents. You could process this
        # depending on your needs, e.g. store the values in an array.
        for (col_index, value) in row_contents:
            print('row %d, col %d: %s' % (row_index, col_index, value))
            # TODO: Implement real code here.
            # You probably want to do something like:
            # data[row_index][col_index] = value


def main():
    with open('input.txt', 'rt', encoding='ascii') as file:
        read_mpiaij(file)


if __name__ == '__main__':
    main()
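For the TODO above: a dense `data[row_index][col_index]` table needs a cell for every row/column combination even though most are zero, and the column indices here run into the thousands. Since the file is written row by row, one natural target is compressed sparse row (CSR) storage, i.e. three flat lists `indptr`, `indices` and `data`. A minimal pure-Python sketch, assuming rows arrive in ascending order as `(row_index, [(col_index, value), ...])` and that rows missing from the file are empty; `rows_to_csr` is a hypothetical helper. If SciPy is available, `scipy.sparse.csr_matrix((data, indices, indptr), shape=(n_rows, n_cols))` accepts these three sequences.

```python
def rows_to_csr(parsed_rows, n_rows=None):
    """Build CSR arrays from (row_index, [(col, value), ...]) in ascending row order."""
    indptr = [0]   # indptr[r]:indptr[r + 1] is the slice of row r in indices/data
    indices = []   # column index of each stored value
    data = []      # the stored values themselves
    next_row = 0
    for row_index, entries in parsed_rows:
        if row_index < next_row:
            raise ValueError(f"row {row_index} is out of order or repeated")
        # Rows that never appeared are empty: repeat the current offset.
        while next_row < row_index:
            indptr.append(len(data))
            next_row += 1
        for col, value in entries:
            indices.append(col)
            data.append(value)
        indptr.append(len(data))
        next_row = row_index + 1
    # Optionally pad trailing empty rows up to a known row count.
    if n_rows is not None:
        while next_row < n_rows:
            indptr.append(len(data))
            next_row += 1
    return indptr, indices, data


rows = [(0, [(0, 1.0), (2, 2.0)]), (2, [(1, 3.0)])]  # row 1 is absent, i.e. empty
indptr, indices, data = rows_to_csr(rows, n_rows=4)
print(indptr)   # [0, 2, 2, 3, 3]
print(indices)  # [0, 2, 1]
print(data)     # [1.0, 2.0, 3.0]
```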

Output:

row 0, col 0: 0.934865
row 0, col 1: 0.00582401
row 0, col 2: -0.00125881
row 0, col 3: 0.000157352
row 0, col 10: 0.0212704
row 0, col 11: -9.37151e-05
row 0, col 12: 7.77296e-06
row 0, col 13: 1.15276e-06
row 0, col 20: -0.00457321
row 0, col 21: 9.31045e-06
row 0, col 22: -1.37541e-07
row 0, col 23: -3.00994e-07
row 0, col 30: 0.000571716
row 0, col 31: 5.82622e-07
row 0, col 32: -2.27908e-07
row 0, col 33: 4.55904e-08
row 0, col 3410: 0.0005718
row 0, col 3411: 3.14914e-06
row 0, col 3412: -5.83246e-07
row 0, col 3413: 5.58045e-08
row 0, col 3420: -0.00457491
row 0, col 3421: -3.91645e-05
row 0, col 3422: 6.62677e-06
row 0, col 3423: -5.10165e-07
row 0, col 3430: 0.0212818
row 0, col 3431: 0.000230778
row 0, col 3432: -3.75686e-05
row 0, col 3433: 2.57173e-06
...

User
Answer 2 · edited 2024-09-27 22:32:15

Regular expressions are your friend. For example:
import re

# fh is an open text file; tbl is a preallocated table (e.g. a numpy array).
for recnum, rec in enumerate(fh):
    mat = re.match(r'row\s*(\d+):\s*(.*)', rec)
    if not mat:
        raise IOError("Bad data at rec %d." % recnum)
    rowNum = int(mat.group(1))
    rest = mat.group(2)
    lastColNum = -1
    # Each entry is "(col, value)"; values may be negative or in e-notation.
    for col in re.finditer(r'\((\d+),\s*([-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?)\)', rest):
        colNum = int(col.group(1))
        if colNum <= lastColNum:
            raise KeyError("colNum %d out of order at rec %d." % (colNum, recnum))
        lastColNum = colNum
        value = float(col.group(2))
        # save cell, like via numpy
        tbl[rowNum, colNum] = value
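I am assuming the column entries within each row are in order. If they are not, or if there are other constraints (for example, if values must lie in 0.0…1.0, which seems to be true in your example), you can of course adapt it. It is worth checking the data, because data is rarely as clean as one would hope...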
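If the columns within a row are not guaranteed to be sorted, one option is to sort each row before storing it and reject duplicate columns, rather than raising on the first out-of-order entry. A minimal sketch; `normalize_row` and its optional `lo`/`hi` range check are hypothetical, and the range check only matters if your data really has such a constraint.

```python
from typing import Optional


def normalize_row(entries: list[tuple[int, float]],
                  lo: Optional[float] = None,
                  hi: Optional[float] = None) -> list[tuple[int, float]]:
    """Sort (col, value) pairs by column; reject duplicate columns and out-of-range values."""
    ordered = sorted(entries, key=lambda pair: pair[0])
    # After sorting, duplicates can only appear as neighbours.
    for (prev_col, _), (col, _) in zip(ordered, ordered[1:]):
        if col == prev_col:
            raise ValueError(f"duplicate column {col}")
    for col, value in ordered:
        if (lo is not None and value < lo) or (hi is not None and value > hi):
            raise ValueError(f"value {value} in column {col} is outside [{lo}, {hi}]")
    return ordered


print(normalize_row([(3, 0.25), (1, 0.5)]))  # [(1, 0.5), (3, 0.25)]
```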
