How to format lines in a text file in Python

Posted 2024-09-26 22:51:53

Original .txt file:

M:org.apache.mahout.math.list.IntArrayListTest:testReplaceFromToWith() (S)org.apache.mahout.math.list.IntArrayListTest:assertEquals(long,long)
M:org.apache.mahout.math.list.IntArrayListTest:testRetainAllSmall() (O)org.apache.mahout.math.list.IntArrayList:<init>()
M:org.apache.mahout.common.RandomUtilsTest:testNextTwinPrime() (S)org.apache.mahout.common.RandomUtils:nextTwinPrime(int)
M:org.apache.mahout.math.map.OpenLongCharHashMapTest:testValues() (M)org.apache.mahout.math.list.CharArrayList:size()

I have more than 50k lines like these in a text file. How can I read them from the .txt file and use Python to format them as shown below?

The original strings are not always the same (methods can be inherited from different classes), so a simple replace won't work.

Desired format:

IntArrayListTest:testReplaceFromToWith() IntArrayListTest:assertEquals(long,long)
IntArrayListTest:testRetainAllSmall() list.IntArrayList:<init>()
RandomUtilsTest:testNextTwinPrime() RandomUtils:nextTwinPrime(int)
OpenLongCharHashMapTest:testValues() CharArrayList:size()

2 Answers

Try this:

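with open('input.txt') as fp:
    # For each whitespace-separated token, keep only the text after its last '.'
    # (this drops the package path, but assumes the argument list contains no dots)
    res = '\n'.join(' '.join(x.split('.')[-1] for x in line.split()) for line in fp)
print(res)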

Output:

IntArrayListTest:testReplaceFromToWith() IntArrayListTest:assertEquals(long,long)
IntArrayListTest:testRetainAllSmall() IntArrayList:<init>()
RandomUtilsTest:testNextTwinPrime() RandomUtils:nextTwinPrime(int)
OpenLongCharHashMapTest:testValues() CharArrayList:size()
FunctionTest:testIsDensifying() DoubleDoubleFunction:isDensifying()
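The one-liner keeps whatever follows the last '.' in each token, so a signature with dotted argument types gets cut inside the parentheses: for example, "printf(java.lang.String,java.lang.Object[])" would come out as "Object[])". Below is a minimal sketch that splits at the last ':' first and only then strips the package. It assumes every token looks like "prefix + package.Class:method(args)", that the signature itself contains no ':', and that classes are package-qualified; the helper names shorten_token and format_line are hypothetical.

```python
def shorten_token(token: str) -> str:
    """Turn 'M:pkg.Class:method(args)' or '(S)pkg.Class:method(args)'
    into 'Class:method(args)'."""
    # Split at the LAST ':' so dots inside the argument list are never touched.
    # Assumption: the method signature itself contains no ':'.
    qualified_class, _, method = token.rpartition(':')
    # Keep only the simple class name after the final '.' of the package path.
    # Assumption: the class is package-qualified (otherwise the prefix remains).
    simple_class = qualified_class.rpartition('.')[2]
    return simple_class + ':' + method


def format_line(line: str) -> str:
    """Shorten every whitespace-separated token on one line."""
    return ' '.join(shorten_token(tok) for tok in line.split())


sample = ("M:org.apache.mahout.math.list.IntArrayListTest:testRetainAllSmall() "
          "(O)org.apache.mahout.math.list.IntArrayList:<init>()")
print(format_line(sample))
# IntArrayListTest:testRetainAllSmall() IntArrayList:<init>()
```

To process the whole file, call format_line on each non-blank line inside the same "with open(...)" loop.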

Another approach:

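with open('input.txt') as fp:
    out = []
    for line in fp:
        if not line.strip():
            continue  # a blank line can't be unpacked into two tokens
        x, y = line.split()  # caller token, callee token
        x, y = x.split(':'), y.split(':')
        # caller "M:pkg.Class:method()" -> "Class:method()"
        x = x[1].split('.')[-1] + ':' + x[-1]
        # callee "(S)pkg.Class:method(args)" -> "Class:method(args)"
        y = y[0].split('.')[-1] + ':' + y[-1]
        out.append(x + ' ' + y)
print('\n'.join(out))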

Output:

IntArrayListTest:testReplaceFromToWith() IntArrayListTest:assertEquals(long,long)
IntArrayListTest:testRetainAllSmall() IntArrayList:<init>()
RandomUtilsTest:testNextTwinPrime() RandomUtils:nextTwinPrime(int)
OpenLongCharHashMapTest:testValues() CharArrayList:size()
FunctionTest:testIsDensifying() DoubleDoubleFunction:isDensifying()
VectorBinaryAssignTest:testAll() DoubleDoubleFunction:apply(double,double)
VectorBinaryAssignTest:testAll() PrintStream:printf(java.lang.String,java.lang.Object[])
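With 50k+ lines, it can be preferable to write each result straight to an output file instead of building one large string, and to skip malformed lines instead of crashing on them. The sketch below reuses the field logic of the second approach; convert_file, the output path, and the UTF-8 encoding are assumptions for illustration.

```python
from typing import Tuple


def convert_file(src_path: str, dst_path: str) -> Tuple[int, int]:
    """Stream src_path line by line and write formatted lines to dst_path.

    Returns (written, skipped). Any line that does not split into exactly
    two whitespace-separated tokens (blank lines included) is skipped.
    """
    written = skipped = 0
    with open(src_path, encoding='utf-8') as src, \
            open(dst_path, 'w', encoding='utf-8') as dst:
        for line in src:
            parts = line.split()
            if len(parts) != 2:
                skipped += 1
                continue
            x, y = parts[0].split(':'), parts[1].split(':')
            # caller "M:pkg.Class:method()" -> "Class:method()"
            caller = x[1].split('.')[-1] + ':' + x[-1]
            # callee "(S)pkg.Class:method(args)" -> "Class:method(args)"
            callee = y[0].split('.')[-1] + ':' + y[-1]
            dst.write(caller + ' ' + callee + '\n')
            written += 1
    return written, skipped
```

Usage would be something like convert_file('input.txt', 'output.txt'); only one line is held in memory at a time, and the returned counts show how many lines were dropped.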

If the text to be removed is always exactly the same as above, you can do a simple replace:

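with open("in.txt", "r") as f:
    for line in f:  # iterate the file directly instead of loading it with readlines()
        # Strip the fixed package prefixes (only covers classes under math.list)
        new_line = (line.strip()
                    .replace("M:org.apache.mahout.math.list.", "")
                    .replace("(S)org.apache.mahout.math.list.", "")
                    .replace("(O)org.apache.mahout.math.list.", ""))

        print(new_line)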

Or, if ".math.list." always appears in each function name and "() " always sits in the middle of the line, you can use split:

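with open("in.txt", "r") as f:
    for line in f:
        # Pieces after each ".math.list." (assumes exactly two occurrences per line)
        parts = line.strip().split(".math.list.")[1:]
        # The first piece still ends with "() (S)org.apache.mahout"; cut it at "() "
        new_line = parts[0].split("() ")[0] + "() " + parts[1]
        print(new_line)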

Otherwise, a regular expression is the best option.

Output:

IntArrayListTest:testReplaceFromToWith() IntArrayListTest:assertEquals(long,long)
IntArrayListTest:testRetainAllSmall() IntArrayList:<init>()
IntArrayListTest:testRemoveAll() IntArrayListTest:assertEquals(long,long)
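For the general case the answer recommends a regular expression. A minimal sketch of that idea, assuming every token is an optional "M:" or "(X)" marker, a dotted package path, a class name, ':', and a signature without spaces; the pattern name TOKEN and the helper format_line are hypothetical.

```python
import re

# One token: optional "M:" or "(X)" marker, a dotted package path,
# the class name (group 1), ':', then the signature up to the next space (group 2).
# Assumption: package and class names consist of word characters and '$'.
TOKEN = re.compile(r'(?:M:|\([A-Z]\))?(?:[\w$]+\.)*([\w$]+):(\S+)')


def format_line(line: str) -> str:
    """Rebuild each token on the line as 'Class:method(args)'."""
    return ' '.join(f'{cls}:{sig}' for cls, sig in TOKEN.findall(line))


print(format_line("M:org.apache.mahout.common.RandomUtilsTest:testNextTwinPrime() "
                  "(S)org.apache.mahout.common.RandomUtils:nextTwinPrime(int)"))
# RandomUtilsTest:testNextTwinPrime() RandomUtils:nextTwinPrime(int)
```

Because findall only picks up well-formed tokens, a blank line produces an empty string rather than an exception, and dots inside the argument list are left alone since the signature is captured as a whole.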
