How to format input from a text file into a defaultdict in Python

Posted 2024-09-27 20:19:37

The text file has more than 50K lines in this format:

M:org.apache.mahout.common.RandomUtilsTest:testHashDouble():['(O)java.lang.Double:<init>(double)', '(M)java.lang.Double:hashCode()', '(S)org.apache.mahout.common.RandomUtils:hashDouble(double)', '(S)org.apache.mahout.common.RandomUtilsTest:assertEquals(long,long)', '(O)java.lang.Double:<init>(double)']
M:org.apache.mahout.common.RandomUtilsTest:testHashFloat():['(M)java.util.Random:nextLong()', '(M)java.util.Random:nextLong()', '(M)java.util.Random:nextLong()', '(S)org.apache.mahout.common.RandomUtilsTest:assertEquals(java.lang.String,long,long)']
M:org.apache.mahout.math.AbstractVectorTest:testAssignBinaryFunction():['(I)org.apache.mahout.math.Vector:assign(org.apache.mahout.math.Vector,org.apache.mahout.math.function.DoubleDoubleFunction)', '(O)java.lang.StringBuilder:<init>()', '(I)org.apache.mahout.math.Vector:getQuick(int)', '(S)org.apache.mahout.math.AbstractVectorTest:assertEquals(java.lang.String,double,double,double)']
M:org.apache.mahout.math.AbstractVectorTest:testAssignBinaryFunction2():['(S)org.apache.mahout.math.function.Functions:plus(double)', '(I)org.apache.mahout.math.Vector:assign(org.apache.mahout.math.function.DoubleFunction)', '(S)org.apache.mahout.math.AbstractVectorTest:assertEquals(java.lang.String,double,double,double)']

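How can I read this data and format it into a dictionary so that every method inside the [] is a separate value and the string before the [] (the test method) is the key? And how do I strip the quotes ('') before storing the methods as values in the dictionary?

Below is the code used to populate the text file. Now I am trying to take the data from that txt file and read/parse it back into another dictionary.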

    d = {}
    # Avoid naming the file object `input`, which shadows the built-in.
    with open("filtered.txt") as infile:
        for line in infile:
            # Each line is expected to hold "<key> <value>" separated by one space.
            key, val = line.strip().split(" ")
            d.setdefault(key, []).append(val)

    # The with block already closed the file, so no explicit close() is needed.
    with open('mahout-coverage.txt', 'w') as outfile:
        for key in sorted(d):
            # Formatting the list with '{}' writes its repr, quotes included.
            outfile.write('{}:{}\n'.format(key, d[key]))
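
For context, the quotes and brackets in the file come from formatting a Python list with `str.format`, which writes the list's repr. A minimal sketch of that effect, using a made-up key and made-up values rather than the real data:

```python
# Hypothetical sample data; the real keys and values come from filtered.txt.
d = {"M:pkg.FooTest:testBar()": ["(M)pkg.Foo:bar()", "(S)pkg.Foo:baz(int)"]}

for key in sorted(d):
    # Formatting a list with '{}' inserts its repr, so every string
    # element is written with surrounding quotes inside square brackets.
    line = '{}:{}'.format(key, d[key])
    print(line)
```

This is why reading the file back requires parsing a list literal rather than just splitting on a delimiter.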


2 Answers

Use ast.literal_eval to convert the string representation of a list into an actual list:

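from collections import defaultdict
import ast

d = defaultdict(list)
with open('tst.txt') as fp:
    for line in fp:
        # The key is everything before ':[', the list literal everything after the ':'.
        split_at = line.index(':[')
        key = line[:split_at]
        # literal_eval parses "['a', 'b']" into a list of str, dropping the quotes.
        values = ast.literal_eval(line[split_at + 1:].strip())
        d[key] += values
print(dict(d))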

Output (wrapped here for readability):

{
'M:org.apache.mahout.common.RandomUtilsTest:testHashDouble()': ['(O)java.lang.Double:<init>(double)', '(M)java.lang.Double:hashCode()', '(S)org.apache.mahout.common.RandomUtils:hashDouble(double)', '(S)org.apache.mahout.common.RandomUtilsTest:assertEquals(long,long)', '(O)java.lang.Double:<init>(double)'],
'M:org.apache.mahout.common.RandomUtilsTest:testHashFloat()': ['(M)java.util.Random:nextLong()', '(M)java.util.Random:nextLong()', '(M)java.util.Random:nextLong()', '(S)org.apache.mahout.common.RandomUtilsTest:assertEquals(java.lang.String,long,long)'],
'M:org.apache.mahout.math.AbstractVectorTest:testAssignBinaryFunction()': ['(I)org.apache.mahout.math.Vector:assign(org.apache.mahout.math.Vector,org.apache.mahout.math.function.DoubleDoubleFunction)', '(O)java.lang.StringBuilder:<init>()', '(I)org.apache.mahout.math.Vector:getQuick(int)', '(S)org.apache.mahout.math.AbstractVectorTest:assertEquals(java.lang.String,double,double,double)'],
'M:org.apache.mahout.math.AbstractVectorTest:testAssignBinaryFunction2()': ['(S)org.apache.mahout.math.function.Functions:plus(double)', '(I)org.apache.mahout.math.Vector:assign(org.apache.mahout.math.function.DoubleFunction)', '(S)org.apache.mahout.math.AbstractVectorTest:assertEquals(java.lang.String,double,double,double)']
}
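
With more than 50K lines, a blank or malformed line would make `line.index` raise a ValueError. As a rough sketch of a more forgiving variant, assuming every valid line contains exactly one key followed by `:[`, the parsing can be wrapped in a helper that skips lines it cannot split. The name `parse_call_lines` is hypothetical, and the sample line is a shortened copy of one line from the question:

```python
import ast
from collections import defaultdict


def parse_call_lines(lines):
    """Group callee lists by test method (hypothetical helper)."""
    calls = defaultdict(list)
    for line in lines:
        line = line.strip()
        # The key ends where the list literal begins, i.e. at the first ':['.
        key, sep, rest = line.partition(":[")
        if not sep:
            continue  # skip blank or malformed lines
        # partition consumed the '[', so put it back before parsing.
        # literal_eval returns a real list of str, without the quotes.
        calls[key].extend(ast.literal_eval("[" + rest))
    return calls


# Shortened sample input; a real run would pass the open file object instead.
sample = [
    "M:org.apache.mahout.common.RandomUtilsTest:testHashFloat():"
    "['(M)java.util.Random:nextLong()', '(M)java.util.Random:nextLong()']\n",
    "\n",
]
calls = parse_call_lines(sample)
print(calls["M:org.apache.mahout.common.RandomUtilsTest:testHashFloat()"])
```

Because the helper accepts any iterable of strings, it can be called as `parse_call_lines(fp)` inside a `with open(...)` block.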

The json module can store a Python dictionary in a file and later load that file, parsing it back into the same data types it had before it was written.

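import ast

d = {}
with open('filtered.txt') as infile:
    for line in infile:
        # Assumes every key ends in "()"; split only at the first "():"
        # so the method list stays in one piece.
        key, value = line.strip().split("():", 1)
        key = "{}()".format(key)
        # Parse the list literal so the value is a real list, not a string.
        d[key] = ast.literal_eval(value)

print(d)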

# It would be simpler to write the data to the file with the json module
import json

with open('data.txt', 'w') as json_file:
    json.dump(d, json_file)

# Later, read the file back with the json module itself
with open('data.txt') as f:
    # data is a dictionary again and can be used directly
    data = json.load(f)

Reference: json.dump(), json.load()
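
Since the question asks for a defaultdict, it may help to see how one behaves in a JSON round trip. The sketch below uses a made-up entry and an in-memory `io.StringIO` buffer in place of a real file such as data.txt; both are assumptions for illustration only:

```python
import io
import json
from collections import defaultdict

calls = defaultdict(list)
calls["M:pkg.FooTest:testBar()"].append("(M)pkg.Foo:bar()")  # made-up entry

# A StringIO buffer stands in for a real file opened with open().
buffer = io.StringIO()
json.dump(calls, buffer)  # defaultdict is a dict subclass, so it serializes

buffer.seek(0)
loaded = json.load(buffer)  # comes back as a plain dict with list values
restored = defaultdict(list, loaded)  # re-wrap if default lists are needed
print(type(loaded).__name__, restored["M:pkg.FooTest:testBar()"])
```

The lists survive the round trip unchanged, but the defaultdict type does not, so it has to be rebuilt after loading if missing keys should still default to an empty list.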
