Load a list of lists from a file based on a grouping variable?

Posted 2024-09-25 18:27:20


If I have the file:

A pgm1
A pgm2
A pgm3
Z pgm4
Z pgm5
C pgm6
C pgm7
C pgm8
C pgm9

How can I create the list:

[['pgm1','pgm2','pgm3'],['pgm4','pgm5'],['pgm6','pgm7','pgm8','pgm9']]

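I need to preserve the original order of the loaded file, so ['pgm4', 'pgm5'] must be the second sub-list.

My preference is that a new sub-list is started whenever the grouping variable changes from the previous line, as in "A, Z, C". If the grouping variable has to be sequential, as in "1, 2, 3", I can live with that.

(This is to support running the programs within each sub-list concurrently, while waiting for all upstream programs to finish before moving on to the next sub-list.)

I am using Python 2.6.6 on RHEL 2.6.32.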


2 answers

Just use collections.defaultdict.

Code:

import collections
d = collections.defaultdict(list)

infile = 'filename'
with open(infile) as f:
    a = [i.strip() for i in f]

a = [i.split() for i in a]

for key, value in a:
    d[key].append(value)

l = list(d.values())

Demo:

>>> import collections
>>> d = collections.defaultdict(list)

>>> infile = 'filename'
>>> with open(infile) as f:
...     a = [i.strip() for i in f]

>>> a = [i.split() for i in a]
>>> a
[['A', 'pgm1'], ['A', 'pgm2'], ['A', 'pgm3'], ['Z', 'pgm4'], ['Z', 'pgm5'], ['C', 'pgm6'], ['C', 'pgm7'], ['C', 'pgm8'], ['C', 'pgm9']]

>>> for key, value in a:
...     d[key].append(value)

>>> d
defaultdict(<class 'list'>, {'A': ['pgm1', 'pgm2', 'pgm3'], 'C': ['pgm6', 'pgm7', 'pgm8', 'pgm9'], 'Z': ['pgm4', 'pgm5']})

>>> d.values()
dict_values([['pgm1', 'pgm2', 'pgm3'], ['pgm6', 'pgm7', 'pgm8', 'pgm9'], ['pgm4', 'pgm5']])

>>> list(d.values())
[['pgm1', 'pgm2', 'pgm3'], ['pgm6', 'pgm7', 'pgm8', 'pgm9'], ['pgm4', 'pgm5']]
>>> 

The code below does the same thing as the code above, but preserves the order:

infile = 'filename'
with open(infile) as f:
    a = [i.strip() for i in f]

a = [i.split() for i in a]

def orderset(seq):
    seen = set()
    seen_add = seen.add
    return [ x for x in seq if not (x in seen or seen_add(x))]

l = []
for i in orderset([i[0] for i in a]):
    l.append([j[1] for j in a if j[0] == i])
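
The loop above rescans the whole list once per key. As a minimal sketch of the same idea done in a single pass, the hypothetical helper below (the name group_in_first_seen_order is not from the original answer) records each key the first time it appears and appends values as it goes. It avoids OrderedDict, which is not available on Python 2.6. Note that rows sharing a key are merged even when they are not adjacent in the file.

```python
def group_in_first_seen_order(pairs):
    """Group values by key, keeping keys in the order they first appear."""
    order = []    # keys, in first-seen order
    groups = {}   # key -> list of values collected so far
    for key, value in pairs:
        if key not in groups:
            groups[key] = []
            order.append(key)
        groups[key].append(value)
    return [groups[key] for key in order]


# Sample rows matching the file shown in the question
sample_lines = ['A pgm1', 'A pgm2', 'A pgm3', 'Z pgm4', 'Z pgm5',
                'C pgm6', 'C pgm7', 'C pgm8', 'C pgm9']
pairs = [line.split() for line in sample_lines]
result = group_in_first_seen_order(pairs)
print(result)
# [['pgm1', 'pgm2', 'pgm3'], ['pgm4', 'pgm5'], ['pgm6', 'pgm7', 'pgm8', 'pgm9']]
```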

After posting my question, some more web searching turned up: How do I use Python's itertools.groupby()?
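
For reference, here is a minimal sketch of how itertools.groupby behaves on the sample data from the question. The helper name group_consecutive is only illustrative. groupby starts a new group every time the key changes from the previous row, which matches the "trigger on change" preference in the question; it does not merge rows with the same key that are separated by other keys.

```python
from itertools import groupby
from operator import itemgetter


def group_consecutive(pairs):
    """Collect the second field of each row, one sub-list per run of equal keys."""
    return [[value for _, value in run]
            for _, run in groupby(pairs, key=itemgetter(0))]


rows = [['A', 'pgm1'], ['A', 'pgm2'], ['A', 'pgm3'],
        ['Z', 'pgm4'], ['Z', 'pgm5'],
        ['C', 'pgm6'], ['C', 'pgm7'], ['C', 'pgm8'], ['C', 'pgm9']]
grouped = group_consecutive(rows)
print(grouped)
# [['pgm1', 'pgm2', 'pgm3'], ['pgm4', 'pgm5'], ['pgm6', 'pgm7', 'pgm8', 'pgm9']]

# A key that reappears later starts a new group instead of being merged:
split_runs = group_consecutive([('A', 'x'), ('B', 'y'), ('A', 'z')])
print(split_runs)
# [['x'], ['y'], ['z']]
```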

Here is my current approach. Please tell me whether it can be made more Pythonic.

loadfile1.txt (no grouping variable - output is the same as for loadfile4.txt):

pgm1
pgm2
pgm3

pgm4
pgm5

pgm6
pgm7
pgm8
/a/path/with spaces/pgm9

loadfile2.txt (arbitrary grouping variables):

10, pgm1
10, pgm2
10, pgm3

ZZ, pgm4
ZZ, pgm5

-5, pgm6
-5, pgm7
-5, pgm8
-5, /a/path/with spaces/pgm9

loadfile3.txt (same grouping variable - no dependencies - multi-threaded):

,pgm1
,pgm2
,pgm3

,pgm4
,pgm5

,pgm6
,pgm7
,pgm8
,/a/path/with spaces/pgm9

loadfile4.txt (different grouping variables - dependencies - single-threaded):

1, pgm1
2, pgm2
3, pgm3

4, pgm4
5, pgm5

6, pgm6
7, pgm7
8, pgm8
9, /a/path/with spaces/pgm9

My Python script:

#!/usr/bin/python

from itertools import groupby

# See https://stackoverflow.com/questions/4842057/python-easiest-way-to-ignore-blank-lines-when-reading-a-file

# convert file to list of lines, ignoring any blank lines
filename = 'loadfile2.txt'

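with open(filename) as f_in:
    # Build a real list while the file is still open; on Python 3, filter()
    # is lazy and would be read only after the file has been closed.
    lines = [line.rstrip() for line in f_in if line.strip()]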

print(lines)

# convert list to a list of lists split on comma
lines = [i.split(',') for i in lines]
print(lines)

# create list of lists based on the key value (first item in sub-lists)
listofpgms = []
for key, group in groupby(lines, lambda x: x[0]):
    pgms = []
    for pgm in group:
        try:
            pgms.append(pgm[1].strip())
        except IndexError:
            pgms.append(pgm[0].strip())

    listofpgms.append(pgms)

print(listofpgms)

Output when using loadfile2.txt:

['10, pgm1', '10, pgm2', '10, pgm3', 'ZZ, pgm4', 'ZZ, pgm5', '-5, pgm6', '-5, pgm7', '-5, pgm8', '-5, /a/path/with spaces/pgm9']
[['10', ' pgm1'], ['10', ' pgm2'], ['10', ' pgm3'], ['ZZ', ' pgm4'], ['ZZ', ' pgm5'], ['-5', ' pgm6'], ['-5', ' pgm7'], ['-5', ' pgm8'], ['-5', ' /a/path/with spaces/pgm9']]
[['pgm1', 'pgm2', 'pgm3'], ['pgm4', 'pgm5'], ['pgm6', 'pgm7', 'pgm8', '/a/path/with spaces/pgm9']]
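
A list of lists like this could then drive the staged execution described in the question. The following is only a rough sketch under stated assumptions: run_in_batches is a hypothetical helper, each program is given as an argument list for subprocess.Popen, and the commands are stand-ins that launch the current Python interpreter rather than the real pgm files. Every command in a batch is started before any is waited on, and the next batch starts only after the whole batch has exited.

```python
import subprocess
import sys


def run_in_batches(batches):
    """Run each batch concurrently; wait for the whole batch before the next one."""
    all_codes = []
    for batch in batches:
        # Start every command in this batch without waiting.
        procs = [subprocess.Popen(cmd) for cmd in batch]
        # Block until every process in the batch has finished.
        all_codes.append([proc.wait() for proc in procs])
    return all_codes


# Stand-in commands (assumption): real use would wrap each program path,
# for example [[pgm] for pgm in sublist] for every sub-list.
batches = [
    [[sys.executable, '-c', 'pass'], [sys.executable, '-c', 'pass']],
    [[sys.executable, '-c', 'import sys; sys.exit(3)']],
]
exit_codes = run_in_batches(batches)
print(exit_codes)
# [[0, 0], [3]]
```

Returning the exit codes per batch leaves it to the caller to decide whether a failure upstream should stop later batches; the sketch itself always continues.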
