Sort a file by column and get the unique elements

Posted 2024-09-23 22:32:41


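I want to sort a file by the contents of one of its columns (column 2 in the example below) and get the unique elements in that column: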

Original file:

qoow_12_xx7_21  wer1    rwty3
asss_x17_211    aqe3    sda4
acyi_112_werxc  xcu12   weqa1
qwer_234_ssd    aqe3    wers

Expected sorted output:

asss_x17_211    aqe3    sda4
qwer_234_ssd    aqe3    wers
qoow_12_xx7_21  wer1    rwty3
acyi_112_werxc  xcu12   weqa1

Expected unique values of column 2:

aqe3
wer1
xcu12

My attempt, which does not work:

from operator import itemgetter
import itemgetter


def get_unique(data):
    seen=""
    for e in data:
        if e not in seen:
            seen="\t".join(seen) 
    return seen

col2=""
with open("myfile.txt", "r") as infile, open("out.xls","w") as outfile:
    for line in infile:
        data=line.rstrip.split("\t")
        sorted_data=sorted(data, key=lambda e: e.itemgetter)
        col2="".join(data[1])
    uniq_col2=get_unique(col2)
    outfile.write(sorted_data)# tab-delimited sorted data
    outfile.write(uniq_col2) # sorted column 2 data

Can someone help me get this code working? Thanks.


1 answer
User
#1 · Posted 2024-09-23 22:32:41

Try this:

from operator import itemgetter

with open('test.txt') as infile, open('out.txt', 'w') as outfile:
    # sort input by 2nd column
    sorted_lines = sorted(
        (line.strip().split() for line in infile),
        key=itemgetter(1)
        )

    # output sorted input
    for line in sorted_lines:
        outfile.write('\t'.join(line))
        outfile.write('\n')

    # discard duplicates in already sorted sequence => uniq items
    prev_item = None
    for item in (line[1] for line in sorted_lines):
        if item != prev_item:
            prev_item = item
            outfile.write(item)
            outfile.write('\n')
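
If you would rather keep the sorting and the de-duplication as separate, testable steps, here is a rough sketch of the same idea. The helper names `sort_rows` and `unique_sorted` are made up for this example, and skipping blank lines is my own assumption (the code above would raise an IndexError on an empty line). It uses `itertools.groupby`, which collapses runs of equal neighbours, so like the loop above it only gives unique values because the rows are already sorted on that column.

```python
from itertools import groupby
from operator import itemgetter


def sort_rows(lines, column=1):
    """Split each non-blank line on whitespace and sort the rows by `column`.

    `sort_rows` is a hypothetical helper name, not part of any library.
    """
    rows = (line.split() for line in lines if line.strip())
    # sorted() is stable, so rows with equal keys keep their input order
    return sorted(rows, key=itemgetter(column))


def unique_sorted(values):
    """Collapse runs of equal values; the input must already be sorted."""
    return [key for key, _group in groupby(values)]


# The four sample lines from the question, tab-delimited
sample = [
    "qoow_12_xx7_21\twer1\trwty3\n",
    "asss_x17_211\taqe3\tsda4\n",
    "acyi_112_werxc\txcu12\tweqa1\n",
    "qwer_234_ssd\taqe3\twers\n",
]

rows = sort_rows(sample)
uniq_col2 = unique_sorted(row[1] for row in rows)

for row in rows:
    print("\t".join(row))
print(uniq_col2)
```

To run it on a real file, pass the open file object instead of `sample`, e.g. `sort_rows(infile)`, and write `rows` and `uniq_col2` out the same way as in the answer above.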
