Removing lines with identical coordinates at a specific position in a file

Posted 2024-06-29 00:48:28


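I have a file of x/y coordinates that I am cleaning up. The file contains various kinds of information, but the coordinates are always at the same position within a line, like this: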

IMPORTANT information 12213   1541515      COORDINATEX.COORDINATEY
IMPORTANT assadad213114141 asdadad         COORDINATEX.COORDINATEY
IMPORTANT assadad2ssss4141 asdadad         COORDINATEX.COORDINATEY
IMPORTANT ass 141 asd135566666666d         COORDINATEX.COORDINATEY

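What I want is to remove every line whose coordinates are the same and whose first 10 characters (the part marked "IMPORTANT") are the same, keeping only the first such line. I tried sort -u in Unix, but that does not work because it requires the entire line to be identical, which is not the case here.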

Example:

IMPORTANTLINE1 713)#!=%!3839413!"¤#(!¤! COORDINATEX.COORDINATEY1
IMPORTANTLINE1 1339220"##"#"#"""""""""" COORDINATEX.COORDINATEY144
IMPORTANTLINE1 fsafasdasd!38aaa!"¤#(!¤! COORDINATEX.COORDINATEY1
IMPORTANTLINE1 713)#!=%!3839413!"¤#(!¤! COORDINATEX.COORDINATEY1
IMPORTANTLINE2 sadasdasdadadadadadadada COORDINATEX.COORDINATE2
IMPORTANTLINE2 sadasdasdadadadadadadada COORDINATEX.COORDINATE1
IMPORTANTLINE2 sadasda333333333dadadada COORDINATEX.COORDINATE1

should become:

IMPORTANTLINE1 713)#!=%!3839413!"¤#(!¤! COORDINATEX.COORDINATEY1
IMPORTANTLINE1 1339220"##"#"#"""""""""" COORDINATEX.COORDINATEY144
IMPORTANTLINE2 sadasdasdadadadadadadada COORDINATEX.COORDINATE2
IMPORTANTLINE2 sadasdasdadadadadadadada COORDINATEX.COORDINATE1

Thanks in advance!


3 Answers

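For each line read from the file, pull out the parts that define a duplicate and combine them into one string. Check a set to see whether it already contains that string; if it does not, write the line to the output and add the string to the set.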

So each line has four fields separated by whitespace, and the characters to compare are in the second field, is that right?

lines = []
found_lines = set()
with open("mydatafile.dat", "rt") as data_file:
    for line in data_file:
        # skip blank lines (the last line of a file is often blank)
        if not line.strip():
            continue
        # separate fields; this assumes exactly four whitespace-separated fields per line
        imp, field1, x, y = line.split()
        # keep only the significant characters of field1
        field1 = field1[1:10]  # "first 10 chars, except first"
        key = (field1, x, y)
        if key in found_lines:
            continue
        found_lines.add(key)  # set.add takes a single argument, so pass a tuple
        lines.append(line)
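
The example lines in the question have three whitespace-separated fields rather than four, so the unpacking above would need adjusting for them. A minimal sketch of the same set-based idea follows. It assumes the duplicate key is the whole first field plus the last field (the coordinates); `dedupe_lines` and `sample` are names made up for this illustration, and the sample lines are the ones used elsewhere on this page.

```python
def dedupe_lines(lines):
    """Keep only the first line for each (first field, last field) pair."""
    seen = set()
    kept = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # Assumption: the label is the first field and the
        # coordinates are the last field of every line.
        key = (fields[0], fields[-1])
        if key in seen:
            continue
        seen.add(key)
        kept.append(line)
    return kept


sample = [
    'IMPORTANTLINE1 713)#!=%!3839413!"#(!! COORDINATEX.COORDINATEY1',
    'IMPORTANTLINE1 1339220"##"#"#"""""""""" COORDINATEX.COORDINATEY144',
    'IMPORTANTLINE1 fsafasdasd!38aaa!"#(!! COORDINATEX.COORDINATEY1',
    'IMPORTANTLINE1 713)#!=%!3839413!"#(!! COORDINATEX.COORDINATEY1',
    'IMPORTANTLINE2 sadasdasdadadadadadadada COORDINATEX.COORDINATE2',
    'IMPORTANTLINE2 sadasdasdadadadadadadada COORDINATEX.COORDINATE1',
    'IMPORTANTLINE2 sadasda333333333dadadada COORDINATEX.COORDINATE1',
]

for line in dedupe_lines(sample):
    print(line)
```

On this sample the function keeps four of the seven lines, in their original order. If only the first 10 characters of the label should count, `fields[0][:10]` could be used in the key instead.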

I think something like this:

import re

data='''
IMPORTANTLINE1 713)#!=%!3839413!"#(!! COORDINATEX.COORDINATEY1
IMPORTANTLINE1 1339220"##"#"#"""""""""" COORDINATEX.COORDINATEY144
IMPORTANTLINE1 fsafasdasd!38aaa!"#(!! COORDINATEX.COORDINATEY1
IMPORTANTLINE1 713)#!=%!3839413!"#(!! COORDINATEX.COORDINATEY1
IMPORTANTLINE2 sadasdasdadadadadadadada COORDINATEX.COORDINATE2
IMPORTANTLINE2 sadasdasdadadadadadadada COORDINATEX.COORDINATE1
IMPORTANTLINE2 sadasda333333333dadadada COORDINATEX.COORDINATE1
'''
d={}          # keys already seen, mapped to the index of their first line
data_out=[]

for i,line in enumerate(data.split('\n')):
    # capture the IMPORTANTLINE label and both coordinate parts
    m=re.search(r'^(IMPORTANTLINE\d+).*(COORDINATEX)\.(COORDINATE(Y)?\d+)',line)
    if m:
        h=m.group(1)+m.group(2)+m.group(3)
        if h not in d:
            d[h]=i
            data_out.append(line)

for line in data_out:
    print(line)

Output:

IMPORTANTLINE1 713)#!=%!3839413!"#(!! COORDINATEX.COORDINATEY1
IMPORTANTLINE1 1339220"##"#"#"""""""""" COORDINATEX.COORDINATEY144
IMPORTANTLINE2 sadasdasdadadadadadadada COORDINATEX.COORDINATE2
IMPORTANTLINE2 sadasdasdadadadadadadada COORDINATEX.COORDINATE1
