Getting the number of header lines in advance

Published 2024-05-21 16:23:42


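I have a binary file that consists of header lines followed by a binary part: ftp://n5eil01u.ecs.nsidc.org/SAN/GLAS/GLA06.034/2003.02.21/GLA06_634_1102_001_0079_3_01_0001.DAT

I need to know how many lines the header occupies. How can I determine this in advance, so that I can fill in the value below and skip the header section?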

import numpy as np    
fname = 'GLA06_634_1102_001_0079_3_01_0001.DAT' 

with open(fname,'rb') as fi:
    fi.seek(176, 0)  # the size of the header has to go here

3 answers

FWIW, a hexdump of the file shows that the "binary data" seems to start at 0x35c0 (byte offset 13760):

00001a20  39 3b 0a 67 41 53 50 5f  74 31 3d 20 39 39 30 37  |9;.gASP_t1= 9907|
00001a30  39 32 30 30 2e 30 30 30  30 30 30 30 3b 0a 67 6c  |9200.0000000;.gl|
00001a40  6f 62 41 76 53 72 66 50  72 65 73 32 3d 20 38 39  |obAvSrfPres2= 89|
00001a50  30 35 38 2e 39 35 32 33  36 33 37 3b 0a 67 41 53  |058.9523637;.gAS|
00001a60  50 5f 74 32 3d 20 39 39  31 30 30 38 30 30 2e 30  |P_t2= 99100800.0|
00001a70  30 30 30 30 30 30 3b 0a  20 20 20 20 20 20 20 20  |000000;.        |
00001a80  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |                |
*
00001ae0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
000035c0  01 53 05 b0 05 e8 11 30  00 01 0a de 08 0b 00 00  |.S.....0........|
000035d0  ff ff ff 52 00 00 61 a8  00 00 c3 50 00 01 24 f8  |...R..a....P..$.|
000035e0  00 01 86 a0 00 01 e8 48  00 02 49 f0 00 02 ab 98  |.......H..I.....|
000035f0  00 03 0d 40 00 03 6e e8  00 03 d0 90 00 04 32 38  |...@..n.......28|
00003600  00 04 93 e0 00 04 f5 88  00 05 57 30 00 05 b8 d8  |..........W0....|
00003610  00 06 1a 80 00 06 7c 28  00 06 dd d0 00 07 3f 78  |......|(......?x|
00003620  00 07 a1 20 00 08 02 c8  00 08 64 70 00 08 c6 18  |... ......dp....|

Obviously, the binary data is preceded by a run of 0x00 bytes. As a heuristic, we can try to locate that part:

fname = 'GLA06_634_1102_001_0079_3_01_0001.DAT'

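with open(fname, 'rb') as fi:
    byte = fi.read(1)
    while byte and byte != b'\x00':  # skip text part (stops at EOF too)
        byte = fi.read(1)
    while byte == b'\x00':           # skip 0x00 padding
        byte = fi.read(1)

    # rewind 1 byte, back to the first non-zero byte
    fi.seek(fi.tell() - 1)

    print("Binary data starts at", fi.tell())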

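Some caveats:

  • You should definitely add some error handling here.
  • This is rather fragile, since I don't know anything about the format.
  • Can't you find a specification of the file format, in order to have a more robust solution?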
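
The error handling mentioned in the first caveat could look roughly like the sketch below. It is still the same heuristic (text, then a run of 0x00, then data), not something derived from the format. The function name `find_binary_start` and the `chunk_size` parameter are made up for illustration, and it would misplace the offset if the binary part itself began with a 0x00 byte.

```python
def find_binary_start(path, chunk_size=65536):
    """Return the offset of the first non-zero byte after the first 0x00 run.

    Heuristic only: assumes text header, then 0x00 padding, then binary data.
    Raises ValueError if the file ends before that pattern is seen.
    """
    offset = 0        # file offset of the start of the current chunk
    seen_nul = False  # becomes True once the 0x00 padding has been reached
    with open(path, 'rb') as fi:
        while True:
            chunk = fi.read(chunk_size)
            if not chunk:  # EOF: the expected layout was not found
                raise ValueError('no 0x00 padding followed by data in %r' % path)
            for i, byte in enumerate(chunk):  # bytes iterate as ints in Python 3
                if byte == 0:
                    seen_nul = True
                elif seen_nul:
                    return offset + i
            offset += len(chunk)
```

Calling `find_binary_start(fname)` and passing the result to `fi.seek()` would replace the two `while` loops; a `ValueError` then signals that the file does not match the assumed layout instead of silently returning a wrong position.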

From the provided read-file routine:

n_headers = long( read_header( i_file, 'NUMHEAD', error=error) )
recl= long( read_header( i_file, 'RECL', error=error) )
offset=long(recl*n_headers)
print,'offset=',offset
print,'recl   n_headers = ',recl,n_headers
str_vers = 'pv'+strtrim(string(ver1),2)+'_'+ $
             strtrim(string(ver2),2)
print, 'version=',str_vers

The header size appears to be recl*n_headers, where these two values are the first two header entries. So:

fname = 'GLA06_634_1102_001_0079_3_01_0001.DAT'

with open(fname,'rb') as fi:
    recl = None
    numhead = None

    # Loop in case the required headers are not the first two
    # and/or are in a different order
    for line in fi:
        # The file is opened in binary mode, so compare against bytes
        if line.startswith(b'Recl='):
            recl = int(line[5:-2])      # drop the key and the trailing ";\n"
        if line.startswith(b'Numhead='):
            numhead = int(line[8:-2])

        if recl is not None and numhead is not None:
            break

    offset = recl * numhead

    print("Binary data starts at", offset)
    fi.seek(offset)
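
If the two entries might be missing, or spelled in a different case (the IDL routine above looks them up as 'NUMHEAD' and 'RECL'), the same idea can be wrapped in a function that fails loudly. This is only a sketch: the name `header_size`, the `max_lines` cap and the case-insensitive matching are assumptions, not something taken from the format documentation.

```python
def header_size(path, max_lines=1000):
    """Return recl * numhead parsed from 'key= value;' header lines.

    Raises ValueError if either entry is not found within max_lines lines.
    """
    found = {b'recl': None, b'numhead': None}
    with open(path, 'rb') as fi:
        for _ in range(max_lines):
            line = fi.readline()
            if not line:  # EOF
                break
            key, sep, value = line.partition(b'=')
            key = key.strip().lower()  # assumption: key case does not matter
            if sep and key in found:
                # value looks like b' 1234;\n': trim whitespace and the ';'
                found[key] = int(value.strip().rstrip(b';'))
            if None not in found.values():
                return found[b'recl'] * found[b'numhead']
    raise ValueError('Recl/Numhead not found in the first %d lines' % max_lines)
```

The result can be passed straight to `fi.seek()`. Comparing it with the offset seen in a hexdump of the same file is a cheap sanity check.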

Assuming a blank line separates the text from the binary part:

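fname = 'GLA06_634_1102_001_0079_3_01_0001.DAT'

offset = 0  # size in bytes of the header, including the blank separator line
with open(fname, 'rb') as fi:
    for line in fi:
        offset += len(line)
        if line == b'\n':  # a blank line ends the header
            break

    # seek() takes a byte offset, not a line count
    fi.seek(offset, 0)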
