Converting data to a NumPy array with numpy

Posted 2024-09-30 06:19:49


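I extracted some data from a database. The fields are separated by "|", and I am trying to load the data into a numpy array so I can filter it, for example to save to a file only the rows whose column 3 contains LOGOUT. I started by loading the example.txt file with: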

import numpy as np


data = np.genfromtxt('example.txt',
                 skip_header=1,
                 skip_footer=1,
                 names=True,
                 dtype=None,
                 delimiter='|',
                 encoding='utf-8',
                 filling_values=None)

But I got an error:

ValueError: Some errors were detected !
Line #3 (got 14 columns instead of 13)
Line #4 (got 14 columns instead of 13)
Line #5 (got 14 columns instead of 13)

The data in the txt file is:

|ID|TIMESTAMP|EVENT_DATE|GROUP|EVENT|CHANNEL|WERT|WERTY|WERTY|SESSION_ID|IP|WERT|DATA|
|5818221|2021-03-15T18:18:20+01:00|2021-03-15|LOGIN|SESSION-EXPIRE||qweqwewqewqewqe|qweqewqewqwqeqw|STANDARD|lAkpligg11Ds9nJGFRPdeD|qwewqeq||weqeqewqewe
|5818222|2021-03-15T18:18:20+01:00|2021-03-15|LOGOUT|SESSION-EXPIRE||qweqwewqewqewqe|qweqewqewqwqeqw|STANDARD|lAkpligg11Ds9nJGFRPdeD||qweqe|weqeqewqewe
|5818222|2021-03-15T18:18:20+01:00|2021-03-15|LOGOUT|SESSION||qweqwewqewqewqe|qweqewqewqwqeqw|STANDARD|lAkpligg11Ds9nJGFRPdeD|qweqe||weqeqewqewe
|5818221|2021-03-15T18:18:20+01:00|2021-03-15|LOGOUT|SESSION-EXPIRE||qweqwewqewqewqe|qweqewqewqwqeqw|STANDARD|lAkpligg11Ds9nJGFRPdeD|||weqeqewqewe
|5818221|2021-03-15T18:18:20+01:00|2021-03-15|LOGIN|SESSION-EXPIRE||qweqwewqewqewqe|qweqewqewqwqeqw|STANDARD|lAkpligg11Ds9nJGFRPdeD|||weqeqewqewe
|5818221|2021-03-15T18:18:20+01:00|2021-03-15|LOGIN|SESSION-EXPIRE||qweqwewqewqewqe|qweqewqewqwqeqw|STANDARD|lAkpligg11Ds9nJGFRPdeD|qweqwe|wqewqe|weqeqewqewe

No row contains more than 13 elements. What am I doing wrong?


2 Answers

If your data is in example.txt, you can do the following:

with open('example.txt') as fp:
    lines = fp.read().splitlines()
data = [x.split('|')[1:] for x in lines][1:]

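The two [1:] slices discard the empty leading column and the header line, respectively. The result is a nested (two-dimensional) list containing the data from the file. If you need it as a NumPy array, call np.array(data).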
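To get from that list to the filtering described in the question, one possible sketch follows. It assumes the header and the first three data rows from the question (inlined as a list so the snippet is self-contained) and that the GROUP value sits at index 3 once the empty leading field is dropped. The in-memory buffer is a stand-in for an output file and is not part of the original answer.

```python
import io
import numpy as np

# Header plus the first three data rows from the question; in practice this
# list would come from fp.read().splitlines() as in the answer above.
lines = [
    "|ID|TIMESTAMP|EVENT_DATE|GROUP|EVENT|CHANNEL|WERT|WERTY|WERTY|SESSION_ID|IP|WERT|DATA|",
    "|5818221|2021-03-15T18:18:20+01:00|2021-03-15|LOGIN|SESSION-EXPIRE||qweqwewqewqewqe|qweqewqewqwqeqw|STANDARD|lAkpligg11Ds9nJGFRPdeD|qwewqeq||weqeqewqewe",
    "|5818222|2021-03-15T18:18:20+01:00|2021-03-15|LOGOUT|SESSION-EXPIRE||qweqwewqewqewqe|qweqewqewqwqeqw|STANDARD|lAkpligg11Ds9nJGFRPdeD||qweqe|weqeqewqewe",
    "|5818222|2021-03-15T18:18:20+01:00|2021-03-15|LOGOUT|SESSION||qweqwewqewqewqe|qweqewqewqwqeqw|STANDARD|lAkpligg11Ds9nJGFRPdeD|qweqe||weqeqewqewe",
]

# Drop the empty leading field of every line, then drop the header line.
rows = [x.split('|')[1:] for x in lines][1:]
data = np.array(rows)  # 2-D array of strings, one row per record

# Keep only the rows whose GROUP field (index 3) equals 'LOGOUT'.
logout = data[data[:, 3] == 'LOGOUT']

# Write the filtered rows; pass a file name instead of the buffer
# to write to disk.
buf = io.StringIO()
np.savetxt(buf, logout, fmt='%s', delimiter='|')
print(buf.getvalue())
```

Every value stays a string in this approach, so numeric comparisons on ID would need an explicit conversion first.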

First, the reason the error is reported only for lines 3, 4 and 5 is the combination of skip_header and skip_footer.

Without skip_footer:

import numpy as np


data = np.genfromtxt('example.txt',
                 skip_header=1,
                 names=True,
                 dtype=None,
                 delimiter='|',
                 encoding='utf-8',
                 filling_values=None)

Error:

    Line #3 (got 14 columns instead of 13)
    Line #4 (got 14 columns instead of 13)
    Line #5 (got 14 columns instead of 13)
    Line #6 (got 14 columns instead of 13)
    Line #7 (got 14 columns instead of 13)

So, first of all, skip_header should be 0 (its default). The call becomes:

data = np.genfromtxt('example.txt',
                 names=True,
                 dtype=None,
                 delimiter='|',
                 encoding='utf-8',
                 filling_values=None)

Result:

array([(False, 5818221, '2021-03-15T18:18:20+01:00', '2021-03-15', 'LOGIN', 'SESSION-EXPIRE', False, 'qweqwewqewqewqe', 'qweqewqewqwqeqw', 'STANDARD', 'lAkpligg11Ds9nJGFRPdeD', 'qwewqeq', '', 'weqeqewqewe'),
       (False, 5818222, '2021-03-15T18:18:20+01:00', '2021-03-15', 'LOGOUT', 'SESSION-EXPIRE', False, 'qweqwewqewqewqe', 'qweqewqewqwqeqw', 'STANDARD', 'lAkpligg11Ds9nJGFRPdeD', '', 'qweqe', 'weqeqewqewe'),
       (False, 5818222, '2021-03-15T18:18:20+01:00', '2021-03-15', 'LOGOUT', 'SESSION', False, 'qweqwewqewqewqe', 'qweqewqewqwqeqw', 'STANDARD', 'lAkpligg11Ds9nJGFRPdeD', 'qweqe', '', 'weqeqewqewe'),
       (False, 5818221, '2021-03-15T18:18:20+01:00', '2021-03-15', 'LOGOUT', 'SESSION-EXPIRE', False, 'qweqwewqewqewqe', 'qweqewqewqwqeqw', 'STANDARD', 'lAkpligg11Ds9nJGFRPdeD', '', '', 'weqeqewqewe'),
       (False, 5818221, '2021-03-15T18:18:20+01:00', '2021-03-15', 'LOGIN', 'SESSION-EXPIRE', False, 'qweqwewqewqewqe', 'qweqewqewqwqeqw', 'STANDARD', 'lAkpligg11Ds9nJGFRPdeD', '', '', 'weqeqewqewe'),
       (False, 5818221, '2021-03-15T18:18:20+01:00', '2021-03-15', 'LOGIN', 'SESSION-EXPIRE', False, 'qweqwewqewqewqe', 'qweqewqewqwqeqw', 'STANDARD', 'lAkpligg11Ds9nJGFRPdeD', 'qweqwe', 'wqewqe', 'weqeqewqewe')],
      dtype=[('ID', '?'), ('TIMESTAMP', '<i4'), ('EVENT_DATE', '<U25'), ('GROUP', '<U10'), ('EVENT', '<U6'), ('CHANNEL', '<U14'), ('WERT', '?'), ('WERTY', '<U15'), ('WERTY_1', '<U15'), ('SESSION_ID', '<U8'), ('IP', '<U22'), ('WERT_1', '<U7'), ('DATA', '<U6'), ('f0', '<U11')])

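The first column is all False and the dtypes are shifted by one column because the first line of the txt file contains more delimiters than the other lines. Every line also starts with a "|", which produces an empty first field: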

>>> line0 = "|ID|TIMESTAMP|EVENT_DATE|GROUP|EVENT|CHANNEL|WERT|WERTY|WERTY|SESSION_ID|IP|WERT|DATA|"
>>> line1 = "|5818221|2021-03-15T18:18:20+01:00|2021-03-15|LOGIN|SESSION-EXPIRE||qweqwewqewqewqe|qweqewqewqwqeqw|STANDARD|lAkpligg11Ds9nJGFRPdeD|qwewqeq||weqeqewqewe"
>>> delimiter = '|'
>>> line0.count(delimiter)
14
>>> line1.count(delimiter)
13
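The same check can be run over every line at once. The sketch below uses a shortened, made-up three-column sample with the same shape as the file (a leading '|' on every line, a trailing '|' only on the header); the sample values and variable names are illustrative assumptions, not taken from the original answer.

```python
from collections import Counter

# Hypothetical, shortened sample with the same shape as example.txt.
lines = [
    "|ID|GROUP|DATA|",               # header: leading and trailing '|'
    "|5818221|LOGIN|weqeqewqewe",    # data rows: leading '|' only
    "|5818222|LOGOUT|weqeqewqewe",
]
delimiter = '|'

# Number of fields each line splits into (delimiter count + 1).
field_counts = [len(line.split(delimiter)) for line in lines]
print(field_counts)  # [5, 4, 4]

# Report every line whose field count differs from the most common one.
expected, _ = Counter(field_counts).most_common(1)[0]
mismatched = [i for i, n in enumerate(field_counts, start=1) if n != expected]
print(mismatched)  # [1]
```

A mismatch on line 1 only, as here, points at the header rather than at the data rows.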

Solution: one delimiter separates two fields, and each line here has 13 fields, so only 12 delimiters per line are needed. The txt file then becomes:

ID|TIMESTAMP|EVENT_DATE|GROUP|EVENT|CHANNEL|WERT|WERTY|WERTY|SESSION_ID|IP|WERT|DATA
5818221|2021-03-15T18:18:20+01:00|2021-03-15|LOGIN|SESSION-EXPIRE||qweqwewqewqewqe|qweqewqewqwqeqw|STANDARD|lAkpligg11Ds9nJGFRPdeD|qwewqeq||weqeqewqewe
5818222|2021-03-15T18:18:20+01:00|2021-03-15|LOGOUT|SESSION-EXPIRE||qweqwewqewqewqe|qweqewqewqwqeqw|STANDARD|lAkpligg11Ds9nJGFRPdeD||qweqe|weqeqewqewe
5818222|2021-03-15T18:18:20+01:00|2021-03-15|LOGOUT|SESSION||qweqwewqewqewqe|qweqewqewqwqeqw|STANDARD|lAkpligg11Ds9nJGFRPdeD|qweqe||weqeqewqewe
5818221|2021-03-15T18:18:20+01:00|2021-03-15|LOGOUT|SESSION-EXPIRE||qweqwewqewqewqe|qweqewqewqwqeqw|STANDARD|lAkpligg11Ds9nJGFRPdeD|||weqeqewqewe
5818221|2021-03-15T18:18:20+01:00|2021-03-15|LOGIN|SESSION-EXPIRE||qweqwewqewqewqe|qweqewqewqwqeqw|STANDARD|lAkpligg11Ds9nJGFRPdeD|||weqeqewqewe
5818221|2021-03-15T18:18:20+01:00|2021-03-15|LOGIN|SESSION-EXPIRE||qweqwewqewqewqe|qweqewqewqwqeqw|STANDARD|lAkpligg11Ds9nJGFRPdeD|qweqwe|wqewqe|weqeqewqewe

Code:

data = np.genfromtxt('d2.txt',names=True,dtype=None,delimiter='|',encoding='utf-8',filling_values=None,skip_header=0)

Result:

array([(5818221, '2021-03-15T18:18:20+01:00', '2021-03-15', 'LOGIN', 'SESSION-EXPIRE', False, 'qweqwewqewqewqe', 'qweqewqewqwqeqw', 'STANDARD', 'lAkpligg11Ds9nJGFRPdeD', 'qwewqeq', '', 'weqeqewqewe'),
       (5818222, '2021-03-15T18:18:20+01:00', '2021-03-15', 'LOGOUT', 'SESSION-EXPIRE', False, 'qweqwewqewqewqe', 'qweqewqewqwqeqw', 'STANDARD', 'lAkpligg11Ds9nJGFRPdeD', '', 'qweqe', 'weqeqewqewe'),
       (5818222, '2021-03-15T18:18:20+01:00', '2021-03-15', 'LOGOUT', 'SESSION', False, 'qweqwewqewqewqe', 'qweqewqewqwqeqw', 'STANDARD', 'lAkpligg11Ds9nJGFRPdeD', 'qweqe', '', 'weqeqewqewe'),
       (5818221, '2021-03-15T18:18:20+01:00', '2021-03-15', 'LOGOUT', 'SESSION-EXPIRE', False, 'qweqwewqewqewqe', 'qweqewqewqwqeqw', 'STANDARD', 'lAkpligg11Ds9nJGFRPdeD', '', '', 'weqeqewqewe'),
       (5818221, '2021-03-15T18:18:20+01:00', '2021-03-15', 'LOGIN', 'SESSION-EXPIRE', False, 'qweqwewqewqewqe', 'qweqewqewqwqeqw', 'STANDARD', 'lAkpligg11Ds9nJGFRPdeD', '', '', 'weqeqewqewe'),
       (5818221, '2021-03-15T18:18:20+01:00', '2021-03-15', 'LOGIN', 'SESSION-EXPIRE', False, 'qweqwewqewqewqe', 'qweqewqewqwqeqw', 'STANDARD', 'lAkpligg11Ds9nJGFRPdeD', 'qweqwe', 'wqewqe', 'weqeqewqewe')],
      dtype=[('ID', '<i4'), ('TIMESTAMP', '<U25'), ('EVENT_DATE', '<U10'), ('GROUP', '<U6'), ('EVENT', '<U14'), ('CHANNEL', '?'), ('WERT', '<U15'), ('WERTY', '<U15'), ('WERTY_1', '<U8'), ('SESSION_ID', '<U22'), ('IP', '<U7'), ('WERT_1', '<U6'), ('DATA', '<U11')])
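If editing the file by hand is not practical, the outer delimiters can also be removed in memory before parsing, since np.genfromtxt accepts a list of strings as input. The following is a minimal sketch, assuming every line starts with '|' and only the header ends with one, as in the sample data. The rows are inlined from the question, and the names raw_lines and cleaned are illustrative.

```python
import numpy as np

# Header plus the first three data rows from the question; in practice:
# raw_lines = open('example.txt', encoding='utf-8').read().splitlines()
raw_lines = [
    "|ID|TIMESTAMP|EVENT_DATE|GROUP|EVENT|CHANNEL|WERT|WERTY|WERTY|SESSION_ID|IP|WERT|DATA|",
    "|5818221|2021-03-15T18:18:20+01:00|2021-03-15|LOGIN|SESSION-EXPIRE||qweqwewqewqewqe|qweqewqewqwqeqw|STANDARD|lAkpligg11Ds9nJGFRPdeD|qwewqeq||weqeqewqewe",
    "|5818222|2021-03-15T18:18:20+01:00|2021-03-15|LOGOUT|SESSION-EXPIRE||qweqwewqewqewqe|qweqewqewqwqeqw|STANDARD|lAkpligg11Ds9nJGFRPdeD||qweqe|weqeqewqewe",
    "|5818222|2021-03-15T18:18:20+01:00|2021-03-15|LOGOUT|SESSION||qweqwewqewqewqe|qweqewqewqwqeqw|STANDARD|lAkpligg11Ds9nJGFRPdeD|qweqe||weqeqewqewe",
]

# Header: strip the leading and trailing '|'.
# Data rows: drop only the single leading '|' so empty fields are preserved.
cleaned = [raw_lines[0].strip('|')] + [row[1:] for row in raw_lines[1:]]

data = np.genfromtxt(cleaned,
                     names=True,
                     dtype=None,
                     delimiter='|',
                     encoding='utf-8')

# With named fields, the filter from the question can use the column name.
logout = data[data['GROUP'] == 'LOGOUT']
print(logout['ID'])
```

Only the first character of each data row is removed rather than calling strip('|'), because a row whose first or last field is empty would otherwise lose a column.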
