Prevent pandas from automatically inferring column types in read_csv

In [149]: d = pandas.read_csv('resources/names/fos_names.csv', sep='#', header=None, names=['int_field', 'floatlike_field', 'str_field'])

In [150]: d
Out[150]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1673 entries, 0 to 1672
Data columns:
int_field          1673 non-null values
floatlike_field    1673 non-null values
str_field          1673 non-null values
dtypes: float64(1), int64(1), object(1)

2 answers

User

Reply #1 · edited 2024-09-27 07:35:24

I think your best option is to first read the data with numpy as a record array, then build the DataFrame from that.

# what you described:
In [15]: import numpy as np
In [16]: import pandas
In [17]: x = pandas.read_csv('weird.csv')

In [19]: x.dtypes
Out[19]: 
int_field            int64
floatlike_field    float64  # what you don't want?
str_field           object

In [20]: datatypes = [('int_field','i4'),('floatlike','S10'),('strfield','S10')]

In [21]: y_np = np.loadtxt('weird.csv', dtype=datatypes, delimiter=',', skiprows=1)

In [22]: y_np
Out[22]: 
array([(1, '2.31', 'one'), (2, '3.12', 'two'), (3, '1.32', 'three ')], 
      dtype=[('int_field', '<i4'), ('floatlike', '|S10'), ('strfield', '|S10')])

In [23]: y_pandas = pandas.DataFrame.from_records(y_np)

In [25]: y_pandas.dtypes
Out[25]: 
int_field     int64
floatlike    object  # better?
strfield     object
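The session above is from the Python 2 era. On Python 3, a numpy 'S10' field holds bytes (b'2.31'), so a minimal sketch of the same record-array approach uses 'U10' (unicode text) instead. The contents of weird.csv are not shown in the thread, so the inline CSV below is a hypothetical stand-in built from the three rows in the output above.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for weird.csv (the original file is not shown).
csv_text = """int_field,floatlike_field,str_field
1,2.31,one
2,3.12,two
3,1.32,three
"""

# Field names and types for the structured (record) array.
# 'U10' keeps the float-like column as text; 'S10' would yield bytes on Python 3.
datatypes = [("int_field", "i4"), ("floatlike", "U10"), ("strfield", "U10")]

# Skip the header row and parse each line into one record.
y_np = np.loadtxt(io.StringIO(csv_text), dtype=datatypes, delimiter=",", skiprows=1)

# Each record field becomes a DataFrame column; the text fields are not re-parsed.
y_pandas = pd.DataFrame.from_records(y_np)
print(y_pandas.dtypes)
```

Because numpy is told the type of every field up front, pandas never gets the chance to infer float64 for the float-like column.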

User

Reply #2 · edited 2024-09-27 07:35:24

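I plan to add explicit column dtypes as part of the upcoming overhaul of the file parser engine in pandas 0.10. I can't commit to it 100%, but with the new infrastructure in place it should be quite simple (http://wesmckinney.com/blog/?p=543).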
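Assuming a reasonably recent pandas release, read_csv accepts a dtype argument (a single type, or a dict mapping column names to types), which addresses the original question directly. A minimal sketch, using a small hypothetical '#'-separated sample in place of resources/names/fos_names.csv:

```python
import io

import pandas as pd

# Hypothetical stand-in for the '#'-separated file from the question.
csv_text = "1#2.31#one\n2#3.12#two\n3#1.32#three\n"

d = pd.read_csv(
    io.StringIO(csv_text),
    sep="#",
    header=None,
    names=["int_field", "floatlike_field", "str_field"],
    # Keep the float-like column as text instead of letting the parser infer float64.
    dtype={"floatlike_field": str},
)
print(d.dtypes)
```

Columns not listed in the dict are still inferred as before; passing dtype=str instead would keep every column as text.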
