Setting dtypes from a list

Posted 2024-09-29 23:15:22

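I am reading a large file and want to save memory, so I need to specify the data type of every column in the DataFrame. I would like to take those types from a list of dtypes that I have already created.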

import pandas as pd

headers=['Record Identifier','Respondent_ID','Agency Code','Loan Type','Property Type','Loan Purpose','Owner Occupancy',
         'Loan Amount','Preapprovals','Type of Action Taken','Metropolitan Statistical Area/Metropolitan Division','State Code',
         'County Code','Census Tract','Applicant Ethnicity','Co-applicant Ethnicity','Applicant Race: 1','Applicant Race: 2',
         'Applicant Race: 3','Applicant Race: 4','Applicant Race: 5','Co-applicant Race: 1','Co-applicant Race: 2',
         'Co-applicant Race: 3','Co-applicant Race: 4','Co-applicant Race: 5','Applicant Sex','Co-applicant Sex',
         'Applicant Income','Type of Purchaser','Denial Reason: 1','Denial Reason: 2','Denial Reason: 3','Rate Spread',
         'HOEPA Status','Lien Status','Population','Minority Population %','FFIEC Median Family Income',
         'Tract to MSA/MD Median Family Income %','Number of Owner Occupied Units','Number of 1- to 4-Family units']


dtypes=['int64','object','int64','int64','int64','int64','int64','int64','int64','int64','object','object','object','object',
        'int64','int64','int64','int64','int64','int64','int64','int64','int64','int64','int64','int64','int64','int64',
        'object','int64','int64','int64','int64','object','object','object','object','float64','int64','float64','int64',
        'int64']


df = pd.read_csv('2017_lar.txt', sep="|", header=None, names=headers, dtype=dtypes, nrows=100)

print(df)

Error: TypeError: data type not understood


1 answer
User
Answer #1 · posted 2024-09-29 23:15:22

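You are using the dtype parameter incorrectly. It accepts either a single type name or a dict that maps column names to types.

The documentation states this explicitly: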

dtype : Type name or dict of column -> type, optional

Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32, 'c': 'Int64'}. Use str or object together with suitable na_values settings to preserve and not interpret dtype. If converters are specified, they will be applied INSTEAD of dtype conversion.


Because you are passing a list, pandas treats the whole list as a single data type, and it cannot interpret that.
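As a rough illustration of the two accepted forms, the sketch below reads the same small inline sample twice. The sample data and the column names a, b, c are made up for this example and are not from the question's file.

```python
import io
import pandas as pd

data = "1|2|3\n4|5|6\n"  # made-up sample, pipe-separated like the question's file

# Form 1: a single type name is applied to every column.
all_str = pd.read_csv(io.StringIO(data), sep="|", header=None,
                      names=["a", "b", "c"], dtype=str)

# Form 2: a dict maps column names to types.
# Columns missing from the dict (here "c") are inferred as usual.
mixed = pd.read_csv(io.StringIO(data), sep="|", header=None,
                    names=["a", "b", "c"], dtype={"a": "float64", "b": str})

print(all_str["a"].tolist())
print(mixed.dtypes)
```

Note that the dict does not have to cover every column, so it can be limited to the columns whose inferred type would be wrong or wasteful.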


Here is a correct usage:

import io
import pandas as pd

i = io.StringIO("""
1|2|3
4|5|6
7|8|9
""")

headers = ['a', 'b', 'c']
dtypes = ['int64', 'object', 'int']

df = pd.read_csv(i, header=None, names=headers, sep='|', dtype=dict(zip(headers, dtypes)))

>>> df
   a  b  c
0  1  2  3
1  4  5  6
2  7  8  9

>>> df.dtypes
a     int64
b    object
c     int32
dtype: object
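Applied to the question's setup, the same idea might look like the sketch below. Because zip silently stops at the shorter list, a length check is added first. The helper name build_dtype_map and the two sample rows are hypothetical, and only three of the question's column names are used to keep the example short.

```python
import io
import pandas as pd

def build_dtype_map(headers, dtypes):
    """Pair each column name with its dtype (hypothetical helper)."""
    # zip() would silently drop the extras, so fail loudly on a mismatch.
    if len(headers) != len(dtypes):
        raise ValueError(f"{len(headers)} headers but {len(dtypes)} dtypes")
    return dict(zip(headers, dtypes))

# Three-column stand-in for the real file; the rows are invented.
headers = ["Record Identifier", "Respondent_ID", "Loan Amount"]
dtypes = ["int64", "object", "int64"]
sample = io.StringIO("1|0000000001|150\n1|0000000002|275\n")

df = pd.read_csv(sample, sep="|", header=None, names=headers,
                 dtype=build_dtype_map(headers, dtypes), nrows=100)
print(df.dtypes)
```

Reading Respondent_ID as object keeps its leading zeros as text. The int32 shown for column c in the answer's output comes from the 'int' alias, whose width depends on the platform and NumPy version; spelling out 'int64' or 'int32' avoids that ambiguity.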
