Converting float64/int64 columns to float/int columns in a pandas DataFrame

Posted 2024-09-26 22:53:43


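I want to save a pandas DataFrame as a Stata file, but there seems to be a problem: the columns have type int64 or float64 and apparently need to be converted to the standard Python types int and float. I searched a lot but found no solution, because none of the approaches I found worked for me.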

I tried something like this:

import numpy as np
def conversion(obj):
    if isinstance(obj, np.generic):
        return np.asscalar(obj)

mergeddfnew["speech_main_wordspersentcount_wc"]=mergeddfnew["speech_main_wordspersentcount_wc"].apply(conversion)

I also tried type casting, but the column's type always stays the same.


1 answer
User
#1 · Posted 2024-09-26 22:53:43

See the IO section of the docs:

Stata data files have limited data type support; only strings with 244 or fewer characters, int8, int16, int32, float32 and float64 can be stored in .dta files. Additionally, Stata reserves certain values to represent missing data. Exporting a non-missing value that is outside of the permitted range in Stata for a particular data type will retype the variable to the next larger size. For example, int8 values are restricted to lie between -127 and 100 in Stata, and so variables with values above 100 will trigger a conversion to int16. nan values in floating points data types are stored as the basic missing data type (. in Stata).
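The last sentence of the quoted passage can be checked with a small round trip. This is only a sketch: the column name `score` and the file name `missing.dta` are made up for illustration, and it assumes a pandas version where `DataFrame.to_stata` and `pd.read_stata` are available.

```python
import math
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical float64 column with one missing value.
df = pd.DataFrame({"score": [1.5, np.nan, 3.0]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "missing.dta")  # illustrative file name
    df.to_stata(path, write_index=False)     # NaN is written as Stata's missing value
    restored = pd.read_stata(path)           # read back while the file still exists

values = restored["score"].tolist()
print(values)
print(math.isnan(values[1]))  # the missing value comes back as NaN
```

So a float64 column with gaps needs no special preparation before export.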

However, pandas does its best to work around these limitations and converts the data for you:

The Stata writer gracefully handles other data types including int64, bool, uint8, uint16, uint32 by casting to the smallest supported type that can represent the data. For example, data with a type of uint8 will be cast to int8 if all values are less than 100 (the upper bound for non-missing int8 data in Stata), or, if values are outside of this range, the variable is cast to int16.

That said, your column apparently does not meet these conditions.
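One way to see whether a column meets these conditions is to look at its dtype and value range before exporting. The helper `fits_in_int32` and the sample data below are hypothetical, not part of pandas. Note also that the check uses NumPy's int32 limits, while the docs quoted above say Stata reserves some values for missing data, so the usable range in Stata is slightly narrower.

```python
import numpy as np
import pandas as pd


def fits_in_int32(series: pd.Series) -> bool:
    """Return True if every value lies inside NumPy's int32 range."""
    limits = np.iinfo(np.int32)
    return bool(series.min() >= limits.min and series.max() <= limits.max)


# Hypothetical data: one column that fits, one that does not.
df = pd.DataFrame({"word_count": [12, 15, 9], "big_id": [1, 2, 2**40]})

print(df.dtypes)  # both columns are stored as int64
report = {name: fits_in_int32(df[name]) for name in df.columns}
print(report)
```

A column flagged `False` here would lose information if forced into int32, so it is worth inspecting before casting.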

I would try manually casting it to int32, which .dta supports (assuming the column holds integers):

import numpy as np

# astype returns a new Series, so assign the result back to the column
df["speech_main_wordspersentcount_wc"] = df["speech_main_wordspersentcount_wc"].astype(np.int32)
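For completeness, here is a minimal end-to-end sketch of that cast followed by an export. The DataFrame contents, the short column names `word_count` and `avg_len`, and the file name `speech.dta` are assumptions for illustration; the float64 column is left untouched because the docs quoted above list float64 as supported.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical data standing in for the real DataFrame.
df = pd.DataFrame({"word_count": [12, 15, 9], "avg_len": [4.5, 5.25, 3.0]})

# Cast the integer column explicitly and assign the result back.
df["word_count"] = df["word_count"].astype(np.int32)
cast_dtype = str(df["word_count"].dtype)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "speech.dta")  # illustrative file name
    df.to_stata(path, write_index=False)    # skip writing the index as a column
    restored = pd.read_stata(path)          # read back to confirm the export

print(cast_dtype)
print(restored)
```

If the export still fails after the cast, printing `df.dtypes` should show which remaining column has a type the writer cannot handle.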
