Quote strings; unquote floats in pandas

Posted 2024-09-28 17:23:08


After processing the file unclean.csv

Date,Wave,Wavelength
2019-08-28,Theta,0.112358472
2019-08-27,Eta,571.5499015
2019-08-27,Lambda,286.4175921
2019-08-26,Iota,0.220237736

with the code

import os
import csv
import pandas as pd

myfile = ('path/to/'
          'unclean.csv')

os.chdir(os.path.dirname(myfile))
df = pd.read_csv(os.path.basename(myfile))

df['Date'] = pd.to_datetime(df['Date'])
df[['Wave']] = df[['Wave']].astype(str)
df[['Wavelength']] = df[['Wavelength']].astype(float)

df.to_csv('clean.csv',
          float_format='%g',
          index=False,
          quotechar='"',
          quoting=csv.QUOTE_NONNUMERIC)

I get the output clean.csv

"Date","Wave","Wavelength"
"2019-08-28","Theta","0.112358"
"2019-08-27","Eta","571.55"
"2019-08-27","Lambda","286.418"
"2019-08-26","Iota","0.220238"

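Everything is quoted here, even though I explicitly set the type of the Wavelength column to float and asked to_csv to quote only non-numeric fields.

How can I quote the strings but leave the numbers unquoted?

Many discussions (e.g. 1, 2, 3, 4) suggest that quoting=csv.QUOTE_NONNUMERIC should do this.

Using pandas==0.24.2 and unicodecsv==0.14.1, both from anaconda-project==0.8.2.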

Comment

Valentino's answer pinpoints the problem, but I know of no alternative to float_format='%g' for avoiding sprays of 999999 and 0000001 digits. Without it, the output is:

"Date","Wave","Wavelength"
"2019-08-28","Theta",0.11235847199999999
"2019-08-27","Eta",571.5499014999999
"2019-08-27","Lambda",286.41759210000004
"2019-08-26","Iota",0.22023773600000002


1 answer
User
#1 · posted 2024-09-28 17:23:08

From the pandas to_csv documentation:

quoting : optional constant from csv module
Defaults to csv.QUOTE_MINIMAL. If you have set a float_format then floats are converted to strings and thus csv.QUOTE_NONNUMERIC will treat them as non-numeric.

(Emphasis mine.)

Simply remove the float_format='%g' argument and your floats will no longer be quoted.
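
The effect described in the documentation can be checked with a small sketch. The frame below is typed in by hand to mirror two rows of the question's data, and the helper name to_csv_lines is made up for this illustration; it writes to an in-memory buffer instead of clean.csv.

```python
import csv
import io

import pandas as pd

# Hand-typed sample mirroring two rows of the question's data (an assumption,
# not read from unclean.csv).
df = pd.DataFrame({
    "Wave": ["Theta", "Eta"],
    "Wavelength": [0.112358472, 571.5499015],
})


def to_csv_lines(frame, **kwargs):
    """Hypothetical helper: return the lines to_csv writes with QUOTE_NONNUMERIC."""
    buf = io.StringIO()
    frame.to_csv(buf, index=False, quoting=csv.QUOTE_NONNUMERIC, **kwargs)
    return buf.getvalue().splitlines()


# With float_format, the floats are turned into strings before quoting is applied.
with_format = to_csv_lines(df, float_format="%g")
# Without it, the floats reach the csv writer as numbers.
without_format = to_csv_lines(df)

print(with_format[1])     # the Wavelength value is quoted here
print(without_format[1])  # the Wavelength value is left unquoted here
```

Comparing the two printed rows shows that the only change needed to stop the quoting is dropping float_format, at the cost of losing the '%g' rounding.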

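Edit

As far as I know, if you need to format the floats there is no direct way to get the result you want using to_csv parameters alone.
But you can still "fake" the formatting yourself.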

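# Make a new DataFrame of strings: floats are formatted with '{:g}',
# everything else is wrapped in literal double quotes.
# (applymap matches the pandas 0.24 API used here; pandas 2.1+ renames it DataFrame.map.)
ddf = df.applymap(lambda x: '{:g}'.format(x) if isinstance(x, float) else '"{}"'.format(x))

# Write the new DataFrame to CSV, now using QUOTE_NONE because the quote
# characters were already added where needed.
ddf.to_csv('clean.csv',
           index=False,
           quoting=csv.QUOTE_NONE)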

The clean.csv file will look like this:

Date,Wave,Wavelength
"2019-08-28 00:00:00","Theta",0.112358
"2019-08-27 00:00:00","Eta",571.55
"2019-08-27 00:00:00","Lambda",286.418
"2019-08-26 00:00:00","Iota",0.220238

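A different route, not taken from the answer above, is to round the values to what '%g' would print while keeping them as floats, so that QUOTE_NONNUMERIC still sees numbers. The sketch below assumes six significant digits are acceptable; the helper name round_like_g is invented for this illustration, and the sample values are copied from the unformatted output shown in the question.

```python
import csv
import io

import pandas as pd

# Values copied from the question's unformatted output, stray digits included.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2019-08-28", "2019-08-27"]),
    "Wave": ["Theta", "Eta"],
    "Wavelength": [0.11235847199999999, 571.5499014999999],
})


def round_like_g(value):
    """Hypothetical helper: round to the 6 significant digits '%g' prints,
    but return a float rather than a string."""
    return float("%g" % value)


rounded = df.copy()
rounded["Wavelength"] = rounded["Wavelength"].map(round_like_g)

buf = io.StringIO()
# No float_format is passed, so the column is written as numbers, not strings.
rounded.to_csv(buf, index=False, quoting=csv.QUOTE_NONNUMERIC)
lines = buf.getvalue().splitlines()
print("\n".join(lines))
```

Because the quoting is left to to_csv, the header row is quoted as well and the Date column is not forced through str formatting. If the stray digits originate when the file is parsed, passing float_precision='round_trip' to read_csv may also be worth trying.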