Fastest way to do data type conversion with csv.DictReader in Python

3 answers

User

Answer 1 · edited 2024-06-01 09:18:00

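There are two quite distinct things: the "data source" and the "data table".

"Data source" is the Google Visualization API's name for a web service that provides formatted data to visualizations: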

This page describes how you can implement a data source to feed data
to visualizations built on the Google Visualization API. 

http://code.google.com/intl/fr/apis/visualization/documentation/dev/implementing_data_source.html

A "data source" involves the notion of a "wire protocol":

In response [to a request], the data source returns properly formatted data 
that the visualization can use to render the graphic on the page. 
This request-response protocol is known as the Google Visualization API wire protocol,

http://code.google.com/intl/fr/apis/visualization/documentation/dev/implementing_data_source_overview.html

There are two ways to implement a "data source":

• Use one of the data source libraries listed in the Data Sources and Tools Gallery. 
All the data source libraries listed on that page implement the wire protocol.

• Write your own data source from scratch, 

http://code.google.com/intl/fr/apis/visualization/documentation/dev/implementing_data_source_overview.html

From the following:

• ... Data Sources and Tools Gallery : (....) You therefore need write only the
code needed to make your data available to the library in the form of a data table. 

• Write your own data source from scratch, as described in the
Writing your own Data Source

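As I understand it, when starting from scratch we have to implement the wire protocol ourselves and also create a "data table", whereas with a data source library we only have to create the "data table".

There are pages about creating a "data source":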

http://code.google.com/intl/fr/apis/visualization/documentation/dev/implementing_data_source_overview.html

http://code.google.com/intl/fr/apis/visualization/documentation/dev/gviz_api_lib.html

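It seems to me that the example at http://groups.google.com/group/google-visualization-api/browse_thread/thread/9d1d941e0f0b32ed is about creating a "data source", and that the answers there are doubtful. But I am not entirely sure.

However, these pages and that thread are not what you are really after. If I understand correctly, you want to know how to prepare the data, the so-called "data table", that a "data source" serves, not how to build the "data source" itself.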

3.Prepare your data. You'll need to prepare the data to visualize; 
this means either specifying the data yourself in code, 
or querying a remote site for data.

http://code.google.com/intl/fr/apis/visualization/documentation/using_overview.html#keycomponents

A visualization stores the data that it visualizes as two-dimensional data table with 
rows and columns.
Cells are referenced by (row, column) where row is a zero-based row number, and column
is either a zero-based column index or a unique ID that you can specify. 

http://code.google.com/intl/fr/apis/visualization/documentation/using_overview.html#preparedata

So preparing the "data table" is the key point.

Here it is:

There are two ways to create/populate your visualization's data table:

•Query a data provider. A data provider is another site that returns
a populated DataTable in response to a request from your code. 
Some data providers also accept SQL-like query strings to sort or 
filter the data. See Data Queries for more information and an example
of a query.

•Create and populate your own DataTable by hand. You can populate your
DataTable in code on your page. The simplest way to do this is to create
a DataTable object without any data and populate it by calling addRows()
on it. You can also pass a JavaScript literal representation of the data
table into the DataTable constructor, but this is more complex and is
covered on the reference page.

http://code.google.com/intl/fr/apis/visualization/documentation/using_overview.html#preparedata

More information can be found here:

2. Describe your table schema
The table schema is specified by the table_description parameter
passed into the constructor. You cannot change it later. 
The schema describes all the columns in the table: the data type of
each column, the ID, and an optional label.

Each column is described by a tuple: (ID [,data_type [,label [,custom_properties]]]). 



The table schema is a collection of column descriptor tuples. 
Every list member, dictionary key or dictionary value must be either 
another collection or a descriptor tuple. You can use any combination 
of dictionaries or lists, but every key, value, or member must
eventually evaluate to a descriptor tuple. Here are some examples.

•List of columns: [('a', 'number'), ('b', 'string')]
•Dictionary of lists: {('a', 'number'): [('b', 'number'), ('c', 'string')]}
•Dictionary of dictionaries: {('a', 'number'): {'b': 'number', 'c': 'string'}}
•And so on, with any level of nesting.


3. Populate your data
To add data to the table, build a structure of data elements in the
exact same structure as the table schema. So, for example, if your
schema is a list, the data must be a list: 

•schema: [("color", "string"), ("shape", "string")] 
•data: [["blue", "square"], ["red", "circle"]] 
If the schema is a dictionary, the data must be a dictionary:

•schema: {("rowname", "string"): [("color", "string"), ("shape", "string")] }
•data: {"row1": ["blue", "square"], "row2": ["red", "circle"]}

http://code.google.com/intl/fr/apis/visualization/documentation/dev/gviz_api_lib.html#populatedata
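To make the "same structure as the schema" rule concrete, here is a minimal sketch of a pre-check for a list-style schema. The helper row_matches_schema and the extra "sides" column are hypothetical, added only for illustration; this is not the library's own validation and may be stricter than what gviz_api actually accepts.

```python
import numbers

# Hypothetical list-style schema in the same shape as the quoted documentation.
schema = [("color", "string"), ("shape", "string"), ("sides", "number")]

def row_matches_schema(row, schema):
    """Return True if row has one value per column and each value fits its declared type."""
    if len(row) != len(schema):
        return False
    for value, (_col_id, col_type) in zip(row, schema):
        if col_type == "string" and not isinstance(value, str):
            return False
        # bool is a subclass of int, so it is rejected explicitly for "number" columns.
        if col_type == "number" and (
            isinstance(value, bool) or not isinstance(value, numbers.Real)
        ):
            return False
    return True

good_row = ["blue", "square", 4]
bad_row = ["blue", "square", "4"]  # the number is still a string, as read from a CSV file
```

A check like this makes it obvious why values read from a CSV file need converting first: every field arrives as a string, including the ones declared as numbers.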

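Finally, I would say that for your problem you have to define a "table schema" and process your CSV file so as to obtain "a structure of data elements in the exact same structure as the table schema".

The data types of the columns are defined in the "table schema". If the "data table" must be populated with data of the right type (that is, not with strings), I will help you write the code that extracts the data from the CSV file; it is easy.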
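As a rough illustration of per-column conversion with csv.DictReader, here is a minimal sketch using only the standard library. The CONVERTERS mapping and the typed_rows helper are hypothetical names; the column names are taken from the sample file shown later in this thread.

```python
import csv
import io

# Hypothetical converters: column name -> callable applied to the raw string.
CONVERTERS = {
    "surname": str,
    "percent": float,
    "cumulative percent": float,
    "rank": int,
}

def typed_rows(lines, converters):
    """Yield one dict per CSV row, converting each named column lazily."""
    reader = csv.DictReader(lines)
    for raw in reader:
        # Only the named columns are read, so the empty field produced by
        # the trailing comma on each data line is simply ignored.
        yield {name: convert(raw[name]) for name, convert in converters.items()}

sample = io.StringIO(
    "surname,percent,cumulative percent,rank\n"
    "SMITH,1.006,1.006,1,\n"
    "JOHNSON,0.810,1.816,2,\n"
)
rows = list(typed_rows(sample, CONVERTERS))
```

Building the converter mapping once and applying it in a generator keeps the per-row work small and avoids holding the raw rows in memory; the list() call here is only for demonstration.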

For now, I hope all of this is correct and that it helps.

User

Answer 2 · edited 2024-06-01 09:18:00

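First of all, if you only need to visualize this data, you don't need any conversion: gviz can consume JSON (which is text-based, you know) or CSV (which you already have, no parsing needed!). You can put the file on any reasonable web server and let it be fetched by the fancy GET requests that gviz issues, essentially ignoring their parameters.

But suppose you do need processing. It looks like you not only read the CSV file but also try to hold all of it in RAM. That may be impractical: as you add more processing, you will hit the RAM limit sooner and sooner. Process one row at a time (or a reasonable number of rows if you apply a window filter) and put the processed rows into the datastore rather than into a list. Likewise, when serving the data in response to a GET request, read and process one row, write it to the response, and don't keep it in a list or anything else.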
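A minimal sketch of that row-at-a-time idea with a small window filter, using only the standard library. The moving_average helper, the window size, and the sample values are hypothetical; writing to the datastore or to the HTTP response is left out, and the consumer would do that inside its own loop.

```python
import csv
import io
from collections import deque

def moving_average(rows, column, window=3):
    """Yield (row, average of the last `window` values of `column`), one row at a time."""
    recent = deque(maxlen=window)  # keeps only the rows the window filter needs
    for row in rows:
        recent.append(float(row[column]))
        yield row, sum(recent) / len(recent)

sample = io.StringIO(
    "surname,percent\n"
    "SMITH,1.0\n"
    "JOHNSON,2.0\n"
    "WILLIAMS,3.0\n"
    "BROWN,5.0\n"
)

# The list below exists only to inspect the result; real code would write
# each processed row out immediately instead of collecting it.
averages = [avg for _row, avg in moving_average(csv.DictReader(sample), "percent", window=2)]
```

Memory use is bounded by the window size, not by the size of the file.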

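I see no problem with the conversion technique itself, as long as you use i sensibly later in the code and don't keep every i in memory as you go.

User

Answer 3 · edited 2024-06-01 09:18:00

I first processed the CSV file with a regex, but since the data in each line of the file is laid out very strictly, we can simply use the split() function: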

import gviz_api

scheme = [('col1','string','SURNAME'),('col2','number','ONE'),('col3','number','TWO')]
data_table = gviz_api.DataTable(scheme)

#  --- lines in surnames.csv are : --- 
#  surname,percent,cumulative percent,rank\n
#  SMITH,1.006,1.006,1,\n
#  JOHNSON,0.810,1.816,2,\n
#  WILLIAMS,0.699,2.515,3,\n

with open('surnames.csv') as f:

    def transf(surname,x,y):
        return (surname,float(x),float(y))

    f.readline()
    # to skip the first line surname,percent,cumulative percent,rank\n

    data_table.LoadData( transf(*line.split(',')[0:3]) for line in f )
    # to populate the data table by iterating in the CSV file

Or, without defining a function:

import gviz_api

scheme = [('col1','string','SURNAME'),('col2','number','ONE'),('col3','number','TWO')]
data_table = gviz_api.DataTable(scheme)

#  --- lines in surnames.csv are : --- 
#  surname,percent,cumulative percent,rank\n
#  SMITH,1.006,1.006,1,\n
#  JOHNSON,0.810,1.816,2,\n
#  WILLIAMS,0.699,2.515,3,\n

with open('surnames.csv') as f:

    f.readline()
    # to skip the first line surname,percent,cumulative percent,rank\n

    data_table.LoadData( [el if n==0 else float(el) for n,el in enumerate(line.split(',')[0:3])] for line in f )
    # to populate the data table by iterating in the CSV file

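At one point I thought I had to populate the data table one line at a time, because I was using a regex, which meant fetching the matched groups before converting the number strings to floats. With split(), everything can be done in a single statement with LoadData().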
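One caveat about split(','): it would miscount fields if a value ever contained a quoted comma. If that can happen in your data, the standard csv module is an alternative that keeps the same iterate-and-convert shape. This is a minimal sketch; converted_rows is a hypothetical helper and the "SMITH, JR" row is invented sample data.

```python
import csv
import io

def converted_rows(lines):
    """Yield [surname, percent, cumulative percent] with the two numbers as floats."""
    reader = csv.reader(lines)
    next(reader, None)  # skip the header line
    for fields in reader:
        surname, percent, cumulative = fields[:3]
        yield [surname, float(percent), float(cumulative)]

sample = io.StringIO(
    "surname,percent,cumulative percent,rank\n"
    '"SMITH, JR",1.006,1.006,1,\n'
    "JOHNSON,0.810,1.816,2,\n"
)
rows = list(converted_rows(sample))
```

Because converted_rows is a generator, it could be handed to LoadData() in the same single-statement style as the split() version above.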


So your code can be shortened. By the way, I don't see why it should keep defining a class. A function seems enough to me:

import gviz_api

def GvizFromCsv(filename):
  """ creates a gviz data table from a CSV file """

  data_table = gviz_api.DataTable([('col1','string','SURNAME'),
                                   ('col2','number','ONE'    ),
                                   ('col3','number','TWO'    ) ])

  #  --- with such a table schema , lines in the file must be like that: ---  
  #  blah, number, number, ...anything else...\n 
  #  SMITH,1.006,1.006, ...anything else...\n 
  #  JOHNSON,0.810,1.816, ...anything else...\n 
  #  WILLIAMS,0.699,2.515, ...anything else...\n

  with open(filename) as f:
    data_table.LoadData( [el if n==0 else float(el) for n,el in enumerate(line.split(',')[0:3])]
                         for line in f )
  return data_table


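Now you have to check whether the way the CSV data is read from the other API can be plugged into this code while keeping the principle of populating the data table by iteration.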
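One way to keep that iteration principle regardless of where the lines come from is to separate "producing lines" from "converting lines". The sketch below is only an assumption about how that could look: TYPE_CONVERTERS, rows_from_lines, and fake_api_lines are hypothetical names, and fake_api_lines merely stands in for whatever the other API returns.

```python
# Hypothetical mapping from schema type names to Python converters.
TYPE_CONVERTERS = {"string": str, "number": float}

def rows_from_lines(lines, schema):
    """Lazily turn comma-separated lines into rows typed according to the schema."""
    converters = [TYPE_CONVERTERS[col_type] for _col_id, col_type, _label in schema]
    for line in lines:
        fields = line.rstrip("\n").split(",")
        # zip stops at the last schema column, so extra fields are ignored.
        yield [convert(field) for convert, field in zip(converters, fields)]

schema = [("col1", "string", "SURNAME"), ("col2", "number", "ONE"), ("col3", "number", "TWO")]

def fake_api_lines():
    """Stand-in for lines fetched from some other API (hypothetical)."""
    yield "SMITH,1.006,1.006,1,\n"
    yield "JOHNSON,0.810,1.816,2,\n"

rows = rows_from_lines(fake_api_lines(), schema)
first = next(rows)
second = next(rows)
```

Any iterable of text lines works as input here (an open file, a list, or a generator), so the conversion code does not need to change when the source does.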
