用datatyp按多列对xls文件内容排序

2024-09-28 22:33:52 发布

您现在位置:Python中文网/ 问答频道 /正文

我必须按4列升序排序xls文件内容

我将xls文件内容转换为列表列表。以下是输入

输入

data = """ABC, Do not Consider1, 101, Title and Subtitle, Do not Consider2, 30/12/2015
ABC, Do not Consider1, 100, Title and Subtitle, Do not Consider2, 31/12/2015
ABC, Do not Consider1, 99, BIC Codes, Do not Consider2, 31/12/2015
ABC, Do not Consider1, 98, Title and Subtitle, Do not Consider2, 25/12/2015 
ABC, Do not Consider1, 100, ATitle and Subtitle, Do not Consider2, 30/12/2015
XYZ, Do not Consider1, 100, ATitle and Subtitle, Do not Consider2, 30/12/2015
XYZ, Do not Consider1, 100, ATitle and Subtitle, Do not Consider2, 30/12/2015
ABC, Do not Consider1, 100, Title and Subtitle, Do not Consider2, 30/12/2015"""

字符串格式的相应输出

 data = """ABC, Do not Consider1, 98, Title and Subtitle, Do not Consider2, 25/12/2015 
ABC, Do not Consider1, 99, BIC Codes, Do not Consider2, 31/12/2015
ABC, Do not Consider1, 100, ATitle and Subtitle, Do not Consider2, 30/12/2015
ABC, Do not Consider1, 100, Title and Subtitle, Do not Consider2, 30/12/2015
ABC, Do not Consider1, 100, Title and Subtitle, Do not Consider2, 31/12/2015
ABC, Do not Consider1, 101, Title and Subtitle, Do not Consider2, 30/12/2015
XYZ, Do not Consider1, 100, ATitle and Subtitle, Do not Consider2, 30/12/2015
XYZ, Do not Consider1, 100, ATitle and Subtitle, Do not Consider2, 30/12/2015
"""

首先,我将数据拆分为列表格式:

    # Split data to list.
>>> data_list = [i.split(", ") for i in  data.split("\n")]

>>> print "\n".join([", ".join(i) for i in  data_list])
ABC, Do not Consider1, 101, Title and Subtitle, Do not Consider2, 30/12/2015
ABC, Do not Consider1, 100, Title and Subtitle, Do not Consider2, 31/12/2015
ABC, Do not Consider1, 99, BIC Codes, Do not Consider2, 31/12/2015
ABC, Do not Consider1, 98, Title and Subtitle, Do not Consider2, 25/12/2015 
ABC, Do not Consider1, 100, ATitle and Subtitle, Do not Consider2, 30/12/2015
XYZ, Do not Consider1, 100, ATitle and Subtitle, Do not Consider2, 30/12/2015
XYZ, Do not Consider1, 100, ATitle and Subtitle, Do not Consider2, 30/12/2015
ABC, Do not Consider1, 100, Title and Subtitle, Do not Consider2, 30/12/2015

以下是排序要求:

- We have to sort by index0 , 
   if index0 have same values for multiple items then sort by Index2 
       if index0 and index2 are same for multiple items then sort by Index3
           if index0, index2 and index3 are same for multiple items then sort by Index5

我的逻辑是

  1. 创建index0、index2、index5和index5的字符串
  2. 使用步骤1中的键创建字典
  3. 使用排序函数对键列表进行排序
  4. 再次创建xls文件

代码:

>>> from collections import defaultdict
>>> data_dict = defaultdict(list)
>>> for i in data_list:
...     key = "%s%s%s%s"%(i[0].strip(), i[2].strip(), i[3].strip(), i[5].strip())
...     data_dict[key].append(i)
... 
>>> sorted_keys = sorted(data_dict.keys())
>>> 
>>> for i in sorted_keys:
...     for j in data_dict[i]:
...         print j
...         
... 
['ABC', 'Do not Consider1', '100', 'ATitle and Subtitle', 'Do not Consider2', '30/12/2015']
['ABC', 'Do not Consider1', '100', 'Title and Subtitle', 'Do not Consider2', '30/12/2015']
['ABC', 'Do not Consider1', '100', 'Title and Subtitle', 'Do not Consider2', '31/12/2015']
['ABC', 'Do not Consider1', '101', 'Title and Subtitle', 'Do not Consider2', '30/12/2015']
['ABC', 'Do not Consider1', '98', 'Title and Subtitle', 'Do not Consider2', '25/12/2015 ']
['ABC', 'Do not Consider1', '99', 'BIC Codes', 'Do not Consider2', '31/12/2015']
['XYZ', 'Do not Consider1', '100', 'ATitle and Subtitle', 'Do not Consider2', '30/12/2015']
['XYZ', 'Do not Consider1', '100', 'ATitle and Subtitle', 'Do not Consider2', '30/12/2015']

但是Index2中有数字,即第二列,Index5中有日期,即第5列,所以不能得到排序的数据

你能帮我修一下吗


Tags: andinfordata排序titlenotdo
2条回答

您可以使用sorted函数按多个键进行排序follows:- 你知道吗

sorted_list = sorted(data_list, key=lambda item: (item[0], int(item[2]), item[3]))
print "\n".join([", ".join(i) for i in  sorted_list])

退货

ABC, Do not Consider1, 98, Title and Subtitle, Do not Consider2, 25/12/2015 
ABC, Do not Consider1, 99, BIC Codes, Do not Consider2, 31/12/2015
ABC, Do not Consider1, 100, ATitle and Subtitle, Do not Consider2, 30/12/2015
ABC, Do not Consider1, 100, Title and Subtitle, Do not Consider2, 31/12/2015
ABC, Do not Consider1, 100, Title and Subtitle, Do not Consider2, 30/12/2015
ABC, Do not Consider1, 101, Title and Subtitle, Do not Consider2, 30/12/2015
XYZ, Do not Consider1, 100, ATitle and Subtitle, Do not Consider2, 30/12/2015
XYZ, Do not Consider1, 100, ATitle and Subtitle, Do not Consider2, 30/12/2015

诀窍是让keylambda返回一个元组,其中包含要排序的所有值,并使用int()函数将第三列的值转换为整数

您应该可以通过一个sorted()调用来做您需要的事情。csv模块可用于解析数据:

import csv
import StringIO
from itertools import groupby

data = """ABC, Do not Consider1, 101, Title and Subtitle, Do not Consider2, 30/12/2015
ABC, Do not Consider1, 100, Title and Subtitle, Do not Consider2, 31/12/2015
ABC, Do not Consider1, 99, BIC Codes, Do not Consider2, 31/12/2015
ABC, Do not Consider1, 98, Title and Subtitle, Do not Consider2, 25/12/2015 
ABC, Do not Consider1, 100, ATitle and Subtitle, Do not Consider2, 30/12/2015
XYZ, Do not Consider1, 100, ATitle and Subtitle, Do not Consider2, 30/12/2015
XYZ, Do not Consider1, 100, ATitle and Subtitle, Do not Consider2, 30/12/2015
ABC, Do not Consider1, 100, Title and Subtitle, Do not Consider2, 30/12/2015"""

csv_input = csv.reader(StringIO.StringIO(data), skipinitialspace=True)
rows = sorted(list(csv_input), key=lambda x: (x[0], int(x[2]), x[3], x[5]))

for row in rows:
    print row

这将为您提供以下信息:

['ABC', 'Do not Consider1', '98', 'Title and Subtitle', 'Do not Consider2', '25/12/2015 ']
['ABC', 'Do not Consider1', '99', 'BIC Codes', 'Do not Consider2', '31/12/2015']
['ABC', 'Do not Consider1', '100', 'ATitle and Subtitle', 'Do not Consider2', '30/12/2015']
['ABC', 'Do not Consider1', '100', 'Title and Subtitle', 'Do not Consider2', '30/12/2015']
['ABC', 'Do not Consider1', '100', 'Title and Subtitle', 'Do not Consider2', '31/12/2015']
['ABC', 'Do not Consider1', '101', 'Title and Subtitle', 'Do not Consider2', '30/12/2015']
['XYZ', 'Do not Consider1', '100', 'ATitle and Subtitle', 'Do not Consider2', '30/12/2015']
['XYZ', 'Do not Consider1', '100', 'ATitle and Subtitle', 'Do not Consider2', '30/12/2015'] 

相关问题 更多 >