How do I create a list of unique cells? (Answers)



1 answer

Anonymous, 1 day ago

Option 1: <code>.csv</code>, <code>.txt</code> files

Python cannot read <code>.xls</code> files natively. If you convert the file to <code>.csv</code> or <code>.txt</code>, you can use the <code>csv</code> module from the standard library:

<pre><code># `csv` module, Standard Library
import csv

filepath = "./test.csv"

with open(filepath, "r", newline="") as f:
    reader = csv.reader(f, delimiter=',')
    header = next(reader)                                  # skip 'A', 'B'
    items = set()
    for line in reader:
        # Drop empty cells and remove spaces inside each value
        line = [word.replace(" ", "") for word in line if word]
        line = filter(str.strip, line)
        items.update(line)

print(list(items))
# ['uyete', 'NHYG', 'QHD', 'SGDH', 'AFD', 'DNGS', 'lkd', 'TTT']  (a set is unordered, so the order may vary)
</code></pre>

<hr/>

Option 2: <code>.xls</code>, <code>.xlsx</code> files

If you want to keep the original <code>.xls</code> format, you must install a <a href="http://www.python-excel.org/" rel="nofollow noreferrer">third-party module</a> that can read Excel files. Install <code>xlrd</code> from the command prompt, for example with <code>pip install xlrd</code>. Note that <code>xlrd</code> 2.0 and later reads only <code>.xls</code> files; <code>.xlsx</code> files need a different library.

In Python:

<pre><code># `xlrd` module, third-party
import itertools
import xlrd

filepath = "./test.xls"

with xlrd.open_workbook(filepath) as workbook:
    worksheet = workbook.sheet_by_index(0)                 # assumes first sheet
    # Skip the header row, then flatten all rows into one stream of cells
    rows = (worksheet.row_values(i) for i in range(1, worksheet.nrows))
    cells = itertools.chain.from_iterable(rows)
    # Assumes every non-empty cell holds text
    items = list({val.replace(" ", "") for val in cells if val})

print(list(items))
# ['uyete', 'NHYG', 'QHD', 'SGDH', 'AFD', 'DNGS', 'lkd', 'TTT']  (order may vary)
</code></pre>

<hr/>

Option 3: DataFrames

You can use pandas DataFrames to handle csv and text files. <a href="http://pandas.pydata.org/pandas-docs/stable/io.html" rel="nofollow noreferrer">See documentation</a> for other formats.

<pre><code>import pandas as pd
import numpy as np

# Using data from gist.github.com/anonymous/a822647a00087abc12de3053c700b9a8
filepath = "./test2.txt"

# Determines columns from the first line, so add commas in text file, else may throw an error
# on_bad_lines="skip" needs pandas >= 1.3; older versions use error_bad_lines=False
df = pd.read_csv(filepath, sep=",", header=None, on_bad_lines="skip")
df = df.replace(r"[^A-Za-z0-9]+", np.nan, regex=True)      # remove special chars
stack = df.stack()                                         # flatten to one column, dropping NaN
clean_df = pd.Series(stack.unique())
clean_df
</code></pre>

DataFrame output:

<pre><code>0     India1
1     India2
2    myIndia
3      Where
4       Here
5      India
6      uyete
7        AFD
8        TTT
dtype: object
</code></pre>

Save as a file:

<pre><code># Save as .txt or .csv without index, optional
# target = "./output.csv"
target = "./output.txt"
clean_df.to_csv(target, index=False)
</code></pre>

Note: the results of options 1 &amp; 2 can also be converted to an unordered pandas columnar object (a Series) with <code>pd.Series(list(items))</code>.

Finally: as a script

Wrap any of the three options above in a function (<code>stack</code>) inside a script named <code>restack.py</code>, and save the script to a directory.

<pre><code># restack.py
import pandas as pd
import numpy as np


def stack(filepath, save=False, target="./output.txt"):
    # Using data from gist.github.com/anonymous/a822647a00087abc12de3053c700b9a8
    # Determines columns from the first line, so add commas in text file, else may throw an error
    # on_bad_lines="skip" needs pandas >= 1.3; older versions use error_bad_lines=False
    df = pd.read_csv(filepath, sep=",", header=None, on_bad_lines="skip")
    df = df.replace(r"[^A-Za-z0-9]+", np.nan, regex=True)  # remove special chars
    stack = df.stack()
    clean_df = pd.Series(stack.unique())
    if save:
        clean_df.to_csv(target, index=False)
        print("Your results have been saved to '{}'".format(target))
    return clean_df


if __name__ == "__main__":
    # Set up input prompts
    msg1 = "Enter path to input file e.g. ./test.txt: "
    msg2 = "Save results to a file? y/[n]: "

    try:
        # Python 2
        fp = raw_input(msg1)
        result = raw_input(msg2)
    except NameError:
        # Python 3
        fp = input(msg1)
        result = input(msg2)

    save = result.startswith("y")
    print(stack(fp, save=save))
</code></pre>

Run the script from the command line in its working directory and answer the prompts:

<pre><code>> python restack.py
Enter path to input file e.g. ./test.txt: ./@data/test2.txt
Save results to a file? y/[n]: y
Your results have been saved to './output.txt'
</code></pre>

Your results should print in the console and, optionally, be saved to a file named <code>output.txt</code>. Adjust any parameters to suit your needs.
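The script above wraps only the pandas option in a function. As a rough sketch of how Option 1 could be wrapped the same way using only the standard library, the hypothetical helper below (the name `unique_cells` and the `skip_header` parameter are assumptions, not part of the original answer) collects cells in a `dict` instead of a `set`, so the result keeps first-seen order:

```python
import csv


def unique_cells(filepath, skip_header=True):
    """Return the unique non-empty cells of a CSV file in first-seen order.

    Hypothetical helper for illustration; not part of the original answer.
    """
    seen = {}  # dicts keep insertion order (Python 3.7+), unlike sets
    with open(filepath, newline="") as f:
        reader = csv.reader(f, delimiter=",")
        if skip_header:
            next(reader, None)  # skip the header row, if the file has one
        for row in reader:
            for cell in row:
                cleaned = cell.replace(" ", "")  # remove spaces, as in Option 1
                if cleaned:  # ignore cells that are empty after cleaning
                    seen.setdefault(cleaned, None)
    return list(seen)
```

Calling `unique_cells("./test.csv")` would return a plain list, which can still be passed to `pd.Series(...)` if a pandas object is wanted later.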
