How do I read a data.txt text file, sort the data, and then convert it into a DataFrame with Python?

1 answer

Anonymous, 1 day ago

Expertise: python, mysql, java

<h2>Latest update:</h2>
<p>Since the OP mentioned that each text file contains only one record, the following solution is appropriate:</p>
<pre class="lang-py prettyprint-override"><code>import os
import re
from collections import OrderedDict
from os import sep
from pathlib import Path

import pandas as pd


def oneFileSingleRecordParser(textFilePath):
    fileName = textFilePath.rsplit(sep, 1)[-1]
    with open(textFilePath, "r") as textFile:
        # The structure is:
        # Yield:
        # Timestamp
        # Angle
        # ErrorCode 10
        # ErrorCode 12
        # ErrorCode 16
        # ErrorCode 20
        # The error codes can be present or absent
        lines = textFile.readlines()
    dataDict = OrderedDict()
    dataDict["File Name"] = fileName
    for line in lines:
        # regex to extract the key and value; lines that don't match are skipped
        matchObject = re.match(r"(\w+\s?\d*):\s(.*)", line.strip())
        if matchObject is not None:
            key, value = matchObject.groups()
            dataDict[key] = value
    return dict(dataDict)


def convertAllFilesToDataFrame(textFilePathsRoot, parser=oneFileSingleRecordParser):
    if not os.path.isdir(textFilePathsRoot):
        raise Exception("Please pass in a valid path to the root of the text files")
    textFilePaths = [str(path) for path in Path(textFilePathsRoot).glob("*.txt")]
    dataDicts = []
    for textFilePath in textFilePaths:
        dataDicts.append(parser(textFilePath))
    dataFrame = pd.DataFrame(dataDicts)
    return dataFrame
</code></pre>
<p><code>convertAllFilesToDataFrame("path/to/your/text/file/directory")</code> should still produce the following output (in my case, I only have two files with identical records):</p>
<p><a href="https://i.stack.imgur.com/cvqU6.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/cvqU6.png" alt="enter image description here"/></a></p>
<h2>Original answer</h2>
<p>Depending on the structure of the text files, there are two ways to approach this problem:</p>
<ul>
<li>each text file contains exactly five lines (a single record)</li>
<li>a single text file may contain a multiple of five lines (several records)</li>
</ul>
<p>Here is how I handled both cases:</p>
<pre class="lang-py prettyprint-override"><code>import re
from collections import OrderedDict
from os import getcwd, sep
from pathlib import Path

import pandas as pd


def oneFileSingleRecordParser(textFilePath):
    fileName = textFilePath.rsplit(sep, 1)[-1]
    with open(textFilePath, "r") as textFile:
        # The structure is:
        # Yield:
        # Timestamp
        # Angle
        # ErrorCode 10
        # ErrorCode 12
        lines = textFile.readlines()
    if len(lines) != 5:
        raise Exception("The file at {} doesn't have a proper single record.".format(textFilePath))
    dataDict = OrderedDict()
    dataDict["File Name"] = fileName
    for line in lines:
        # regex to extract the key and value name
        matchObject = re.match(r"(\w+\s?\d*):\s(.*)", line.strip())
        if matchObject is not None:
            key, value = matchObject.groups()
            dataDict[key] = value
    return dict(dataDict)


def oneFileMultiRecordParser(textFilePath):
    fileName = textFilePath.rsplit(sep, 1)[-1]
    with open(textFilePath, "r") as textFile:
        # The structure is:
        # Yield_1:
        # Timestamp_1:
        # Angle_1:
        # ErrorCode 10_1:
        # ErrorCode 12_1:
        # Yield_2:
        # Timestamp_2:
        # Angle_2:
        # ErrorCode 10_2:
        # ErrorCode 12_2:
        # ...
        lines = textFile.readlines()
    if len(lines) % 5 != 0:
        raise Exception("The file at {} doesn't have a uniform structure.".format(textFilePath))
    records = []
    dataDict = OrderedDict()
    dataDict["File Name"] = fileName
    for index, line in enumerate(lines):
        # regex to extract the key and value name
        matchObject = re.match(r"(\w+\s?\d*):\s(.*)", line.strip())
        if matchObject is not None:
            key, value = matchObject.groups()
            dataDict[key] = value
        else:
            raise Exception("Line={}, content=\"{}\" has some formatting issues, regex failed".format(index + 1, line))
        if (index + 1) % 5 == 0:
            # every five lines form one complete record
            records.append(dataDict)
            dataDict = OrderedDict()  # reset for next iteration
            dataDict["File Name"] = fileName
    return records


def convertAllFilesToDataFrame(
    parser=oneFileSingleRecordParser,
    validParserNames=("oneFileSingleRecordParser", "oneFileMultiRecordParser"),
):
    if parser.__name__ not in validParserNames:
        raise Exception("Proper parser was not used")
    pathToFiles = getcwd()
    textFilePaths = [str(path) for path in Path(pathToFiles).glob("*.txt")]
    dataDicts = []
    for textFilePath in textFilePaths:
        if parser.__name__ == validParserNames[0]:
            # single-record parser returns one dict per file
            dataDicts.append(parser(textFilePath))
        elif parser.__name__ == validParserNames[1]:
            # multi-record parser returns a list of dicts per file
            dataDicts.extend(parser(textFilePath))
    dataFrame = pd.DataFrame(dataDicts)
    return dataFrame
</code></pre>
<p><code>convertAllFilesToDataFrame(parser=oneFileMultiRecordParser)</code> produces: <a href="https://i.stack.imgur.com/9FqBU.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/9FqBU.png" alt="enter image description here"/></a></p>
<p><code>convertAllFilesToDataFrame(parser=oneFileSingleRecordParser)</code> produces: <a href="https://i.stack.imgur.com/cvqU6.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/cvqU6.png" alt="enter image description here"/></a></p>
<p>The code is not completely DRY, so you may want to spend some more time refactoring it.</p>
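The answer above leaves out the sorting step the question asks for. Its multi-record parser also assumes every record is exactly five lines long. What follows is a minimal sketch that works under different, assumed conditions:

- Each record begins with its `Yield` line, so optional `ErrorCode` lines do not shift the grouping.
- The timestamps share one format that `pd.to_datetime` can parse.
- The sample data is made up.

`parse_records`, `load_sorted_frame` and `SAMPLE` are hypothetical names that do not come from the original answer. The sketch reuses the answer's `Key: value` regex, builds the frame with `pd.DataFrame`, and sorts it with `DataFrame.sort_values`.

```python
import re
from io import StringIO

import pandas as pd

# Same "Key: value" pattern as the answer, e.g. "ErrorCode 10: 0" -> ("ErrorCode 10", "0")
LINE_PATTERN = re.compile(r"(\w+\s?\d*):\s(.*)")

# Hypothetical sample input; the real data.txt contents are not shown in the question.
SAMPLE = """\
Yield: 97.1
Timestamp: 2021-03-02 08:00:00
Angle: 12.5
ErrorCode 10: 0
Yield: 95.4
Timestamp: 2021-03-01 09:30:00
Angle: 11.0
ErrorCode 12: 1
"""


def parse_records(lines, first_key="Yield"):
    """Group "Key: value" lines into one dict per record.

    Assumption: each record begins with a `first_key` line, so records may
    have different numbers of lines (e.g. optional ErrorCode entries).
    """
    records = []
    current = None
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines between records
        match = LINE_PATTERN.match(line)
        if match is None:
            raise ValueError(f"line {number} is not 'Key: value': {line!r}")
        key, value = match.groups()
        if key == first_key or current is None:
            current = {}  # start a new record
            records.append(current)
        current[key] = value
    return records


def load_sorted_frame(lines, sort_by="Timestamp"):
    """Build a DataFrame from the parsed records and sort it by `sort_by`."""
    frame = pd.DataFrame(parse_records(lines))
    if sort_by in frame.columns:
        # Convert to datetimes so the sort is chronological, not alphabetical
        frame[sort_by] = pd.to_datetime(frame[sort_by])
        frame = frame.sort_values(sort_by, kind="mergesort").reset_index(drop=True)
    return frame


if __name__ == "__main__":
    print(load_sorted_frame(StringIO(SAMPLE)))
```

To use it on the file from the question, pass an open file handle instead: `with open("data.txt", encoding="utf-8") as f: frame = load_sorted_frame(f)`. Keys missing from a record become `NaN` in the DataFrame.

Starting a new record at each `Yield` line is a trade-off. It tolerates missing error codes, but a record with no `Yield` line would be merged into the previous one.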
