Answers: How can I handle a column delimiter that appears inside one of the columns without using any built-in libraries in Python?

1 Answer

Anonymous · 1 day ago

Expertise: python, mysql, java

Not being able to use any built-in modules is an odd restriction, but writing your own CSV parser is fairly simple. As you noticed, you have to handle values that contain commas; CSV deals with those by wrapping the whole value in quotes. The full data you linked also has a line that adds another wrinkle:

<pre><code>889,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,,1,2,W./C. 6607,23.45,,S
</code></pre>

This value contains embedded commas, so it is quoted. However, it also contains quote characters, and the CSV format "escapes" those by doubling them. I'm assuming you need to preserve these escaped quotes.

<pre><code>def csv_values(text_line, delim=','):
    row = []
    embedded = False
    parts = []

    for word in text_line.split(delim):
        # Set flag marking the start of a quoted value
        if word.startswith('"'):
            embedded = True

        if embedded:
            # While scanning a quoted value (with embedded commas),
            # add the current piece to the accumulator
            # word = word.replace('""', '"')
            parts.append(word)
        else:
            # Otherwise, append the value to the row as-is
            row.append(word)

        # Unset flag at the end of a quoted value
        if embedded and word.endswith('"'):
            embedded = False
            # Add the accumulated value
            # row.append(','.join(parts)[1:-1])
            row.append(','.join(parts))
            # Reset the accumulator
            parts = []

    return row
</code></pre>

This is my "as-is" approach: the only thing it does is reassemble values that contain embedded commas, leaving their quotes untouched. Running it on lines 882 to 891, I get:

<pre><code>['882', '0', '3', '"Markun, Mr. Johann"', 'male', '33', '0', '0', '349257', '7.8958', '', 'S']
['883', '0', '3', '"Dahlberg, Miss. Gerda Ulrika"', 'female', '22', '0', '0', '7552', '10.5167', '', 'S']
['884', '0', '2', '"Banfield, Mr. Frederick James"', 'male', '28', '0', '0', 'C.A./SOTON 34068', '10.5', '', 'S']
['885', '0', '3', '"Sutehall, Mr. Henry Jr"', 'male', '25', '0', '0', 'SOTON/OQ 392076', '7.05', '', 'S']
['886', '0', '3', '"Rice, Mrs. William (Margaret Norton)"', 'female', '39', '0', '5', '382652', '29.125', '', 'Q']
['887', '0', '2', '"Montvila, Rev. Juozas"', 'male', '27', '0', '0', '211536', '13', '', 'S']
['888', '1', '1', '"Graham, Miss. Margaret Edith"', 'female', '19', '0', '0', '112053', '30', 'B42', 'S']
['889', '0', '3', '"Johnston, Miss. Catherine Helen ""Carrie"""', 'female', '', '1', '2', 'W./C. 6607', '23.45', '', 'S']
['890', '1', '1', '"Behr, Mr. Karl Howell"', 'male', '26', '0', '0', '111369', '30', 'C148', 'C']
['891', '0', '3', '"Dooley, Mr. Patrick"', 'male', '32', '0', '0', '370376', '7.75', '', 'Q']
</code></pre>

If you would rather drop the enclosing quotes and unescape the embedded quotes, uncomment the <code>word = word.replace('""', '"')</code> line and the <code>row.append(','.join(parts)[1:-1])</code> line, and comment out the plain <code>row.append(','.join(parts))</code> line. The function then gives:

<pre><code>['882', '0', '3', 'Markun, Mr. Johann', 'male', '33', '0', '0', '349257', '7.8958', '', 'S']
['883', '0', '3', 'Dahlberg, Miss. Gerda Ulrika', 'female', '22', '0', '0', '7552', '10.5167', '', 'S']
['884', '0', '2', 'Banfield, Mr. Frederick James', 'male', '28', '0', '0', 'C.A./SOTON 34068', '10.5', '', 'S']
['885', '0', '3', 'Sutehall, Mr. Henry Jr', 'male', '25', '0', '0', 'SOTON/OQ 392076', '7.05', '', 'S']
['886', '0', '3', 'Rice, Mrs. William (Margaret Norton)', 'female', '39', '0', '5', '382652', '29.125', '', 'Q']
['887', '0', '2', 'Montvila, Rev. Juozas', 'male', '27', '0', '0', '211536', '13', '', 'S']
['888', '1', '1', 'Graham, Miss. Margaret Edith', 'female', '19', '0', '0', '112053', '30', 'B42', 'S']
['889', '0', '3', 'Johnston, Miss. Catherine Helen "Carrie"', 'female', '', '1', '2', 'W./C. 6607', '23.45', '', 'S']
['890', '1', '1', 'Behr, Mr. Karl Howell', 'male', '26', '0', '0', '111369', '30', 'C148', 'C']
['891', '0', '3', 'Dooley, Mr. Patrick', 'male', '32', '0', '0', '370376', '7.75', '', 'Q']
</code></pre>

In either case, you can use the function like this:

<pre><code># file_name is the path to your CSV file
with open(file_name, 'r') as in_file:
    csv_lines = in_file.read().splitlines()

# Separate the header from the rest
headers, lines = csv_lines[0], csv_lines[1:]
for line in lines:
    print(csv_values(line))
</code></pre>
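Because <code>csv_values</code> splits on every comma first and then glues pieces back together, it decides where a quoted value ends by looking at whether a piece ends with a quote. That can go wrong when an escaped quote sits right before an embedded comma, as in <code>"a ""b"",c"</code>, where the first piece already ends with <code>"</code>. A common alternative is to scan the line one character at a time and track whether you are inside quotes. The following is a minimal sketch of that idea, not a drop-in guaranteed replacement: <code>parse_line</code> is a hypothetical name, it assumes the double quote is the only quoting character, and it returns values with the enclosing quotes removed and <code>""</code> unescaped (the same shape as the second output above).

```python
def parse_line(line, delim=','):
    """Split one CSV line into fields, honouring double-quoted fields.

    Assumption: '"' is the only quote character. Inside quotes the
    delimiter is literal and '""' stands for a single '"'. Enclosing
    quotes are dropped from the returned values.
    """
    fields = []
    current = []        # characters of the field being built
    in_quotes = False
    i = 0
    while i < len(line):
        ch = line[i]
        if in_quotes:
            if ch == '"':
                if i + 1 < len(line) and line[i + 1] == '"':
                    current.append('"')   # escaped quote ("") -> "
                    i += 1                # skip the second quote
                else:
                    in_quotes = False     # closing quote
            else:
                current.append(ch)        # delimiter is literal here
        elif ch == '"':
            in_quotes = True              # a quote outside quotes opens a quoted section
        elif ch == delim:
            fields.append(''.join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    fields.append(''.join(current))       # last field (may be empty)
    return fields
```

For the 889 line this yields the same list as the unquoted output above, and <code>parse_line('"a ""b"",c",d')</code> keeps <code>a "b",c</code> together as one value. The parser is lenient: a stray quote in the middle of an unquoted field simply starts a quoted section instead of raising an error, so you may want to add validation if your input can be malformed.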
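Reading the file with <code>splitlines()</code> and parsing each line separately assumes that no record spans more than one line. RFC 4180 allows a line break inside a quoted field, and such a field would be cut in two before any per-line parser sees it. If your data might contain that, one option is to scan the whole file text character by character, tracking whether you are inside quotes, and to treat only an unquoted line break as the end of a record. The sketch below assumes that approach; <code>parse_csv_text</code> and <code>to_records</code> are hypothetical helpers, not part of any library, and <code>to_records</code> additionally assumes the first row is the header and pairs it with each later row as a dict.

```python
def parse_csv_text(text, delim=','):
    """Parse a whole CSV document into a list of rows (lists of strings).

    A newline inside a quoted field stays part of that field; an unquoted
    '\n', '\r' or '\r\n' ends the current record.
    """
    rows, row, field = [], [], []
    in_quotes = False
    pending = False     # True while a record has started but is not yet stored
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        pending = True
        if in_quotes:
            if ch == '"' and i + 1 < n and text[i + 1] == '"':
                field.append('"')         # escaped quote ("") -> "
                i += 1
            elif ch == '"':
                in_quotes = False         # closing quote
            else:
                field.append(ch)          # delimiters and newlines are literal
        elif ch == '"':
            in_quotes = True              # opening quote
        elif ch == delim:
            row.append(''.join(field))
            field = []
        elif ch == '\n' or ch == '\r':
            if ch == '\r' and i + 1 < n and text[i + 1] == '\n':
                i += 1                    # treat '\r\n' as one line break
            row.append(''.join(field))
            rows.append(row)
            row, field = [], []
            pending = False
        else:
            field.append(ch)
        i += 1
    if pending:                           # last record without a trailing newline
        row.append(''.join(field))
        rows.append(row)
    return rows


def to_records(text, delim=','):
    """Use the first row as the header and map each later row to a dict."""
    rows = parse_csv_text(text, delim)
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    return [dict(zip(header, values)) for values in body]
```

With the reading snippet above, you could call <code>to_records(in_file.read())</code> inside the <code>with</code> block instead of splitting into lines first. Note that <code>zip</code> stops at the shorter sequence, so a row with a different number of fields than the header silently loses values or keys, and a blank line becomes a row with a single empty field; check the row length if your file may contain such rows.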
