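<p>I'd like to throw in my two cents with a faster solution, since you mentioned that performance matters. Compared with Code_Different's solution, this approach runs about <strong>5-10 times</strong> faster per file <em>on the sample data</em>. How it behaves on larger files is something you will have to test yourself.</p>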
<pre><code>import pandas as pd

def parse(file):
    # fixed meta data column names; see the commented-out branch below if they differ per file
    columns = ['SENSORID', 'DATESMPL', 'TRE', 'ISRC']
    general_values = []
    measurement_values = []
    with open(file, "r") as f:
        for index, row in enumerate(f):
            if index > 3:  # test for measurement rows first as you will do it most often
                measurement_values.append(row.rstrip('\n').split(';')[1:])
            # uncomment the next elif-clause (and start with columns = []) if the meta data column names differ per file
            #elif index == 0:  # first row -> SENSORID;DATESMPL;TRE;ISRC
            #    columns += row.rstrip('\n').split(';')  # get rid of newline and split
            elif index == 1:  # second row -> meta data
                general_values = row.rstrip('\n').split(';')  # get rid of newline and split
            elif index == 2:  # third row -> lambdas as column names
                columns += row.rstrip('\n').split(';')[1:]  # get rid of newline, split and delete 'LAMBDAS'
    df_array = [columns]  # the first row of the result holds the column names
    for measurement in measurement_values:
        df_array.append(general_values + measurement)
    return pd.DataFrame(df_array)

df = parse('tmp.csv')
</code></pre>
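<p>Since the speed-up on larger files has to be measured, here is a minimal benchmarking sketch using only the standard library. It is not part of the original measurement: <code>make_sample_file</code>, <code>split_rows</code> and <code>time_parser</code> are hypothetical helper names, every value written to the file is made up, and the row at index 3 is a placeholder that exists only because the parser above skips that row. <code>split_rows</code> is a stand-in; pass your own parser function to <code>time_parser</code> to time it instead.</p>

```python
import os
import tempfile
import timeit


def make_sample_file(path, n_rows=1000, n_lambdas=50):
    """Write a synthetic file in the assumed layout (all values are made up)."""
    lambdas = [str(1500 + i) for i in range(n_lambdas)]
    with open(path, "w") as f:
        f.write("SENSORID;DATESMPL;TRE;ISRC\n")          # index 0: meta data names
        f.write("SENSOR-1;2020-01-01 00:00:00;1.0;0\n")  # index 1: meta data values
        f.write("LAMBDAS;" + ";".join(lambdas) + "\n")   # index 2: column names
        # index 3: placeholder row (assumption), skipped by the parser
        f.write("PLACEHOLDER;" + ";".join(["0"] * n_lambdas) + "\n")
        for i in range(n_rows):                          # index 4 onward: measurements
            f.write(f"M{i};" + ";".join(["0.5"] * n_lambdas) + "\n")


def split_rows(path):
    """Stand-in parser: split every line on ';'. Swap in the real parser to time it."""
    with open(path, "r") as f:
        return [row.rstrip("\n").split(";") for row in f]


def time_parser(parser, path, repeat=5):
    """Return the best wall-clock time in seconds over `repeat` single runs."""
    return min(timeit.repeat(lambda: parser(path), number=1, repeat=repeat))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "sample.csv")
        make_sample_file(sample, n_rows=10000)
        print(f"best of 5: {time_parser(split_rows, sample):.4f} s")
```

<p>Taking the minimum over several runs reduces noise from disk caching and other processes. Increase <code>n_rows</code> and <code>n_lambdas</code> to see how the timing scales with file size.</p>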