Pythonic way to handle multiple for loops, creating a new list on each iteration and cleaning the data?

Posted 2024-10-05 19:12:52


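I'm new to Python (using version 3.5 from Anaconda) and have previous experience with MATLAB. Any help is much appreciated. If there is a simpler way to do this, please let me know.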

I read and cleaned some data from PDF files produced by lab equipment and appended it to a list:

>print(outputdata)

[[['2.37701'], ['-'], ['-'], ['-'], ['-'], ['18.95276'], ['5.07365e-1']], [['2.75613'], ['-'], ['-'], ['-'], ['-'], ['16.99642'], ['4.10023e-1']], [['1.80527'], ['-'], ['-'], ['-'], ['-'], ['20.75384'], ['4.58238e-1']], [['1.58721'], ['-'], ['-'], ['-'], ['-'], ['18.06942'], ['3.81128e-1']], [['1.98336'], ['-'], ['-'], ['-'], ['-'], ['18.20776'], ['3.64733e-1']], [['1.75710'], ['-'], ['-'], ['-'], ['-'], ['23.03760'], ['4.36234e-1']], [['1.58967'], ['-'], ['-'], ['-'], ['-'], ['21.43884'], ['3.88509e-1']], [['2.37701'], ['-'], ['-'], ['-'], ['-'], ['18.95276'], ['5.07365e-1']]]

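I'm trying to extract each element from every item in the list and save it to a new list. I also want to clean the data by removing the brackets and quotes and keeping only the numbers. I need to do some operations on the result, so I plan to convert it to a NumPy array and then add it to a DataFrame so that it is easier to export to Excel (I already have the export code). Each column vector corresponds to a specific header: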

Molecule = ["H2", "Ar", "Methane", "Ethane", "Ethylene", "Propane(C3H8)", "Propylene"]

Here is an example of the desired data for H2:

2.37701
2.75613
1.80527
1.58721
1.98336
1.75710
1.58967
2.37701

I started with this:

outputdatalist = [x[0] for x in outputdata]

which gives the following output:

[['2.37701'], ['2.75613'], ['1.80527'], ['1.58721'], ['1.98336'], ['1.75710'], ['1.58967'], ['2.37701']]

and then:

for row in outputdatalist:
    print(' '.join(row))  # I need to append this on every iteration

My less successful attempt was a double (triple?) for loop, as follows:

outputdatalist = []
for counter, elem in enumerate(Molecule):
    for counter1, elem1 in enumerate(outputdata):
        outputdatalist[counter] = [x[counter1] for x in outputdata]

Then I would convert each outputdatalist[i] to a NumPy array and pass it to pd.DataFrame, for example:

pd.DataFrame({Molecule[i]: outputdatalist[i]})

1 answer
User
Answer 1 · posted 2024-10-05 19:12:52

You can use a nested list comprehension, which appears to be faster than a solution using apply:

df = pd.DataFrame([[y[0] for y in x] for x in outputdata], columns=Molecule)
print (df)
        H2 Ar Methane Ethane Ethylene Propane(C3H8)   Propylene
0  2.37701  -       -      -        -      18.95276  5.07365e-1
1  2.75613  -       -      -        -      16.99642  4.10023e-1
2  1.80527  -       -      -        -      20.75384  4.58238e-1
3  1.58721  -       -      -        -      18.06942  3.81128e-1
4  1.98336  -       -      -        -      18.20776  3.64733e-1
5  1.75710  -       -      -        -      23.03760  4.36234e-1
6  1.58967  -       -      -        -      21.43884  3.88509e-1
7  2.37701  -       -      -        -      18.95276  5.07365e-1
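
The DataFrame above still holds strings, while the question asks for numbers it can operate on. The following is a minimal sketch of one possible follow-up step, not part of the original answer: it assumes that the "-" placeholders should become NaN and uses only the first two rows of the question's data. The names numeric_df and values are illustrative.

```python
import pandas as pd

Molecule = ["H2", "Ar", "Methane", "Ethane", "Ethylene", "Propane(C3H8)", "Propylene"]

# First two rows copied from the question's outputdata.
outputdata = [
    [['2.37701'], ['-'], ['-'], ['-'], ['-'], ['18.95276'], ['5.07365e-1']],
    [['2.75613'], ['-'], ['-'], ['-'], ['-'], ['16.99642'], ['4.10023e-1']],
]

# Same nested list comprehension as in the answer: unwrap each one-element list.
df = pd.DataFrame([[y[0] for y in x] for x in outputdata], columns=Molecule)

# Assumption: "-" means "no value". errors="coerce" turns any string that
# cannot be parsed as a number into NaN instead of raising an error.
numeric_df = df.apply(pd.to_numeric, errors="coerce")

# Plain NumPy array of floats, if array operations are needed.
values = numeric_df.to_numpy()
```

If the placeholders should be dropped or filled instead, numeric_df.dropna(axis=1, how="all") or numeric_df.fillna(0) are possible alternatives, depending on what the missing measurements mean.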

Timings (small DataFrame):

In [21]: %timeit pd.DataFrame([[y[0] for y in x] for x in outputdata], columns=Molecule)
1000 loops, best of 3: 1.04 ms per loop

In [22]: %timeit (pd.DataFrame(outputdata, columns=Molecule).apply(lambda x: x.str[0]))
100 loops, best of 3: 4.59 ms per loop
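
If separate per-molecule lists are wanted, as the loop in the question attempts, they could be built without index bookkeeping. This is a hedged sketch rather than the answerer's method: it transposes the rows with zip, uses only the first two rows of the question's data, and the name columns is illustrative.

```python
Molecule = ["H2", "Ar", "Methane", "Ethane", "Ethylene", "Propane(C3H8)", "Propylene"]

# First two rows copied from the question's outputdata.
outputdata = [
    [['2.37701'], ['-'], ['-'], ['-'], ['-'], ['18.95276'], ['5.07365e-1']],
    [['2.75613'], ['-'], ['-'], ['-'], ['-'], ['16.99642'], ['4.10023e-1']],
]

# zip(*outputdata) transposes rows into columns. Every cell is a
# one-element list, so cell[0] unwraps it to the bare string.
columns = {
    name: [cell[0] for cell in column]
    for name, column in zip(Molecule, zip(*outputdata))
}

print(columns["H2"])  # ['2.37701', '2.75613']
```

The question's loop fails because outputdatalist[counter] = ... assigns to an index of an empty list, which raises IndexError; building a dict keyed by molecule name avoids that.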
