How do I populate pandas DataFrames in a loop?

df=pd.read_csv('data.csv')
cdf = df.drop(['DateTime'], axis=1)
wells = ['N1','N2','N3','N4','N5','N6','N7','N8','N9']
for well in wells:
    wellname = well
    well = pd.DataFrame()
    well_cols = [col for col in cdf.columns if wellname in col]
    well = cdf[well_cols]

[27884 rows x 10 columns]
   N9_Inj_Casing_Gas_Valve  ...  N9_Inj_Casing_Gas_Pressure
0                74.375000  ...                 2485.602364
1                74.520833  ...                 2485.346000
2                74.437500  ...                 2485.341091

3 Answers

User

Answer 1 · edited 2024-09-27 07:35:06

IIUC, this should be enough:

df=pd.read_csv('data.csv')
cdf = df.drop(['DateTime'], axis=1)

wells = ['N1','N2','N3','N4','N5','N6','N7','N8','N9']
well_dict={}
for well in wells:

    well_cols = [col for col in cdf.columns if well in col]
    well_dict[well] = cdf[well_cols]

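If you want to populate a collection of things, a dict is usually a good way to do it. In this example, well_dict['N1'] gives you the first DataFrame, well_dict['N2'] the second, and so on.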
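To make the lookup concrete, here is a minimal sketch of the same dict pattern on a small made-up frame. The column names and values are invented for illustration and only stand in for the contents of data.csv.

```python
import pandas as pd

# Hypothetical stand-in for cdf; the real columns come from data.csv.
cdf = pd.DataFrame({
    "N1_Pressure": [1.0, 2.0],
    "N1_Valve": [10.0, 20.0],
    "N2_Pressure": [3.0, 4.0],
})

wells = ["N1", "N2"]
well_dict = {}
for well in wells:
    # Keep every column whose name contains the well identifier.
    well_cols = [col for col in cdf.columns if well in col]
    well_dict[well] = cdf[well_cols]

# Look up one well by name, or loop over all of them.
print(well_dict["N1"])
for name, frame in well_dict.items():
    print(name, frame.shape)
```

Because the keys are the well names, the original wells list stays intact and can still be used to drive later loops.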

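User

Answer 2 · edited 2024-09-27 07:35:06

When you iterate over a list, assigning to the loop variable does not modify the list's elements. Based on your example, this is what happens: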

# 1st iteration
well = 'N1' # assigned by the for loop directive
...
well = <empty DataFrame> # assigned by `well = pd.DataFrame()`
...
well = <DataFrame, subset of cdf where col has 'N1' in name> # assigned by `well = cdf[well_cols]`
# 2nd iteration
well = 'N2' # assigned by the for loop directive
...
well = <empty DataFrame> # assigned by `well = pd.DataFrame()`
...
well = <DataFrame, subset of cdf where col has 'N2' in name> # assigned by `well = cdf[well_cols]`
...

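But at no point do you change the list, and you never store the new DataFrames anywhere (although once the loop finishes, well still holds the last DataFrame).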
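The same rebinding behavior can be seen without pandas. This is a small sketch using plain strings; the variable name lowered is made up for the example.

```python
wells = ["N1", "N2", "N3"]

for well in wells:
    # This rebinds the local name `well`; the list still holds the original string.
    well = well.lower()

print(wells)  # the list is unchanged
print(well)   # only the value from the last iteration survives

# Writing through the index does store each result.
lowered = list(wells)
for ix, name in enumerate(lowered):
    lowered[ix] = name.lower()
print(lowered)
```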

In my opinion, storing the DataFrames in a dict, as in the first answer, is easier to work with.

However, if you really want them in a list, you can do the following:

df=pd.read_csv('data.csv')
cdf = df.drop(['DateTime'], axis=1)

wells = ['N1','N2','N3','N4','N5','N6','N7','N8','N9']
for ix, well in enumerate(wells):
    well_cols = [col for col in cdf.columns if well in col]
    wells[ix] = cdf[well_cols]

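User

Answer 3 · edited 2024-09-27 07:35:06

One way to solve this is to build a pd.MultiIndex on the columns and then group on it.

You can add a MultiIndex made up of the well identifier and the variable name. Suppose you have df: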

   N1_a  N1_b  N2_a  N2_b
1     2     2     3     4
2     7     8     9    10

You can use df.columns.str.split('_', expand=True) to split each column name into the well identifier and the corresponding variable name (here, a and b).

df = pd.DataFrame(df.values, columns=df.columns.str.split('_', expand=True)).sort_index(axis=1)

  N1    N2    
   a  b  a   b
0  2  2  3   4
1  7  8  9  10
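As a rough sketch of what the split produces, the snippet below rebuilds the small example frame by hand and inspects the resulting column index. Assigning the MultiIndex straight back to df.columns is an alternative shown for illustration, not the approach used in this answer.

```python
import pandas as pd

df = pd.DataFrame(
    [[2, 2, 3, 4], [7, 8, 9, 10]],
    columns=["N1_a", "N1_b", "N2_a", "N2_b"],
)

# With expand=True, splitting an Index returns a MultiIndex,
# with one level per piece of the split name.
cols = df.columns.str.split("_", expand=True)
print(type(cols).__name__)
print(list(cols))

# Assigning it back relabels the columns of the existing frame.
df.columns = cols

# Selecting on the first level returns that well's columns.
print(df["N1"])
```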

Then you can transpose the DataFrame and group by level 0 of the MultiIndex.

grouped = df.T.groupby(level=0)

To get back a list of sub-DataFrames in the original (un-transposed) orientation, you can use:

wells = [group.T for _, group in grouped]

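Here wells[0] holds the N1 columns and wells[1] holds the N2 columns.

This last step is not really necessary, since the data can be accessed directly from the grouped object.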
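One possible way to read a single well straight from the grouped object is get_group. The sketch below builds the MultiIndex frame by hand so it runs on its own; the name by_well is made up for the example.

```python
import pandas as pd

columns = pd.MultiIndex.from_tuples(
    [("N1", "a"), ("N1", "b"), ("N2", "a"), ("N2", "b")]
)
df = pd.DataFrame([[2, 2, 3, 4], [7, 8, 9, 10]], columns=columns)

grouped = df.T.groupby(level=0)

# get_group returns the transposed rows for one well; .T restores the layout.
n1 = grouped.get_group("N1").T
print(n1)

# A dict keyed by well name is an alternative to a positional list.
by_well = {name: group.T for name, group in grouped}
print(sorted(by_well))
```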

Putting it all together:

import pandas as pd
from io import StringIO

data = """
N1_a,N1_b,N2_a,N2_b
1,2,2,3,4
2,7,8,9,10
"""

df = pd.read_csv(StringIO(data)) 

# Parse Column names to add well name to multiindex level
df = pd.DataFrame(df.values, columns=df.columns.str.split('_', expand=True)).sort_index(axis=1)

# Group by well name
grouped = df.T.groupby(level=0)

# Build list of sub dataframes
wells = [group.T for _, group in grouped]
