Filtering with a for loop in Python and returning multiple DataFrames

Published 2024-05-18 10:53:05

I have a df that looks like this:

+--------+------------+-------+
| Fruit  |    Date    | Sales |
+--------+------------+-------+
| Apple  | 01/01/2020 |    20 |
| Apple  | 01/02/2020 |    30 |
| Orange | 01/01/2019 |    55 |
| Orange | 01/02/2018 |    15 |
+--------+------------+-------+

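I want to write a loop that filters the df by fruit and creates multiple dfs, one per fruit. For example, my goal is to end up with the following two separate dfs: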

Apple:
+--------+------------+-------+
| Fruit  |    Date    | Sales |
+--------+------------+-------+
| Apple  | 01/01/2020 |    20 |
| Apple  | 01/02/2020 |    30 |
+--------+------------+-------+
Orange:
+--------+------------+-------+
| Fruit  |    Date    | Sales |
+--------+------------+-------+
| Orange | 01/01/2019 |    55 |
| Orange | 01/02/2018 |    15 |
+--------+------------+-------+

I have tried the following code:

# list of fruits
fruits = df['Fruit'].unique()

for fruit in fruits:
  fruit= pd.DataFrame()
  fruit= df[df['Fruit']==fruit].reset_index(drop=True) 

Maybe I need to create a list first and then convert it to a df, but I'm confused, so any help would be much appreciated.


2 Answers

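I assume you are trying to create multiple DataFrames whose variable names match the unique fruit names.

The snippet below will not work, because the loop variable fruit gets overwritten with pd.DataFrame(), so it is no longer "Apple" or "Orange" by the time the filter runs: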

for fruit in fruits:
  fruit = pd.DataFrame() # fruit is no longer 'Apple' or 'Orange' after this line
  fruit = df[df['Fruit']==fruit].reset_index(drop=True) 
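
To make that concrete, here is a minimal sketch showing that the filter itself is fine as long as the loop variable is left alone and the result is bound to a different name. The sample df is rebuilt from the table in the question; the names subset and row_counts are only for illustration and are not from the original post.

```python
import pandas as pd

# Sample data mirroring the table in the question
df = pd.DataFrame({
    "Fruit": ["Apple", "Apple", "Orange", "Orange"],
    "Date": ["01/01/2020", "01/02/2020", "01/01/2019", "01/02/2018"],
    "Sales": [20, 30, 55, 15],
})

row_counts = {}  # illustrative: records how many rows each filter returned
for fruit in df["Fruit"].unique():
    # `fruit` stays a string; the filtered rows go into a separate name
    subset = df[df["Fruit"] == fruit].reset_index(drop=True)
    row_counts[fruit] = len(subset)

print(row_counts)
```

Each pass through the loop still overwrites subset, so this only shows that the filter works. Keeping every result around is what the two approaches below are for.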

There are two ways to create the DataFrames you need:

  1. Use exec() (not really recommended)
fruits = df['Fruit'].unique()

for fruit in fruits:
    # Inside the f-string, {fruit} is substituted to become the variable name;
    # the bare `fruit` in the expression is looked up as the loop value when exec runs

    exec(f"{fruit} = df[df['Fruit']==fruit].reset_index(drop=True)")

print(Orange)

Fruit   Date        Sales
Orange  01/01/2019  55
Orange  01/02/2018  15


print(Apple)

Fruit   Date        Sales
Apple   01/01/2020  20
Apple   01/02/2020  30
  2. Create a class to store the dynamic variables as attributes, and retrieve them from it later
class df_names:
    pass

fruit_df = df_names()  #fruit_df will now hold all the variables that we are going to create


fruits = df['Fruit'].unique()

for fruit in fruits:

    # The fruit name is used as the attribute name; the filtered DataFrame is the value
    # setattr(variable_holder, variable_name, value)
    setattr(fruit_df, f"{fruit}", df[df['Fruit']==fruit].reset_index(drop=True))


for fruit in fruits:
    print(getattr(fruit_df, f"{fruit}"))


Fruit  Date           Sales
Apple  01/01/2020     20
Apple  01/02/2020     30

Fruit   Date          Sales
Orange  01/01/2019     55
Orange  01/02/2018     15
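
One reason to be careful with both approaches above: they only read naturally when every fruit name is a valid Python identifier. As a rough sketch, the check below uses str.isidentifier() and keyword.iskeyword() from the standard library. The helper name is_safe_variable_name and the values "Passion Fruit" and "class" are hypothetical examples, not data from the question.

```python
import keyword


def is_safe_variable_name(name: str) -> bool:
    """Return True if `name` could be used as a plain variable or attribute name."""
    return name.isidentifier() and not keyword.iskeyword(name)


# "Apple" and "Orange" come from the question; the other two are made-up examples
for candidate in ["Apple", "Orange", "Passion Fruit", "class"]:
    print(candidate, is_safe_variable_name(candidate))
```

With exec(), a name that fails this check produces a SyntaxError. With setattr() the attribute is still stored, but it can only be read back through getattr().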

df.groupby('Fruit') 

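This does exactly that! It has the added benefit that further operations on the groups stay fast, because the built-in aggregations run in compiled code rather than in a Python loop (though not on multiple threads by default), and you still keep all the data in one place, so it does not get copied as the input data grows.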
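
A minimal sketch of how the grouped object might be used, assuming the same sample data as in the question. The names grouped, apple, and frames are illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data mirroring the table in the question
df = pd.DataFrame({
    "Fruit": ["Apple", "Apple", "Orange", "Orange"],
    "Date": ["01/01/2020", "01/02/2020", "01/01/2019", "01/02/2018"],
    "Sales": [20, 30, 55, 15],
})

grouped = df.groupby("Fruit")

# Pull out a single group on demand
apple = grouped.get_group("Apple").reset_index(drop=True)

# Or, if separate DataFrames really are needed, build a dict keyed by fruit name
frames = {name: group.reset_index(drop=True) for name, group in grouped}

print(apple)
print(frames["Orange"])
```

The dict does copy each group, so for large inputs it may be preferable to keep working on grouped directly (for example grouped["Sales"].sum()) and only call get_group() when a single fruit is needed.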
