While loop to continuously recheck a DataFrame for changes

Posted 2024-10-04 09:24:49


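I have two identical dataframes, `new` and `old`. The `new` dataframe gets updated at random times throughout the day. The code below checks whether anything has changed: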

import pandas as pd
import numpy as np

new = {'name': ['Sheldon', 'Penny', 'Amy', 'Bernadette', 'Raj', 'Howard'],
       'episodes': [42, 24, 31, 29, 37, 40],
       'gender': ['male', 'female', 'female', 'female', 'male', 'male']}
old = {'name': ['Sheldon', 'Penny', 'Amy', 'Bernadette', 'Raj', 'Howard'],
       'episodes': [12, 32, 31, 32, 37, 40],
       'gender': ['male', 'female', 'female', 'female', 'male', 'male']}

df1 = pd.DataFrame(new, columns=['name', 'episodes', 'gender'])
df = pd.DataFrame(old, columns=['name', 'episodes', 'gender'])

while True:
    df1 = pd.DataFrame(new, columns=['name', 'episodes', 'gender'])
    print(df[~df.episodes.eq(df1.episodes)])
    df1 = df

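I need a condition in the while loop so that `df[~df.episodes.eq(df1.episodes)]` is printed only when a change is detected. After printing the new data, the loop should set both dataframes to the same values (the old data is no longer needed) and then keep checking for changes. The code above prints: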

Empty DataFrame
Columns: [name, episodes, gender]
Index: []
Empty DataFrame
Columns: [name, episodes, gender]
Index: []
Empty DataFrame
Columns: [name, episodes, gender]
Index: []

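As a result, when a change actually is printed, it is easy to miss among all the empty frames. Can you suggest a more efficient way to do this?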
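The behaviour described above (print only on a change, then adopt the new data as the baseline) can be sketched as follows. This is a minimal sketch, not a tested production watcher: `watch` is a hypothetical helper, and `snapshots` is an assumed stand-in for however the updated data is really fetched. It reuses the question's `episodes` mask, prints the new values of the changed rows, and stays silent when the mask selects nothing.

```python
import time

import pandas as pd


def watch(baseline, snapshots, interval=0.0):
    """Print the rows whose 'episodes' value changed, once per change.

    Hypothetical helper: `snapshots` is any iterable of DataFrames standing in
    for the real data source; `interval` is the pause between polls in seconds.
    """
    reports = []
    for latest in snapshots:
        # Rows whose episode count differs from the baseline
        # (assumes both frames share the same index and row order).
        changed = latest[~latest['episodes'].eq(baseline['episodes'])]
        if not changed.empty:         # report only when something changed
            print(changed)
            reports.append(changed)
            baseline = latest.copy()  # the old data is no longer needed
        time.sleep(interval)
    return reports


cols = ['name', 'episodes', 'gender']
data = {'name': ['Sheldon', 'Penny', 'Amy', 'Bernadette', 'Raj', 'Howard'],
        'episodes': [12, 32, 31, 32, 37, 40],
        'gender': ['male', 'female', 'female', 'female', 'male', 'male']}
old_df = pd.DataFrame(data, columns=cols)
new_df = old_df.assign(episodes=[42, 24, 31, 29, 37, 40])

# Three polls: no change, a change, then no further change -> one report.
reports = watch(old_df, [old_df, new_df, new_df])
```

In a real script, `snapshots` could be a generator that runs `while True:`, fetches the latest data and yields it, with a non-zero `interval` so the loop does not spin.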

== EDIT ==

Based on @BENY's answer, if I do this:

import pandas as pd
import numpy as np

new = {'name': ['Sheldon', 'Penny', 'Amy', 'Bernadette', 'Raj', 'Sheldon'],
       'episodes': [42, 24, 31, 29, 37, 40],
       'gender': ['male', 'female', 'female', 'female', 'male', 'male']}
old = {'name': ['Sheldon', 'Penny', 'Amy', 'Bernadette', 'Raj', 'Sheldon'],
       'episodes': [12, 32, 31, 32, 37, 40],
       'gender': ['male', 'female', 'female', 'female', 'male', 'male']}

df1 = pd.DataFrame(new, columns=['name', 'episodes', 'gender'])
df = pd.DataFrame(old, columns=['name', 'episodes', 'gender'])

while True:
    df1 = pd.DataFrame(new, columns=['name', 'episodes', 'gender'])
    out = (df.merge(df1[['name', 'episodes']], on=['name', 'episodes'],
                    how='left', indicator=True)
             .loc[lambda x: x['_merge'] == 'left_only'])
    print(out)
    df = df1

it prints this on every iteration of the while loop:

         name  episodes  gender     _merge
0     Sheldon        12    male  left_only
1       Penny        32  female  left_only
3  Bernadette        32  female  left_only
         name  episodes  gender     _merge
0     Sheldon        12    male  left_only
1       Penny        32  female  left_only
3  Bernadette        32  female  left_only
         name  episodes  gender     _merge
0     Sheldon        12    male  left_only
1       Penny        32  female  left_only
3  Bernadette        32  female  left_only

Is there a way to print it only once, until there is another change? If I set `df = df1`, it prints the following and I miss the change:

Empty DataFrame
Columns: [name, episodes, gender, _merge]
Index: []
Empty DataFrame
Columns: [name, episodes, gender, _merge]
Index: []

I need a clean way to get this data only at the point where a change is detected.
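One way to get "print once, then stay quiet until the next change" is to ask a cheap yes/no question before building any detailed diff. pandas' `DataFrame.equals` returns `True` only when both frames have the same shape and the same elements (with matching dtypes). A small sketch using the question's sample data, with names mirroring `df`/`df1`:

```python
import pandas as pd

cols = ['name', 'episodes', 'gender']
data = {'name': ['Sheldon', 'Penny', 'Amy', 'Bernadette', 'Raj', 'Howard'],
        'episodes': [12, 32, 31, 32, 37, 40],
        'gender': ['male', 'female', 'female', 'female', 'male', 'male']}

df = pd.DataFrame(data, columns=cols)                # baseline
df1 = df.assign(episodes=[42, 24, 31, 29, 37, 40])   # updated snapshot

changed_first = not df.equals(df1)    # True: some episodes differ
df = df1.copy()                       # adopt the new data as the baseline
changed_again = not df.equals(df1)    # False: nothing new since the last check
print(changed_first, changed_again)   # True False
```

Only when the check is `True` would the loop go on to compute and print the changed rows.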


2 answers

If you want to compare 2 dataframes and check for any changes/differences, why not use the `DataFrame.compare()` function?

Here is the output for your sample data:

df.compare(df1)

Output:

    episodes      
    self other
0   12.0  42.0
1   32.0  24.0
3   32.0  29.0

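By default, it shows only the differences. Here it shows that only the `episodes` column differs.
`self` holds the values from `df`, and `other` holds the values from `df1`.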

The index on the left (0, 1 and 3) gives the labels of the rows that differ.

If you want to keep the whole original shape, you can also pass `keep_shape=True`, as follows:

df.compare(df1, keep_shape=True)

Output:

  name       episodes       gender      
  self other     self other   self other
0  NaN   NaN     12.0  42.0    NaN   NaN
1  NaN   NaN     32.0  24.0    NaN   NaN
2  NaN   NaN      NaN   NaN    NaN   NaN
3  NaN   NaN     32.0  29.0    NaN   NaN
4  NaN   NaN      NaN   NaN    NaN   NaN
5  NaN   NaN      NaN   NaN    NaN   NaN

Only the differing values are shown; `NaN` marks values with no difference.

Of course, if you prefer, you can also show all values, including the equal ones, by adding `keep_equal=True`:

df.compare(df1, keep_shape=True, keep_equal=True)

Output:

         name             episodes        gender        
         self       other     self other    self   other
0     Sheldon     Sheldon       12    42    male    male
1       Penny       Penny       32    24  female  female
2         Amy         Amy       31    31  female  female
3  Bernadette  Bernadette       32    29  female  female
4         Raj         Raj       37    37    male    male
5      Howard      Howard       40    40    male    male

This option lets you compare the values side by side. However, it makes the differences harder to spot.

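I suggest starting with the default option, which shows only the differences (and perhaps noting down the indices of the rows that differ), and using the other 2 options only when you need to inspect the values on the other side (the equal ones) in detail.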
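Noting down the differing row indices, and pulling out the before/after values, can be done directly from the default `compare()` result. A sketch on the sample data; it relies on the result's columns being a two-level MultiIndex of (column name, `'self'`/`'other'`), as in the outputs above:

```python
import pandas as pd

cols = ['name', 'episodes', 'gender']
data = {'name': ['Sheldon', 'Penny', 'Amy', 'Bernadette', 'Raj', 'Howard'],
        'episodes': [12, 32, 31, 32, 37, 40],
        'gender': ['male', 'female', 'female', 'female', 'male', 'male']}
df = pd.DataFrame(data, columns=cols)
df1 = df.assign(episodes=[42, 24, 31, 29, 37, 40])

diff = df.compare(df1)              # only the cells that differ
changed_rows = diff.index.tolist()  # labels of the rows that differ
print(changed_rows)                 # [0, 1, 3]

# Columns are (column name, 'self' or 'other') pairs.
before = diff[('episodes', 'self')].tolist()   # values from df
after = diff[('episodes', 'other')].tolist()   # values from df1

# Full rows from the new frame, in case the other columns matter too.
print(df1.loc[changed_rows])
```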

To use it in a while loop, you could do:

while True:
    df1 = pd.DataFrame(new, columns=['name', 'episodes', 'gender'])
    out = df.compare(df1)
    if not out.empty:  # print only when a difference was found
        print(out)
    df = df1
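One caveat before relying on `compare()` inside a polling loop: it only works on identically-labeled frames (same index and columns). If the updated data gains or loses a row, it raises a `ValueError` instead of returning a diff. A hedged sketch of a guard; `safe_compare` is a hypothetical helper, not part of pandas, and the extra "Leonard" row is made-up data for illustration:

```python
import pandas as pd


def safe_compare(old, new):
    """Hypothetical helper: return old.compare(new), or None when the frames
    are not identically labeled (compare() would raise ValueError)."""
    if not (old.index.equals(new.index) and old.columns.equals(new.columns)):
        return None
    return old.compare(new)


cols = ['name', 'episodes', 'gender']
data = {'name': ['Sheldon', 'Penny', 'Amy', 'Bernadette', 'Raj', 'Howard'],
        'episodes': [12, 32, 31, 32, 37, 40],
        'gender': ['male', 'female', 'female', 'female', 'male', 'male']}
df = pd.DataFrame(data, columns=cols)
df1 = df.assign(episodes=[42, 24, 31, 29, 37, 40])
# Illustrative update that adds a row, so the index no longer matches.
df2 = pd.concat([df1, pd.DataFrame([['Leonard', 44, 'male']], columns=cols)],
                ignore_index=True)

try:
    df1.compare(df2)
    raised = False
except ValueError:       # frames are not identically labeled
    raised = True

print(safe_compare(df, df1))   # the three changed rows
print(safe_compare(df1, df2))  # None
```

How to handle added or removed rows (for example, falling back to the `merge` approach) is left open here.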

EDIT

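If you want to see `name` while still showing only the differences in the other columns, you can move it into the index with `set_index(..., append=True)`, as follows: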

df.set_index('name', append=True).compare(df1.set_index('name', append=True))

Output:

                 episodes      
                 self other
  name                     
0 Sheldon        12.0  42.0
1 Penny          32.0  24.0
3 Bernadette     32.0  29.0

This way you can see both the `name` and the index of each row that differs.
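If you then want the names and row labels as plain lists (for logging, say), they can be read back from the result's MultiIndex. A sketch on the sample data; level 0 is the original row label (it has no name, so it is selected by position) and the `'name'` level holds the names:

```python
import pandas as pd

cols = ['name', 'episodes', 'gender']
data = {'name': ['Sheldon', 'Penny', 'Amy', 'Bernadette', 'Raj', 'Howard'],
        'episodes': [12, 32, 31, 32, 37, 40],
        'gender': ['male', 'female', 'female', 'female', 'male', 'male']}
df = pd.DataFrame(data, columns=cols)
df1 = df.assign(episodes=[42, 24, 31, 29, 37, 40])

diff = (df.set_index('name', append=True)
          .compare(df1.set_index('name', append=True)))

positions = diff.index.get_level_values(0).tolist()    # original row labels
names = diff.index.get_level_values('name').tolist()   # matching names
print(list(zip(positions, names)))
# [(0, 'Sheldon'), (1, 'Penny'), (3, 'Bernadette')]
```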

Let's try `merge`:

out = df.merge(df1[['name','episodes']],on=['name','episodes'],how='left',indicator=True).loc[lambda x : x['_merge']=='left_only']
         name  episodes  gender     _merge
0     Sheldon        12    male  left_only
1       Penny        32  female  left_only
3  Bernadette        32  female  left_only
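A variation on the `merge` idea, as a sketch rather than the answerer's code: merging on all shared columns with `how='outer'` and `indicator=True` labels rows found only in the old frame as `left_only` (the old values) and rows found only in the new frame as `right_only` (the new values). Dropping `_merge` afterwards gives clean output. It assumes no row is fully duplicated across all columns, since duplicates would multiply in the merge.

```python
import pandas as pd

cols = ['name', 'episodes', 'gender']
data = {'name': ['Sheldon', 'Penny', 'Amy', 'Bernadette', 'Raj', 'Howard'],
        'episodes': [12, 32, 31, 32, 37, 40],
        'gender': ['male', 'female', 'female', 'female', 'male', 'male']}
df = pd.DataFrame(data, columns=cols)
df1 = df.assign(episodes=[42, 24, 31, 29, 37, 40])

# Merge on every shared column; _merge records where each row came from.
both = df.merge(df1, how='outer', indicator=True)
removed = both.loc[both['_merge'] == 'left_only'].drop(columns='_merge')  # old values
added = both.loc[both['_merge'] == 'right_only'].drop(columns='_merge')   # new values
print(removed)
print(added)
```

An outer merge may reorder rows, so match changed rows by `name` rather than by position.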
