How can I highlight the differences between DataFrames after concatenating them?

Posted 2024-10-03 04:40:10

I have two DataFrames, as shown below:

XYZ
Year Quantity Car     Colour
2001 1000     Swift   Red
2001 16       Wagonar White
2001 16       Wagonar Black
2001 200      Baleno  Silver
2001 20       Zen     White

ABC  
Year Quantity Car     Colour
2001 1000     Swift   Red
2001 16       Wagonar White
2001 200      Baleno  Silver
2001 44       Alto    Blue

The output should look like this:

Year      Quantity  Car             Colour
XYZ  ABC  XYZ  ABC  XYZ     ABC     XYZ    ABC
2001 2001 1000 1000 Swift   Swift   Red    Red
2001 2001 16   16   Wagonar Wagonar White  White
2001 2001 16        Wagonar         Black
2001 2001 200  200  Baleno  Baleno  Silver Silver
2001 2001 20        Zen             White
2001 2001      44           Alto           Blue

I tried this:

df_all = pd.concat([df_temp, df_temp1], axis='columns', keys=['XYZ', 'ABC'])
print(df_all)
df_final = df_all.swaplevel(axis='columns')[df_temp.columns]
print(df_final)
def highlight_diff(data, color='yellow'):
    attr = 'background-color: {}'.format(color)
    other = data.xs('First', axis='columns', level=-1)
    return pd.DataFrame(np.where(data.ne(other, level=0), attr,''),index=data.index, columns=data.columns)

df_final.style.apply(highlight_diff, axis=None)
print(df_final)

The differences between the DataFrames should be highlighted.

For example, in this case the cars Wagonar, Zen and Alto must be highlighted, because they differ between the two DataFrames.

I also tried joining them this way:

    YEAR Quantity  CAR    COLOR  car     color
0   2001    16    Wagonar white  Wagonar white
1   2001    16    Wagonar black  Wagonar white
2   2001    20    Zen     white  NaN     NaN
3   2001    44    NaN     NaN    Alto    blue
4   2001   200    Baleno  silver Baleno  silver
5   2001  1000    Swift   red    Swift   red

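All uppercase headers belong to company XYZ, and the lowercase headers belong to company ABC. How can I compare the 'CAR' column with the 'car' column and the 'COLOR' column with the 'color' column, and highlight the entire row wherever the values do not match?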

I tried:

def highlight_rows(s):
    if not (s['CAR'] == s['car'] and s['COLOR'] == s['color']):
        return 'background-color: green'

df_final.style.apply(highlight_rows, axis = None)

But this does not work.


1 answer

Answer 1 · posted 2024-10-03 04:40:10

The problem is the duplicated pairs of Year and Quantity, so one possible solution is to build a unique MultiIndex with a counter before calling concat:

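import numpy as np
import pandas as pd

# Number the rows inside each (Year, Quantity) group so duplicated pairs get distinct index values
df_temp.index = df_temp.groupby(['Year','Quantity']).cumcount()
df_temp1.index = df_temp1.groupby(['Year','Quantity']).cumcount()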

df_all = (pd.concat([df_temp.set_index(['Year','Quantity'], append=True), 
                     df_temp1.set_index(['Year','Quantity'], append=True)], 
                     axis='columns', 
                     keys=['XYZ', 'ABC']))
print(df_all)
                     XYZ              ABC        
                     Car  Colour      Car  Colour
  Year Quantity                                  
0 2001 16        Wagonar   White  Wagonar   White
       20            Zen   White      NaN     NaN
       44            NaN     NaN     Alto    Blue
       200        Baleno  Silver   Baleno  Silver
       1000        Swift     Red    Swift     Red
1 2001 16        Wagonar   Black      NaN     NaN
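
To see why the counter matters, here is a minimal sketch that rebuilds the XYZ table from the question (the variable names `xyz`, `keys`, `counter` and `unique_keys` are mine, not the asker's) and checks the index uniqueness before and after adding the `cumcount` level:

```python
import pandas as pd

# The XYZ table from the question; (2001, 16) appears twice.
xyz = pd.DataFrame({
    "Year": [2001, 2001, 2001, 2001, 2001],
    "Quantity": [1000, 16, 16, 200, 20],
    "Car": ["Swift", "Wagonar", "Wagonar", "Baleno", "Zen"],
    "Colour": ["Red", "White", "Black", "Silver", "White"],
})

# Using only Year and Quantity as the key gives a non-unique index.
keys = xyz.set_index(["Year", "Quantity"]).index
print(keys.is_unique)  # False

# cumcount numbers the rows within each (Year, Quantity) group from 0.
counter = xyz.groupby(["Year", "Quantity"]).cumcount()
print(counter.tolist())  # [0, 0, 1, 0, 0]

# With the counter as an extra index level, every key is distinct.
xyz.index = counter
unique_keys = xyz.set_index(["Year", "Quantity"], append=True).index
print(unique_keys.is_unique)  # True
```

With a unique index on both sides, concat along the columns can align the rows one to one instead of failing or duplicating rows on the repeated key.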

Then convert the index to a DataFrame with to_frame, and concat it with itself so that Year and Quantity also get MultiIndex columns:

df = df_all.index.to_frame().drop(0, axis=1)
df1 = pd.concat([df, df], axis=1, keys=('XYZ','ABC'))
print (df1)
                  XYZ            ABC         
                 Year Quantity  Year Quantity
  Year Quantity                              
0 2001 16        2001       16  2001       16
       20        2001       20  2001       20
       44        2001       44  2001       44
       200       2001      200  2001      200
       1000      2001     1000  2001     1000
1 2001 16        2001       16  2001       16

df_final = df_all.join(df1).reset_index(drop=True).swaplevel(axis='columns')[df_temp.columns]
print(df_final)
   Year       Quantity            Car           Colour        
    XYZ   ABC      XYZ   ABC      XYZ      ABC     XYZ     ABC
0  2001  2001       16    16  Wagonar  Wagonar   White   White
1  2001  2001       20    20      Zen      NaN   White     NaN
2  2001  2001       44    44      NaN     Alto     NaN    Blue
3  2001  2001      200   200   Baleno   Baleno  Silver  Silver
4  2001  2001     1000  1000    Swift    Swift     Red     Red
5  2001  2001       16    16  Wagonar      NaN   Black     NaN
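
The last step relies on swaplevel followed by a column selection to move the XYZ/ABC level below the field names. A small sketch of just that reshaping, on a two-row toy frame of my own (the data is illustrative, not the asker's full table):

```python
import pandas as pd

# Columns as produced by concat(..., keys=['XYZ', 'ABC']): source on top, field below.
columns = pd.MultiIndex.from_product([["XYZ", "ABC"], ["Car", "Colour"]])
df_all = pd.DataFrame(
    [["Swift", "Red", "Swift", "Red"], ["Zen", "White", None, None]],
    columns=columns,
)

# swaplevel puts the field name on top and the source (XYZ/ABC) below it.
swapped = df_all.swaplevel(axis="columns")

# Selecting by a list of top-level labels groups the XYZ/ABC pair under each field.
df_final = swapped[["Car", "Colour"]]
print(df_final.columns.tolist())
```

In the answer the list of top-level labels comes from `df_temp.columns`, which is why the final frame follows the original column order.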

Finally, add a second mask and combine the two with bitwise OR (|):

def highlight_diff(data, color='yellow'):
    attr = 'background-color: {}'.format(color)
    other1 = data.xs('XYZ', axis='columns', level=-1)
    other2 = data.xs('ABC', axis='columns', level=-1)
    return pd.DataFrame(np.where(data.ne(other1, level=0) | 
                                 data.ne(other2, level=0), attr,''),
                        index=data.index, columns=data.columns)

df_final = pd.DataFrame({('Year', 'XYZ'): {0: 2001, 1: 2001, 2: 2001, 3: 2001, 4: 2001, 5: 2001}, ('Year', 'ABC'): {0: 2001, 1: 2001, 2: 2001, 3: 2001, 4: 2001, 5: 2001}, ('Quantity', 'XYZ'): {0: 16, 1: 20, 2: 44, 3: 200, 4: 1000, 5: 16}, ('Quantity', 'ABC'): {0: 16, 1: 20, 2: 44, 3: 200, 4: 1000, 5: 16}, ('Car', 'XYZ'): {0: 'Wagonar', 1: 'Zen', 2: np.nan, 3: 'Baleno', 4: 'Swift', 5: 'Wagonar'}, ('Car', 'ABC'): {0: 'Wagonar', 1: np.nan, 2: 'Alto', 3: 'Baleno', 4: 'Swift', 5: np.nan}, ('Colour', 'XYZ'): {0: 'White', 1: 'White', 2: np.nan, 3: 'Silver', 4: 'Red', 5: 'Black'}, ('Colour', 'ABC'): {0: 'White', 1: np.nan, 2: 'Blue', 3: 'Silver', 4: 'Red', 5: np.nan}})
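
As an alternative sketch, not the answer's exact function, the mask can also be built by comparing the XYZ and ABC halves directly, which avoids broadcasting across a column level. The helper name `highlight_pairs` and its `whole_row` flag are my own, and treating two missing values as equal is an assumption about the desired behaviour. The sample frame keeps only the Car and Colour columns of `df_final`.

```python
import numpy as np
import pandas as pd

def highlight_pairs(data, color="yellow", whole_row=False):
    """Return CSS strings shaped like `data`, marking XYZ/ABC mismatches."""
    attr = f"background-color: {color}"
    xyz = data.xs("XYZ", axis="columns", level=1)
    abc = data.xs("ABC", axis="columns", level=1)
    # Assumption: two missing values count as equal, not as a difference.
    same = xyz.eq(abc) | (xyz.isna() & abc.isna())
    differs = ~same
    if whole_row:
        # Flag every field of a row as soon as one field differs.
        row_flag = differs.any(axis=1)
        differs = pd.DataFrame({name: row_flag for name in differs.columns})
    # Each (field, source) column takes the flag of its field.
    styles = pd.DataFrame(
        {col: np.where(differs[col[0]], attr, "") for col in data.columns},
        index=data.index,
    )
    styles.columns = data.columns
    return styles

columns = pd.MultiIndex.from_product([["Car", "Colour"], ["XYZ", "ABC"]])
df_final = pd.DataFrame(
    [
        ["Wagonar", "Wagonar", "White", "White"],
        ["Zen", np.nan, "White", np.nan],
        [np.nan, "Alto", np.nan, "Blue"],
        ["Baleno", "Baleno", "Silver", "Silver"],
        ["Swift", "Swift", "Red", "Red"],
        ["Wagonar", np.nan, "Black", np.nan],
    ],
    columns=columns,
)

styles = highlight_pairs(df_final)
print(styles)
# In a notebook (requires jinja2):
# df_final.style.apply(highlight_pairs, axis=None, whole_row=True)
```

Because Styler.apply with axis=None passes the whole DataFrame to the function, the returned frame must have the same index and columns as the input. Setting `whole_row=True` covers the second part of the question, where a row should be highlighted when either the car or the colour differs.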
