如何像excel（在这种情况下是工作表）那样搜索重复项，然后在数据框中突出显示它们？

import pandas as pd #Exporting raw data from a csv file DataOrigin = pd.read_csv('RAWDATA.csv') #Sorting raw data per interesting columns DataOriginSorted = DataOrigin.sort_values(['srcip','attack','dstip']) #Exporting some columns of historical data and sorting them Historicaldata2 = pd.read_excel('Historicaldata.xlsx', sheet_name=1, usecols = ['Source_IP','Ticket','Customer_Notification','Hostname','Service_desk_ticket']) Historicaldata2Sorted = Historicaldata2.sort_values(['Source_IP','Ticket']) #Creating a multindex variable with sorted raw data index = pd.MultiIndex.from_frame(DataOriginSorted) Sorted_DataOrigin = pd.DataFrame(index=index) #Making a count of events per source IP and exporting them as a csv for the code to work (rename column oepration) Daily_IncidentsIPS = pd.crosstab(DataOrigin.srcip,DataOrigin.attack).to_csv('ControlFile1.csv') Daily_IncidentsIPS = pd.read_csv('ControlFile1.csv').rename(columns = {'srcip': 'Source_IP'}, inplace = False ) #Mergin 2 dataframes to find coincident data and exporting them to a csv for the next operations to take place and using only interesting columns Historical2vsSortedOrigin = Historicaldata2Sorted.merge(Daily_IncidentsIPS,left_on='Source_IP',right_on='Source_IP', how='inner').to_csv('ControlFile2.csv') Historical2vsSortedOrigin = pd.read_csv('ControlFile2.csv', usecols = ['Ticket','Hostname','Source_IP','Customer_Notification','Service_desk_ticket']) #Searching for duplicated data between two interesting dataframes duplicated = Daily_IncidentsIPS['Source_IP'].isin(Historical2vsSortedOrigin['Source_IP']) #Creating a rule to color the rows where the duplicated values are present def row_styler(row): return ['background-color: yellow' if duplicated[row.name] else ''] * len(row) #Creating a multindex variable to show the data as I want it index2 = pd.MultiIndex.from_frame(Historical2vsSortedOrigin) IncidentMatching = pd.DataFrame(index=index2) #Saving 3 interesting dataframes in an excel file, highlighting the results of previous "search for duplicated" operation writer = pd.ExcelWriter('C:\\Users\myuser\Documents\Spyder\Results_IPS.xlsx', engine='xlsxwriter') Daily_IncidentsIPS.style.apply(row_styler, axis=1).to_excel(writer, sheet_name='Sheet1') Sorted_DataOrigin.to_excel(writer, sheet_name='Sheet2') IncidentMatching.to_excel(writer, sheet_name='Sheet3') writer.save()

1条回答

网友

1楼 · 发布于 2024-09-30 16:37:16

您可以在熊猫中使用样式系统：

# Some mock data
df1 = pd.DataFrame({
    'Name': ['David', 'Sue', 'Mary'],
    'Location': ['San Francisco', 'New York', 'Boston']
})

df2 = pd.DataFrame({
    'Name': ['Sue', 'Mary', 'Joe', 'Jack']
})

# Now determine what Name is duplicated across the frames
duplicated = df1['Name'].isin(df2['Name'])

下一步取决于要突出显示的内容。如果只想突出显示重复值（如Excel中的值）：

def col_styler(col):
    if col.name != 'Name':
        return [''] * len(col)

    return duplicated.map({
        True: 'background-color: yellow',
        False: ''
    })

df1.style.apply(col_styler)

输出：

如果要高亮显示整行，请执行以下操作：

def row_styler(row):
    return ['background-color: yellow' if duplicated[row.name] else ''] * len(row)

df1.style.apply(row_styler, axis=1)

输出：

相关问题更多 >

编程相关推荐

热门问题

热门文章