Python: build a matrix (2D array) from the values in a DataFrame, ignoring NaN values

                         DRG01                    DRG02  ...  DRG11  DRG12
0      AMOXYCILLIN ORAL SOLIDS  AMOEBICIDES ORAL SOLIDS  ...    NaN    NaN
1                VITAMIN DROPS                      NaN  ...    NaN    NaN
2      AMOXYCILLIN ORAL SOLIDS  ANTIHISTAMINES ORAL LIQ  ...    NaN    NaN
3      AMOEBICIDES ORAL LIQUID                      NaN  ...    NaN    NaN
...                        ...                      ...  ...    ...    ...
81531                      NaN                      NaN  ...    NaN    NaN

[81532 rows x 12 columns]

                         AMOXYCILLIN ORAL SOLIDS  AMOEBICIDES ORAL SOLIDS  ANTIHISTAMINES ORAL LIQ  VITAM..
AMOXYCILLIN ORAL SOLIDS                        0                        1                        1        0
AMOEBICIDES ORAL SOLIDS                        1                        1                        0        0
ANTIHISTAMINES ORAL LIQ                        1                        0                        0        0
VITAMIN DROPS                                  0                        0                        0        1
.....                                      .....

2 answers

User

#1 · edited 2024-09-30 16:37:15

Using itertools.combinations and a few pandas functions, you can do this quite neatly:

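from itertools import combinations
import pandas as pd

# dropna() removes the NaN padding so that only real drug names are paired
# (sorting a pair that mixes a string with NaN would raise a TypeError).
pairs_df = pd.DataFrame(df.apply(lambda x: pd.Series(map(sorted, combinations(x.dropna(), 2))), axis=1).stack().dropna().to_list())
# pairs_df has a row for every pair of drugs (in columns 0, 1).
pairs_df["occurrences"] = 1
pairs_df = pairs_df.groupby([0, 1]).sum()  # Group identical combinations and count occurrences.
result_df = pairs_df.reset_index(level=1).pivot(columns=1)  # Pivot to create the requested shape.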
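
Because each pair is sorted before counting, the pivot above fills only one cell per pair. A minimal sketch of the same pair-counting idea that mirrors every count into both cells, so the matrix is square and symmetric like the one in the question. The toy `df` and the names `pair_counts` and `matrix` are made up for illustration; only three of the twelve columns are shown.

```python
from collections import Counter
from itertools import combinations

import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's table.
df = pd.DataFrame(
    {
        "DRG01": ["AMOXYCILLIN ORAL SOLIDS", "VITAMIN DROPS",
                  "AMOXYCILLIN ORAL SOLIDS", "AMOEBICIDES ORAL LIQUID"],
        "DRG02": ["AMOEBICIDES ORAL SOLIDS", np.nan,
                  "ANTIHISTAMINES ORAL LIQ", np.nan],
        "DRG03": [np.nan, np.nan, np.nan, np.nan],
    }
)

# Every distinct drug name in the table, NaN excluded.
drugs = sorted({d for d in df.to_numpy().ravel() if pd.notna(d)})

# Count each unordered pair of distinct drugs once per row.
pair_counts = Counter()
for row in df.itertuples(index=False):
    present = sorted({d for d in row if pd.notna(d)})
    pair_counts.update(combinations(present, 2))

# Square matrix of zeros, then write each count into both mirrored cells.
matrix = pd.DataFrame(0, index=drugs, columns=drugs)
for (a, b), n in pair_counts.items():
    matrix.loc[a, b] = n
    matrix.loc[b, a] = n

print(matrix)
```

Drugs that never share a row with another drug (VITAMIN DROPS here) keep an all-zero row and column, and the diagonal stays 0.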

User

#2 · edited 2024-09-30 16:37:15

Like this:

from collections import Counter, defaultdict
import pandas as pd

# For every drug, count how often it appears in the same row as each other drug.
connection_counter = defaultdict(Counter)

def to_counter(row):
    """Add one row to connection_counter, linking each drug to all other drugs in the row."""
    drugs = row.dropna().tolist()  # ignore the NaN padding
    for drug_name in drugs:
        connection_counter[drug_name].update(drugs)
        connection_counter[drug_name].pop(drug_name, None)  # so it won't count an appearance with itself

df.apply(to_counter, axis=1)  # df is the table you have

# The table you want: a square drug-by-drug matrix, 0 where two drugs never appear together.
drugs = sorted(connection_counter)
df1 = pd.DataFrame(connection_counter).reindex(index=drugs, columns=drugs).fillna(0).astype(int)
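
For a table with tens of thousands of rows, a different technique may be worth trying: build a 0/1 indicator matrix (rows by drugs) and multiply it by its own transpose. This is a sketch rather than either answer's method, and it assumes a drug repeated within one row should count once. The toy `df` and the names `stacked`, `indicator` and `cooc` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's table.
df = pd.DataFrame(
    {
        "DRG01": ["AMOXYCILLIN ORAL SOLIDS", "VITAMIN DROPS",
                  "AMOXYCILLIN ORAL SOLIDS", "AMOEBICIDES ORAL LIQUID"],
        "DRG02": ["AMOEBICIDES ORAL SOLIDS", np.nan,
                  "ANTIHISTAMINES ORAL LIQ", np.nan],
        "DRG03": [np.nan, np.nan, np.nan, np.nan],
    }
)

# Long format: one entry per non-NaN cell, indexed by the original row label.
stacked = df.melt(ignore_index=False, value_name="drug")["drug"].dropna()

# Indicator matrix: one row per original row, one column per drug,
# 1 if that drug appears anywhere in the row.
indicator = pd.get_dummies(stacked).groupby(level=0).max().astype(int)

# Entry (a, b) of the product is the number of rows containing both a and b.
product = indicator.T @ indicator

# Zero the diagonal on a copy, so a drug is not counted with itself.
values = product.to_numpy(copy=True)
np.fill_diagonal(values, 0)
cooc = pd.DataFrame(values, index=product.index, columns=product.columns)

print(cooc)
```

The matrix product replaces the per-row Python loop with a single vectorised operation. The indicator matrix is dense here, so with many distinct drugs a sparse representation might be needed.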
