Removing duplicates that contain nested data (graph)

    import pandas as pd
    from numpy import nan

    d = {'start': {0: 4, 1: 3, 2: 2, 3: 1, 4: 12, 5: 11, 6: 23, 7: 22, 8: 21},
         'name': {0: 'Vitamin', 1: 'Vitamin D', 2: 'Vitamin D3', 3: 'Colecalciferol',
                  4: 'Vitamin D2', 5: 'Ergocalcifero', 6: 'Vitamin K', 7: 'Vitamin K2',
                  8: 'Menachinon'},
         'end': {0: nan, 1: 4.0, 2: 3.0, 3: 2.0, 4: 3.0, 5: 12.0, 6: 4.0, 7: 23.0, 8: 22.0}}
    df = pd.DataFrame(d)

    l1 = ['Colecalciferol', 'Vitamin D']
    l2 = ['Colecalciferol', 'Ergocalcifero', 'Vitamin D3']

1 answer

User
#1 · Posted 2024-09-28 16:20:56

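You're close! Here is one way to do it with a graph: check whether the node has any predecessors. If it does, it is not a lowest-level term, so we don't want to keep it.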
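    import networkx as nx

    # Directed graph with one edge per row, pointing from start to end
    G = nx.from_pandas_edgelist(df, 'start', 'end', create_using=nx.DiGraph())

    filtered_l1 = []
    for elmt in l1:
        # .iloc[0] takes the single matching value; int(Series) is deprecated in recent pandas
        node = int(df[df.name == elmt].start.iloc[0])
        # No predecessors means no other term points to this one
        if not list(G.predecessors(node)):
            filtered_l1.append(elmt)
    print(filtered_l1)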
The for loop above can be condensed into a single line:
    [x for x in l1 if not list(G.predecessors(int(df[df.name == x].start.iloc[0])))]
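To apply the same check to both l1 and l2, the graph test can be wrapped in a small function. This is only a sketch: the names keep_lowest_level and start_of are made up for illustration, the sample data is rewritten with plain lists, and two choices are my own assumptions rather than part of the answer. It drops the row whose end is NaN so that NaN does not become a graph node, and it uses in_degree in place of listing the predecessors.

```python
import networkx as nx
import pandas as pd
from numpy import nan

# Same sample data as in the question, written with plain lists.
df = pd.DataFrame({
    'start': [4, 3, 2, 1, 12, 11, 23, 22, 21],
    'name': ['Vitamin', 'Vitamin D', 'Vitamin D3', 'Colecalciferol',
             'Vitamin D2', 'Ergocalcifero', 'Vitamin K', 'Vitamin K2',
             'Menachinon'],
    'end': [nan, 4.0, 3.0, 2.0, 3.0, 12.0, 4.0, 23.0, 22.0],
})
l1 = ['Colecalciferol', 'Vitamin D']
l2 = ['Colecalciferol', 'Ergocalcifero', 'Vitamin D3']


def keep_lowest_level(df, names):
    """Hypothetical helper: keep the names whose node has no incoming edge."""
    # Assumption: the root row (end is NaN) carries no edge, so drop it.
    edges = df.dropna(subset=['end'])
    G = nx.from_pandas_edgelist(edges, 'start', 'end',
                                create_using=nx.DiGraph())
    # Look up each name's start id once instead of filtering df per element.
    start_of = dict(zip(df['name'], df['start']))
    return [n for n in names
            if start_of[n] not in G or G.in_degree(start_of[n]) == 0]


print(keep_lowest_level(df, l1))  # ['Colecalciferol']
print(keep_lowest_level(df, l2))  # ['Colecalciferol', 'Ergocalcifero']
```

Building the graph once per call keeps the per-name work to a dictionary lookup and a degree check.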
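A simpler approach that removes the networkx dependency entirely is to check whether a product's start is the end of any other product. If it is, the product is not at the lowest level and we want to filter it out: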
    # Every value that appears as an end somewhere in the table
    all_ends = df.end.unique()
    filtered_l1 = [x for x in l1 if int(df[df.name == x].start.iloc[0]) not in all_ends]
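The same start-versus-end comparison can also be done once for the whole table with Series.isin, so the DataFrame is not filtered again for every name. A minimal sketch, assuming the question's data; keep_lowest_level_pandas and leaf_names are hypothetical names, and names that do not occur in df are silently dropped, which is my choice rather than something the answer specifies.

```python
import pandas as pd
from numpy import nan

# Same sample data as in the question, written with plain lists.
df = pd.DataFrame({
    'start': [4, 3, 2, 1, 12, 11, 23, 22, 21],
    'name': ['Vitamin', 'Vitamin D', 'Vitamin D3', 'Colecalciferol',
             'Vitamin D2', 'Ergocalcifero', 'Vitamin K', 'Vitamin K2',
             'Menachinon'],
    'end': [nan, 4.0, 3.0, 2.0, 3.0, 12.0, 4.0, 23.0, 22.0],
})
l1 = ['Colecalciferol', 'Vitamin D']
l2 = ['Colecalciferol', 'Ergocalcifero', 'Vitamin D3']


def keep_lowest_level_pandas(df, names):
    """Hypothetical helper: keep names whose start is never used as an end."""
    # True for rows whose start value does not occur in the end column.
    is_lowest = ~df['start'].isin(df['end'].dropna())
    leaf_names = set(df.loc[is_lowest, 'name'])
    # Iterate over `names` so the caller's order is preserved.
    return [n for n in names if n in leaf_names]


print(keep_lowest_level_pandas(df, l1))  # ['Colecalciferol']
print(keep_lowest_level_pandas(df, l2))  # ['Colecalciferol', 'Ergocalcifero']
```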
