Homophily in a social network using Python

Posted 2024-09-27 00:20:17


I am trying to determine the chance homophily, and then the homophily, of a dataset that has nodes as keys and colors as values.

For example:

Node  Target   Colors 
A       N        1
N       A        0 
A       D        1
D       A        1
C       X        1
X       C        0
S       D        0
D       S        1
B                0
R       N        2
N       R        2

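The colors are associated with the Node column and range from 0 to 2 (int). The steps for computing the chance homophily on a characteristic z (color, in my case) are shown below: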

c_list=df[['Node','Colors']].set_index('Node').T.to_dict('list')
print("\nChance of same color:", round(chance_homophily(c_list),2))

where chance_homophily is defined as follows:

from collections import Counter

# The function below takes a dictionary mapping each node to a list holding its characteristic (color),
# counts how often each characteristic value occurs, and computes the chance homophily for it.

def chance_homophily(dataset):
    freq_dict = Counter([tuple(x) for x in dataset.values()])
    df_freq_counter = freq_dict
    c_list = list(df_freq_counter.values())
    
    chance_homophily = 0
    for class_count in c_list:
        chance_homophily += (class_count/sum(c_list))**2
    return chance_homophily
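
As a quick sanity check of the formula: chance homophily is the sum of the squared class proportions, that is, the probability that two nodes drawn independently (with replacement) share a color. The following is a minimal sketch, not the original code. It assumes a plain node-to-color dict; `node_colors` and the function name are illustrative, and each node is given the first color listed for it in the table above.

```python
from collections import Counter

def chance_homophily_from_colors(node_colors):
    """Sum of squared relative frequencies of each color."""
    counts = Counter(node_colors.values())
    total = sum(counts.values())
    return sum((count / total) ** 2 for count in counts.values())

# Assumption: one color per node, taking the first color listed for each node.
node_colors = {"A": 1, "N": 0, "D": 1, "C": 1, "X": 0, "S": 0, "B": 0, "R": 2}

# 3 nodes have color 1, 4 have color 0, 1 has color 2:
# (3/8)**2 + (4/8)**2 + (1/8)**2 = 0.40625
print(round(chance_homophily_from_colors(node_colors), 2))  # 0.41
```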

The homophily is then computed as follows:

def homophily(G, chars, IDs):
    """
    Given a network G, a dict of characteristics chars for node IDs,
    and dict of node IDs for each node in the network,
    find the homophily of the network.
    """
    num_same_ties = 0
    num_ties = 0
    for n1, n2 in G.edges():
        if IDs[n1] in chars and IDs[n2] in chars:
            if G.has_edge(n1, n2):
                num_ties+=1
                if chars[IDs[n1]] == chars[IDs[n2]]:
                    num_same_ties+=1
    return (num_same_ties / num_ties) 

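G should be built from the dataset above (so both the Node and Target columns are taken into account). I am not entirely familiar with this network property, but I think I am missing something in my implementation (for example, does it correctly account for the relationships between nodes in the network?). In another example (using a different dataset) found on the web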

https://campus.datacamp.com/courses/using-python-for-research/case-study-6-social-network-analysis?ex=1

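the characteristic is also color (although it is a string there, whereas I have a numeric variable). I do not know whether they take the relationships between nodes into account, perhaps by using an adjacency matrix. That part is not yet implemented in my code, where I am using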

G = nx.from_pandas_edgelist(df, source='Node', target='Target')
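
On the adjacency-matrix question: an edge list and an adjacency matrix encode the same relationships, so iterating over the edges and scanning the upper triangle of the matrix should count the same ties. The small pure-Python sketch below illustrates this on the example data. The function names and `node_colors` are illustrative, and each node is assumed to carry the first color listed for it.

```python
# Undirected ties from the example table (each reciprocal pair listed once);
# B has no target, so it contributes no tie.
edges = [("A", "N"), ("A", "D"), ("C", "X"), ("S", "D"), ("R", "N")]
# Assumption: one color per node, taking the first color listed for each node.
node_colors = {"A": 1, "N": 0, "D": 1, "C": 1, "X": 0, "S": 0, "B": 0, "R": 2}

def same_color_share_from_edges(edges, colors):
    """Fraction of ties whose two endpoints share a color, from an edge list."""
    same = sum(colors[u] == colors[v] for u, v in edges)
    return same / len(edges)

def same_color_share_from_adjacency(edges, colors):
    """Same fraction, computed by scanning a symmetric adjacency matrix."""
    nodes = sorted(colors)
    index = {node: i for i, node in enumerate(nodes)}
    size = len(nodes)
    adj = [[0] * size for _ in range(size)]
    for u, v in edges:
        adj[index[u]][index[v]] = adj[index[v]][index[u]] = 1
    ties = same = 0
    for i in range(size):
        for j in range(i + 1, size):  # upper triangle: count each tie once
            if adj[i][j]:
                ties += 1
                same += colors[nodes[i]] == colors[nodes[j]]
    return same / ties

print(same_color_share_from_edges(edges, node_colors))
print(same_color_share_from_adjacency(edges, node_colors))
```

Both calls give the same share (one same-color tie, A-D, out of five), which suggests that looping over `G.edges()` already uses the relationships between nodes.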

1 answer
Answer 1 · posted 2024-09-27 00:20:17

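Your code works fine. The only thing you are missing is the IDs dict, which maps each node's name to the corresponding node name in the graph G. By creating the graph from a pandas edgelist, you have already named the nodes after the values in your data.

This makes the IDs dict unnecessary. See the example below, which runs your original function once without the IDs dict and once with it: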

import networkx as nx
import pandas as pd
from collections import Counter

df = pd.DataFrame({"Node":["A","N","A","D","C","X","S","D","B","R","N"],
                  "Target":["N","A","D","A","X","C","D","S","","N","R"],
                  "Colors":[1,0,1,1,1,0,0,1,0,2,2]})

c_list=df[['Node','Colors']].set_index('Node').T.to_dict('list')

G = nx.from_pandas_edgelist(df, source='Node', target='Target')

def homophily_without_ids(G, chars):
    """
    Given a network G and a dict chars mapping each node to its characteristic,
    find the homophily of the network.
    """
    num_same_ties = 0
    num_ties = 0
    for n1, n2 in G.edges():
        if n1 in chars and n2 in chars:
            if G.has_edge(n1, n2):
                num_ties+=1
                if chars[n1] == chars[n2]:
                    num_same_ties+=1
    return (num_same_ties / num_ties)

print(homophily_without_ids(G, c_list))


#create node ids map - trivial in this case
nodes_ids = {i:i for i in G.nodes()}

def homophily(G, chars, IDs):
    """
    Given a network G, a dict of characteristics chars for node IDs,
    and dict of node IDs for each node in the network,
    find the homophily of the network.
    """
    num_same_ties = 0
    num_ties = 0
    for n1, n2 in G.edges():
        if IDs[n1] in chars and IDs[n2] in chars:
            if G.has_edge(n1, n2):
                num_ties+=1
                if chars[IDs[n1]] == chars[IDs[n2]]:
                    num_same_ties+=1
    return (num_same_ties / num_ties) 

print(homophily(G, c_list, nodes_ids))
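
One thing worth checking before relying on either result: the Node column repeats labels (A, N and D each appear twice), and N is listed with both color 0 and color 2, while a dict can hold only one color per node. The sketch below is a minimal illustration, assuming plain (node, color) tuples instead of the DataFrame and a hypothetical helper `build_color_map`. It keeps the first color seen for each node and reports any disagreements; keeping the first is an arbitrary choice made here for illustration.

```python
# (node, color) pairs copied from the Node and Colors columns of the example.
rows = [("A", 1), ("N", 0), ("A", 1), ("D", 1), ("C", 1), ("X", 0),
        ("S", 0), ("D", 1), ("B", 0), ("R", 2), ("N", 2)]

def build_color_map(rows):
    """Return (colors, conflicts): the first color seen per node,
    plus the nodes that appear with more than one color."""
    colors = {}
    conflicts = {}
    for node, color in rows:
        if node not in colors:
            colors[node] = color  # assumption: keep the first color seen
        elif colors[node] != color:
            conflicts.setdefault(node, {colors[node]}).add(color)
    return colors, conflicts

colors, conflicts = build_color_map(rows)
print(colors)
print(conflicts)  # nodes whose rows disagree on color
```

If `conflicts` is not empty, the homophily value depends on which color is kept for those nodes, so it may be worth deciding that explicitly.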
