Code to create all possible column pairs in Pandas

import pandas as pd
from scipy import stats

data = [['TAMU', 54, 0, 0, 6, 5, 0],
        ['UIUC', 33, 43, 5, 0, 76, 81],
        ['USC', 4, 1, 0, 7, 21, 4],
        ['Austin', 22, 31, 0, 0, 55, 0],
        ['UCLA', 55, 6, 7, 9, 11, 12]]
df = pd.DataFrame(data, columns=['Name', 'Research', 'Thesis', 'Proposal', 'AI', 'Analytics', 'Data'])

def overflow(school1, school2, alpha):
    pvals_list = []
    data = [['TAMU', 54, 0, 0, 6, 5, 0],
            ['UIUC', 33, 43, 5, 0, 76, 81],
            ['USC', 4, 1, 0, 7, 21, 4],
            ['Austin', 22, 31, 0, 0, 55, 0],
            ['UCLA', 55, 6, 7, 9, 11, 12]]
    df = pd.DataFrame(data, columns=['Name', 'Research', 'Thesis', 'Proposal', 'AI', 'Analytics', 'Data'])
    # keep only the two schools being compared
    df = df[(df['Name'] == school1) | (df['Name'] == school2)]
    # drop every column that contains a zero for either school
    df = df.loc[:, df.ne(0).all()]
    df = df.set_index('Name')
    ###
    #### code to create column pairs [for loop?] to feed to data_crosstab below
    ###
    data_crosstab = pd.crosstab()  # placeholder: the column pairs should go here
    chi, p_vals = stats.chi2_contingency(data_crosstab)[:2]
    if p_vals > alpha:
        pvals_list.append(p_vals)
    return pvals_list

overflow('USC', 'UCLA', 0.05)

3 Answers

User

#1 · edited 2024-10-02 12:36:48

Is this what you want?

from itertools import combinations

[x for x in combinations(['Name', 'Research', 'Thesis',
'Proposal', 'AI', 'Analytics', 'Data'], 2)]

Output:

[('Name', 'Research'),
 ('Name', 'Thesis'),
 ('Name', 'Proposal'),
 ('Name', 'AI'),
 ('Name', 'Analytics'),
 ('Name', 'Data'),
 ('Research', 'Thesis'),
 ('Research', 'Proposal'),
 ('Research', 'AI'),
 ('Research', 'Analytics'),
 ('Research', 'Data'),
 ('Thesis', 'Proposal'),
 ('Thesis', 'AI'),
 ('Thesis', 'Analytics'),
 ('Thesis', 'Data'),
 ('Proposal', 'AI'),
 ('Proposal', 'Analytics'),
 ('Proposal', 'Data'),
 ('AI', 'Analytics'),
 ('AI', 'Data'),
 ('Analytics', 'Data')]

User

#2 · edited 2024-10-02 12:36:48

With Name set as the index, you need to pass the two rows to pd.crosstab to create an RxC table:

>>> data_crosstab = pd.crosstab(df.loc['USC'], df.loc['UCLA'])
UCLA  6   7   9   11  12  55
USC                         
0      0   1   0   0   0   0
1      1   0   0   0   0   0
4      0   0   0   0   1   1
7      0   0   1   0   0   0
21     0   0   0   1   0   0

You can then pass it to scipy.stats.chi2_contingency to get the result:

>>> stats.chi2_contingency(pd.crosstab(df.loc['USC'], df.loc['UCLA']))
(24.000000000000014,
 0.24239216167051175,
 20,
 array([[0.16666667, 0.16666667, 0.16666667, 0.16666667, 0.16666667,
        0.16666667],
       [0.16666667, 0.16666667, 0.16666667, 0.16666667, 0.16666667,
        0.16666667],
       [0.33333333, 0.33333333, 0.33333333, 0.33333333, 0.33333333,
        0.33333333],
       [0.16666667, 0.16666667, 0.16666667, 0.16666667, 0.16666667,
        0.16666667],
       [0.16666667, 0.16666667, 0.16666667, 0.16666667, 0.16666667,
        0.16666667]]))

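# chi is the first value (24.0) and p_vals is the second value (about 0.2424)

This works for a single pair of row labels as shown above; just replace USC and UCLA with the pair you want.

If you want to do this for every pair of rows, you can loop over the index values with combinations from itertools: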

from itertools import combinations
for left, right in combinations(df.index.tolist(), 2):
    data_crosstab = pd.crosstab(df.loc[left], df.loc[right])

    #rest of the code
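
To make the loop above concrete, here is a minimal sketch that fills in the "rest of the code" by collecting one p-value per pair of rows. It assumes Name has been set as the index, as in the crosstab call earlier in this answer; the dictionary name row_pair_pvals is only illustrative and does not come from the question.

```python
from itertools import combinations

import pandas as pd
from scipy import stats

data = [
    ['TAMU', 54, 0, 0, 6, 5, 0],
    ['UIUC', 33, 43, 5, 0, 76, 81],
    ['USC', 4, 1, 0, 7, 21, 4],
    ['Austin', 22, 31, 0, 0, 55, 0],
    ['UCLA', 55, 6, 7, 9, 11, 12],
]
columns = ['Name', 'Research', 'Thesis', 'Proposal', 'AI', 'Analytics', 'Data']
# Assumption: the school name is the index, so df.loc['USC'] returns one row.
df = pd.DataFrame(data, columns=columns).set_index('Name')

# Illustrative container: maps (left, right) row labels to the p-value.
row_pair_pvals = {}
for left, right in combinations(df.index.tolist(), 2):
    # Cross-tabulate the values of the two rows, as in the answer above.
    data_crosstab = pd.crosstab(df.loc[left], df.loc[right])
    # The first two items of the result are the statistic and the p-value.
    chi, p_val = stats.chi2_contingency(data_crosstab)[:2]
    row_pair_pvals[(left, right)] = p_val

for pair, p_val in row_pair_pvals.items():
    print(pair, round(p_val, 4))
```

With five schools this produces ten pairs. Filtering by alpha, as in the question, can then be done on the dictionary values.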

User

#3 · edited 2024-10-02 12:36:48

IIUC, you want itertools.combinations:

from itertools import combinations
for col1, col2 in combinations(df.set_index("Name").columns,2):
    pass  # add your code here

The result of using combinations is:

>>> list(combinations(df.set_index("Name").columns,2))
[('Research', 'Thesis'),
 ('Research', 'Proposal'),
 ('Research', 'AI'),
 ('Research', 'Analytics'),
 ('Research', 'Data'),
 ('Thesis', 'Proposal'),
 ('Thesis', 'AI'),
 ('Thesis', 'Analytics'),
 ('Thesis', 'Data'),
 ('Proposal', 'AI'),
 ('Proposal', 'Analytics'),
 ('Proposal', 'Data'),
 ('AI', 'Analytics'),
 ('AI', 'Data'),
 ('Analytics', 'Data')]
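
As a rough sketch of how these column pairs could be plugged back into the question's function: the version below assumes that the table intended for each pair is the 2x2 block of the two schools' counts in those two columns. The question does not say this, so treat it as one possible reading. The function name column_pair_pvalues is hypothetical.

```python
from itertools import combinations

import pandas as pd
from scipy import stats

data = [
    ['TAMU', 54, 0, 0, 6, 5, 0],
    ['UIUC', 33, 43, 5, 0, 76, 81],
    ['USC', 4, 1, 0, 7, 21, 4],
    ['Austin', 22, 31, 0, 0, 55, 0],
    ['UCLA', 55, 6, 7, 9, 11, 12],
]
columns = ['Name', 'Research', 'Thesis', 'Proposal', 'AI', 'Analytics', 'Data']
df = pd.DataFrame(data, columns=columns)


def column_pair_pvalues(df, school1, school2):
    """Hypothetical helper: p-value for every column pair of two schools."""
    # Keep only the two schools, then drop columns containing a zero.
    pair = df[(df['Name'] == school1) | (df['Name'] == school2)]
    pair = pair.loc[:, pair.ne(0).all()].set_index('Name')

    pvals = {}
    for col1, col2 in combinations(pair.columns, 2):
        # Assumption: the contingency table is the 2x2 block of raw counts.
        table = pair[[col1, col2]]
        chi, p_val = stats.chi2_contingency(table)[:2]
        pvals[(col1, col2)] = p_val
    return pvals


result = column_pair_pvalues(df, 'USC', 'UCLA')
for cols, p_val in result.items():
    print(cols, round(p_val, 4))
```

For USC and UCLA the Proposal column is dropped because USC has a zero there, which leaves five columns and therefore ten pairs. Comparing each p-value with alpha, as the question does, is left to the caller.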
