Python: loop over dictionary key-value pairs and filter a DataFrame for each unique pair to compute cosine similarity

Posted 2024-09-27 02:15:50


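I have the dictionary below. I need to filter my DataFrame with each key-value pair in the dictionary and run soft cosine similarity on the result. When I enter the search query and productId by hand, one pair at a time, my original function returns the DataFrame sorted by score (the score_sort column).

But how do I loop over the dictionary of productIds and key-phrase lists so that each unique run appends new rows to my DataFrame? I tried something like the code below, but it has been running for a very long time, and I don't think it will end up updating the DataFrame for each new productId.

I use the pretrained "glove-wiki-gigaword-50" word embeddings to compute the similarity between a product's reviews and the search query, where the search query is the list of values stored in mydict for that product.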
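For readers unfamiliar with the measure, here is a minimal pure-Python sketch of the soft cosine arithmetic. It is not gensim's implementation. The three-word vocabulary and the 0.8 term similarity are made-up values for illustration; in the question, the term similarities would come from the GloVe embeddings.

```python
import math


def soft_cosine(a, b, sim):
    """Soft cosine between two bag-of-words vectors a and b.

    sim[i][j] is the similarity between term i and term j
    (1.0 on the diagonal). Returns 0.0 if either vector has zero norm.
    """
    def bilinear(x, y):
        # x^T * S * y
        return sum(
            x[i] * sim[i][j] * y[j]
            for i in range(len(x))
            for j in range(len(y))
        )

    denom = math.sqrt(bilinear(a, a)) * math.sqrt(bilinear(b, b))
    return bilinear(a, b) / denom if denom else 0.0


# Hypothetical toy vocabulary: ["comfortable", "cozy", "size"]
sim = [
    [1.0, 0.8, 0.0],  # "comfortable" is assumed to be close to "cozy"
    [0.8, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
identity = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

query = [1, 0, 0]   # the query contains only "comfortable"
review = [0, 1, 0]  # the review contains only "cozy"

soft_score = soft_cosine(query, review, sim)        # related terms count
plain_score = soft_cosine(query, review, identity)  # ordinary cosine
```

With the identity matrix the measure reduces to ordinary cosine similarity, so the two texts share nothing. With the term-similarity matrix the related words contribute to the score.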

mydict={'id1': ['comfortable', 'happy', 'size', 'shrink'],
    'id2': ['black', 'comfortable', 'happy', 'true'],
    'id3': ['happy', 'wrong', 'size', 'usa'],
    'id4': ['comfortable', 'happy', 'length', 'belly']}
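Each key in mydict already belongs to exactly one phrase list, so a single pass over `.items()` yields one (productId, search query) pair per product. The sketch below only shows that pairing. `iter_product_queries` is a hypothetical helper name, not part of the original code.

```python
mydict = {
    'id1': ['comfortable', 'happy', 'size', 'shrink'],
    'id2': ['black', 'comfortable', 'happy', 'true'],
    'id3': ['happy', 'wrong', 'size', 'usa'],
    'id4': ['comfortable', 'happy', 'length', 'belly'],
}


def iter_product_queries(mapping):
    """Yield one (product_id, search_query) pair per dictionary entry."""
    for product_id, phrases in mapping.items():
        yield product_id, list(phrases)


pairs = list(iter_product_queries(mydict))

# Two nested range() loops over keys and values would instead visit
# every key with every phrase list: len(mydict) ** 2 combinations.
nested_combinations = len(mydict) * len(mydict)
```

Four products give four runs this way, whereas nested loops over the keys and the values separately would give sixteen, most of them pairing a product with another product's phrases.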



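# Imports the code below relies on (gensim 4.x import paths assumed)
import numpy as np
import pandas as pd
import gensim.downloader as api
from gensim.corpora import Dictionary
from gensim.models import TfidfModel
from gensim.similarities import SoftCosineSimilarity, SparseTermSimilarityMatrix, WordEmbeddingSimilarityIndex

if 'glove' not in locals():  # only load if not already in memory
    glove = api.load("glove-wiki-gigaword-50")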



def softCosinesim(df,mydict):
    products=[k for k in mydict.keys()]
    phraselist=[v for v in mydict.values()]
    for i in range(len(products)):
        for j in range(len(phraselist)):
            df = df[df["productId"] == str(products[i])]
            search_query=np.array(phraselist[j]).tolist()
            #search_query=np.array(['comfortable', 'happy', 'size', 'shrink']).tolist()
            #df=df[df['productId']=='id1']
            texts=df['reviews']
            corpus=[text for text in texts]
            sim_idx = WordEmbeddingSimilarityIndex(glove)
            dictionary = Dictionary(corpus+[search_query[:]])
            tfidf = TfidfModel(dictionary=dictionary)
            sim_matrix = SparseTermSimilarityMatrix(sim_idx, dictionary, tfidf)
            query_tf = tfidf[dictionary.doc2bow(search_query[:])]
            index = SoftCosineSimilarity(tfidf[[dictionary.doc2bow(document) for document in corpus]],sim_matrix)
            doc_sim_scores = index[query_tf]
            sorted_idx = np.argsort(doc_sim_scores)[::-1]
            sorted_idx=sorted_idx[:].tolist()
            df_sorted=pd.DataFrame(df[['productId','reviewid','title','reviews']].iloc[sorted_idx])
            df_sorted['score_sort'] = df_sorted.groupby(['productId']).cumcount()
        df_sorted=pd.merge(df_sorted,on='productId',how='left')
    return  df_sorted
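One possible way to restructure the loop, offered as a sketch under stated assumptions rather than a verified answer. Three things in the attempt above look problematic: `df` is overwritten by its own filter, so every product after the first is filtered out of an already filtered frame; the nested loops pair every productId with every phrase list; and `pd.merge` is called with only one frame. The sketch iterates over `mydict.items()`, keeps `df` intact, and concatenates the per-product results. It assumes gensim 4.x and that each entry of the `reviews` column is already a list of tokens. The names `soft_cosine_scores`, `rank_reviews`, `score_fn`, and the `score` column are hypothetical.

```python
import pandas as pd


def soft_cosine_scores(corpus, search_query, sim_idx):
    """Score each tokenized review in corpus against search_query.

    Assumes gensim 4.x. sim_idx is a WordEmbeddingSimilarityIndex that
    was built once, outside the loop, from the loaded GloVe vectors.
    """
    from gensim.corpora import Dictionary
    from gensim.models import TfidfModel
    from gensim.similarities import (
        SoftCosineSimilarity,
        SparseTermSimilarityMatrix,
    )

    dictionary = Dictionary(corpus + [search_query])
    tfidf = TfidfModel(dictionary=dictionary)
    sim_matrix = SparseTermSimilarityMatrix(sim_idx, dictionary, tfidf)
    index = SoftCosineSimilarity(
        tfidf[[dictionary.doc2bow(doc) for doc in corpus]], sim_matrix
    )
    return index[tfidf[dictionary.doc2bow(search_query)]]


def rank_reviews(df, mydict, score_fn):
    """Return one DataFrame with every product's reviews ranked by score.

    score_fn(corpus, search_query) must return one score per review.
    """
    ranked = []
    for product_id, search_query in mydict.items():
        # Filter into a new variable so df stays complete for the next key
        subset = df[df["productId"] == product_id]
        if subset.empty:
            continue
        scores = score_fn(subset["reviews"].tolist(), list(search_query))
        subset = subset.assign(score=list(scores))
        subset = subset.sort_values("score", ascending=False)
        subset["score_sort"] = list(range(len(subset)))  # 0 = most similar
        ranked.append(subset)
    if not ranked:
        return pd.DataFrame(columns=list(df.columns) + ["score", "score_sort"])
    return pd.concat(ranked, ignore_index=True)


# Usage with the objects from the question (not executed here):
# sim_idx = WordEmbeddingSimilarityIndex(glove)
# result = rank_reviews(
#     df, mydict, lambda corpus, q: soft_cosine_scores(corpus, q, sim_idx)
# )
```

Passing the scorer in as `score_fn` keeps the looping logic separate from gensim, so the loop can be checked on a small frame with a cheap stand-in scorer before running the expensive similarity computation.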
