Parsing and avoiding nested loops in Python



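<p>I have a SQL table with 12,000 entries stored in a DataFrame <code>df1</code>, which looks like this:</p>
<div class="s-table-container"> ^{tb1}$ </div>
<p>I also have another table with 20,000 entries, stored in a DataFrame <code>df</code>:</p>
<div class="s-table-container"> ^{tb2}$ </div>
<p>The goal is to match, within a sentence, each name from <code>df1</code> against every possible combination of the names in the <code>CA</code> column of <code>df</code> (separated by a " " space), with the condition that a CA value is only considered if it is longer than 2 characters. The simplest logic is to search the sentence for every name value in <code>df1</code> and, if a match is found, search the same sentence for the CA values. But doing it this way is very expensive in terms of resources.</p>
<p>Below is the code I have tried; nested loops are the only way I can think of to accomplish the task. If I split it into two functions, I incur function call overhead, and if I try recursion, I exceed Python's recursion limit, which forces the kernel to shut down. The following function is called with a single sentence (I have 500k sentences to parse):</p>
<pre><code>import re

def disease_search(nltk_tokens_sen):
    # Try every disease name in df1 against the sentence
    for dis_index in range(len(df1)):
        disease_name = df1.at[dis_index, 'name']
        regex_for_dis = rf"\b{disease_name}\b"
        matches_for_dis = re.findall(regex_for_dis, nltk_tokens_sen, re.IGNORECASE | re.MULTILINE)
        if len(matches_for_dis) != 0:
            disease_marker(nltk_tokens_sen, disease_name)
</code></pre>
<p>If the function above finds a match, it calls this function:</p>
<pre><code>def disease_marker(nltk_tokens_sen, disease_name):
    # Try every space-separated CA token in df against the same sentence
    for zz in range(len(df)):
        biomarker_txt = df.at[zz, 'CA']
        biomarker = biomarker_txt.split(" ")
        for tt in range(len(biomarker)):
            if len(biomarker[tt]) > 2:
                matches_for_marker = re.findall(rf"\b{re.escape(biomarker[tt])}\b", nltk_tokens_sen)
                if len(matches_for_marker) != 0:
                    print("Match_found:", disease_name, biomarker[tt])
</code></pre>
<p>Do I need to change my logic entirely, or is there a Pythonic, runtime-efficient way to do this?</p>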
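One possible direction, offered as a minimal sketch rather than a definitive answer: the nested loops above compile and run a separate regex for every row of both tables on every sentence. The same matching can be approximated by compiling one whole-word alternation for the names and one for the CA tokens up front, then scanning each sentence at most twice. The helper names (`compile_alternation`, `marker_tokens`, `find_pairs`) and the sample data are hypothetical, and the sketch assumes both term lists are non-empty.

```python
import re

def compile_alternation(terms, flags=0):
    """Compile one whole-word pattern that matches any of the given terms."""
    # Longest terms first, so a longer term wins over a shorter one
    # that starts at the same position.
    ordered = sorted(set(terms), key=len, reverse=True)
    body = "|".join(re.escape(t) for t in ordered)
    return re.compile(rf"\b(?:{body})\b", flags)

def marker_tokens(ca_values, min_len=3):
    """Split each CA cell on spaces and keep tokens longer than 2 characters."""
    return {
        tok
        for cell in ca_values
        for tok in str(cell).split(" ")
        if len(tok) >= min_len
    }

def find_pairs(sentence, name_re, marker_re):
    """Return sorted (name, marker) pairs that co-occur in one sentence."""
    names = {m.group(0).lower() for m in name_re.finditer(sentence)}
    if not names:
        return []  # no name found, so the marker scan is skipped entirely
    markers = {m.group(0) for m in marker_re.finditer(sentence)}
    return sorted((n, k) for n in names for k in markers)

# Hypothetical sample data standing in for df1["name"] and df["CA"].
names = ["asthma", "lung cancer"]
ca_cells = ["CEA of blood", "CA-125 level"]

# Names are matched case-insensitively and markers case-sensitively,
# mirroring the flags used in the question's code.
name_re = compile_alternation(names, re.IGNORECASE)
marker_re = compile_alternation(marker_tokens(ca_cells))

pairs = find_pairs("Elevated CEA was seen in Lung Cancer patients.", name_re, marker_re)
print(pairs)  # [('lung cancer', 'CEA')]
```

With the real data, the inputs could be taken from `df1["name"].dropna().tolist()` and `df["CA"].dropna().tolist()`, and the two patterns built once before looping over the 500k sentences. One difference from the original loops: `finditer` returns non-overlapping matches, so a name that is contained inside a longer matched name is not reported separately.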
