Searching for strings in a DataFrame

Posted 2024-09-30 06:20:27


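Hello, I have two DataFrames. One is the master, db1 (it has many rows), and the second is sourcetarget (which is smaller). I want to look for every word of sourcetarget in db1 and, if there is a match, create a new boolean column (0, 1). I tried this code (its complexity is high), but I always get 0. What's wrong?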

start_time = time.time()

compt=0
for i in db1.clean_nomComplet:
    for j in sourcetarget.sourcetarget:
        res0 = i.find(j)
        if res0 >= 0:     
            db1['top'] = 1
        else:
            db1['top'] = 0
    compt+=1    
    print(compt/len(db1)*100,end="\r")
    if compt%50000 == 0:
        print("../data_out/sauve"+str(compt)+'.csv')
        db1.to_csv('../data_out/sauve'+str(compt)+'.csv', encoding='utf-8-sig')

print("--- %s seconds ---" % (time.time() - start_time))

1 answer
User
#1 · Posted 2024-09-30 06:20:27

The best way I have found to do this kind of comparison is:

# 1. Convert the values you want to check against into a set,
# since their order does not matter. This avoids the nested loop entirely.
source = set(sourcetarget.sourcetarget.values)

# 2. Use the isin function
db1['top'] = 0
db1.loc[db1['clean_nomComplet'].isin(source), 'top'] = 1
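One caveat: isin compares whole cell values, whereas the loop in the question uses find, which is a substring test. If substring matching is what you actually need, here is a minimal sketch, assuming both columns hold strings. The sample data is made up for illustration; only the column names come from the question.

```python
import re
import pandas as pd

# Hypothetical sample data standing in for the real frames
db1 = pd.DataFrame({"clean_nomComplet": ["jean dupont", "marie curie", "paul martin"]})
sourcetarget = pd.DataFrame({"sourcetarget": ["dupont", "curie"]})

# Escape each term so regex metacharacters are matched literally,
# then join the terms with "|" into a single alternation pattern
terms = sourcetarget["sourcetarget"].dropna().astype(str)
pattern = "|".join(re.escape(term) for term in terms)

# str.contains returns a boolean Series; na=False treats missing names as "no match"
db1["top"] = db1["clean_nomComplet"].str.contains(pattern, regex=True, na=False).astype(int)
```

Be aware that an empty term list gives an empty pattern, which matches every row, so you may want to check for that case first.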

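The problem with your script is that each assignment changes the value of the whole column, not just the current row. You should use: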

for index, row in db1.iterrows():
    [...]
    if res0 >= 0:     
        db1.loc[index,'top'] = 1
    else:
        db1.loc[index, 'top'] = 0
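There is a second thing to watch for if you keep the inner loop over sourcetarget: the flag has to be decided once per row, otherwise a later term that does not match overwrites an earlier match. A rough sketch of a complete loop follows. The body is my own assumption about what the elided part does, and the sample data is hypothetical.

```python
import pandas as pd

# Hypothetical sample data standing in for the real frames
db1 = pd.DataFrame({"clean_nomComplet": ["jean dupont", "marie curie", "paul martin"]})
sourcetarget = pd.DataFrame({"sourcetarget": ["dupont", "curie"]})

terms = sourcetarget["sourcetarget"].tolist()

db1["top"] = 0  # default: no match
for index, row in db1.iterrows():
    name = row["clean_nomComplet"]
    # any() stops at the first term found inside the name,
    # so the row is flagged exactly once
    if any(term in name for term in terms):
        db1.loc[index, "top"] = 1
```

On a large db1 this will still be slow, since iterrows visits every row in Python, so a vectorized approach is likely the better choice.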
