Counting unique occurrences of words in a string column

# create a dummy data frame with a text column
import pandas as pd

x = [1, 2, 3, 4, 5]
y = ['apple google microsoft spotify alibaba',
     'google microsoft',
     'spotify google microsoft amazon',
     'amazon google apple',
     'amazon google spotify amazon']
df = pd.DataFrame({'ID': x, 'text': y})
df

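# search and count
# note: listtry is not defined in the post; the list below is inferred from the index of the results
listtry = ['apple', 'google', 'microsoft', 'spotify', 'alibaba', 'amazon', 'structo']
df2 = list()
for company in listtry:
    df2.append(df.text.str.count(company).sum())
df3 = pd.DataFrame({'company': listtry, 'count': df2})
df4 = df3.sort_values('count', ascending=False)
df4

# gives results
     company  count
1     google      5
5     amazon      4
2  microsoft      3
3    spotify      3
0      apple      2
4    alibaba      1
6    structo      0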

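2 Answers

User

#1 · edited 2024-10-02 06:25:15

Try again, changing count to contains and taking the length of the filtered df. This counts the rows that mention each company, so a name repeated within one row is counted only once: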

df2 = []  # reset the list before re-running the loop
for company in listtry:
    df2.append(len(df[df.text.str.contains(company)]))  # the only change is here
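For reference, the contains approach can be sketched end to end as below. This is a minimal sketch, not the answerer's exact code: the `listtry` values are an assumption inferred from the result table in the question, the `row_counts` name is hypothetical, and `regex=False` is an addition so that each name is matched as a literal substring.

```python
import pandas as pd

x = [1, 2, 3, 4, 5]
y = ['apple google microsoft spotify alibaba',
     'google microsoft',
     'spotify google microsoft amazon',
     'amazon google apple',
     'amazon google spotify amazon']
df = pd.DataFrame({'ID': x, 'text': y})

# Assumed list of names (not shown in the question; inferred from its output).
listtry = ['apple', 'google', 'microsoft', 'spotify', 'alibaba', 'amazon', 'structo']

# str.contains gives one True/False per row, so sum() counts rows,
# not individual occurrences within a row.
row_counts = {
    company: int(df['text'].str.contains(company, regex=False).sum())
    for company in listtry
}

result = pd.DataFrame({'company': list(row_counts),
                       'count': list(row_counts.values())})
result = result.sort_values('count', ascending=False)
print(result)
```

Because this is still substring matching, a name that is part of a longer word would also be counted; that does not happen with the sample data.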

User

#2 · edited 2024-10-02 06:25:15

Why not use set to remove the duplicates within each row? (see line 3)

   x = [1, 2, 3, 4, 5]
   y = ['apple google microsoft spotify alibaba', 'google microsoft', 'spotify google microsoft amazon', 'amazon google apple', 'amazon google spotify amazon']
   y = [' '.join(set(yy.split(' '))) for yy in y]  # keep each word once per row (word order is not preserved)
   df = pd.DataFrame({'ID': x, 'text': y})
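The set idea can also be carried through to the final counts without rebuilding the strings. The following is a sketch of one possible variant, not the answerer's code: it splits each row into words, keeps the distinct words per row, and counts whole words with `explode` and `value_counts`. The `listtry` values are again an assumption inferred from the question's output, and `words` and `counts` are hypothetical names.

```python
import pandas as pd

x = [1, 2, 3, 4, 5]
y = ['apple google microsoft spotify alibaba',
     'google microsoft',
     'spotify google microsoft amazon',
     'amazon google apple',
     'amazon google spotify amazon']
df = pd.DataFrame({'ID': x, 'text': y})

# Assumed list of names (not shown in the question; inferred from its output).
listtry = ['apple', 'google', 'microsoft', 'spotify', 'alibaba', 'amazon', 'structo']

# Split each row on whitespace, keep each distinct word once per row,
# then put one word on each row of a long Series.
words = df['text'].str.split().apply(lambda ws: sorted(set(ws))).explode()

# Count how many rows contain each word; names that never appear get 0.
counts = words.value_counts().reindex(listtry, fill_value=0)
print(counts.sort_values(ascending=False))
```

Unlike `str.count` or `str.contains`, this matches whole words only, which may or may not be what the question wants.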
