Slicing a DataFrame by part of the string values in a column

import pandas as pd

# sample dataframe:
cid = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
strings = [
    'tncduuqcr',
    'xqjfykalt',
    'arzouazgz',
    'tncknojbi',
    'xqjgfcekh',
    'arzupnzrx',
    'tncfjxyox',
    'xqjeboxdn',
    'arzphbdcs',
    'tnctnfoyi',
]
df = pd.DataFrame(list(zip(cid, strings)), columns=['cid', 'strings'])

# This is the step I would like to avoid doing:
df['short_strings'] = df['strings'].str[0:3]
out_dict = {}
for x in df['short_strings'].unique():
    df2 = df[df['short_strings'] == x]
    out_dict[x] = df2

# the separate dataframes:
for x in out_dict.keys():
    print(out_dict[x])

   cid    strings  short_strings
0    1  tncduuqcr            tnc
3    4  tncknojbi            tnc
6    7  tncfjxyox            tnc
9   10  tnctnfoyi            tnc
   cid    strings  short_strings
1    2  xqjfykalt            xqj
4    5  xqjgfcekh            xqj
7    8  xqjeboxdn            xqj
   cid    strings  short_strings
2    3  arzouazgz            arz
5    6  arzupnzrx            arz
8    9  arzphbdcs            arz

1 Answer

User

#1 · Posted on 2024-09-28 19:04:53

For this kind of operation we use dict + DataFrame.groupby; indexing with Series.unique, as in the question, is slower:

mydict = dict(iter(df.groupby(df.strings.str[:3])))
print(mydict)

Output

{'arz':    cid    strings
 2    3  arzouazgz
 5    6  arzupnzrx
 8    9  arzphbdcs,
 'tnc':    cid    strings
 0    1  tncduuqcr
 3    4  tncknojbi
 6    7  tncfjxyox
 9   10  tnctnfoyi,
 'xqj':    cid    strings
 1    2  xqjfykalt
 4    5  xqjgfcekh
 7    8  xqjeboxdn}
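
For anyone who wants to check that the groupby version really yields the same pieces as the loop in the question, here is a minimal self-contained sketch. The names prefix, by_prefix and by_mask are only illustrative and do not appear in the original post; the data is the sample from the question.

```python
import pandas as pd

# Sample data copied from the question.
cid = list(range(1, 11))
strings = [
    'tncduuqcr', 'xqjfykalt', 'arzouazgz', 'tncknojbi', 'xqjgfcekh',
    'arzupnzrx', 'tncfjxyox', 'xqjeboxdn', 'arzphbdcs', 'tnctnfoyi',
]
df = pd.DataFrame({'cid': cid, 'strings': strings})

# Group on the first three characters without adding a helper column.
# (prefix, by_prefix and by_mask are hypothetical names for this sketch.)
prefix = df['strings'].str[:3]
by_prefix = {key: group for key, group in df.groupby(prefix)}

# Baseline from the question: one boolean mask per unique prefix.
by_mask = {key: df[prefix == key] for key in prefix.unique()}

# Both approaches should produce identical sub-frames for every prefix.
for key in by_mask:
    assert by_prefix[key].equals(by_mask[key])
```

Because the grouping key is passed as a separate Series, the resulting sub-frames keep only the original cid and strings columns, so no short_strings column has to be created or dropped afterwards.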
