Finding the index of a substring within a string in a DataFrame

Posted 2024-10-04 01:31:19


I have a DataFrame with two columns (and many rows): one column holds a full sequence and the other holds a sub-sequence.

I want to find the index at which the sub-sequence starts within the full sequence and add it as another column.

I tried this:

df["start"] = df.sequence.index(df.sub_sequence)

But this raises: TypeError: 'RangeIndex' object is not callable

What am I doing wrong?

Below are the df I have and the df I would like to end up with.

Sample DataFrame:

import pandas as pd 

data = {"sequence": ["abcde", "fghij", "klmno"], "sub_sequence": ["cde", "gh", "no"]}
df = pd.DataFrame(data, columns=["sequence", "sub_sequence"])

  sequence sub_sequence
0    abcde          cde
1    fghij           gh
2    klmno           no

Expected result:

data2 = {"sequence": ["abcde", "fghij", "klmno"], "sub_sequence": ["cde", "gh", "no"], "start": [2, 1, 3]}
df2 = pd.DataFrame(data2, columns=["sequence", "sub_sequence", "start"])

  sequence sub_sequence  start
0    abcde          cde      2
1    fghij           gh      1
2    klmno           no      3

1 answer
User
#1 · Posted 2024-10-04 01:31:19

Use zip and str.index in a list comprehension:

df['start'] = [seq.index(sub) for seq, sub in zip(df['sequence'], df['sub_sequence'])]

Or use DataFrame.apply along axis=1 with str.index:

df['start'] = df[['sequence', 'sub_sequence']].apply(lambda s: str.index(*s), axis=1)

Result:

  sequence sub_sequence  start
0    abcde          cde      2
1    fghij           gh      1
2    klmno           no      3
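
Both approaches call the string method str.index on each row. The original attempt failed because df.sequence.index is the Series' row index (a RangeIndex), not that string method. One caveat: str.index raises ValueError when the sub-sequence does not occur in the sequence. If that can happen in your data, a minimal sketch using str.find, which returns -1 instead, might look like the following. The helper name find_starts and the non-matching value "xyz" are made up for illustration.

```python
def find_starts(sequences, sub_sequences):
    """Return the start index of each sub-sequence, or -1 if it is absent."""
    # str.find returns -1 for a missing substring; str.index would raise ValueError.
    return [seq.find(sub) for seq, sub in zip(sequences, sub_sequences)]


# "xyz" is a made-up value that does not occur in "klmno".
starts = find_starts(["abcde", "fghij", "klmno"], ["cde", "gh", "xyz"])
print(starts)  # [2, 1, -1]
```

With the DataFrame from the question, the result could be assigned as df["start"] = find_starts(df["sequence"], df["sub_sequence"]).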
