Which value appears first in the df?

Posted 2024-09-19 23:44:27


My code and input:

import pandas as pd

df1 = pd.DataFrame({'frame':[1,2,3,4,5,6,7,8,9,10,11,12],
                    'label':['other','splenic','other','other','hepatic','other','other','other','hepatic','other','splenic','other']})
df2 = pd.DataFrame({'frame':[1,2,3,4,5,6,7,8,9,10,11,12],
                    'label':['other','hepatic','other','other','splenic','other','other','other','splenic','other','hepatic','other']})
label = ['other','hepatic','splenic']
for i in range(0,len(label)):
    if label[i] != 'other':
        start_frame = df1.loc[(df1['label']=='splenic'),'frame']
        end_frame = df1.loc[(df1['label']=='hepatic'),'frame']
    else: print('other')

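Suppose I want to mask the start/end frames between two occurring labels (splenic and hepatic) for further computation. My problem is that the labels appear in a different order in each DataFrame. For example, in df1, splenic appears first at frame=2, so that is my start_frame. My next end_frame is where hepatic appears, at frame=5. Continuing, the next start_frame is hepatic at frame=9, and the end_frame is splenic at frame=11. In df2 the order is reversed. There is no real pattern to which one comes first.
So I cannot say that splenic is always my "start" and hepatic always my "end". It depends on which comes first: whichever of hepatic or splenic appears first is the "start", and the other label is then the "end". What I expect for df1: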

start_frame=[2,9]
end_frame=[5,11]

3 answers

You can try it like this:

import pandas as pd

df1 = pd.DataFrame({'frame': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
                    'label': ['other', 'splenic', 'other', 'other', 'hepatic', 'other', 'other', 'other', 'hepatic',
                              'other', 'splenic', 'other']})
df2 = pd.DataFrame({'frame': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
                    'label': ['other', 'hepatic', 'other', 'other', 'splenic', 'other', 'other', 'other', 'splenic',
                              'other', 'hepatic', 'other']})


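def get_label(dataframe):
    label = ['hepatic', 'splenic']
    start_val = end_val = ''

    # Find the first target label in the frame; the other one becomes the end label
    for _, row in dataframe.iterrows():
        if row['label'] in label:
            start_val = row['label']
            # index - 1 maps 0 -> -1 (last item) and 1 -> 0, i.e. it picks the other label
            end_val = label[label.index(start_val)-1]
            break

    # Collect every frame carrying the start label and every frame carrying the end label
    start_frame = list(dataframe.loc[(dataframe['label'] == start_val), 'frame'].values)
    end_frame = list(dataframe.loc[(dataframe['label'] == end_val), 'frame'].values)

    return start_frame, end_frame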


if __name__ == '__main__':
    start, end = get_label(df1)
    print(start, end)
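
For df1, the function above returns every splenic frame as a start and every hepatic frame as an end, that is 2, 11 and 5, 9, which differs from the 2, 9 and 5, 11 the question expects. A minimal sketch of an alternating variant follows. The helper name `get_alternating_frames` and its `targets` parameter are hypothetical and not part of the original answer, and the sketch assumes the target labels are meant to pair up strictly in order of appearance:

```python
import pandas as pd


def get_alternating_frames(dataframe, targets=("hepatic", "splenic")):
    # Hypothetical helper: frames of all target-label rows, in order of appearance
    frames = dataframe.loc[dataframe["label"].isin(targets), "frame"].tolist()
    # Even positions (0, 2, ...) are starts, odd positions (1, 3, ...) are ends
    return frames[0::2], frames[1::2]


df1 = pd.DataFrame({'frame': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
                    'label': ['other', 'splenic', 'other', 'other', 'hepatic', 'other',
                              'other', 'other', 'hepatic', 'other', 'splenic', 'other']})

start, end = get_alternating_frames(df1)
print(start, end)  # [2, 9] [5, 11]
```

If the number of target rows is odd, the last start simply has no matching end, so the two lists differ in length by one.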

Try this:

import numpy as np
import pandas as pd

df = pd.DataFrame({'frame':[1,2,3,4,5,6,7,8,9,10,11,12],
                    'label':['other','splenic','other','other','hepatic','other','other','other','hepatic','other','splenic','other']})

# keep only rows with splenic or hepatic; copy so the new column is not set on a slice
df = df[(df.label == "splenic") | (df.label == "hepatic")].copy()

# assign start/end, assumes there will be an even number of splenic/hepatic
df['tag'] = np.tile(['start','end' ], len(df)//2)

# from here you can extract the values you want
print(df)

# output

    frame    label    tag
1       2  splenic  start
4       5  hepatic    end
8       9  hepatic  start
10     11  splenic    end
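
From the tagged frame, the two lists can be pulled out by filtering on `tag`. The sketch below is one way to do that. It also swaps `np.tile` for a position-parity check, which is my substitution rather than the answer's method, so that an odd number of target rows leaves an unmatched trailing start instead of raising. The name `key_rows` is introduced here for illustration only:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'frame': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
                   'label': ['other', 'splenic', 'other', 'other', 'hepatic', 'other',
                             'other', 'other', 'hepatic', 'other', 'splenic', 'other']})

# Keep only the target rows; copy so adding a column does not touch a slice of df
key_rows = df[df["label"].isin(["splenic", "hepatic"])].copy()

# Alternate start/end by row position; this also works for an odd number of rows
key_rows["tag"] = np.where(np.arange(len(key_rows)) % 2 == 0, "start", "end")

# Extract the frame numbers for each tag as plain Python lists
start_frame = key_rows.loc[key_rows["tag"] == "start", "frame"].tolist()
end_frame = key_rows.loc[key_rows["tag"] == "end", "frame"].tolist()

print(start_frame)  # [2, 9]
print(end_frame)    # [5, 11]
```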

I suggest flagging every occurrence of a target label:

import numpy as np
import pandas as pd

df = pd.DataFrame({'frame':[1,2,3,4,5,6,7,8,9,10,11,12],
                    'label':['other','splenic','other','other','hepatic','other','other','other','hepatic','other','splenic','other']})

is_key = (df.label == "splenic") | (df.label == "hepatic")

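Now we can extract the positions of those occurrences and treat each even-positioned entry of id_key (positions 0, 2, ...) as a start and each odd-positioned entry as an end: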

id_key = np.where(is_key)[0]
start_frame_id, end_frame_id = id_key.reshape(-1, 2).T

The correct start and end frames are then:

# id_key holds positions (from np.where), so index positionally with iloc
start_frame = df["frame"].iloc[start_frame_id]
end_frame = df["frame"].iloc[end_frame_id]

The result is:

>>> start_frame
1    2
8    9
Name: frame, dtype: int64

>>> end_frame
4      5
10    11
Name: frame, dtype: int64
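
The same steps can be wrapped in a function and applied to df2, where hepatic comes first. This is a sketch under a few assumptions: the function name `start_end_frames` is hypothetical, the `'heaptic'` entry in the question's df2 is taken to be a typo for `'hepatic'`, and an odd number of target labels is treated as an error. The string index is added only to show that the lookup is positional and does not rely on a default integer index:

```python
import numpy as np
import pandas as pd


def start_end_frames(df, targets=("splenic", "hepatic")):
    # Hypothetical helper: positions of rows whose label is one of the targets
    id_key = np.flatnonzero(df["label"].isin(targets).to_numpy())
    if len(id_key) % 2:
        raise ValueError("odd number of target labels; cannot pair start/end")
    # Consecutive pairs of positions: first column is start, second is end
    start_pos, end_pos = id_key.reshape(-1, 2).T
    frames = df["frame"].to_numpy()
    labels = df["label"].to_numpy()
    return frames[start_pos].tolist(), frames[end_pos].tolist(), labels[start_pos].tolist()


df2 = pd.DataFrame({'frame': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
                    'label': ['other', 'hepatic', 'other', 'other', 'splenic', 'other',
                              'other', 'other', 'splenic', 'other', 'hepatic', 'other']},
                   index=list("abcdefghijkl"))

start_frame, end_frame, start_labels = start_end_frames(df2)
print(start_frame)   # [2, 9]
print(end_frame)     # [5, 11]
print(start_labels)  # ['hepatic', 'splenic']
```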
