Change columns in a DataFrame relative to values in other columns

Posted 2024-09-30 10:31:09


I have a dataframe like this:

Head  CHR  Start     End       Trans  Num
   A    1  29554   30039  ENST473358    1
   A    1  30564   30667  ENST473358    2
   A    1  30976   31097  ENST473358    3
   B    1  36091   35267  ENST417324    1
   B    1  35491   34544  ENST417324    2
   B    1  35184   35711  ENST417324    3
   B    1  36083   35235  ENST461467    1
   B    1  35491  120765  ENST461467    2

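I need to change the Start and End columns relative to the Trans and Num columns. The Trans column contains repeated values, and Num numbers the rows within each transcript (1, 2, 3, ...). For every row I want Start to become that row's End + 10, and End to become the Start of the next row with the same Trans - 10. The last row of each transcript has no next row, so its End should be NA. The output I want is: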

Head  CHR   Start    End       Trans  Num
   A    1   30049  30554  ENST473358    1
   A    1   30677  30966  ENST473358    2
   A    1   31107     NA  ENST473358    3
   B    1   35277  35481  ENST417324    1
   B    1   34554  35174  ENST417324    2
   B    1   35721     NA  ENST417324    3
   B    1   35245  35481  ENST461467    1
   B    1  120775     NA  ENST461467    2

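Any help is much appreciated. I can do it without taking Trans into account using the following script, but that does not give me the output I want: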

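# This ignores Trans: shift(-1) runs over the whole frame, not per transcript
start = df['Start'].copy()             # keep the original Start values
df['Start'] = df.End + 10              # new Start = End + 10
df['End'] = start.shift(-1) - 10       # new End = next row's Start - 10
df.iloc[-1, df.columns.get_loc('Start')] = ''   # blank the last row of the whole frame
df.iloc[-1, df.columns.get_loc('End')] = ''
print(df)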
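The script above treats the whole frame as one sequence, so `shift(-1)` pulls in the next row's Start even when that row belongs to a different transcript. A minimal sketch of the difference, assuming the rows are already in Num order within each Trans and rebuilding only the Start, End and Trans columns by hand: `groupby("Trans")` restricts the shift to rows of the same transcript, so the last row of each Trans gets NaN instead of a value from the next transcript. The names `plain_next`, `group_next` and `result` are only illustrative.

```python
import pandas as pd

# The Start, End and Trans columns of the question's frame, rows in Num order per Trans
df = pd.DataFrame({
    "Start": [29554, 30564, 30976, 36091, 35491, 35184, 36083, 35491],
    "End":   [30039, 30667, 31097, 35267, 34544, 35711, 35235, 120765],
    "Trans": ["ENST473358"] * 3 + ["ENST417324"] * 3 + ["ENST461467"] * 2,
})

# Plain shift: the last row of one transcript picks up the first Start of the next
plain_next = df["Start"].shift(-1)

# Grouped shift: each transcript is shifted separately, so its last row gets NaN
group_next = df.groupby("Trans")["Start"].shift(-1)

print(pd.DataFrame({"Trans": df["Trans"], "plain": plain_next, "grouped": group_next}))

# The rule from the question with the grouped shift: Start = End + 10, End = next Start - 10
# (both right-hand sides are computed from the original columns before assign replaces them)
result = df.assign(Start=df["End"] + 10, End=group_next - 10)
print(result)
```

Because `assign` evaluates both new columns from the unchanged frame, no temporary copy of Start is needed here.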

2 answers

You may want to consider re-indexing your data, depending on how you want to use it.

You can index the data on the "Trans" and "Num" columns like this:

#Change how we index the frame
df.set_index(["Trans", "Num"], inplace=True)
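With ("Trans", "Num") as the index, a single Trans label selects that transcript's rows as a sub-frame indexed by Num, while a full (Trans, Num) tuple selects exactly one row. A minimal sketch on a small two-transcript frame (values taken from the question; the frame itself is just for illustration):

```python
import pandas as pd

# Small frame with two transcripts (values from the question)
df = pd.DataFrame({
    "Start": [29554, 30564, 36083, 35491],
    "End":   [30039, 30667, 35235, 120765],
    "Trans": ["ENST473358", "ENST473358", "ENST461467", "ENST461467"],
    "Num":   [1, 2, 1, 2],
})
df = df.set_index(["Trans", "Num"])

# A single level-0 label returns the sub-frame for that transcript, indexed by Num
sub = df.loc["ENST461467"]
print(sub)

# A full (Trans, Num) tuple addresses exactly one row
print(df.loc[("ENST461467", 2), "End"])
```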

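Next, we get each unique transcript so that we can process all of them. (I'm fairly sure this part and the loop below could be done in one batch, but I just wrote it quickly. If efficiency matters, look into how to avoid looping over every index.)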

#Get only unique indexes
unique_trans = list(set(df.index.get_level_values('Trans')))
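The batch version hinted at above could look like the following minimal sketch. It swaps the per-transcript loop for a groupby shift (this is not the answerer's code) and assumes the ("Trans", "Num") index set above, with rows already in Num order within each transcript. `groupby(level="Trans")` followed by `shift(-1)` finds the next Start within the same transcript for all rows at once, and the last row of each transcript gets NaN automatically, so no `max_num` lookup is needed. The name `next_start` is illustrative.

```python
import pandas as pd

# Question data, indexed by (Trans, Num) as in the answer above
df = pd.DataFrame({
    "Head":  ["A", "A", "A", "B", "B", "B", "B", "B"],
    "Chr":   [1] * 8,
    "Start": [29554, 30564, 30976, 36091, 35491, 35184, 36083, 35491],
    "End":   [30039, 30667, 31097, 35267, 34544, 35711, 35235, 120765],
    "Trans": ["ENST473358"] * 3 + ["ENST417324"] * 3 + ["ENST461467"] * 2,
    "Num":   [1, 2, 3, 1, 2, 3, 1, 2],
}).set_index(["Trans", "Num"])

# Next row's Start within the same transcript, for all transcripts at once;
# the last Num of each Trans has no next row, so it gets NaN
next_start = df.groupby(level="Trans")["Start"].shift(-1)

# Both new columns are computed from the original values before either is replaced
df = df.assign(Start=df["End"] + 10, End=next_start - 10)
print(df)
```

The grouped shift keeps the original row order, so the result lines up with the loop's output row for row.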

Then we can iterate and apply the transformation you want:

# Access each transcript
for trans in unique_trans:

    # Sub-frame for this transcript, indexed by "Num"
    # (.ix was removed in pandas 1.0, so .loc is used instead)
    sub = df.loc[trans]

    # Get the highest "Num" for this transcript so we know which row to set to NaN
    max_num = sub.index.max()

    # Copy your Start column as a temporary variable
    start = sub["Start"].copy()

    # Apply the transform to the Start column (equal to End + 10)
    df.loc[trans, "Start"] = (sub["End"] + 10).to_numpy()

    # Apply the transform to the End column (next row's Start - 10)
    df.loc[trans, "End"] = (start.shift(-1) - 10).to_numpy()

    # Passing a tuple as the row index selects the element that is both in trans and
    # has the max number, which is the one you want to set to NaN
    # (shift(-1) already leaves NaN there; this makes it explicit)
    df.loc[(trans, max_num), "End"] = np.nan

print(df)

Running it on your data gives:

                Head  Chr     Start      End
Trans      Num                             
ENST473358 1      A    1   30049.0  30554.0
           2      A    1   30677.0  30966.0
           3      A    1   31107.0      NaN
ENST417324 1      B    1   35277.0  35481.0
           2      B    1   34554.0  35174.0
           3      B    1   35721.0      NaN
ENST461467 1      B    1   35245.0  35481.0
           2      B    1  120775.0      NaN

The full code I used to generate the test case is:

import pandas as pd
import numpy as np

# Set up your dataframe
df = pd.DataFrame({
    "Head": ["A", "A", "A", "B", "B", "B", "B", "B"],
    "Chr": [1] * 8,
    "Start": [29554, 30564, 30976, 36091, 35491, 35184, 36083, 35491],
    "End": [30039, 30667, 31097, 35267, 34544, 35711, 35235, 120765],
    "Trans": ["ENST473358", "ENST473358", "ENST473358",
              "ENST417324", "ENST417324", "ENST417324",
              "ENST461467", "ENST461467"],
    "Num": [1, 2, 3, 1, 2, 3, 1, 2],
})

# Change how we index the frame
df.set_index(["Trans", "Num"], inplace=True)

# Get only unique indexes
unique_trans = list(set(df.index.get_level_values("Trans")))

# Access each index (.loc replaces the removed .ix)
for trans in unique_trans:
    sub = df.loc[trans]
    max_num = sub.index.max()

    start = sub["Start"].copy()
    df.loc[trans, "Start"] = (sub["End"] + 10).to_numpy()
    df.loc[trans, "End"] = (start.shift(-1) - 10).to_numpy()
    df.loc[(trans, max_num), "End"] = np.nan

print(df)

You can put your existing code into a function, then group by Trans and apply the function to each group:

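def func(g):
    # Work on a copy so the group passed in by groupby is not modified in place
    g = g.copy()
    start = g['Start'].copy()            # keep the original Start values
    g['Start'] = g['End'] + 10           # new Start = End + 10
    # New End = next Start in the same Trans - 10. shift(-1) leaves NaN in the
    # last row of each group, so the last Start is kept and its End is NaN
    g['End'] = start.shift(-1) - 10
    return g

# group_keys=False keeps the original row index instead of prefixing it with Trans
result = df.groupby('Trans', group_keys=False).apply(func)
print(result)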

Result:

  Head  CHR   Start      End       Trans  Num
0    A    1   30049  30554.0  ENST473358    1
1    A    1   30677  30966.0  ENST473358    2
2    A    1   31107      NaN  ENST473358    3
3    B    1   35277  35481.0  ENST417324    1
4    B    1   34554  35174.0  ENST417324    2
5    B    1   35721      NaN  ENST417324    3
6    B    1   35245  35481.0  ENST461467    1
7    B    1  120775      NaN  ENST461467    2
