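How do I split a string into characters and replace each character with a float value, to get a total score for the original string in Python?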

def hydrophobicity_score(peptide):
    hydro = {
        'A': -0.5,
        'C': -1.0,
        'D': 3.0,
        'E': 3.0,
        'F': -2.5,
        'G': 0.0,
        'H': -0.5,
        'I': -1.8,
        'K': 3.0,
        'L': -1.8,
        'M': -1.3,
        'N': 0.2,
        'P': 0.0,
        'Q': 0.2,
        'R': 3.0,
        'S': 0.3,
        'T': -0.4,
        'V': -1.5,
        'W': -3.4,
        'Y': -2.3,
    }
    hydro_score = [hydro.get(aa, 0.0) for aa in peptide]
    return sum(hydro_score)

og_pep['Hydro'] = og_pep['Peptide'].apply(hydrophobicity_score)
og_pep

2 answers

User

#1 · Edited 2024-09-24 22:27:31

def hydrophobicity_score(peptide):
    hydro = {
        'A': -0.5,
        'C': -1.0,
        'D': 3.0,
        'E': 3.0,
        'F': -2.5,
        'G': 0.0,
        'H': -0.5,
        'I': -1.8,
        'K': 3.0,
        'L': -1.8,
        'M': -1.3,
        'N': 0.2,
        'P': 0.0,
        'Q': 0.2,
        'R': 3.0,
        'S': 0.3,
        'T': -0.4,
        'V': -1.5,
        'W': -3.4,
        'Y': -2.3,
    }
    hydro_score = [hydro[aa] for aa in peptide]
    return sum(hydro_score)

og_peptide= og_pep['Peptide']
og_peptide = og_peptide.str.replace('\W+','')
og_peptide = og_peptide.str.replace('\d+','')
og_peptide = pd.DataFrame(og_peptide)
og_peptide['Hydro_Score'] = og_peptide.apply(hydrophobicity_score)
og_peptide

I am not getting the expected output.

Output

Here is og_pep DataFrame

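User

#2 · Edited 2024-09-24 22:27:31

Okay, first off, you don't want to loop over the rows in a dataframe. They are designed to be processed in parallel. It takes some getting used to, but once you have defined a few row-level operations and applied them to large dataframes, it gets smoother. (The problem with looping over rows is speed. A loop is sometimes useful for debugging or toy problems, but modern computing hardware tries to parallelize computation as much as possible. Dataframes take advantage of this to process all of the rows at once, instead of handling them one by one in a loop.)

To do your transformation, define a custom function that operates on a single row's value. Then pass that function to the dataframe and tell it to apply the row-level function to a column, which produces a new column.

So, here is a possible function to get you started: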

def peptide_score(peptide_string):
    '''Returns a numerical score given a sequence of peptide characters.'''
    # Replace the values in this dict (dictionary / map) with whatever values you need
    amino_acid_scores = { 
        'A': 0.1,
        'C': 1.4,
        'G': 0.32342,
        'T': -0.23,
        'U': 74.22
    }
    # This is called a "list comprehension." It's great for transforming sequences.
    score_list = [amino_acid_scores[character] for character in peptide_string]
    return sum(score_list)

# I'm assuming your pre-existing dataframe is called "gluc_dataframe" and that the
# column with your strings is called "Peptide".  Output scores will be stored in a new
# column, "score". Replace those names with whatever fits.
gluc_dataframe['score'] = gluc_dataframe['Peptide'].apply(peptide_score)
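
To see the pattern above run end to end, here is a minimal sketch on a tiny dataframe. The name `demo`, the three peptides, and the three scores are made up purely for illustration; they are not taken from the question's data.

```python
import pandas as pd

# Hypothetical scores for three letters only (illustrative values).
amino_acid_scores = {'A': -0.5, 'C': -1.0, 'G': 0.0}

def peptide_score(peptide_string):
    '''Sum the per-character scores of one peptide string.'''
    return sum(amino_acid_scores[character] for character in peptide_string)

# Hypothetical stand-in for the real dataframe.
demo = pd.DataFrame({'Peptide': ['AC', 'GGA', 'CCC']})

# Calling apply on the single column (a Series) passes one string at a time.
demo['score'] = demo['Peptide'].apply(peptide_score)
print(demo)
```

Note that `apply` is called on the column `demo['Peptide']`, not on the whole dataframe. Calling `apply` on a DataFrame hands the function an entire column at a time by default (`axis=0`), so the function would no longer receive individual peptide strings.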

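If you have a lot of characters you want to ignore (spaces, punctuation, and so on), you can replace amino_acid_scores[character] in the list comprehension with amino_acid_scores.get(character, 0.0).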
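
The difference between the strict lookup and the `.get` variant can be sketched in plain Python. The function names `strict_score` and `lenient_score`, the sample string, and the score values are assumptions chosen for illustration.

```python
# Hypothetical scores for three letters only (illustrative values).
amino_acid_scores = {'A': -0.5, 'C': -1.0, 'G': 0.0}

def strict_score(peptide_string):
    '''Raises KeyError on any character missing from the dict.'''
    return sum(amino_acid_scores[character] for character in peptide_string)

def lenient_score(peptide_string):
    '''Treats any character missing from the dict as contributing 0.0.'''
    return sum(amino_acid_scores.get(character, 0.0) for character in peptide_string)

messy = 'A-C 1'  # contains a hyphen, a space, and a digit

print(lenient_score(messy))  # unknown characters are counted as 0.0

try:
    strict_score(messy)
except KeyError as missing:
    print('strict lookup failed on character:', missing)
```

The lenient version is convenient, but it also hides typos in the input silently, so the strict version may be the safer choice when every character is expected to be a valid amino acid code.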
