Answers to: Changing a DataFrame column relative to the values in another column

Anonymous, 1 day ago


You may want to re-index the data according to how you intend to use it. You can index the frame on the "Trans" and "Num" columns like this:

<pre><code># Change how we index the frame
df.set_index(["Trans", "Num"], inplace=True)
</code></pre>

Next, collect each unique "Trans" value so that every group can be updated in turn. (I'm fairly sure this step and the iteration below could be done in bulk, but I wrote this quickly. If performance becomes a problem, look into how to avoid looping over all the index values.)

<pre><code># Get only unique indexes
unique_trans = list(set(df.index.get_level_values('Trans')))
</code></pre>

Then we can iterate and apply the transformation you want:

<pre><code># Cast to float so that "End" can hold NaN
df = df.astype({"Start": float, "End": float})

# Access each index
for trans in unique_trans:
    # Get the highest number in "Num" for each so we know which to set to NaN
    max_num = max(df.loc[trans].index.values)
    # Copy your start column as a temp variable
    start = df.loc[trans]["Start"].copy()
    # Apply the transform to the start column (equal to end + 10)
    df.loc[trans, "Start"] = np.array(df.loc[trans]["End"]) + 10
    # Apply the transform to the end column (next row's original start - 10)
    df.loc[trans, "End"] = np.array(start.shift(-1) - 10)
    # By passing a tuple as a row index, we get the element that is both in
    # trans and the max number, which is the one you want to set to NaN
    df.loc[(trans, max_num), "End"] = np.nan

print(df)
</code></pre>

Running this on your data gives:

<pre><code>               Head  Chr     Start      End
Trans      Num
ENST473358 1      A    1   30049.0  30554.0
           2      A    1   30677.0  30966.0
           3      A    1   31107.0      NaN
ENST417324 1      B    1   35277.0  35481.0
           2      B    1   34554.0  35174.0
           3      B    1   35721.0      NaN
ENST461467 1      B    1   35245.0  35481.0
           2      B    1  120775.0      NaN
</code></pre>

The complete code I used to build the test case:

<pre><code>import pandas as pd
import numpy as np

# Setup your dataframe
df = pd.DataFrame(columns=["Head", "Chr", "Start", "End", "Trans", "Num"])
df["Head"] = ["A", "A", "A", "B", "B", "B", "B", "B"]
df["Chr"] = [1]*8
df["Start"] = [29554, 30564, 30976, 36091, 35491, 35184, 36083, 35491]
df["End"] = [30039, 30667, 31097, 35267, 34544, 35711, 35235, 120765]
df["Trans"] = ["ENST473358", "ENST473358", "ENST473358", "ENST417324",
               "ENST417324", "ENST417324", "ENST461467", "ENST461467"]
df["Num"] = [1, 2, 3, 1, 2, 3, 1, 2]

# Change how we index the frame
df.set_index(["Trans", "Num"], inplace=True)

# Cast to float so that "End" can hold NaN
df = df.astype({"Start": float, "End": float})

# Get only unique indexes
unique_trans = list(set(df.index.get_level_values('Trans')))

# Access each index
for trans in unique_trans:
    max_num = max(df.loc[trans].index.values)
    start = df.loc[trans]["Start"].copy()
    df.loc[trans, "Start"] = np.array(df.loc[trans]["End"]) + 10
    df.loc[trans, "End"] = np.array(start.shift(-1) - 10)
    df.loc[(trans, max_num), "End"] = np.nan

print(df)
</code></pre>
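The answer suspects that the per-transcript loop could be done in bulk. One possible way to do that is sketched below using `groupby` with `shift`. This is not the answerer's code. It assumes the rows of each "Trans" group are already ordered by "Num", so that the last row of a group is the one with the highest "Num"; the names `next_start` and `result` are introduced here only for illustration.

```python
import numpy as np
import pandas as pd

# Same test data as in the answer
df = pd.DataFrame({
    "Head": ["A", "A", "A", "B", "B", "B", "B", "B"],
    "Chr": [1] * 8,
    "Start": [29554, 30564, 30976, 36091, 35491, 35184, 36083, 35491],
    "End": [30039, 30667, 31097, 35267, 34544, 35711, 35235, 120765],
    "Trans": ["ENST473358", "ENST473358", "ENST473358", "ENST417324",
              "ENST417324", "ENST417324", "ENST461467", "ENST461467"],
    "Num": [1, 2, 3, 1, 2, 3, 1, 2],
})

# Original "Start" of the next row within the same transcript.
# The last row of each group has no successor, so it becomes NaN.
next_start = df.groupby("Trans", sort=False)["Start"].shift(-1)

result = df.copy()
result["Start"] = df["End"] + 10     # new start = old end + 10
result["End"] = next_start - 10      # new end = next old start - 10 (NaN on last row)
result = result.set_index(["Trans", "Num"])

print(result)
```

Because `shift(-1)` already leaves NaN on the last row of each group, no separate step is needed to blank the final "End" value. If the rows are not ordered by "Num" within each transcript, sorting with `df.sort_values(["Trans", "Num"])` first would be needed, at the cost of changing the row order.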
