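<p>Context:</p>
<p>My DataFrame has the following columns: HapID, Marker, Start_position, End_position.
For each HapID, I want to get:
- the marker with the smallest start position (call it leftMarker)
- the marker with the largest end position (call it rightMarker)
- the interval, i.e. the difference (largest end position - smallest start position)</p>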
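<p>My problem is how to retrieve the marker names now that I know their indices.
I get the error below and cannot figure out how to fix it, even though I have spent several hours on it.</p>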
<p>This is the error message:</p>
<blockquote>
<p>AttributeError: Cannot access callable attribute 'iloc' of 'SeriesGroupBy' objects, try using the 'apply' method</p>
</blockquote>
<p>Here is the data:</p>
<pre><code>HapID Marker Start_position End_position
hap_1 mk1 1107207 1107256
hap_1 mk2 1104711 1104760
hap_1 mk3 1106845 1106894
hap_2 mk4 11901413 11901462
hap_2 mk5 206031250 206031299
hap_2 mk6 11498893 11498942
hap_2 mk7 17236023 17236072
hap_2 mk8 11692209 11692258
hap_2 mk9 11691512 11691561
hap_2 mk10 11615664 11615713
</code></pre>
<p>This is the expected output:</p>
^{pr2}$
<p>Code:</p>
<pre><code>import pandas as pd
data = {
'HapID':['hap_1','hap_1','hap_1','hap_2','hap_2','hap_2','hap_2','hap_2','hap_2','hap_2'],
'Marker':['mk1','mk2','mk3','mk4','mk5','mk6','mk7','mk8','mk9','mk10'],
'Start_position':[1107207,1104711,1106845,11901413,206031250,11498893,17236023,11692209,11691512,11615664],
'End_position':[1107256,1104760,1106894,11901462,206031299,11498942,17236072,11692258,11691561,11615713]}
df = pd.DataFrame(data)
# Group the rows by haplotype
haplotypes = df.groupby('HapID')
posi_1 = haplotypes.Start_position.min()  # smallest start position per HapID
posi_2 = haplotypes.End_position.max()    # largest end position per HapID
diff_posi = posi_2 - posi_1               # interval per HapID
a = haplotypes.Start_position.idxmin()  # row label of the minimum Start_position
b = haplotypes.End_position.idxmax()    # row label of the maximum End_position
# print('{} {} {}'.format(posi_1, posi_2, diff_posi))
# print('{} {}'.format(a, b))  # just to see whether I am getting the index
</code></pre>
<p>Now, the question is how to retrieve the marker at those positions for each haplotype:</p>
<pre><code>leftMarker = haplotypes.Marker.iloc(a)
rightMarker = haplotypes.Marker.iloc(b)
</code></pre>
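<p>For reference, here is a minimal sketch of one way the lookup might be done. It assumes the values returned by <code>idxmin()</code> and <code>idxmax()</code> are row labels of the original <code>df</code>, so they can be passed to <code>df.loc</code> rather than to <code>iloc</code> on the grouped object. The names <code>grouped</code>, <code>left_idx</code>, <code>right_idx</code> and <code>summary</code> are illustrative and do not appear in the code above.</p>

```python
import pandas as pd

# Same data as in the question
df = pd.DataFrame({
    'HapID': ['hap_1', 'hap_1', 'hap_1', 'hap_2', 'hap_2', 'hap_2', 'hap_2', 'hap_2', 'hap_2', 'hap_2'],
    'Marker': ['mk1', 'mk2', 'mk3', 'mk4', 'mk5', 'mk6', 'mk7', 'mk8', 'mk9', 'mk10'],
    'Start_position': [1107207, 1104711, 1106845, 11901413, 206031250,
                       11498893, 17236023, 11692209, 11691512, 11615664],
    'End_position': [1107256, 1104760, 1106894, 11901462, 206031299,
                     11498942, 17236072, 11692258, 11691561, 11615713],
})

grouped = df.groupby('HapID')

# Row labels (in df) of the extreme positions, one per HapID
left_idx = grouped['Start_position'].idxmin()
right_idx = grouped['End_position'].idxmax()

# Look the markers up in the original frame by label, not on the groupby object.
# to_numpy() drops the row labels so the values line up with the HapID index.
summary = pd.DataFrame({
    'leftMarker': df.loc[left_idx, 'Marker'].to_numpy(),
    'rightMarker': df.loc[right_idx, 'Marker'].to_numpy(),
    'interval': (grouped['End_position'].max()
                 - grouped['Start_position'].min()).to_numpy(),
}, index=left_idx.index)

print(summary)
```

<p>Indexing with <code>df.loc</code> is used here because the frame has the default integer index, where labels and positions coincide; with a different index, <code>loc</code> would still match what <code>idxmin()</code> and <code>idxmax()</code> return, whereas <code>iloc</code> might not.</p>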