Starting with two lists, for example:
lstOne = [ '1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
lstTwo = [ '1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
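I want to have the user input how many items they want to extract, as a percentage of the overall list length, and the same indices from each list to be randomly extracted. For example, say I want 50%; the output would be something like: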
newLstOne = ['8', '1', '3', '7', '5']
newLstTwo = ['8', '1', '3', '7', '5']
I have achieved this using the following code:
from random import randrange

lstOne = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
lstTwo = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10']

LengthOfList = len(lstOne)
print(LengthOfList)

PercentageToUse = input("What Percentage Of Reads Do you want to extract? ")

# Number of indices to draw, as a fraction of the list length
HowManyIndicesToMake = (float(PercentageToUse) / 100) * float(LengthOfList)
print(HowManyIndicesToMake)

# Draw random indices; randrange can return the same index more than once
RangeOfListIndices = []
for x in lstOne:
    if len(RangeOfListIndices) == int(HowManyIndicesToMake):
        break
    else:
        random_index = randrange(0, LengthOfList)
        RangeOfListIndices.append(random_index)
print(RangeOfListIndices)

# Pull the same positions out of both lists
newlstOne = []
newlstTwo = []
for x in RangeOfListIndices:
    newlstOne.append(lstOne[int(x)])
for x in RangeOfListIndices:
    newlstTwo.append(lstTwo[int(x)])
print(newlstOne)
print(newlstTwo)
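But I was wondering whether there is a more efficient way of achieving this; in my actual use case this is subsampling from 145,000 items. Furthermore, is randrange sufficiently free of bias at this scale?
Thank you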
Just zip the two lists together, use random.sample to do the sampling, then zip again to transpose the result back into two lists. Demo:
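The demo from this answer did not survive in the text, so here is a minimal sketch of the approach it describes. The variable names and the hard-coded `percentage` (standing in for the user's input) are assumptions for illustration, not the answerer's original code.

```python
import random

lst_one = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
lst_two = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
percentage = 50  # assumed value; in practice this comes from the user

# Number of pairs to keep
k = int(len(lst_one) * percentage / 100)

# Pair up the items so that matching positions travel together,
# then sample k pairs without replacement
pairs = random.sample(list(zip(lst_one, lst_two)), k)

# Transpose the sampled pairs back into two lists
new_one, new_two = (list(column) for column in zip(*pairs))

print(new_one)
print(new_two)
```

Note that the final unpacking fails when `k` is 0, because `zip(*pairs)` then yields nothing; that case would need to be handled separately.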
Your way of doing it looks fine to me.
If you want to avoid sampling the same object more than once, you can do the following:
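The code for this answer is missing from the text. One way to avoid repeats, sketched here as an assumption rather than as the answerer's own code, is to shuffle the list of indices and keep the first `k` of them; `percentage` again stands in for the user's input.

```python
import random

lst_one = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
lst_two = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
percentage = 50  # assumed value; in practice this comes from the user

k = int(len(lst_one) * percentage / 100)

# Shuffle every index once; the first k entries are then k distinct positions
indices = list(range(len(lst_one)))
random.shuffle(indices)
chosen = indices[:k]

# Apply the same positions to both lists
new_one = [lst_one[i] for i in chosen]
new_two = [lst_two[i] for i in chosen]

print(chosen)
print(new_one)
print(new_two)
```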
Q.
I want to have the user input how many items they want to extract, as a percentage of the overall list length, and the same indices from each list to be randomly extracted.
A. The most straightforward approach maps directly onto your specification:
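The code that followed this sentence is not present in the text. A sketch that follows the specification step by step might look like this; the function name `subsample_pair` and its argument checks are hypothetical additions for illustration.

```python
import random


def subsample_pair(first, second, percentage):
    """Return the same randomly chosen positions from both lists.

    Hypothetical helper: `percentage` is the share of items to keep (0 to 100).
    """
    if len(first) != len(second):
        raise ValueError("both lists must have the same length")
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must be between 0 and 100")

    # How many items to extract, as a percentage of the list length
    k = int(len(first) * percentage / 100)

    # Choose k distinct indices; random.sample never repeats a position
    indices = random.sample(range(len(first)), k)

    # Extract the same indices from each list
    return [first[i] for i in indices], [second[i] for i in indices]


list_one = list(range(145000))
list_two = [value * 2 for value in list_one]
new_one, new_two = subsample_pair(list_one, list_two, 50)
print(len(new_one), len(new_two))
```

Sampling from `range(len(first))` avoids building a separate list of indices, which keeps the work proportional to the number of items extracted.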
Q.
in my actual use case this is subsampling from 145,000 items. Furthermore, is randrange sufficiently free of bias at this scale?
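A. In both Python 2 and Python 3, the random.randrange() function completely eliminates bias (it uses an internal method that makes multiple random choices until a bias-free result is found).
In Python 2, the random.sample() function is slightly biased, but only in the round-off in the last of 53 bits. In Python 3, the random.sample() function uses the same internal method and is free of bias.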
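The idea of drawing repeatedly until an unbiased result appears can be illustrated with rejection sampling. This is a sketch of the general technique, not a copy of the standard library's source; the function name `unbiased_below` is made up for the example.

```python
import random


def unbiased_below(n):
    """Return a uniformly distributed integer in [0, n) by rejection sampling."""
    if n <= 0:
        raise ValueError("n must be positive")

    # Smallest number of bits that can represent n
    bits = n.bit_length()

    # Every value below 2**bits is equally likely; discarding the ones that
    # are >= n leaves each remaining value in [0, n) equally likely
    candidate = random.getrandbits(bits)
    while candidate >= n:
        candidate = random.getrandbits(bits)
    return candidate


draws = [unbiased_below(145000) for _ in range(1000)]
print(min(draws), max(draws))
```

Mapping a random float onto the range with a multiplication and truncation would instead leave some values very slightly more likely than others, which is the round-off bias mentioned above.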