Random resampling without overlap

1. Original data (O)
   a. Randomly resampled dataset 1 (RD1)
   b. Randomly resampled dataset 2 (RD2)
   c. Randomly resampled dataset 3 (RD3)
   d. Randomly resampled dataset 4 (RD4)
   e. Randomly resampled dataset 5 (RD5)
2. Remove RD from O
   a. O - RD1 = New dataset 1
   b. O - RD2 = New dataset 2
   c. O - RD3 = New dataset 3
   d. O - RD4 = New dataset 4
   e. O - RD5 = New dataset 5

2 answers

User

Answer 1 · edited 2024-06-17 10:24:42

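If you don't want to alter the original data, you can shuffle an array of indices into the array that holds the lines, do whatever you want with the first 5 groups of 300 lines, and then delete them from what is left.

For example, using 30 lines of input (the numbers 1->30) instead of 3000: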

$ cat tst.awk
function shuf(array,    i, j, t) {
    # Shuffles an array indexed by numbers from 1 to its length
    # Copied from https://www.rosettacode.org/wiki/Knuth_shuffle#AWK
    for (i=length(array); i > 1; i--) {
        # j = random integer from 1 to i
        j = int(i * rand()) + 1

        # swap array[i], array[j]
        t = array[i]
        array[i] = array[j]
        array[j] = t
    }
}

{ arr[NR] = $0 }

END {
    srand()
    shuf(arr)
    numBlocks = 5
    pct10 = length(arr) * 0.1
    for (i=1; i<=numBlocks; i++) {
        print "   - Block", i
        for (j=1; j<=pct10; j++) {
            print ++c, arr[c]
            delete arr[c]
        }
    }
    print "\n   - Remaining"
    for (i in arr) {
        print i, arr[i]
    }
}


A sample run (the output is random, so it changes from run to run):

$ seq 30 | awk -f tst.awk
   - Block 1
1 17
2 15
3 22
   - Block 2
4 19
5 1
6 13
   - Block 3
7 7
8 10
9 28
   - Block 4
10 5
11 2
12 8
   - Block 5
13 16
14 11
15 30

   - Remaining
16 14
17 18
18 26
19 4
20 29
21 12
22 21
23 27
24 3
25 24
26 6
27 9
28 23
29 20
30 25
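
The script above prints the five blocks plus whatever remains once all of them are removed. If each "New dataset" should instead be the original minus just one block (step 2 of the question), a variation along the following lines might work. It is only a sketch: it shuffles an array of line numbers rather than the lines themselves, so the original order is kept in every output file. The file names rd1.txt to rd5.txt and new1.txt to new5.txt, the `numBlocks` and `frac` variables, and the temporary output directory are assumptions made for illustration and do not come from the answer.

```shell
#!/bin/sh
# Sketch: write each sample (rdN.txt) and its complement (newN.txt).
# File names and the temporary directory are illustrative assumptions.
outdir=$(mktemp -d)
seq 30 > "$outdir/original.txt"    # stand-in for the real 3000-line input

awk -v numBlocks=5 -v frac=0.1 -v outdir="$outdir" '
# Knuth shuffle of array[1..n], as in tst.awk but with n passed explicitly
function shuf(array, n,    i, j, t) {
    for (i = n; i > 1; i--) {
        j = int(i * rand()) + 1
        t = array[i]; array[i] = array[j]; array[j] = t
    }
}

{ line[NR] = $0; idx[NR] = NR }     # keep the lines and a list of line numbers

END {
    srand()
    shuf(idx, NR)                   # shuffle the line numbers, not the lines
    size = int(NR * frac)           # lines per block (10% of the input)

    # The first numBlocks*size shuffled line numbers define the blocks,
    # so no line can land in two blocks.
    for (b = 1; b <= numBlocks; b++)
        for (k = 1; k <= size; k++)
            block[idx[++c]] = b

    # Each line goes to rdN.txt for its own block and to newN.txt otherwise
    for (r = 1; r <= NR; r++)
        for (b = 1; b <= numBlocks; b++) {
            file = outdir "/" (block[r] == b ? "rd" : "new") b ".txt"
            print line[r] > file
        }
}' "$outdir/original.txt"

echo "output written to $outdir"
```

With the 30-line input, each rdN.txt should hold 3 lines and each newN.txt the other 27. Passing the array length to `shuf` explicitly avoids relying on `length(array)` for a function parameter, which some awk implementations may not support.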

User

Answer 2 · edited 2024-06-17 10:24:42

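# Reproducible data
data <- mtcars
n <- nrow(data)
K <- 5
# Get indices for splitting: 0 = not sampled, 1..K = sample number
ind <- integer(n)
# 10% of the rows per sample, rounded down to a whole number
new <- rep(1:K, each = floor(0.1 * n))
# Assign the labels to distinct random rows, so samples cannot overlap
ind[sample(n, size = length(new))] <- new
# Split data: group "0" holds the unsampled rows, "1" to "5" hold the samples
split(data, ind)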
