Divide my data into a grid and pick one point in each box

Posted 2024-06-26 00:26:56


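I'm trying to run some calculations on a large dataset, and I want to reduce the number of points I use without losing the larger geometry. The idea is to divide the whole dataset into a 10 x 10 grid and pass only one point from each box (preferably as close to the center of the box as possible) on to my other code.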

Suppose I have a random population like this:

x_rand = np.random.uniform(low=-20, high=20, size=(1000))
y_rand = np.random.uniform(low=-20, high=20, size=(1000))

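I'd like the result to look something like this (done quickly in Paint, so it isn't very precise; the red dots are the points the code would pick): https://i.imgur.com/s8j04uB.png

I don't know whether I should use np.split, a boolean mask, or a matrix.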


2 Answers

Thanks a bunch, sbaby171! I was able to generalize your answer into a function:

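import numpy as np

def range_find(lowx, highx, lowy, highy, xs, ys):
    # Collect the x and y values that fall strictly inside the box limits.
    # Note: the nested loops test every xs[i] against every ys[j], not only the
    # paired points (xs[i], ys[i]), so this costs len(xs) * len(ys) checks.
    _xs = []
    _ys = []
    i = 0
    while i < len(xs):
        j = 0
        while j < len(ys):
            if (lowx < xs[i] < highx) and (lowy < ys[j] < highy):
                _xs.append(xs[i])
                _ys.append(ys[j])
            j += 1
        i += 1
    # Mean of the collected values (NaN if nothing fell inside the limits)
    xmean = np.mean(np.asarray(_xs))
    ymean = np.mean(np.asarray(_ys))
    return xmean, ymean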

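def make_grid(x, y, n):
    # Sort each coordinate and cut it into n chunks; the min and max of each
    # chunk become the lower and upper limits of one column/row of boxes.
    xsort = np.sort(x)
    ysort = np.sort(y)
    # np.array_split also accepts lengths that are not divisible by n
    # (np.split raises an error in that case)
    splitx = np.array_split(xsort, n)
    splity = np.array_split(ysort, n)

    lowx = np.zeros(n)
    lowy = np.zeros(n)
    highx = np.zeros(n)
    highy = np.zeros(n)

    for k in range(n):
        lowx[k] = np.nanmin(splitx[k])
        lowy[k] = np.nanmin(splity[k])
        highx[k] = np.nanmax(splitx[k])
        highy[k] = np.nanmax(splity[k])
    return lowx, highx, lowy, highy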

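def get_GridPoints(x, y, n):
    # Box limits for the n columns (x) and n rows (y)
    lowx, highx, lowy, highy = make_grid(x, y, n)
    xms = []
    yms = []
    # Debug output: the limits that were found
    print(lowx)
    print(highx)
    print(lowy)
    print(highy)
    # Visit all n * n boxes and store one (xmean, ymean) pair per box
    for w in range(n):
        for t in range(n):
            xn, yn = range_find(lowx[w], highx[w], lowy[t], highy[t], x, y)
            xms.append(xn)
            yms.append(yn)

    return np.array(xms), np.array(yms)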

In the end, my code returned: https://i.imgur.com/mhazN9C.png
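A side note on the box limits: make_grid takes them from equal-sized chunks of the sorted values, so each column and row holds the same number of points rather than having the same width. For uniformly distributed data the two are nearly the same, but for clustered data they differ. Here is a minimal sketch of both ways to get the edges; the helper names uniform_edges and equal_count_edges are made up for illustration and are not part of the code above.

```python
import numpy as np

def uniform_edges(values, n):
    """Return n + 1 equally spaced edges spanning the range of the data."""
    return np.linspace(np.min(values), np.max(values), n + 1)

def equal_count_edges(values, n):
    """Return n + 1 edges so that each bin holds roughly the same number of values."""
    return np.quantile(values, np.linspace(0.0, 1.0, n + 1))

rng = np.random.default_rng(0)
x = rng.uniform(low=-20, high=20, size=1000)

print(uniform_edges(x, 10))      # equal-width bins
print(equal_count_edges(x, 10))  # equal-population bins
```

Which one is preferable probably depends on whether the downstream code cares about preserving the geometry (equal width) or about balanced sample counts (equal population).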

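This is nowhere near a proper answer: it is wasteful, spending memory on redundant checks in every range_find call, and it only runs along the x=y axis. It does, however, show the basics of how to write this from scratch.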

Basically: pick a region, find all the X and Y values in that region, collect them, compute the mean, and store it in a separate array.
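For a single box, those steps can also be sketched with one boolean mask instead of nested loops. This is only an illustration under the assumption that x[i] and y[i] belong to the same point; the name cell_mean is hypothetical.

```python
import numpy as np

def cell_mean(x, y, lowx, highx, lowy, highy):
    """Mean of the points (x[i], y[i]) that fall strictly inside one box."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # One mask over paired coordinates: point i is kept only if
    # both its x and its y lie inside the box limits.
    inside = (x > lowx) & (x < highx) & (y > lowy) & (y < highy)
    if not inside.any():
        return np.nan, np.nan  # empty box
    return x[inside].mean(), y[inside].mean()

# Tiny example: only (1, 1) and (3, 3) lie inside the box (0, 4) x (0, 4)
x = np.array([1.0, 3.0, 1.0, -5.0])
y = np.array([1.0, 3.0, 9.0, -5.0])
print(cell_mean(x, y, 0.0, 4.0, 0.0, 4.0))
```

The mask needs one pass over the data per box, and it treats each (x, y) pair as one point.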

import numpy as np
import matplotlib.pyplot as plt

x = np.random.uniform(low=-20, high=20, size=(1000))
y = np.random.uniform(low=-20, high=20, size=(1000)) 

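def range_find(low, high, xs, ys):
    # Collect the x and y values that fall strictly between low and high.
    # The same limits are used for both coordinates, so the box lies on the
    # x=y diagonal. The nested loops test every xs[i] against every ys[j].
    _xs = []
    _ys = []
    if len(xs) != len(ys):
        print("Inputs must be the same size")
        return None
    i = 0
    while i < len(xs):
        j = 0
        while j < len(ys):
            if (low < float(xs[i]) < high) and (low < float(ys[j]) < high):
                #print(" ~xs[%d],ys[%d] -> (%f,%f)"%(i,j,xs[i],ys[j]))
                _xs.append(xs[i])
                _ys.append(ys[j])
            j += 1
        i += 1
    # Mean of the collected values (NaN if nothing fell inside the limits)
    xmean = np.mean(np.asarray(_xs))
    ymean = np.mean(np.asarray(_ys))
    #print("x-Mean: %f"%(xmean))
    #print("y-Mean: %f"%(ymean))
    return xmean, ymean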

# Box limits along the x=y diagonal (the same limits are used for x and y)
bounds = [(-20.1, -16.0), (-15.999, -12.0), (-11.999, -8.0), (-7.999, -4.0),
          (-3.999, 0.0), (0.0, 3.999), (4.0, 7.999), (8.0, 11.999),
          (12.0, 15.999), (16.0, 20.1)]
xms = []
yms = []
for low, high in bounds:
    xmean, ymean = range_find(low, high, x, y)
    xms.append(xmean)
    yms.append(ymean)

xms = np.asarray(xms)
yms = np.asarray(yms)

fig = plt.figure()
ax = fig.add_subplot(111)
ax.scatter(x, y)  # all points
# Grid lines every 4 units, matching the 10 x 10 boxes
ticks = [-20.0, -16.0, -12.0, -8.0, -4.0, 0, 4.0, 8.0, 12.0, 16.0, 20.0]
ax.set_yticks(ticks)
ax.set_xticks(ticks)
ax.yaxis.grid(True)
ax.xaxis.grid(True)
ax.scatter(xms, yms, color="red")  # mean point of each diagonal box
plt.show()

[Image: scatter plot]

Edit: added grid tick marks for the 10x10 case.
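To cover all 100 boxes instead of only the diagonal, and to return an actual data point close to each box center as the question asks, something along these lines might work. It is a rough sketch, not a tested solution for the real dataset: it assumes equal-width boxes on the square from lo to hi, that all points lie inside that square, and that x[i] and y[i] form one point. The function name pick_points_near_centers is made up for this example.

```python
import numpy as np

def pick_points_near_centers(x, y, n=10, lo=-20.0, hi=20.0):
    """For each box of an n x n grid on [lo, hi] x [lo, hi], return the index
    of the data point closest to the box center (empty boxes are skipped)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    width = (hi - lo) / n
    # Column and row index of every point; clip so that points lying
    # exactly on the upper edge end up in the last box.
    ix = np.clip(((x - lo) // width).astype(int), 0, n - 1)
    iy = np.clip(((y - lo) // width).astype(int), 0, n - 1)
    # Squared distance of each point to the center of its own box
    cx = lo + (ix + 0.5) * width
    cy = lo + (iy + 0.5) * width
    dist2 = (x - cx) ** 2 + (y - cy) ** 2
    # Sort by box id first, then by distance; the first entry of each
    # box in that order is its closest point.
    cell = ix * n + iy
    order = np.lexsort((dist2, cell))
    _, first = np.unique(cell[order], return_index=True)
    return order[first]

rng = np.random.default_rng(0)
x = rng.uniform(low=-20, high=20, size=1000)
y = rng.uniform(low=-20, high=20, size=1000)

idx = pick_points_near_centers(x, y, n=10)
x_sel, y_sel = x[idx], y[idx]  # at most one point per box
```

Plotting x_sel and y_sel in red over the full scatter should give a picture similar to the one sketched in the question. Returning indices rather than coordinates keeps any other per-point data easy to look up.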
