Building a basic cube with numpy?

3 answers

User

Answer 1 · edited 2024-06-01 07:12:05

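This is a sketch of a solution; you would obviously wrap it in helper functions and classes to provide a simple interface. The idea is to map each unique name to an index (sequential here, for simplicity) and then use that index to store the value in an array. It is suboptimal because every axis of the array must be padded to the number of distinct items in that category, even when most combinations never occur. Cells without data stay at zero, so they do not contribute to the sums. If you want to avoid adding zero elements, consider masked arrays and masked sums.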

import numpy as np

def get_dict(x):
    #dict.fromkeys keeps first-seen order, so the indices are deterministic
    return {a:i for i, a in enumerate(dict.fromkeys(x))}

#Mapping name to unique contiguous numbers (obviously put in a fn or class)
author = 4*["Shakespeare"]+ 2*["Dante"]
book = 2*["Hamlet"] + 2*["Romeo"] + 2*["Inferno"]
year = 3*["2000", "2001"]
sales = [104.2, 99.0, 27.0, 19.0, 11.6, 12.6]

#Define dictionary of indices
d = get_dict(author)
d.update(get_dict(book))
d.update(get_dict(year)) 

#Index values to put in multi-dimension array
ai = [d[i] for i in author]
bi = [d[i] for i in book]
yi = [d[i] for i in year]

#Pad array up to maximum size
A = np.zeros([np.max(ai)+1, np.max(bi)+1, np.max(yi)+1])

#Store elements with unique name as index in 3D datacube
for n in range(len(sales)):
    i = ai[n]; j = bi[n]; k = yi[n]
    A[i,j,k] = sales[n]

#Now we can get the various sums, for example all sales
print("Total=", np.sum(A))

#All shakespeare (0)
print("All shakespeare=", np.sum(A[d["Shakespeare"],:,:]))

#All year 2001
print("All year 2001", np.sum(A[:,:,d["2001"]]))

#All Shakespeare in 2000
print("All Shakespeare in 2000", np.sum(A[d["Shakespeare"],:,d["2000"]]))
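
The masked-array variant mentioned above could look roughly like this. It is a sketch rather than part of the original answer: index_of, cube and the other names are made up for illustration, and it assumes each (author, book, year) combination appears at most once. Cells that never receive a value stay masked, so they are skipped by sums, counts and means instead of being counted as zeros.

```python
import numpy as np

# Same toy data as in the answer above
author = 4 * ["Shakespeare"] + 2 * ["Dante"]
book = 2 * ["Hamlet"] + 2 * ["Romeo"] + 2 * ["Inferno"]
year = 3 * ["2000", "2001"]
sales = [104.2, 99.0, 27.0, 19.0, 11.6, 12.6]

def index_of(labels):
    """Map each distinct label to a sequential index (first-seen order)."""
    return {name: i for i, name in enumerate(dict.fromkeys(labels))}

a_idx, b_idx, y_idx = index_of(author), index_of(book), index_of(year)

# Start fully masked: a cell stays masked until a value is stored in it
cube = np.ma.masked_all((len(a_idx), len(b_idx), len(y_idx)), dtype=float)
for a, b, y, s in zip(author, book, year, sales):
    cube[a_idx[a], b_idx[b], y_idx[y]] = s

total = cube.sum()        # masked cells are skipped
n_filled = cube.count()   # number of cells that actually hold data
mean_sale = cube.mean()   # mean over filled cells only, not over the padding
shakespeare_2000 = cube[a_idx["Shakespeare"], :, y_idx["2000"]].sum()
print(total, n_filled, mean_sale, shakespeare_2000)
```

The practical difference from the zero-padded array shows up in statistics such as the mean: with zeros, the padding cells would pull the average down, while the masked version averages over the six recorded sales only.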

User

Answer 2 · edited 2024-06-01 07:12:05

I think numpy record arrays can be used for this task. Below is my solution based on record arrays.

import numpy as np

class rec_array():
    
    def __init__(self,author=None,book=None,year=None,sales=None):
        self.dtype = [('author','<U20'), ('book','<U20'),('year','<U20'),('sales',float)]
        self.rec_array = np.rec.fromarrays((author,book,year,sales),dtype=self.dtype)
        
    def add_record(self,author,book,year,sales):
        new_rec = np.rec.fromarrays((author,book,year,sales),dtype=self.dtype)
        if not self.rec_array.shape == ():
            self.rec_array = np.hstack((self.rec_array,new_rec))
        else:
            self.rec_array = new_rec
    
    def get_view(self,conditions):
        """
        conditions: 
            A list of conditions, for example 
            [["author","<","Shakespeare"],["year","<=","2000"]]
        """
        mask = np.ones(self.rec_array.shape[0]).astype(bool)
        for item in conditions:
            field,op,target = item
            field_op = "self.rec_array['%s'] %s '%s'" % (field,op,target)
            mask &= eval(field_op)
        
        selected_sales = self.rec_array['sales'][mask]
        
        return np.sum(selected_sales)

With this rec_array class, given the data

author = 4*["Shakespeare"]+ 2*["Dante"]
book = 2*["Hamlet"] + 2*["Romeo"] + 2*["Inferno"]
year = 3*["2000", "2001"]
sales = [104.2, 99.0, 27.0, 19.0, 11.6, 12.6]

we create an instance

test = rec_array()
test.add_record(author,book,year,sales)

For example, if you want the sales of Shakespeare's Romeo, you can do this

test.get_view([["author","==","Shakespeare"],["book","==","Romeo"]])

The output is 46.0

Or you could also do

test.get_view([["author","==","Shakespeare"],["year","<=","2000"]])

The output is 131.2
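
The get_view method above builds a Python expression as a string and passes it to eval, which breaks if a target contains a quote character and is risky when conditions come from untrusted input. One possible alternative, sketched here under the assumption that only the six comparison operators are needed, looks each operator up in a table from the standard operator module. The names OPS, sum_sales and records are hypothetical and not part of the original answer.

```python
import operator
import numpy as np

# Map the operator strings used in the conditions to real comparison functions
OPS = {
    "==": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}

def sum_sales(records, conditions):
    """Sum the 'sales' field over rows matching all (field, op, target) conditions."""
    mask = np.ones(records.shape[0], dtype=bool)
    for field, op, target in conditions:
        # Element-wise comparison of the whole field against the target
        mask &= OPS[op](records[field], target)
    return records["sales"][mask].sum()

dtype = [("author", "<U20"), ("book", "<U20"), ("year", "<U20"), ("sales", float)]
records = np.rec.fromarrays(
    (4 * ["Shakespeare"] + 2 * ["Dante"],
     2 * ["Hamlet"] + 2 * ["Romeo"] + 2 * ["Inferno"],
     3 * ["2000", "2001"],
     [104.2, 99.0, 27.0, 19.0, 11.6, 12.6]),
    dtype=dtype,
)

romeo = sum_sales(records, [("author", "==", "Shakespeare"), ("book", "==", "Romeo")])
upto_2000 = sum_sales(records, [("author", "==", "Shakespeare"), ("year", "<=", "2000")])
print(romeo, upto_2000)
```

An unknown operator string raises a KeyError here instead of being executed, which is usually the safer failure mode. Note that year is stored as a string in this answer, so "<=" compares lexicographically; that happens to agree with numeric order for four-digit years.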

User

Answer 3 · edited 2024-06-01 07:12:05

For the data structure, you could define the following class:

import numpy as np


class Cube:

    def __init__(self, row_index, col_index, data):
        self.row_index = {r: i for i, r in enumerate(row_index)}
        self.col_index = {c: i for i, c in enumerate(col_index)}
        self.data = data

    def __getitem__(self, item):
        row, col = item
        return self.data[self.row_index[row] , self.col_index[col]]

    def __repr__(self):
        return repr(self.data)

This is basically a light wrapper around a 2D numpy array. To compute the cross-tabulation, you could do something like this:

def _x_tab(rows, columns, values):
    """Function for computing the cross-tab of simple arrays"""
    unique_values_all_cols, idx = zip(*(np.unique(col, return_inverse=True) for col in [rows, columns]))

    shape_xt = [uniq_vals_col.size for uniq_vals_col in unique_values_all_cols]

    # np.float was removed in NumPy 1.24; the builtin float is equivalent
    xt = np.zeros(shape_xt, dtype=float)
    np.add.at(xt, idx, values)

    return unique_values_all_cols, xt


def make_index(a, r):
    """Make array of tuples"""
    l = [tuple(row) for row in a[:, r]]
    return make_object_array(l)


def make_object_array(l):
    a = np.empty(len(l), dtype=object)
    a[:] = l
    return a


def fill_label(ar, le):
    """Fill missing parts with ALL label"""
    missing = tuple(["ALL"] * le)
    return [(e + missing)[:le] for e in ar]

def x_tab(rows, cols, values):
    """Main function for cross tabulation"""
    _, l_cols = rows.shape

    total_agg = []
    total_idx = []
    for i in range(l_cols + 1):
        (idx, _), agg = _x_tab(make_index(rows, list(range(i))), cols, values)
        total_idx.extend(fill_label(idx, l_cols))
        total_agg.append(agg)

    stacked_agg = np.vstack(total_agg)
    stacked_agg_total = stacked_agg.sum(axis=1).reshape(-1, 1)

    return Cube(total_idx, list(dict.fromkeys(cols)), np.concatenate((stacked_agg, stacked_agg_total), axis=1))

Suppose the input is an array arr:

[['Shakespeare' 'Hamlet' 2000 104.2]
 ['Shakespeare' 'Hamlet' 2001 99.0]
 ['Shakespeare' 'Romeo' 2000 27.0]
 ['Shakespeare' 'Romeo' 2001 19.0]
 ['Dante' 'Inferno' 2000 11.6]
 ['Dante' 'Inferno' 2001 12.6]]

Then x_tab can be called like this:

result = x_tab(arr[:, [0, 1]], arr[:, 2], arr[:, 3])
print(result)

Output

array([[142.8, 130.6, 273.4],
       [ 11.6,  12.6,  24.2],
       [131.2, 118. , 249.2],
       [ 11.6,  12.6,  24.2],
       [104.2,  99. , 203.2],
       [ 27. ,  19. ,  46. ]])

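Note that this representation (repr) is only there to show the results; you can change it as you see fit. You can then access the cells of the cube as follows: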

print(result[('Dante', 'ALL'), 2001])
print(result[('Dante', 'Inferno'), 2001])
print(result[('Shakespeare', 'Hamlet'), 2000])

Output

12.6
12.6
104.2

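Note that the bulk of the work is done in the _x_tab function, which uses pure numpy functions. At the same time it gives you a flexible interface for any aggregation function you choose: just change the ufunc in this line:

np.add.at(xt, idx, values)

to any other ufunc from the list of available ufuncs. For more information, see the documentation of the ufunc.at method.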
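
As a rough illustration of swapping the ufunc, the following sketch builds a small author-by-year table twice, once with np.add.at and once with np.maximum.at. It is not the answer's own code: the variable names are made up, and it assumes that an aggregation other than the sum also needs a matching start value (here -inf for the maximum) instead of zeros.

```python
import numpy as np

rows = np.array(["Shakespeare", "Shakespeare", "Shakespeare", "Shakespeare", "Dante", "Dante"])
cols = np.array([2000, 2001, 2000, 2001, 2000, 2001])
values = np.array([104.2, 99.0, 27.0, 19.0, 11.6, 12.6])

# return_inverse gives, for each element, its position among the sorted unique labels
row_labels, row_idx = np.unique(rows, return_inverse=True)
col_labels, col_idx = np.unique(cols, return_inverse=True)
shape = (row_labels.size, col_labels.size)

# Sum per cell: unbuffered in-place add, so repeated (row, col) pairs accumulate
sums = np.zeros(shape)
np.add.at(sums, (row_idx, col_idx), values)

# Max per cell: swap the ufunc and start from -inf so any value replaces it
maxima = np.full(shape, -np.inf)
np.maximum.at(maxima, (row_idx, col_idx), values)

print(row_labels, col_labels)
print(sums)
print(maxima)
```

Because np.unique sorts its output, the rows come out as Dante then Shakespeare, which matches the row order in the output table shown earlier.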

A working copy of the code can be found here. The above is based on this gist.

Note that this assumes you are passing more than one column for the index (the rows parameter).
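
The reason is that x_tab unpacks rows.shape into two values, so a one-dimensional rows argument fails at that line. If you only have a single index column, a possible workaround (an assumption on my part, not something the answer states) is to select it with a list so that it stays two-dimensional:

```python
import numpy as np

# Two sample rows in the same layout as arr above
arr = np.array([
    ["Shakespeare", "Hamlet", 2000, 104.2],
    ["Dante", "Inferno", 2001, 12.6],
], dtype=object)

one_d = arr[:, 0]     # an integer index drops the axis
two_d = arr[:, [0]]   # a list index keeps the axis, so .shape has two entries
print(one_d.shape, two_d.shape)
```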
