How to perform this NumPy calculation better

>>> c1=[[ [ [0,1,2],[1,3,1] ],[ [0,4,1],[1,2,3] ] ]]
>>> import numpy as np
>>> c1arr = np.array(c1)
>>> c1arr #when I actually load from file, its not loading as this (check Q2 below)
array([[[[0, 1, 2], [1, 3, 1]], [[0, 4, 1], [1, 2, 3]]]])
>>> np.sum(c1arr[0,0][:,2]*(c1arr[0,0][:,1]+V)) #sum over t*(r+V)
45.0
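For a regular (non-ragged) array like `c1arr`, the same sum can be taken for every action at once with broadcasting. The sketch below is only an illustration: `V` is not shown in the session above, so `V = np.array([10.0, 20.0])` is an assumption (it does reproduce the `45.0` shown), and the names `r`, `t`, and `q` are introduced here.

```python
import numpy as np

c1 = [[[[0, 1, 2], [1, 3, 1]], [[0, 4, 1], [1, 2, 3]]]]
c1arr = np.array(c1)            # shape (1, 2, 2, 3): [s1][a][row] -> (s2, r, t)

# Assumption: V is not defined in the post; these values reproduce 45.0
V = np.array([10.0, 20.0])

r = c1arr[..., 1]               # rewards, shape (1, 2, 2)
t = c1arr[..., 2]               # transition weights, shape (1, 2, 2)

# V broadcasts along the last axis, exactly as in c1arr[0,0][:,1] + V
q = (t * (r + V)).sum(axis=-1)  # shape (1, 2): one value per (s1, a)
print(q)                        # [[45. 80.]]
```

Here `q[0, 0]` equals the value computed in the session, and `q[0, 1]` is the same expression for the second action.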

>>> type(a) #a is populated by parsing file
<class 'list'>
>>> print(a)
[[[[0, -0.9, 0.3], [1, 0.9, 0.6]], [[0, -0.2, 0.6], [1, 0.7, 0.3]]], [[[1, 0.2, 1.0]], [[0, -0.8, 1.0]]]]
>>> np.array(a) #note that this is not same as c1arr above
<string>:1: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
array([[list([[0, -0.9, 0.3], [1, 0.9, 0.6]]), list([[0, -0.2, 0.6], [1, 0.7, 0.3]])],
       [list([[1, 0.2, 1.0]]), list([[0, -0.8, 1.0]])]], dtype=object)
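One way around the ragged-list problem is to flatten the nested list into a plain 2-D table before handing it to NumPy. This is a minimal sketch, assuming the nesting means `a[s1][action]` is a list of `[s2, r, t]` rows; that interpretation and the names `rows` and `flat` are assumptions, not something stated in the post.

```python
import numpy as np

a = [[[[0, -0.9, 0.3], [1, 0.9, 0.6]], [[0, -0.2, 0.6], [1, 0.7, 0.3]]],
     [[[1, 0.2, 1.0]], [[0, -0.8, 1.0]]]]

# Assumed layout: a[s1][action] -> list of [s2, r, t] rows.
# Each row gets its s1 and action index prepended, so all rows have length 5.
rows = [(s1, act, s2, r, t)
        for s1, actions in enumerate(a)
        for act, transitions in enumerate(actions)
        for s2, r, t in transitions]

flat = np.array(rows, dtype=float)   # regular array, columns: s1, a, s2, r, t
print(flat.shape)                    # (6, 5)
```

Because every row has the same length, no object array is created and no deprecation warning is raised.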

1 answer

User

#1 · Posted on 2024-09-28 05:27:44

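In my opinion, the most intuitive and maintainable approach is to use pandas, where you can give names to the columns. Another important factor is that grouping is much easier in pandas.

Since your input sample contains only integers, I defined V as an array of integers too: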

import numpy as np
import pandas as pd
V = np.array([10, 20])

I read your input file as follows:

df = pd.read_csv('Input.txt', sep=' ', names=['s1', 'a', 's2', 'r', 't'])

(print it to see what has been read).

Then, to get the result for each combination of s1 and a, you can run:

result = df.groupby(['s1', 'a']).apply(lambda grp:
    (grp.t * (grp.r + V[grp.s1])).sum())

Note that this code is easy to read because it refers to named columns.

The result is:
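The same grouping can also be written without `apply`, by computing the per-row term first and then summing per group. The sketch below uses a swapped-in vectorized form, not the answer's exact code. The rows are hypothetical sample data (not the asker's `Input.txt`), `V` is assumed to be a float array, the column name `val` is introduced here, and `V` is indexed by `s1` as in the answer's code.

```python
import numpy as np
import pandas as pd

# Hypothetical rows (s1, a, s2, r, t); not the asker's actual input file
df = pd.DataFrame(
    [[0, 0, 0, -0.9, 0.3],
     [0, 0, 1,  0.9, 0.6],
     [0, 1, 0, -0.2, 0.6],
     [0, 1, 1,  0.7, 0.3],
     [1, 0, 1,  0.2, 1.0],
     [1, 1, 0, -0.8, 1.0]],
    columns=['s1', 'a', 's2', 'r', 't'])
df = df.astype({'s1': int, 'a': int, 's2': int})

V = np.array([10.0, 20.0])   # assumed values

# Per-row term t * (r + V[s1]), then one sum per (s1, a) group
df['val'] = df['t'] * (df['r'] + V[df['s1'].to_numpy()])
result = df.groupby(['s1', 'a'])['val'].sum()

# Maximum over the actions of each s1
best = result.groupby(level=0).max()
print(result)
print(best)
```

Computing the row-wise term as a column first keeps the grouped step a plain `sum`, which tends to be faster than a Python lambda called once per group.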

s1  a
0   0     35
    1     50
1   0    138
    1    146
dtype: int64

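Each result is an integer because V is also of int type. If you define it as in your post (an array of floats), the results will be of float type as well (your choice).

If you want the maximum result for each s1, run: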

result.groupby(level=0).max()

This time the result is:

s1
0     50
1    146
dtype: int64

NumPy version

If you are really restricted to NumPy, there is a solution as well, although it is harder to read and to update.

Read the input file:
```
data = np.genfromtxt('Input.txt')
```
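Initially I tried the int type, as in the pandas solution, but one of your comments states that the two rightmost columns are floats. Since a (plain) NumPy array must have a single type, the whole array has to be of float type.

Run the following code: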
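The single-dtype behaviour is easy to check on a small inline sample. In this sketch the two-row `sample` string is made up for illustration and stands in for the real file.

```python
import io
import numpy as np

# Made-up two-row sample standing in for the real input file
sample = "0 0 0 -0.9 0.3\n0 0 1 0.9 0.6\n"

data = np.genfromtxt(io.StringIO(sample))
print(data.dtype)                # float64, even for the integer-looking columns

# Integer columns have to be cast back before they can be used as indices
s1 = data[:, 0].astype(int)
print(s1)
```

This is why the loop in the answer applies `.astype(int)` to the unique values of column 0 before using them to index `V`.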

res = []
# First level grouping - by "s1" (column 0)
for s1 in np.unique(data[:,0]).astype(int):
    dat1 = data[np.where(data[:,0] == s1)]
    res2 = []
    # Second level grouping - by "a" (column 1)
    for a in np.unique(dat1[:,1]):
        dat2 = dat1[np.where(dat1[:,1] == a)]
        # t - column 4, r - column 3
        res2.append((dat2[:,4] * (dat2[:,3] + V[s1])).sum())
    res.append([s1, max(res2)])
result = np.array(res)

The result (a NumPy array) is:

array([[  0.,  50.],
       [  1., 146.]])

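The left column contains the s1 values and the right column the maximum of the values from the second-level grouping.

NumPy version with a structured array

Actually, you can also use a NumPy structured array. Then the code is at least more readable, since you refer to column names instead of column numbers.

Read the array, passing a dtype with column names and types: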
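If the Python loops become a bottleneck, the two-level grouping can be sketched with `np.bincount` instead. This is an alternative technique, not the answer's code. It assumes `s1` and `a` are small non-negative integers, uses hypothetical sample rows and assumed `V` values, and introduces the names `key`, `sums`, `counts`, and `best`.

```python
import numpy as np

# Hypothetical rows (s1, a, s2, r, t); not the asker's actual input file
data = np.array([
    [0, 0, 0, -0.9, 0.3],
    [0, 0, 1,  0.9, 0.6],
    [0, 1, 0, -0.2, 0.6],
    [0, 1, 1,  0.7, 0.3],
    [1, 0, 1,  0.2, 1.0],
    [1, 1, 0, -0.8, 1.0],
])
V = np.array([10.0, 20.0])   # assumed values

s1 = data[:, 0].astype(int)
a = data[:, 1].astype(int)
vals = data[:, 4] * (data[:, 3] + V[s1])     # t * (r + V[s1]) per row

# Encode each (s1, a) pair as one integer and sum the rows per pair
n_s1, n_a = s1.max() + 1, a.max() + 1
key = s1 * n_a + a
sums = np.bincount(key, weights=vals, minlength=n_s1 * n_a)
counts = np.bincount(key, minlength=n_s1 * n_a)

# Pairs that never occur must not win the max, so mark them as -inf
sums = np.where(counts > 0, sums, -np.inf).reshape(n_s1, n_a)
best = sums.max(axis=1)                      # maximum over a, for each s1
print(best)
```

The `-inf` masking matters when every real sum for some `s1` is negative; without it, a missing `(s1, a)` pair would contribute a spurious 0.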

import io
# txt is assumed to be a string holding the content of the input file
data = np.genfromtxt(io.StringIO(txt), dtype=[('s1', '<i4'),
    ('a', '<i4'), ('s2', '<i4'), ('r', '<f8'), ('t', '<f8')])

Then run:
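To see what such a structured read produces, here is a small self-contained sketch. The `txt` content below is hypothetical sample data, not the asker's file.

```python
import io
import numpy as np

# Hypothetical file content with columns s1, a, s2, r, t
txt = """0 0 0 -0.9 0.3
0 0 1 0.9 0.6
0 1 0 -0.2 0.6
0 1 1 0.7 0.3
1 0 1 0.2 1.0
1 1 0 -0.8 1.0"""

data = np.genfromtxt(io.StringIO(txt), dtype=[('s1', '<i4'),
    ('a', '<i4'), ('s2', '<i4'), ('r', '<f8'), ('t', '<f8')])

print(data.shape)          # (6,): one record per input row
print(data.dtype.names)    # ('s1', 'a', 's2', 'r', 't')
print(data['s1'])          # integer field, usable directly as an index
print(data['r'][data['s1'] == 1])   # boolean masks work per field
```

Each field keeps its own type, so `s1` stays an integer and no `.astype(int)` cast is needed when indexing `V`.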

res = []
# First level grouping - by "s1"
for s1 in np.unique(data['s1']):
    dat1 = data[np.where(data['s1'] == s1)]
    res2 = []
    # Second level grouping - by "a"
    for a in np.unique(dat1['a']):
        dat2 = dat1[np.where(dat1['a'] == a)]
        res2.append((dat2['t'] * (dat2['r'] + V[s1])).sum())
    res.append([s1, max(res2)])
result = np.array(res)
