Seemingly simple question: I have an array with two columns; the first holds an ID and the second a count. I'd like to update it with another, similar array such that the counts of matching IDs are added together and rows with new IDs are appended:


import numpy as np

a = np.array([[1, 2],
              [2, 2],
              [3, 1],
              [4, 5]])

b = np.array([[2, 2],
              [3, 1],
              [4, 0],
              [5, 3]])

a.update(b)  # ????
>>> np.array([[1, 2],
              [2, 4],
              [3, 2],
              [4, 5],
              [5, 3]])

Is there a way to do this with indexing/slicing such that I don't simply have to iterate over each row?


Generic case

Approach #1: You can use np.add.at to perform this ID-based addition, like so -

# First column of output array as the union of first columns of a,b              
out_id = np.union1d(a[:,0],b[:,0])

# Initialize second column of output array
out_count = np.zeros_like(out_id)

# Find indices where the first columns of a,b are placed in out_id
_,a_idx = np.where(a[:,None,0]==out_id)
_,b_idx = np.where(b[:,None,0]==out_id)

# Place second column of a into out_id & add in second column of b
out_count[a_idx] = a[:,1]
np.add.at(out_count, b_idx,b[:,1])

# Stack the ID and count arrays into a 2-column format
out = np.column_stack((out_id,out_count))
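
The reason for reaching for np.add.at instead of a plain `+=` on fancy indices only shows up when an ID occurs more than once in b. A minimal sketch with made-up index and value arrays (they are illustrative and not taken from the question):

```python
import numpy as np

idx = np.array([0, 2, 2, 2])       # index 2 appears three times
vals = np.array([10, 1, 1, 1])

buffered = np.zeros(4, dtype=int)
buffered[idx] += vals              # repeated indices: only the last write survives

unbuffered = np.zeros(4, dtype=int)
np.add.at(unbuffered, idx, vals)   # every occurrence is accumulated

print(buffered.tolist())    # [10, 0, 1, 0]
print(unbuffered.tolist())  # [10, 0, 3, 0]
```

If the IDs within b are known to be unique, both forms should give the same result.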

To find a_idx and b_idx, np.searchsorted is probably a faster alternative, since out_id is sorted and contains every ID from a and b -

a_idx = np.searchsorted(out_id, a[:,0], side='left')
b_idx = np.searchsorted(out_id, b[:,0], side='left')

Sample input-output :


In [538]: a
array([[1, 2],
       [4, 2],
       [3, 1],
       [5, 5]])

In [539]: b
array([[3, 7],
       [1, 1],
       [4, 0],
       [2, 3],
       [6, 2]])

In [540]: out
array([[1, 3],
       [2, 3],
       [3, 8],
       [4, 2],
       [5, 5],
       [6, 2]])
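
For convenience, Approach #1 with the np.searchsorted lookup can be wrapped into one function. This is a sketch, not the only way to arrange it: the name merge_counts is hypothetical, and unlike the snippet above it also uses np.add.at for a, so repeated IDs inside a would be summed rather than overwritten. It is run here on the arrays from the question.

```python
import numpy as np

def merge_counts(a, b):
    """Sum the counts of two (ID, count) arrays by ID (hypothetical helper)."""
    out_id = np.union1d(a[:, 0], b[:, 0])      # sorted, unique IDs
    out_count = np.zeros_like(out_id)
    # out_id is sorted and holds every ID, so searchsorted returns exact positions
    a_idx = np.searchsorted(out_id, a[:, 0])
    b_idx = np.searchsorted(out_id, b[:, 0])
    np.add.at(out_count, a_idx, a[:, 1])
    np.add.at(out_count, b_idx, b[:, 1])
    return np.column_stack((out_id, out_count))

a = np.array([[1, 2], [2, 2], [3, 1], [4, 5]])
b = np.array([[2, 2], [3, 1], [4, 0], [5, 3]])
out = merge_counts(a, b)
print(out.tolist())  # [[1, 2], [2, 4], [3, 2], [4, 5], [5, 3]]
```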

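Approach #2: You can use np.bincount to do the same ID-based addition. Note that np.bincount needs non-negative integer IDs and, when given weights, returns floating-point sums -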

# First column of output array as the union of first columns of a,b  
out_id = np.union1d(a[:,0],b[:,0])

# Gather all IDs and all counts into single arrays
id_arr = np.concatenate((a[:,0],b[:,0]))
count_arr = np.concatenate((a[:,1],b[:,1]))

# Get binned summations
summed_vals = np.bincount(id_arr,count_arr)

# Get mask of valid bins
mask = np.in1d(np.arange(np.max(out_id)+1),out_id)

# Keep only the valid bins; cast back, since bincount with weights returns floats
out_count = summed_vals[mask].astype(a.dtype)

# Stack the ID and count arrays into a 2-column format
out = np.column_stack((out_id,out_count))
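
When the IDs are large or sparse, np.bincount over the raw IDs allocates one bin per integer up to the maximum ID. A possible variation (an assumption on my part, not part of the approach above) is to let np.unique map the IDs to compact positions first, which also removes the need for the mask. A sketch on the sample arrays used earlier:

```python
import numpy as np

a = np.array([[1, 2], [4, 2], [3, 1], [5, 5]])
b = np.array([[3, 7], [1, 1], [4, 0], [2, 3], [6, 2]])

stacked = np.concatenate((a, b))
# inverse maps every row to the position of its ID in out_id
out_id, inverse = np.unique(stacked[:, 0], return_inverse=True)
# bincount with weights returns floats, so cast back to the input dtype
out_count = np.bincount(inverse, weights=stacked[:, 1]).astype(stacked.dtype)
out = np.column_stack((out_id, out_count))
print(out.tolist())  # [[1, 3], [2, 3], [3, 8], [4, 2], [5, 5], [6, 2]]
```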

Specific case

If the ID columns in a and b are already sorted (and contain no duplicates), it becomes easier, as we can just use masks from np.in1d to index into the output ID array created with np.union1d, like so -

# First column of output array as the union of first columns of a,b  
out_id = np.union1d(a[:,0],b[:,0])

# Masks marking where the IDs of a and b occur in the output ID array
mask1 = np.in1d(out_id,a[:,0])
mask2 = np.in1d(out_id,b[:,0])

# Initialize second column of output array
out_count = np.zeros_like(out_id)

# Place second column of a into out_id & add in second column of b
out_count[mask1] = a[:,1]
np.add.at(out_count, np.where(mask2)[0],b[:,1])

# Stack the ID and count arrays into a 2-column format
out = np.column_stack((out_id,out_count))

Sample run -

In [552]: a
array([[1, 2],
       [2, 2],
       [3, 1],
       [4, 5],
       [8, 5]])

In [553]: b
array([[2, 2],
       [3, 1],
       [4, 0],
       [5, 3],
       [6, 2],
       [8, 2]])

In [554]: out
array([[1, 2],
       [2, 4],
       [3, 2],
       [4, 5],
       [5, 3],
       [6, 2],
       [8, 7]])
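
The mask-based version assigns counts in the order of out_id, so it only lines up when each ID column is sorted with no repeats. A small guard can check that precondition before taking this shortcut. This is a sketch; the helper name ids_strictly_increasing is hypothetical.

```python
import numpy as np

def ids_strictly_increasing(arr):
    """True if the ID column is sorted with no repeats (hypothetical helper)."""
    return bool(np.all(np.diff(arr[:, 0]) > 0))

a = np.array([[1, 2], [2, 2], [3, 1], [4, 5], [8, 5]])
unsorted = np.array([[1, 2], [4, 2], [3, 1], [5, 5]])
print(ids_strictly_increasing(a))         # True
print(ids_strictly_increasing(unsorted))  # False
```

If the check fails for either array, the generic approaches above still apply.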



>>> col=np.unique(np.hstack((b[:,0],a[:,0])))
>>> dif=np.setdiff1d(col,a[:,0])
>>> val=b[np.in1d(b[:,0],dif)]
>>> result=np.concatenate((a,val))
>>> result
array([[1, 2],
       [2, 2],
       [3, 1],
       [4, 5],
       [5, 3]])

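Note that this keeps the counts from a unchanged and only appends the rows of b whose IDs are missing from a; it does not add the counts of matching IDs. If you want the result sorted by ID, you can use np.lexsort, as in the last example below.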

Explanation :


First, find the unique IDs of both arrays with the following command:


>>> col=np.unique(np.hstack((b[:,0],a[:,0])))
>>> col
array([1, 2, 3, 4, 5])

Then find the IDs that are in col but not in a:


>>> dif=np.setdiff1d(col,a[:,0])
>>> dif
array([5])

Then select the rows of b whose IDs are in dif:


>>> val=b[np.in1d(b[:,0],dif)]
>>> val
array([[5, 3]])

Finally, concatenate those rows onto array a:


>>> np.concatenate((a,val))
array([[1, 2],
       [2, 2],
       [3, 1],
       [4, 5],
       [5, 3]])

Consider another example, this time with sorting:


>>> a = np.array([[1, 2],
...               [2, 2],
...               [3, 1],
...               [7, 5]])
>>> b = np.array([[2, 2],
...               [3, 1],
...               [4, 0],
...               [5, 3]])
>>> col=np.unique(np.hstack((b[:,0],a[:,0])))
>>> dif=np.setdiff1d(col,a[:,0])
>>> val=b[np.in1d(b[:,0],dif)]

>>> result=np.concatenate((a,val))
>>> result[np.lexsort((result[:,0],result[:,0]))]
array([[1, 2],
       [2, 2],
       [3, 1],
       [4, 0],
       [5, 3],
       [7, 5]])
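
Since the sort uses a single key, np.argsort on the ID column should give the same ordering as the np.lexsort call above. The sketch below redoes the last example that way, and also selects the new rows directly with np.isin (the newer spelling of np.in1d). The variable name new_rows is illustrative.

```python
import numpy as np

a = np.array([[1, 2], [2, 2], [3, 1], [7, 5]])
b = np.array([[2, 2], [3, 1], [4, 0], [5, 3]])

# Rows of b whose ID does not occur in a
new_rows = b[~np.isin(b[:, 0], a[:, 0])]
result = np.concatenate((a, new_rows))
# Sorting on a single key: argsort on the ID column is enough
result = result[np.argsort(result[:, 0], kind="stable")]
print(result.tolist())  # [[1, 2], [2, 2], [3, 1], [4, 0], [5, 3], [7, 5]]
```

As with the steps above, counts for IDs present in both arrays are taken from a and are not summed.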

