Introduction

`RaggedTensor`, which represents variable-length data, is available in TensorFlow 2.1 and later, but if you write code as though it were an ordinary `Tensor`, there are several pitfalls to get stuck on. This time the topic is indexing: retrieving values from a `RaggedTensor` by specifying particular indices. Once you get used to it, you will be able to perform fairly complicated operations.

Verification environment

• Ubuntu 18.04
• Python 3.6.9
• TensorFlow 2.2.0 (CPU)

Indexing example

Suppose that `x`, the `RaggedTensor` to be indexed, is created as follows.

``````import tensorflow as tf

x = tf.RaggedTensor.from_row_lengths(tf.range(15), tf.range(1, 6))
print(x)
# <tf.RaggedTensor [[0], [1, 2], [3, 4, 5], [6, 7, 8, 9], [10, 11, 12, 13, 14]]>
``````
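| | Column 0 | Column 1 | Column 2 | Column 3 | Column 4 |
|---|---|---|---|---|---|
| Row 0 | 0 | | | | |
| Row 1 | 1 | 2 | | | |
| Row 2 | 3 | 4 | 5 | | |
| Row 3 | 6 | 7 | 8 | 9 | |
| Row 4 | 10 | 11 | 12 | 13 | 14 |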

Slicing specific rows

The first operation is retrieving rows, which works the same way as for a normal `Tensor`. You can think of it like a `numpy.ndarray`. If you specify a range, **the start index is included and the end index is excluded.** This should pose no problem for Python users.

``````print(x[2])
# tf.Tensor([3 4 5], shape=(3,), dtype=int32)
print(x[1:4])
# <tf.RaggedTensor [[1, 2], [3, 4, 5], [6, 7, 8, 9]]>
``````

However, unlike with `numpy.ndarray`, indexing with a list of discrete rows does not seem to be supported. Use `tf.gather()` for that instead.

``````# This works for an ndarray
print(x.numpy()[[1, 3]])
# [array([1, 2], dtype=int32) array([6, 7, 8, 9], dtype=int32)]

# This does not work for a Tensor / RaggedTensor
print(x[[1, 3]])
# InvalidArgumentError: slice index 3 of dimension 0 out of bounds. [Op:StridedSlice] name: strided_slice/
``````

``````# Fancy indexing on a Tensor / RaggedTensor: use tf.gather
print(tf.gather(x, [1, 3], axis=0))
# <tf.RaggedTensor [[1, 2], [6, 7, 8, 9]]>
``````

Slicing with a fixed column index

Next is slicing with a fixed column index. Unlike with a normal `Tensor`, whether an element exists at that index depends on the row, so a plain integer column index is rejected:

``````print(x[:, 2])
# ValueError: Cannot index into an inner ragged dimension.
``````

If you specify a range instead, it works:

``````print(x[:, 2:3])
# <tf.RaggedTensor [[], [], [5], [8], [12]]>
``````

Rows where the specified index does not exist become `[]`.
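If a dense result with one entry per row is more convenient, one option is to pad the range slice. The following is a minimal sketch that was not part of the original verification; the sentinel `-1` is an arbitrary choice of mine for rows that have no element at that column.

```python
import tensorflow as tf

x = tf.RaggedTensor.from_row_lengths(tf.range(15), tf.range(1, 6))

# Slice column 2 as a range so that short rows become empty instead of raising.
col2 = x[:, 2:3]

# Pad the empty rows with -1 (assumed sentinel), giving shape (5, 1),
# then drop the length-1 column axis.
dense = tf.squeeze(col2.to_tensor(default_value=-1), axis=1)
print(dense)
# tf.Tensor([-1 -1  5  8 12], shape=(5,), dtype=int32)
```

Pick a sentinel that cannot occur in the real data, otherwise padded and genuine entries become indistinguishable.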


Slicing with a different column index for each row

If you have a `Tensor` that lists the 2D indices you want to collect, you can use `tf.gather_nd()`.

``````ind = tf.constant([[0, 0], [1, 1], [2, 0], [4, 3]])
# Collect the elements of x at (0, 0), (1, 1), (2, 0), (4, 3)
print(tf.gather_nd(x, ind))
# tf.Tensor([ 0  2  3 13], shape=(4,), dtype=int32)
``````

On the other hand, there are times when you want to fetch exactly one element from every row, but from a different column in each row.

``````col = tf.constant([0, 0, 2, 1, 2])
# Collect the elements of x at (0, 0), (1, 0), (2, 2), (3, 1), (4, 2)
# Prepend the row numbers to the column indices, then use the same method as before
ind = tf.transpose(tf.stack([tf.range(tf.shape(col)[0]), col]))
print(tf.gather_nd(x, ind))
# tf.Tensor([ 0  1  5  7 12], shape=(5,), dtype=int32)
``````
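To make the intermediate index tensor visible, here is a small sketch of the same idea. It uses `tf.stack(..., axis=1)` in place of stack-then-transpose, which is my own substitution and should produce the same `(5, 2)` index tensor; the names `rows` and `picked` are illustrative only.

```python
import tensorflow as tf

x = tf.RaggedTensor.from_row_lengths(tf.range(15), tf.range(1, 6))
col = tf.constant([0, 0, 2, 1, 2])

# Row numbers 0..4, one per entry of col.
rows = tf.range(tf.shape(col)[0])

# Stacking along axis=1 pairs each row number with its column index directly.
ind = tf.stack([rows, col], axis=1)
print(ind.numpy().tolist())
# [[0, 0], [1, 0], [2, 2], [3, 1], [4, 2]]

# Same gather_nd call as with an explicit list of 2D indices.
picked = tf.gather_nd(x, ind)
print(picked)
```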

But this looks like it would be slow, so I thought about a smarter way.

``````print(tf.gather(x.values, x.row_starts() + col))
# tf.Tensor([ 0  1  5  7 12], shape=(5,), dtype=int32)
``````

This works. The actual values of `x` are stored in a `Tensor` (not a `RaggedTensor`) that concatenates all the rows, so it has one fewer dimension, and you can get it by accessing `x.values`. To represent the shape of `x`, the `RaggedTensor` also holds the start index of each row (`x.row_starts()`). You can therefore add the desired column offset to these start indices and gather from `x.values`.
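The flat layout can be inspected directly. The sketch below prints the pieces involved and adds a bounds check that the one-liner lacks: an offset past the end of its row would silently read a value from a later row. The check is my own addition, assuming eager execution, and is not part of the original measurement.

```python
import tensorflow as tf

x = tf.RaggedTensor.from_row_lengths(tf.range(15), tf.range(1, 6))
col = tf.constant([0, 0, 2, 1, 2])

flat_values = x.values     # all rows concatenated: 0, 1, ..., 14
starts = x.row_starts()    # where each row begins in flat_values: 0, 1, 3, 6, 10
lengths = x.row_lengths()  # number of elements per row: 1, 2, 3, 4, 5

# Cast so the addition works whatever integer dtype the row partition uses.
col = tf.cast(col, starts.dtype)

# Guard (my addition): every column index must lie inside its own row.
tf.debugging.assert_non_negative(col, message="negative column index")
tf.debugging.assert_less(col, lengths, message="column index out of range for its row")

picked = tf.gather(flat_values, starts + col)
print(picked)
# tf.Tensor([ 0  1  5  7 12], shape=(5,), dtype=int32)
```

The assertions cost a little time, so they are probably best kept for debugging rather than left in a hot loop.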

``````%timeit tf.gather_nd(x, tf.transpose(tf.stack([tf.range(tf.shape(col)[0]), col])))
# 739 µs ± 75.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit tf.gather(x.values, x.row_starts() + col)
# 124 µs ± 6.47 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
``````

This one is faster (^_^)

If you want to master these operations, the official documentation is a good place to look.

If the column index is a RaggedTensor

This case applies the fact that the actual values live in a one-dimensional `Tensor`.

``````col = tf.ragged.constant([[0], [], [0, 2], [1, 3], [2]])
# Collect the elements of x at (0, 0), (2, 0), (2, 2), (3, 1), (3, 3), (4, 2)

# Get the start index of each row of x
row_starts = tf.cast(x.row_starts(), "int32")
# For each entry of col, look up the row it belongs to, convert that to the row's start index in x, and add the column offset
ind_flat = tf.gather(row_starts, col.value_rowids()) + col.values
ret = tf.gather(x.values, ind_flat)
print(ret)
# tf.Tensor([ 0  3  5  7  9 12], shape=(6,), dtype=int32)
``````
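The steps above can be wrapped in a small function. This is only a sketch: `gather_ragged_columns` is a hypothetical name of mine, not a TensorFlow API, and it assumes `x` and `col` have the same number of rows and that every column index is valid for its row.

```python
import tensorflow as tf

def gather_ragged_columns(x, col):
    """Hypothetical helper: pick x[i, j] for every j listed in row i of col."""
    # Use x's row-partition dtype throughout so the addition never mixes types.
    dtype = x.row_splits.dtype
    # Row number that each requested column index belongs to.
    row_ids = col.value_rowids()
    # Start position of that row inside x.values.
    starts = tf.gather(x.row_starts(), row_ids)
    ind_flat = starts + tf.cast(col.values, dtype)
    return tf.gather(x.values, ind_flat)

x = tf.RaggedTensor.from_row_lengths(tf.range(15), tf.range(1, 6))
col = tf.ragged.constant([[0], [], [0, 2], [1, 3], [2]])
ret = gather_ragged_columns(x, col)
print(ret)
# tf.Tensor([ 0  3  5  7  9 12], shape=(6,), dtype=int32)
```

Casting to the row-partition dtype rather than hard-coding `"int32"` is a design choice meant to keep the sketch usable when `x` was built with `int64` row splits.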

If you want to keep the original row information

The result above is a normal `Tensor` that simply lists the values, so the information about which row each value came from is lost. What if you want to keep that row information? You can build a `RaggedTensor` by telling it, for each value, which row it belongs to. The length of each row should match that of `col`, so these row ids can be taken from `col.value_rowids()`.

``````print(tf.RaggedTensor.from_value_rowids(ret, col.value_rowids()))
# <tf.RaggedTensor [[0], [], [3, 5], [7, 9], [12]]>
``````
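One case worth checking is a `col` whose last rows are empty. As far as I understand, `from_value_rowids` infers the number of rows from the largest row id when `nrows` is not given, so trailing empty rows would be dropped. The sketch below compares that with passing `nrows` explicitly and with `col.with_values()`; the tensor `ret` here is a hand-written stand-in for the gathered values, not output from the original run.

```python
import tensorflow as tf

# Same col as before, plus a trailing empty row (added for illustration).
col = tf.ragged.constant([[0], [], [0, 2], [1, 3], [2], []])
ret = tf.constant([0, 3, 5, 7, 9, 12])  # stand-in for the gathered values

# Row count inferred from the largest row id: the trailing empty row is lost.
a = tf.RaggedTensor.from_value_rowids(ret, col.value_rowids())

# Passing nrows explicitly keeps the trailing empty row.
b = tf.RaggedTensor.from_value_rowids(ret, col.value_rowids(), nrows=col.nrows())

# with_values reuses col's row partition as-is.
c = col.with_values(ret)

print(a.nrows().numpy(), b.nrows().numpy(), c.nrows().numpy())
```

When the result must line up row-for-row with `col`, `with_values` is probably the least error-prone of the three.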

When the target RaggedTensor is 3D or more

Even when multi-dimensional data is arranged as a time series (so that the `RaggedTensor` has three or more dimensions including the batch dimension), the methods described so far can be used as they are.

``````x = tf.RaggedTensor.from_row_lengths(tf.reshape(tf.range(30), (15, 2)), tf.range(1, 6))
print(x)
# <tf.RaggedTensor [[[0, 1]], [[2, 3], [4, 5]], [[6, 7], [8, 9], [10, 11]], [[12, 13], [14, 15], [16, 17], [18, 19]], [[20, 21], [22, 23], [24, 25], [26, 27], [28, 29]]]>
``````

The structure of this `x` can be interpreted as follows.

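| | Column 0 | Column 1 | Column 2 | Column 3 | Column 4 |
|---|---|---|---|---|---|
| Row 0 | [0, 1] | | | | |
| Row 1 | [2, 3] | [4, 5] | | | |
| Row 2 | [6, 7] | [8, 9] | [10, 11] | | |
| Row 3 | [12, 13] | [14, 15] | [16, 17] | [18, 19] | |
| Row 4 | [20, 21] | [22, 23] | [24, 25] | [26, 27] | [28, 29] |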

The rest is exactly the same as before. However, note that the returned `Tensor` is two-dimensional.

``````ind = tf.constant([[0, 0], [1, 1], [2, 0], [4, 3]])
# Collect the elements of x at (0, 0), (1, 1), (2, 0), (4, 3)
print(tf.gather_nd(x, ind))
# tf.Tensor(
# [[ 0  1]
#  [ 4  5]
#  [ 6  7]
#  [26 27]], shape=(4, 2), dtype=int32)
``````
``````col = tf.constant([0, 0, 2, 1, 2])
# Collect the elements of x at (0, 0), (1, 0), (2, 2), (3, 1), (4, 2)
print(tf.gather(x.values, x.row_starts() + col))
# tf.Tensor(
# [[ 0  1]
#  [ 2  3]
#  [10 11]
#  [14 15]
#  [24 25]], shape=(5, 2), dtype=int32)
``````
``````col = tf.ragged.constant([[0], [], [0, 2], [1, 3], [2]])
# Collect the elements of x at (0, 0), (2, 0), (2, 2), (3, 1), (3, 3), (4, 2)

# Get the start index of each row of x
row_starts = tf.cast(x.row_starts(), "int32")
# For each entry of col, look up the row it belongs to, convert that to the row's start index in x, and add the column offset
ind_flat = tf.gather(row_starts, col.value_rowids()) + col.values
ret = tf.gather(x.values, ind_flat)
print(ret)
# tf.Tensor(
# [[ 0  1]
#  [ 6  7]
#  [10 11]
#  [14 15]
#  [18 19]
#  [24 25]], shape=(6, 2), dtype=int32)

# If you want to keep the original row information
print(tf.RaggedTensor.from_value_rowids(ret, col.value_rowids()))
# <tf.RaggedTensor [[[0, 1]], [], [[6, 7], [10, 11]], [[14, 15], [18, 19]], [[24, 25]]]>
``````