Introduction

RaggedTensor, which represents variable-length data, is available in TensorFlow 2.1 and later, but if you try to write code for it the same way as for an ordinary Tensor, there are various pitfalls. This time, I will implement the windowing (framing) used in signal processing. That is, a frame of fixed width is shifted little by little, and the part of the waveform that falls inside the frame is extracted at each position.

Verification environment

Ubuntu 18.04
Python 3.6.9
TensorFlow 2.2.0 (CPU)

What I want to do

Treat x as a batch and cut short-time frames out of each row of data. The rows have different lengths. Here the frame width is 2 and the frame position is shifted by 1 each time, so [3, 1, 4, 1] becomes [[3, 1], [1, 4], [4, 1]].
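As a reference for the expected result, the framing can be sketched in plain Python lists, without TensorFlow. The helper name frame_row is hypothetical and is not part of any library; this is only a minimal sketch of the behavior described above.

```python
def frame_row(row, frame_length, frame_step):
    """Return every full window of frame_length samples, moving frame_step each time.

    Hypothetical helper for illustration only (plain Python, not TensorFlow).
    """
    last_start = len(row) - frame_length  # last index where a full frame still fits
    return [row[i:i + frame_length] for i in range(0, last_start + 1, frame_step)]


batch = [[3, 1, 4, 1], [], [5, 9, 2], [6], []]
expected = [frame_row(row, 2, 1) for row in batch]
print(expected)
# [[[3, 1], [1, 4], [4, 1]], [], [[5, 9], [9, 2]], [], []]
```

Rows shorter than the frame width simply produce no frames, which is the behavior wanted from the RaggedTensor version as well.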

For an ordinary Tensor there is a handy function called tf.signal.frame, but unfortunately it cannot be used with a RaggedTensor.

import tensorflow as tf

x = tf.ragged.constant([[3, 1, 4, 1], [], [5, 9, 2], [6], []])
print(tf.signal.frame(x, 2, 1))  # NG: raises an error
# ValueError: TypeError: object of type 'RaggedTensor' has no len()
print(tf.signal.frame(x.to_tensor(), 2, 1))  # Works, but the result contains many padding zeros
# tf.Tensor(
# [[[3 1]
#   [1 4]
#   [4 1]]
# 
#  [[0 0]
#   [0 0]
#   [0 0]]
# 
#  [[5 9]
#   [9 2]
#   [2 0]]
# 
#  [[6 0]
#   [0 0]
#   [0 0]]
# 
#  [[0 0]
#   [0 0]
#   [0 0]]], shape=(5, 3, 2), dtype=int32)

Solution

Work from x.values, which holds the flattened values, together with the offset and length of each row. All of these can be obtained from the RaggedTensor.

print(x.values)        # Flat Tensor holding all the values
# tf.Tensor([3 1 4 1 5 9 2 6], shape=(8,), dtype=int32)
print(x.row_starts())  # Start index (offset) of each row within values
# tf.Tensor([0 4 4 7 8], shape=(5,), dtype=int64)
print(x.row_lengths()) # Length of each row
# tf.Tensor([4 0 3 1 0], shape=(5,), dtype=int64)
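How these three pieces relate can be sketched with plain Python lists. The variable names below are chosen only for illustration; the sketch assumes nothing beyond the example data above.

```python
from itertools import accumulate

rows = [[3, 1, 4, 1], [], [5, 9, 2], [6], []]

values = [v for row in rows for v in row]   # flattened samples
row_lengths = [len(row) for row in rows]
# Each row starts where the previous rows end: an exclusive cumulative sum.
row_starts = [0] + list(accumulate(row_lengths))[:-1]

# Slicing the flat values with (start, length) recovers the original rows.
rebuilt = [values[s:s + n] for s, n in zip(row_starts, row_lengths)]

print(values)       # [3, 1, 4, 1, 5, 9, 2, 6]
print(row_starts)   # [0, 4, 4, 7, 8]
print(row_lengths)  # [4, 0, 3, 1, 0]
```

Because a start offset and a length are enough to locate a row inside the flat values, all of the framing can be done with index arithmetic on those two vectors.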

For each row of x, consider which indices of x.values the values should be taken from (*).

- Row 0 is [0, 1], [1, 2], [2, 3]
- Row 1 is empty
- Row 2 is [4, 5], [5, 6]
- Row 3 is empty
- Row 4 is empty
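The index list (*) can be reproduced from the row offsets and lengths alone. The following is a plain-Python sketch under that assumption; index_pairs is a name introduced here for illustration.

```python
row_starts = [0, 4, 4, 7, 8]
row_lengths = [4, 0, 3, 1, 0]

index_pairs = []
for s, n in zip(row_starts, row_lengths):
    # A width-2 frame can start at s, s+1, ..., s+n-2.
    # range() is empty when the row has fewer than 2 samples.
    index_pairs.append([[i, i + 1] for i in range(s, s + n - 1)])

print(index_pairs)
# [[[0, 1], [1, 2], [2, 3]], [], [[4, 5], [5, 6]], [], []]
```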

First, build a RaggedTensor that holds the first index of each pair in (*):

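s = x.row_starts()           # offset of each row within x.values
e = s + x.row_lengths() - 1  # exclusive limit for the frame start indices
r = tf.ragged.range(s, e)    # rows with fewer than 2 samples become empty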
print(r)
# <tf.RaggedTensor [[0, 1, 2], [], [4, 5], [], []]>

Next, stacking these indices with the same indices advanced by one shows which positions in x.values supply the values of the expected windowed result. The result matches the list (*) above.

ind = tf.stack([r, r+1], axis=2)
print(ind)
# <tf.RaggedTensor [[[0, 1], [1, 2], [2, 3]], [], [[4, 5], [5, 6]], [], []]>

After that, use tf.gather() to pick the values out of x.values according to the indices stored in ind.

ret = tf.gather(x.values, ind)
print(ret)
# <tf.RaggedTensor [[[3, 1], [1, 4], [4, 1]], [], [[5, 9], [9, 2]], [], []]>
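What the gather step does to a nested index structure can be sketched in plain Python. The function gather_nested is hypothetical and only mimics the lookup on nested lists; it is not how TensorFlow implements tf.gather.

```python
def gather_nested(values, indices):
    """Replace every integer in a nested index list by values[integer].

    Hypothetical helper for illustration (plain Python, not TensorFlow).
    """
    if isinstance(indices, list):
        return [gather_nested(values, i) for i in indices]
    return values[indices]


values = [3, 1, 4, 1, 5, 9, 2, 6]
ind = [[[0, 1], [1, 2], [2, 3]], [], [[4, 5], [5, 6]], [], []]
print(gather_nested(values, ind))
# [[[3, 1], [1, 4], [4, 1]], [], [[5, 9], [9, 2]], [], []]
```

The nesting of the result follows the nesting of the indices, which is why the output keeps the ragged row structure of ind.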

When the frame length is 3 or more

The way to build e and ind is slightly different, but the general idea is the same. Broadcasting is used to create ind. For that, a dimension of length 1 is added at the end with r[:, :, tf.newaxis].

len_frame = 3
s = x.row_starts()
e = s + x.row_lengths() + 1 - len_frame
r = tf.ragged.range(s, e)
ind = r[:, :, tf.newaxis] + tf.range(0, len_frame, dtype="int64")
ret = tf.gather(x.values, ind)
print(ret)
# <tf.RaggedTensor [[[3, 1, 4], [1, 4, 1]], [], [[5, 9, 2]], [], []]>

Of course, this also works when len_frame = 2.
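The number of frames each row yields follows directly from e - s. A plain-Python sketch of that count, assuming a frame shift of 1 (frames_per_row is a name made up for this illustration):

```python
def frames_per_row(row_lengths, len_frame):
    """Number of full frames per row when the frame shift is 1."""
    # Mirrors e - s = row_length + 1 - len_frame, clipped at 0 because a range
    # whose limit does not exceed its start is empty.
    return [max(0, n - len_frame + 1) for n in row_lengths]


row_lengths = [4, 0, 3, 1, 0]
for len_frame in (2, 3, 4):
    print(len_frame, frames_per_row(row_lengths, len_frame))
# 2 [3, 0, 2, 0, 0]
# 3 [2, 0, 1, 0, 0]
# 4 [1, 0, 0, 0, 0]
```

With len_frame = 2 the limit reduces to s + row_length - 1, which is the same expression used in the width-2 version.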

When the frame shift is 2 or more

Just change the step size of r.

len_frame = 2
shift_frame = 2
s = x.row_starts()
e = s + x.row_lengths() + 1 - len_frame
r = tf.ragged.range(s, e, shift_frame)
ind = r[:, :, tf.newaxis] + tf.range(0, len_frame, dtype="int64")
ret = tf.gather(x.values, ind)
print(ret)
# <tf.RaggedTensor [[[3, 1], [4, 1]], [], [[5, 9]], [], []]>

This also works with shift_frame = 1.
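With a shift larger than 1, samples at the end of a row that do not fill a whole frame are dropped. The plain-Python sketch below lists the frame start offsets within each row and the number of leftover samples; frame_offsets and summary are hypothetical names used only here.

```python
def frame_offsets(n, len_frame, shift_frame):
    """Start offsets (relative to the row start) of the full frames in a row of n samples."""
    return list(range(0, n - len_frame + 1, shift_frame))


len_frame, shift_frame = 2, 2
summary = []
for n in [4, 0, 3, 1, 0]:
    offsets = frame_offsets(n, len_frame, shift_frame)
    # Samples up to the end of the last frame are used; the rest are left over.
    used = offsets[-1] + len_frame if offsets else 0
    summary.append((n, offsets, n - used))

print(summary)
# [(4, [0, 2], 0), (0, [], 0), (3, [0], 1), (1, [], 1), (0, [], 0)]
```

For the row [5, 9, 2], only the frame [5, 9] is produced and the last sample 2 is left over, which agrees with the printed result above.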

If the sample is multidimensional

For example, in stereo audio, the L and R values are stored in pairs.

x = tf.ragged.constant([[[3, 2], [1, 7], [4, 1], [1, 8]], [], [[5, 2], [9, 8], [2, 1]], [[6, 8]], []])

In fact, it works in exactly the same way as before.

len_frame = 2
shift_frame = 1
s = x.row_starts()
e = s + x.row_lengths() + 1 - len_frame
r = tf.ragged.range(s, e, shift_frame)
ind = r[:, :, tf.newaxis] + tf.range(0, len_frame, dtype="int64")
ret = tf.gather(x.values, ind)
print(ret)
# <tf.RaggedTensor [[[[3, 2], [1, 7]], [[1, 7], [4, 1]], [[4, 1], [1, 8]]], [], [[[5, 2], [9, 8]], [[9, 8], [2, 1]]], [], []]>

There are now so many dimensions that it is hard to tell at a glance whether the result is correct, but it should be fine.
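One way to check the result without relying on eyesight is to compare it with a plain-Python reference. The sketch below assumes the printed RaggedTensor above is copied by hand into a nested list (printed_result); frame_samples is a hypothetical helper, not a TensorFlow function.

```python
def frame_samples(row, len_frame, shift_frame):
    """Frame a list of samples; each sample may itself be a list such as [L, R]."""
    return [row[i:i + len_frame] for i in range(0, len(row) - len_frame + 1, shift_frame)]


stereo = [[[3, 2], [1, 7], [4, 1], [1, 8]], [], [[5, 2], [9, 8], [2, 1]], [[6, 8]], []]
reference = [frame_samples(row, 2, 1) for row in stereo]

# The RaggedTensor output shown above, copied by hand as a nested list.
printed_result = [
    [[[3, 2], [1, 7]], [[1, 7], [4, 1]], [[4, 1], [1, 8]]],
    [],
    [[[5, 2], [9, 8]], [[9, 8], [2, 1]]],
    [],
    [],
]
print(reference == printed_result)
# True
```

Slicing acts only on the time axis, so each [L, R] pair is carried along unchanged, just as the gather on x.values picks whole samples.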
