[PYTHON] Why you have to specify dtype when using keras pad_sequences

What is keras pad_sequences?

pad_sequences is a Keras utility that makes sequences of different lengths the same length, by padding the short ones with 0 and truncating the long ones.

For example:

>>> from keras.preprocessing import sequence
>>> import numpy as np
>>> data = [np.array([[1,2,3],[4,5,6]]),
...         np.array([[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15]])]
>>> data
[array([[1, 2, 3],
       [4, 5, 6]]), array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12],
       [13, 14, 15]])]
>>> #Align the length to 4.
>>> data = sequence.pad_sequences(data, maxlen=4,padding="post", truncating="post")
>>> data
array([[[ 1,  2,  3],
        [ 4,  5,  6],
        [ 0,  0,  0],
        [ 0,  0,  0]],

       [[ 1,  2,  3],
        [ 4,  5,  6],
        [ 7,  8,  9],
        [10, 11, 12]]], dtype=int32)

Why you have to specify dtype

If you do not specify dtype, pad_sequences returns an int32 array by default.

As a result, if the original data contains floating-point values, they are silently cast to int32.

For example, 0.1 becomes 0.

When dtype is not specified:

>>> from keras.preprocessing import sequence
>>> import numpy as np
>>> #data mixed with float
>>> data = [np.array([[0.1,0.2,0.3],[0.4,0.5,0.6]]),
...         np.array([[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15]])]
>>> data
[array([[0.1, 0.2, 0.3],
       [0.4, 0.5, 0.6]]), array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12],
       [13, 14, 15]])]
>>> #Align the length to 4.
>>> data = sequence.pad_sequences(data, maxlen=4,padding="post", truncating="post")
>>> #The float values are cast to int32 without warning, so 0.1 to 0.6 all become 0
>>> data
array([[[ 0,  0,  0],
        [ 0,  0,  0],
        [ 0,  0,  0],
        [ 0,  0,  0]],

       [[ 1,  2,  3],
        [ 4,  5,  6],
        [ 7,  8,  9],
        [10, 11, 12]]], dtype=int32)
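
The zeros in the first sequence are consistent with how NumPy converts floats to integers: the fractional part is simply dropped. Here is a minimal NumPy-only sketch of that cast, assuming pad_sequences performs an equivalent conversion internally. The sample values are made up for illustration.

```python
import numpy as np

# Illustrative values only; anything below 1.0 loses its whole value.
floats = np.array([0.1, 0.6, 0.9, 1.7])

# Casting to int32 truncates toward zero; it does not round.
as_int = np.asarray(floats, dtype="int32")
print(as_int)  # [0 0 0 1]
```

Note that 0.9 becomes 0 and 1.7 becomes 1, so the loss is not limited to very small values.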

Conclusion

When your data contains floats, always pass dtype explicitly to pad_sequences.

sequence.pad_sequences(data, maxlen=4, padding="post",
                       truncating="post", dtype="float32")
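
To see the effect of dtype without installing Keras, the following is a rough NumPy stand-in for post-padding and post-truncating. The function name pad_post is hypothetical and this is not the Keras implementation; it only assumes that the output array is allocated with the requested dtype and that each sequence is cast into it.

```python
import numpy as np

def pad_post(sequences, maxlen, dtype="int32", value=0):
    """Hypothetical stand-in for pad_sequences(padding="post", truncating="post")."""
    # Shape of one time step, e.g. (3,) for the data below.
    sample_shape = np.asarray(sequences[0]).shape[1:]
    out = np.full((len(sequences), maxlen) + sample_shape, value, dtype=dtype)
    for i, seq in enumerate(sequences):
        # Truncate to maxlen, then cast to the target dtype (floats are lost if it is int32).
        trunc = np.asarray(seq[:maxlen], dtype=dtype)
        out[i, :len(trunc)] = trunc
    return out

data = [np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]),
        np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15]])]

as_int = pad_post(data, maxlen=4)                     # default dtype: first sequence becomes all zeros
as_float = pad_post(data, maxlen=4, dtype="float32")  # float values survive
print(as_int[0])
print(as_float[0])
```

With dtype="float32" the first sequence keeps 0.1 to 0.6 and only the padded rows are 0, which is the behavior you want from the real pad_sequences call above.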
