Neural Machine Translation with Attention

If you want to read more about this type of Machine Translation, Wikipedia is a good source.

This notebook was inspired by the Deeplearning.ai Sequence Models course, specifically its attention mechanism lessons.

The idea is to build a Neural Machine Translation (NMT) model to translate human readable dates ("10th of September, 1978") into machine readable dates ("1978-09-10"), using an attention model.

Attention mechanism

If you had to translate a book's paragraph from French to English, you would not read the whole paragraph, then close the book and translate. Even during the translation process, you would read/re-read and focus on the parts of the French paragraph corresponding to the parts of the English you are writing down.

The attention mechanism tells a Neural Machine Translation model where it should pay attention at each step. Andrew Ng explains this mechanism quite well in these two videos: Attention model intuition and Attention model

Here is a figure to remind you how the model works. The diagram on the left shows the attention model:

Here are some properties of the model:

  • There are two separate LSTMs in this model (see diagram on the left). Because the one at the bottom of the picture is a Bi-directional LSTM and comes before the attention mechanism, we will call it pre-attention Bi-LSTM. The LSTM at the top of the diagram comes after the attention mechanism, so we will call it the post-attention LSTM. The pre-attention Bi-LSTM goes through $T_x$ time steps; the post-attention LSTM goes through $T_y$ time steps.

  • The post-attention LSTM passes $s^{\langle t \rangle}, c^{\langle t \rangle}$ from one time step to the next. The LSTM has both the output activation $s^{\langle t\rangle}$ and the hidden cell state $c^{\langle t\rangle}$. In this model the post-attention LSTM at time $t$ will not take the specific generated $y^{\langle t-1 \rangle}$ as input; it only takes $s^{\langle t\rangle}$ and $c^{\langle t\rangle}$ as input. We have designed the model this way because (unlike language generation, where adjacent characters are highly correlated) there isn't as strong a dependency between the previous character and the next character in a YYYY-MM-DD date.

  • We use $a^{\langle t \rangle} = [\overrightarrow{a}^{\langle t \rangle}; \overleftarrow{a}^{\langle t \rangle}]$ to represent the concatenation of the activations of both the forward-direction and backward-directions of the pre-attention Bi-LSTM.

  • The diagram on the right uses a RepeatVector node to copy $s^{\langle t-1 \rangle}$'s value $T_x$ times, and then Concatenation to concatenate $s^{\langle t-1 \rangle}$ and $a^{\langle t' \rangle}$ to compute $e^{\langle t, t' \rangle}$, which is then passed through a softmax to compute $\alpha^{\langle t, t' \rangle}$ (written out below). We'll explain how to use RepeatVector and Concatenation in Keras below.

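For reference, the softmax that produces the attention weights normalizes the energies across the input positions (this is the standard additive-attention definition, stated here explicitly):

$$\alpha^{\langle t, t' \rangle} = \frac{\exp\left(e^{\langle t, t' \rangle}\right)}{\sum_{t'' = 1}^{T_x} \exp\left(e^{\langle t, t'' \rangle}\right)}$$

The weights at step $t$ are therefore positive and sum to 1 across the $T_x$ input positions, so the context vector is a convex combination of the pre-attention activations.
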
Let's begin importing the necessary packages:

In [1]:
from keras.layers import Bidirectional, Concatenate, Permute, Dot, Input, LSTM, Multiply
from keras.layers import RepeatVector, Dense, Activation, Lambda
from keras.optimizers import Adam
from keras.utils import to_categorical
from keras.models import load_model, Model
import keras.backend as K
import numpy as np
from faker import Faker
import random
from tqdm import tqdm
from babel.dates import format_date
import matplotlib.pyplot as plt
%matplotlib inline
Using TensorFlow backend.

Step 1: Generate Dataset

Next, we'll use Faker to generate our own dataset of human readable dates, with their ISO format (machine readable dates) as labels.

In [2]:
fake = Faker()

# We need to seed these guys. For some reason I always use 101
fake.seed(101)
random.seed(101)

We're going to generate dates in several different formats. Duplicating a format in the list below gives it a higher chance of being picked by random.choice, so the most human readable formats appear most often (e.g. 'long' appears 5 times out of 21 entries).

In [3]:
FORMATS = ['short', # d/M/YY
           'medium', # MMM d, YYY
           'medium',
           'medium',
           'long', # MMMM dd, YYY
           'long',
           'long',
           'long',
           'long',
           'full', # EEEE, MMM dd, YYY
           'full',
           'full',
           'd MMM YYY', 
           'd MMMM YYY',
           'd MMMM YYY',
           'd MMMM YYY',
           'd MMMM YYY',
           'd MMMM YYY',
           'dd/MM/YYY',
           'EE d, MMM YYY',
           'EEEE d, MMMM YYY']

Let's have a look at those formats:

In [4]:
for format in FORMATS:
    print('%s => %s' %(format, format_date(fake.date_object(), format=format, locale='en')))
short => 7/19/09
medium => Apr 3, 1983
medium => Sep 11, 2006
medium => May 29, 1994
long => October 15, 2001
long => April 20, 1973
long => February 24, 2015
long => April 7, 2004
long => August 6, 1984
full => Monday, December 20, 2010
full => Friday, February 1, 1985
full => Sunday, August 20, 1989
d MMM YYY => 14 Jan 2003
d MMMM YYY => 13 February 2017
d MMMM YYY => 14 June 1984
d MMMM YYY => 23 May 1992
d MMMM YYY => 22 October 1999
d MMMM YYY => 15 October 1974
dd/MM/YYY => 16/05/1987
EE d, MMM YYY => Sat 26, Feb 1983
EEEE d, MMMM YYY => Wednesday 17, December 1980

random_date() generates a random date using a format picked at random from our FORMATS list defined above. It returns a tuple with the human readable date, the machine readable date, and the underlying date object:

In [5]:
def random_date():
    dt = fake.date_object()

    try:
        date = format_date(dt, format=random.choice(FORMATS), locale='en')
        human_readable = date.lower().replace(',', '')
        machine_readable = dt.isoformat()

    except AttributeError as e:
        return None, None, None

    return human_readable, machine_readable, dt
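
As a quick illustration (this snippet is not part of the original notebook), a single call returns the three values:

# Illustrative usage of random_date(); the actual values depend on the seeds above.
h, m_str, dt = random_date()
print(h)      # e.g. '26 july 2006'  (human readable)
print(m_str)  # e.g. '2006-07-26'    (machine readable, ISO 8601)
print(dt)     # the underlying datetime.date object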

create_dataset(m) will generate our dataset, taking m as the number of samples to create. It returns the dataset as a list of tuples, two dictionaries mapping character to index (these are our vocabularies, human and machine), and the inverse mapping for the machine vocabulary, inv_machine, from index to character:

In [6]:
def create_dataset(m):
    human_vocab = set()
    machine_vocab = set()
    dataset = []
    
    for i in tqdm(range(m)):
        h, mach, _ = random_date()
        if h is not None:
            dataset.append((h, mach))
            human_vocab.update(tuple(h))
            machine_vocab.update(tuple(mach))
    
    # We also add two special chars, <unk> for unknown characters, and <pad> to add padding at the end
    human = dict(zip(sorted(human_vocab) + ['<unk>', '<pad>'], list(range(len(human_vocab) + 2))))
    inv_machine = dict(enumerate(sorted(machine_vocab)))
    machine = {v: k for k, v in inv_machine.items()}
 
    return dataset, human, machine, inv_machine

Let's generate a dataset with 30k samples. That's probably more than we need, but it should do the job:

In [7]:
m = 30000
dataset, human_vocab, machine_vocab, inv_machine_vocab = create_dataset(m)
100%|██████████| 30000/30000 [00:01<00:00, 16483.42it/s]

Let's inspect the first 10 entries. Remember the dataset is a list of tuples => (human readable, machine readable):

In [8]:
dataset[:10]
Out[8]:
[('18/01/1976', '1976-01-18'),
 ('april 18 2000', '2000-04-18'),
 ('26 july 2006', '2006-07-26'),
 ('saturday december 10 1994', '1994-12-10'),
 ('15 january 1983', '1983-01-15'),
 ('feb 5 2002', '2002-02-05'),
 ('9 march 1992', '1992-03-09'),
 ('april 2 1986', '1986-04-02'),
 ('sat 18 may 2013', '2013-05-18'),
 ('june 26 1997', '1997-06-26')]

Let's have a look at our human readable vocabulary:

In [9]:
human_vocab
Out[9]:
{' ': 0,
 '/': 1,
 '0': 2,
 '1': 3,
 '2': 4,
 '3': 5,
 '4': 6,
 '5': 7,
 '6': 8,
 '7': 9,
 '8': 10,
 '9': 11,
 'a': 12,
 'b': 13,
 'c': 14,
 'd': 15,
 'e': 16,
 'f': 17,
 'g': 18,
 'h': 19,
 'i': 20,
 'j': 21,
 'l': 22,
 'm': 23,
 'n': 24,
 'o': 25,
 'p': 26,
 'r': 27,
 's': 28,
 't': 29,
 'u': 30,
 'v': 31,
 'w': 32,
 'y': 33,
 '<unk>': 34,
 '<pad>': 35}

Machine readable vocabulary:

In [10]:
machine_vocab
Out[10]:
{'-': 0,
 '0': 1,
 '1': 2,
 '2': 3,
 '3': 4,
 '4': 5,
 '5': 6,
 '6': 7,
 '7': 8,
 '8': 9,
 '9': 10}

... and its inverse dictionary:

In [11]:
inv_machine_vocab
Out[11]:
{0: '-',
 1: '0',
 2: '1',
 3: '2',
 4: '3',
 5: '4',
 6: '5',
 7: '6',
 8: '7',
 9: '8',
 10: '9'}

Step 2: Preprocessing

preprocess_data(dataset, human_vocab, machine_vocab, Tx, Ty) is going to do some beautiful magic with our dataset. It takes the whole dataset and both the human and machine vocabularies, plus the max length arguments, and it'll spit out our training set and target labels, plus the one-hot encodings of both:

In [12]:
def preprocess_data(dataset, human_vocab, machine_vocab, Tx, Ty):
    X, Y = zip(*dataset)
    
    X = np.array([string_to_int(i, Tx, human_vocab) for i in X])
    Y = [string_to_int(t, Ty, machine_vocab) for t in Y]
    
    Xoh = np.array(list(map(lambda x: to_categorical(x, num_classes=len(human_vocab)), X)))
    Yoh = np.array(list(map(lambda x: to_categorical(x, num_classes=len(machine_vocab)), Y)))

    return X, np.array(Y), Xoh, Yoh

string_to_int(string, length, vocab) returns a list of indices for a string given a vocabulary vocab, cropping or padding depending on the max length passed in:

In [13]:
def string_to_int(string, length, vocab):
    string = string.lower()
    string = string.replace(',','')
    
    # Crop strings longer than the max length
    if len(string) > length:
        string = string[:length]
    
    # Map each character to its index, falling back to the <unk> index
    # for characters not in the vocabulary
    rep = list(map(lambda x: vocab.get(x, vocab['<unk>']), string))
    
    # Pad shorter strings with the <pad> index up to the max length
    if len(string) < length:
        rep += [vocab['<pad>']] * (length - len(string))
    
    return rep
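
To see the <unk> fallback in action, here is a quick illustrative check ('x' never occurs in the generated dates, so it is not in human_vocab):

# 'x' maps to the <unk> index (34 in our vocabulary); the remaining slots
# are filled with the <pad> index (35):
string_to_int('xmas', 8, human_vocab)
# => [34, 23, 12, 28, 35, 35, 35, 35]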

Let's have a look at an example. By the way, that's my birthday 😉:

In [14]:
string_to_int('September 10, 1978', 30, human_vocab)
Out[14]:
[28,
 16,
 26,
 29,
 16,
 23,
 13,
 16,
 27,
 0,
 3,
 2,
 0,
 3,
 11,
 9,
 10,
 35,
 35,
 35,
 35,
 35,
 35,
 35,
 35,
 35,
 35,
 35,
 35,
 35]

Let's run the preprocessing and print out some shapes:

In [15]:
Tx = 30
Ty = 10
X, Y, Xoh, Yoh = preprocess_data(dataset, human_vocab, machine_vocab, Tx, Ty)

print("X.shape:", X.shape)
print("Y.shape:", Y.shape)
print("Xoh.shape:", Xoh.shape)
print("Yoh.shape:", Yoh.shape)
X.shape: (30000, 30)
Y.shape: (30000, 10)
Xoh.shape: (30000, 30, 36)
Yoh.shape: (30000, 10, 11)

... and see what a training sample, target label and their respective one hot encoding look like:

In [16]:
index = 0
print("Source date:", dataset[index][0])
print("Target date:", dataset[index][1])
print()
print("Source after preprocessing (indices):", X[index])
print("Target after preprocessing (indices):", Y[index])
print()
print("Source after preprocessing (one-hot):", Xoh[index])
print("Target after preprocessing (one-hot):", Yoh[index])
Source date: 18/01/1976
Target date: 1976-01-18

Source after preprocessing (indices): [ 3 10  1  2  3  1  3 11  9  8 35 35 35 35 35 35 35 35 35 35 35 35 35 35
 35 35 35 35 35 35]
Target after preprocessing (indices): [ 2 10  8  7  0  1  2  0  2  9]

Source after preprocessing (one-hot): [[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 1. 0. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 1.]
 [0. 0. 0. ... 0. 0. 1.]
 [0. 0. 0. ... 0. 0. 1.]]
Target after preprocessing (one-hot): [[0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
 [0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]]

This is what we have now:

  • X: a processed version of the human readable dates in the training set, where each character is replaced by an index mapped to the character via human_vocab. Each date is further padded to Tx values with a special character <pad>. X.shape = (m, Tx)

  • Y: a processed version of the machine readable dates in the training set, where each character is replaced by the index it is mapped to in machine_vocab. You should have Y.shape = (m, Ty).

  • Xoh: one-hot version of X, the "1" entry's index is mapped to the character thanks to human_vocab. Xoh.shape = (m, Tx, len(human_vocab)).

  • Yoh: one-hot version of Y, the "1" entry's index is mapped to the character thanks to machine_vocab. Yoh.shape = (m, Ty, len(machine_vocab)). Here, len(machine_vocab) = 11 since there are 11 characters ('-' as well as 0-9). (See the sanity check below.)
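
As a quick round-trip sanity check (illustrative; inv_human_vocab is a helper built on the spot, not defined earlier in the notebook), we can invert human_vocab and decode a row of X back into text:

# Invert the human vocabulary (index -> char) and decode the first sample,
# dropping the trailing <pad> tokens:
inv_human_vocab = {v: k for k, v in human_vocab.items()}
print(''.join(inv_human_vocab[i] for i in X[0] if inv_human_vocab[i] != '<pad>'))
# => '18/01/1976'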

Step 3: Define Model

Let's define the layers we need as global variables, so the same weights are shared across all $T_y$ output steps: RepeatVector(), Concatenate(), Dense(), Activation(), Dot():

In [17]:
repeator = RepeatVector(Tx)                  # copies s_prev Tx times, one per input step
concatenator = Concatenate(axis=-1)          # joins each a<t'> with the repeated s_prev
densor1 = Dense(10, activation = "tanh")     # first layer of the small energy network
densor2 = Dense(1, activation = "relu")      # one energy value e<t,t'> per input step
activator = Activation('softmax', name='attention_weights')  # turns energies into alphas
dotor = Dot(axes = 1)                        # weighted sum of the a<t'> -> context vector

one_step_attention(a, s_prev): At step $t$, given all the hidden states of the Bi-LSTM ($[a^{<1>},a^{<2>}, ..., a^{<T_x>}]$) and the previous hidden state of the second LSTM ($s^{<t-1>}$), one_step_attention() will compute the attention weights ($[\alpha^{<t,1>},\alpha^{<t,2>}, ..., \alpha^{<t,T_x>}]$) and output the context vector: $$context^{<t>} = \sum_{t' = 1}^{T_x} \alpha^{<t,t'>}a^{<t'>}\tag{1}$$

In [18]:
def one_step_attention(a, s_prev):
    s_prev = repeator(s_prev)            # (m, n_s) -> (m, Tx, n_s)
    concat = concatenator([a, s_prev])   # (m, Tx, 2*n_a + n_s)
    e = densor1(concat)                  # intermediate energies, (m, Tx, 10)
    energies = densor2(e)                # one energy per input step, (m, Tx, 1)
    # Note: Keras' built-in softmax normalizes over the last axis; a softmax
    # over the Tx axis would match equation (1) exactly.
    alphas = activator(energies)         # attention weights alpha<t,t'>
    context = dotor([alphas, a])         # weighted sum of the a<t'>, (m, 1, 2*n_a)
    
    return context
In [19]:
n_a = 32   # hidden units per direction of the pre-attention Bi-LSTM
n_s = 64   # hidden units of the post-attention LSTM
post_activation_LSTM_cell = LSTM(n_s, return_state = True)   # returns output, hidden and cell state
output_layer = Dense(len(machine_vocab), activation='softmax')

model(Tx, Ty, n_a, n_s, human_vocab_size, machine_vocab_size): Implements the entire model. It first runs the input through a Bidirectional LSTM to get back $[a^{<1>},a^{<2>}, ..., a^{<T_x>}]$. Then, it calls one_step_attention() $T_y$ times in a for loop. At each iteration of this loop, it gives the computed context vector $context^{<t>}$ to the post-attention LSTM, and runs the output of the LSTM through a dense layer with softmax activation to generate a prediction $\hat{y}^{<t>}$.

In [20]:
def model(Tx, Ty, n_a, n_s, human_vocab_size, machine_vocab_size):
    # One-hot source sequence plus the initial states of the post-attention LSTM
    X = Input(shape=(Tx, human_vocab_size))
    s0 = Input(shape=(n_s,), name='s0')
    c0 = Input(shape=(n_s,), name='c0')
    s = s0
    c = c0
    
    outputs = []
    
    # Pre-attention Bi-LSTM over the whole input sequence
    a = Bidirectional(LSTM(n_a, return_sequences = True))(X)
    
    for t in range(Ty):
        context = one_step_attention(a, s)   # attend over all Tx input steps
        s, _, c = post_activation_LSTM_cell(context, initial_state=[s, c])
        out = output_layer(s)                # softmax over machine_vocab for step t
        outputs.append(out)
    
    model = Model([X, s0, c0], outputs)
    return model

Let's instantiate the model and print a summary of its layers:

In [21]:
mod = model(Tx, Ty, n_a, n_s, len(human_vocab), len(machine_vocab))
In [22]:
mod.summary()
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            (None, 30, 36)       0                                            
__________________________________________________________________________________________________
s0 (InputLayer)                 (None, 64)           0                                            
__________________________________________________________________________________________________
bidirectional_1 (Bidirectional) (None, 30, 64)       17664       input_1[0][0]                    
__________________________________________________________________________________________________
repeat_vector_1 (RepeatVector)  (None, 30, 64)       0           s0[0][0]                         
                                                                 lstm_1[0][0]                     
                                                                 lstm_1[1][0]                     
                                                                 lstm_1[2][0]                     
                                                                 lstm_1[3][0]                     
                                                                 lstm_1[4][0]                     
                                                                 lstm_1[5][0]                     
                                                                 lstm_1[6][0]                     
                                                                 lstm_1[7][0]                     
                                                                 lstm_1[8][0]                     
__________________________________________________________________________________________________
concatenate_1 (Concatenate)     (None, 30, 128)      0           bidirectional_1[0][0]            
                                                                 repeat_vector_1[0][0]            
                                                                 bidirectional_1[0][0]            
                                                                 repeat_vector_1[1][0]            
                                                                 bidirectional_1[0][0]            
                                                                 repeat_vector_1[2][0]            
                                                                 bidirectional_1[0][0]            
                                                                 repeat_vector_1[3][0]            
                                                                 bidirectional_1[0][0]            
                                                                 repeat_vector_1[4][0]            
                                                                 bidirectional_1[0][0]            
                                                                 repeat_vector_1[5][0]            
                                                                 bidirectional_1[0][0]            
                                                                 repeat_vector_1[6][0]            
                                                                 bidirectional_1[0][0]            
                                                                 repeat_vector_1[7][0]            
                                                                 bidirectional_1[0][0]            
                                                                 repeat_vector_1[8][0]            
                                                                 bidirectional_1[0][0]            
                                                                 repeat_vector_1[9][0]            
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 30, 10)       1290        concatenate_1[0][0]              
                                                                 concatenate_1[1][0]              
                                                                 concatenate_1[2][0]              
                                                                 concatenate_1[3][0]              
                                                                 concatenate_1[4][0]              
                                                                 concatenate_1[5][0]              
                                                                 concatenate_1[6][0]              
                                                                 concatenate_1[7][0]              
                                                                 concatenate_1[8][0]              
                                                                 concatenate_1[9][0]              
__________________________________________________________________________________________________
dense_2 (Dense)                 (None, 30, 1)        11          dense_1[0][0]                    
                                                                 dense_1[1][0]                    
                                                                 dense_1[2][0]                    
                                                                 dense_1[3][0]                    
                                                                 dense_1[4][0]                    
                                                                 dense_1[5][0]                    
                                                                 dense_1[6][0]                    
                                                                 dense_1[7][0]                    
                                                                 dense_1[8][0]                    
                                                                 dense_1[9][0]                    
__________________________________________________________________________________________________
attention_weights (Activation)  (None, 30, 1)        0           dense_2[0][0]                    
                                                                 dense_2[1][0]                    
                                                                 dense_2[2][0]                    
                                                                 dense_2[3][0]                    
                                                                 dense_2[4][0]                    
                                                                 dense_2[5][0]                    
                                                                 dense_2[6][0]                    
                                                                 dense_2[7][0]                    
                                                                 dense_2[8][0]                    
                                                                 dense_2[9][0]                    
__________________________________________________________________________________________________
dot_1 (Dot)                     (None, 1, 64)        0           attention_weights[0][0]          
                                                                 bidirectional_1[0][0]            
                                                                 attention_weights[1][0]          
                                                                 bidirectional_1[0][0]            
                                                                 attention_weights[2][0]          
                                                                 bidirectional_1[0][0]            
                                                                 attention_weights[3][0]          
                                                                 bidirectional_1[0][0]            
                                                                 attention_weights[4][0]          
                                                                 bidirectional_1[0][0]            
                                                                 attention_weights[5][0]          
                                                                 bidirectional_1[0][0]            
                                                                 attention_weights[6][0]          
                                                                 bidirectional_1[0][0]            
                                                                 attention_weights[7][0]          
                                                                 bidirectional_1[0][0]            
                                                                 attention_weights[8][0]          
                                                                 bidirectional_1[0][0]            
                                                                 attention_weights[9][0]          
                                                                 bidirectional_1[0][0]            
__________________________________________________________________________________________________
c0 (InputLayer)                 (None, 64)           0                                            
__________________________________________________________________________________________________
lstm_1 (LSTM)                   [(None, 64), (None,  33024       dot_1[0][0]                      
                                                                 s0[0][0]                         
                                                                 c0[0][0]                         
                                                                 dot_1[1][0]                      
                                                                 lstm_1[0][0]                     
                                                                 lstm_1[0][2]                     
                                                                 dot_1[2][0]                      
                                                                 lstm_1[1][0]                     
                                                                 lstm_1[1][2]                     
                                                                 dot_1[3][0]                      
                                                                 lstm_1[2][0]                     
                                                                 lstm_1[2][2]                     
                                                                 dot_1[4][0]                      
                                                                 lstm_1[3][0]                     
                                                                 lstm_1[3][2]                     
                                                                 dot_1[5][0]                      
                                                                 lstm_1[4][0]                     
                                                                 lstm_1[4][2]                     
                                                                 dot_1[6][0]                      
                                                                 lstm_1[5][0]                     
                                                                 lstm_1[5][2]                     
                                                                 dot_1[7][0]                      
                                                                 lstm_1[6][0]                     
                                                                 lstm_1[6][2]                     
                                                                 dot_1[8][0]                      
                                                                 lstm_1[7][0]                     
                                                                 lstm_1[7][2]                     
                                                                 dot_1[9][0]                      
                                                                 lstm_1[8][0]                     
                                                                 lstm_1[8][2]                     
__________________________________________________________________________________________________
dense_3 (Dense)                 (None, 11)           715         lstm_1[0][0]                     
                                                                 lstm_1[1][0]                     
                                                                 lstm_1[2][0]                     
                                                                 lstm_1[3][0]                     
                                                                 lstm_1[4][0]                     
                                                                 lstm_1[5][0]                     
                                                                 lstm_1[6][0]                     
                                                                 lstm_1[7][0]                     
                                                                 lstm_1[8][0]                     
                                                                 lstm_1[9][0]                     
==================================================================================================
Total params: 52,704
Trainable params: 52,704
Non-trainable params: 0
__________________________________________________________________________________________________
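
As a sanity check on these numbers: each direction of the pre-attention LSTM has $4\,(n_a (d + n_a) + n_a) = 4\,(32 \cdot (36 + 32) + 32) = 8832$ parameters for input dimension $d = 36$, so the Bidirectional wrapper reports $2 \times 8832 = 17664$. Likewise, the post-attention LSTM has $4\,(64 \cdot (64 + 64) + 64) = 33024$ parameters, and the three Dense layers contribute $128 \cdot 10 + 10 = 1290$, $10 \cdot 1 + 1 = 11$ and $64 \cdot 11 + 11 = 715$, which together give the 52,704 total shown above.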

Step 4: Train Model

Using the Adam optimizer, we compile and train our model:

In [23]:
opt = Adam(lr=0.005, beta_1=0.9, beta_2=0.999, decay=0.01)
mod.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
In [24]:
s0 = np.zeros((m, n_s))   # initial hidden state of the post-attention LSTM
c0 = np.zeros((m, n_s))   # initial cell state
outputs = list(Yoh.swapaxes(0,1))   # (m, Ty, 11) -> list of Ty arrays of shape (m, 11)
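
A quick shape check (illustrative, not in the original notebook) confirms we have one target array per output time step:

# The model has Ty = 10 outputs, so Keras expects a list of 10 target arrays,
# each of shape (m, len(machine_vocab)):
print(len(outputs), outputs[0].shape)  # => 10 (30000, 11)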
In [25]:
mod.fit([Xoh, s0, c0], outputs, epochs=30, batch_size=100)
Epoch 1/30
30000/30000 [==============================] - 49s 2ms/step - loss: 10.1373 - dense_3_loss: 1.9073 - dense_3_acc: 0.9395 - dense_3_acc_1: 0.9364 - dense_3_acc_2: 0.7474 - dense_3_acc_3: 0.3670 - dense_3_acc_4: 0.9074 - dense_3_acc_5: 0.6241 - dense_3_acc_6: 0.3347 - dense_3_acc_7: 0.8293 - dense_3_acc_8: 0.4759 - dense_3_acc_9: 0.2968
Epoch 2/30
30000/30000 [==============================] - 41s 1ms/step - loss: 3.4802 - dense_3_loss: 1.0036 - dense_3_acc: 0.9917 - dense_3_acc_1: 0.9926 - dense_3_acc_2: 0.8928 - dense_3_acc_3: 0.8248 - dense_3_acc_4: 0.9997 - dense_3_acc_5: 0.9765 - dense_3_acc_6: 0.7830 - dense_3_acc_7: 0.9998 - dense_3_acc_8: 0.8058 - dense_3_acc_9: 0.6294
Epoch 3/30
30000/30000 [==============================] - 41s 1ms/step - loss: 2.1113 - dense_3_loss: 0.6285 - dense_3_acc: 0.9934 - dense_3_acc_1: 0.9939 - dense_3_acc_2: 0.9040 - dense_3_acc_3: 0.9395 - dense_3_acc_4: 0.9999 - dense_3_acc_5: 0.9847 - dense_3_acc_6: 0.8909 - dense_3_acc_7: 0.9999 - dense_3_acc_8: 0.8668 - dense_3_acc_9: 0.7793
Epoch 4/30
30000/30000 [==============================] - 41s 1ms/step - loss: 1.5706 - dense_3_loss: 0.4694 - dense_3_acc: 0.9940 - dense_3_acc_1: 0.9945 - dense_3_acc_2: 0.9169 - dense_3_acc_3: 0.9618 - dense_3_acc_4: 0.9999 - dense_3_acc_5: 0.9853 - dense_3_acc_6: 0.9273 - dense_3_acc_7: 1.0000 - dense_3_acc_8: 0.9044 - dense_3_acc_9: 0.8492
Epoch 5/30
30000/30000 [==============================] - 41s 1ms/step - loss: 1.2789 - dense_3_loss: 0.3832 - dense_3_acc: 0.9942 - dense_3_acc_1: 0.9948 - dense_3_acc_2: 0.9336 - dense_3_acc_3: 0.9727 - dense_3_acc_4: 0.9999 - dense_3_acc_5: 0.9865 - dense_3_acc_6: 0.9426 - dense_3_acc_7: 0.9999 - dense_3_acc_8: 0.9254 - dense_3_acc_9: 0.8830
Epoch 6/30
30000/30000 [==============================] - 41s 1ms/step - loss: 1.0847 - dense_3_loss: 0.3299 - dense_3_acc: 0.9945 - dense_3_acc_1: 0.9950 - dense_3_acc_2: 0.9508 - dense_3_acc_3: 0.9811 - dense_3_acc_4: 1.0000 - dense_3_acc_5: 0.9879 - dense_3_acc_6: 0.9509 - dense_3_acc_7: 1.0000 - dense_3_acc_8: 0.9362 - dense_3_acc_9: 0.8990
Epoch 7/30
30000/30000 [==============================] - 41s 1ms/step - loss: 0.9479 - dense_3_loss: 0.2946 - dense_3_acc: 0.9947 - dense_3_acc_1: 0.9949 - dense_3_acc_2: 0.9643 - dense_3_acc_3: 0.9874 - dense_3_acc_4: 1.0000 - dense_3_acc_5: 0.9883 - dense_3_acc_6: 0.9553 - dense_3_acc_7: 0.9999 - dense_3_acc_8: 0.9426 - dense_3_acc_9: 0.9066
Epoch 8/30
30000/30000 [==============================] - 41s 1ms/step - loss: 0.8451 - dense_3_loss: 0.2687 - dense_3_acc: 0.9951 - dense_3_acc_1: 0.9957 - dense_3_acc_2: 0.9726 - dense_3_acc_3: 0.9907 - dense_3_acc_4: 1.0000 - dense_3_acc_5: 0.9888 - dense_3_acc_6: 0.9590 - dense_3_acc_7: 1.0000 - dense_3_acc_8: 0.9452 - dense_3_acc_9: 0.9117
Epoch 9/30
30000/30000 [==============================] - 41s 1ms/step - loss: 0.7679 - dense_3_loss: 0.2483 - dense_3_acc: 0.9952 - dense_3_acc_1: 0.9957 - dense_3_acc_2: 0.9779 - dense_3_acc_3: 0.9936 - dense_3_acc_4: 1.0000 - dense_3_acc_5: 0.9894 - dense_3_acc_6: 0.9613 - dense_3_acc_7: 1.0000 - dense_3_acc_8: 0.9496 - dense_3_acc_9: 0.9162
Epoch 10/30
30000/30000 [==============================] - 41s 1ms/step - loss: 0.7081 - dense_3_loss: 0.2333 - dense_3_acc: 0.9954 - dense_3_acc_1: 0.9957 - dense_3_acc_2: 0.9836 - dense_3_acc_3: 0.9959 - dense_3_acc_4: 1.0000 - dense_3_acc_5: 0.9894 - dense_3_acc_6: 0.9624 - dense_3_acc_7: 1.0000 - dense_3_acc_8: 0.9527 - dense_3_acc_9: 0.9202
Epoch 11/30
30000/30000 [==============================] - 41s 1ms/step - loss: 0.6579 - dense_3_loss: 0.2203 - dense_3_acc: 0.9956 - dense_3_acc_1: 0.9959 - dense_3_acc_2: 0.9873 - dense_3_acc_3: 0.9970 - dense_3_acc_4: 1.0000 - dense_3_acc_5: 0.9895 - dense_3_acc_6: 0.9641 - dense_3_acc_7: 1.0000 - dense_3_acc_8: 0.9556 - dense_3_acc_9: 0.9229
Epoch 12/30
30000/30000 [==============================] - 41s 1ms/step - loss: 0.6144 - dense_3_loss: 0.2089 - dense_3_acc: 0.9957 - dense_3_acc_1: 0.9961 - dense_3_acc_2: 0.9908 - dense_3_acc_3: 0.9974 - dense_3_acc_4: 1.0000 - dense_3_acc_5: 0.9900 - dense_3_acc_6: 0.9642 - dense_3_acc_7: 1.0000 - dense_3_acc_8: 0.9596 - dense_3_acc_9: 0.9266
Epoch 13/30
30000/30000 [==============================] - 41s 1ms/step - loss: 0.5796 - dense_3_loss: 0.1994 - dense_3_acc: 0.9959 - dense_3_acc_1: 0.9962 - dense_3_acc_2: 0.9926 - dense_3_acc_3: 0.9978 - dense_3_acc_4: 1.0000 - dense_3_acc_5: 0.9904 - dense_3_acc_6: 0.9660 - dense_3_acc_7: 1.0000 - dense_3_acc_8: 0.9615 - dense_3_acc_9: 0.9281
Epoch 14/30
30000/30000 [==============================] - 40s 1ms/step - loss: 0.5493 - dense_3_loss: 0.1903 - dense_3_acc: 0.9960 - dense_3_acc_1: 0.9959 - dense_3_acc_2: 0.9944 - dense_3_acc_3: 0.9979 - dense_3_acc_4: 1.0000 - dense_3_acc_5: 0.9906 - dense_3_acc_6: 0.9667 - dense_3_acc_7: 1.0000 - dense_3_acc_8: 0.9636 - dense_3_acc_9: 0.9317
Epoch 15/30
30000/30000 [==============================] - 40s 1ms/step - loss: 0.5219 - dense_3_loss: 0.1816 - dense_3_acc: 0.9960 - dense_3_acc_1: 0.9965 - dense_3_acc_2: 0.9949 - dense_3_acc_3: 0.9985 - dense_3_acc_4: 1.0000 - dense_3_acc_5: 0.9902 - dense_3_acc_6: 0.9666 - dense_3_acc_7: 1.0000 - dense_3_acc_8: 0.9671 - dense_3_acc_9: 0.9370
Epoch 16/30
30000/30000 [==============================] - 40s 1ms/step - loss: 0.4982 - dense_3_loss: 0.1747 - dense_3_acc: 0.9961 - dense_3_acc_1: 0.9965 - dense_3_acc_2: 0.9950 - dense_3_acc_3: 0.9982 - dense_3_acc_4: 1.0000 - dense_3_acc_5: 0.9908 - dense_3_acc_6: 0.9683 - dense_3_acc_7: 1.0000 - dense_3_acc_8: 0.9691 - dense_3_acc_9: 0.9393
Epoch 17/30
30000/30000 [==============================] - 40s 1ms/step - loss: 0.4761 - dense_3_loss: 0.1677 - dense_3_acc: 0.9963 - dense_3_acc_1: 0.9965 - dense_3_acc_2: 0.9957 - dense_3_acc_3: 0.9987 - dense_3_acc_4: 1.0000 - dense_3_acc_5: 0.9908 - dense_3_acc_6: 0.9681 - dense_3_acc_7: 1.0000 - dense_3_acc_8: 0.9705 - dense_3_acc_9: 0.9426
Epoch 18/30
30000/30000 [==============================] - 40s 1ms/step - loss: 0.4580 - dense_3_loss: 0.1610 - dense_3_acc: 0.9965 - dense_3_acc_1: 0.9965 - dense_3_acc_2: 0.9960 - dense_3_acc_3: 0.9987 - dense_3_acc_4: 1.0000 - dense_3_acc_5: 0.9912 - dense_3_acc_6: 0.9694 - dense_3_acc_7: 1.0000 - dense_3_acc_8: 0.9731 - dense_3_acc_9: 0.9465
Epoch 19/30
30000/30000 [==============================] - 40s 1ms/step - loss: 0.4399 - dense_3_loss: 0.1544 - dense_3_acc: 0.9964 - dense_3_acc_1: 0.9968 - dense_3_acc_2: 0.9959 - dense_3_acc_3: 0.9988 - dense_3_acc_4: 1.0000 - dense_3_acc_5: 0.9910 - dense_3_acc_6: 0.9695 - dense_3_acc_7: 1.0000 - dense_3_acc_8: 0.9748 - dense_3_acc_9: 0.9498
Epoch 20/30
30000/30000 [==============================] - 40s 1ms/step - loss: 0.4231 - dense_3_loss: 0.1482 - dense_3_acc: 0.9967 - dense_3_acc_1: 0.9969 - dense_3_acc_2: 0.9965 - dense_3_acc_3: 0.9990 - dense_3_acc_4: 1.0000 - dense_3_acc_5: 0.9915 - dense_3_acc_6: 0.9696 - dense_3_acc_7: 1.0000 - dense_3_acc_8: 0.9769 - dense_3_acc_9: 0.9533
Epoch 21/30
30000/30000 [==============================] - 40s 1ms/step - loss: 0.4074 - dense_3_loss: 0.1418 - dense_3_acc: 0.9967 - dense_3_acc_1: 0.9970 - dense_3_acc_2: 0.9966 - dense_3_acc_3: 0.9991 - dense_3_acc_4: 1.0000 - dense_3_acc_5: 0.9916 - dense_3_acc_6: 0.9699 - dense_3_acc_7: 1.0000 - dense_3_acc_8: 0.9779 - dense_3_acc_9: 0.9563
Epoch 22/30
30000/30000 [==============================] - 40s 1ms/step - loss: 0.3944 - dense_3_loss: 0.1372 - dense_3_acc: 0.9968 - dense_3_acc_1: 0.9971 - dense_3_acc_2: 0.9968 - dense_3_acc_3: 0.9990 - dense_3_acc_4: 1.0000 - dense_3_acc_5: 0.9920 - dense_3_acc_6: 0.9711 - dense_3_acc_7: 1.0000 - dense_3_acc_8: 0.9799 - dense_3_acc_9: 0.9583
Epoch 23/30
30000/30000 [==============================] - 40s 1ms/step - loss: 0.3793 - dense_3_loss: 0.1308 - dense_3_acc: 0.9968 - dense_3_acc_1: 0.9971 - dense_3_acc_2: 0.9967 - dense_3_acc_3: 0.9992 - dense_3_acc_4: 1.0000 - dense_3_acc_5: 0.9916 - dense_3_acc_6: 0.9711 - dense_3_acc_7: 1.0000 - dense_3_acc_8: 0.9804 - dense_3_acc_9: 0.9610
Epoch 24/30
30000/30000 [==============================] - 40s 1ms/step - loss: 0.3679 - dense_3_loss: 0.1265 - dense_3_acc: 0.9969 - dense_3_acc_1: 0.9972 - dense_3_acc_2: 0.9969 - dense_3_acc_3: 0.9990 - dense_3_acc_4: 1.0000 - dense_3_acc_5: 0.9919 - dense_3_acc_6: 0.9717 - dense_3_acc_7: 1.0000 - dense_3_acc_8: 0.9814 - dense_3_acc_9: 0.9631
Epoch 25/30
30000/30000 [==============================] - 40s 1ms/step - loss: 0.3560 - dense_3_loss: 0.1218 - dense_3_acc: 0.9969 - dense_3_acc_1: 0.9974 - dense_3_acc_2: 0.9972 - dense_3_acc_3: 0.9991 - dense_3_acc_4: 1.0000 - dense_3_acc_5: 0.9922 - dense_3_acc_6: 0.9719 - dense_3_acc_7: 1.0000 - dense_3_acc_8: 0.9822 - dense_3_acc_9: 0.9648
Epoch 26/30
30000/30000 [==============================] - 40s 1ms/step - loss: 0.3445 - dense_3_loss: 0.1171 - dense_3_acc: 0.9971 - dense_3_acc_1: 0.9975 - dense_3_acc_2: 0.9972 - dense_3_acc_3: 0.9992 - dense_3_acc_4: 1.0000 - dense_3_acc_5: 0.9920 - dense_3_acc_6: 0.9725 - dense_3_acc_7: 1.0000 - dense_3_acc_8: 0.9827 - dense_3_acc_9: 0.9670
Epoch 27/30
30000/30000 [==============================] - 40s 1ms/step - loss: 0.3343 - dense_3_loss: 0.1133 - dense_3_acc: 0.9971 - dense_3_acc_1: 0.9975 - dense_3_acc_2: 0.9972 - dense_3_acc_3: 0.9993 - dense_3_acc_4: 1.0000 - dense_3_acc_5: 0.9924 - dense_3_acc_6: 0.9726 - dense_3_acc_7: 1.0000 - dense_3_acc_8: 0.9836 - dense_3_acc_9: 0.9684
Epoch 28/30
30000/30000 [==============================] - 40s 1ms/step - loss: 0.3250 - dense_3_loss: 0.1099 - dense_3_acc: 0.9973 - dense_3_acc_1: 0.9977 - dense_3_acc_2: 0.9973 - dense_3_acc_3: 0.9993 - dense_3_acc_4: 1.0000 - dense_3_acc_5: 0.9925 - dense_3_acc_6: 0.9731 - dense_3_acc_7: 1.0000 - dense_3_acc_8: 0.9836 - dense_3_acc_9: 0.9691
Epoch 29/30
30000/30000 [==============================] - 40s 1ms/step - loss: 0.3166 - dense_3_loss: 0.1059 - dense_3_acc: 0.9972 - dense_3_acc_1: 0.9977 - dense_3_acc_2: 0.9973 - dense_3_acc_3: 0.9993 - dense_3_acc_4: 1.0000 - dense_3_acc_5: 0.9927 - dense_3_acc_6: 0.9732 - dense_3_acc_7: 1.0000 - dense_3_acc_8: 0.9845 - dense_3_acc_9: 0.9700
Epoch 30/30
30000/30000 [==============================] - 40s 1ms/step - loss: 0.3078 - dense_3_loss: 0.1028 - dense_3_acc: 0.9974 - dense_3_acc_1: 0.9979 - dense_3_acc_2: 0.9977 - dense_3_acc_3: 0.9993 - dense_3_acc_4: 1.0000 - dense_3_acc_5: 0.9930 - dense_3_acc_6: 0.9737 - dense_3_acc_7: 1.0000 - dense_3_acc_8: 0.9846 - dense_3_acc_9: 0.9713
Out[25]:
<keras.callbacks.History at 0x7ff76ead6048>

Step 5: Testing Model (optional)

We could perform more serious testing and evaluation of the model here, but since we didn't do a proper train/test split we'll just make some predictions to see whether it gets them right.
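
A rough quantitative check could look like the following minimal sketch (illustrative and not part of the original notebook; it treats freshly generated dates as a makeshift held-out set, and exact_match_accuracy is a hypothetical helper name):

# Rough exact-match accuracy on newly generated dates. Note this is only a
# makeshift test set: the model may have seen similar dates during training.
def exact_match_accuracy(n=100):
    hits, total = 0, 0
    for _ in range(n):
        h, machine, _ = random_date()
        if h is None:
            continue
        source = to_categorical(string_to_int(h, Tx, human_vocab),
                                num_classes=len(human_vocab))
        source = source.reshape((1,) + source.shape)
        pred = np.argmax(mod.predict([source, s0[:1], c0[:1]]), axis=-1)
        out = ''.join(inv_machine_vocab[int(i)] for i in pred)
        hits += (out == machine)
        total += 1
    return hits / total

For now, let's just eyeball a handful of handwritten examples: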

In [26]:
EXAMPLES = ['3 May 1979', '5 April 09', '21th of August 2016', 'Tue 10 Jul 2007', 'Saturday May 9 2018', 'March 3 2001', 'March 3rd 2001', '1 March 2001']
for example in EXAMPLES:
    
    # Encode the source as indices, one-hot it, and add a batch dimension
    source = string_to_int(example, Tx, human_vocab)
    source = np.array(list(map(lambda x: to_categorical(x, num_classes=len(human_vocab)), source)))
    source = source.reshape((1, ) + source.shape)
    # predict() returns one distribution per output step; take the argmax of each
    prediction = mod.predict([source, s0, c0])
    prediction = np.argmax(prediction, axis = -1)
    output = [inv_machine_vocab[int(i)] for i in prediction]
    
    print("source:", example)
    print("output:", ''.join(output))
source: 3 May 1979
output: 1979-05-03
source: 5 April 09
output: 2009-04-05
source: 21th of August 2016
output: 2016-08-21
source: Tue 10 Jul 2007
output: 2007-07-10
source: Saturday May 9 2018
output: 2018-05-09
source: March 3 2001
output: 2001-03-03
source: March 3rd 2001
output: 2001-03-03
source: 1 March 2001
output: 2001-03-01

Step 6: Save and Convert Model

Finally, we save the model so it can be converted and used by our frontend component via TensorFlow.js. The conversion is done outside this notebook.

In [27]:
mod.save('dates_model.h5')
/usr/local/lib/python3.6/site-packages/keras/engine/network.py:872: UserWarning: Layer lstm_1 was passed non-serializable keyword arguments: {'initial_state': [<tf.Tensor 's0:0' shape=(?, 64) dtype=float32>, <tf.Tensor 'c0:0' shape=(?, 64) dtype=float32>]}. They will not be included in the serialized model (and thus will be missing at deserialization time).
  '. They will not be included '
(the same UserWarning is repeated for each of the remaining calls of lstm_1)
In [28]:
!tensorflowjs_converter --input_format keras dates_model.h5 tfjsmodel
Using TensorFlow backend.