A comprehensive Keras tutorial with examples
Keras is an open-source neural network library written in Python, serving as a high-level interface for building and training deep learning models. It provides a user-friendly API that allows developers to quickly prototype and deploy neural networks with minimal boilerplate code. Keras has supported several backend engines over the years, including TensorFlow, Microsoft Cognitive Toolkit (CNTK), and Theano, and today it is most commonly used on top of TensorFlow. With its modular architecture, Keras facilitates rapid experimentation and model iteration.
The Sequential API and Functional API are two different approaches for building and configuring neural networks in Keras.
The Sequential API is simpler and more straightforward, primarily used for building sequential models where each layer has exactly one input tensor and one output tensor. This API is ideal for basic feedforward neural networks where data flows sequentially from one layer to the next.
The Functional API offers more flexibility and supports more complex model architectures, including multi-input and multi-output networks, skip connections, and shared layers. It allows users to define a directed acyclic graph (DAG) of layers by explicitly connecting the output of one layer to the input of another. This API is preferred for building models with branching or merging layers, enabling sophisticated architectures that combine convolutional, recurrent, and attention components in non-sequential ways.
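To make the difference concrete, here is the same tiny two-layer classifier written with both APIs. This is a minimal sketch; the input shape and layer sizes are illustrative, not taken from the examples below.
from keras.models import Sequential, Model
from keras.layers import Dense, Input
# Sequential API: a plain stack of layers
seq_model = Sequential([
    Dense(32, activation='relu', input_shape=(16,)),
    Dense(3, activation='softmax')
])
# Functional API: the same stack, built by wiring layer outputs to layer inputs
inputs = Input(shape=(16,))
x = Dense(32, activation='relu')(inputs)
outputs = Dense(3, activation='softmax')(x)
func_model = Model(inputs=inputs, outputs=outputs)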
Let’s build some deep learning models using the Sequential API.
First, a deep neural network (DNN) built with the Sequential API to classify handwritten digits from the MNIST dataset.
# Step 1: Import necessary libraries
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Flatten
from keras.datasets import mnist
from keras.utils import to_categorical
# Step 2: Load and preprocess the dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
x_train = np.expand_dims(x_train, axis=-1)
x_test = np.expand_dims(x_test, axis=-1)
y_train = to_categorical(y_train, num_classes=10)
y_test = to_categorical(y_test, num_classes=10)
# Step 3: Build the model using Sequential API
model = Sequential([
Flatten(input_shape=(28, 28, 1)),
Dense(128, activation='relu'),
Dense(64, activation='relu'),
Dense(10, activation='softmax')
])
# Alternatively, build the same model layer by layer with add()
model = Sequential()
model.add(Flatten(input_shape=(28, 28, 1)))
model.add(Dense(128, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(10, activation='softmax'))
# View your model
model.summary()
# Step 4: Compile the model
model.compile(optimizer='adam',
loss='categorical_crossentropy',
metrics=['accuracy'])
# Step 5: Train the model
model.fit(x_train, y_train, epochs=10, batch_size=32, validation_split=0.1)
# Step 6: Evaluate the model
test_loss, test_accuracy = model.evaluate(x_test, y_test)
print(f'Test loss: {test_loss:.4f}')
print(f'Test accuracy: {test_accuracy:.4f}')
# Step 7: Predict
predictions = model.predict(x_test[:5]) # Predict for the first 5 test samples
predicted_classes = np.argmax(predictions, axis=1)
print("Predictions:")
print(predicted_classes)
Reason for Normalization
Normalizing a neural network's inputs improves the model. But deeper layers are trained on the outputs of previous layers, and since weights keep being updated via gradient descent, those layers no longer benefit from normalization: they have to adapt to the previous layers' weight changes and have more trouble learning their own weights. Batch normalization makes sure that, independently of those changes, the inputs to the next layer stay normalized. It does this in a smart way, with trainable parameters that also learn how much of the normalization to keep by scaling or shifting it.
This improves gradient flow, allows for higher learning rates, reduces the dependence on weight initialization, adds regularization to our network, and limits internal covariate shift, which is a funny name for a layer's dependence on the previous layer's outputs while learning its own weights. Batch normalization is widely used today in many deep learning models.
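Before the full CNN example below, here is a quick look at what a BatchNormalization layer actually creates: the scale (gamma) and shift (beta) parameters are trainable, while the moving mean and variance used at inference time are not. This is a standalone minimal sketch; the feature size of 64 is arbitrary.
from keras.layers import BatchNormalization
bn = BatchNormalization()
bn.build(input_shape=(None, 64))  # pretend the previous layer outputs 64 features
print([w.name for w in bn.trainable_weights])      # gamma and beta
print([w.name for w in bn.non_trainable_weights])  # moving_mean and moving_variance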
# Step 1: Import necessary libraries
import numpy as np
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, BatchNormalization, Activation
from keras.datasets import cifar10
from keras.utils import to_categorical
# Step 2: Load and preprocess the dataset
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
y_train = to_categorical(y_train, num_classes=10)
y_test = to_categorical(y_test, num_classes=10)
# Step 3: Build the CNN model with BatchNormalization
model = Sequential([
Conv2D(32, (3, 3), padding='same', input_shape=(32, 32, 3)),
BatchNormalization(),
Activation('relu'),
MaxPooling2D((2, 2)),
Conv2D(64, (3, 3), padding='same'),
BatchNormalization(),
Activation('relu'),
MaxPooling2D((2, 2)),
Conv2D(128, (3, 3), padding='same'),
BatchNormalization(),
Activation('relu'),
MaxPooling2D((2, 2)),
Flatten(),
Dense(128, activation='relu'),
Dense(10, activation='softmax')
])
# Step 4: Compile the model
model.compile(optimizer='adam',
loss='categorical_crossentropy',
metrics=['accuracy'])
# Step 5: Train the model
model.fit(x_train, y_train,
epochs=10,
batch_size=64,
validation_split=0.1)
# Step 6: Evaluate the model
test_loss, test_accuracy = model.evaluate(x_test, y_test)
print(f'Test loss: {test_loss:.4f}')
print(f'Test accuracy: {test_accuracy:.4f}')
Next, an LSTM model for sentiment classification on the IMDB movie review dataset.
# Step 1: Import necessary libraries
import numpy as np
from keras.datasets import imdb
from keras.models import Sequential
from keras.layers import LSTM, Embedding, Dense
from keras.preprocessing import sequence
# Step 2: Load and preprocess the dataset
max_features = 20000 # Number of words to consider as features
max_len = 100 # Maximum length of reviews
batch_size = 32
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
x_train = sequence.pad_sequences(x_train, maxlen=max_len)
x_test = sequence.pad_sequences(x_test, maxlen=max_len)
# Step 3: Build the LSTM model
model = Sequential([
Embedding(max_features, 128),
LSTM(128, dropout=0.2, recurrent_dropout=0.2),
Dense(1, activation='sigmoid')
])
# Step 4: Compile the model
model.compile(loss='binary_crossentropy',
optimizer='adam',
metrics=['accuracy'])
# Step 5: Train the model
model.fit(x_train, y_train,
batch_size=batch_size,
epochs=5,
validation_data=(x_test, y_test))
# Step 6: Evaluate the model
test_loss, test_accuracy = model.evaluate(x_test, y_test, batch_size=batch_size)
print(f'Test loss: {test_loss:.4f}')
print(f'Test accuracy: {test_accuracy:.4f}')
Next, a simple autoencoder for 784-dimensional inputs (flattened MNIST images), built by nesting two Sequential sub-models, an encoder and a decoder, inside a functional Model.
from keras.layers import Input, Dense
from keras.models import Model, Sequential
# Define the dimensions of the input data
input_dim = 784 # Assuming MNIST images of size 28x28 pixels
# Define the encoder architecture
encoder = Sequential([
Dense(256, activation='relu', input_shape=(input_dim,)),
Dense(128, activation='relu'),
Dense(64, activation='relu')
])
# Define the decoder architecture
decoder = Sequential([
Dense(128, activation='relu', input_shape=(64,)),
Dense(256, activation='relu'),
Dense(input_dim, activation='sigmoid')
])
# Combine the encoder and decoder into an autoencoder model
input_img = Input(shape=(input_dim,))
encoded = encoder(input_img)
decoded = decoder(encoded)
autoencoder = Model(input_img, decoded)
# Compile the autoencoder model
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
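An autoencoder is trained to reconstruct its own input, so the images serve as both the features and the targets. Here is a minimal training sketch on MNIST; the epoch count and batch size are illustrative.
from keras.datasets import mnist
(x_train, _), (x_test, _) = mnist.load_data()
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
x_train = x_train.reshape((-1, input_dim))  # flatten 28x28 images to 784 values
x_test = x_test.reshape((-1, input_dim))
autoencoder.fit(x_train, x_train,
                epochs=10,
                batch_size=256,
                validation_data=(x_test, x_test))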
The Keras Functional API allows you to create complex models with multiple inputs, multiple outputs, shared layers, and more. Unlike the Sequential API, where you add layers sequentially, in the Functional API, you explicitly connect layers to each other, which provides more flexibility in model architecture.
Input layers: in the Functional API, you explicitly define input layers that specify the shape of your data.
Functional connections: you connect layers by passing the output of one layer as the input to another layer.
Model creation: once you’ve defined the connections between layers, you create a Model by specifying its inputs and outputs.
The Keras Functional API is well-suited for use cases where more flexibility and customization are required when defining neural network architectures: multi-input and multi-output models, shared (siamese) layers, residual or skip connections, and models whose layers branch and merge.
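For instance, a model with two inputs, say an image branch and a tabular-feature branch, is straightforward to express with the Functional API but impossible with the Sequential API. This is a minimal sketch; the input shapes and layer sizes are illustrative.
from keras.layers import Input, Dense, Flatten, Conv2D, concatenate
from keras.models import Model
# Image branch
image_input = Input(shape=(32, 32, 3))
x = Conv2D(16, (3, 3), activation='relu')(image_input)
x = Flatten()(x)
# Tabular branch
tabular_input = Input(shape=(8,))
y = Dense(16, activation='relu')(tabular_input)
# Merge the branches and produce a single output
combined = concatenate([x, y])
output = Dense(1, activation='sigmoid')(combined)
multi_input_model = Model(inputs=[image_input, tabular_input], outputs=output)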
Let’s build some deep learning models using the Functional API.
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
# Define input layer
inputs = Input(shape=(784,))
# Define first hidden layer
x = Dense(128, activation='relu')(inputs)
# Define second hidden layer
x = Dense(64, activation='relu')(x)
# Define output layer
outputs = Dense(10, activation='softmax')(x)
# Create model
model = Model(inputs=inputs, outputs=outputs)
# Compile model
model.compile(optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
# Print model summary
model.summary()
Next, a Transformer encoder stack built from a custom multi-head self-attention layer.
import tensorflow as tf
from tensorflow.keras.layers import Layer, MultiHeadAttention, Dense, Dropout, LayerNormalization, Input
from tensorflow.keras.models import Model
class MultiHeadSelfAttention(Layer):
    def __init__(self, embed_dim, num_heads):
        super(MultiHeadSelfAttention, self).__init__()
        self.num_heads = num_heads
        self.embed_dim = embed_dim
        assert embed_dim % self.num_heads == 0
        self.projection_dim = embed_dim // num_heads
        self.query_dense = Dense(embed_dim)
        self.key_dense = Dense(embed_dim)
        self.value_dense = Dense(embed_dim)
        self.combine_heads = Dense(embed_dim)

    def attention(self, query, key, value):
        score = tf.matmul(query, key, transpose_b=True)
        dim_key = tf.cast(tf.shape(key)[-1], tf.float32)
        scaled_score = score / tf.math.sqrt(dim_key)
        weights = tf.nn.softmax(scaled_score, axis=-1)
        output = tf.matmul(weights, value)
        return output, weights

    def separate_heads(self, x, batch_size):
        x = tf.reshape(x, (batch_size, -1, self.num_heads, self.projection_dim))
        return tf.transpose(x, perm=[0, 2, 1, 3])

    def call(self, inputs):
        # inputs.shape = [batch_size, seq_len, embed_dim]
        batch_size = tf.shape(inputs)[0]
        query = self.query_dense(inputs)  # (batch_size, seq_len, embed_dim)
        key = self.key_dense(inputs)  # (batch_size, seq_len, embed_dim)
        value = self.value_dense(inputs)  # (batch_size, seq_len, embed_dim)
        query = self.separate_heads(query, batch_size)  # (batch_size, num_heads, seq_len, projection_dim)
        key = self.separate_heads(key, batch_size)  # (batch_size, num_heads, seq_len, projection_dim)
        value = self.separate_heads(value, batch_size)  # (batch_size, num_heads, seq_len, projection_dim)
        attention, weights = self.attention(query, key, value)
        attention = tf.transpose(attention, perm=[0, 2, 1, 3])  # (batch_size, seq_len, num_heads, projection_dim)
        concat_attention = tf.reshape(attention, (batch_size, -1, self.embed_dim))  # (batch_size, seq_len, embed_dim)
        output = self.combine_heads(concat_attention)  # (batch_size, seq_len, embed_dim)
        return output

class TransformerBlock(Layer):
    def __init__(self, embed_dim, num_heads, ff_dim, rate=0.1):
        super(TransformerBlock, self).__init__()
        self.att = MultiHeadSelfAttention(embed_dim, num_heads)
        self.ffn = tf.keras.Sequential([
            Dense(ff_dim, activation='relu'),
            Dense(embed_dim)
        ])
        self.layernorm1 = LayerNormalization(epsilon=1e-6)
        self.layernorm2 = LayerNormalization(epsilon=1e-6)
        self.dropout1 = Dropout(rate)
        self.dropout2 = Dropout(rate)

    def call(self, inputs, training=None):
        attn_output = self.att(inputs)
        attn_output = self.dropout1(attn_output, training=training)
        out1 = self.layernorm1(inputs + attn_output)
        ffn_output = self.ffn(out1)
        ffn_output = self.dropout2(ffn_output, training=training)
        return self.layernorm2(out1 + ffn_output)

def build_transformer(num_layers, embed_dim, num_heads, ff_dim, input_sequence_length, output_sequence_length, rate=0.1):
    inputs = Input(shape=(input_sequence_length, embed_dim))
    x = inputs
    for _ in range(num_layers):
        x = TransformerBlock(embed_dim, num_heads, ff_dim, rate)(x)
    outputs = Dense(output_sequence_length, activation='softmax')(x)
    return Model(inputs=inputs, outputs=outputs)
# Example usage:
num_layers = 2
embed_dim = 128
num_heads = 8
ff_dim = 512
input_sequence_length = 100
output_sequence_length = 10
transformer = build_transformer(num_layers, embed_dim, num_heads, ff_dim, input_sequence_length, output_sequence_length)
transformer.summary()
Apart from building models, there is plenty more we need to handle during deep learning, and Keras provides tools for that as well.
Callbacks in Keras are objects that can perform actions at various stages during training, such as at the start or end of an epoch, before or after a batch, or when a specific performance metric has reached a certain value.
They are useful for implementing functionality such as:
ModelCheckpoint: save the model with the best validation accuracy.
EarlyStopping: stop training early if the validation loss does not improve after a certain number of epochs.
TensorBoard: log training metrics for visualization in TensorBoard.
from keras.callbacks import ModelCheckpoint, EarlyStopping, TensorBoard
# Define callbacks
checkpoint_callback = ModelCheckpoint(filepath='model_checkpoint.h5',
monitor='val_accuracy',
verbose=1,
save_best_only=True)
early_stopping_callback = EarlyStopping(monitor='val_loss',
patience=3,
verbose=1,
restore_best_weights=True)
tensorboard_callback = TensorBoard(log_dir='./logs',
histogram_freq=1,
write_graph=True,
write_images=True)
callbacks = [checkpoint_callback, early_stopping_callback, tensorboard_callback]
# Train the model with callbacks
model.fit(x_train, y_train,
epochs=20,
batch_size=32,
validation_split=0.1,
callbacks=callbacks)
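The three callbacks above are built in, but you can also write your own by subclassing keras.callbacks.Callback and overriding the hooks you need (on_epoch_end, on_batch_begin, and so on). Here is a minimal sketch that simply reports when the validation loss improves; the class name is made up for illustration.
from keras.callbacks import Callback

class ImprovementLogger(Callback):
    def __init__(self):
        super().__init__()
        self.best_val_loss = float('inf')

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        val_loss = logs.get('val_loss')
        if val_loss is not None and val_loss < self.best_val_loss:
            print(f'Epoch {epoch + 1}: val_loss improved to {val_loss:.4f}')
            self.best_val_loss = val_loss

# Use it like any built-in callback, e.g. callbacks.append(ImprovementLogger())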
In Keras, the fit method returns a history object containing the training history. This object records the loss and metric values at each epoch during the training process. It’s a valuable resource for analyzing the performance of your model over time and for visualizing the training progress.
After training your model using the fit method, you can access the training history through the history object, whose main attribute is:
history.history: a dictionary containing the training and validation metrics recorded at each epoch. The keys of this dictionary are the names of the metrics, and the values are lists containing the corresponding metric values for each epoch.
You can use these metrics to plot graphs and analyze the training progress and the performance of your model. For instance, you can plot the training and validation loss over epochs to check for overfitting or underfitting, or plot the accuracy to see how well your model is learning from the data.
# Step 1: Train the model
history = model.fit(x_train, y_train,
epochs=20,
batch_size=32,
validation_split=0.1)
print(history.history['accuracy']) # Training accuracy values
print(history.history['loss']) # Training loss values
print(history.history['val_accuracy']) # Validation accuracy values
print(history.history['val_loss']) # Validation loss values
import matplotlib.pyplot as plt
# Plot training & validation accuracy values
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('Model accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='upper left')
plt.show()
# Plot training & validation loss values
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='upper left')
plt.show()
Performing hyperparameter tuning using Keras models as scikit-learn estimators allows us to leverage the powerful tools provided by scikit-learn, such as GridSearchCV and RandomizedSearchCV, for efficient hyperparameter search. Below is a step-by-step guide on how to achieve this:
Define the model: write a function that builds and compiles the Keras model.
Wrap it: use the KerasClassifier wrapper to turn the Keras model into a scikit-learn estimator, so it can be used with scikit-learn’s API.
Search: use GridSearchCV or RandomizedSearchCV from scikit-learn to find the best combination of hyperparameters based on a specified scoring metric (e.g., accuracy or loss).
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
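# Note: in recent Keras/TensorFlow releases this wrapper has moved to the separate scikeras package (from scikeras.wrappers import KerasClassifier)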
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Step 1: Create a function to build Keras models
def create_model(optimizer='adam', activation='relu', hidden_units=64):
    model = Sequential([
        Dense(hidden_units, activation=activation, input_shape=(4,)),
        Dense(3, activation='softmax')
    ])
    model.compile(optimizer=optimizer, loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    return model
# Step 2: Wrap the Keras model with KerasClassifier
model = KerasClassifier(build_fn=create_model)
# Step 3: Define hyperparameter grid
param_grid = {
'optimizer': ['adam', 'sgd'],
'activation': ['relu', 'tanh'],
'hidden_units': [32, 64, 128]
}
# Step 4: Perform Grid Search
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42)
grid_search = GridSearchCV(estimator=model, param_grid=param_grid, cv=3, scoring='accuracy')
grid_result = grid_search.fit(X_train, y_train)
# Step 5: Evaluate and select the best model
best_model = grid_result.best_estimator_
y_pred = best_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Best Parameters:", grid_result.best_params_)
print("Best Accuracy:", accuracy)
Using pretrained models in Keras for prediction tasks is a common practice, especially when dealing with tasks like image classification, object detection, or natural language processing. Keras provides a wide range of pretrained models through its applications
module, including popular architectures like VGG16, ResNet, Inception, and more.
Load a pretrained model: pick an architecture from keras.applications and instantiate it with pretrained weights.
Preprocess the input: resize and preprocess it the same way the model’s original training images were processed.
Predict: call the .predict method of the pretrained model to make predictions on your input data.
# Import the image utilities plus the ResNet50 model, preprocessing, and decoding helpers
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input, decode_predictions
import numpy as np
# Load the image with the right target size for your model
img_path = 'your_image.jpg'  # placeholder: path to the image you want to classify
img = image.load_img(img_path, target_size=(224, 224))
# Turn it into an array
img_array = image.img_to_array(img)
# Expand the dimensions of the image, this is so that it fits the expected model input format
img_expanded = np.expand_dims(img_array, axis = 0)
# Pre-process the img in the same way original images were
img_ready = preprocess_input(img_expanded)
# Instantiate a ResNet50 model with 'imagenet' weights
model = ResNet50(weights='imagenet')
# Predict with ResNet50 on your already processed img
preds = model.predict(img_ready)
# Decode the first 3 predictions
print('Predicted:', decode_predictions(preds, top=3)[0])
You can use the Keras backend to create a function that computes the output of a specific layer in a Keras model given input data. This can be useful for various purposes, such as debugging, feature extraction, or implementing custom layers or models.
# Import tensorflow.keras backend
import tensorflow.keras.backend as K
# Input tensor from the 1st layer of the model
inp = model.layers[0].input
# Output tensor from the 1st layer of the model
out = model.layers[0].output
# Define a function from inputs to outputs
inp_to_out = K.function([inp], [out])
# Print the results of passing X_test through the 1st layer
print(inp_to_out([X_test]))
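With recent TensorFlow versions, an equivalent and often simpler approach is to build a small feature-extractor Model that maps the original inputs to the layer output you care about. A minimal sketch, assuming model and X_test are defined as above.
from tensorflow.keras.models import Model
# Map the model's inputs to the output of its first layer
feature_extractor = Model(inputs=model.inputs, outputs=model.layers[0].output)
# Same result as the K.function approach above
print(feature_extractor.predict(X_test))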
Now you know how to use the Keras Sequential and Functional APIs to build deep learning models such as DNNs, CNNs, LSTMs, and Transformers. You also know how to babysit your training process. Congratulations!