A comprehensive Keras tutorial with examples
Keras is an open-source neural network library written in Python, serving as a high-level interface for building and training deep learning models. It provides a user-friendly API that allows developers to quickly prototype and deploy neural networks with minimal boilerplate code. Keras has supported several backend engines over the years, including TensorFlow, Microsoft Cognitive Toolkit (CNTK), and Theano, and today it is most commonly used on top of TensorFlow. With its modular architecture, Keras facilitates rapid experimentation and model iteration.
The Sequential API and Functional API are two different approaches for building and configuring neural networks in Keras.
The Sequential API is simpler and more straightforward, primarily used for building sequential models where each layer has exactly one input tensor and one output tensor. This API is ideal for basic feedforward neural networks where data flows sequentially from one layer to the next.
The Functional API offers more flexibility and supports more complex model architectures, including multi-input and multi-output networks, skip connections, and shared layers. It allows users to define a directed acyclic graph (DAG) of layers by explicitly connecting the output of one layer to the input of another. This API is preferred for building models with branching or merging layers, enabling sophisticated architectures that combine convolutional, recurrent, and attention components in non-sequential ways.
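To make the difference concrete, here is the same tiny two-layer classifier written with both APIs. This is a minimal sketch; the input shape and layer sizes are illustrative, not taken from the examples below.
from keras.models import Sequential, Model
from keras.layers import Dense, Input
# Sequential API: a plain stack of layers
seq_model = Sequential([
    Dense(32, activation='relu', input_shape=(16,)),
    Dense(3, activation='softmax')
])
# Functional API: the same stack, built by wiring layer outputs to layer inputs
inputs = Input(shape=(16,))
x = Dense(32, activation='relu')(inputs)
outputs = Dense(3, activation='softmax')(x)
func_model = Model(inputs=inputs, outputs=outputs)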
Let’s build some deep learning models using the Sequential API.
First, a deep neural network (DNN) built with the Sequential API to classify handwritten digits from the MNIST dataset.
# Step 1: Import necessary libraries
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Flatten
from keras.datasets import mnist
from keras.utils import to_categorical
# Step 2: Load and preprocess the dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
x_train = np.expand_dims(x_train, axis=-1)
x_test = np.expand_dims(x_test, axis=-1)
y_train = to_categorical(y_train, num_classes=10)
y_test = to_categorical(y_test, num_classes=10)
# Step 3: Build the model using Sequential API
model = Sequential([
Flatten(input_shape=(28, 28, 1)),
Dense(128, activation='relu'),
Dense(64, activation='relu'),
Dense(10, activation='softmax')
])
# Alternatively, build the same model layer by layer with add()
model = Sequential()
model.add(Flatten(input_shape=(28, 28, 1)))
model.add(Dense(128, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(10, activation='softmax'))
# View your model
model.summary()
# Step 4: Compile the model
model.compile(optimizer='adam',
loss='categorical_crossentropy',
metrics=['accuracy'])
# Step 5: Train the model
model.fit(x_train, y_train, epochs=10, batch_size=32, validation_split=0.1)
# Step 6: Evaluate the model
test_loss, test_accuracy = model.evaluate(x_test, y_test)
print(f'Test loss: {test_loss:.4f}')
print(f'Test accuracy: {test_accuracy:.4f}')
# Step 7: Predict
predictions = model.predict(x_test[:5]) # Predict for the first 5 test samples
predicted_classes = np.argmax(predictions, axis=1)
print("Predictions:")
print(predicted_classes)
Reason for Normalization
Normalizing a neural network's inputs improves the model. But deeper layers are trained on the outputs of previous layers, and since weights keep being updated via gradient descent, those layers no longer benefit from normalization: they have to adapt to the previous layers' weight changes and have more trouble learning their own weights. Batch normalization makes sure that, independently of those changes, the inputs to the next layer stay normalized. It does this in a smart way, with trainable parameters that also learn how much of the normalization to keep by scaling or shifting it.
This improves gradient flow, allows for higher learning rates, reduces the dependence on weight initialization, adds regularization to our network, and limits internal covariate shift, which is a funny name for a layer's dependence on the previous layer's outputs while learning its own weights. Batch normalization is widely used today in many deep learning models.
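Before the full CNN example below, here is a quick look at what a BatchNormalization layer actually creates: the scale (gamma) and shift (beta) parameters are trainable, while the moving mean and variance used at inference time are not. This is a standalone minimal sketch; the feature size of 64 is arbitrary.
from keras.layers import BatchNormalization
bn = BatchNormalization()
bn.build(input_shape=(None, 64))  # pretend the previous layer outputs 64 features
print([w.name for w in bn.trainable_weights])      # gamma and beta
print([w.name for w in bn.non_trainable_weights])  # moving_mean and moving_variance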
# Step 1: Import necessary libraries
import numpy as np
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, BatchNormalization, Activation
from keras.datasets import cifar10
from keras.utils import to_categorical
# Step 2: Load and preprocess the dataset
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
y_train = to_categorical(y_train, num_classes=10)
y_test = to_categorical(y_test, num_classes=10)
# Step 3: Build the CNN model with BatchNormalization
model = Sequential([
Conv2D(32, (3, 3), padding='same', input_shape=(32, 32, 3)),
BatchNormalization(),
Activation('relu'),
MaxPooling2D((2, 2)),
Conv2D(64, (3, 3), padding='same'),
BatchNormalization(),
Activation('relu'),
MaxPooling2D((2, 2)),
Conv2D(128, (3, 3), padding='same'),
BatchNormalization(),
Activation('relu'),
MaxPooling2D((2, 2)),
Flatten(),
Dense(128, activation='relu'),
Dense(10, activation='softmax')
])
# Step 4: Compile the model
model.compile(optimizer='adam',
loss='categorical_crossentropy',
metrics=['accuracy'])
# Step 5: Train the model
model.fit(x_train, y_train,
epochs=10,
batch_size=64,
validation_split=0.1)
# Step 6: Evaluate the model
test_loss, test_accuracy = model.evaluate(x_test, y_test)
print(f'Test loss: {test_loss:.4f}')
print(f'Test accuracy: {test_accuracy:.4f}')
Next, an LSTM model for sentiment classification on the IMDB movie review dataset.
# Step 1: Import necessary libraries
import numpy as np
from keras.datasets import imdb
from keras.models import Sequential
from keras.layers import LSTM, Embedding, Dense
from keras.preprocessing import sequence
# Step 2: Load and preprocess the dataset
max_features = 20000 # Number of words to consider as features
max_len = 100 # Maximum length of reviews
batch_size = 32
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
x_train = sequence.pad_sequences(x_train, maxlen=max_len)
x_test = sequence.pad_sequences(x_test, maxlen=max_len)
# Step 3: Build the LSTM model
model = Sequential([
Embedding(max_features, 128),
LSTM(128, dropout=0.2, recurrent_dropout=0.2),
Dense(1, activation='sigmoid')
])
# Step 4: Compile the model
model.compile(loss='binary_crossentropy',
optimizer='adam',
metrics=['accuracy'])
# Step 5: Train the model
model.fit(x_train, y_train,
batch_size=batch_size,
epochs=5,
validation_data=(x_test, y_test))
# Step 6: Evaluate the model
test_loss, test_accuracy = model.evaluate(x_test, y_test, batch_size=batch_size)
print(f'Test loss: {test_loss:.4f}')
print(f'Test accuracy: {test_accuracy:.4f}')
Next, a simple autoencoder for 784-dimensional inputs (flattened MNIST images), built by nesting two Sequential sub-models, an encoder and a decoder, inside a functional Model.
from keras.layers import Input, Dense
from keras.models import Model, Sequential
# Define the dimensions of the input data
input_dim = 784 # Assuming MNIST images of size 28x28 pixels
# Define the encoder architecture
encoder = Sequential([
Dense(256, activation='relu', input_shape=(input_dim,)),
Dense(128, activation='relu'),
Dense(64, activation='relu')
])
# Define the decoder architecture
decoder = Sequential([
Dense(128, activation='relu', input_shape=(64,)),
Dense(256, activation='relu'),
Dense(input_dim, activation='sigmoid')
])
# Combine the encoder and decoder into an autoencoder model
input_img = Input(shape=(input_dim,))
encoded = encoder(input_img)
decoded = decoder(encoded)
autoencoder = Model(input_img, decoded)
# Compile the autoencoder model
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
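An autoencoder is trained to reconstruct its own input, so the images serve as both the features and the targets. Here is a minimal training sketch on MNIST; the epoch count and batch size are illustrative.
from keras.datasets import mnist
(x_train, _), (x_test, _) = mnist.load_data()
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
x_train = x_train.reshape((-1, input_dim))  # flatten 28x28 images to 784 values
x_test = x_test.reshape((-1, input_dim))
autoencoder.fit(x_train, x_train,
                epochs=10,
                batch_size=256,
                validation_data=(x_test, x_test))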
The Keras Functional API allows you to create complex models with multiple inputs, multiple outputs, shared layers, and more. Unlike the Sequential API, where you add layers sequentially, in the Functional API, you explicitly connect layers to each other, which provides more flexibility in model architecture.
Input layers: in the Functional API, you explicitly define input layers that specify the shape of your data.
Functional connections: you connect layers by passing the output of one layer as the input to another layer.
Model creation: once you’ve defined the connections between layers, you create a Model by specifying its inputs and outputs.
The Keras Functional API is well-suited for use cases where more flexibility and customization are required when defining neural network architectures: multi-input and multi-output models, shared (siamese) layers, residual or skip connections, and models whose layers branch and merge.
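For instance, a model with two inputs, say an image branch and a tabular-feature branch, is straightforward to express with the Functional API but impossible with the Sequential API. This is a minimal sketch; the input shapes and layer sizes are illustrative.
from keras.layers import Input, Dense, Flatten, Conv2D, concatenate
from keras.models import Model
# Image branch
image_input = Input(shape=(32, 32, 3))
x = Conv2D(16, (3, 3), activation='relu')(image_input)
x = Flatten()(x)
# Tabular branch
tabular_input = Input(shape=(8,))
y = Dense(16, activation='relu')(tabular_input)
# Merge the branches and produce a single output
combined = concatenate([x, y])
output = Dense(1, activation='sigmoid')(combined)
multi_input_model = Model(inputs=[image_input, tabular_input], outputs=output)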
Let’s build some deep learning models using the Functional API.
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
# Define input layer
inputs = Input(shape=(784,))
# Define first hidden layer
x = Dense(128, activation='relu')(inputs)
# Define second hidden layer
x = Dense(64, activation='relu')(x)
# Define output layer
outputs = Dense(10, activation='softmax')(x)
# Create model
model = Model(inputs=inputs, outputs=outputs)
# Compile model
model.compile(optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
# Print model summary
model.summary()
Next, a Transformer encoder stack built from a custom multi-head self-attention layer.
import tensorflow as tf
from tensorflow.keras.layers import Layer, MultiHeadAttention, Dense, Dropout, LayerNormalization, Input
from tensorflow.keras.models import Model
class MultiHeadSelfAttention(Layer):
    def __init__(self, embed_dim, num_heads):
        super(MultiHeadSelfAttention, self).__init__()
        self.num_heads = num_heads
        self.embed_dim = embed_dim
        assert embed_dim % self.num_heads == 0
        self.projection_dim = embed_dim // num_heads
        self.query_dense = Dense(embed_dim)
        self.key_dense = Dense(embed_dim)
        self.value_dense = Dense(embed_dim)
        self.combine_heads = Dense(embed_dim)

    def attention(self, query, key, value):
        score = tf.matmul(query, key, transpose_b=True)
        dim_key = tf.cast(tf.shape(key)[-1], tf.float32)
        scaled_score = score / tf.math.sqrt(dim_key)
        weights = tf.nn.softmax(scaled_score, axis=-1)
        output = tf.matmul(weights, value)
        return output, weights

    def separate_heads(self, x, batch_size):
        x = tf.reshape(x, (batch_size, -1, self.num_heads, self.projection_dim))
        return tf.transpose(x, perm=[0, 2, 1, 3])

    def call(self, inputs):
        # inputs.shape = [batch_size, seq_len, embed_dim]
        batch_size = tf.shape(inputs)[0]
        query = self.query_dense(inputs)  # (batch_size, seq_len, embed_dim)
        key = self.key_dense(inputs)  # (batch_size, seq_len, embed_dim)
        value = self.value_dense(inputs)  # (batch_size, seq_len, embed_dim)
        query = self.separate_heads(query, batch_size)  # (batch_size, num_heads, seq_len, projection_dim)
        key = self.separate_heads(key, batch_size)  # (batch_size, num_heads, seq_len, projection_dim)
        value = self.separate_heads(value, batch_size)  # (batch_size, num_heads, seq_len, projection_dim)
        attention, weights = self.attention(query, key, value)
        attention = tf.transpose(attention, perm=[0, 2, 1, 3])  # (batch_size, seq_len, num_heads, projection_dim)
        concat_attention = tf.reshape(attention, (batch_size, -1, self.embed_dim))  # (batch_size, seq_len, embed_dim)
        output = self.combine_heads(concat_attention)  # (batch_size, seq_len, embed_dim)
        return output

class TransformerBlock(Layer):
    def __init__(self, embed_dim, num_heads, ff_dim, rate=0.1):
        super(TransformerBlock, self).__init__()
        self.att = MultiHeadSelfAttention(embed_dim, num_heads)
        self.ffn = tf.keras.Sequential([
            Dense(ff_dim, activation='relu'),
            Dense(embed_dim)
        ])
        self.layernorm1 = LayerNormalization(epsilon=1e-6)
        self.layernorm2 = LayerNormalization(epsilon=1e-6)
        self.dropout1 = Dropout(rate)
        self.dropout2 = Dropout(rate)

    def call(self, inputs, training=None):
        attn_output = self.att(inputs)
        attn_output = self.dropout1(attn_output, training=training)
        out1 = self.layernorm1(inputs + attn_output)
        ffn_output = self.ffn(out1)
        ffn_output = self.dropout2(ffn_output, training=training)
        return self.layernorm2(out1 + ffn_output)

def build_transformer(num_layers, embed_dim, num_heads, ff_dim, input_sequence_length, output_sequence_length, rate=0.1):
    inputs = Input(shape=(input_sequence_length, embed_dim))
    x = inputs
    for _ in range(num_layers):
        x = TransformerBlock(embed_dim, num_heads, ff_dim, rate)(x)
    outputs = Dense(output_sequence_length, activation='softmax')(x)
    return Model(inputs=inputs, outputs=outputs)
# Example usage:
num_layers = 2
embed_dim = 128
num_heads = 8
ff_dim = 512
input_sequence_length = 100
output_sequence_length = 10
transformer = build_transformer(num_layers, embed_dim, num_heads, ff_dim, input_sequence_length, output_sequence_length)
transformer.summary()
Apart from building models, there is plenty more we need to handle during deep learning, and Keras provides tools for that as well.
Callbacks in Keras are objects that can perform actions at various stages during training, such as at the start or end of an epoch, before or after a batch, or when a specific performance metric has reached a certain value.
They are useful for implementing functionality such as:
ModelCheckpoint: save the model with the best validation accuracy.
EarlyStopping: stop training early if the validation loss does not improve after a certain number of epochs.
TensorBoard: log training metrics for visualization in TensorBoard.
from keras.callbacks import ModelCheckpoint, EarlyStopping, TensorBoard
# Define callbacks
checkpoint_callback = ModelCheckpoint(filepath='model_checkpoint.h5',
monitor='val_accuracy',
verbose=1,
save_best_only=True)
early_stopping_callback = EarlyStopping(monitor='val_loss',
patience=3,
verbose=1,
restore_best_weights=True)
tensorboard_callback = TensorBoard(log_dir='./logs',
histogram_freq=1,
write_graph=True,
write_images=True)
callbacks = [checkpoint_callback, early_stopping_callback, tensorboard_callback]
# Train the model with callbacks
model.fit(x_train, y_train,
epochs=20,
batch_size=32,
validation_split=0.1,
callbacks=callbacks)
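The three callbacks above are built in, but you can also write your own by subclassing keras.callbacks.Callback and overriding the hooks you need (on_epoch_end, on_batch_begin, and so on). Here is a minimal sketch that simply reports when the validation loss improves; the class name is made up for illustration.
from keras.callbacks import Callback

class ImprovementLogger(Callback):
    def __init__(self):
        super().__init__()
        self.best_val_loss = float('inf')

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        val_loss = logs.get('val_loss')
        if val_loss is not None and val_loss < self.best_val_loss:
            print(f'Epoch {epoch + 1}: val_loss improved to {val_loss:.4f}')
            self.best_val_loss = val_loss

# Use it like any built-in callback, e.g. callbacks.append(ImprovementLogger())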
In Keras, the fit method returns a history object containing the training history. This object records the loss and metric values at each epoch during the training process. It’s a valuable resource for analyzing the performance of your model over time and for visualizing the training progress.
After training your model using the fit method, you can access the training history through the history object, whose main attribute is:
history.history: a dictionary containing the training and validation metrics recorded at each epoch. The keys of this dictionary are the names of the metrics, and the values are lists containing the corresponding metric values for each epoch.
You can use these metrics to plot graphs and analyze the training progress and the performance of your model. For instance, you can plot the training and validation loss over epochs to check for overfitting or underfitting, or plot the accuracy to see how well your model is learning from the data.
# Step 1: Train the model
history = model.fit(x_train, y_train,
epochs=20,
batch_size=32,
validation_split=0.1)
print(history.history['accuracy']) # Training accuracy values
print(history.history['loss']) # Training loss values
print(history.history['val_accuracy']) # Validation accuracy values
print(history.history['val_loss']) # Validation loss values
import matplotlib.pyplot as plt
# Plot training & validation accuracy values
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('Model accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='upper left')
plt.show()
# Plot training & validation loss values
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='upper left')
plt.show()
Performing hyperparameter tuning using Keras models as scikit-learn estimators allows us to leverage the powerful tools provided by scikit-learn, such as GridSearchCV and RandomizedSearchCV, for efficient hyperparameter search. Below is a step-by-step guide on how to achieve this:
Define the model: write a function that builds and compiles the Keras model.
Wrap it: use the KerasClassifier wrapper to turn the Keras model into a scikit-learn estimator, so it can be used with scikit-learn’s API.
Search: use GridSearchCV or RandomizedSearchCV from scikit-learn to find the best combination of hyperparameters based on a specified scoring metric (e.g., accuracy or loss).
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
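# Note: in recent Keras/TensorFlow releases this wrapper has moved to the separate scikeras package (from scikeras.wrappers import KerasClassifier)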
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Step 1: Create a function to build Keras models
def create_model(optimizer='adam', activation='relu', hidden_units=64):
    model = Sequential([
        Dense(hidden_units, activation=activation, input_shape=(4,)),
        Dense(3, activation='softmax')
    ])
    model.compile(optimizer=optimizer, loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    return model
# Step 2: Wrap the Keras model with KerasClassifier
model = KerasClassifier(build_fn=create_model)
# Step 3: Define hyperparameter grid
param_grid = {
'optimizer': ['adam', 'sgd'],
'activation': ['relu', 'tanh'],
'hidden_units': [32, 64, 128]
}
# Step 4: Perform Grid Search
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42)
grid_search = GridSearchCV(estimator=model, param_grid=param_grid, cv=3, scoring='accuracy')
grid_result = grid_search.fit(X_train, y_train)
# Step 5: Evaluate and select the best model
best_model = grid_result.best_estimator_
y_pred = best_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Best Parameters:", grid_result.best_params_)
print("Best Accuracy:", accuracy)
Using pretrained models in Keras for prediction tasks is a common practice, especially when dealing with tasks like image classification, object detection, or natural language processing. Keras provides a wide range of pretrained models through its applications
module, including popular architectures like VGG16, ResNet, Inception, and more.
Load a pretrained model: pick an architecture from keras.applications and instantiate it with pretrained weights.
Preprocess the input: resize and preprocess it the same way the model’s original training images were processed.
Predict: call the .predict method of the pretrained model to make predictions on your input data.
# Import the image utilities plus the ResNet50 model, preprocessing, and decoding helpers
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input, decode_predictions
import numpy as np
# Load the image with the right target size for your model
img_path = 'your_image.jpg'  # placeholder: path to the image you want to classify
img = image.load_img(img_path, target_size=(224, 224))
# Turn it into an array
img_array = image.img_to_array(img)
# Expand the dimensions of the image, this is so that it fits the expected model input format
img_expanded = np.expand_dims(img_array, axis = 0)
# Pre-process the img in the same way original images were
img_ready = preprocess_input(img_expanded)
# Instantiate a ResNet50 model with 'imagenet' weights
model = ResNet50(weights='imagenet')
# Predict with ResNet50 on your already processed img
preds = model.predict(img_ready)
# Decode the first 3 predictions
print('Predicted:', decode_predictions(preds, top=3)[0])
You can use the Keras backend to create a function that computes the output of a specific layer in a Keras model given input data. This can be useful for various purposes, such as debugging, feature extraction, or implementing custom layers or models.
# Import tensorflow.keras backend
import tensorflow.keras.backend as K
# Input tensor from the 1st layer of the model
inp = model.layers[0].input
# Output tensor from the 1st layer of the model
out = model.layers[0].output
# Define a function from inputs to outputs
inp_to_out = K.function([inp], [out])
# Print the results of passing X_test through the 1st layer
print(inp_to_out([X_test]))
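With recent TensorFlow versions, an equivalent and often simpler approach is to build a small feature-extractor Model that maps the original inputs to the layer output you care about. A minimal sketch, assuming model and X_test are defined as above.
from tensorflow.keras.models import Model
# Map the model's inputs to the output of its first layer
feature_extractor = Model(inputs=model.inputs, outputs=model.layers[0].output)
# Same result as the K.function approach above
print(feature_extractor.predict(X_test))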
Now you know how to use the Keras Sequential and Functional APIs to build deep learning models such as DNNs, CNNs, LSTMs, and Transformers. You also know how to babysit your training process. Congratulations!