
What Is the Ultimate AI Library?

A web-based platform where you browse, search, and access 1,200+ curated AI tutorials—carefully selected by our editors so you don’t have to sift through the noise.

Platform Advantages

Permanent access to all content (vs LinkedIn’s 6-month limit)

Advanced search and filtering by topic, author, and use case

Organized categories instead of endless scrolling

Weekly updates with fresh content

Save searches and compare tutorials

Coming Soon for Members

Shareable “playlists” of tutorials

Author-specific pages to follow experts

Step-by-step guided walkthroughs

AI Central exclusive guides

Audio versions of popular tutorials

This library is unlike other AI courses because you won’t learn passively.

Start implementing AI tools DURING your first session

Follow our proven playbook to your first automation

Drive efficiency with tested frameworks

Leave with the roadmap to scale your AI proficiency

Before vs After the AI Library

Transform your professional reality from overwhelming confusion to confident AI mastery.

Overwhelmed by AI Noise

Cut through endless content to find what actually works for your business.

No Time to Experiment

Get proven solutions instead of wasting hours on trial and error.

Fragmented Information

Access organized, comprehensive knowledge in one trusted platform.

External Pressure to Perform

Stay competitive with practical AI skills that deliver real results.

Clear AI Confidence

Implement proven AI solutions with step-by-step clarity and immediate results.

Instant Implementation

Save 15+ hours per week with tested automations and productivity frameworks.

Professional Authority

Become the go-to AI expert in your organization with comprehensive knowledge.

Competitive Edge

Access organized, comprehensive AI strategies that drive measurable business results.

Why Professionals Choose AI Central

Clear guidance, immediate application, and measurable results.

Clear, Practical Knowledge

Skip the technical complexity. Get straightforward implementations you can use today.

Immediate Application

Every tutorial includes real-world examples from professionals like you.

Efficiency & Credibility

Build competitive advantage with proven frameworks that deliver measurable results.

Focused Solutions

Cut through information overload with curated, tested approaches.

Trusted & Reliable

Vetted by thousands of professionals who’ve achieved real business outcomes.

Professional Growth

Position yourself as the go-to AI resource in your organization.

There is no single “ultimate” AI library, because the best choice depends on your specific project goals: different libraries excel at different tasks, such as deep learning, natural language processing, or production deployment. So why is Python so often regarded as the top programming language for AI development? In my experience, much of the answer lies in its libraries. The best Python libraries for AI development offer robust, adaptable, and user-friendly tools that speed up building AI models, so instead of coding everything from scratch, you can concentrate on solving the hard problems.

In this post, I’ll walk through some of the best Python libraries for AI development, with real-life examples. To give you a practical sense of how to use these libraries in real-world applications, I’ll also share the scripts I used to build different AI solutions, along with some insights into each library’s strengths.

An end-to-end platform for machine learning

Get started with TensorFlow

TensorFlow makes it easy to create ML models that can run in any environment. The following sample uses its high-level Keras API to train a simple digit classifier on the MNIST dataset.

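import tensorflow as tf
# Load MNIST and scale pixel values from [0, 255] to [0.0, 1.0].
mnist = tf.keras.datasets.mnist
(x_train, y_train),(x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
# A small fully connected classifier: flatten each 28x28 image, one hidden
# layer with dropout, and a 10-way softmax output (one unit per digit).
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10, activation='softmax')
])
# The labels are integer class IDs (0-9), hence the "sparse" cross-entropy.
model.compile(optimizer='adam',
  loss='sparse_categorical_crossentropy',
  metrics=['accuracy'])
# Train for 5 epochs, then report loss and accuracy on the held-out test set.
model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)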
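To make the loss above concrete: sparse_categorical_crossentropy takes integer labels (here the digits 0-9) and the softmax probabilities, picks out the probability assigned to the true class, and averages its negative log. Below is a small NumPy sketch of that computation; the function is my own illustration and omits the epsilon clipping Keras applies for numerical safety.

```python
import numpy as np


def sparse_categorical_crossentropy(labels, probs):
    """Mean negative log-probability of the true class.

    labels: integer class IDs, shape (batch,)
    probs:  softmax outputs, shape (batch, num_classes)
    """
    # Pick probs[i, labels[i]] for every row i.
    true_class_probs = probs[np.arange(len(labels)), labels]
    return float(np.mean(-np.log(true_class_probs)))


probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.1, 0.8]])
labels = np.array([0, 2])
loss = sparse_categorical_crossentropy(labels, probs)
print(round(loss, 3))  # 0.29, i.e. (-ln 0.7 - ln 0.8) / 2
```

The “sparse” variant is what lets y_train stay as plain digit labels; with one-hot encoded labels you would use categorical_crossentropy instead.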

Introducing Keras 3.0

After five months of extensive public beta testing, we’re excited to announce the official release of Keras 3.0. Keras 3 is a full rewrite of Keras that enables you to run your Keras workflows on top of either JAX, TensorFlow, PyTorch, or OpenVINO (for inference-only), and that unlocks brand new large-scale model training and deployment capabilities. You can pick the framework that suits you best, and switch from one to another based on your current goals. You can also use Keras as a low-level cross-framework language to develop custom components such as layers, models, or metrics that can be used in native workflows in JAX, TensorFlow, or PyTorch — with one codebase.


Welcome to multi-framework machine learning.

You’re already familiar with the benefits of using Keras — it enables high-velocity development via an obsessive focus on great UX, API design, and debuggability. It’s also a battle-tested framework that has been chosen by over 2.5M developers and that powers some of the most sophisticated, largest-scale ML systems in the world, such as the Waymo self-driving fleet and the YouTube recommendation engine. But what are the additional benefits of using the new multi-backend Keras 3?

  • Always get the best performance for your models. In our benchmarks, we found that JAX typically delivers the best training and inference performance on GPU, TPU, and CPU, but results vary from model to model, as non-XLA TensorFlow is occasionally faster on GPU. The ability to dynamically select the backend that delivers the best performance for your model, without changing anything in your code, means you can always train and serve with the most efficient backend available.
  • Unlock ecosystem optionality for your models. Any Keras 3 model can be instantiated as a PyTorch Module, can be exported as a TensorFlow SavedModel, or can be instantiated as a stateless JAX function. That means that you can use your Keras 3 models with PyTorch ecosystem packages, with the full range of TensorFlow deployment & production tools (like TF-Serving, TF.js and TFLite), and with JAX large-scale TPU training infrastructure. Write one model.py using Keras 3 APIs, and get access to everything the ML world has to offer.
  • Leverage large-scale model parallelism & data parallelism with JAX. Keras 3 includes a brand new distribution API, the keras.distribution namespace, currently implemented for the JAX backend (coming soon to the TensorFlow and PyTorch backends). It makes it easy to do model parallelism, data parallelism, and combinations of both — at arbitrary model scales and cluster scales. Because it keeps the model definition, training logic, and sharding configuration all separate from each other, it makes your distribution workflow easy to develop and easy to maintain. 

The Keras distribution API is a new interface designed to facilitate distributed deep learning across a variety of backends like JAX, TensorFlow and PyTorch. This powerful API introduces a suite of tools enabling data and model parallelism, allowing for efficient scaling of deep learning models on multiple accelerators and hosts. Whether leveraging the power of GPUs or TPUs, the API provides a streamlined approach to initializing distributed environments, defining device meshes, and orchestrating the layout of tensors across computational resources. Through classes like DataParallel and ModelParallel, it abstracts the complexity involved in parallel computation, making it easier for developers to accelerate their machine learning workflows.


How it works

The Keras distribution API provides a global programming model that allows developers to compose applications that operate on tensors in a global context (as if working with a single device) while automatically managing distribution across many devices. The API leverages the underlying framework (e.g. JAX) to distribute the program and tensors according to the sharding directives through a procedure called single program, multiple data (SPMD) expansion.

By decoupling the application from sharding directives, the API enables running the same application on a single device, multiple devices, or even multiple clients, while preserving its global semantics.


Setup

import os
# The distribution API is only implemented for the JAX backend for now.
os.environ["KERAS_BACKEND"] = "jax"
import keras
from keras import layers
import jax
import numpy as np
from tensorflow import data as tf_data  # For dataset input.

DeviceMesh and TensorLayout

The DeviceMesh class in the Keras distribution API represents a cluster of computational devices configured for distributed computation. Like mesh abstractions in other frameworks (for example, jax.sharding.Mesh), it maps the physical devices to a logical mesh structure.

The TensorLayout class then specifies how tensors are distributed across the DeviceMesh, detailing the sharding of tensors along specified axes that correspond to the names of the axes in the DeviceMesh.

# Retrieve the local available gpu devices.
devices = jax.devices("gpu")  # Assume it has 8 local GPUs.
# Define a 2x4 device mesh with data and model parallel axes
mesh = keras.distribution.DeviceMesh(
    shape=(2, 4), axis_names=["data", "model"], devices=devices
)
# A 2D layout, which describes how a tensor is distributed across the
# mesh. The layout can be visualized as a 2D grid with "model" as rows and
# "data" as columns; it is a [4, 2] grid when it is mapped to the physical
# devices on the mesh.
layout_2d = keras.distribution.TensorLayout(axes=("model", "data"), device_mesh=mesh)
# A 4D layout which could be used for data parallelism on an image input.
replicated_layout_4d = keras.distribution.TensorLayout(
    axes=("data", None, None, None), device_mesh=mesh
)
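To make the layout semantics concrete, the helpers below (a hypothetical pure-Python sketch, not part of Keras) compute the per-device shard shape of a tensor under a layout on the 2x4 mesh above. A layout entry that names a mesh axis splits that tensor dimension evenly across the devices on that axis; None leaves the dimension whole; mesh axes the layout never mentions act as replication axes. Requiring even division is an assumption of this sketch.

```python
def local_shard_shape(tensor_shape, layout_axes, mesh_shape, mesh_axis_names):
    """Per-device shard shape of a tensor under a layout (illustrative only)."""
    if len(tensor_shape) != len(layout_axes):
        raise ValueError("the layout needs one entry per tensor dimension")
    mesh_sizes = dict(zip(mesh_axis_names, mesh_shape))
    shard = []
    for dim, axis in zip(tensor_shape, layout_axes):
        if axis is None:
            # Not sharded: every device holds the full extent of this dimension.
            shard.append(dim)
            continue
        n = mesh_sizes[axis]
        if dim % n:
            raise ValueError(f"dim {dim} is not divisible by mesh axis {axis!r} ({n})")
        shard.append(dim // n)
    return tuple(shard)


def replication_factor(layout_axes, mesh_shape, mesh_axis_names):
    """How many devices hold an identical copy of each shard."""
    used = {axis for axis in layout_axes if axis is not None}
    factor = 1
    for name, size in zip(mesh_axis_names, mesh_shape):
        if name not in used:
            factor *= size
    return factor


mesh_shape, mesh_axes = (2, 4), ("data", "model")
# Like layout_2d: dim 0 split 4 ways over "model", dim 1 split 2 ways over "data".
print(local_shard_shape((8, 16), ("model", "data"), mesh_shape, mesh_axes))  # (2, 8)
# Like replicated_layout_4d: batch split over "data", copied along "model".
print(local_shard_shape((16, 28, 28, 1), ("data", None, None, None), mesh_shape, mesh_axes))  # (8, 28, 28, 1)
print(replication_factor(("data", None, None, None), mesh_shape, mesh_axes))  # 4
```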

Distribution

The Distribution class in Keras serves as a foundational abstract class for developing custom distribution strategies. It encapsulates the core logic needed to distribute a model’s variables, input data, and intermediate computations across a device mesh. As an end user, you won’t interact with this class directly; instead, you’ll use its subclasses, such as DataParallel or ModelParallel.


DataParallel

The DataParallel class in the Keras distribution API is designed for the data parallelism strategy in distributed training, where the model weights are replicated across all devices in the DeviceMesh, and each device processes a portion of the input data.

Here is a sample usage of this class.

# Create DataParallel with list of devices.
# As a shortcut, the devices can be skipped,
# and Keras will detect all local available devices.
# E.g. data_parallel = DataParallel()
data_parallel = keras.distribution.DataParallel(devices=devices)
# Or you can choose to create DataParallel with a 1D `DeviceMesh`.
mesh_1d = keras.distribution.DeviceMesh(
    shape=(8,), axis_names=["data"], devices=devices
)
data_parallel = keras.distribution.DataParallel(device_mesh=mesh_1d)
inputs = np.random.normal(size=(128, 28, 28, 1))
labels = np.random.normal(size=(128, 10))
dataset = tf_data.Dataset.from_tensor_slices((inputs, labels)).batch(16)
# Set the global distribution.
keras.distribution.set_distribution(data_parallel)
# Note that all the model weights from here on are replicated to
# all the devices of the `DeviceMesh`. This includes the RNG
# state, optimizer states, metrics, etc. The dataset fed into `model.fit` or
# `model.evaluate` will be split evenly on the batch dimension, and sent to
# all the devices. You don't have to do any manual aggregation of losses,
# since all the computation happens in a global context.
inputs = layers.Input(shape=(28, 28, 1))
y = layers.Flatten()(inputs)
y = layers.Dense(units=200, use_bias=False, activation="relu")(y)
y = layers.Dropout(0.4)(y)
y = layers.Dense(units=10, activation="softmax")(y)
model = keras.Model(inputs=inputs, outputs=y)
model.compile(loss="mse")
model.fit(dataset, epochs=3)
model.evaluate(dataset)
Epoch 1/3
 8/8 ━━━━━━━━━━━━━━━━━━━━ 8s 30ms/step - loss: 1.0116
Epoch 2/3
 8/8 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - loss: 0.9237
Epoch 3/3
 8/8 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - loss: 0.8736
 8/8 ━━━━━━━━━━━━━━━━━━━━ 5s 5ms/step - loss: 0.8349
0.842325747013092
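Why no manual aggregation is needed: for a mean-reduced loss and equal-sized shards, averaging the gradients each device computes on its slice of the batch gives exactly the full-batch gradient. The NumPy sketch below checks this for a toy linear model with MSE loss (all names are hypothetical); it illustrates the math, not how Keras or JAX implement the reduction internally.

```python
import numpy as np


def mse_grad(w, x, y):
    """Gradient of mean((x @ w - y) ** 2) with respect to w."""
    residual = x @ w - y
    return 2.0 * x.T @ residual / len(x)


rng = np.random.default_rng(0)
x = rng.normal(size=(128, 10))
y = rng.normal(size=(128, 1))
w = rng.normal(size=(10, 1))

# Global-batch gradient, as if everything ran on one device.
full_grad = mse_grad(w, x, y)

# Data parallelism: split the batch evenly into 8 shards (one per device),
# compute a gradient per shard with the same replicated weights, then
# average the per-shard gradients (an all-reduce mean).
num_devices = 8
shard_grads = [
    mse_grad(w, xs, ys)
    for xs, ys in zip(np.split(x, num_devices), np.split(y, num_devices))
]
averaged_grad = np.mean(shard_grads, axis=0)

print(np.allclose(full_grad, averaged_grad))  # True
```

With unequal shards (for example, a final partial batch), a plain average would over-weight the smaller shard, so per-shard gradients would need to be weighted by shard size.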

ModelParallel and LayoutMap

ModelParallel is mostly useful when model weights are too large to fit on a single accelerator. This setting allows you to split your model weights or activation tensors across all the devices on the DeviceMesh, enabling horizontal scaling for large models.

Unlike DataParallel, where all weights are fully replicated, the weight layout under ModelParallel usually needs some customization for best performance. We introduce LayoutMap to let you specify the TensorLayout for any weights and intermediate tensors from a global perspective.

LayoutMap is a dict-like object that maps a string to TensorLayout instances. It behaves differently from a normal Python dict in that the string key is treated as a regex when retrieving the value. The class allows you to define the naming schema of TensorLayout and then retrieve the corresponding TensorLayout instance. Typically, the key used to query is the variable.path attribute, which is the identifier of the variable. As a shortcut, a tuple or list of axis names is also allowed when inserting a value, and it will be converted to TensorLayout.

The LayoutMap can also optionally contain a DeviceMesh to populate the TensorLayout.device_mesh if it is not set. When retrieving a layout with a key, and if there isn’t an exact match, all existing keys in the layout map will be treated as regex and matched against the input key again. If there are multiple matches, a ValueError is raised. If no matches are found, None is returned.
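The lookup rules above can be sketched in a few lines of plain Python. SimpleLayoutMap is a hypothetical stand-in, not the real LayoutMap: it stores bare tuples of axis names instead of TensorLayout objects, and it assumes the regex fallback uses re.search (a match anywhere in the key).

```python
import re


class SimpleLayoutMap:
    """Hypothetical, simplified stand-in for keras.distribution.LayoutMap."""

    def __init__(self):
        self._layouts = {}

    def __setitem__(self, key, layout):
        # Store axis names as a tuple, mirroring the tuple/list shortcut.
        self._layouts[key] = tuple(layout)

    def __getitem__(self, key):
        # 1. An exact key match wins outright.
        if key in self._layouts:
            return self._layouts[key]
        # 2. Otherwise, treat every stored key as a regex.
        matches = [pattern for pattern in self._layouts if re.search(pattern, key)]
        if len(matches) > 1:
            raise ValueError(f"{key!r} matches multiple layout rules: {matches}")
        if matches:
            return self._layouts[matches[0]]
        # 3. No rule applies.
        return None


layout_map = SimpleLayoutMap()
layout_map["d1/kernel"] = (None, "model")
layout_map["d1/bias"] = ("model",)

print(layout_map["d1/kernel"])        # exact match: (None, 'model')
print(layout_map["encoder/d1/bias"])  # regex match: ('model',)
print(layout_map["d2/kernel"])        # no match: None

# A broad rule makes "encoder/d1/kernel" match two keys, so lookup fails.
layout_map[".*kernel"] = ("model", None)
try:
    layout_map["encoder/d1/kernel"]
except ValueError as err:
    print("ValueError:", err)
```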

mesh_2d = keras.distribution.DeviceMesh(
    shape=(2, 4), axis_names=["data", "model"], devices=devices
)
layout_map = keras.distribution.LayoutMap(mesh_2d)
# The rule below means that for any weights that match with d1/kernel, it
# will be sharded with model dimensions (4 devices), same for the d1/bias.
# All other weights will be fully replicated.
layout_map["d1/kernel"] = (None, "model")
layout_map["d1/bias"] = ("model",)
# You can also set the layout for the layer output like
layout_map["d2/output"] = ("data", None)
model_parallel = keras.distribution.ModelParallel(layout_map, batch_dim_name="data")
keras.distribution.set_distribution(model_parallel)
inputs = layers.Input(shape=(28, 28, 1))
y = layers.Flatten()(inputs)
y = layers.Dense(units=200, use_bias=False, activation="relu", name="d1")(y)
y = layers.Dropout(0.4)(y)
y = layers.Dense(units=10, activation="softmax", name="d2")(y)
model = keras.Model(inputs=inputs, outputs=y)
# The data will be sharded across the "data" dimension of the mesh, which
# has 2 devices.
model.compile(loss="mse")
model.fit(dataset, epochs=3)
model.evaluate(dataset)
Epoch 1/3
/opt/conda/envs/keras-jax/lib/python3.10/site-packages/jax/_src/interpreters/mlir.py:761: UserWarning: Some donated buffers were not usable: ShapedArray(float32[784,50]).
See an explanation at https://jax.readthedocs.io/en/latest/faq.html#buffer-donation.
  warnings.warn("Some donated buffers were not usable:"
 8/8 ━━━━━━━━━━━━━━━━━━━━ 5s 8ms/step - loss: 1.0266
Epoch 2/3
 8/8 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.9181
Epoch 3/3
 8/8 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 0.8725
 8/8 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - loss: 0.8381  
0.8502610325813293

It is also easy to change the mesh structure to shift the balance between data parallelism and model parallelism. You only need to adjust the shape of the mesh; no other code needs to change.

full_data_parallel_mesh = keras.distribution.DeviceMesh(
    shape=(8, 1), axis_names=["data", "model"], devices=devices
)
more_data_parallel_mesh = keras.distribution.DeviceMesh(
    shape=(4, 2), axis_names=["data", "model"], devices=devices
)
more_model_parallel_mesh = keras.distribution.DeviceMesh(
    shape=(2, 4), axis_names=["data", "model"], devices=devices
)
full_model_parallel_mesh = keras.distribution.DeviceMesh(
    shape=(1, 8), axis_names=["data", "model"], devices=devices
)
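Assuming, as in these examples, that all 8 devices are used, the product of the two mesh dimensions must equal the device count, so the four meshes above are the only 2D options. The hypothetical helper below enumerates them and shows how the per-device shard of the d1/kernel weight from the ModelParallel example shrinks as the model axis grows. The 784x200 kernel shape follows from the 28x28x1 input and Dense(units=200); the (2, 4) case gives (784, 50), which is consistent with the float32[784,50] shape in the JAX warning printed earlier.

```python
def mesh_shapes(num_devices):
    """All (data, model) mesh shapes whose product equals num_devices."""
    return [
        (num_devices // model, model)
        for model in range(1, num_devices + 1)
        if num_devices % model == 0
    ]


# d1/kernel is 784x200 and uses the (None, "model") rule, so only its
# second dimension is split, across the devices on the "model" axis.
for data, model in mesh_shapes(8):
    kernel_shard = (784, 200 // model)
    print(f"mesh=({data}, {model})  d1/kernel shard per device: {kernel_shard}")
```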
  • Maximize reach for your open-source model releases. Want to release a pretrained model? Want as many people as possible to be able to use it? If you implement it in pure TensorFlow or PyTorch, it will be usable by roughly half of the community. If you implement it in Keras 3, it is instantly usable by anyone regardless of their framework of choice (even if they’re not Keras users themselves). Twice the impact at no added development cost.
  • Use data pipelines from any source. The Keras 3 fit()/evaluate()/predict() routines are compatible with tf.data.Dataset objects, PyTorch DataLoader objects, NumPy arrays, and Pandas dataframes, regardless of the backend you’re using. You can train a Keras 3 + TensorFlow model on a PyTorch DataLoader or train a Keras 3 + PyTorch model on a tf.data.Dataset.

The full Keras API, available for JAX, TensorFlow, and PyTorch.

Keras 3 implements the full Keras API and makes it available with TensorFlow, JAX, and PyTorch — over a hundred layers, dozens of metrics, loss functions, optimizers, and callbacks, the Keras training and evaluation loops, and the Keras saving & serialization infrastructure. All the APIs you know and love are here.

Any Keras model that only uses built-in layers will immediately work with all supported backends. In fact, your existing tf.keras models that only use built-in layers can start running in JAX and PyTorch right away! That’s right, your codebase just gained a whole new set of capabilities.


Author multi-framework layers, models, metrics…

Keras 3 enables you to create components (like arbitrary custom layers or pretrained models) that will work the same in any framework. In particular, Keras 3 gives you access to the keras.ops namespace that works across all backends. It contains:

  • A full implementation of the NumPy API. Not something “NumPy-like”, but literally the NumPy API, with the same functions and the same arguments. You get ops.matmul, ops.sum, ops.stack, ops.einsum, etc.
  • A set of neural network-specific functions that are absent from NumPy, such as ops.softmax, ops.binary_crossentropy, ops.conv, etc.

As long as you only use ops from keras.ops, your custom layers, custom losses, custom metrics, and custom optimizers will work with JAX, PyTorch, and TensorFlow — with the same code. That means that you can maintain only one component implementation (e.g. a single model.py together with a single checkpoint file), and you can use it in all frameworks, with the exact same numerics.


…that works seamlessly with any JAX, TensorFlow, and PyTorch workflow.

Keras 3 is not just intended for Keras-centric workflows where you define a Keras model, a Keras optimizer, a Keras loss and metrics, and you call fit(), evaluate(), and predict(). It’s also meant to work seamlessly with low-level backend-native workflows: you can take a Keras model (or any other component, such as a loss or metric) and start using it in a JAX training loop, a TensorFlow training loop, or a PyTorch training loop, or as part of a JAX or PyTorch model, with zero friction. Keras 3 provides exactly the same degree of low-level implementation flexibility in JAX and PyTorch as tf.keras previously did in TensorFlow.

You can:

  • Write a low-level JAX training loop to train a Keras model using an optax optimizer, jax.grad, jax.jit, and jax.pmap.
  • Write a low-level TensorFlow training loop to train a Keras model using tf.GradientTape and tf.distribute.
  • Write a low-level PyTorch training loop to train a Keras model using a torch.optim optimizer, a torch loss function, and the torch.nn.parallel.DistributedDataParallel wrapper.
  • Use Keras layers in a PyTorch Module (because they are Module instances too!)
  • Use any PyTorch Module in a Keras model as if it were a Keras layer.
  • etc.

A new distribution API for large-scale data parallelism and model parallelism.

The models we’ve been working with have been getting larger and larger, so we wanted to provide a Kerasic solution to the multi-device model sharding problem. The API we designed keeps the model definition, the training logic, and the sharding configuration entirely separate from each other, meaning that your models can be written as if they were going to run on a single device. You can then add arbitrary sharding configurations to arbitrary models when it’s time to train them.

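Data parallelism (replicating a small model identically on multiple devices) can be handled in just two lines: create a DataParallel distribution over the available devices, then register it with keras.distribution.set_distribution().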

Model parallelism lets you specify sharding layouts for model variables and intermediate output tensors, along multiple named dimensions. In the typical case, you would organize available devices as a 2D grid (called a device mesh), where the first dimension is used for data parallelism and the second dimension is used for model parallelism. You would then configure your model to be sharded along the model dimension and replicated along the data dimension.

The API lets you configure the layout of every variable and every output tensor via regular expressions. This makes it easy to quickly specify the same layout for entire categories of variables.

The new distribution API is intended to be multi-backend, but is only available for the JAX backend for the time being. TensorFlow and PyTorch support is coming soon.


Pretrained models.

There’s a wide range of pretrained models that you can start using today with Keras 3.

All 40 Keras Applications models (the keras.applications namespace) are available in all backends. 

KerasHub API documentation

KerasHub is a toolbox of modular building blocks ranging from pretrained state-of-the-art models, to low-level Transformer Encoder layers.

  • Modeling API: Base classes that can be used for most high-level tasks using pretrained models. Note that you can use the from_preset() constructor on a base class to instantiate a model of the correct subclass.
  • Model Architectures: Implementations of all pretrained model architectures shipped with KerasHub.
  • Tokenizers: Layer implementations of tokenization routines for text-based models.
  • Preprocessing Layers: Layers for building preprocessing pipelines that handle audio, text, and image input.
  • Modeling Layers: Common modeling layers used by pretrained model architectures.
  • Samplers: An API for controlling generative text sampling.
  • Metrics: Metrics useful for audio, text, and image workflows.

This includes:

  • BERT
  • OPT
  • Whisper
  • T5
  • StableDiffusion
  • YOLOv8
  • SegmentAnything
  • etc.

Support for cross-framework data pipelines with all backends.

Multi-framework ML also means multi-framework data loading and preprocessing. Keras 3 models can be trained using a wide range of data pipelines — regardless of whether you’re using the JAX, PyTorch, or TensorFlow backends. It just works.

  • tf.data.Dataset pipelines: the reference for scalable production ML.
  • torch.utils.data.DataLoader objects.
  • NumPy arrays and Pandas dataframes.
  • Keras’s own keras.utils.PyDataset objects.

Progressive disclosure of complexity.

Progressive disclosure of complexity is the design principle at the heart of the Keras API. Keras doesn’t force you to follow a single “true” way of building and training models. Instead, it enables a wide range of different workflows, from the very high-level to the very low-level, corresponding to different user profiles.

That means that you can start out with simple workflows — such as using Sequential and Functional models and training them with fit() — and when you need more flexibility, you can easily customize different components while reusing most of your prior code. As your needs become more specific, you don’t suddenly fall off a complexity cliff and you don’t need to switch to a different set of tools.

We’ve brought this principle to all of our backends. For instance, you can customize what happens in your training loop while still leveraging the power of fit(), without having to write your own training loop from scratch — just by overriding the train_step method.

Here’s how it works with the JAX backend:

Customizing what happens in fit() with JAX

When you’re doing supervised learning, you can use fit() and everything works smoothly.

When you need to take control of every little detail, you can write your own training loop entirely from scratch.

But what if you need a custom training algorithm, but you still want to benefit from the convenient features of fit(), such as callbacks, built-in distribution support, or step fusing?

A core principle of Keras is progressive disclosure of complexity. You should always be able to get into lower-level workflows in a gradual way. You shouldn’t fall off a cliff if the high-level functionality doesn’t exactly match your use case. You should be able to gain more control over the small details while retaining a commensurate amount of high-level convenience.

When you need to customize what fit() does, you should override the training step function of the Model class. This is the function that is called by fit() for every batch of data. You will then be able to call fit() as usual – and it will be running your own learning algorithm.

Note that this pattern does not prevent you from building models with the Functional API. You can do this whether you’re building Sequential models, Functional API models, or subclassed models.

Let’s see how that works.


Setup

import os
# This guide can only be run with the JAX backend.
os.environ["KERAS_BACKEND"] = "jax"
import jax
import keras
import numpy as np

A first simple example

Let’s start from a simple example:

  • We create a new class that subclasses keras.Model.
  • We implement a fully-stateless compute_loss_and_updates() method to compute the loss as well as the updated values for the non-trainable variables of the model. Internally, it calls stateless_call() and the built-in stateless_compute_loss().
  • We implement a fully-stateless train_step() method to compute current metric values (including the loss) as well as updated values for the trainable variables, the optimizer variables, and the metric variables.
class CustomModel(keras.Model):
    def compute_loss_and_updates(
        self,
        trainable_variables,
        non_trainable_variables,
        metrics_variables,
        x,
        y,
        sample_weight,
        training=False,
    ):
        y_pred, non_trainable_variables = self.stateless_call(
            trainable_variables,
            non_trainable_variables,
            x,
            training=training,
        )
        loss, (
            trainable_variables,
            non_trainable_variables,
            metrics_variables,
        ) = self.stateless_compute_loss(
            trainable_variables,
            non_trainable_variables,
            metrics_variables,
            x=x,
            y=y,
            y_pred=y_pred,
            sample_weight=sample_weight,
            training=training,
        )
        return loss, (y_pred, non_trainable_variables, metrics_variables)
    def train_step(self, state, data):
        (
            trainable_variables,
            non_trainable_variables,
            optimizer_variables,
            metrics_variables,
        ) = state
        x, y, sample_weight = keras.utils.unpack_x_y_sample_weight(data)
        # Get the gradient function.
        grad_fn = jax.value_and_grad(self.compute_loss_and_updates, has_aux=True)
        # Compute the gradients.
        (loss, (y_pred, non_trainable_variables, metrics_variables)), grads = grad_fn(
            trainable_variables,
            non_trainable_variables,
            metrics_variables,
            x,
            y,
            sample_weight,
            training=True,
        )
        # Update trainable variables and optimizer variables.
        trainable_variables, optimizer_variables = self.optimizer.stateless_apply(
            optimizer_variables, grads, trainable_variables
        )
        # Update metrics.
        new_metrics_vars = []
        logs = {}
        for metric in self.metrics:
            this_metric_vars = metrics_variables[
                len(new_metrics_vars) : len(new_metrics_vars) + len(metric.variables)
            ]
            if metric.name == "loss":
                this_metric_vars = metric.stateless_update_state(
                    this_metric_vars, loss, sample_weight=sample_weight
                )
            else:
                this_metric_vars = metric.stateless_update_state(
                    this_metric_vars, y, y_pred, sample_weight=sample_weight
                )
            logs[metric.name] = metric.stateless_result(this_metric_vars)
            new_metrics_vars += this_metric_vars
        # Return metric logs and updated state variables.
        state = (
            trainable_variables,
            non_trainable_variables,
            optimizer_variables,
            new_metrics_vars,
        )
        return logs, state

Let’s try this out:

# Construct and compile an instance of CustomModel
inputs = keras.Input(shape=(32,))
outputs = keras.layers.Dense(1)(inputs)
model = CustomModel(inputs, outputs)
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
# Just use `fit` as usual
x = np.random.random((1000, 32))
y = np.random.random((1000, 1))
model.fit(x, y, epochs=3)
Epoch 1/3
32/32 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - mae: 0.3765 - loss: 0.2093
Epoch 2/3
32/32 ━━━━━━━━━━━━━━━━━━━━ 0s 232us/step - mae: 0.3634 - loss: 0.1968
Epoch 3/3
32/32 ━━━━━━━━━━━━━━━━━━━━ 0s 228us/step - mae: 0.3543 - loss: 0.1877
<keras.src.callbacks.history.History at 0x15d8472e0>
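The metric loop in train_step works because the metrics_variables element of state is one flat list, ordered like self.metrics, in which each metric owns a contiguous slice; the running offset len(new_metrics_vars) walks through those slices. This pure-Python sketch mirrors that slicing with strings in place of real JAX arrays; the (total, count) pair per metric is an illustrative assumption.

```python
def split_metric_variables(flat_vars, vars_per_metric):
    """Slice one flat list of metric state into per-metric chunks.

    vars_per_metric: (metric_name, number_of_state_variables) pairs,
    in the same order as model.metrics.
    """
    chunks = {}
    offset = 0
    for name, count in vars_per_metric:
        chunks[name] = flat_vars[offset : offset + count]
        offset += count
    if offset != len(flat_vars):
        raise ValueError("per-metric counts do not add up to the flat list length")
    return chunks


# Hypothetical state: a loss tracker and an MAE metric, each keeping a
# running (total, count) pair.
flat = ["loss_total", "loss_count", "mae_total", "mae_count"]
print(split_metric_variables(flat, [("loss", 2), ("mae", 2)]))
```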

Going lower-level

Naturally, you could just skip passing a loss function in compile(), and instead do everything manually in train_step. Likewise for metrics.

Here’s a lower-level example that only uses compile() to configure the optimizer:

class CustomModel(keras.Model):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.loss_tracker = keras.metrics.Mean(name="loss")
        self.mae_metric = keras.metrics.MeanAbsoluteError(name="mae")
        self.loss_fn = keras.losses.MeanSquaredError()
    def compute_loss_and_updates(
        self,
        trainable_variables,
        non_trainable_variables,
        x,
        y,
        sample_weight,
        training=False,
    ):
        y_pred, non_trainable_variables = self.stateless_call(
            trainable_variables,
            non_trainable_variables,
            x,
            training=training,
        )
        loss = self.loss_fn(y, y_pred, sample_weight=sample_weight)
        return loss, (y_pred, non_trainable_variables)
    def train_step(self, state, data):
        (
            trainable_variables,
            non_trainable_variables,
            optimizer_variables,
            metrics_variables,
        ) = state
        x, y, sample_weight = keras.utils.unpack_x_y_sample_weight(data)
        # Get the gradient function.
        grad_fn = jax.value_and_grad(self.compute_loss_and_updates, has_aux=True)
        # Compute the gradients.
        (loss, (y_pred, non_trainable_variables)), grads = grad_fn(
            trainable_variables,
            non_trainable_variables,
            x,
            y,
            sample_weight,
            training=True,
        )
        # Update trainable variables and optimizer variables.
        trainable_variables, optimizer_variables = self.optimizer.stateless_apply(
            optimizer_variables, grads, trainable_variables
        )
        # Update metrics.
        loss_tracker_vars = metrics_variables[: len(self.loss_tracker.variables)]
        mae_metric_vars = metrics_variables[len(self.loss_tracker.variables) :]
        loss_tracker_vars = self.loss_tracker.stateless_update_state(
            loss_tracker_vars, loss, sample_weight=sample_weight
        )
        mae_metric_vars = self.mae_metric.stateless_update_state(
            mae_metric_vars, y, y_pred, sample_weight=sample_weight
        )
        logs = {}
        logs[self.loss_tracker.name] = self.loss_tracker.stateless_result(
            loss_tracker_vars
        )
        logs[self.mae_metric.name] = self.mae_metric.stateless_result(mae_metric_vars)
        new_metrics_vars = loss_tracker_vars + mae_metric_vars
        # Return metric logs and updated state variables.
        state = (
            trainable_variables,
            non_trainable_variables,
            optimizer_variables,
            new_metrics_vars,
        )
        return logs, state
    @property
    def metrics(self):
        # We list our `Metric` objects here so that `reset_states()` can be
        # called automatically at the start of each epoch
        # or at the start of `evaluate()`.
        return [self.loss_tracker, self.mae_metric]
# Construct an instance of CustomModel
inputs = keras.Input(shape=(32,))
outputs = keras.layers.Dense(1)(inputs)
model = CustomModel(inputs, outputs)
# We don't pass a loss or metrics here.
model.compile(optimizer="adam")
# Just use `fit` as usual -- you can use callbacks, etc.
x = np.random.random((1000, 32))
y = np.random.random((1000, 1))
model.fit(x, y, epochs=5)
Epoch 1/5
32/32 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - loss: 0.9146 - mae: 0.8248
Epoch 2/5
32/32 ━━━━━━━━━━━━━━━━━━━━ 0s 225us/step - loss: 0.4087 - mae: 0.5116
Epoch 3/5
32/32 ━━━━━━━━━━━━━━━━━━━━ 0s 230us/step - loss: 0.2766 - mae: 0.4233
Epoch 4/5
32/32 ━━━━━━━━━━━━━━━━━━━━ 0s 202us/step - loss: 0.2631 - mae: 0.4106
Epoch 5/5
32/32 ━━━━━━━━━━━━━━━━━━━━ 0s 198us/step - loss: 0.2604 - mae: 0.4070
<keras.src.callbacks.history.History at 0x15dccb0a0>
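The train_step above gets the loss and the extra outputs (y_pred and the updated non-trainable variables) from a single call to jax.value_and_grad with has_aux=True. The sketch below isolates that pattern on a hypothetical two-weight linear model; loss_and_aux and the toy data are illustrative only. The differentiated function returns a (loss, aux) pair. Gradients are taken with respect to the first argument only, and the auxiliary value is passed through untouched.

```python
import jax
import jax.numpy as jnp


def loss_and_aux(w, x, y):
    """Hypothetical loss: mean squared error of a linear model x @ w.

    Returns (loss, aux) so it can be used with has_aux=True; here the
    auxiliary output is just the predictions.
    """
    y_pred = x @ w
    loss = jnp.mean((y - y_pred) ** 2)
    return loss, y_pred


# Differentiate with respect to the first argument (w); y_pred rides along as aux.
grad_fn = jax.value_and_grad(loss_and_aux, has_aux=True)

w = jnp.array([1.0, 2.0])
x = jnp.array([[1.0, 0.0], [0.0, 1.0]])
y = jnp.array([0.0, 0.0])

(loss, y_pred), grads = grad_fn(w, x, y)
print(loss)    # 2.5
print(y_pred)  # [1. 2.]
print(grads)   # [1. 2.]
```

The ((value, aux), grads) structure is what train_step unpacks as (loss, (y_pred, non_trainable_variables)), grads.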

Providing your own evaluation step

What if you want to do the same for calls to model.evaluate()? Then you would override test_step in exactly the same way. Here’s what it looks like:

class CustomModel(keras.Model):
    def test_step(self, state, data):
        # Unpack the data.
        x, y, sample_weight = keras.utils.unpack_x_y_sample_weight(data)
        (
            trainable_variables,
            non_trainable_variables,
            metrics_variables,
        ) = state
        # Compute predictions and loss.
        y_pred, non_trainable_variables = self.stateless_call(
            trainable_variables,
            non_trainable_variables,
            x,
            training=False,
        )
        loss, (
            trainable_variables,
            non_trainable_variables,
            metrics_variables,
        ) = self.stateless_compute_loss(
            trainable_variables,
            non_trainable_variables,
            metrics_variables,
            x=x,
            y=y,
            y_pred=y_pred,
            sample_weight=sample_weight,
            training=False,
        )
        # Update metrics.
        new_metrics_vars = []
        logs = {}
        for metric in self.metrics:
            this_metric_vars = metrics_variables[
                len(new_metrics_vars) : len(new_metrics_vars) + len(metric.variables)
            ]
            if metric.name == "loss":
                this_metric_vars = metric.stateless_update_state(
                    this_metric_vars, loss, sample_weight=sample_weight
                )
            else:
                this_metric_vars = metric.stateless_update_state(
                    this_metric_vars, y, y_pred, sample_weight=sample_weight
                )
            logs[metric.name] = metric.stateless_result(this_metric_vars)
            new_metrics_vars += this_metric_vars
        # Return metric logs and updated state variables.
        state = (
            trainable_variables,
            non_trainable_variables,
            new_metrics_vars,
        )
        return logs, state
# Construct an instance of CustomModel
inputs = keras.Input(shape=(32,))
outputs = keras.layers.Dense(1)(inputs)
model = CustomModel(inputs, outputs)
model.compile(loss="mse", metrics=["mae"])
# Evaluate with our custom test_step
x = np.random.random((1000, 32))
y = np.random.random((1000, 1))
model.evaluate(x, y, return_dict=True)
32/32 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - mae: 0.5369 - loss: 0.4170
{'compile_metrics': {'mae': Array(0.5368782, dtype=float32)},
 'loss': Array(0.41702443, dtype=float32)}

All stateful objects in Keras (i.e. objects that own numerical variables that get updated during training or evaluation) now have a stateless API, making it possible to use them in JAX functions (which are required to be fully stateless):

  • All layers and models have a stateless_call() method which mirrors __call__().
  • All optimizers have a stateless_apply() method which mirrors apply().
  • All metrics have a stateless_update_state() method which mirrors update_state() and a stateless_result() method which mirrors result().

These methods have no side effects whatsoever: they take as input the current values of the target object’s state variables and return the updated values as part of their outputs, e.g.:

outputs, updated_non_trainable_variables = layer.stateless_call(
    trainable_variables,
    non_trainable_variables,
    inputs,
)

You never have to implement these methods yourself — they’re automatically available as long as you’ve implemented the stateful version (e.g. call() or update_state()).


Run inference with the OpenVINO backend

Starting with release 3.8, Keras includes an OpenVINO backend. It is an inference-only backend, meaning it is designed only for running model predictions with the predict() method. This backend lets you leverage OpenVINO performance optimizations directly within the Keras workflow, enabling faster inference on OpenVINO-supported hardware.

To switch to the OpenVINO backend, set the KERAS_BACKEND environment variable to "openvino" or specify the backend in the local configuration file at ~/.keras/keras.json. Here is an example of how to run inference with a model (trained with the PyTorch, JAX, or TensorFlow backend) using the OpenVINO backend:

import os
os.environ["KERAS_BACKEND"] = "openvino"
import keras
loaded_model = keras.saving.load_model(...)
predictions = loaded_model.predict(...)

Note that the OpenVINO backend may currently lack support for some operations. Operation coverage is being expanded, and these gaps will be addressed in upcoming Keras releases.


Moving from Keras 2 to Keras 3

Keras 3 is highly backwards compatible with Keras 2: it implements the full public API surface of Keras 2, with a limited number of exceptions. Most users will not have to make any code change to start running their Keras scripts on Keras 3.

Larger codebases are likely to require some code changes, since they are more likely to run into one of the exceptions listed above and are more likely to have been using private or deprecated APIs (the tf.compat.v1.keras namespace, the experimental namespace, the keras.src private namespace). To help you move to Keras 3, we are releasing a complete migration guide with quick fixes for all issues you might encounter.

You also have the option to ignore the changes in Keras 3 and just keep using Keras 2 with TensorFlow — this can be a good option for projects that are not actively developed but need to keep running with updated dependencies. You have two possibilities:

  1. If you were accessing keras as a standalone package, just switch to using the Python package tf_keras instead, which you can install via pip install tf_keras. The code and API are wholly unchanged — it’s Keras 2.15 with a different package name. We will keep fixing bugs in tf_keras and we will keep regularly releasing new versions. However, no new features or performance improvements will be added, since the package is now in maintenance mode.
  2. If you were accessing keras via tf.keras, there are no immediate changes until TensorFlow 2.16. TensorFlow 2.16+ will use Keras 3 by default. In TensorFlow 2.16+, to keep using Keras 2, you can first install tf_keras, and then export the environment variable TF_USE_LEGACY_KERAS=1. This will direct TensorFlow 2.16+ to resolve tf.keras to the locally-installed tf_keras package. Note that this may affect more than your own code, however: it will affect any package importing tf.keras in your Python process. To make sure your changes only affect your own code, you should use the tf_keras package.
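You can also take option 2 above from Python instead of a shell export. This is a minimal sketch, assuming TensorFlow 2.16+ and tf_keras are installed. The variable only takes effect if it is set before TensorFlow resolves tf.keras, so the safe place for it is before the first import of tensorflow in the process. As noted above, it then applies to every package in that process that imports tf.keras.

```python
import os
import sys

# TF_USE_LEGACY_KERAS must be in the environment before TensorFlow resolves
# tf.keras; setting it before the first `import tensorflow` is the safe choice.
if "tensorflow" in sys.modules:
    raise RuntimeError("Set TF_USE_LEGACY_KERAS before importing TensorFlow.")
os.environ["TF_USE_LEGACY_KERAS"] = "1"

import tensorflow as tf  # assumes TensorFlow 2.16+ and `pip install tf_keras`

# With the variable set, tf.keras is directed to the installed tf_keras package (Keras 2).
print(tf.__version__)
```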

Enjoy the library!

We’re excited for you to try out the new Keras and improve your workflows by leveraging multi-framework ML. Let us know how it goes: issues, points of friction, feature requests, or success stories — we’re eager to hear from you!


tf.lite is being replaced by LiteRT

The tf.lite module will be deprecated, with development for on-device inference moving to a new, independent repository, LiteRT. The new APIs are available in Kotlin and C++. This code base will be decoupled from the TensorFlow repository, and tf.lite will be removed from future TensorFlow Python packages, so we encourage you to migrate your projects to LiteRT to receive the latest updates. More details to follow.

As announced at Google I/O ’25, LiteRT improves upon TFLite, particularly for NPU and GPU hardware acceleration and performance for on-device ML and AI applications.

LiteRT provides a unified interface for Neural Processing Units (NPUs), removing the need to navigate vendor-specific compilers or libraries. This approach avoids many device-specific complications, boosts performance for real-time and large-model inference, and minimizes memory copies through zero-copy hardware buffer usage.


NPU acceleration with LiteRT

LiteRT provides a unified interface to use Neural Processing Units (NPUs) without requiring you to navigate vendor-specific compilers, runtimes, or library dependencies. Using LiteRT for NPU acceleration boosts performance for real-time and large-model inference and minimizes memory copies through zero-copy hardware buffer usage.

AOT and on-device compilation

LiteRT NPU supports both AOT and on-device compilation to meet your specific deployment requirements:

  • Offline (AOT) compilation: This is best suited for large, complex models where the target SoC is known. Compiling ahead-of-time significantly reduces initialization costs and lowers memory usage when the user launches your app.
  • Online (on-device) compilation: Also known as JIT compilation. This is ideal for platform-agnostic model distribution of small models. The model is compiled on the user’s device during initialization, requiring no extra preparation step but incurring a higher first-run cost.

Here’s how you can deploy your model using either AOT or on-device compilation:

Step 1: AOT Compilation for the target NPU SoCs

You can use the LiteRT AOT (ahead-of-time) Compiler to compile your .tflite model for the supported SoCs. You can also target multiple SoC vendors and versions simultaneously within a single compilation process.

While optional, AOT compilation is highly recommended for larger models to reduce on-device initialization time. This step is not required for on-device compilation.

Step 2: Deploy with Google Play if on Android

Play for On-device AI lets you publish a single artifact to Play containing your code, assets, and ML models, and choose from a number of delivery modes and targeting options.

Benefits

  • Upload a single publishing artifact to Google Play and delegate hosting, delivery, updates, and targeting to Play at no additional cost.
  • Deliver your ML models at install-time, fast-follow, or on-demand.
    • Install-time delivery can guarantee that a very large model is present when your app is opened. Your model will be installed as an APK.
    • Fast-follow delivery occurs automatically in the background after your app has been installed. Users may open your app before your model has been fully downloaded. Your model will be downloaded to your app’s internal storage space.
    • On-demand delivery lets you request the model at runtime, which is useful if the model is only required for certain user-flows. Your model will be downloaded to your app’s internal storage space.
  • Deliver variants of your ML models that are targeted to specific devices based on device model, system properties, or RAM.
  • Keep app updates small and optimized with Play’s automatic patching, which means only the differences in files need to be downloaded.
  • Models downloaded by Play for On-device AI should only be used by your apps. Models shouldn’t be offered to other apps.
  • Individual AI packs can be up to 1.5GB, based on their compressed download sizes. The maximum cumulative app size of any version of your app generated from your app bundle is 4GB.
  • Apps over 1GB in size must set their minimum SDK level to 21 or higher.

How to use Play for On-device AI

Play for On-device AI uses AI packs. You package your distribution-ready custom models into AI packs in your app bundle, and you choose whether each AI pack is delivered at install-time, fast-follow, or on-demand.

By packaging AI packs with your app bundle, you can use all of Play’s existing testing and release tools, such as test tracks and staged rollouts, to manage your app’s distribution with your custom models.

AI packs are updated together with the app binary. If your new app release doesn’t make changes to an AI pack, then Play’s automatic patching process will ensure the user doesn’t have to re-download it. Play will just download what’s changed when it updates the app.

AI packs only contain models. Java/Kotlin and native libraries are not allowed.

Overview of Play Feature Delivery

Play Feature Delivery uses advanced capabilities of app bundles, allowing certain features of your app to be delivered conditionally or downloaded on demand. To do that, first you need to separate these features from your base app into feature modules.

Feature module build configuration

When you create a new feature module using Android Studio, the IDE applies the following Gradle plugin to the module’s build.gradle file.

// The following applies the dynamic-feature plugin to your feature module.
// The plugin includes the Gradle tasks and properties required to configure and build
// an app bundle that includes your feature module.
plugins {
  id 'com.android.dynamic-feature'
}

Faster input pipeline warm-up with tf.data

To help reduce latency, especially the time it takes for your model to process the first element of a dataset, we’ve added autotune.min_parallelism. This new option allows asynchronous dataset operations like .map and .batch to immediately start with a specified minimum level of parallelism, speeding up the initial warm-up time for your input pipelines.
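Here is a minimal sketch of where such an option would be set. It assumes TensorFlow 2.20+ exposes the option as options.autotune.min_parallelism on tf.data.Options; that attribute path is an assumption based on the option name above. The pipeline and the value 2 are illustrative. So is the hasattr guard, which skips the option on older TensorFlow releases that lack it.

```python
import tensorflow as tf

# A small pipeline with asynchronous .map and .batch stages.
dataset = (
    tf.data.Dataset.range(8)
    .map(lambda x: x * 2, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(4, num_parallel_calls=tf.data.AUTOTUNE)
)

options = tf.data.Options()
# Assumption: TF 2.20+ exposes the option as `options.autotune.min_parallelism`.
# Older releases do not have it, so only set it when the attribute exists.
if hasattr(options.autotune, "min_parallelism"):
    options.autotune.min_parallelism = 2
dataset = dataset.with_options(options)

for batch in dataset:
    print(batch.numpy())
```

The option only changes how quickly the asynchronous stages ramp up; the elements the pipeline produces stay the same.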

Changes to I/O GCS filesystem package

The tensorflow-io-gcs-filesystem package for Google Cloud Storage support is now optional. Previously, it was installed by default with TensorFlow. If your workflow requires access to GCS, you must now explicitly install this package by running: pip install "tensorflow[gcs-filesystem]".

Note that the package has recently received limited support, and there is currently no guarantee it will be available for newer Python versions.
