Basics

This section introduces the core concepts behind Brain4J model construction and training. It assumes you already have a basic understanding of machine learning (datasets, loss functions, optimizers, epochs), and focuses on how those ideas are expressed inside Brain4J.

By the end of this page, you should be able to:

  1. Understand what a Model is and how it is built

  2. Compose architectures using layers and reusable blocks

  3. Train a model using Trainer and TrainingConfig

  4. Monitor training and evaluation

  5. Save and reload trained models

Overall Execution Model

A Brain4J training pipeline can be summarized as follows:

You start from a dataset, expressed as a DataSource. You then define the structure of the network using ModelSpecs, which is a declarative description of layers and blocks. This specification is compiled into a concrete Model, which owns parameters and executes forward and backward passes.

Training behavior is defined independently through a TrainingConfig, and the actual training loop is executed by a Trainer. During training, one or more Monitor instances observe what is happening, without affecting the optimization process itself.

This separation is intentional: Brain4J avoids mixing concerns such as architecture, optimization strategy, execution, and logging into a single object.

Datasets and Data Sources

Brain4J represents datasets through the DataSource abstraction. A data source is responsible for providing samples in the correct order and batch size, optionally shuffling them between epochs.

A commonly used implementation is ListDataSource, which simply wraps an in-memory list of samples:
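For illustration, a data source might be built roughly like this; the exact ListDataSource constructor (shuffle flag, batch size) is an assumption and may differ between Brain4J versions:

```java
// Illustrative sketch: the ListDataSource constructor parameters are assumptions.
List<Sample> samples = new ArrayList<>();
samples.add(new Sample(input, target)); // one input/target pair per sample

// Wrap the in-memory list; the shuffle flag and batch size shown here are hypothetical.
DataSource dataSource = new ListDataSource(samples, true, 32);
```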

Each element in the dataset is a Sample, which pairs an input tensor with its corresponding target tensor:
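A sketch of constructing a single sample; the Tensors factory method shown here is an assumption, so use whatever tensor construction utilities your version exposes:

```java
// Hypothetical tensor construction for a single XOR-style example.
Tensor input  = Tensors.vector(0f, 1f); // network input
Tensor target = Tensors.vector(1f);     // expected output
Sample sample = new Sample(input, target);
```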

The important point is that datasets are completely independent from models. A DataSource does not know anything about the network it is feeding, and the model does not know where the data comes from. This makes it trivial to reuse the same dataset for different experiments or evaluation setups.

Defining a Model: Layers and Specifications

At the lowest level, Brain4J models are composed of layers. Layers define what computation happens: they handle gradient calculation and weight initialization, but not how training happens.

ModelSpecs: Describing the Architecture

Instead of instantiating a model directly, you define its structure using ModelSpecs:
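As a sketch (the layer class names and the ModelSpecs constructor used here are assumptions):

```java
// Declarative description of the architecture: no weights are allocated here.
ModelSpecs specs = new ModelSpecs(
    new DenseLayer(16, Activations.RELU),
    new DenseLayer(1, Activations.SIGMOID)
);
```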

ModelSpecs is essentially a blueprint. It contains no weights, no gradients, and no runtime state. Its purpose is to describe the structure of the network in a way that is explicit, inspectable, and reproducible.

This design allows the same specification to be compiled multiple times with different random seeds or training configurations.

Compiling a Model

To turn a specification into an executable model, you compile it:
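For example, assuming ModelSpecs exposes a compile method that accepts a random seed (the exact signature may differ):

```java
// Turn the blueprint into a concrete model; 42 is an arbitrary seed.
Model model = specs.compile(42);
```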

Compilation:

  • Allocates parameters

  • Initializes weights using the provided random seed

  • Validates layer compatibility and tensor shapes

Once compiled, the Model owns all trainable parameters and is responsible for forward and backward execution. You can inspect its structure using:
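For instance, assuming the model exposes a summary method (the method name is an assumption):

```java
// Print a human-readable overview of layers, shapes, and parameter counts.
System.out.println(model.summary());
```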

At this point, the model is fully defined but has not yet been trained.

Model Blocks: Reusable Architecture Patterns

As architectures grow more complex, repeating the same sequence of layers becomes both verbose and error-prone. Brain4J addresses this with ModelBlock.

A ModelBlock is a reusable architectural component that expands into one or more layers at compile time. Conceptually, it acts as a macro rather than a runtime container.
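As an illustration, a block might bundle a dense layer with a dropout layer; the constructors used in this sketch are assumptions:

```java
// Hypothetical reusable block: a dense layer followed by dropout.
ModelBlock hiddenBlock = new ModelBlock(
    new DenseLayer(64, Activations.RELU),
    new DropoutLayer(0.2)
);
```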

This block can then be used inside a model specification:
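Continuing the sketch above, with the same assumed constructors:

```java
// The block is expanded twice inside the specification (illustrative).
ModelSpecs specs = new ModelSpecs(
    hiddenBlock,
    hiddenBlock,
    new DenseLayer(1, Activations.SIGMOID)
);
```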

This has the same effect as:
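Under the same assumed constructors, the flattened equivalent would look like:

```java
// Equivalent flat specification after the blocks are expanded at compile time.
ModelSpecs flat = new ModelSpecs(
    new DenseLayer(64, Activations.RELU),
    new DropoutLayer(0.2),
    new DenseLayer(64, Activations.RELU),
    new DropoutLayer(0.2),
    new DenseLayer(1, Activations.SIGMOID)
);
```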

Internally, all blocks are flattened into a single sequential list of layers during compilation. They introduce no additional runtime abstraction or performance overhead.

Blocks can also contain other blocks, allowing you to express hierarchical or modular designs while still producing a simple sequential model under the hood.

Both layers and compiled models inherit from ModelBlock.

Training Configuration

Training behavior is defined separately from the model via TrainingConfig.
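A sketch of such a configuration; the builder pattern and the optimizer and updater class names shown here are assumptions:

```java
// Hypothetical training configuration: loss, optimizer, and update strategy.
TrainingConfig config = new TrainingConfig.Builder()
    .lossFunction(LossFunctions.BINARY_CROSS_ENTROPY)
    .optimizer(new Adam(0.01))
    .updater(new StochasticUpdater())
    .build();
```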

This configuration specifies:

  • Loss function: how predictions are compared to targets

  • Optimizer: how gradients are transformed into updates

  • Updater: how and when parameters are updated (e.g. stochastic, batch-based)

Because this configuration is independent from the model, the same architecture can be trained using different losses or optimizers without modification.

Trainer and the Training Loop

The Trainer is responsible for executing the training process:
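A minimal sketch; the constructor and fit method, including where the epoch count lives, are assumptions:

```java
// Hypothetical training run: the trainer drives epochs, batches, and parameter updates.
Trainer trainer = new Trainer(model, config);
trainer.fit(dataSource, 100); // e.g. 100 epochs; this could equally live in TrainingConfig
```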

The trainer orchestrates:

  • Iteration over epochs and batches

  • Forward and backward passes

  • Application of parameter updates

  • Notification of monitors

The model itself remains unaware of epochs, datasets, or evaluation logic. This clear separation makes training behavior explicit and easy to reason about.

Monitors: Observing Without Interfering

Monitors provide a way to observe training without affecting the optimization process.

A simple example is DefaultMonitor, which logs batch timing and overall progress:
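For example (the method used to attach monitors is an assumption):

```java
// Logs batch timing and overall training progress.
trainer.addMonitor(new DefaultMonitor());
```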

For evaluation, EvalMonitor can periodically test the model on a separate dataset:
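A sketch, assuming EvalMonitor takes a held-out data source and an evaluation interval:

```java
// Evaluate on a separate data source every 10 epochs (constructor is illustrative).
trainer.addMonitor(new EvalMonitor(testDataSource, 10));
```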

In this case, evaluation is performed every 10 epochs. Multiple monitors can be attached simultaneously, and they are executed in parallel from the trainer’s perspective.

Saving & Loading Models

Once training is complete, a model can be serialized using ModelZoo:
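A sketch of saving and restoring a model; the static method names and the file extension are assumptions:

```java
// Hypothetical serialization round-trip.
ModelZoo.save(model, "model.b4j");           // structure + weights + metadata
Model restored = ModelZoo.load("model.b4j"); // ready for inference or further training
```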

The serialized file contains the model structure, trained parameters, and all metadata required to restore it. The resulting model can later be reloaded for inference or further training.
