Basics
This section introduces the core concepts behind Brain4J model construction and training. It assumes you already have a basic understanding of machine learning (datasets, loss functions, optimizers, epochs), and focuses on how those ideas are expressed inside Brain4J.
By the end of this page, you should be able to:
Understand what a Model is and how it is built
Compose architectures using layers and reusable blocks
Train a model using Trainer and TrainingConfig
Monitor training and evaluation
Save and reload trained models
Overall Execution Model
A Brain4J training pipeline can be summarized as follows:
You start from a dataset, expressed as a DataSource. You then define the structure of the network using ModelSpecs, which is a declarative description of layers and blocks. This specification is compiled into a concrete Model, which owns parameters and executes forward and backward passes.
Training behavior is defined independently through a TrainingConfig, and the actual training loop is executed by a Trainer. During training, one or more Monitor instances observe what is happening, without affecting the optimization process itself.
This separation is intentional: Brain4J avoids mixing concerns such as architecture, optimization strategy, execution, and logging into a single object.
Datasets and Data Sources
Brain4J represents datasets through the DataSource abstraction. A data source is responsible for providing samples in the correct order and batch size, optionally shuffling them between epochs.
Each element in the dataset is a Sample, which pairs an input tensor with its corresponding target tensor. A commonly used DataSource implementation is ListDataSource, which simply wraps an in-memory list of such samples:
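The snippet below is only a minimal sketch: the Sample and ListDataSource constructors and the Tensors factory methods are assumptions and may differ from the actual API (imports are omitted because package paths vary).

```java
// Sketch only: constructor signatures and the Tensors factory are assumptions.
List<Sample> samples = List.of(
    new Sample(Tensors.vector(0, 0), Tensors.vector(0)), // input -> target
    new Sample(Tensors.vector(0, 1), Tensors.vector(1)),
    new Sample(Tensors.vector(1, 0), Tensors.vector(1)),
    new Sample(Tensors.vector(1, 1), Tensors.vector(0))
);

// Wrap the in-memory list: shuffle between epochs, batches of 2 samples.
DataSource dataSource = new ListDataSource(samples, true, 2);
```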
The important point is that datasets are completely independent from models. A DataSource does not know anything about the network it is feeding, and the model does not know where the data comes from. This makes it trivial to reuse the same dataset for different experiments or evaluation setups.
Defining a Model: Layers and Specifications
At the lowest level, Brain4J models are composed of layers. Layers define what computation happens: they handle gradient calculation and weight initialization, but not how training happens.
Every Brain4J model must start with an InputLayer, which defines the input dimensionality.
If it is missing, model compilation fails with an error.
ModelSpecs: Describing the Architecture
Instead of instantiating a model directly, you define its structure using ModelSpecs:
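As an illustration, a small feed-forward specification might look like the following sketch; the ModelSpecs.sequential factory, the DenseLayer class, and the activation constants are assumptions about the API, not verbatim Brain4J code.

```java
// Hypothetical layer and activation names; the real API may differ.
ModelSpecs specs = ModelSpecs.sequential(
    new InputLayer(2),                      // required first layer: input dimensionality
    new DenseLayer(16, Activations.RELU),   // hidden layer
    new DenseLayer(1, Activations.SIGMOID)  // output layer
);
```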
ModelSpecs is essentially a blueprint. It contains no weights, no gradients, and no runtime state. Its purpose is to describe the structure of the network in a way that is explicit, inspectable, and reproducible.
This design allows the same specification to be compiled multiple times with different random seeds or training configurations.
Compiling a Model
To turn a specification into an executable model, you compile it:
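For example, assuming compilation accepts a random seed (the exact method signature is an assumption):

```java
// Hypothetical signature: turns the blueprint into an executable model.
Model model = specs.compile(42); // 42 is the random seed used for weight initialization
```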
Compilation:
Allocates parameters
Initializes weights using the provided random seed
Validates layer compatibility and tensor shapes
Once compiled, the Model owns all trainable parameters and is responsible for forward and backward execution. You can inspect its structure using:
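The exact inspection method is version-dependent; as a sketch, assuming a summary-style helper exists:

```java
// Hypothetical: prints a per-layer overview (layer types, shapes, parameter counts).
System.out.println(model.summary());
```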
At this point, the model is fully defined but has not yet been trained.
Model Blocks: Reusable Architecture Patterns
As architectures grow more complex, repeating the same sequence of layers becomes both verbose and error-prone. Brain4J addresses this with ModelBlock.
A ModelBlock is a reusable architectural component that expands into one or more layers at compile time. Conceptually, it acts as a macro rather than a runtime container.
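For example, a reusable "two hidden layers" pattern could be expressed roughly as follows; the ModelBlock.of factory and the layer names are assumptions:

```java
// Hypothetical factory: a reusable pair of hidden layers expressed as a block.
ModelBlock hiddenBlock = ModelBlock.of(
    new DenseLayer(32, Activations.RELU),
    new DenseLayer(32, Activations.RELU)
);
```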
This block can then be used inside a model specification:
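Continuing the sketch above (same assumed class names):

```java
// The block is placed in the specification like an ordinary layer.
ModelSpecs specs = ModelSpecs.sequential(
    new InputLayer(4),
    hiddenBlock,
    new DenseLayer(1, Activations.SIGMOID)
);
```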
This has the same effect as:
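Again, with the same assumed class names:

```java
// Equivalent flattened specification after compile-time expansion.
ModelSpecs specs = ModelSpecs.sequential(
    new InputLayer(4),
    new DenseLayer(32, Activations.RELU),
    new DenseLayer(32, Activations.RELU),
    new DenseLayer(1, Activations.SIGMOID)
);
```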
Internally, all blocks are flattened into a single sequential list of layers during compilation. They introduce no additional runtime abstraction or performance overhead.
Blocks can also contain other blocks, allowing you to express hierarchical or modular designs while still producing a simple sequential model under the hood.
Both layers and compiled models inherit from ModelBlock.
Training Configuration
Training behavior is defined separately from the model via TrainingConfig.
This configuration specifies:
Loss function: how predictions are compared to targets
Optimizer: how gradients are transformed into updates
Updater: how and when parameters are updated (e.g. stochastic, batch-based)
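A minimal sketch of such a configuration follows; the builder methods and the concrete loss, optimizer, and updater class names are assumptions and may not match the real API.

```java
// Hypothetical builder-style API; concrete class names may differ.
TrainingConfig config = TrainingConfig.builder()
    .lossFunction(new BinaryCrossEntropy()) // how predictions are compared to targets
    .optimizer(new Adam(0.001))             // how gradients are transformed into updates
    .updater(new StochasticUpdater())       // how and when parameters are applied
    .build();
```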
Because this configuration is independent from the model, the same architecture can be trained using different losses or optimizers without modification.
Trainer and the Training Loop
The Trainer is responsible for executing the training process:
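As a sketch, continuing the earlier snippets (constructor and method names are assumptions):

```java
// Hypothetical wiring: the trainer combines model, configuration and data.
Trainer trainer = new Trainer(model, config);
trainer.fit(dataSource, 100); // e.g. 100 epochs; the epoch count may instead live in TrainingConfig
```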
The trainer orchestrates:
Iteration over epochs and batches
Forward and backward passes
Application of parameter updates
Notification of monitors
The model itself remains unaware of epochs, datasets, or evaluation logic. This clear separation makes training behavior explicit and easy to reason about.
Monitors: Observing Without Interfering
Monitors provide a way to observe training without affecting the optimization process.
A simple example is DefaultMonitor, which logs batch timing and overall progress:
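For instance, assuming monitors are attached to the trainer (the attachment method name is an assumption):

```java
// Hypothetical attachment API: logs batch timing and overall progress.
trainer.addMonitor(new DefaultMonitor());
```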
For evaluation, EvalMonitor can periodically test the model on a separate dataset:
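As a sketch, assuming EvalMonitor takes an evaluation dataset and an epoch interval:

```java
// Hypothetical constructor: evaluate on a held-out dataset every 10 epochs.
trainer.addMonitor(new EvalMonitor(testDataSource, 10));
```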
In this case, evaluation is performed every 10 epochs. Multiple monitors can be attached simultaneously, and they are executed in parallel from the trainer’s perspective.
Saving & Loading Models
Once training is complete, a model can be serialized using ModelZoo:
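For example (the method names and file extension below are assumptions):

```java
// Hypothetical save/load API and file name.
ModelZoo.save(model, "xor-model.b4j");
Model restored = ModelZoo.load("xor-model.b4j");
```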
The serialized file contains the model structure, trained parameters, and all metadata required to restore it. The resulting model can later be reloaded for inference or further training.