Basics

This section introduces the core concepts behind Brain4J model construction and training. It assumes you already have a basic understanding of machine learning (datasets, loss functions, optimizers, epochs), and focuses on how those ideas are expressed inside Brain4J.

By the end of this page, you should be able to:

  1. Understand what a Model is and how it is built

  2. Compose architectures using layers and reusable blocks

  3. Train a model using Trainer and TrainingConfig

  4. Monitor training and evaluation

  5. Save and reload trained models

Overall Execution Model

A Brain4J training pipeline can be summarized as follows:

You start from a dataset, expressed as a DataSource. You then define the structure of the network using ModelSpecs, which is a declarative description of layers and blocks. This specification is compiled into a concrete Model, which owns parameters and executes forward and backward passes.

Training behavior is defined independently through a TrainingConfig, and the actual training loop is executed by a Trainer. During training, one or more Monitor instances observe what is happening, without affecting the optimization process itself.

This separation is intentional: Brain4J avoids mixing concerns such as architecture, optimization strategy, execution, and logging into a single object.

Datasets and Data Sources

Brain4J represents datasets through the DataSource abstraction. A data source is responsible for providing samples in the correct order and batch size, optionally shuffling them between epochs.

A commonly used implementation is ListDataSource, which simply wraps an in-memory list of samples:
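For illustration, a data source might be built roughly like this; the exact ListDataSource constructor (shuffle flag, batch size) is an assumption and may differ between Brain4J versions:

```java
// Illustrative sketch: the ListDataSource constructor parameters are assumptions.
List<Sample> samples = new ArrayList<>();
samples.add(new Sample(input, target)); // one input/target pair per sample

// Wrap the in-memory list; the shuffle flag and batch size shown here are hypothetical.
DataSource dataSource = new ListDataSource(samples, true, 32);
```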

Each element in the dataset is a Sample, which pairs an input tensor with its corresponding target tensor:
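A sketch of constructing a single sample; the Tensors factory method shown here is an assumption, so use whatever tensor construction utilities your version exposes:

```java
// Hypothetical tensor construction for a single XOR-style example.
Tensor input  = Tensors.vector(0f, 1f); // network input
Tensor target = Tensors.vector(1f);     // expected output
Sample sample = new Sample(input, target);
```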

The important point is that datasets are completely independent from models. A DataSource does not know anything about the network it is feeding, and the model does not know where the data comes from. This makes it trivial to reuse the same dataset for different experiments or evaluation setups.

Defining a Model: Layers and Specifications

At the lowest level, Brain4J models are composed of layers. Layers define what computation happens: they handle gradient calculation and weight initialization, but not how training happens.

ModelSpecs: Describing the Architecture

Instead of instantiating a model directly, you define its structure using ModelSpecs:
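As a sketch (the layer class names and the ModelSpecs constructor used here are assumptions):

```java
// Declarative description of the architecture: no weights are allocated here.
ModelSpecs specs = new ModelSpecs(
    new DenseLayer(16, Activations.RELU),
    new DenseLayer(1, Activations.SIGMOID)
);
```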

ModelSpecs is essentially a blueprint. It contains no weights, no gradients, and no runtime state. Its purpose is to describe the structure of the network in a way that is explicit, inspectable, and reproducible.

This design allows the same specification to be compiled multiple times with different random seeds or training configurations.

Compiling a Model

To turn a specification into an executable model, you compile it:
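For example, assuming ModelSpecs exposes a compile method that accepts a random seed (the exact signature may differ):

```java
// Turn the blueprint into a concrete model; 42 is an arbitrary seed.
Model model = specs.compile(42);
```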

Compilation:

  • Allocates parameters

  • Initializes weights using the provided random seed

  • Validates layer compatibility and tensor shapes

Once compiled, the Model owns all trainable parameters and is responsible for forward and backward execution. You can inspect its structure using:
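For instance, assuming the model exposes a summary method (the method name is an assumption):

```java
// Print a human-readable overview of layers, shapes, and parameter counts.
System.out.println(model.summary());
```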

At this point, the model is fully defined but has not yet been trained.

Model Blocks: Reusable Architecture Patterns

As architectures grow more complex, repeating the same sequence of layers becomes both verbose and error-prone. Brain4J addresses this with ModelBlock.

A ModelBlock is a reusable architectural component that expands into one or more layers at compile time. Conceptually, it acts as a macro rather than a runtime container.
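As an illustration, a block might bundle a dense layer with a dropout layer; the constructors used in this sketch are assumptions:

```java
// Hypothetical reusable block: a dense layer followed by dropout.
ModelBlock hiddenBlock = new ModelBlock(
    new DenseLayer(64, Activations.RELU),
    new DropoutLayer(0.2)
);
```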

This block can then be used inside a model specification:
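Continuing the sketch above, with the same assumed constructors:

```java
// The block is expanded twice inside the specification (illustrative).
ModelSpecs specs = new ModelSpecs(
    hiddenBlock,
    hiddenBlock,
    new DenseLayer(1, Activations.SIGMOID)
);
```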

This has the same effect as:
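Under the same assumed constructors, the flattened equivalent would look like:

```java
// Equivalent flat specification after the blocks are expanded at compile time.
ModelSpecs flat = new ModelSpecs(
    new DenseLayer(64, Activations.RELU),
    new DropoutLayer(0.2),
    new DenseLayer(64, Activations.RELU),
    new DropoutLayer(0.2),
    new DenseLayer(1, Activations.SIGMOID)
);
```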

Internally, all blocks are flattened into a single sequential list of layers during compilation. They introduce no additional runtime abstraction or performance overhead.

Blocks can also contain other blocks, allowing you to express hierarchical or modular designs while still producing a simple sequential model under the hood.

Both layers and compiled models inherit from ModelBlock.

Training Configuration

Training behavior is defined separately from the model via TrainingConfig.
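A sketch of such a configuration; the builder pattern and the optimizer and updater class names shown here are assumptions:

```java
// Hypothetical training configuration: loss, optimizer, and update strategy.
TrainingConfig config = new TrainingConfig.Builder()
    .lossFunction(LossFunctions.BINARY_CROSS_ENTROPY)
    .optimizer(new Adam(0.01))
    .updater(new StochasticUpdater())
    .build();
```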

This configuration specifies:

  • Loss function: how predictions are compared to targets

  • Optimizer: how gradients are transformed into updates

  • Updater: how and when parameters are updated (e.g. stochastic, batch-based)

Because this configuration is independent from the model, the same architecture can be trained using different losses or optimizers without modification.

Trainer and the Training Loop

The Trainer is responsible for executing the training process:
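A minimal sketch; the constructor and fit method, including where the epoch count lives, are assumptions:

```java
// Hypothetical training run: the trainer drives epochs, batches, and parameter updates.
Trainer trainer = new Trainer(model, config);
trainer.fit(dataSource, 100); // e.g. 100 epochs; this could equally live in TrainingConfig
```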

The trainer orchestrates:

  • Iteration over epochs and batches

  • Forward and backward passes

  • Application of parameter updates

  • Notification of monitors

The model itself remains unaware of epochs, datasets, or evaluation logic. This clear separation makes training behavior explicit and easy to reason about.

Monitors: Observing Without Interfering

Monitors provide a way to observe training without affecting the optimization process.

A simple example is DefaultMonitor, which logs batch timing and overall progress:
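For example (the method used to attach monitors is an assumption):

```java
// Logs batch timing and overall training progress.
trainer.addMonitor(new DefaultMonitor());
```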

For evaluation, EvalMonitor can periodically test the model on a separate dataset:
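A sketch, assuming EvalMonitor takes a held-out data source and an evaluation interval:

```java
// Evaluate on a separate data source every 10 epochs (constructor is illustrative).
trainer.addMonitor(new EvalMonitor(testDataSource, 10));
```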

In this case, evaluation is performed every 10 epochs. Multiple monitors can be attached simultaneously, and they are executed in parallel from the trainer’s perspective.

Saving & Loading Models

Once training is complete, a model can be serialized using ModelZoo:
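A sketch of saving and restoring a model; the static method names and the file extension are assumptions:

```java
// Hypothetical serialization round-trip.
ModelZoo.save(model, "model.b4j");           // structure + weights + metadata
Model restored = ModelZoo.load("model.b4j"); // ready for inference or further training
```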

The serialized file contains the model structure, trained parameters, and all metadata required to restore it. The resulting model can later be reloaded for inference or further training.
