Tensors
A tensor is a multi-dimensional mathematical object that generalizes scalars (numbers), vectors (1D arrays), and matrices (2D arrays) to higher dimensions.
Tensors are the fundamental data structure in Brain4J. All computations, models, and data transformations operate on tensors, which provide an efficient way to represent and manipulate high-dimensional data such as vectors, matrices, sequences, images, and text.
A tensor in Brain4J is defined by the following properties:
Order (Rank): The number of indices required to uniquely identify an element. For example, scalars have order 0, vectors have order 1, and matrices have order 2. Note: in Brain4J, scalars have rank 1 (see below).
Shape: The number of elements along each dimension (axis) of the tensor. For example, an RGB image can be represented by a tensor with shape [color_channels, height, width].
Stride: The number of elements to skip in the flat data buffer in order to move to the next element along a given dimension.
Strides: A list containing the stride value for each dimension of the tensor.
In Brain4J, scalars are represented as tensors with shape [1] (1D vector). As a result, scalars have rank 1 and Brain4J does not support rank-0 tensors.
Device placement
In Brain4J, every tensor is associated with a specific execution device. A device defines where the tensor’s data is stored and where operations on that tensor are executed.
Currently, Brain4J supports the following devices:
CPU (default)
GPU (via OpenCL)
Creating tensors on a device
Tensors are created on the CPU by default. A tensor can be explicitly moved to another device when required.
Device device = Brain4J.firstDevice(); // gets the first device available
Tensor t = Tensors.random(1024, 1024); // CPU tensor
// moves the tensor to the GPU
Tensor gpuTensor = t.to(device);

Whenever a tensor is located on the GPU, operations can be batched into a single command queue and scheduled for execution. Note: the data() method is not recommended when a tensor is hosted on the GPU, as it typically forces a device-to-host transfer.
Memory Layout
Internally, Brain4J tensors store their data in a flat, contiguous memory buffer, typically a one-dimensional float[] (or a device buffer in the case of GPU tensors). Multi-dimensional structure is not represented through nested arrays, but through shape and stride metadata.
This design choice is fundamental for performance, interoperability, and flexibility. By default, Brain4J uses a row-major (C-style) memory layout, meaning that the last dimension is contiguous in memory.
For example, a tensor with shape [2, 3] is laid out in memory as:

[t(0,0), t(0,1), t(0,2), t(1,0), t(1,1), t(1,2)]

This corresponds to strides [3, 1]: advancing along the first dimension skips 3 elements in the buffer, while advancing along the last dimension skips 1.
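The mapping from multi-dimensional indices to flat buffer positions can be sketched in plain Java. This is a conceptual illustration of row-major strides, not Brain4J's actual internals:

```java
// Computes row-major (C-style) strides and flat offsets for a given shape.
public class RowMajor {
    // The stride of the last dimension is 1; each earlier stride is the
    // product of all later dimension sizes.
    static int[] strides(int[] shape) {
        int[] strides = new int[shape.length];
        int acc = 1;
        for (int i = shape.length - 1; i >= 0; i--) {
            strides[i] = acc;
            acc *= shape[i];
        }
        return strides;
    }

    // Flat offset = sum over all dimensions of index[i] * stride[i].
    static int offset(int[] strides, int[] index) {
        int off = 0;
        for (int i = 0; i < index.length; i++) {
            off += index[i] * strides[i];
        }
        return off;
    }
}
```

For a shape of [2, 3], strides(...) yields [3, 1], and the element at index (1, 2) lives at flat position 1*3 + 2*1 = 5, matching the layout shown above.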
Broadcasting
Broadcasting allows tensors with compatible shapes to participate in the same operation without explicitly duplicating data.
A tensor can be broadcast along a dimension if its size in that dimension is either:
equal to the target size, or
equal to 1.
Broadcasting is applied per-dimension, from the trailing axes.
Important notes
Broadcasting does not allocate new memory for the broadcasted tensor.
If shapes are not compatible, the operation fails with an error.
Not all operations support broadcasting.
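These rules can be sketched as a shape-compatibility check that aligns shapes from their trailing axes. This is a conceptual sketch, not Brain4J's actual implementation:

```java
// Computes the broadcast result shape of two shapes, or throws if they
// are incompatible. Shapes are aligned from the trailing (rightmost)
// axes; a pair of sizes is compatible if they match or one of them is 1.
public class Broadcast {
    static int[] resultShape(int[] a, int[] b) {
        int rank = Math.max(a.length, b.length);
        int[] out = new int[rank];
        for (int i = 0; i < rank; i++) {
            // Read from the end; missing leading dimensions count as 1.
            int da = i < a.length ? a[a.length - 1 - i] : 1;
            int db = i < b.length ? b[b.length - 1 - i] : 1;
            if (da != db && da != 1 && db != 1) {
                throw new IllegalArgumentException(
                    "Incompatible sizes: " + da + " vs " + db);
            }
            out[rank - 1 - i] = Math.max(da, db);
        }
        return out;
    }
}
```

For example, shapes [4, 3] and [3] broadcast to [4, 3], while [3] and [4] fail with an error.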
In-place operations & Mutability
Brain4J tensors are mutable objects. Depending on the operation, a method may:
modify the tensor data in-place
allocate a new data buffer (deep copy)
return a view that shares the same underlying memory
This distinction is critical. Failing to understand it can easily introduce silent and hard-to-debug errors, especially when combining views, in-place mutations, and autograd.
For this reason, Brain4J does not hide mutability behind implicit copies. Instead, the API exposes it explicitly, and users are expected to reason about it.
In-place operations
In-place operations directly modify the internal data buffer of the tensor. No new memory is allocated, and the original values are overwritten.
Typical examples include:
Direct writes such as set(...)
Element-wise mutations such as map(...)
Fill operations like fill(...)
Normalization methods such as layerNorm(...)
These operations return this (or otherwise operate on the same instance) and permanently change the tensor contents.
Methods like layerNorm(...) are semantically destructive. Despite their functional-looking name, they overwrite the original data. This behavior is intentional but must be understood clearly by the user.
If you need to preserve the original tensor, you must explicitly create a copy using clone() before calling these methods.
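The copy-before-mutating pattern can be illustrated with a plain float[] standing in for a tensor's flat data buffer. scaleInPlace here is a hypothetical destructive operation, analogous to layerNorm(...):

```java
public class InPlaceDemo {
    // A hypothetical destructive operation: overwrites the buffer
    // in-place, just as layerNorm(...) overwrites the original data.
    static void scaleInPlace(float[] data, float factor) {
        for (int i = 0; i < data.length; i++) {
            data[i] *= factor;
        }
    }
}
```

Cloning before the call preserves the original values: after `float[] backup = data.clone(); InPlaceDemo.scaleInPlace(data, 2f);` the backup still holds the pre-mutation data, because the clone owns its own buffer.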
Object Mutability vs Data Mutability
Some methods mutate the tensor object without modifying the numerical data itself. This mainly concerns autograd-related state:
enabling or disabling gradient tracking
resetting accumulated gradients
updating the autograd context
These operations still mutate the tensor instance and therefore must be treated as stateful, even though the underlying data buffer is unchanged.
Operations That Allocate New Memory
Many tensor operations allocate a new data buffer, producing a tensor that is fully independent from the original one.
This category includes:
Explicit copies such as clone()
Shape-altering operations that require materialization (e.g. slice, concat)
Reductions and aggregations (sum, mean, variance)
Non-linear transformations such as softmax
Mathematical operations like matmul and convolve
In all these cases, the resulting tensor owns its own memory and can be mutated safely without affecting the original tensor.
Cloning
Tensors support cloning through the clone() method. A cloned tensor is always a deep copy and never shares the underlying data buffer with the original tensor.
The cloning behavior is defined as follows:
The tensor values are fully copied into a new contiguous buffer.
The shape array is always copied.
If the source tensor is contiguous, its strides are preserved.
If the source tensor is non-contiguous (e.g. after transpose), the clone is materialized into a contiguous layout with standard strides.
The cloned tensor has no associated autograd context.
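The materialization rule can be sketched for the 2D case: a transpose view swaps shape and strides without touching the data, and cloning it copies the elements into a fresh contiguous row-major buffer. This is a conceptual sketch, not Brain4J's implementation:

```java
// Materializes a 2D strided view (buffer + shape + strides) into a new
// contiguous row-major buffer, as clone() does for non-contiguous tensors.
public class Materialize {
    static float[] toContiguous(float[] data, int rows, int cols,
                                int rowStride, int colStride) {
        float[] out = new float[rows * cols];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                // Read through the view's strides, write row-major.
                out[r * cols + c] = data[r * rowStride + c * colStride];
            }
        }
        return out;
    }
}
```

For a 2x3 buffer [1, 2, 3, 4, 5, 6], its 3x2 transpose view has strides [1, 3]; materializing that view produces the contiguous buffer [1, 4, 2, 5, 3, 6].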
Views and Aliasing (No Data Copy)
Some operations return a view of the original tensor. Views do not allocate new memory and instead share the same underlying data buffer, while exposing a different shape and/or stride configuration.
Common view-producing operations include:
reshape(...), flatten(), transpose(...), squeeze(...), and unsqueeze(...)
All these operations:
do not copy data
share the same memory buffer
reflect mutations across all aliases
As a consequence, modifying one view (for example via set(...) or map(...)) will also affect every other tensor that shares the same data.
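Aliasing can be demonstrated with a minimal strided view over a shared buffer. This is a conceptual sketch; Brain4J's view mechanism is assumed to behave analogously:

```java
// A minimal strided view over a shared float buffer. Two views created
// over the same buffer alias each other: writes through one view are
// visible through the other, because the data is never copied.
public class View {
    final float[] data;   // shared, never copied
    final int offset;
    final int stride;

    View(float[] data, int offset, int stride) {
        this.data = data;
        this.offset = offset;
        this.stride = stride;
    }

    float get(int i) { return data[offset + i * stride]; }

    void set(int i, float v) { data[offset + i * stride] = v; }
}
```

Creating a full view and a strided view over the same buffer, then writing through the strided one, makes the change visible through the full view and through the raw buffer alike.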
The transpose(...) method behaves differently depending on the chosen backend. If SIMD acceleration is active, a full transposition is performed by allocating a new data array and copying the original values; otherwise, a lazy transposition (a stride-swapping view) is used.
This behavior is subject to change in future releases.
Autograd-Safe Operations (*Grad Methods)
Operations with a *Grad suffix follow a strict rule:
If gradient tracking is disabled, they delegate to the non-gradient version
If gradient tracking is enabled, they never operate in-place
In the presence of autograd, these methods always create a new tensor to ensure correctness of the computation graph.
This guarantees that in-place data corruption cannot occur during gradient-based training.
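The dispatch rule can be sketched as follows. All names here are hypothetical; the real Brain4J implementation differs:

```java
// Sketch of the *Grad dispatch rule: with gradient tracking disabled,
// the in-place variant is used; with tracking enabled, a fresh buffer
// is allocated so the computation graph never sees mutated inputs.
public class GradDispatch {
    float[] data;
    boolean requiresGrad;

    GradDispatch(float[] data, boolean requiresGrad) {
        this.data = data;
        this.requiresGrad = requiresGrad;
    }

    // Hypothetical in-place op (stands in for a non-*Grad method).
    GradDispatch addInPlace(float v) {
        for (int i = 0; i < data.length; i++) data[i] += v;
        return this;
    }

    // Hypothetical *Grad variant: never mutates when tracking is on.
    GradDispatch addGrad(float v) {
        if (!requiresGrad) {
            return addInPlace(v); // delegate to the non-gradient version
        }
        float[] out = data.clone(); // new tensor; original left intact
        for (int i = 0; i < out.length; i++) out[i] += v;
        return new GradDispatch(out, true);
    }
}
```

With tracking disabled, addGrad mutates and returns the same instance; with tracking enabled, it returns a new instance and the original data is untouched.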
Operations that do not carry the *Grad suffix do not preserve the autograd context.
The *Grad API will be removed in a future release and unified into a single set of implementations.
Examples