See also GitHub's releases page for a complete list of PRs merged before each release.
This release has a single breaking change:
- The forward pass of the recurrent cells `RNNCell`, `LSTMCell`, and `GRUCell` has been changed to $y_t, state_t = cell(x_t, state_{t-1})$. Previously, it was $state_t = cell(x_t, state_{t-1})$.
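A minimal sketch of the new cell interface (the cell type, sizes, and input below are arbitrary, chosen only for illustration):

```julia
using Flux

cell = RNNCell(3 => 5)        # same pattern applies to LSTMCell and GRUCell
x_t = rand(Float32, 3)        # input at a single time step
state = zeros(Float32, 5)     # previous state

# New in this release: the cell returns both the output and the updated state.
y_t, state = cell(x_t, state)
```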
Other highlights include:
- Added `WeightNorm` normalization layer.
- Added `Recurrence` layer, turning a recurrent cell into a layer processing the entire sequence at once.
This release includes two breaking changes:
- The recurrent layers have been thoroughly revised. See below and read the documentation for details.
- Flux now defines and exports its own `gradient` function. Consequently, using `gradient` in an unqualified manner (e.g., after `using Flux, Zygote`) could result in an ambiguity error.
The most significant updates and deprecations are as follows:
- Recurrent layers have undergone a complete redesign in PR 2500 (see the sketch after this list).
  - `RNNCell`, `LSTMCell`, and `GRUCell` are now exported and provide functionality for single time-step processing: `rnncell(x_t, h_t) -> h_{t+1}`.
  - `RNN`, `LSTM`, and `GRU` no longer store the hidden state internally; it has to be explicitly passed to the layer. Moreover, they now process entire sequences at once, rather than one element at a time: `rnn(x, h) -> h′`.
  - The `Recur` wrapper has been deprecated and removed.
  - The `reset!` function has also been removed; state management is now entirely up to the user.
- The `Flux.Optimise` module has been deprecated in favor of the Optimisers.jl package. Flux now re-exports the optimisers from Optimisers.jl. Most users will be unaffected by this change. The module is still available for now, but will be removed in a future release.
- Most Flux layers will re-use memory via `NNlib.bias_act!`, when possible.
- Further support for Enzyme.jl, via methods of `Flux.gradient(loss, Duplicated(model))`. Flux now owns & exports `gradient` and `withgradient`, but without `Duplicated` this still defaults to calling Zygote.jl.
- `Flux.params` has been deprecated. Use Zygote's explicit differentiation instead, `gradient(m -> loss(m, x, y), model)`, or use `Flux.trainables(model)` to get the trainable parameters.
- Flux now requires Functors.jl v0.5. This new release of Functors assumes all types to be functors by default. Therefore, applying `Flux.@layer` or `Functors.@functor` to a type is no longer strictly necessary for Flux's models. However, it is still recommended to use `@layer Model` for additional functionality like pretty printing.
- `@layer Model` now behaves the same as `@layer :expand Model`, which means that the model is expanded into its sublayers (if there are any) when printed. To force compact printing, use `@layer :noexpand Model`.
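A minimal sketch of the redesigned recurrent interface together with the new `Flux.gradient` entry point; the sizes, random data, and loss below are made up for illustration:

```julia
using Flux

d_in, d_out, len, batch = 2, 3, 5, 4
rnn = RNN(d_in => d_out)
x  = rand(Float32, d_in, len, batch)   # features × time × batch
h0 = zeros(Float32, d_out)             # initial hidden state, now managed by the user
h  = rnn(x, h0)                        # hidden states for all time steps

# Flux.gradient defaults to Zygote; wrapping the model as Duplicated(rnn)
# (with Enzyme.jl loaded) dispatches to Enzyme instead.
loss(m) = sum(abs2, m(x, h0))
grads = Flux.gradient(loss, rnn)
```

Passing the state explicitly replaces the old `Recur`/`reset!` workflow.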
- Data movement between devices is now provided by MLDataDevices.jl.
- Add support for distributed data parallel training.
- MPI and NCCL backends are available via the `FluxMPIExt` and `FluxMPINCCLExt` extensions, respectively.
- Add support for Enzyme with `Flux.train!`.
- New macro `Flux.@layer`, which should be used in place of `@functor`. This also adds `show` methods for pretty printing; see the sketch after this list.
- New `SignDecay` optimiser, like `WeightDecay` but for L1 norm.
- Flux now requires julia v1.9 or later.
- CUDA.jl is not a hard dependency anymore. Support is now provided through the extension mechanism, by loading `using Flux, CUDA`. The package cuDNN.jl also needs to be installed in the environment. (You will get instructions if this is missing.)
- After a deprecation cycle, the macro `@epochs` and the functions `Flux.stop`, `Flux.skip`, `Flux.zeros`, `Flux.ones` have been removed.
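A minimal sketch of defining a custom layer with the new macro; the `Affine` layer here is hypothetical and only for illustration:

```julia
using Flux

# A custom layer is just a struct with a forward method.
struct Affine{W, B}
    weight::W
    bias::B
end

Affine(in::Int, out::Int) = Affine(randn(Float32, out, in), zeros(Float32, out))

(a::Affine)(x) = a.weight * x .+ a.bias

# Replaces the old `@functor Affine`, and also adds `show` methods for pretty printing.
Flux.@layer Affine
```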
- Preliminary support for Apple's Metal GPU acceleration via the extension mechanism.
- Most Greek-letter keyword arguments are deprecated in favour of ASCII. Thus `LayerNorm(3; ϵ=1e-4)` (not `ε`!) should become `LayerNorm(3; eps=1e-4)`.
- `DataLoader(...) |> gpu` will now produce a special iterator, moving each batch as needed, instead of giving an error.
- Added `Flux.state`, returning the internal state of the model for serialization (see the sketch below).
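A sketch of checkpointing with `Flux.state`; JLD2.jl is assumed here only as one possible serialization format, and the architecture is arbitrary:

```julia
using Flux, JLD2

model = Chain(Dense(2 => 3, relu), Dense(3 => 1))

# Save only the numerical state (weights, biases, etc.), not the code.
jldsave("checkpoint.jld2"; model_state = Flux.state(model))

# Later: rebuild the same architecture, then load the saved state into it.
model2 = Chain(Dense(2 => 3, relu), Dense(3 => 1))
Flux.loadmodel!(model2, JLD2.load("checkpoint.jld2", "model_state"))
```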
- Added MultiHeadAttention layer.
- `f16`, `f32`, `f64` now specifically target floating point arrays (i.e. integer arrays and other types are preserved); see the sketch below.
- `f16`, `f32`, `f64` can now handle `Complex{<:AbstractFloat}` arrays.
- Added `EmbeddingBag` layer.
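A short sketch of the conversion behaviour described above (the arrays are arbitrary examples):

```julia
using Flux

f32([1.0, 2.0])               # Float64 array -> Float32 array
f32([1, 2, 3])                # integer array: returned unchanged
f32(ComplexF64[1.0 + 2.0im])  # ComplexF64 -> ComplexF32

f16(Dense(2 => 3))            # recursively converts the layer's parameters to Float16
```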
- Fixed various deprecation warnings, from `Zygote.@nograd` and `Vararg`.
- Initial support for AMDGPU via the extension mechanism.
- Add `gpu_backend` preference to select the GPU backend, using `LocalPreferences.toml`.
- Add `Flux.gpu_backend!` method to switch between GPU backends.
- Added `f16` which changes precision to `Float16`, recursively.
- Most layers standardise their input to `eltype(layer.weight)`, #2156, to limit the cost of accidental Float64 promotion.
- Friendlier errors from size mismatches #2176.
- CUDA.jl 4.0 compatibility.
- Use `dropout` from NNlib as back-end for the `Dropout` layer.
- New method of `train!` using Zygote's "explicit" mode. Part of a move away from "implicit" `Params` (see the sketch after this list).
- Added `Flux.setup`, which is `Optimisers.setup` with extra checks, and translation from deprecated "implicit" optimisers like `Flux.Optimise.Adam` to new ones from Optimisers.jl.
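A minimal sketch of the explicit-mode training loop; the model, data, and loss below are placeholders:

```julia
using Flux

model = Dense(2 => 1)
data  = [(rand(Float32, 2, 8), rand(Float32, 1, 8)) for _ in 1:10]

# Explicit optimiser state, replacing the old implicit Params workflow.
opt_state = Flux.setup(Adam(1e-3), model)

Flux.train!(model, data, opt_state) do m, x, y
    Flux.mse(m(x), y)
end
```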
- Added `@autosize` macro, as another way to use `outputsize`; see the sketch after this list.
- Export `Embedding`.
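A sketch of `@autosize`; the architecture and input size are arbitrary, and each `_` is filled in from the size propagated through the preceding layers:

```julia
using Flux

# The input size (width, height, channels, batch) is given up front;
# each `_` is inferred via `outputsize`.
model = @autosize (28, 28, 1, 32) Chain(
    Conv((3, 3), _ => 8, relu),
    Flux.flatten,
    Dense(_ => 10),
)
```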
- Use the package OneHotArrays.jl instead of having the same code here.
- Added `@autosize` macro.
- Added `PairwiseFusion` layer.
- Re-name `ADAM` to `Adam`, etc. (with deprecations).
- After a deprecation cycle, the datasets in `Flux.Data` have been removed in favour of MLDatasets.jl.
- `params` is not exported anymore since it is a common name and is also exported by Distributions.jl.
- `flatten` is not exported anymore due to a clash with `Iterators.flatten`.
- Remove Juno.jl progress bar support as it is now obsolete.
- `Dropout` gained improved compatibility with Int and Complex arrays and is now twice-differentiable.
- Notation `Dense(2 => 3, σ)` for channels matches `Conv`; the equivalent `Dense(2, 3, σ)` still works.
- Many utility functions and the `DataLoader` are now provided by MLUtils.jl.
- The `DataLoader` is now compatible with generic dataset types implementing `MLUtils.numobs` and `MLUtils.getobs`.
- Added truncated normal initialisation of weights.
- The `Flux.Diagonal` layer is now called `Scale`, and accepts an activation function.
- `loadparams!` is replaced by `loadmodel!`, which copies trainable + non-trainable parameters and performs more thorough structural checking.
- `Dropout`/`AlphaDropout` now support user-specified RNGs.
- Fixed incorrect output and added GPU compatibility for AlphaDropout.
- Add trilinear Upsample layer.
- Improved performance of RNNs.
- Optimisers now accept an `ϵ` argument.
- Improved handling of complex-valued inputs while training.
- Fixed AlphaDropout.
- Optimised inference and gradient calculation of OneHotMatrix.
- Added support for `GRUv3`.
- The layers within `Chain` and `Parallel` may now have names.
- Implemented an `Embedding` layer based on `NNlib.gather` and `NNlib.scatter`.
- CUDA.jl 3.0 support
- Bug fixes and optimizations.
- Add identity_init.
- Add Orthogonal Matrix initialization as described in Exact solutions to the nonlinear dynamics of learning in deep linear neural networks.
- Added Focal Loss function to Losses module
- The Dense layer now supports inputs with multiple batch dimensions.
- Dense and Conv layers no longer perform implicit type conversion.
- The keyword `initW` of Dense layers is now `init`, to agree with convolutional layers.
- Excise datasets in favour of other providers in the julia ecosystem.
- Added option to set `bias` to false, to prevent `bias` from being trained.
- Add CTC loss function to the Losses module.
- Removed kwarg-only constructors for convolutional layers.
- Add sparse initialization as described in Deep learning via Hessian-free optimization.
- Moved GPU CI to use buildkite instead of GitLab
- New `Parallel` layer adds inception module-like building blocks.
- Feature additions and bug fixes for BatchNorm, LayerNorm, InstanceNorm, and GroupNorm normalization layers.
- Added Upsample and PixelShuffle layers
- End of deprecation cycle: loss functions cannot be accessed directly from `Flux` anymore; they live in the `Flux.Losses` module. All loss functions perform `mean` aggregation by default.
- Adds the AdaBelief optimiser.
- Other new features and bug fixes (see GitHub releases page)
- Moved CUDA compatibility to use CUDA.jl instead of CuArrays.jl
- Add kaiming initialization methods: kaiming_uniform and kaiming_normal
- Use `DataLoader` with `NamedTuple`s, so that tensors can be accessed by name (see the sketch below).
- Error if Dense layers' weights and biases are not arrays.
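A sketch of the named-batch usage (array shapes are arbitrary; with recent Flux versions `DataLoader` is exported directly):

```julia
using Flux

X = rand(Float32, 10, 100)   # 100 observations with 10 features each
Y = rand(Float32, 1, 100)

loader = DataLoader((features = X, targets = Y), batchsize = 16)
for batch in loader
    # Each batch is a NamedTuple, so tensors are accessed by name.
    @assert size(batch.features, 1) == 10
    @assert size(batch.targets, 1) == 1
end
```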
- Add Adaptive Pooling in Flux layers.
- Change to `DataLoader`'s constructor.
- Uniform loss interface.
- Loss functions now live in the `Flux.Losses` module.
- Optimistic ADAM (OADAM) optimiser for adversarial training.
- Add option for same padding to conv and pooling layers by setting `pad=SamePad()`.
- Added option to set `bias` to `Flux.Zeros`, to prevent `bias` from being trained.
- Added `GlobalMaxPool` and `GlobalMeanPool` layers for performing global pooling operations.
- Added `ClipValue` and `ClipNorm` to `Flux.Optimise` to provide a cleaner API for gradient clipping.
- Added new kwarg-only constructors for the various convolutional layers.
- Documented the convolutional layer constructors accepting `weight` and `bias` keyword arguments to supply custom arrays for those fields.
- Testing suite improvements now test for gradients of all layers along with GPU support.
- Functors have now moved to Functors.jl to allow for their use outside of Flux.
- Added helper functions `Flux.convfilter` and `Flux.depthwiseconvfilter` to construct weight arrays for convolutions outside of layer constructors, so as not to have to depend on the default layers for custom implementations.
- The `dropout` function now has a mandatory `active` keyword argument. The `Dropout` struct (whose behavior is left unchanged) is the recommended choice for common usage.
- ...and many more fixes and additions.
See GitHub's releases.
- The default AD engine has switched from Tracker to Zygote.jl
- The dependency on Tracker.jl has been removed.
- This means Flux now does not depend on using a specialised `TrackedArray` type, and can be used with normal Array implementations directly.
- Tracker compatibility is maintained in most common cases, but Zygote will be the preferred AD backend for Flux from now on.
- The CUDNN wrappers have been moved from Flux into CuArrays, to allow for better support of the CUDA backend, improve the user experience, and keep Flux lean.
- `*crossentropy` functions now work as expected with CuArrays (PR for `binarycrossentropy`).
- Added clearer docs around training and the Optimiser interface.
- Layer initialisations have been improved with a clearer API on how to extend it for other purposes.
- Better messaging around CUDA availability, with hooks to initialize the GPU as default where possible.
- `@treelike` has been formalised as a functor, with an effective deprecation.
- `testmode!` is deprecated in favour of `istraining`.
- Depthwise convolutional layer API changes from `in => mult` channel specification to `in => out` channel specification, and deprecates implicit `out` constructor.
- New SkipConnection, which can be used to train residual neural network architectures.
- New RADAM optimiser.
- Dropout now has a `dims` argument for specifying the unbroadcast dimensions.
- New ConvTranspose layer.
- New Maxout layer
- Datasets are now hash verified on download to avoid corruption.
- We now zero the initial state for RNNs.
- Normalisation can now work on arbitrary `dims`.
- Many docs and bugfixes thanks to @KristofferC and others.
- NamedTuples now work like Tuples when doing `mapleaves`.
- New "performance tips" section of the docs.
- The training loop is now more readable and better shows how to use the lower-level APIs.
- New AlphaDropout.
- Data.Iris makes Fisher's Iris dataset available with `Iris.labels` and `Iris.features`.
- New InstanceNorm, as popularized by Instance Normalization: The Missing Ingredient for Fast Stylization.
- New GroupNorm, as described in Group Normalization.
- New CrossCor.
AD Changes:
- `det`, `logdet` and `logabsdet` now have adjoints.
- Support for PermuteDimsArray.
- Flux.Tracker is now its own package, in preparation for replacing it with Zygote.
Despite the heroic efforts of scholars and archeologists, pre-0.7 history is lost to the sands of time.