Flux Release Notes

See also GitHub's releases page for a complete list of PRs merged before each release.

v0.16.0 (15 December 2024)

This release has a single breaking change:

  • The forward pass of the recurrent cells RNNCell, LSTMCell, and GRUCell has been changed to $y_t, state_t = cell(x_t, state_{t-1})$; previously it was $state_t = cell(x_t, state_{t-1})$.
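
A minimal sketch of the new cell interface (assuming Flux v0.16; the layer sizes, batch size, and zero-initialised state are illustrative choices):

```julia
using Flux

cell = LSTMCell(3 => 5)              # input size 3, hidden size 5
x = rand(Float32, 3, 16)             # one time step, for a batch of 16
state = (zeros(Float32, 5, 16),      # hidden state h
         zeros(Float32, 5, 16))      # cell state c

# v0.16: the cell returns the output y as well as the updated state
y, state = cell(x, state)
```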

Other highlights include:

  • Added WeightNorm normalization layer.
  • Added the Recurrence layer, which wraps a recurrent cell into a layer that processes an entire sequence at once.

v0.15.0 (5 December 2024)

This release includes two breaking changes:

  • The recurrent layers have been thoroughly revised. See below and read the documentation for details.
  • Flux now defines and exports its own gradient function. Consequently, using gradient in an unqualified manner (e.g., after using Flux, Zygote) could result in an ambiguity error.

The most significant updates and deprecations are as follows:

  • Recurrent layers have undergone a complete redesign in PR 2500.
    • RNNCell, LSTMCell, and GRUCell are now exported and provide functionality for single time-step processing: rnncell(x_t, h_t) -> h_{t+1}.
    • RNN, LSTM, and GRU no longer store the hidden state internally; it has to be passed explicitly to the layer. Moreover, they now process entire sequences at once, rather than one element at a time: rnn(x, h) -> h′ (see the sketch after this list).
    • The Recur wrapper has been deprecated and removed.
    • The reset! function has also been removed; state management is now entirely up to the user.
  • The Flux.Optimise module has been deprecated in favor of the Optimisers.jl package, and Flux now re-exports the optimisers from Optimisers.jl. Most users will be unaffected by this change. The module is still available for now, but will be removed in a future release.
  • Most Flux layers will re-use memory via NNlib.bias_act!, when possible.
  • Further support for Enzyme.jl, via methods of Flux.gradient(loss, Duplicated(model)). Flux now owns & exports gradient and withgradient, but without Duplicated this still defaults to calling Zygote.jl.
  • Flux.params has been deprecated. Use Zygote's explicit differentiation instead, gradient(m -> loss(m, x, y), model), or use Flux.trainables(model) to get the trainable parameters.
  • Flux now requires Functors.jl v0.5. This new release of Functors assumes all types to be functors by default. Therefore, applying Flux.@layer or Functors.@functor to a type is no longer strictly necessary for Flux's models. However, it is still recommended to use @layer Model for additional functionality like pretty printing.
  • @layer Model now behaves the same as @layer :expand Model, which means that the model is expanded into its sublayers (if there are any) when printed. To force compact printing, use @layer :noexpand Model.
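
A sketch combining the redesigned recurrent interface with the explicit-gradient style that replaces Flux.params (assuming Flux v0.15; the sizes, random data, and loss function are illustrative):

```julia
using Flux

rnn = RNN(4 => 8)
x  = rand(Float32, 4, 10, 16)   # features × time steps × batch
h0 = zeros(Float32, 8, 16)      # initial hidden state, passed explicitly

h = rnn(x, h0)                  # processes the whole sequence at once

# Explicit differentiation instead of Flux.params:
loss(m, x, h0) = sum(abs2, m(x, h0))
grads = Flux.gradient(m -> loss(m, x, h0), rnn)

# Collect the trainable parameter arrays:
ps = Flux.trainables(rnn)
```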

v0.14.22

v0.14.18

v0.14.17

v0.14.13

  • New macro Flux.@layer which should be used in place of @functor. This also adds show methods for pretty printing.
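
For example, a user-defined layer can opt in like this (a sketch; the Sandwich type is hypothetical):

```julia
using Flux

# A hypothetical two-layer container
struct Sandwich
    dense1::Dense
    dense2::Dense
end

(m::Sandwich)(x) = m.dense2(relu.(m.dense1(x)))

# Replaces the old `@functor Sandwich` and adds `show` methods for pretty printing
Flux.@layer Sandwich
```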

v0.14.12

  • New SignDecay optimiser, like WeightDecay but for L1 norm.

v0.14.0 (July 2023)

  • Flux now requires julia v1.9 or later.
  • CUDA.jl is no longer a hard dependency. Support is now provided through the package extension mechanism, by loading using Flux, CUDA. The package cuDNN.jl also needs to be installed in the environment (you will get instructions if it is missing). See the sketch after this list.
  • After a deprecation cycle, the macro @epochs and the functions Flux.stop, Flux.skip, Flux.zeros, and Flux.ones have been removed.
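
A sketch of GPU usage under the new extension mechanism (assuming Flux v0.14 with CUDA.jl and cuDNN.jl installed in the active environment):

```julia
using Flux
using CUDA, cuDNN              # loading these activates Flux's CUDA extension

model = Dense(10 => 2) |> gpu
x = CUDA.rand(Float32, 10, 32)
y = model(x)                   # runs on the GPU
```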

v0.13.17

  • Preliminary support for Apple's Metal GPU acceleration via the extension mechanism.

v0.13.16

  • Most Greek-letter keyword arguments are deprecated in favour of ASCII equivalents. Thus LayerNorm(3; ϵ=1e-4) (not ε!) should become LayerNorm(3; eps=1e-4).
  • DataLoader(...) |> gpu will now produce a special iterator, moving each batch as needed, instead of giving an error.
  • Added Flux.state, which returns the internal state of the model for serialization.
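
A sketch of saving and restoring a model with Flux.state (assuming Flux v0.13.16 or later; JLD2 is one common choice of serialization package, not a Flux requirement):

```julia
using Flux, JLD2

model = Chain(Dense(10 => 5, relu), Dense(5 => 2))

# Save only the numerical state, not the code that defines the model
jldsave("model_state.jld2"; state = Flux.state(model))

# Later: rebuild the model from code and load the saved state into it
model2 = Chain(Dense(10 => 5, relu), Dense(5 => 2))
Flux.loadmodel!(model2, JLD2.load("model_state.jld2", "state"))
```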

v0.13.15

  • Added the MultiHeadAttention layer (see the sketch after this list).
  • f16, f32, f64 now specifically target floating-point arrays (i.e. integer arrays and other types are preserved).
  • f16, f32, f64 can now handle Complex{<:AbstractFloat} arrays.
  • Added EmbeddingBag layer.
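
A sketch of the MultiHeadAttention layer (assuming Flux v0.13.15 or later; the embedding size, head count, and random data are illustrative):

```julia
using Flux

mha = MultiHeadAttention(64; nheads = 8)
q = rand(Float32, 64, 10, 32)    # embedding × sequence length × batch
y, attn = mha(q, q, q)           # self-attention: query, key, and value all set to q
```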

v0.13.14

  • Fixed various deprecation warnings, from Zygote.@nograd and Vararg.
  • Initial support for AMDGPU via extension mechanism.
  • Add gpu_backend preference to select the GPU backend using LocalPreferences.toml.
  • Add Flux.gpu_backend! method to switch between GPU backends.

v0.13.13

  • Added f16 which changes precision to Float16, recursively.
  • Most layers standardise their input to eltype(layer.weight) (#2156), to limit the cost of accidental Float64 promotion.
  • Friendlier errors from size mismatches #2176.

v0.13.12

  • CUDA.jl 4.0 compatibility.
  • Use dropout from NNlib as the backend for the Dropout layer.

v0.13.9

  • New method of train! using Zygote's "explicit" mode. Part of a move away from "implicit" Params.
  • Added Flux.setup, which is Optimisers.setup with extra checks, and translation from deprecated "implicit" optimisers like Flux.Optimise.Adam to new ones from Optimisers.jl.
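
A sketch of the explicit-mode training step (assuming Flux v0.13.9 or later; the model, data, and loss are toy placeholders):

```julia
using Flux

model = Dense(2 => 1)
data  = [(rand(Float32, 2, 8), rand(Float32, 1, 8)) for _ in 1:10]   # toy batches

# Flux.setup wraps Optimisers.setup and translates implicit-style rules such as Adam
opt_state = Flux.setup(Adam(0.01), model)

# Explicit train!: the loss receives the model as its first argument
Flux.train!(model, data, opt_state) do m, x, y
    Flux.mse(m(x), y)
end
```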

v0.13.7

  • Added the @autosize macro, as another way to use outputsize (see the sketch after this list).
  • Export Embedding.
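
A sketch of @autosize (assuming Flux v0.13.7 or later; the architecture is an arbitrary example):

```julia
using Flux

# The `_` placeholders are filled in from the given input size (28, 28, 1, 32)
model = @autosize (28, 28, 1, 32) Chain(
    Conv((3, 3), _ => 16, relu),
    Flux.flatten,
    Dense(_ => 10),
)
```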

v0.13.6

v0.13.4

v0.13 (April 2022)

  • After a deprecation cycle, the datasets in Flux.Data have been removed in favour of MLDatasets.jl.
  • params is not exported anymore, since it is a common name that is also exported by Distributions.jl.
  • flatten is not exported anymore due to a clash with Iterators.flatten.
  • Remove Juno.jl progress bar support as it is now obsolete.
  • Dropout gained improved compatibility with Int and Complex arrays and is now twice-differentiable.
  • Notation Dense(2 => 3, σ) for channels matches Conv; the equivalent Dense(2, 3, σ) still works.
  • Many utility functions and the DataLoader are now provided by MLUtils.jl.
  • The DataLoader is now compatible with generic dataset types implementing MLUtils.numobs and MLUtils.getobs.
  • Added truncated normal initialisation of weights.
  • The Flux.Diagonal layer is now called Scale, and accepts an activation function.
  • loadparams! is replaced by loadmodel!, which copies both trainable and non-trainable parameters and performs more thorough structural checking.

v0.12.10

v0.12.9

v0.12.8

  • Optimised inference and gradient calculation of OneHotMatrix.

v0.12.7

  • Added support for GRUv3
  • The layers within Chain and Parallel may now have names.

v0.12.5

  • Added option to configure groups in Conv.
  • REPL printing via show displays parameter counts.

v0.12.4

v0.12.1 - v0.12.3

  • CUDA.jl 3.0 support
  • Bug fixes and optimizations.

v0.12 (March 2021)

v0.11.2

  • Adds the AdaBelief optimiser.
  • Other new features and bug fixes (see GitHub releases page)

v0.11 (July 2020)

  • Moved CUDA compatibility to use CUDA.jl instead of CuArrays.jl
  • Add kaiming initialization methods: kaiming_uniform and kaiming_normal
  • Use DataLoader with NamedTuples, so that tensors can be accessed by name.
  • Error if a Dense layer's weights and biases are not arrays.
  • Add Adaptive Pooling in Flux layers.
  • Change to DataLoader's constructor
  • Uniform loss interface
  • Loss functions now live in the Flux.Losses module
  • Optimistic ADAM (OADAM) optimiser for adversarial training.
  • Add option for same padding to conv and pooling layers by setting pad=SamePad() (see the sketch after this list).
  • Added option to set bias to Flux.Zeros to prevent the bias from being trained.
  • Added GlobalMaxPool and GlobalMeanPool layers for performing global pooling operations.
  • Added ClipValue and ClipNorm to Flux.Optimise to provide a cleaner API for gradient clipping.
  • Added new kwarg-only constructors for the various convolutional layers.
  • Documented the convolutional layer constructors accepting weight and bias keyword arguments to supply custom arrays for those fields.
  • The testing suite now checks gradients of all layers, along with GPU support.
  • Functors have now moved to Functors.jl to allow for their use outside of Flux.
  • Added helper functions Flux.convfilter and Flux.depthwiseconvfilter to construct weight arrays for convolutions outside of layer constructors, so custom implementations need not depend on the default layers.
  • dropout function now has a mandatory active keyword argument. The Dropout struct (whose behavior is left unchanged) is the recommended choice for common usage.
  • and many more fixes and additions...
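
A sketch of the SamePad option mentioned above (assuming Flux v0.11 or later; the sizes are arbitrary):

```julia
using Flux

# pad = SamePad() chooses padding so the spatial output size matches the input size
conv = Conv((3, 3), 1 => 8, relu; pad = SamePad())
x = rand(Float32, 28, 28, 1, 4)
size(conv(x))   # (28, 28, 8, 4)
```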

v0.10.1 - v0.10.4

See GitHub's releases.

v0.10.0 (November 2019)

  • The default AD engine has switched from Tracker to Zygote.jl
    • The dependency on Tracker.jl has been removed.
    • This means Flux no longer depends on a specialised TrackedArray type and can be used with normal Array implementations directly.
    • Tracker compatibility is maintained in most common cases, but Zygote will be the preferred AD backend for Flux from now on.
  • The CUDNN wrappers have been moved from Flux into CuArrays, to allow for better support of the CUDA backend, improve the user experience, and make Flux leaner.
  • The *crossentropy functions now work as expected with CuArrays, including binarycrossentropy.
  • Added clearer docs around training and the Optimiser interface.
  • Layer initialisations have been improved, with a clearer API for extending them for other purposes.
  • Better messaging around CUDA availability, with hooks to initialize the GPU as default where possible.
  • @treelike has been formalised as a functor, with an effective deprecation.
  • testmode! is deprecated in favour of istraining

v0.9.0

v0.8.0

AD Changes:

v0.7.0

Despite the heroic efforts of scholars and archeologists, pre-0.7 history is lost to the sands of time.