See also GitHub's releases page for a complete list of PRs merged before each release.
This release has a single breaking change:
- The forward pass of the recurrent cells `RNNCell`, `LSTMCell`, and `GRUCell` has been changed to $y_t, state_t = cell(x_t, state_{t-1})$. Previously, it was $state_t = cell(x_t, state_{t-1})$.
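A minimal sketch of the new cell interface (the cell type, sizes, and input below are arbitrary, chosen only for illustration):

```julia
using Flux

cell = RNNCell(3 => 5)        # same pattern applies to LSTMCell and GRUCell
x_t = rand(Float32, 3)        # input at a single time step
state = zeros(Float32, 5)     # previous state

# New in this release: the cell returns both the output and the updated state.
y_t, state = cell(x_t, state)
```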
Other highlights include:
- Added `WeightNorm` normalization layer.
- Added `Recurrence` layer, turning a recurrent cell into a layer processing the entire sequence at once.
This release includes two breaking changes:
- The recurrent layers have been thoroughly revised. See below and read the documentation for details.
- Flux now defines and exports its own `gradient` function. Consequently, using `gradient` in an unqualified manner (e.g., after `using Flux, Zygote`) could result in an ambiguity error.
The most significant updates and deprecations are as follows:
- Recurrent layers have undergone a complete redesign in PR 2500 (see the sketch after this list).
  - `RNNCell`, `LSTMCell`, and `GRUCell` are now exported and provide functionality for single time-step processing: `rnncell(x_t, h_t) -> h_{t+1}`.
  - `RNN`, `LSTM`, and `GRU` no longer store the hidden state internally; it has to be explicitly passed to the layer. Moreover, they now process entire sequences at once, rather than one element at a time: `rnn(x, h) -> h′`.
  - The `Recur` wrapper has been deprecated and removed.
  - The `reset!` function has also been removed; state management is now entirely up to the user.
- The `Flux.Optimise` module has been deprecated in favor of the Optimisers.jl package. Flux now re-exports the optimisers from Optimisers.jl. Most users will be unaffected by this change. The module is still available for now, but will be removed in a future release.
- Most Flux layers will re-use memory via `NNlib.bias_act!`, when possible.
- Further support for Enzyme.jl, via methods of `Flux.gradient(loss, Duplicated(model))`. Flux now owns & exports `gradient` and `withgradient`, but without `Duplicated` this still defaults to calling Zygote.jl.
- `Flux.params` has been deprecated. Use Zygote's explicit differentiation instead, `gradient(m -> loss(m, x, y), model)`, or use `Flux.trainables(model)` to get the trainable parameters.
- Flux now requires Functors.jl v0.5. This new release of Functors assumes all types to be functors by default. Therefore, applying `Flux.@layer` or `Functors.@functor` to a type is no longer strictly necessary for Flux's models. However, it is still recommended to use `@layer Model` for additional functionality like pretty printing.
- `@layer Model` now behaves the same as `@layer :expand Model`, which means that the model is expanded into its sublayers (if there are any) when printed. To force compact printing, use `@layer :noexpand Model`.
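A minimal sketch of the redesigned recurrent interface together with the new `Flux.gradient` entry point; the sizes, random data, and loss below are made up for illustration:

```julia
using Flux

d_in, d_out, len, batch = 2, 3, 5, 4
rnn = RNN(d_in => d_out)
x  = rand(Float32, d_in, len, batch)   # features × time × batch
h0 = zeros(Float32, d_out)             # initial hidden state, now managed by the user
h  = rnn(x, h0)                        # hidden states for all time steps

# Flux.gradient defaults to Zygote; wrapping the model as Duplicated(rnn)
# (with Enzyme.jl loaded) dispatches to Enzyme instead.
loss(m) = sum(abs2, m(x, h0))
grads = Flux.gradient(loss, rnn)
```

Passing the state explicitly replaces the old `Recur`/`reset!` workflow.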
- Data movement between devices is now provided by MLDataDevices.jl.
- Add support for distributed data parallel training.
- MPI and NCCL backends are available via the `FluxMPIExt` and `FluxMPINCCLExt` extensions, respectively.
- Add support for Enzyme with `Flux.train!`.
- New macro `Flux.@layer`, which should be used in place of `@functor`. This also adds `show` methods for pretty printing; see the sketch after this list.
- New `SignDecay` optimiser, like `WeightDecay` but for L1 norm.
- Flux now requires julia v1.9 or later.
- CUDA.jl is not a hard dependency anymore. Support is now provided through the extension mechanism, by loading `using Flux, CUDA`. The package cuDNN.jl also needs to be installed in the environment. (You will get instructions if this is missing.)
- After a deprecation cycle, the macro `@epochs` and the functions `Flux.stop`, `Flux.skip`, `Flux.zeros`, `Flux.ones` have been removed.
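A minimal sketch of defining a custom layer with the new macro; the `Affine` layer here is hypothetical and only for illustration:

```julia
using Flux

# A custom layer is just a struct with a forward method.
struct Affine{W, B}
    weight::W
    bias::B
end

Affine(in::Int, out::Int) = Affine(randn(Float32, out, in), zeros(Float32, out))

(a::Affine)(x) = a.weight * x .+ a.bias

# Replaces the old `@functor Affine`, and also adds `show` methods for pretty printing.
Flux.@layer Affine
```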
- Preliminary support for Apple's Metal GPU acceleration via the extension mechanism.
- Most Greek-letter keyword arguments are deprecated in favour of ASCII. Thus `LayerNorm(3; ϵ=1e-4)` (not `ε`!) should become `LayerNorm(3; eps=1e-4)`.
- `DataLoader(...) |> gpu` will now produce a special iterator, moving each batch as needed, instead of giving an error.
- Added `Flux.state`, returning the internal state of the model for serialization (see the sketch below).
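A sketch of checkpointing with `Flux.state`; JLD2.jl is assumed here only as one possible serialization format, and the architecture is arbitrary:

```julia
using Flux, JLD2

model = Chain(Dense(2 => 3, relu), Dense(3 => 1))

# Save only the numerical state (weights, biases, etc.), not the code.
jldsave("checkpoint.jld2"; model_state = Flux.state(model))

# Later: rebuild the same architecture, then load the saved state into it.
model2 = Chain(Dense(2 => 3, relu), Dense(3 => 1))
Flux.loadmodel!(model2, JLD2.load("checkpoint.jld2", "model_state"))
```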
- Added MultiHeadAttention layer.
- `f16`, `f32`, `f64` now specifically target floating point arrays (i.e. integer arrays and other types are preserved); see the sketch below.
- `f16`, `f32`, `f64` can now handle `Complex{<:AbstractFloat}` arrays.
- Added `EmbeddingBag` layer.
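A short sketch of the conversion behaviour described above (the arrays are arbitrary examples):

```julia
using Flux

f32([1.0, 2.0])               # Float64 array -> Float32 array
f32([1, 2, 3])                # integer array: returned unchanged
f32(ComplexF64[1.0 + 2.0im])  # ComplexF64 -> ComplexF32

f16(Dense(2 => 3))            # recursively converts the layer's parameters to Float16
```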
- Fixed various deprecation warnings, from `Zygote.@nograd` and `Vararg`.
- Initial support for AMDGPU via the extension mechanism.
- Add `gpu_backend` preference to select the GPU backend, using `LocalPreferences.toml`.
- Add `Flux.gpu_backend!` method to switch between GPU backends.
- Added `f16` which changes precision to `Float16`, recursively.
- Most layers standardise their input to `eltype(layer.weight)`, #2156, to limit the cost of accidental Float64 promotion.
- Friendlier errors from size mismatches #2176.
- CUDA.jl 4.0 compatibility.
- Use `dropout` from NNlib as back-end for the `Dropout` layer.
- New method of `train!` using Zygote's "explicit" mode. Part of a move away from "implicit" `Params` (see the sketch after this list).
- Added `Flux.setup`, which is `Optimisers.setup` with extra checks, and translation from deprecated "implicit" optimisers like `Flux.Optimise.Adam` to new ones from Optimisers.jl.
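A minimal sketch of the explicit-mode training loop; the model, data, and loss below are placeholders:

```julia
using Flux

model = Dense(2 => 1)
data  = [(rand(Float32, 2, 8), rand(Float32, 1, 8)) for _ in 1:10]

# Explicit optimiser state, replacing the old implicit Params workflow.
opt_state = Flux.setup(Adam(1e-3), model)

Flux.train!(model, data, opt_state) do m, x, y
    Flux.mse(m(x), y)
end
```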
- Added `@autosize` macro, as another way to use `outputsize`; see the sketch after this list.
- Export `Embedding`.
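A sketch of `@autosize`; the architecture and input size are arbitrary, and each `_` is filled in from the size propagated through the preceding layers:

```julia
using Flux

# The input size (width, height, channels, batch) is given up front;
# each `_` is inferred via `outputsize`.
model = @autosize (28, 28, 1, 32) Chain(
    Conv((3, 3), _ => 8, relu),
    Flux.flatten,
    Dense(_ => 10),
)
```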
- Use the package OneHotArrays.jl instead of having the same code here.
- Added `@autosize` macro.
- Added `PairwiseFusion` layer.
- Re-name `ADAM` to `Adam`, etc. (with deprecations).
- After a deprecation cycle, the datasets in `Flux.Data` have been removed in favour of MLDatasets.jl.
- `params` is not exported anymore since it is a common name and is also exported by Distributions.jl.
- `flatten` is not exported anymore due to a clash with `Iterators.flatten`.
- Remove Juno.jl progress bar support as it is now obsolete.
- `Dropout` gained improved compatibility with Int and Complex arrays and is now twice-differentiable.
- Notation `Dense(2 => 3, σ)` for channels matches `Conv`; the equivalent `Dense(2, 3, σ)` still works.
- Many utility functions and the `DataLoader` are now provided by MLUtils.jl.
- The `DataLoader` is now compatible with generic dataset types implementing `MLUtils.numobs` and `MLUtils.getobs`.
- Added truncated normal initialisation of weights.
- The `Flux.Diagonal` layer is now called `Scale`, and accepts an activation function.
- `loadparams!` is replaced by `loadmodel!`, which copies trainable + non-trainable parameters and performs more thorough structural checking.
- `Dropout`/`AlphaDropout` now support user-specified RNGs.
- Fixed incorrect output and added GPU compatibility for AlphaDropout.
- Add trilinear Upsample layer.
- Improved performance of RNNs.
- Optimisers now accept an `ϵ` argument.
- Improved handling of complex-valued inputs while training.
- Fixed AlphaDropout.
- Optimised inference and gradient calculation of OneHotMatrix.
- Added support for `GRUv3`.
- The layers within `Chain` and `Parallel` may now have names.
- Implemented an `Embedding` layer based on `NNlib.gather` and `NNlib.scatter`.
- CUDA.jl 3.0 support
- Bug fixes and optimizations.
- Add identity_init.
- Add Orthogonal Matrix initialization as described in Exact solutions to the nonlinear dynamics of learning in deep linear neural networks.
- Added Focal Loss function to Losses module
- The Dense layer now supports inputs with multiple batch dimensions.
- Dense and Conv layers no longer perform implicit type conversion.
- The keyword `initW` of Dense layers is now `init`, to agree with convolutional layers.
- Excise datasets in favour of other providers in the julia ecosystem.
- Added option to set `bias` to false, to prevent `bias` from being trained.
- Add CTC loss function to the Losses module.
- Removed kwarg-only constructors for convolutional layers.
- Add sparse initialization as described in Deep learning via Hessian-free optimization.
- Moved GPU CI to use buildkite instead of GitLab
- New `Parallel` layer adds inception module-like building blocks.
- Feature additions and bug fixes for BatchNorm, LayerNorm, InstanceNorm, and GroupNorm normalization layers.
- Added Upsample and PixelShuffle layers
- End of deprecation cycle: loss functions cannot be accessed directly from `Flux` anymore; they live in the `Flux.Losses` module. All loss functions perform `mean` aggregation by default.
- Adds the AdaBelief optimiser.
- Other new features and bug fixes (see GitHub releases page)
- Moved CUDA compatibility to use CUDA.jl instead of CuArrays.jl
- Add kaiming initialization methods: kaiming_uniform and kaiming_normal
- Use `DataLoader` with `NamedTuple`s, so that tensors can be accessed by name (see the sketch below).
- Error if Dense layers' weights and biases are not arrays.
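A sketch of the named-batch usage (array shapes are arbitrary; with recent Flux versions `DataLoader` is exported directly):

```julia
using Flux

X = rand(Float32, 10, 100)   # 100 observations with 10 features each
Y = rand(Float32, 1, 100)

loader = DataLoader((features = X, targets = Y), batchsize = 16)
for batch in loader
    # Each batch is a NamedTuple, so tensors are accessed by name.
    @assert size(batch.features, 1) == 10
    @assert size(batch.targets, 1) == 1
end
```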
- Add Adaptive Pooling in Flux layers.
- Change to `DataLoader`'s constructor.
- Uniform loss interface.
- Loss functions now live in the `Flux.Losses` module.
- Optimistic ADAM (OADAM) optimiser for adversarial training.
- Add option for same padding to conv and pooling layers by setting `pad=SamePad()`.
- Added option to set `bias` to `Flux.Zeros`, to prevent `bias` from being trained.
- Added `GlobalMaxPool` and `GlobalMeanPool` layers for performing global pooling operations.
- Added `ClipValue` and `ClipNorm` to `Flux.Optimise` to provide a cleaner API for gradient clipping.
- Added new kwarg-only constructors for the various convolutional layers.
- Documented the convolutional layer constructors accepting `weight` and `bias` keyword arguments to supply custom arrays for those fields.
- Testing suite improvements now test for gradients of all layers along with GPU support.
- Functors have now moved to Functors.jl to allow for their use outside of Flux.
- Added helper functions `Flux.convfilter` and `Flux.depthwiseconvfilter` to construct weight arrays for convolutions outside of layer constructors, so as not to have to depend on the default layers for custom implementations.
- The `dropout` function now has a mandatory `active` keyword argument. The `Dropout` struct (whose behavior is left unchanged) is the recommended choice for common usage.
- ...and many more fixes and additions.
See GitHub's releases.
- The default AD engine has switched from Tracker to Zygote.jl
- The dependency on Tracker.jl has been removed.
- This means Flux now does not depend on using a specialised `TrackedArray` type, and can be used with normal Array implementations directly.
- Tracker compatibility is maintained in most common cases, but Zygote will be the preferred AD backend for Flux from now on.
- The CUDNN wrappers have been moved from Flux into CuArrays, to allow for better support of the CUDA backend, improve the user experience, and keep Flux lean.
- `*crossentropy` functions now work as expected with CuArrays (PR for `binarycrossentropy`).
- Added clearer docs around training and the Optimiser interface.
- Layer initialisations have been improved with a clearer API on how to extend it for other purposes.
- Better messaging around CUDA availability, with hooks to initialize the GPU as default where possible.
- `@treelike` has been formalised as a functor, with an effective deprecation.
- `testmode!` is deprecated in favour of `istraining`.
- Depthwise convolutional layer API changes from `in => mult` channel specification to `in => out` channel specification, and deprecates implicit `out` constructor.
- New SkipConnection, which can be used to train residual neural network architectures.
- New RADAM optimiser.
- Dropout now has a `dims` argument for specifying the unbroadcast dimensions.
- New ConvTranspose layer.
- New Maxout layer
- Datasets are now hash verified on download to avoid corruption.
- We now zero the initial state for RNNs.
- Normalisation can now work on arbitrary `dims`.
- Many docs and bugfixes thanks to @KristofferC and others.
- NamedTuples now work like Tuples when doing `mapleaves`.
- New "performance tips" section of the docs.
- The training loop is now more readable and better shows how to use the lower-level APIs.
- New AlphaDropout.
- Data.Iris makes Fisher's Iris dataset available with `Iris.labels` and `Iris.features`.
- New InstanceNorm, as popularized by Instance Normalization: The Missing Ingredient for Fast Stylization.
- New GroupNorm, as described in Group Normalization.
- New CrossCor.
AD Changes:
- `det`, `logdet` and `logabsdet` now have adjoints.
- Support for PermuteDimsArray.
- Flux.Tracker is now its own package, in preparation for replacing it with Zygote.
Despite the heroic efforts of scholars and archeologists, pre-0.7 history is lost to the sands of time.