This spec covers how equality is compiled and executed by the F# compiler and library, based mainly on the types involved in the equality operation after all inlining, type specialization and other optimizations have been applied.
This spec is about the semantics and performance of the following coding constructs
a = b
a <> b
It is also about the semantics and performance of uses of the following FSharp.Core
constructs which, after inlining, generate code that contains an equality check at the specific EQTYPE
HashIdentity.Structural<'T>
{Array,Seq,List}.contains
{Array,Seq,List}.countBy
{Array,Seq,List}.groupBy
{Array,Seq,List}.distinct
{Array,Seq,List}.distinctBy
{Array,Seq,List}.except
All of which have implied equality checks. Some of these operations are inlined, see below, which in turn affects the semantics and performance of the overall operation.
In math, a (binary) relation is a way to describe a relationship between the elements of sets. "Greater than" is a relation for numbers, "Subset of" is a relation for sets.
Here we talk about 3 particular relations:
- Reflexivity - every element is related to itself
- For integers,
=
is reflexive (a = a
is always true) and>
is not (a > a
is never true)
- Symmetry - if
a
is related tob
, thenb
is related toa
- For integers,
=
is symmetric (a = b
->b = a
) and>
is not (ifa > b
thenb > a
is false)
- Transitivity - if
a
is related tob
, andb
is related toc
, thena
is also relatedc
- For integers,
>
is transitive (a > b
&&b > c
->a > c
) and√
is not (a = √b
&&b = √c
doesn't meana = √c
)
If a relation has 1, 2, and 3, we talk about Equivalence Relation (ER). If a relation only has 2 and 3, we talk about Partial Equivalence Relation (PER).
This matters in comparing floats since they include NaN. Depending on if we consider NaN = NaN
true or false, we talk about ER or PER comparison respectively.
The static type known to the F# compiler is crucial to determining the performance of the operation. The runtime type of the equality check is also significant in some situations.
Here we define the relevant static type EQTYPE
for the different constructs above:
a = b
:EQTYPE
is the statically known type ofa
orb
a <> b
:EQTYPE
is the statically known type ofa
orb
HashIdentity.Structural<'T>
,EQTYPE
is the inlined'T
(results in specialized equality)Array.contains<'T>
,EQTYPE
is the inlined'T
(results in specialized equality)List.contains<T>
likewiseSeq.contains<T>
likewise
These only result in naked generic equality if themselves used from a non-inlined generic context.
Array.groupBy<'Key, 'T> f array
,EQTYPE
is non-inlined'Key
, results in naked generic equalityArray.countBy array
likewise for'T
Array.distinct<'T> array
likewiseArray.distinctBy array
likewiseArray.except array
likewiseList.groupBy
likewiseList.countBy
likewiseList.distinct
likewiseList.distinctBy
likewiseList.except
likewiseSeq.groupBy
likewiseSeq.countBy
likewiseSeq.distinct
likewiseSeq.distinctBy
likewiseSeq.except
likewise
These always result in naked generic equality checks.
Example 1:
let x = HashIdentity.Structural<byte> // EQTYPE known to compiler is `byte`
Example 2 (a non-inlined "naked" generic context):
let f2<'T> () =
... some long code
// EQTYPE known to the compiler is `'T`
// RUNTIME-EQTYPE known to the library is `byte`
let x = HashIdentity.Structural<'T>
... some long code
f2<byte>() // performance of this is determined by EQTYPE<'T> and RUNTIME-EQTYPE<byte>
Example 3 (an inlined generic context):
let f3<'T> () =
... some long code
// EQTYPE known to the compiler is `byte`
// RUNTIME-EQTYPE known to the library is `byte`
let x = HashIdentity.Structural<'T>
... some long code
f3<byte>() // performance of this is determined by EQTYPE<byte> and RUNTIME-EQTYPE<byte>
Example 4 (a generic struct type in a non-inline generic context):
let f4<'T> () =
... some long code
// EQTYPE known to the compiler is `SomeStructType<'T>`
// RUNTIME-EQTYPE known to the library is `SomeStructType<byte>`
let x = HashIdentity.Structural<SomeStructType<'T>>
... some long code
f4<byte>() // performance of this determined by EQTYPE<SomeStructType<'T>> and RUNTIME-EQTYPE<SomeStructType<byte>>
This very much depends on the EQTYPE
involved in the equality as known by the compiler
Aim here is to flesh these all out with:
- Semantics: what semantics the user expects, and what the semantics actually is
- Perf expectation: what perf the user expects
- Compilation today: How we actually compile today
- Perf today: What is the perf we achieve today
- (Optional) sharplab.io link to how things are in whatever version is selected in sharplab
- (Optional) notes
let f (x: int) (y: int) = (x = y)
- Semantics: equality on primitive
- Perf: User expects full performance down to native
- Compilation today: compiles to IL instruction ✅
- Perf today: good ✅
- sharplab int32
let f (x: float32) (y: float32) = (x = y)
- Semantics: IEEE floating point equality (respecting NaN etc.)
- Perf: User expects full performance down to native
- Compilation today: compiles to IL instruction ✅
- Perf today: good ✅
- sharplab float32
- Semantics: .NET equivalent equality, non-localized for strings
- Perf: User expects full performance down to native
- Compilation today: compiles to
String.Equals
orDecimal.op_Equality
call ✅ - Perf today: good ✅
- sharplab decimal
- sharplab string
- Semantics: User expects structural
- Perf: User expects flattening to constituent checks
- Compilation today: tuple equality is flattened to constituent checks ✅
- Perf today: good ✅
- sharplab (int * double * 'T), with example reductions/optimizations noted
- Semantics: User expects structural
- Perf: User expects flattening to constituent checks
- Compilation today: not flattened, compiled to
GenericEqualityIntrinsic
- Perf today: the check does type tests, does virtual calls via
IStructuralEqualityComparer
, boxes etc. ❌(Problem1) - sharplab for size 6
- Semantics: User expects structural
- Perf: User expects flattening to constituent checks or at least the same optimizations as tuples
- Compilation today: compiled to
GenericEqualityIntrinsic
- Perf today: boxes, does type tests, does virtual calls via
IStructuralEqualityComparer
etc. ❌(Problem2) - sharplab for size 3
- Semantics: User expects identical to equality on the underlying type
- Perf: User expects same perf as flattening to underlying type
- Compilation today: flattens to underlying type
- Perf today: good ✅
- sharplab for C# enum int
- sharplab for F# enum int
- Semantics: User expects call to
IEquatable<T>
if present, but F# spec says callthis.Equals(box that)
, in practice these are the same - Perf expected: no boxing
- Compilation today: EqualityComparer.Default
- Perf today: good ✅
- sharplab
- Semantics: User expects field-by-field structural equality with no boxing
- Perf expected: no boxing
- Compilation today:
GenericEqualityIntrinsic<SomeStructType>
- Perf today: always boxes (Problem3 ❌)
- sharplab
- Note: the optimization path is a bit strange here, see the reductions below
Details
(x = y)
--inline-->
GenericEquality x y
--inline-->
GenericEqualityFast x y
--inline-->
GenericEqualityIntrinsic x y
--devirtualize-->
x.Equals(box y, LanguagePrimitives.GenericEqualityComparer);
The struct type has these generated methods:
override bool Equals(object y)
override bool Equals(SomeStruct obj)
override bool Equals(object obj, IEqualityComparer comp) //with EqualsVal
These call each other in sequence, boxing then unboxing then boxing. We do NOT generate this method, we probably should:
override bool Equals(SomeStruct obj, IEqualityComparer comp) //with EqualsValUnboxed
If we did, the devirtualizing optimization should reduce to this directly, which would result in no boxing.
- Semantics: User expects structural
- Perf expected: User expects perf is sum of constituent parts
- Compilation today: either
FSharpEqualityComparer_PER`1<uint8[]>::get_EqualityComparer().Equals(...)
orFSharpEqualityComparer_PER`1<T[]>::get_EqualityComparer().Equals(...)
- Perf today: good ✅
- sharplab for
byte[]
Here "large" means the compiler-generated structural equality is NOT inlined.
- Semantics: User expects structural by default
- Perf expected: User expects perf is sum of constituent parts, type-specialized if generic
- Compilation today: direct call to
Equals(T)
- Perf today: the call to
Equals(T)
has specialized code but boxes fields if struct or generic, see Problem3 ❌, Problem4 ❌
Here "tiny" means the compiler-generated structural equality IS inlined.
- Semantics: User expects structural by default
- Perf expected: User expects perf is sum of constituent parts, type-specialized if generic
- Compilation today:
FSharpEqualityComparer_ER`1<!a>::get_EqualityComparer().Equals(...)
- Perf today: good ✅
- Semantics: User expects the PER equality semantics of whatever
'T
actually is - Perf expected: User expects no boxing
- Compilation today:
FSharpEqualityComparer_ER`1<!a>::get_EqualityComparer().Equals(...)
- Perf today: good ✅
This case happens in structural equality for tuple types and other structural types
- Semantics: User expects the PER equality semantics of whatever
'T
actually is - Perf: User expects no boxing
- Compilation today:
FSharpEqualityComparer_ER`1<!a>::get_EqualityComparer().Equals(...)
- Perf today: good ✅
- Sharplab
- Flatten and inline
- RCG: Use reflective code generation internally in FSharp.Core
- KFS: Rely on known semantics of F# structural types and treat those as special
- TS: Hand-code type-specializations using static optimization conditions in FSharp.Core
- TT: Type-indexed tables of baked (poss by reflection) equality comparers and functions, where some pre-computation is done
- DV: De-virtualization
- DEQ: Use
EqualityComparer<'T>.Default
where possible
- Uses TT, DEQ, KFS, DV
- Focuses on solving Problem4
- 99% not breaking, apart from the case of value types with custom equality implemented differently than the
EqualityComparer.Default
- the change would lead to the usage of the custom implementation which is reasonable
Note: this included changes to the optimizer to reduce GenericEqualityIntrinsic down to a type-indexed table lookup fetching an IEqualityComparer
and calling it. These hand-coded reductions appear unnecessary as the reduction doesn't open up any further optimizations.