You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I guess everyone is aware that the current AES implementation is not resistant to traditional non-speculative side channels due to the direct usage of T-tables, which can reveal the information about the key being used (usage of T-tables basically creates a data access pattern that depends on the key). The key question is what would be the right approach for libtomcrypt to address this.
I see a number of options (each has its pros and cons):
Usage of dedicated CPU-specific instructions. Instructions like AES-NI [1] and their alternatives can be used to solve the above problem in a pretty straightforward way. Many open source libs have went this way, i.e. openssl, bearssl, etc. Pros: Easy to implement, the end implementation is fast and efficient, applicable to all modes Cons: Not a generic solution, CPU-specific.
Bitslicing. This is a technique [2] that allows to achieve good side-channel resistance for the modes that can be parallelized.
It is implemented in some crypto libraries, for example in bearSSL [3]. Pros: no CPU dependency, relatively efficient implementation Cons: Not useful for non-parallelizable modes.
Algebraic implementation. This approach would involve computing the result of S-box on the fly using algebraic structure of S-box, i.e. the fact that an Sbox is a combination of finding a byte-wise inverse in GF(2^8) + bitwise affine transformation. The computation itself should be very carefully implemented not to contain any side-channels as well. I am not aware of any major crypto library that follows this approach. Pros: no CPU dependency, applicable for all modes Cons: Likely to be slow and not efficient (albeit can try to use some speedup tricks given that we always work in the same GF)
IMO it'd make sense to have at least a bitsliced implementation in addition to the current one.
To go for CPU-specific instructions would be nice as well but something that should be provided additional to bitslicing... AES-NI for x86 or AES instructions for aarch64 could be a starter.
I don't think an algebraic implementation makes sense... I've played with htable [1] some years ago and it was really slow ... :) Maybe it could be made faster but additional to its slowness it was also big IIRC.
Just adding my support here for the addition of AES-NI. Many developers are currently looking for GCM with AES-NI due to the speed and security that's possible on Intel CPU's.
Detecting AES-NI capable CPU's can be done with just a few line of code, for example:
It appears this developer has a working example using AES-NI with LibTomCrypt for Linux (I was unable to get it to compile on Windows using MinGW or Visual Studio however, though that might be easily solvable):
I'd offer to help, but I'm afraid assembly is not in my skill set. I can assist with testing however - I have an array of OS's and Intel CPU's to test on, so just let me know.
Thanks for considering adding AES-NI to this already excellent library!
I guess everyone is aware that the current AES implementation is not resistant to traditional non-speculative side channels due to the direct usage of T-tables, which can reveal the information about the key being used (usage of T-tables basically creates a data access pattern that depends on the key). The key question is what would be the right approach for libtomcrypt to address this.
I see a number of options (each has its pros and cons):
Usage of dedicated CPU-specific instructions. Instructions like AES-NI [1] and their alternatives can be used to solve the above problem in a pretty straightforward way. Many open source libs have went this way, i.e. openssl, bearssl, etc.
Pros: Easy to implement, the end implementation is fast and efficient, applicable to all modes
Cons: Not a generic solution, CPU-specific.
Bitslicing. This is a technique [2] that allows to achieve good side-channel resistance for the modes that can be parallelized.
It is implemented in some crypto libraries, for example in bearSSL [3].
Pros: no CPU dependency, relatively efficient implementation
Cons: Not useful for non-parallelizable modes.
Algebraic implementation. This approach would involve computing the result of S-box on the fly using algebraic structure of S-box, i.e. the fact that an Sbox is a combination of finding a byte-wise inverse in GF(2^8) + bitwise affine transformation. The computation itself should be very carefully implemented not to contain any side-channels as well. I am not aware of any major crypto library that follows this approach.
Pros: no CPU dependency, applicable for all modes
Cons: Likely to be slow and not efficient (albeit can try to use some speedup tricks given that we always work in the same GF)
[1] https://en.wikipedia.org/wiki/AES_instruction_set
[2] https://link.springer.com/chapter/10.1007/11935070_14
[3] https://bearssl.org/constanttime.html#bitslicing
What are people's opinions on this for libtomcrypt? Which way is seen as a preferred direction?
The text was updated successfully, but these errors were encountered: