Hardware implementation of post quantum signature scheme "Falcon"
Main author: | |
---|---|
Other authors: | |
Language: | English |
Published: | 2022 |
Subjects: | |
Available online: | https://hdl.handle.net/10889/23586 |
Abstract: | The purpose of this thesis is to provide a hardware implementation of the Falcon post-quantum signature scheme in the form of a hardware accelerator. Because the scheme uses the Fourier representation of polynomials to speed up operations on them, its most time-consuming operation is the forward and inverse Fast Fourier Transform (FFT/IFFT).
In the reference implementation of the algorithm, the FFT/IFFT emulates double-precision arithmetic using integer arithmetic and bit manipulation. This provides compatibility with systems that have no FPU, such as low-area, low-power microprocessors. An optimized implementation is also provided, which uses native double-precision arithmetic on CPUs that support it.
The following pages present a hardware implementation of a 1024-point radix-2 decimation-in-frequency (DIF) FFT. The design uses double-precision arithmetic and pre-calculated roots of unity to perform FFT and IFFT operations for sizes up to 1024 points. It reaches a maximum operating frequency of 100 MHz and fits in a small FPGA such as the Zynq Z-7010.
To measure the improvement, execution times on two processing systems are used as a baseline: an i5-10600K CPU @ 4.10 GHz and a Cortex-A9 CPU @ 660 MHz. Both CPUs contain an FPU and can run both the reference and the optimized implementation. It will be shown that the design improves on the reference implementation on both CPUs by 35%. It also matches the performance of the optimized implementation on the two CPUs while running at a lower clock frequency, potentially lowering the power requirements of the calculation. |
---|
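
The abstract describes a radix-2 DIF FFT that reads pre-calculated roots of unity and handles sizes up to 1024 points. As a rough software illustration of that idea only, the Python sketch below runs an in-place radix-2 DIF FFT over a single twiddle table built for the largest size and reuses it for smaller sizes by striding through it. The function names (`make_twiddles`, `fft_dif`, `ifft_dif`) and the table-striding scheme are assumptions made for this example. It is a generic complex DFT, not the thesis's hardware design and not Falcon's exact polynomial transform.

```python
import cmath


def make_twiddles(max_n):
    """Pre-calculate the max_n/2 roots of unity exp(-2*pi*i*k/max_n)."""
    return [cmath.exp(-2j * cmath.pi * k / max_n) for k in range(max_n // 2)]


def bit_reverse_permute(a):
    """Reorder a DIF result (bit-reversed order) into natural order."""
    n = len(a)
    bits = n.bit_length() - 1
    out = [0j] * n
    for i in range(n):
        r = int(format(i, f"0{bits}b")[::-1], 2)
        out[r] = a[i]
    return out


def fft_dif(x, twiddles):
    """Radix-2 decimation-in-frequency FFT using a pre-calculated table.

    `twiddles` is a table built by make_twiddles(max_n); len(x) must be
    a power of two with 2 <= len(x) <= max_n.
    """
    n = len(x)
    max_n = 2 * len(twiddles)
    if n < 2 or n & (n - 1) or n > max_n:
        raise ValueError("length must be a power of two between 2 and max_n")
    a = [complex(v) for v in x]
    half = n // 2
    stride = max_n // n  # step through the shared table for this size
    while half >= 1:
        for start in range(0, n, 2 * half):
            for j in range(half):
                u = a[start + j]
                v = a[start + j + half]
                # DIF butterfly: add/subtract first, then multiply by the root
                a[start + j] = u + v
                a[start + j + half] = (u - v) * twiddles[j * stride]
        half //= 2
        stride *= 2
    return bit_reverse_permute(a)


def ifft_dif(x, twiddles):
    """Inverse FFT via conjugation, reusing the forward transform and table."""
    n = len(x)
    y = fft_dif([complex(v).conjugate() for v in x], twiddles)
    return [v.conjugate() / n for v in y]
```

A caller would build the table once, for example `tw = make_twiddles(1024)`, and then pass it to `fft_dif` and `ifft_dif` for any power-of-two length up to 1024. Computing the inverse by conjugation lets both directions share the same butterfly and the same table, which is one plausible way to keep a single datapath for FFT and IFFT.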