M.F. Al-Gailani
Cybersecurity Engineering Department, College of Information Engineering, Al-Nahrain University, Baghdad, Iraq.
Email: m.falih@nahrainuniv.edu.iq, ORCID: 0000-0003-4307-941X
ISSN: 2182-2069 (printed) / ISSN: 2182-2077 (online)
Hardware Design and Implementation of Secure Hash Algorithm Based on FPGA
The Secure Hash Algorithm (SHA) is an essential part of data security. Its most important application is data integrity: verifying that received data matches the transmitted data and has not been tampered with during transmission or storage. Achieving data security often requires a combination of algorithms and protocols that together provide confidentiality, integrity, and authenticity. Hardware implementation is therefore the preferred way to meet these requirements, because it lets the algorithms run in a coordinated, time-efficient manner at a higher execution speed than a software implementation. In this paper, a hardware architecture for SHA-2 is designed and implemented on a Xilinx XC7VX330T-3FFG1761 device using Xilinx ISE 14.7. The architecture supports message digest lengths of 256 and 512 bits and can also serve the other members of the SHA-2 family by truncating the extra bits. The design aims to optimize performance with minimal complexity so that it suits resource-constrained devices and applications; an iterative looping architecture was therefore chosen initially. The implementation achieved a throughput of 1.021 Gbps for SHA-256 and 1.331 Gbps for SHA-512, using 333 slice registers for SHA-256 and 699 for SHA-512, with a total power consumption of 0.394 W for SHA-256 and 0.608 W for SHA-512. A hybrid architecture for SHA-256 was then designed by replicating the single-round hardware two and four times, so that two or four rounds are processed per iteration. This reduces the number of iterations to one half and one quarter of the original, respectively, and increases throughput by approximately 63% and 87%, while the number of slice LUTs and the number of occupied slices increase by only 5% and 30% (unfolded by 2) and by 12% and 34% (unfolded by 4).
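The relationship between the iterative architecture and the unfolded variants can be illustrated with a small software model. The sketch below is not the paper's hardware description: it is a Python model of the SHA-256 compression function, written under the assumption that "unfolding by 2 or 4" means evaluating two or four consecutive rounds inside one loop iteration. The names `one_round`, `compress`, and the `unroll` parameter are hypothetical and chosen for illustration only; the round constants are derived from the cube and square roots of the first primes, as the SHA-2 standard defines them.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF

def _first_primes(n):
    primes, c = [], 2
    while len(primes) < n:
        if all(c % p for p in primes):
            primes.append(c)
        c += 1
    return primes

def _icbrt(n):
    # Integer cube root (floor) by binary search.
    lo, hi = 0, 1 << (n.bit_length() // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

# First 32 fractional bits of the cube roots (K) and square roots (H0) of primes.
K = [_icbrt(p << 96) & MASK for p in _first_primes(64)]
H0 = [math.isqrt(p << 64) & MASK for p in _first_primes(8)]

def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK

def one_round(state, k, w):
    # One SHA-256 round: the logic a single-round datapath would implement.
    a, b, c, d, e, f, g, h = state
    s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
    ch = (e & f) ^ (~e & g)
    t1 = (h + s1 + ch + k + w) & MASK
    s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
    maj = (a & b) ^ (a & c) ^ (b & c)
    t2 = (s0 + maj) & MASK
    return ((t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g)

def schedule(block):
    # Expand the 16 message words of a 64-byte block into 64 schedule words.
    w = list(struct.unpack(">16I", block))
    for t in range(16, 64):
        s0 = rotr(w[t - 15], 7) ^ rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
        s1 = rotr(w[t - 2], 17) ^ rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
        w.append((w[t - 16] + s0 + w[t - 7] + s1) & MASK)
    return w

def compress(h, block, unroll=1):
    # `unroll` (hypothetical parameter) models how many round copies are chained
    # per iteration: 1 = iterative looping, 2 or 4 = unfolded variants.
    assert 64 % unroll == 0
    w, state, iterations = schedule(block), tuple(h), 0
    for t in range(0, 64, unroll):
        for j in range(unroll):  # replicated round logic within one iteration
            state = one_round(state, K[t + j], w[t + j])
        iterations += 1
    return [(x + y) & MASK for x, y in zip(h, state)], iterations

def sha256(msg, unroll=1):
    # Pad the message, then compress block by block; also count loop iterations.
    padded = msg + b"\x80" + b"\x00" * ((55 - len(msg)) % 64)
    padded += struct.pack(">Q", 8 * len(msg))
    h, total = list(H0), 0
    for i in range(0, len(padded), 64):
        h, n = compress(h, padded[i:i + 64], unroll)
        total += n
    return struct.pack(">8I", *h).hex(), total

for u in (1, 2, 4):
    digest, iters = sha256(b"abc", unroll=u)
    assert digest == hashlib.sha256(b"abc").hexdigest()
    print(u, iters)
```

For the single-block message `abc`, the model takes 64, 32, and 16 iterations for `unroll` values of 1, 2, and 4, and produces the same digest each time. In hardware the chained rounds would typically lengthen the critical path and lower the achievable clock frequency, which is one plausible reason why the reported throughput gains (about 63% and 87%) are smaller than the reduction in iteration count alone would suggest.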