[1] B. H. Lee and S. M. Kuo, “Real Time Digital Signal Processing, Implementations, Applications and Experiments with the TMS320C55x,” John Wiley & Sons Ltd, New York (2001), p. 330.
[2] V. Gierenz, C. Panis, J. Nurmi, “Parameterized MAC unit generation for a scalable embedded DSP core,” Microprocessors and Microsystems, 34 (5), (2010), pp. 138–150.
[3] K. Benkrid, S. Belkacemi, “Design and implementation of a 2D convolution core for video applications on FPGAs,” Digital and Computational Video, DCV 2002. Proceedings. Third International Workshop on, (2002), pp. 85-92.
[4] M. Verhelst and B. Moons, “Embedded Deep Neural Network Processing: Algorithmic and Processor Techniques Bring Deep Learning to IoT and Edge Devices,” IEEE Solid-State Circuits Magazine, 9(4), (2017), pp. 55-65.
[5] J. Chang, H. Lee, and C. Choi, “A power-aware variable-precision multiply-accumulate unit,” in International Symposium on Communications and Information Technology, (2009), pp. 1336–1339.
[6] H. Lee, “Power-Aware Scalable Booth Multiplier,” IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, Vol. E88-A, No. 11, (2005), pp. 3230-3234.
[7] H. Jiang, F. J. H. Santiago, H. Mo, L. Liu, and J. Han, “Approximate arithmetic circuits: A survey, characterization and recent applications,” Proceedings of the IEEE, vol. 108, no. 12, pp. 2108-2135, Dec. 2020.
[8] L. Sousa, “Nonconventional Computer Arithmetic Circuits, Systems and Applications,” IEEE Circuits and Systems Magazine, vol. 21, no. 1, pp. 6-40, March 2021.
[9] J. Hu, Z. Li, M. Yang, Z. Huang, and W. Qian, “A high-accuracy approximate adder with correct sign calculation,” Integration, the VLSI Journal, vol. 65, pp. 370-388, March 2019.
[10] K. Verma et al., “Variable latency speculative addition: a new paradigm for arithmetic circuit design,” in Proc. Design, Automation and Test in Europe, pp. 1250–1255, 2008.
[11] K. Du, P. Varman, and K. Mohanram, “High performance reliable variable latency carry select addition,” Proc. Design, Autom. Test Eur., Mar. 2012, pp. 1257–1262.
[12] A. Cilardo, “A new speculative addition architecture suitable for two’s complement operations,” in Proc. Design, Automation and Test in Europe, pp. 664–669, 2009.
[13] D. Kelly and J. Phillips, “Arithmetic data value speculation,” Adv. Comput. Syst. Architecture, Lecture Notes Comput. Sci., 2005, pp. 353–366.
[14] S. M. Nowick, K. Y. Yun, P. A. Beerel, and A. E. Dooply, “Speculative completion for the design of high-performance asynchronous dynamic adders,” in Proc. International Symposium on Advanced Research in Asynchronous Circuits and Systems, Apr. 1997, pp. 210–223.
[15] D. Esposito, D. De Caro, A. G. M. Strollo, “Variable Latency Speculative Parallel Prefix Adders for Unsigned and Signed Operands,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 63, no. 8, pp. 1200-1209, Aug. 2016.
[16] I.-C. Lin, Y.-M. Yang, and C.-C. Lin, “High-performance low-power carry speculative addition with variable latency,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 23, no. 9, pp. 1591–1603, Sep. 2015.
[17] Y. Choi and E. E. Swartzlander, “Speculative Carry Generation with Prefix Adder,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 16, no. 3, pp. 321-326, March 2008.
[18] A. Cilardo, D. De Caro, N. Petra, F. Caserta, N. Mazzocca, E. Napoli, and A. G. M. Strollo, “High speed speculative multipliers based on speculative carry-save tree,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 61, no. 12, (2014), pp. 3426–3435.
[19] D. Esposito, D. De Caro, E. Napoli, N. Petra and A. G. M. Strollo, “On the use of approximate adders in carry-save multiplier-accumulators,” IEEE International Symposium on Circuits and Systems (ISCAS), Baltimore, MD, (2017), pp. 1-4.
[20] D. Esposito, A. G. M. Strollo, and M. Alioto, “Low-power approximate MAC unit,” in Proc. IEEE PRIME, Giardini Naxos, Italy, Jun. (2017), pp. 81–84.
[21] G. A. Gillani, M. A. Hanif, M. Krone, S. H. Gerez, M. Shafique, and A. B. J. Kokkeler, “Designing approximate MAC accelerators with internal-self-healing,” IEEE Access, vol. 7, (2019), pp. 142–77.
[22] M. Masadeh, O. Hasan, and S. Tahar, “Input-Conscious Approximate Multiply-Accumulate (MAC) Unit for Energy-Efficiency,” IEEE Access, vol. 7, (2019), pp. 129–147.
[23] B. Parhami, “Computer arithmetic, algorithms and hardware designs,” New York: Oxford Press, (2000).
[24] H. Parandeh-Afshar, S. M. Fakhraie, and O. Fatemi, “Parallel Merged Multiplier-Accumulator Coprocessor Optimized for Digital Filters,” Elsevier Journal of Computers and Electrical Engineering, no. 36, (2008), pp. 864-873.
[25] A. A. Fayed and M. A. Bayoumi, “A merged multiplier–accumulator for high speed signal processing applications,” IEEE Trans VLSI, 3(2), (2002).
[26] J. Wang, L. Xu, H. Wang and C. Choy, “A high-speed pipeline architecture of squarer-accumulator (SQAC),” IEEE Region 10 Conference (TENCON), Singapore, (2016), pp. 3429-3432.
[27] C. S. Wallace, “A suggestion for fast multipliers,” IEEE Trans. Comput., vol. EC-13, (1964), pp. 14–17.
[28] R. S. Waters and E. E. Swartzlander, “A reduced complexity Wallace multiplier reduction,” IEEE Transactions on Computers, vol. 59, no. 8, pp. 1134–1137, August 2010.