Investigating Hardware Overheads and Energy Efficiency in Implementing Fixed-Point Quantization Variants in a Deep Neural Network Accelerator
Subject area: Artificial Intelligence and Robotics
Marzieh Mastalizadeh 1, Seyed Ali Ansarmohammadi 2, Najmeh Nazari 3, Mostafa Ersali Salehi Nasab 4 *
1 - Computer Architecture Engineering, School of Electrical and Computer Engineering, University of Tehran, Tehran, Iran
2 - PhD student, School of Electrical and Computer Engineering, University of Tehran, Tehran, Iran
3 - Computer Architecture Engineering, School of Electrical and Computer Engineering, University of Tehran, Tehran, Iran
4 - University of Tehran
Keywords: deep neural networks, embedded systems, energy efficiency, fixed-point quantization
Abstract:
One of the most effective approaches to compressing deep neural networks and reducing their energy consumption on embedded devices is quantization using fixed-point number representation. In recent years, a variety of methods have been proposed to improve the accuracy of quantized networks, but they often impose substantial computational overheads on the network, a cost that has so far remained hidden from deep neural network designers. In this work, fixed-point quantization methods are classified and modeled according to the components that drive their hardware overheads. The hardware architectures proposed for each model are then examined and compared fairly, accounting for the trade-off between network accuracy and hardware energy efficiency. The results show that the techniques employed to reduce quantization error do increase network accuracy, but at the same time reduce hardware energy efficiency. According to the simulation results, adding a scaling factor and offset to LSQ fixed-point quantization increases network accuracy by about 0.1%, while hardware energy efficiency drops by roughly 3×. This underscores, more than ever, the need to account for hardware overheads, especially in embedded systems.
Deep Neural Networks (DNNs) have demonstrated remarkable performance in various application domains, such as computer vision, pattern recognition, and natural language processing. However, deploying these models on edge-computing devices poses a challenge due to their extensive memory requirements and computational complexity. These factors make it difficult to deploy DNNs on low-power, resource-limited devices. One promising technique to address this challenge is quantization, particularly fixed-point quantization. Previous studies have shown that reducing the bit-width of weights and activations, for example to 3 or 4 bits, through fixed-point quantization can preserve the classification accuracy of full-precision neural networks. Despite extensive research on the compression efficiency of fixed-point quantization techniques, their energy efficiency, a critical metric in evaluating embedded systems, has not been thoroughly explored. Therefore, this research aims to assess the energy efficiency of fixed-point quantization techniques while maintaining accuracy. To accomplish this, we present a model and design an architecture for each quantization method. Subsequently, we compare their area and energy efficiency at the same accuracy level. Our experimental results indicate that incorporating scaling factors and offsets into LSQ, a well-known quantization method, improves DNN accuracy by 0.1%. However, this improvement comes at the cost of a 3× decrease in hardware energy efficiency. This research highlights the significance of evaluating fixed-point quantization techniques not only in terms of compression efficiency but also in terms of energy efficiency when applied to edge-computing devices.
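To make the accuracy/overhead trade-off concrete, the sketch below contrasts symmetric fixed-point quantization (a scale factor only) with asymmetric quantization (scale plus offset, in the spirit of LSQ-with-offset schemes). This is a minimal NumPy illustration under our own assumptions, not the paper's accelerator model or the exact LSQ training procedure; the function names and the 4-bit default are hypothetical.

```python
import numpy as np

def quantize_symmetric(x, bits=4):
    # Symmetric scheme: a single scale factor, zero-centered range.
    # Dequantization is one multiply per value, which keeps the
    # hardware datapath simple.
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q.astype(np.int8), scale

def quantize_asymmetric(x, bits=4):
    # Asymmetric scheme: scale plus an offset (zero-point) so the
    # grid hugs [min(x), max(x)]. Accuracy tends to improve slightly,
    # but every dequantization now needs an extra subtract, which is
    # the kind of per-MAC overhead the paper's comparison targets.
    qmax = 2 ** bits - 1
    lo, hi = np.min(x), np.max(x)
    scale = (hi - lo) / qmax
    offset = np.round(-lo / scale)
    q = np.clip(np.round(x / scale) + offset, 0, qmax)
    return q.astype(np.uint8), scale, offset

# Round-trip both schemes on a small tensor.
x = np.random.randn(8).astype(np.float32)
q_s, s = quantize_symmetric(x)
x_hat_s = q_s * s                               # one multiply
q_a, s_a, z = quantize_asymmetric(x)
x_hat_a = (q_a.astype(np.float32) - z) * s_a    # subtract offset, then multiply
```

The extra subtract looks trivial in software, but replicated across every multiply-accumulate in an accelerator array it translates into the kind of area and energy overhead the abstract quantifies as a roughly 3× drop in energy efficiency for about 0.1% accuracy.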