# ROBUST AND HIGH-SPEED MTJ/CMOS 5-2 COMPRESSORS

Bahareh Elahi, Fazel Sharifi\*

Department of Electrical and Computer Engineering, Graduate University of Advanced Technology, Kerman, Iran

f.sharifi@kgut.ac.ir

Received: 21/06/2024, Revised: 19/10/2024, Accepted: 18/11/2024.

## Abstract

Compressors are among the most critical components of multipliers, so improving their energy consumption, delay, and area directly affects overall system efficiency. Magnetic tunnel junction (MTJ) devices have been studied as a promising solution for implementing low-power circuits thanks to their non-volatility, high speed, low power, good endurance, and scalability. This paper uses MTJs to develop two hybrid MTJ/CMOS low-power designs of a 5-2 compressor. The proposed designs are simulated in the HSPICE simulator using 45nm standard CMOS technology and the spin torque transfer (STT) MTJ model, and are then compared with existing 5-2 compressors. The proposed circuits improve the delay, PDP, and transistor count compared to the existing design. According to the simulation results, the first and second proposed designs reduce the PDP by about 51% and 66%, respectively, compared with the previous work.

## Keywords

Compressor, Magnetic Tunnel Junction (MTJ), Spin Torque Transfer (STT), Low-Power Design, Process Variations.

# 1. Introduction

Over the past decade, the development of complementary metal-oxide-semiconductor (CMOS) technology has been very successful. Progress in CMOS development has reduced the area, power consumption, and cost per transistor, and has improved the efficiency of integrated circuits (ICs) [1]. However, as the dimensions of CMOS transistors have shrunk below 90nm, problems such as short-channel effects and increased leakage power have appeared [2, 3]. Leakage power is caused by current that leaks through transistors even when they are turned off, and it accounts for a significant portion of the total power consumption [4].

On the other hand, with the impressive development of the Internet of Things (IoT), the number of battery-powered devices is rising. Therefore, power consumption has become a critical challenge [5]. In recent years, researchers have focused on various approaches to reduce power consumption, such as employing spintronic devices and using approximate computing [6, 7, 8].

Magnetic tunnel junctions (MTJs) are among the most important spintronic devices and have attracted much attention thanks to their non-volatility, near-zero standby power, energy efficiency, high integration density, radiation hardness, and compatibility with other semiconductor devices [9]. Moreover, MTJs can be distributed over logic circuits as memory elements, which reduces the distance between logic elements and memory building blocks in what is called "Logic in Memory" (LIM) [10]. Because MTJs can be vertically integrated in the back-end process of CMOS fabrication, they enable a design approach called hybrid MTJ/CMOS [10].

Several circuits have been designed in the literature using the hybrid MTJ/CMOS approach, such as the magnetic flip-flop (MFF) [11], magnetic RAM (MRAM) [12], magnetic look-up table (MLUT) [13], magnetic content-addressable memory (MCAM) [14], ternary magnetic RAM (TMRAM) [15], and magnetic full adder (MFA) cell [1], among others [10].

Multiplication is one of the most crucial and power-hungry arithmetic operations in general-purpose microprocessors, digital signal processors, and digital filters [16]. In general, the multiplier architecture consists of three stages: 1) partial product generation, 2) partial product reduction, and 3) final product generation [17]. Among these three stages, partial product reduction contributes the most to energy consumption and area. Therefore, improvements in this stage have a direct effect on the overall performance of the system [18]. Compressors are efficient circuits for this stage, and 4-2 and 5-2 compressors are the most common among the various compressor types [19].

In this paper, two low-power hybrid MTJ/CMOS 5-2 compressor designs are presented. The proposed hybrid MTJ/CMOS compressors are simulated using the HSPICE simulator and compared with a magnetic 5-2 compressor presented in [20]. These circuits have been evaluated using 45nm standard CMOS technology and the spin torque transfer (STT) MTJ model. The results show that the proposed designs significantly reduce the delay and the number of transistors and improve the power consumption and PDP compared to the existing design.

The remainder of the paper is organized as follows. Section 2 reviews the magnetic tunnel junction and the 5-2 compressor cell. Section 3 presents and describes the proposed 5-2 compressor circuits. Section 4 presents the simulation and comparison results of the MTJ/CMOS 5-2 compressors. Finally, Section 5 concludes the paper.

# 2. Background

## 2.1. Magnetic tunnel junction

The magnetic tunnel junction (MTJ) is a promising device for implementing nonvolatile circuits [21]. An MTJ has a vertical nanopillar structure consisting of two ferromagnetic (FM) layers (a free layer and a fixed layer) separated by an oxide barrier [22]. As shown in Figure 1, the magnetization of the fixed layer is fixed and serves as a reference, whereas the magnetic orientation of the free layer can be changed using various switching approaches [23]. Hence, the layers have two possible magnetization configurations: parallel (P), which has low resistance ($R_P$), and antiparallel (AP), which has high resistance ($R_{AP}$). The P and AP states denote "1" and "0" in binary information, respectively [5, 24]. The tunnelling magnetoresistance (TMR) ratio quantifies the relative difference between $R_P$ and $R_{AP}$ in an MTJ and is calculated by Eq. (1).

$$TMR = \frac{R_{AP} - R_P}{R_P} \tag{1}$$

A higher TMR is desirable because it makes the difference between the AP and P states easier to detect [25]. Generally, there are several methods for changing the magnetic orientation of the free layer, that is, for writing to the MTJ: field-induced magnetization switching (FIMS), thermally assisted switching (TAS), and spin torque transfer (STT) [26].
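As a rough numerical illustration of Eq. (1), the sketch below estimates $R_P$ and $R_{AP}$ from a resistance-area product and a TMR ratio. It assumes a rectangular free layer of area lx × ly and the zero-bias TMR, and it uses the device values listed in the MTJ parameter table of Section 4. The function name `mtj_resistances` is hypothetical, and the SPICE model in [32] may treat the area and the bias dependence differently.

```python
def mtj_resistances(ra_ohm_um2, lx_nm, ly_nm, tmr):
    """Estimate (R_P, R_AP) in ohms.

    Assumption: the junction area is a simple lx * ly rectangle and the
    TMR is the zero-bias value, so R_AP = R_P * (1 + TMR) from Eq. (1).
    """
    area_um2 = (lx_nm * 1e-3) * (ly_nm * 1e-3)  # nm -> um for each side
    r_p = ra_ohm_um2 / area_um2                 # R = RA / area
    r_ap = r_p * (1.0 + tmr)                    # rearranged Eq. (1)
    return r_p, r_ap


# Values taken from the MTJ parameter table: RA = 5 ohm*um^2,
# lx = ly = 65 nm, TMR = 150 %.
r_p, r_ap = mtj_resistances(5.0, 65.0, 65.0, 1.5)
print(f"R_P  = {r_p:.1f} ohm")
print(f"R_AP = {r_ap:.1f} ohm")
```

Under these assumptions $R_P$ comes out at roughly 1.18 kΩ and $R_{AP}$ at roughly 2.96 kΩ, which gives a feel for the resistance gap that the sensing circuit has to resolve.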



Fig. 1. Magnetic Tunnel Junction Device

The first-generation method of writing to MRAM was FIMS, which suffers from poor selectivity, poor scalability, and high switching currents, and consequently from high power consumption [27]. After that, the STT and TAS methods were introduced in the second generation of MRAMs. Among these methods, STT is the typical writing method due to its simplicity and higher performance. Compared with the other two methods, the STT technique provides higher power efficiency and faster writing speed [28]. STT switching requires only one low bidirectional switching current. When the MTJ current ($I_{MTJ}$) exceeds a critical current ($I_C$), the state of the MTJ changes [29]. As shown in Figure 1, if the current enters through the free layer and exits through the fixed layer and $I_{MTJ} > I_C$, the MTJ state switches from logic "0" to "1". If the current flows from the fixed layer to the free layer and $I_{MTJ} > I_C$, the MTJ state switches from logic "1" to "0".
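The STT write rule described above can be summarised as a small behavioural model. This is only a sketch of the threshold-and-direction logic in the text, not a device model. The names `Direction` and `stt_write` and the current values in the usage lines are hypothetical.

```python
from enum import Enum


class Direction(Enum):
    FREE_TO_FIXED = "free_to_fixed"   # current enters the free layer
    FIXED_TO_FREE = "fixed_to_free"   # current enters the fixed layer


def stt_write(state, i_mtj, i_c, direction):
    """Return the MTJ logic state (0 = AP, 1 = P) after a write pulse.

    The state changes only when the current magnitude exceeds the
    critical current; the current direction selects the target state.
    """
    if i_mtj <= i_c:
        return state                  # below threshold: no switching
    if direction is Direction.FREE_TO_FIXED:
        return 1                      # "0" -> "1" (or stays "1")
    return 0                          # "1" -> "0" (or stays "0")


# Hypothetical currents in amperes, chosen only for illustration.
I_C = 50e-6
print(stt_write(0, 60e-6, I_C, Direction.FREE_TO_FIXED))
print(stt_write(1, 60e-6, I_C, Direction.FIXED_TO_FREE))
print(stt_write(1, 40e-6, I_C, Direction.FIXED_TO_FREE))
```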

## 2.2. Compressors

Compressors are used to sum the partial products during the multiplication process. A 5-2 compressor has five main inputs (A, B, C, D, E) and two main outputs (Sum and Carry). In addition, two carry inputs (Cin1 and Cin2) come from the previous, less significant stage, and two carry outputs (Cout1 and Cout2) go to the next, more significant stage. The five main inputs, the two carry inputs, and the Sum output all have the same weight, $2^n$, whereas the Carry output and the two carry outputs have a weight of $2^{n+1}$. The Sum, Carry, Cout2, and Cout1 outputs are defined by the following logic functions:

$$Sum = A \oplus B \oplus C \oplus D \oplus E \oplus C_{in1} \oplus C_{in2} \tag{2}$$

$$Carry = Maj(E, C_{in2}, (A \oplus B \oplus C \oplus D \oplus C_{in1})) \tag{3}$$

$$C_{out2} = Maj(A, B, C) \tag{4}$$

$$C_{out1} = Maj(D, C_{in1}, (A \oplus B \oplus C)) \tag{5}$$

Details of a 5-2 Compressor operation are described in reference [30]. A general structure of a 5-2 compressor consists of three full adders, which are connected as in Figure 2 [31].
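To make the role of Eqs. (2) to (5) concrete, the following sketch evaluates the four output functions in software and checks, for all 128 input combinations, that the weighted outputs preserve the number of ones at the inputs, i.e. that the seven inputs sum to Sum + 2·(Carry + Cout1 + Cout2). The function names are illustrative and are not taken from the paper.

```python
from itertools import product


def maj(x, y, z):
    """Three-input majority function on single bits."""
    return (x & y) | (x & z) | (y & z)


def compressor_5_2(a, b, c, d, e, cin1, cin2):
    """Return (Sum, Carry, Cout1, Cout2) according to Eqs. (2)-(5)."""
    s = a ^ b ^ c ^ d ^ e ^ cin1 ^ cin2
    carry = maj(e, cin2, a ^ b ^ c ^ d ^ cin1)
    cout2 = maj(a, b, c)
    cout1 = maj(d, cin1, a ^ b ^ c)
    return s, carry, cout1, cout2


def preserves_count():
    """Check the arithmetic identity for every input pattern."""
    for bits in product((0, 1), repeat=7):
        s, carry, cout1, cout2 = compressor_5_2(*bits)
        if sum(bits) != s + 2 * (carry + cout1 + cout2):
            return False
    return True


print(preserves_count())  # prints True
```

The check also reflects the three-full-adder structure: each majority term is the carry of one full adder, and each XOR chain is the corresponding running sum.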



Fig. 2. A 5-2 compressor consisting of three full adder cells

# 3. Proposed Designs

In this section, we propose two new hybrid MTJ/CMOS 5-2 compressor circuits. Each proposed circuit consists of four sub-circuits, one for each output (Sum, Carry, Cout1, and Cout2). Note that the value of input C is stored in the MTJs. In each sub-circuit, a pair of MTJs is always in opposite configurations (e.g., MTJ1: $R_{AP}$ and MTJ0: $R_P$, as shown in Figure 3).

Generally, there are three main methods for sensing the conventional MTJ: SRAM-based sense amplifier, dynamic current mode (DCM) based sense amplifier, and pre-charge sense amplifier (PCSA) [27].

PCSA provides power efficiency as well as sensing reliability while keeping high-speed performance [27]. PCSA operates in two phases: pre-charging and evaluation. In the pre-charging phase, the outputs are charged to $V_{dd}$. Then, in the evaluation phase, the actual value of the outputs is obtained by discharging one of them. Therefore, we use a pre-charge sense amplifier (PCSA) to read (sense) the outputs of our hybrid MTJ/CMOS circuits. In the rest of this section, we present the two proposed designs in detail. The presented designs use a heuristic design method to reduce delay and power consumption by eliminating some transistors on the paths from the outputs to the ground, which also reduces the number of transistors. In the second proposed design, to reduce the resistance of these paths, we use the XOR of some inputs to achieve higher speed and reliability.
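The two-phase PCSA operation can be pictured with a purely behavioural model. The sketch below assumes that each output has one pull-down branch whose total resistance is known, and that the branch with the lower resistance discharges first and wins the evaluation. The function name `pcsa_outputs` and the resistance values are hypothetical, and timing, capacitance, and transistor behaviour are ignored.

```python
import math


def pcsa_outputs(clk, r_out, r_out_bar):
    """Return (out, out_bar) logic levels of an idealised PCSA.

    clk = 0: pre-charge phase, both outputs are pulled to Vdd (logic 1).
    clk = 1: evaluation phase, the output whose pull-down branch has the
             lower resistance discharges to 0 and the other stays at 1.
    A cut-off branch can be modelled with math.inf.
    """
    if clk == 0:
        return 1, 1
    if r_out == r_out_bar:
        raise ValueError("branch resistances must differ to resolve a value")
    return (0, 1) if r_out < r_out_bar else (1, 0)


# Hypothetical branch resistances in ohms.
print(pcsa_outputs(0, 1000.0, 2500.0))    # pre-charge
print(pcsa_outputs(1, 1000.0, 2500.0))    # 'out' branch is faster
print(pcsa_outputs(1, math.inf, 1000.0))  # 'out' branch cut off
```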

## 3.1. First Proposed Design

#### 3.1.1 Cout2 Output

Figure 3 shows the schematic of the $C_{out2}$ output of the proposed design. According to Eq. (4), the $C_{out2}$ module is the simplest circuit in the 5-2 compressor, since it consists of a single majority gate. During the pre-charging phase (CLK=0), P1 and P4 are ON and N0 is OFF; thus, $C_{out2}$ and $\overline{C_{out2}}$ are charged to $V_{dd}$ via P1 and P4. Logic evaluation takes place when CLK=1: P1 and P4 turn OFF and N0 turns ON, which discharges either $C_{out2}$ or $\overline{C_{out2}}$.



Fig. 3. Schematic of the  $C_{out2}$  output of the proposed designs.

For example, when CLK=1, $A=V_{dd}$, and B=0, transistors N3, N6, and N0 are ON; thus, both pull-down branches are connected, and the state of the MTJs determines the final output. If the values stored are MTJ1=0 and MTJ0=1, then MTJ1 is in the high-resistance state and MTJ0 is in the low-resistance state. In this case, $C_{out2}$ is discharged more quickly than $\overline{C_{out2}}$. When the $C_{out2}$ voltage drops far enough, P2 turns ON and N1 turns OFF, so $\overline{C_{out2}}$ is pulled up to $V_{dd}$ (logic "1") and $C_{out2}$ is pulled further down to GND (logic "0").

As another example, suppose $A=V_{dd}$ and $B=V_{dd}$. Then N5 and N6 are OFF; thus, the right branch is cut off, and the $C_{out2}$ output remains at $V_{dd}$ regardless of the value stored in the MTJs (C).

#### 3.1.2 Sum Output

Generally, in full adders and compressors, the Sum sub-circuit is an exclusive OR of all inputs [35]. According to Eq. (2), the Sum output of a 5-2 compressor circuit can be considered a seven-input XOR. Figure 4 illustrates the schematic of the Sum circuit. We have reduced the number of transistors in the CMOS tree of this design, which significantly reduces energy consumption and delay compared with the design in [20]. For example, if $A=B=C_{in2}=V_{dd}$ and $D=E=C_{in1}=0$, two paths are created:

1) Transistors N3, N4, N11, N15, N19, N26 and MTJ0 provide a path from $\overline{Sum}$ to the ground.

2) Transistors N9, N10, N14, N18, N22, N24 and MTJ1 provide a path from Sum to the ground.

Because the two branches have different resistances, $\overline{Sum}$ discharges more quickly than Sum; consequently, Sum=1 and $\overline{Sum}$=0.



Fig. 4. Schematic of the Sum output of the first proposed design.

As another example, if $A=D=E=C_{in1}=V_{dd}$ and $B=C_{in2}=0$, then transistors N9, N6, N13, N16, N21, N26 and MTJ0 connect Sum to the ground, and transistors N3, N8, N12, N17, N20, N23 and MTJ1 connect $\overline{Sum}$ to the ground. Finally, the actual values of the outputs are $\overline{Sum}=1$ and Sum=0.

#### 3.1.3 Carry Output

The next part of the first proposed hybrid MTJ/CMOS 5-2 compressor is the Carry circuit, shown in Figure 5. According to Eq. (3), Carry is a three-input majority function in which one input is the output of a 5-input XOR of A, B, C, D, and $C_{in1}$. The output of this XOR matters only when E and $C_{in2}$ have different values. For example, when $A=D=E=V_{dd}$, $B=C_{in1}=C_{in2}=0$, and C=AP, N20 and N21 are OFF and the XOR determines the output. Therefore, N9, N6, N13, N18, and MTJ0 create a path to the ground for Carry, and N3, N8, N12, N15, and MTJ1 connect $\overline{Carry}$ to the ground, so that finally Carry=0.



Fig. 5. Schematic of the Carry output of the first proposed design

#### 3.1.4 Cout1 Output

The last part of the first proposed 5-2 compressor is the Cout1 module, shown in Figure 6. As described in the previous part and according to Eq. (5), Cout1 is a three-input majority function in which one input is fed by the 3-input XOR of A, B, and C (see Figure 2). When D and Cin1 have the same value (both 1 or both 0), Cout1 equals that value irrespective of the 3-input XOR (i.e., the third input). If D and Cin1 differ (i.e., "01" or "10"), the XOR of the other three inputs (A, B, C) determines the output. For example, when D=Cin1=0, Cout1 is zero whether A $\oplus$ B $\oplus$ C is 0 or 1.
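The reasoning used for Cout1 (and for Carry in the previous part) relies on a simple property of the majority function: when two inputs agree they decide the result, and otherwise the third input does. The sketch below checks this property exhaustively and applies it to Eqs. (3) and (5). The helper names are illustrative only.

```python
from itertools import product


def maj(x, y, z):
    """Three-input majority function on single bits."""
    return (x & y) | (x & z) | (y & z)


def maj_by_select(x, y, z):
    """Equivalent form: if x and y agree, return x; otherwise return z."""
    return x if x == y else z


def cout1(a, b, c, d, cin1):
    # Eq. (5): D and Cin1 decide unless they differ.
    return maj_by_select(d, cin1, a ^ b ^ c)


def carry(a, b, c, d, e, cin1, cin2):
    # Eq. (3): E and Cin2 decide unless they differ.
    return maj_by_select(e, cin2, a ^ b ^ c ^ d ^ cin1)


equivalent = all(
    maj(x, y, z) == maj_by_select(x, y, z)
    for x, y, z in product((0, 1), repeat=3)
)
print(equivalent)  # prints True
```

For instance, `carry(1, 0, 0, 1, 1, 0, 0)` reproduces the Carry example above (A=D=E=1, B=Cin1=Cin2=0, C stored as "0") and evaluates to 0.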



Fig. 6. Schematic of the C<sub>out1</sub> output of the proposed designs

## 3.2. Second Proposed Design

The Cout1 and Cout2 sub-circuits are the same as in the first design, so they are not repeated here.

#### 3.2.1 Sum Output

As in the previous design, the Sum output is a seven-input XOR. In this design, however, we use 2-input XOR gates to reduce the number of transistors between the output and the ground. Instead of designing a seven-input XOR directly, we use a 3-input XOR in which each input is driven by a 2-input XOR. This reduces the height of the CMOS tree, leading to a significant reduction in delay at the cost of increased power consumption. The sensitivity to process variation is also reduced.

Figure 7 shows the schematic of the Sum circuit of this design. For example, when $A=B=C_{in2}=V_{dd}$ and $D=E=C_{in1}=0$, then Z=0, O=0, and X=$V_{dd}$; consequently, Sum will be $V_{dd}$ (C=0 and MTJ1 has high resistance).
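The logical equivalence behind this restructuring can be checked in a few lines. The sketch below assumes one particular pairing of inputs for the intermediate signals (Z = A ⊕ B, O = D ⊕ E, X = Cin1 ⊕ Cin2), chosen only because it reproduces the example values above; the actual pairing used in Figure 7 is not stated in the text. Because XOR is associative and commutative, any pairing gives the same Sum.

```python
from itertools import product


def sum_flat(a, b, c, d, e, cin1, cin2):
    """Seven-input XOR of Eq. (2)."""
    return a ^ b ^ c ^ d ^ e ^ cin1 ^ cin2


def sum_tree(a, b, c, d, e, cin1, cin2):
    """Sum built from three 2-input XORs feeding a final XOR with C.

    Assumption: the pairing below is hypothetical.
    """
    z = a ^ b
    o = d ^ e
    x = cin1 ^ cin2
    return x ^ o ^ z ^ c


same = all(
    sum_flat(*bits) == sum_tree(*bits)
    for bits in product((0, 1), repeat=7)
)
print(same)  # prints True

# Example from the text: A = B = Cin2 = 1, D = E = Cin1 = 0, C = 0.
print(sum_tree(1, 1, 0, 0, 0, 0, 1))  # prints 1
```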

#### 3.2.2 Carry Output

The Carry circuit is implemented in the same way as the Sum circuit. We use 2-input XORs to shorten the critical path and to increase the robustness against process variations. Figure 8 shows the Carry circuit of the second design. Its functionality is the same as in the first design, but with a smaller propagation delay.

Fig. 8. Schematic of the Carry output of the second proposed design

# 4. Simulation Results

This section presents the simulation results of the proposed hybrid MTJ/CMOS 5-2 compressor designs and compares them with an existing design [20]. Note that the design in [20] is an approximate 5-2 compressor, so some of its outputs are incorrect for some input patterns. We also compare our designs with two CMOS designs [30], [34].

We perform the simulations using the HSPICE tool with the spin torque transfer (STT) model for MTJs [32] and 45nm standard CMOS technology [33]. Simulations are conducted at a 1.0V supply voltage. Figure 9 shows the transient response of the proposed designs and illustrates the correct functionality of the proposed 5-2 compressors.

Table 1 lists the parameters of the STT-MTJ, where lx, ly, and lz are the dimensions of the free layer. The width of the free layer affects the current density through the MTJ. A larger width and length increase the device area, which lowers the resistance for a given RA product. The thickness of the free layer affects its magnetic properties. The resistance-area product (RA) is a key parameter of MTJ devices: it is the product of the MTJ resistance and its area, and it plays a crucial role in determining the device's performance characteristics. The saturation magnetization of an MTJ refers to the maximum magnetization that its ferromagnetic layers can achieve when subjected to an external magnetic field.

Our experiments collect the worst-case delay, the average power consumption, the number of transistors, and the power-delay product (PDP). Table 2 compares our results with existing designs representing the state of the art. The first proposed design reduces the power consumption, a critical parameter in digital circuits, relative to the design in [20]; this improvement is due to the reduced transistor count. Furthermore, our designs have a lower worst-case delay than the design in [20], by 34% for the first design and 67% for the second design. Moreover, the PDP of the first and second designs is lower than that of the design in [20] by 51% and 66%, respectively. The number of transistors has also decreased from 148 to 100 and 112 in the first and second designs, respectively.

Fig. 9. Transient response of the proposed 5-2 compressor

**Table 1.** MTJ device parameters used for simulations

| Parameter | Description                                    | Value              |
|-----------|------------------------------------------------|--------------------|
| lx        | Width of free layer                            | 65 nm              |
| ly        | Length of free layer                           | 65 nm              |
| lz        | Thickness of free layer                        | 1.48 nm            |
| RA        | MTJ resistance-area product                    | $5 \Omega \mu m^2$ |
| TMR       | Tunnel magnetoresistance ratio under zero bias | 150%               |
| MS0       | Saturation magnetization                       | 1210               |

**Table 2.** Simulation results with 45 nm technology

| Design              | Delay (ps) | Power (µW) | PDP (aJ) | Number of devices |
|---------------------|------------|------------|----------|-------------------|
| Design [30]         | 483        | 31.4       | 15100    | 208               |
| Design [34]         | 145        | 29.5       | 4270     | 192               |
| Design [20]         | 73.804     | 2.813      | 207.628  | 148 + 8 MTJs      |
| 1st proposed design | 49.019     | 2.095      | 102.719  | 100 + 8 MTJs      |
| 2nd proposed design | 24.387     | 2.885      | 70.359   | 112 + 8 MTJs      |

One of the most critical issues in nanoscale circuits is sensitivity to process variations. We use Monte Carlo simulations to evaluate the effect of process variations, assuming a Gaussian distribution and variation at the $\pm 6\sigma$ level for the process parameters. The resistance-area product (RA) and the transistor gate length (Lg) are the primary sources of variation for the MTJs and the CMOS circuits, respectively.

Therefore, we have considered 5%, 10%, 15%, and 20% variations of these parameters in our process variation analysis. The results show that the previous design [20] operates correctly with up to 5% of RAP variation in the MTJs and fails for higher values. Figure 10 shows the robustness of the proposed designs against RAP variations of the MTJs of up to 20%. For transistor Lg variations, the design in [20] fails for all variation values. Figure 11 shows the robustness of our proposed designs when Lg varies by up to 10%.



Fig. 10. Variation of delay, power and PDP relative to RAP



Fig. 11. Variation of delay, power and PDP relative to Lg



Fig. 12. Delay, Power, and PDP of designs vs. supply voltage variation



Fig. 13. Delay, Power, and PDP of designs vs. temperature variation

Figure 12 evaluates the operation of the designs at different supply voltages, including 0.8V, 0.9V, 1.0V, and 1.1V. Results show that increasing the supply voltage results in an exponential increase in average power consumption and PDP.

Figure 13 shows the operation of the compressors at different temperatures, ranging from -20 °C to 90 °C. As anticipated, the operation of the compressors is dependent on the temperature changes. The results, shown in Figure 13, suggest that all three evaluated parameters vary as the temperature varies.

Generally, speed and power consumption trade off against each other in circuits. For example, our second design is faster than the first one, but its power consumption is higher. Compared with the previous design [20], our designs use fewer transistors by eliminating redundant transistors and unnecessary paths. Therefore, our designs are faster, and the first design also consumes less power. The design in [20] uses more transistors and consequently has more resistance on the paths from the outputs to the ground. As a result, the resistance values of the MTJs and transistors become critical for correct functionality, which makes that design more sensitive to process variations.
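The trade-off can be made explicit by recomputing the PDP and the relative improvements from the delay and power figures reported for the three magnetic designs in the simulation results table. The sketch below is only a consistency check on the reported numbers; small differences from the tabulated PDP values are expected because the delay and power entries are rounded. The dictionary keys and function names are illustrative.

```python
# (delay in ps, average power in uW) as reported in the results table.
designs = {
    "design_20": (73.804, 2.813),
    "proposed_1": (49.019, 2.095),
    "proposed_2": (24.387, 2.885),
}


def pdp_aj(delay_ps, power_uw):
    """Power-delay product in attojoules (1 ps * 1 uW = 1e-18 J = 1 aJ)."""
    return delay_ps * power_uw


def reduction_percent(reference, value):
    """Relative reduction of 'value' with respect to 'reference'."""
    return 100.0 * (reference - value) / reference


ref_delay, ref_power = designs["design_20"]
ref_pdp = pdp_aj(ref_delay, ref_power)

for name in ("proposed_1", "proposed_2"):
    delay, power = designs[name]
    pdp = pdp_aj(delay, power)
    print(
        f"{name}: PDP = {pdp:.1f} aJ, "
        f"delay reduction = {reduction_percent(ref_delay, delay):.0f}%, "
        f"PDP reduction = {reduction_percent(ref_pdp, pdp):.0f}%"
    )
```

The second design has the lowest PDP even though its power is the highest of the three, because its delay is roughly one third of that of the design in [20].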

Because the CMOS designs and the hybrid MTJ/CMOS designs follow different design methodologies, the CMOS designs are more reliable and robust against process variation and noise, whereas the MTJ/CMOS designs are more power efficient.

# 5. Conclusion

Compressors are among the most critical components of multipliers, which are used in microprocessors, digital signal processors, and digital filters. This paper presents two new high-speed hybrid MTJ/CMOS designs of 5-2 compressors. Our new designs offer a significantly simpler structure and lower energy consumption and delay compared to the state of the art. The number of transistors is reduced by about 33% and 24% in the first and second designs, respectively. Our designs exploit the unique features of spin torque transfer (STT) MTJs and the PCSA sense amplifier to reduce power consumption. For evaluation, we simulated these designs using the HSPICE simulator with 45 nm standard CMOS technology. The results confirm the correct operation of the proposed circuits. We have also studied the impact of process variations in the CMOS transistors and the MTJs on the performance of the designs using Monte Carlo simulations. The simulation results show improvements of about 34% in delay and 51% in PDP for the first design, and of about 67% in delay and 66% in PDP for the second design.

# 6. References

[1] A. Amirany and R. Rajaei, "Fully nonvolatile and low power full adder based on spin transfer torque magnetic tunnel junction with spin-hall effect assistance", *IEEE Transactions on Magnetics*, vol. 54, no. 12, pp. 1-7, 2018.

[2] F. Sharifi, A. H. Hosseini, and M. Nouraei, "Design of a CNFET-based Quaternary Full Adder", *Tabriz Journal of Electrical Engineering*, vol. 54, no. 2, pp. 183-192, 2024 (in Persian).

[3] R. Abbasi and V. Jamshidi, "A Non-volatile, Low-Power, and Fast NCFET-based Flip-Flop with Simultaneous Backup Capability for Non-volatile Computing", *Tabriz Journal of Electrical Engineering*, vol. 54, no. 1, pp. 35-43, 2024.

[4] N. S. Kim, T. Austin, D. Blaauw, T. Mudge, K. Flautner, J. S. Hu, *et al.*, "Leakage current: Moore's law meets static power", *Computer*, vol. 36, no. 12, pp. 68-75, 2003.

[5] F. Razi, M. H. Moaiyeri, R. Rajaei, and S. Mohammadi, "A variation-aware ternary spin-Hall assisted STT-RAM based on hybrid MTJ/GAA-CNTFET logic", *IEEE Transactions on Nanotechnology*, vol. 18, pp. 598-605, 2019.

[6] A. Amirany and R. Rajaei, "Nonvolatile, spin-based, and low-power inexact full adder circuits for computing-in-memory image processing", *Spin*, vol. 9, no. 03, p. 1950013, 2019.

[7] A. Zarei and F. Safaei, "Power and area-efficient design of VCMA-MRAM based full-adder using approximate computing for IoT applications", *Microelectronics Journal*, vol. 82, pp. 62-70, 2018.

[8] H. Cai, Y. Wang, L. A. Naviner, Z. Wang, and W. Zhao, "Approximate computing in MOS/spintronic non-volatile full-adder", in *IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH)*, 2016, pp. 203-208.

[9] K. C. Chun, H. Zhao, J. D. Harms, T.-H. Kim, J.-P. Wang, and C. H. Kim, "A scaling roadmap and performance evaluation of in-plane and perpendicular MTJ based STT-MRAMs for high-density cache memory", *IEEE journal of solid-state circuits*, vol. 48, no. 2, pp. 598-610, 2012.

[10] H. Thapliyal, A. Mohammad, S. D. Kumar, and F. Sharifi, "Energy-efficient magnetic 4-2 compressor", *Microelectronics journal*, vol. 67, pp. 1-9, 2017.

[11] N. Sakimura, T. Sugibayashi, R. Nebashi, and N. Kasai, "Nonvolatile magnetic flip-flop for standby-power-free SoCs", *IEEE Journal of Solid-State Circuits*, vol. 44, no. 8, pp. 2244-2250, 2009.

[12] S. Jain, A. Ranjan, K. Roy, and A. Raghunathan, "Computing in memory with spin-transfer torque magnetic RAM", *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 26, no. 3, pp. 470-483, 2017.

[13] J. Sun, X. Zhao, and Y. Wang, "A Three Input Look-Up-Table Design Based on Memristor-CMOS", in Bioinspired Computing: Theories and Applications: 13th International Conference, BIC-TA November 2–4, 2018, Beijing, China, pp. 275-286.

[14] D. Cho, K. Kim, and C. Yoo, "A non-volatile ternary content-addressable memory cell for low-power and variation-toleration operation", *IEEE transactions on magnetics*, vol. 54, no.2, pp. 1-3, 2017.

[15] A. A. Javadi, M. Morsali, and M. H. Moaiyeri, "Magnetic nonvolatile flip-flops with spin-Hall assistance for power gating in ternary systems", *Journal of Computational Electronics*, vol. 19, no.3, pp. 1175-1186, 2020.

[16] C.-W. Tung and S.-H. Huang, "Low-power high-accuracy approximate multiplier using approximate high-order compressors", in *2nd International Conference on Communication Engineering and Technology (ICCET)*, 2019, Nagoya, Japan, pp. 163-167.

[17] D. Balobas and N. Konofaos, "Low-power high-performance CMOS 5-2 compressor with 58 transistors", *Electronics Letters*, vol. 54, no. 5, pp. 278-280, 2018.

[18] R. K. Senapati and J. Ravindra, "Low-power near-explicit 5:2 compressor for superior performance multipliers", *International Journal of Engineering*, vol. 11, no. 4, pp. 529-545, 2018.

[19] S. Sowmiya, K. Stella, and V. Senthilkumar, "Design and analysis of 4-2 compressor for arithmetic application", *Asian Journal of Applied Science and Technology (AJAST)*, vol. 1, no. 1, pp. 106-109, 2017.

[20] M. Ahmadinejad and M. H. Moaiyeri, "Energy-efficient magnetic 5:2 compressors based on SHE-assisted hybrid MTJ/FinFET logic", *Journal of Computational Electronics*, vol. 19, no. 1, pp. 206-221, 2020.

[21] E. Deng, Y. Zhang, J.-O. Klein, D. Ravelosona, C. Chappert, and W. Zhao, "Low power magnetic full-adder based on spin transfer torque MRAM", *IEEE Transactions on Magnetics*, vol. 49, no. 9, pp. 4982-4987, 2013.

[22] F. Ren and D. Markovic, "True energy-performance analysis of the MTJ-based logic-in-memory architecture (1-bit full adder)", *IEEE Transactions on Electron Devices*, vol. 57, no. 5, pp. 1023-1028, 2010.

[23] F. Sharifi, Z. Saifullah, and A.-H. Badawy, "Design of adiabatic MTJ-CMOS hybrid circuits", in *IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS)* 2017, Boston, MA, USA, pp. 715-718.

[24] R. Zand, A. Roohi, and R. F. DeMara, "Energy-efficient and process-variation-resilient write circuit schemes for spin hall effect MRAM device", *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 25, no. 9, pp. 2394-2401, 2017.

[25] W. Kang, L. Zhang, J.-O. Klein, Y. Zhang, D. Ravelosona, and W. Zhao, "Reconfigurable codesign of STT-MRAM under process variations in deeply scaled technology", *IEEE Transactions on Electron Devices*, vol. 62, no. 6, pp. 1769-1777, 2015.

[26] A. Amirany and R. Rajaei, "Spin-based fully nonvolatile full-adder circuit for computing in memory", Spin, vol. 9, no. 01, p. 1950007, 2018.

[27] Y. Gang, W. Zhao, J.-O. Klein, C. Chappert, and P. Mazoyer, "A high-reliability, low-power magnetic full adder", *IEEE Transactions on Magnetics*, vol. 47, no. 11, pp. 4611-4616, 2011.

[28] H. Thapliyal, F. Sharifi, and S. D. Kumar, "Energy-efficient design of hybrid MTJ/CMOS and MTJ/nanoelectronics circuits", *IEEE Transactions on Magnetics*, vol. 54, no. 7, pp. 1-8, 2018.

[29] R. Rajaei and S. B. Mamaghani, "Ultra-low power, highly reliable, and nonvolatile hybrid MTJ/CMOS based full-adder for future VLSI design", *IEEE Transactions on device and materials reliability*, vol. 17, no. 1, pp. 213-220, 2016.

[30] C-H. Chang, J. Gu, and M. Zhang, "Ultra low-voltage low-power CMOS 4-2 and 5-2 compressors for fast arithmetic circuits", *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 51, no. 10, pp. 1985. 2004.

[31] S. Agarwal, G. Harish, S. Balamurugan, and R. Marimuthu, "Design of high speed 5: 2 and 7: 2 compressor using nanomagnetic logic", in *VLSI Design and Test: 22nd International Symposium, VDAT 2018, Madurai, India, June 28-30*, pp. 49-60.

[32] J. Kim, A. Chen, B. Behin-Aein, S. Kumar, J.-P. Wang, and C. H. Kim, "A technology-agnostic MTJ SPICE model with user-defined dimensions for STT-MRAM scalability studies", in *IEEE custom integrated circuits conference (CICC)*, 2015, San Jose, CA, USA, pp. 1-4.

[33] M. Bagherizadeh, M. H. Moaiyeri, and M. Eshghi, "A high-performance 5-to-2 compressor cell based on carbon nanotube FETs", *International Journal of Electronics*, vol. 106, no. 6, pp. 912-927, 2019.

[34] K. M. Chu and D. L. Pulfrey, "A comparison of CMOS circuit techniques: Differential cascode voltage switch logic versus conventional logic", *IEEE Journal of Solid-State Circuits*, vol. 22, no. 4, pp. 528-532, 1987.