Design of Digital FIR Filter Based on MCMAT for 12 bit ALU using DADDA & WALLACE Tree Multiplier

Jami Venkata Suman¹, D. G. Jignash² and B. I. Neelgar³

¹Assistant Professor, Department of ECE, GMR Institute of Technology, Rajam, India, Mobile: +91-9985571930, Email: venkatasuman.j@gmrit.org.
²PG Student, Department of ECE, GMR Institute of Technology, Rajam, India, jigu.gunavanth@gmail.com.
³Professor & HOD, Department of ECE, GMR Institute of Technology, Rajam, India, hod_ece@gmrit.org.

Abstract

Over time, different types of multipliers have been used to minimize the cost of digital FIR filter designs in their various forms, and among them the truncated multiplier plays an essential role. Because of its multiple adders and delay elements, the transposed-form circuit occupies more area and consumes more power. In the proposed method, low cost, high speed and effective results are achieved with faithfully rounded truncated multipliers that use carry-save adder operations. The direct-form FIR filter uses MCMAT for its multiplication and accumulation operations, which reduces area by decreasing the number of components such as structural adders and registers. Using MCMAT in digital FIR filters gives optimal results in terms of power and area. The proposed 12-bit Arithmetic and Logic Unit (ALU) is designed together with the MCMAT-based digital FIR filter to reduce area and increase speed in real-time applications.

Keywords: Finite Impulse Response Filter, multiple constant multipliers/accumulators with faithfully rounded truncation (MCMAT), Truncated multiplier, Wallace tree multiplier, Dadda multiplier

1. Introduction

Digital filters are used extensively in the electronics industry. With analog filters, noise can build up gradually, and digital filters give better noise performance than analog filters. The proposed design optimizes bit width and hardware resources without affecting the frequency response or the output signal [2]. The three basic operations used in a digital filter, shown in Figure 1.1, are addition (or subtraction), multiplication, and delaying the signal by one or more sample periods; the behavior of the filter can be described in terms of these operations. In a multiplier block, multiplication by fixed-point constant coefficients is carried out using additions, subtractions and shifts [5].
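A minimal Python sketch of these three operations is given below, assuming non-negative integer coefficients and purely illustrative values. It computes the direct-form output y[n] = h[0]x[n] + h[1]x[n-1] + ... + h[N-1]x[n-N+1] using a delay line, constant multiplications and additions, and realizes each constant multiplication with shifts and additions over the set bits of the coefficient. The helper names `shift_add_multiply` and `fir_direct_form` are hypothetical; a real multiplier block [5] would also use subtractions and share intermediate terms across coefficients, which this sketch does not attempt.

```python
def shift_add_multiply(x: int, c: int) -> int:
    """Multiply x by a non-negative integer constant c using only shifts and additions."""
    if c < 0:
        raise ValueError("this sketch handles non-negative constants only")
    acc = 0
    shift = 0
    while c:
        if c & 1:                 # each set bit of c contributes x << shift
            acc += x << shift
        c >>= 1
        shift += 1
    return acc


def fir_direct_form(samples: list[int], coeffs: list[int]) -> list[int]:
    """Direct-form FIR: y[n] = sum over k of coeffs[k] * x[n - k]."""
    delay_line = [0] * len(coeffs)          # delay elements, initially zero
    outputs = []
    for x in samples:
        # delay: every stored sample moves back by one sample period
        delay_line = [x] + delay_line[:-1]
        # multiply each tap by its constant coefficient, then accumulate
        y = 0
        for tap, h in zip(delay_line, coeffs):
            y += shift_add_multiply(tap, h)
        outputs.append(y)
    return outputs


if __name__ == "__main__":
    impulse = [1, 0, 0, 0, 0]
    print(fir_direct_form(impulse, [3, 5, 7]))  # impulse response: [3, 5, 7, 0, 0]
```

Feeding an impulse returns the coefficients themselves, which is a quick way to check that the delay line and the shift-and-add products are wired correctly.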
In VLSI signal processing, two types of digital filters are widely used: FIR (Finite Impulse Response) and IIR (Infinite Impulse Response) filters. In an FIR filter the impulse response is finite, no feedback is used, and the phase response can be kept linear to avoid phase distortion. FIR filters are also much simpler to design than IIR filters, and they are used in DSP processors for high-speed operation. In digital signal processing, multiplication and addition take a lot of time; high-speed addition can be achieved with a parallel-prefix adder, and multiplication with an improved truncated multiplier that uses fewer components [4]. IIR filters are used where the amount of computation must be limited, but they require feedback, and the feedback path causes problems in finite-precision implementations. Because an FIR filter has no feedback, it works efficiently with a limited number of bits (finite precision), and it can be implemented using fractional arithmetic. However, an FIR filter requires more coefficients than an IIR filter to achieve the same frequency response, and therefore needs more memory and hardware resources to carry out its arithmetic [1]. The proposed FIR filter design can use fewer coefficients.
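To illustrate how an FIR filter can be implemented with fractional arithmetic, the sketch below quantizes real coefficients to 12-bit two's-complement fractions, filters integer samples with integer multiply-accumulate operations, and rescales each result. The Q1.11 format (1 sign bit, 11 fraction bits), the coefficient values and the function names are assumptions for illustration only; they are not taken from the proposed design.

```python
FRAC_BITS = 11                 # assumed Q1.11 format: 1 sign bit + 11 fraction bits = 12 bits
Q_MIN = -(1 << FRAC_BITS)      # most negative 12-bit two's-complement value (-2048)
Q_MAX = (1 << FRAC_BITS) - 1   # most positive 12-bit two's-complement value (2047)


def to_q(value: float) -> int:
    """Quantize a real coefficient to a Q1.11 integer (round to nearest, ties to even, then saturate)."""
    q = round(value * (1 << FRAC_BITS))
    return max(Q_MIN, min(Q_MAX, q))


def fir_fixed_point(samples: list[int], coeffs: list[float]) -> list[int]:
    """Direct-form FIR using integer multiply-accumulate on Q1.11 coefficients."""
    coeffs_q = [to_q(c) for c in coeffs]
    history = [0] * len(coeffs_q)              # delay line
    outputs = []
    for x in samples:
        history = [x] + history[:-1]
        # the accumulator is wider than 12 bits (Python ints are unbounded)
        acc = sum(h * s for h, s in zip(coeffs_q, history))
        outputs.append(acc >> FRAC_BITS)       # drop the 11 fraction bits (arithmetic shift = floor)
    return outputs


if __name__ == "__main__":
    print(fir_fixed_point([1000, 1000, 1000, 1000], [0.25, 0.5, 0.25]))  # [250, 750, 1000, 1000]
```

Because the accumulator is wider than the 12-bit operands, error enters only when the coefficients are quantized and when the final sum is rescaled.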

2. Multipliers

Nowadays, fast co-processors, digital signal processing chips and graphics processors have created demand for high-speed, area-efficient multipliers. Current designs range from small, low-performance shift-and-add multipliers to large, high-performance array multipliers. Tree structures achieve higher performance than linear arrays, but their interconnection is more complex and less regular. The multiplier is one of the key hardware blocks in digital signal processors, microprocessors and most other high-performance digital systems. The main motivation behind this paper is to offer high speed and lower power consumption without increasing silicon area.

2.1. Binary Multiplications

Figure 2.1 shows the multiplication of two binary numbers, the multiplicand and the multiplier, according to the rules of binary multiplication. If both inputs are n bits wide, the product is 2n bits wide. The first step is to form the partial product matrix, whose bits are obtained by ANDing each multiplier bit with the multiplicand bits: if the multiplier bit is 0, the partial product is 0; if it is 1, the partial product equals the multiplicand. This is repeated for every multiplier bit, so the number of partial products equals the width of the multiplier. To obtain the final product, the elements in each column are added from right to left using binary addition [7]. The sum bit of each column is stored as one bit of the product, any carries are passed on to the next column, and the operation is repeated for each remaining column.
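The procedure can be sketched in Python for unsigned operands. Each partial product is the multiplicand ANDed with one multiplier bit and shifted left by that bit's position; the column-by-column addition with carries is modeled here by Python's integer `sum` rather than by a hardware adder. The function names are hypothetical.

```python
def partial_products(multiplicand: int, multiplier: int, n: int) -> list[int]:
    """Return one shifted partial product per multiplier bit (n rows for an n-bit multiplier)."""
    rows = []
    for i in range(n):
        bit = (multiplier >> i) & 1
        row = multiplicand if bit else 0   # bit 0 -> all zeros, bit 1 -> copy of multiplicand
        rows.append(row << i)              # row i is shifted i places to the left
    return rows


def multiply_unsigned(a: int, b: int, n: int) -> int:
    """n-bit x n-bit unsigned multiplication; the result fits in 2n bits."""
    mask = (1 << n) - 1
    a, b = a & mask, b & mask
    return sum(partial_products(a, b, n))


if __name__ == "__main__":
    for row in partial_products(0b1011, 0b1101, 4):
        print(f"{row:08b}")
    print(multiply_unsigned(0b1011, 0b1101, 4))  # 11 x 13 = 143
```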
3. Truncated Multiplier

Truncated multiplication is a technique in which only the most significant columns of the multiplication matrix are used, which reduces the area requirement. In truncation, the least significant columns of the matrix are simply not formed. 'T' denotes the degree of truncation, and the T least significant bits of the product are always '0'. The underlying algorithm is the same as for non-truncated multiplication, regardless of the truncation degree. The effect is illustrated in Figure 3, where the truncated columns of the partial product matrix are not formed; truncation degrees of T=8 and T=12 are applied, and the columns to the right of the maroon vertical line are missing. The implementation of the truncated multiplier extends the bit width of the multiplier operands, and increasing the bit width increases the complexity of the multiplication. Following the 4-bit truncated multiplier, the 8-bit and 12-bit operations are implemented with the three operations of the truncated multiplier, deletion, truncation and rounding [3], as shown in Figure 3.1. Together, these three processes remove the unnecessary partial product bits (PPBs) in the truncated multiplier.

![Figure 3.1. 8x8 Bit Binary Multiplication with Truncation Degree T=8](image)
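The effect of the truncation degree T can be reproduced with the bit-level sketch below, which forms only the partial product bits whose column index i + j is at least T. It assumes unsigned operands and shows plain column truncation only; the compensation and rounding steps that make a faithfully rounded truncated multiplier [3] are not modeled. The function name is hypothetical.

```python
def truncated_multiply(a: int, b: int, n: int, t: int) -> int:
    """Multiply two n-bit unsigned numbers, forming only partial product bits in columns >= t."""
    total = 0
    for i in range(n):                  # multiplier bit b_i (row i)
        if not (b >> i) & 1:
            continue
        for j in range(n):              # multiplicand bit a_j
            column = i + j              # weight 2**column of the bit a_j AND b_i
            if column >= t and (a >> j) & 1:
                total += 1 << column
    return total


if __name__ == "__main__":
    a, b, n = 0xB7, 0x5D, 8
    exact = a * b
    for t in (0, 4, 8):
        approx = truncated_multiply(a, b, n, t)
        # the t least significant product bits are always zero
        print(f"T={t}: exact={exact}, truncated={approx}, error={exact - approx}")
```

Because every omitted bit is non-negative, plain truncation always underestimates the product. For T ≤ n the error is at most the total weight of the omitted columns, (T - 1)·2^T + 1, which is why truncated designs add a compensation term before rounding.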
3.1. Deletion

In the truncated multiplier, the multiplication process starts with the deletion operation. More than half of the partial product bits are removed, and the remaining bits form the partial products used in the rest of the process. This is the main criterion of deletion.

3.2. Truncation

Truncation is a method in which the least significant columns of the partial product matrix are not formed. 'T' defines the degree of truncation, and the T least significant bits (LSBs) of the product are always 0. The algorithm for fixed-width multiplication is the same as for non-fixed-width multiplication, regardless of the truncation degree. In the FIR filter, zero-order non-uniform coefficient quantization is used to minimize cost and area [4].

3.3. Rounding

Conventionally, an n-bit multiplicand and an n-bit multiplier produce a 2n-bit product. Sometimes only an n-bit output is desired, to reduce the number of stored bits. Consider the instance of an 8x8-bit multiplier in Figure 3.2.

![Figure 3.2. Partial Product Selection Logic for 8x8 Bit Multiplication](image.png)

Now consider an instance of a 12x12-bit multiplier, shown in Figure 3.3. Truncated multiplication provides an efficient method for reducing the power dissipation and area of a rounded parallel multiplier. At such large bit widths, implementations focus mainly on multiplier performance across several levels of complexity. With the truncated method, area is reduced at any bit width, and the cost-dependent parameters, power and delay, are reduced considerably as well.
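The difference between simply dropping the lower half of a 2n-bit product and rounding it to n bits can be seen in the sketch below. It assumes unsigned operands and computes the full product first, so it illustrates the arithmetic only, not the area savings of a truncated multiplier; the function name is hypothetical.

```python
def upper_n_bits(product: int, n: int, round_to_nearest: bool) -> int:
    """Keep the upper n bits of a 2n-bit unsigned product, by truncation or rounding to nearest."""
    if round_to_nearest:
        product += 1 << (n - 1)   # add half the weight of the kept LSB before discarding
    return product >> n           # discard the n least significant bits


if __name__ == "__main__":
    p = 201 * 201                          # 8x8-bit product: 40401
    print(upper_n_bits(p, 8, False))       # 157 (truncated)
    print(upper_n_bits(p, 8, True))        # 158 (rounded to nearest)
```

For unsigned n-bit operands the rounded value always fits in n bits, because the largest product, (2^n - 1)^2, plus 2^(n-1) is still below 2^(2n).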
4. Wallace Tree Multiplier

The Wallace tree multiplier is used to reduce the number of partial products that must be added into the final intermediate result. Its basic operation is the multiplication of two unsigned integers. The Wallace tree is an efficient hardware implementation of a digital circuit that multiplies two integers; it was devised by the Australian computer scientist Chris Wallace in 1964. A Wallace tree multiplier works in three stages.

4.1. Partial Product Generation Stage

The first step of a binary multiplier is partial product generation. The partial products depend on the multiplier value: if a multiplier bit is '0' (zero), the corresponding partial product row is also '0' (zero); if it is '1' (one), the row equals the multiplicand. From the second multiplier bit onward, each partial product row is shifted one position to the left, and in signed multiplication the sign bit is also extended to the left. A conventional multiplier uses partial product generators, which consist of a series of logic AND gates as shown in Figure 4.1. Since the main operation in multiplying two numbers is the addition of partial products, the performance and speed of the multiplier depend on the adder that forms its core. To achieve higher performance, the multiplier must be pipelined.
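The partial product generation stage can be sketched with one AND operation per bit pair. This sketch assumes unsigned operands (sign extension is not modeled) and, instead of shifted rows, groups each bit by its column weight i + j, which is the form the reduction stage operates on. The function names are hypothetical.

```python
def and_gate_matrix(a: int, b: int, n: int) -> list[list[int]]:
    """Return 2n columns; column k holds every partial product bit a_j AND b_i with i + j == k."""
    columns: list[list[int]] = [[] for _ in range(2 * n)]
    for i in range(n):                         # multiplier bit b_i selects row i
        b_i = (b >> i) & 1
        for j in range(n):                     # multiplicand bit a_j
            a_j = (a >> j) & 1
            columns[i + j].append(a_j & b_i)   # one 2-input AND gate per bit
    return columns


def column_value(columns: list[list[int]]) -> int:
    """Integer value represented by a column-grouped bit matrix."""
    return sum(sum(bits) << k for k, bits in enumerate(columns))


if __name__ == "__main__":
    cols = and_gate_matrix(0b1011, 0b1101, 4)
    print([len(c) for c in cols])   # column heights: [1, 2, 3, 4, 3, 2, 1, 0]
    print(column_value(cols))       # 143
```

An n x n multiplier needs n² AND gates, and the tallest column holds n bits; it is this column height that the reduction stage must bring down to two.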

4.2. Partial Product Reduction Stage

The design analysis begins with the elementary multiplication algorithm of the Wallace tree multiplier. The algorithm for 8-bit x 8-bit multiplication by a Wallace tree multiplier is shown in Figure 4.1; the multiplication is carried out in 5 stages. Each stage uses 1-bit half adders and 1-bit full adders, with full adders denoted by circles. The partial products are reduced by half adders and full adders combined into a carry-save adder (CSA), and in the next step the remaining two rows are added by a fast carry-propagate adder. The schematic of the conventional 8-bit x 8-bit multiplier uses a ripple-carry adder (RCA) for this addition. The high-speed Wallace multiplier is designed by following this algorithm, and the block diagram of the conventional high-speed 8-bit x 8-bit Wallace tree multiplier is shown in Figure 4.1. The main aim of the proposed architecture is to reduce overall latency. Wallace tree and Dadda are two reduction techniques discussed in [8].
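The carry-save principle behind this stage can be shown at word level: one row of full adders turns three operands into a sum word and a carry word whose total equals the original sum, with no carry propagation. The function name is hypothetical, and the sketch is behavioral rather than the paper's gate-level circuit.

```python
def carry_save_add(x: int, y: int, z: int) -> tuple[int, int]:
    """Reduce three unsigned words to a sum word and a carry word (one row of full adders).

    Each bit position is an independent full adder: the sum bit is x ^ y ^ z and the
    carry bit is the majority of x, y, z. Carries are shifted left one place because
    they have twice the weight of the sum bit, and no carry propagates within the word.
    """
    sum_word = x ^ y ^ z
    carry_word = ((x & y) | (x & z) | (y & z)) << 1
    return sum_word, carry_word


if __name__ == "__main__":
    s, c = carry_save_add(13, 7, 9)
    print(s, c, s + c)   # the two outputs always add up to 13 + 7 + 9 = 29
```

Applying this 3:2 step repeatedly to the partial product rows leaves two rows, which the final carry-propagate adder then sums.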

4.3. Partial Product Addition Stage

In this stage of the Wallace tree method, ripple-carry adders (RCAs) perform the addition of the remaining rows. The Wallace method processes a multiplication in three steps:

1. Construction of the bit products.
2. Reduction of the whole bit-product matrix with layers of conventional (full and half) adders until only two vectors, carry and sum, remain.
3. Addition of the remaining two rows with a fast carry-propagate adder to produce the product.
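The three steps can be modeled behaviorally in Python. This sketch assumes unsigned operands and the classic Wallace grouping rule in every layer (a full adder on each group of three bits in a column, a half adder on a leftover pair, a single leftover bit passed down); the final carry-propagate addition of step 3 is modeled by Python's integer `+`. The function names are hypothetical.

```python
def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """1-bit full adder: (sum, carry)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def half_adder(a: int, b: int) -> tuple[int, int]:
    """1-bit half adder: (sum, carry)."""
    return a ^ b, a & b


def wallace_multiply(a: int, b: int, n: int) -> tuple[int, int]:
    """Unsigned n x n Wallace tree multiplication; returns (product, number of reduction layers)."""
    # Step 1: bit products a_j AND b_i, grouped by column weight i + j.
    columns = [[] for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            columns[i + j].append(((a >> j) & 1) & ((b >> i) & 1))

    # Step 2: reduce every column in parallel until no column holds more than two bits.
    layers = 0
    while max(len(col) for col in columns) > 2:
        reduced = [[] for _ in range(len(columns) + 1)]
        for k, bits in enumerate(columns):
            i = 0
            while len(bits) - i >= 3:                # groups of three -> full adder
                s, c = full_adder(bits[i], bits[i + 1], bits[i + 2])
                reduced[k].append(s)
                reduced[k + 1].append(c)
                i += 3
            if len(bits) - i == 2:                   # leftover pair -> half adder
                s, c = half_adder(bits[i], bits[i + 1])
                reduced[k].append(s)
                reduced[k + 1].append(c)
                i += 2
            reduced[k].extend(bits[i:])              # a single leftover bit passes down
        columns = reduced
        layers += 1

    # Step 3: form the two remaining rows and add them (Python + stands in for the fast CPA).
    row0 = sum(col[0] << k for k, col in enumerate(columns) if len(col) > 0)
    row1 = sum(col[1] << k for k, col in enumerate(columns) if len(col) > 1)
    return row0 + row1, layers


if __name__ == "__main__":
    product, layers = wallace_multiply(201, 201, 8)
    print(product, layers)   # product is 40401
```

Every full or half adder preserves the weighted sum of its inputs, so the two rows left after step 2 always add up to the exact product.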

5. Dadda Multiplier

The Dadda multiplier was designed by the computer scientist Luigi Dadda in 1965. It is a refined form of parallel multiplier [6] that is faster and uses fewer gates. Among the various schemes used by parallel multipliers, the Dadda scheme fundamentally minimizes the number of adder stages required to sum the partial products. Full adders and half adders reduce the number of rows in the matrix, and hence the number of bits, at each summation stage. The Wallace tree multiplier is more expensive than the Dadda multiplier. In this paper, the Dadda multiplier is designed and analyzed by considering different methods that use full adders in different logic styles.
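Dadda's rule of reducing each column only as far as the next target height can be sketched as follows. The sketch assumes unsigned operands and the standard Dadda height sequence 2, 3, 4, 6, 9, 13, ... (each term is the floor of 1.5 times the previous one); the final two-row addition is modeled by Python's integer `+`. The function names are hypothetical, and the adder counts it reports describe this model, not the paper's circuit.

```python
def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def half_adder(a: int, b: int) -> tuple[int, int]:
    return a ^ b, a & b


def dadda_targets(max_height: int) -> list[int]:
    """Dadda heights d1 = 2, d(j+1) = floor(1.5 * d(j)) that lie below max_height, largest first."""
    d = [2]
    while d[-1] < max_height:
        d.append(d[-1] * 3 // 2)
    return [h for h in reversed(d) if h < max_height]


def dadda_multiply(a: int, b: int, n: int) -> tuple[int, int, int]:
    """Unsigned n x n Dadda multiplication; returns (product, full adders used, half adders used)."""
    columns = [[] for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            columns[i + j].append(((a >> j) & 1) & ((b >> i) & 1))

    fa = ha = 0
    for target in dadda_targets(max(len(col) for col in columns)):
        reduced = [[] for _ in range(len(columns) + 1)]
        for k, col in enumerate(columns):
            bits = list(col)
            # reduced[k] already holds the carries produced by column k-1 in this stage
            while len(bits) + len(reduced[k]) > target:
                if len(bits) + len(reduced[k]) - target >= 2 and len(bits) >= 3:
                    s, c = full_adder(bits[0], bits[1], bits[2])   # column height drops by 2
                    bits = bits[3:]
                    fa += 1
                else:
                    s, c = half_adder(bits[0], bits[1])            # column height drops by 1
                    bits = bits[2:]
                    ha += 1
                reduced[k].append(s)
                reduced[k + 1].append(c)
            reduced[k].extend(bits)   # remaining bits pass to the next stage unchanged
        columns = reduced

    # Final stage: at most two bits per column; Python + stands in for the carry-propagate adder.
    row0 = sum(col[0] << k for k, col in enumerate(columns) if len(col) > 0)
    row1 = sum(col[1] << k for k, col in enumerate(columns) if len(col) > 1)
    return row0 + row1, fa, ha


if __name__ == "__main__":
    print(dadda_multiply(11, 13, 4))   # (143, 3, 3)
```

For a 4 x 4 multiplication this model uses three full adders and three half adders; the adder counts depend only on the column heights, not on the operand values.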

5.1. Implementation of Dadda Multiplier

The Dadda multiplier algorithm is based on a matrix form, as represented in Figure 5.1. The arrangement of the partial product bits in the first stage is demonstrated in Figure 9, which shows how the Dadda multiplier works.
5.2. Steps Involved in the Dadda Multiplier Algorithm

In Figure 5.2, the wires carry different weights depending on the positions of the multiplied bits. To reduce the number of partial products, two layers of full adders are used. The wires are then grouped into two numbers, which are added with a conventional adder. A ripple-carry adder (RCA) performs this multi-bit addition by chaining the carry-out of each stage to the carry-in of the next: several full adders are connected so that the Cin of each full adder is the Cout of the previous one. Because each carry bit "ripples" to the next full adder, the adder is called a ripple-carry adder, and the Dadda multiplier architecture uses this RCA procedure. Data bits are taken three wires at a time and added by full adders, and the carry of each adder is added with the next two data bits in the same stage. At the final stage, the same ripple-carry method is applied, and the product terms p1 to p8 are obtained as shown in Figure 5.3.
Figure 5.3. 12 x 12 DADDA Process of Multiplication
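The ripple-carry adder described above can be modeled bit by bit: each full adder's Cout feeds the next full adder's Cin, so in the worst case a carry must travel through every stage and the delay grows linearly with the word width. This behavioral sketch assumes unsigned operands, and the function names are hypothetical.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """1-bit full adder: returns (sum, carry-out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(x: int, y: int, width: int, cin: int = 0) -> tuple[int, int]:
    """Add two width-bit unsigned numbers with a chain of full adders.

    Returns (sum modulo 2**width, final carry-out).
    """
    carry = cin
    result = 0
    for i in range(width):   # LSB first: each carry ripples toward the MSB
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry


if __name__ == "__main__":
    print(ripple_carry_add(0b1011, 0b0110, 4))   # 11 + 6 = 17 -> (1, 1)
```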

Figure 5.3 indicates that the partial products are processed in the same way as when reducing the number of stored bits. From the second multiplier bit onward, each partial product is shifted one position to the left.

6. Experimental Results

Table 6.1. Performance Comparison for Normal FIR and Truncated Multiplier

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Normal FIR Multiplier</th>
<th>Truncated Multiplier</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>4 bit</td>
<td>8 bit</td>
</tr>
<tr>
<td>Power (mW)</td>
<td>0.26</td>
<td>1.11</td>
</tr>
<tr>
<td>Delay (ns)</td>
<td>11.409</td>
<td>4.040</td>
</tr>
<tr>
<td>No. of LUTs</td>
<td>31</td>
<td>128</td>
</tr>
<tr>
<td>Memory (KB)</td>
<td>238284</td>
<td>254668</td>
</tr>
</tbody>
</table>

Table 6.2. Performance Comparison for Wallace Tree and Dadda Multiplier

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Wallace Tree Multiplier</th>
<th>Dadda Multiplier</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>4 bit</td>
<td>8 bit</td>
</tr>
<tr>
<td>Power (mW)</td>
<td>0.26</td>
<td>0.31</td>
</tr>
<tr>
<td>Delay (ns)</td>
<td>11.018</td>
<td>4.040</td>
</tr>
<tr>
<td>No. of LUTs</td>
<td>31</td>
<td>36</td>
</tr>
<tr>
<td>Memory (KB)</td>
<td>241292</td>
<td>260428</td>
</tr>
</tbody>
</table>

Tables 6.1 and 6.2 show the implementation results obtained with the Xilinx ISE simulator. The experimental results show that the delay requirement of each of the Wallace tree, Dadda, truncated and normal FIR multipliers is the same for 8-bit as for 12-bit operation, while their speeds of execution differ. Likewise, the power consumption of the Dadda and Wallace tree multipliers is better than that of the truncated and normal FIR multipliers.

7. Integrated ALU

Figure 7.1. RTL Schematic of Integrated ALU

Figure 7.1 shows the RTL schematic of the proposed 12-bit ALU with the MCMAT-based Dadda multiplier. Compared with the normal ALU and the FIR filter, the proposed 12-bit integrated ALU gives better performance, and its execution is faster than that of the normal ALU and the FIR filter.

Table 7.1. Performance Comparison for Normal ALU, FIR Filter and 12 bit Integrated ALU with Wallace Tree and Dadda Multiplier

<table>
<thead>
<tr>
<th>Parameter</th>
<th>12 bit Normal ALU</th>
<th>12 bit FIR Filter</th>
<th>12 bit Integrated ALU (12 bit ALU with Dadda multiplier)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Power (mW)</td>
<td>1.13</td>
<td>2.93</td>
<td>6.43</td>
</tr>
<tr>
<td>Delay (ns)</td>
<td>17.859</td>
<td>4.040</td>
<td>17.859</td>
</tr>
<tr>
<td>No. of LUTs</td>
<td>130</td>
<td>337</td>
<td>740</td>
</tr>
<tr>
<td>Memory (KB)</td>
<td>252136</td>
<td>259372</td>
<td>382764</td>
</tr>
</tbody>
</table>
Table 7.1 shows the performance comparison of the proposed 12-bit integrated ALU design. The 12-bit ALU with the MCMAT-based Wallace tree multiplier gives better results than the 12-bit ALU with the MCMAT-based Dadda multiplier. The RTL schematic of the 12-bit ALU with the Wallace tree multiplier is similar to that of the 12-bit ALU with the Dadda multiplier.

8. Conclusion

In this paper, the direct form of the digital FIR filter is recommended, because it reduces area by decreasing the number of elements such as structural adders and storage elements. Designs for 8-bit and 12-bit multipliers are proposed, and parameters such as power, area and delay are compared for the Dadda, normal and Wallace tree multipliers. It is observed that the truncated multiplier is not very efficient in terms of power, whereas the Dadda and Wallace tree multipliers are efficient in terms of both delay and power. A 12-bit Arithmetic Logic Unit is designed with the MCMAT-based 12-bit FIR filter using both Wallace tree and Dadda multipliers. The results show that the Wallace tree multiplier is more efficient than the Dadda multiplier, because both its speed and its power are better. Integrating the ALU with the MCMAT-based digital FIR filter design can improve overall speed and area in real-time applications. In future, the design can also be implemented for larger bit widths.

References

Authors

Jami Venkata Suman received his Bachelor of Engineering in Electronics and Communication Engineering from Tontadarya College of Engineering, Gadag, under Visvesvaraya Technological University, Belgaum, Karnataka state, India in 2004 and his Master of Technology in VLSI System Design from Annamacharya Institute of Technology and Sciences, Rajampet, under JNTUH, Hyderabad, Andhra Pradesh state, India in 2008. He is currently working as an Assistant Professor in the Department of Electronics and Communication Engineering at GMR Institute of Technology, Rajam. His areas of interest include VLSI and Radar Signal Processing.

D. G. Jignash is pursuing a Master of Technology in VLSI & Embedded Systems at GMR Institute of Technology, Rajam, and received his Bachelor of Technology in Electronics and Communication Engineering from Khader Memorial College of Engineering and Technology, under JNTUH, in 2012. His areas of interest include VLSI and Embedded Systems.

B. I. Neelgar received his B.E. in Electronics and Communication Engineering from Karnataka University, Dharawad, Karnataka State, India in 1984 and his M. Tech in Digital Electronics and Advanced Communication from Karnataka Regional Engineering College (NITK), Surathkal, Mangalore University, Karnataka State, India in 1995. He is pursuing his Ph.D. at Jawaharlal Nehru Technological University (JNTUH), Hyderabad, and is awaiting his final viva voce. He is Professor and Head of the Department of Electronics and Communication Engineering at GMR Institute of Technology, Rajam. He has 4.5 years of industry experience and 23 years of teaching experience. His areas of interest include Antennas, Antenna Arrays, Communication and VLSI.