#### **ICCS 2023 & ICITES 2023**



#### A Resource-efficient FIR Filter Design Based on an RAG Improved Algorithm

#### Mengwei Hu, Zhengxiong Li, Xianyang Jiang\* Wuhan University

## **Presenter: Zhengxiong Li**



#### Outline

- 1. Motivation
- 2. Methodology
- 3. Synthesis Results
- 4. Conclusion

# 1. Motivation

- Arises from a real-world RFIC calibration project
- The filter is used to calculate the leakage energy

#### Demands:

- High-speed
- High-performance
- Limited resources



## 1. Motivation

Basic information about FIR filters:

Transfer function of an FIR filter:  $H(z) = \sum_{n=0}^{N-1} h(n) z^{-n}$ 

Difference equation of an FIR filter:  $y(n) = \sum_{k=0}^{N-1} h(k)x(n-k)$ 

Unit impulse response of an FIR filter:  $h(n) = \sum_{i=0}^{N-1} h(i)\delta(n-i)$ 
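
The difference equation can be checked numerically with a minimal Python sketch. The function name `fir_filter` is hypothetical, and the sketch assumes $x(n) = 0$ for $n < 0$; it is a software reference model, not the hardware architecture discussed in this talk.

```python
from typing import List


def fir_filter(h: List[float], x: List[float]) -> List[float]:
    """Direct-form FIR: y(n) = sum_{k=0}^{N-1} h(k) * x(n - k).

    Assumption: samples before the start of x are treated as zero.
    """
    y = []
    for n in range(len(x)):
        acc = 0
        for k, hk in enumerate(h):
            if n - k >= 0:  # skip taps that would read x before its start
                acc += hk * x[n - k]
        y.append(acc)
    return y


# Feeding a unit impulse returns the coefficients themselves,
# which matches the unit impulse response formula.
impulse_response = fir_filter([1, 2, 3], [1, 0, 0, 0])
step_response = fir_filter([1, 2, 3], [1, 1, 1, 1])
```

Each output sample needs $N$ multiplications and $N-1$ additions, which is why these two operations dominate resource usage.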

## 1. Motivation

Some existing FIR filter architectures:







Most of the resources are consumed by additions and multiplications.

The maximum clock frequency is limited by the critical path.





#### 2. Methodology

**Step 0:** Separate the coefficients into different sets.

- Set $coeff$  $\longrightarrow$  all of the filter's coefficients
- Set $coeff_r$  $\longrightarrow$  the smaller coefficients
- Set $coeff_s$  $\longrightarrow$  the larger coefficients
- Set $cost_n$  $\longrightarrow$  coefficients with adder depth $n$ ($n = 1, 2, 3, 4$)
- Set $cost_o$  $\longrightarrow$  coefficients with any other adder depth



**Step 1:** Take the absolute values of all coefficients and store them in set $coeff$;

**Step 2:** Remove duplicate coefficients and coefficients equal to a power of two ($2^n$); the number of remaining coefficients is denoted $N$;

**Step 3:** Deposit the smaller coefficients into set $coeff_r$; the number deposited is $\frac{N}{2}$ if $N$ is even or $\frac{N-1}{2}$ if $N$ is odd;

**Step 4:** Deposit the remaining larger coefficients into set $coeff_s$;

**Step 5:** Divide each even coefficient in set $coeff_r$ by the appropriate power of two $2^n$ to obtain its odd base;

**Step 6:** Look up the table to get the adder depth of each base, store the corresponding coefficients in set $cost_n$, and store the coefficients that the table cannot categorize in set $cost_o$;

**Step 7:** Realize the coefficients in set $cost_1$;

**Step 8:** Check the sums/differences of the coefficients in all realized cost sets, realize the coefficients in higher-cost sets from those sums/differences and the already realized coefficients, and finally realize the coefficients in set $cost_o$;

**Step 9:** Realize the coefficients in set $coeff_s$ according to the hardware structure of a systolic FIR filter with symmetric coefficients.
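
Steps 1 to 6 can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the names `odd_base`, `csd_adder_estimate`, and `partition_coefficients` are hypothetical, zero-valued coefficients are dropped (the slides do not mention them), and the lookup table of Step 6 is not reproduced here. As a stand-in for that table, the sketch uses the number of nonzero canonical signed digit (CSD) terms minus one, which is only an upper bound on the number of adders and may differ from the table's values.

```python
from typing import Dict, List, Tuple


def odd_base(c: int) -> int:
    """Step 5: strip factors of two (plain shifts in hardware) until c is odd."""
    while c % 2 == 0:
        c //= 2
    return c


def csd_adder_estimate(c: int) -> int:
    """ASSUMPTION: stand-in for the Step 6 table (nonzero CSD digits minus 1)."""
    nonzero = 0
    while c:
        if c & 1:
            digit = 2 - (c & 3)  # +1 if c % 4 == 1, -1 if c % 4 == 3
            c -= digit
            nonzero += 1
        c >>= 1
    return nonzero - 1


def partition_coefficients(
    coeffs: List[int], max_table_cost: int = 4
) -> Tuple[List[int], List[int], Dict[int, List[int]], List[int]]:
    # Step 1: absolute values.
    magnitudes = [abs(c) for c in coeffs]
    # Step 2: drop duplicates and powers of two (zeros dropped by assumption).
    remaining = sorted({m for m in magnitudes if m != 0 and m & (m - 1) != 0})
    # Steps 3 and 4: lower half (N/2 or (N-1)/2 values) vs. the rest.
    half = len(remaining) // 2
    coeff_r, coeff_s = remaining[:half], remaining[half:]
    # Steps 5 and 6: classify each small coefficient by the cost of its base.
    cost: Dict[int, List[int]] = {n: [] for n in range(1, max_table_cost + 1)}
    cost_o: List[int] = []
    for c in coeff_r:
        n = csd_adder_estimate(odd_base(c))
        if 1 <= n <= max_table_cost:
            cost[n].append(c)
        else:
            cost_o.append(c)
    return coeff_r, coeff_s, cost, cost_o


# Made-up example coefficients, chosen only to exercise every step.
coeff_r, coeff_s, cost, cost_o = partition_coefficients(
    [-3, 8, 22, 45, 7, -7, 100, 231, 64, 3, 11]
)
```

With this made-up input, seven distinct values survive Step 2, so three go to `coeff_r` and four to `coeff_s`.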



#### **Concise summary:**

For the smaller coefficients, use shift-and-add.

For the larger coefficients, use multipliers.
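
The shift-and-add idea for a small constant can be sketched as below. The helper names `csd_digits` and `shift_add_multiply` are hypothetical, and the canonical signed digit (CSD) decomposition is one common way to pick the shifts; the adder graph produced by the RAG-based flow may share terms between coefficients and therefore look different.

```python
from typing import List, Tuple


def csd_digits(c: int) -> List[Tuple[int, int]]:
    """Return (shift, sign) pairs such that c == sum(sign * 2**shift).

    Assumes c is a positive integer.
    """
    digits = []
    shift = 0
    while c:
        if c & 1:
            sign = 2 - (c & 3)  # +1 if c % 4 == 1, -1 if c % 4 == 3
            digits.append((shift, sign))
            c -= sign
        c >>= 1
        shift += 1
    return digits


def shift_add_multiply(x: int, c: int) -> int:
    """Compute x * c using only shifts, additions, and subtractions."""
    acc = 0
    for shift, sign in csd_digits(c):
        term = x << shift  # a shift costs only wiring in hardware
        acc = acc + term if sign > 0 else acc - term
    return acc


# 7 = 8 - 1, so multiplying by 7 needs one subtraction instead of a multiplier.
product = shift_add_multiply(5, 7)
```

Small coefficients need few nonzero digits, so they map cheaply onto LUT-based adders, while large coefficients with many nonzero digits are better served by a DSP multiplier.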

Aim:

Balance DSP consumption against LUT consumption to achieve a resource-efficient design.



#### 3. Synthesis Results

| Resources / Performance | Pulsed Fully Parallel | RAG Algorithm | RAG Improved Algorithm |
|-------------------------|-----------------------|---------------|------------------------|
| LUT                     | 574                   | 4956          | 934                    |
| FF                      | 1286                  | 528           | 904                    |
| DSP                     | 4                     | 0             | 2                      |
| Power(W)                | 32.8                  | 234.7         | 38.6                   |
| Temp(°C)                | 70.8                  | 125.0         | 79                     |

Table 1: 64<sup>th</sup> order FIR filter
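
As a reading aid for Table 1, the relative savings can be worked out with a few lines of Python. The numbers are copied from the table; the helper `percent_reduction` and the variable names are hypothetical.

```python
# Values copied from Table 1 (64th-order FIR filter).
TABLE_1 = {
    "Pulsed Fully Parallel": {"LUT": 574, "FF": 1286, "DSP": 4},
    "RAG Algorithm": {"LUT": 4956, "FF": 528, "DSP": 0},
    "RAG Improved Algorithm": {"LUT": 934, "FF": 904, "DSP": 2},
}


def percent_reduction(new: float, old: float) -> float:
    """Percentage by which `new` is smaller than `old`."""
    return 100.0 * (old - new) / old


improved = TABLE_1["RAG Improved Algorithm"]
lut_saving_vs_rag = percent_reduction(
    improved["LUT"], TABLE_1["RAG Algorithm"]["LUT"]
)
dsp_saving_vs_parallel = percent_reduction(
    improved["DSP"], TABLE_1["Pulsed Fully Parallel"]["DSP"]
)
ff_saving_vs_parallel = percent_reduction(
    improved["FF"], TABLE_1["Pulsed Fully Parallel"]["FF"]
)

print(f"LUT saving vs. RAG: {lut_saving_vs_rag:.1f}%")
print(f"DSP saving vs. pulsed fully parallel: {dsp_saving_vs_parallel:.1f}%")
print(f"FF saving vs. pulsed fully parallel: {ff_saving_vs_parallel:.1f}%")
```

For the 64th-order filter, the improved algorithm uses roughly 81% fewer LUTs than the plain RAG algorithm and half the DSP blocks of the pulsed fully parallel design.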



#### 3. Synthesis Results

| Resources / Performance | Pulsed Fully Parallel | RAG Algorithm | RAG Improved Algorithm |
|-------------------------|-----------------------|---------------|------------------------|
| LUT                     | 358                   | 695           | 555                    |
| FF                      | 679                   | 287           | 538                    |
| DSP                     | 4                     | 0             | 2                      |
| Power(W)                | 21.34                 | 24.52         | 19.75                  |
| Temp(°C)                | 54.8                  | 59.3          | 52.6                   |

Table 2: 32<sup>nd</sup> order FIR filter

| Resources / Performance | Pulsed Fully Parallel | RAG Algorithm | RAG Improved Algorithm |
|-------------------------|-----------------------|---------------|------------------------|
| LUT                     | 141                   | 212           | 185                    |
| FF                      | 203                   | 120           | 222                    |
| DSP                     | 4                     | 0             | 2                      |
| Power(W)                | 41.762                | 33.673        | 36.75                  |
| Temp(°C)                | 83.4                  | 72.1          | 76.4                   |

Table 3: 8<sup>th</sup> order FIR filter

## 4. Conclusion



Comparison of 64th order filter hardware





# Thank you!





# If you have any questions, feel free to ask!

