

Dr. Endric Schubert, Univ. Ulm & Missing Link Electronics

Lisa Halder, Univ. Ulm & Xilinx Labs Ireland

#### We are

a Silicon Valley-based technology company with offices in Germany. We are a partner of leading electronic device and solution providers and have been enabling key innovators in the automotive, industrial, and test & measurement markets to build better Embedded Systems, faster.

### **Our Mission is**

To develop and market technology solutions for Embedded Systems Realization via pre-validated IP and expert application support, and to combine off-the-shelf FPGA devices with Open-Source Software for dependable, configurable Embedded System platforms.

### **Our Expertise is**

I/O connectivity and acceleration of data communication protocols, opening up FPGA technology for analog applications, and the integration and optimization of Open-Source Linux and Android software stacks on modern extensible processing architectures.



## **Introduction and Motivation**

|         | Intel i7-4770 | Xilinx Zynq 7045                |
|---------|---------------|---------------------------------|
| Compute | ~100 GFLOPS   | 5 GFLOPS (PS), 778 GFLOPS (PL)  |
| TDP     | 84 W          | <20 W (typ)                     |

SoC FPGA as (yet) another computer

SoC FPGA has 4x more compute at ¼ the power dissipation!





http://www.xilinx.com/support/documentation/sw\_manuals/xilinx2015\_4/ug1027-sdsoc-user-guide.pdf

(c) 2016 MLEcorp.com EW2016-Session18-MLE-FPGAAccellWithHLS



### **Concept of High-Level Synthesis**





# **Concept of Interface Synthesis**

### **RTL Design Flow:**

- Use of pre-implemented IP cores
- Manual integration of the IP core
- Manual implementation of the SW and HW that control the communication

### **HLS Design Flow:**

- Generation of HW blocks from source code by High-Level Synthesis tools
- Automatic integration of the HW block through Interface Synthesis
- Generation of the HW and SW that control the communication
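The manual control step of the RTL flow can be pictured with a small sketch: start the core through a control register, poll until it reports done, then read the result. The register map (`REG_CTRL`, `CTRL_START`, `CTRL_DONE`) and the `fake_regs` array standing in for the memory-mapped core are assumptions made for illustration, not the layout of any specific IP core. In the HLS flow, Interface Synthesis generates glue of this kind.

```c
#include <stdint.h>

/* Hypothetical register map of an accelerator core (illustration only). */
enum { REG_CTRL = 0, REG_ARG = 1, REG_RESULT = 2, REG_COUNT = 3 };
#define CTRL_START 0x1u  /* software sets this bit to start the core */
#define CTRL_DONE  0x2u  /* core sets this bit when the result is valid */

/* Stand-in for the memory-mapped register block, so the sketch runs on a host. */
static volatile uint32_t fake_regs[REG_COUNT];

/* Stand-in for the hardware: once started, it "computes" arg + 1. */
static void fake_core_step(void)
{
    if (fake_regs[REG_CTRL] & CTRL_START) {
        fake_regs[REG_RESULT] = fake_regs[REG_ARG] + 1u;
        fake_regs[REG_CTRL] = CTRL_DONE;
    }
}

/* Hand-written driver: write the argument, start, poll for done, read back. */
uint32_t run_core_manually(uint32_t arg)
{
    fake_regs[REG_ARG] = arg;
    fake_regs[REG_CTRL] = CTRL_START;
    while (!(fake_regs[REG_CTRL] & CTRL_DONE)) {
        fake_core_step();  /* real hardware would progress on its own */
    }
    fake_regs[REG_CTRL] = 0;  /* acknowledge completion */
    return fake_regs[REG_RESULT];
}
```

Calling `run_core_manually(41)` drives the fake core through one start/poll/read cycle. On real hardware the register accesses would go through a mapped physical address instead of an array.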





### **Design Example**





### Profiling

- Profiling tool: Valgrind Callgrind
- Analysis of a data transfer through SSH using Dropbear SSH
- 128-bit AES encryption
- AES encryption ("rijndael\_ecb\_encrypt") accounts for a very high share of the profile (40.37%)
- AES implementation taken from Dropbear SSH
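The 40.37% share also bounds what offloading AES alone can gain. The sketch below applies Amdahl's law, assuming the Callgrind share carries over to the runtime share (Callgrind counts instructions, so this is an approximation). The function name `amdahl_speedup` is chosen here for illustration.

```c
/* Overall speedup when a fraction p of the work is accelerated by factor s. */
double amdahl_speedup(double p, double s)
{
    return 1.0 / ((1.0 - p) + p / s);
}

/* Upper bound on the overall speedup when the fraction p takes no time at all. */
double amdahl_limit(double p)
{
    return 1.0 / (1.0 - p);
}
```

With `p = 0.4037`, a tenfold faster AES gives roughly 1.57x overall, and even an infinitely fast AES core stays below about 1.68x. This is why the cost of moving data to and from the accelerator matters as much as the accelerator itself.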





```
/* Original software interface (AES implementation from Dropbear SSH) */
#define ECB_ENC rijndael_ecb_encrypt
int ECB_ENC(const unsigned char *pt, unsigned char *ct,
    symmetric_key *skey);
```






Modified source code:

`int rijndael_ecb_encrypt_hw(const unsigned char pt[16], unsigned char ct[20], uint32_t eK)`
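The point of the modification is that every argument of the hardware function gets a size known at compile time, so the tools can work out how much data to transfer. A minimal sketch of that pattern follows: a pointer-based wrapper keeps callers unchanged and forwards to a fixed-size function. The names `demo_key`, `encrypt_block`, and `encrypt_block_hw`, and the sizes used, are assumptions for this sketch. The body is a placeholder XOR, not AES.

```c
#include <stdint.h>
#include <string.h>

#define BLOCK_BYTES 16  /* one AES block */
#define KEY_WORDS   44  /* AES-128 expanded key: 11 round keys x 4 words */

/* Hypothetical, simplified stand-in for the library's key structure. */
typedef struct { uint32_t eK[KEY_WORDS]; } demo_key;

/* Hardware candidate: every argument has a compile-time size.
 * The body is a placeholder (XOR with key bytes), NOT real AES. */
int encrypt_block_hw(const unsigned char pt[BLOCK_BYTES],
                     unsigned char ct[BLOCK_BYTES],
                     const uint32_t eK[KEY_WORDS])
{
    for (int i = 0; i < BLOCK_BYTES; i++) {
        ct[i] = pt[i] ^ (unsigned char)(eK[i / 4] >> (8 * (i % 4)));
    }
    return 0;
}

/* Software-facing wrapper: keeps the pointer-based signature for callers. */
int encrypt_block(const unsigned char *pt, unsigned char *ct,
                  const demo_key *skey)
{
    unsigned char in[BLOCK_BYTES], out[BLOCK_BYTES];
    int rc;

    memcpy(in, pt, BLOCK_BYTES);               /* fixed-size input buffer */
    rc = encrypt_block_hw(in, out, skey->eK);  /* call the HW candidate */
    memcpy(ct, out, BLOCK_BYTES);              /* copy the result back */
    return rc;
}
```

Existing call sites keep using the pointer-based function, while only the fixed-size function is handed to the tool flow.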












### Datamover

- Data movers are generated automatically by SDSoC
- The user can influence the generated data mover by modifying the dataflow, e.g. by declaring an array as contiguous memory

| SDSoC Data Mover    | Vivado IP Data Mover    | Accelerator IP Port Types    | Transfer Size    | Contiguous Memory Only   |
|---------------------|-------------------------|------------------------------|------------------|--------------------------|
| axi_lite            | processing_system7      | register                     |                  |                          |
| axi_fifo            | axi_fifo_mm_s           | bram, ap_fifo, axis          | < 300 B          |                          |
| axi_dma_simple      | axi_dma                 | bram, ap_fifo, axis          | < 8 MB           | ✓                        |
| axi_dma_sg          | axi_dma                 | bram, ap_fifo, axis          |                  |                          |
| zero_copy           | accelerator IP          | aximm master                 |                  | ✓                        |
| axi_dma_2d          | axi_dma                 | bram                         |                  | ✓                        |
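A sketch of how a data mover choice might be steered from source code is shown below. The pragma spellings (`mem_attribute`, `access_pattern`, `data_mover`, `AXI_DMA_SIMPLE`) follow the SDSoC user guide as best recalled here and differ between releases, so treat them as assumptions and check the guide for the installed version. The function `add_one_hw` and the helper `run_add_one` are hypothetical. On a host compiler the pragmas are skipped and `sds_alloc` falls back to `malloc`.

```c
#include <stdint.h>
#include <stdlib.h>

#ifdef __SDSCC__
#include "sds_lib.h"            /* sds_alloc / sds_free under SDSoC */
#else
#define sds_alloc(n) malloc(n)  /* host fallback so the sketch builds anywhere */
#define sds_free(p)  free(p)
#endif

#define N 1024

/* The pragmas sit directly before the hardware function declaration
 * (assumed spelling, see the text above). */
#ifdef __SDSCC__
#pragma SDS data mem_attribute(in:PHYSICAL_CONTIGUOUS, out:PHYSICAL_CONTIGUOUS)
#pragma SDS data access_pattern(in:SEQUENTIAL, out:SEQUENTIAL)
#pragma SDS data data_mover(in:AXI_DMA_SIMPLE, out:AXI_DMA_SIMPLE)
#endif
void add_one_hw(const uint8_t in[N], uint8_t out[N]);

/* Hypothetical hardware function: streams through both arrays once. */
void add_one_hw(const uint8_t in[N], uint8_t out[N])
{
    for (int i = 0; i < N; i++) {
        out[i] = (uint8_t)(in[i] + 1);
    }
}

/* Allocate buffers, run the function, and return the number of wrong
 * bytes, or -1 if an allocation failed. */
int run_add_one(void)
{
    uint8_t *in = sds_alloc(N);
    uint8_t *out = sds_alloc(N);
    int errors = 0;

    if (in && out) {
        for (int i = 0; i < N; i++) in[i] = (uint8_t)i;
        add_one_hw(in, out);
        for (int i = 0; i < N; i++) errors += (out[i] != (uint8_t)(i + 1));
    } else {
        errors = -1;
    }
    if (in) sds_free(in);
    if (out) sds_free(out);
    return errors;
}
```

Allocating the buffers with `sds_alloc` is what makes the contiguous-memory declaration true at run time. Buffers from plain `malloc` on the target would not satisfy it.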



# The simple (and inefficient) Way

Encryption of a 16-byte array, which corresponds to one call of rijndael\_ecb\_encrypt



### HW-function (AES) takes 0.44 µs for execution




















# Analysis of axi\_dma\_simple

[Figure: waveform capture for the axi\_dma\_simple analysis, showing the ARADDR, AWADDR, RDATA, WDATA, RVALID, and WVALID signals of the ACP port (axi\_interconnect\_S\_AXI\_ACP\_M00\_AXI) and of the general-purpose port (ps7\_M\_AXI\_GP0).]

| Phase                          | Clock cycles at 142 MHz | Time    |
|--------------------------------|-------------------------|---------|
| Initialization                 | 614                     | 4.32 µs |
| Read access                    | 31                      | 0.22 µs |
| HW-function                    | 166                     | 1.17 µs |
| Write access                   | 7                       | 0.05 µs |
| Waiting time                   | 82                      | 0.58 µs |
| Reading of the status register | 218                     | 1.53 µs |
| Sum                            | 1118                    | 7.87 µs |
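The arithmetic behind these figures can be checked with a few lines of code. The sketch assumes the counts above are hardware clock cycles at 142 MHz, which matches the listed times. The function and enum names are chosen here for illustration.

```c
#define CLOCK_MHZ 142.0  /* clock frequency from the measurement */

enum phase {
    INIT, READ_ACCESS, HW_FUNCTION, WRITE_ACCESS, WAITING, STATUS_READ,
    PHASE_COUNT
};

/* Cycle counts per phase, copied from the measurement above. */
static const unsigned phase_cycles[PHASE_COUNT] = { 614, 31, 166, 7, 82, 218 };

/* Total cycles for one accelerated call. */
unsigned total_cycles(void)
{
    unsigned sum = 0;
    for (int i = 0; i < PHASE_COUNT; i++) {
        sum += phase_cycles[i];
    }
    return sum;
}

/* Cycles divided by MHz gives microseconds. */
double cycles_to_us(unsigned cycles)
{
    return cycles / CLOCK_MHZ;
}

/* Fraction of the call spent inside the hardware function itself. */
double hw_function_share(void)
{
    return (double)phase_cycles[HW_FUNCTION] / total_cycles();
}
```

The hardware function accounts for about 15% of the 1118 cycles. The remaining 85% is initialization, data transfer, waiting, and status polling, which is the overhead a single 16-byte call cannot amortize.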



## **Dataflow optimization**





### Conclusion

- Designing FPGA accelerators for (legacy) software
  - HLS and Interface Synthesis automate the migration from SW to HW
  - Xilinx SDSoC is a new and efficient methodology for Programmable SoCs
  - Not all SW constructs are fully supported (yet), but efficient workarounds exist
  - ==> Development time reduced from weeks to days
- Please visit us at the show:



MLE – Hall 2 Booth 2-421

