# High-Level-Synthesis for FPGA Implementation of Network Protocols

Simon Lever, Univ. Ulm

Dr. Endric Schubert, Univ. Ulm / Missing Link Electronics

#### We are

a Silicon Valley-based technology company with offices in Germany. We partner with leading electronic device and solution providers and have been enabling key innovators in the automotive, industrial, and test & measurement markets to build better Embedded Systems, faster.

#### **Our Mission is**

To develop and market technology solutions for Embedded Systems Realization via pre-validated IP and expert application support, and to combine off-the-shelf FPGA devices with Open-Source Software for dependable, configurable Embedded System platforms.

#### **Our Expertise is**

I/O connectivity and acceleration of data communication protocols, additionally opening up FPGA technology for analog applications, and the integration and optimization of Open Source Linux and Android software stacks on modern extensible processing architectures.



## **Motivation: Network Processing for Embedded Systems**

- 10 GigE will soon push from the data center into embedded markets



Rule of thumb: transporting 1 bit per second needs 1 Hz of CPU clock:

- 1 GigE → 1 CPU at 1 GHz
- 10 GigE → 4 CPUs at 2.5 GHz
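The arithmetic behind these two bullets can be written down as a small helper. This is only a sketch of the rule of thumb quoted above; the function name `cores_needed` and its parameters are made up for illustration and do not come from the slides.

```cpp
#include <cassert>
#include <cstdint>

// Rule of thumb from the slide: 1 bit/s of network traffic costs 1 Hz of CPU clock.
// Returns the number of CPU cores needed for a given line rate, rounded up.
// (Function and parameter names are hypothetical.)
std::uint64_t cores_needed(std::uint64_t line_rate_bps, std::uint64_t core_clock_hz) {
    assert(core_clock_hz > 0);
    return (line_rate_bps + core_clock_hz - 1) / core_clock_hz;
}
```

With a 10 Gbit/s line rate and 2.5 GHz cores this yields the four CPUs listed above; with 1 Gbit/s and a 1 GHz core it yields one.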





# **Design Choices for Network Processing in SoC FPGAs**

#### SoC FPGA as (yet) another computer

|         | Intel<br>i7-4770 | Xilinx<br>Zynq 7045              |
|---------|------------------|----------------------------------|
| Compute | ~100 GFLOPS      | 5 GFLOPS (PS)<br>778 GFLOPS (PL) |
| TDP     | 84 W             | <20 W (typ)                      |

SoC FPGA has 4x more compute with ¼ the power dissipation!



[http://www.xilinx.com/products/technology/dsp.html]



#### Network Stack in RTL from Fraunhofer Heinrich-Hertz-Institute

• Brings full TCP/UDP/IP connectivity to FPGAs even when there is no CPU available. Accelerates CPUs by offloading TCP/UDP/IP processing into programmable logic.







#### **Network Protocol Acceleration Platform Architecture**



Network protocol processing at the application layer (ISO/OSI Layer 7) can be implemented more efficiently via a programming approach (in C or C++) than via digital circuit design (in VHDL or Verilog).



# **High-Level Synthesis Design Flow for SoC FPGA**

Input C/C++/SystemC into High-Level Synthesis to generate VHDL/Verilog code





## **Working Principles of High-Level Synthesis**

• Design automation runs scheduling and resource allocation to generate RTL code comprising data path plus state machines for control.
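To make the scheduling step more concrete, here is a minimal sketch of as-soon-as-possible (ASAP) scheduling over a small dataflow graph. It is a textbook simplification chosen for illustration, not the algorithm of any particular HLS tool: it ignores resource constraints, and the `Op` structure and `asap_schedule` function are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One operation in a dataflow graph. Operations are assumed to be listed in
// topological order, so every predecessor index is smaller than the op's own index.
struct Op {
    int latency;                     // clock cycles the operation occupies
    std::vector<std::size_t> preds;  // indices of operations whose results it needs
};

// ASAP scheduling: each operation starts in the first cycle in which all of
// its inputs are available. Returns the start cycle of every operation.
std::vector<int> asap_schedule(const std::vector<Op>& ops) {
    std::vector<int> start(ops.size(), 0);
    for (std::size_t i = 0; i < ops.size(); ++i) {
        for (std::size_t p : ops[i].preds) {
            assert(p < i);  // enforce the topological-order assumption
            int ready = start[p] + ops[p].latency;
            if (ready > start[i]) {
                start[i] = ready;
            }
        }
    }
    return start;
}
```

For `y = (a * b) + (c * d)` with two-cycle multipliers, both multiplications start in cycle 0 and the addition in cycle 2. The resulting start cycles correspond to the states of the generated control state machine, while the operations themselves form the data path.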





## **Benefits of High-Level Synthesis**

- Automatic performance optimization via parallelization at dataflow level

```
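// The int types and the i1/i2 locals are assumptions; the slide omits them.
void top(int a, int b, int c, int *d) {
    int i1, i2;          // intermediate results between the stages
    ...
    func_A(a, b, &i1);   // produces i1
    func_B(c, i1, &i2);  // consumes i1, produces i2
    func_C(i2, d);       // consumes i2, writes the result to *d
}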
```
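A compilable variant of the function above may help to see where the dataflow optimization is requested. The stage bodies below are placeholders invented for this sketch, since the slide only names `func_A`, `func_B` and `func_C`. `#pragma HLS DATAFLOW` is the Vivado HLS directive for task-level pipelining; an ordinary C++ compiler ignores it, so the same source also runs as a software model.

```cpp
#include <cassert>

// Hypothetical stage bodies; only the call structure comes from the slide.
static void func_A(int a, int b, int *i1) { *i1 = a + b; }
static void func_B(int c, int i1, int *i2) { *i2 = c * i1; }
static void func_C(int i2, int *d) { *d = i2 - 1; }

void top(int a, int b, int c, int *d) {
#pragma HLS DATAFLOW
    assert(d != 0);
    int i1, i2;          // channels between the stages
    func_A(a, b, &i1);   // stage 1 produces i1
    func_B(c, i1, &i2);  // stage 2 consumes i1, produces i2
    func_C(i2, d);       // stage 3 consumes i2, writes the result
}
```

Because each stage only consumes what the previous stage produced, the tool can let `func_A` start on the next input while `func_B` and `func_C` are still working on earlier ones.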



- Automatic interface synthesis and code generation for a variety of real-life HW/SW connectivity

Bus interfaces (AXI4): Stream, Lite, Master

Argument types: Variable (pass-by-value), Pointer Variable (pass-by-reference), Array (pass-by-reference), Reference Variable (pass-by-reference). Directions: I = input, IO = input/output, O = output. D = default interface.

| Interface Type | Variable I | Variable IO | Variable O | Pointer I | Pointer IO | Pointer O | Array I | Array IO | Array O | Reference I | Reference IO | Reference O |
|----------------|------------|-------------|------------|-----------|------------|-----------|---------|----------|---------|-------------|--------------|-------------|
| ap_none        | D          |             |            | D         |            |           |         |          |         | D           |              |             |
| ap_stable      |            |             |            |           |            |           |         |          |         |             |              |             |
| ap_ack         |            |             |            |           |            |           |         |          |         |             |              |             |
| ap_vld         |            |             |            |           |            | D         |         |          |         |             |              | D           |
| ap_ovld        |            |             |            |           | D          |           |         |          |         |             | D            |             |
| ap_hs          |            |             |            |           |            |           |         |          |         |             |              |             |
| ap_memory      |            |             |            |           |            |           | D       | D        | D       |             |              |             |
| ap_fifo        |            |             |            |           |            |           |         |          |         |             |              |             |
| ap_bus         |            |             |            |           |            |           |         |          |         |             |              |             |
| ap_ctrl_none   |            |             |            |           |            |           |         |          |         |             |              |             |
| ap_ctrl_hs     |            |             | D          |           |            |           |         |          |         |             |              |             |
| ap_ctrl_chain  |            |             |            |           |            |           |         |          |         |             |              |             |
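As a hedged illustration of interface synthesis, the sketch below spells out the default interface modes from the table as explicit Vivado HLS `INTERFACE` directives: `ap_memory` for an array, `ap_none` for a scalar input, and `ap_vld` for a pointer output. The function `accumulate`, its arguments and the constant `kSamples` are invented for this example. A regular C++ compiler ignores the pragmas.

```cpp
#include <cassert>

const int kSamples = 8;  // hypothetical buffer length

// Sums kSamples values, adds an offset and reports the result through a pointer.
void accumulate(const int samples[kSamples], int offset, int *sum) {
#pragma HLS INTERFACE ap_memory port=samples
#pragma HLS INTERFACE ap_none port=offset
#pragma HLS INTERFACE ap_vld port=sum
    assert(sum != 0);
    int acc = offset;
    for (int i = 0; i < kSamples; ++i) {
        acc += samples[i];  // each access becomes a read on the memory port
    }
    *sum = acc;             // written once, accompanied by a valid signal
}
```

Changing a directive, for example to `ap_fifo` or one of the AXI4 bus interfaces, changes the generated ports without touching the algorithm itself.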



# Visualization and User Interaction in High-Level Synthesis Tool





# Design Example for Extending NPAP with High-Level Synthesis

Network Time Protocol (NTP)
- Packet according to RFC 5905

| Field                         | Size     |
|-------------------------------|----------|
| LI (Leap Indicator)           | 2 Bit    |
| VN (Version Number)           | 3 Bit    |
| Mode                          | 3 Bit    |
| Stratum                       | 8 Bit    |
| Poll                          | 8 Bit    |
| Precision                     | 8 Bit    |
| Root Delay                    | 32 Bit   |
| Root Dispersion               | 32 Bit   |
| Reference ID                  | 32 Bit   |
| Reference Timestamp           | 64 Bit   |
| Origin Timestamp              | 64 Bit   |
| Receive Timestamp             | 64 Bit   |
| Transmit Timestamp            | 64 Bit   |
| Extension Field 1             | Variable |
| Extension Field 2             | Variable |
| Key Identifier                | 32 Bit   |
| Digest                        | 128 Bit  |
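The fixed part of the NTP packet is 48 bytes long in RFC 5905, with all multi-byte fields in network byte order. The following sketch decodes it from a raw byte buffer. The `NtpHeader` structure and the helper names are assumptions made for this example and are not taken from the presented design.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

const std::size_t kNtpHeaderBytes = 48;  // fixed header, without extensions or MAC

struct NtpHeader {
    std::uint8_t  li, vn, mode, stratum;
    std::int8_t   poll, precision;
    std::uint32_t root_delay, root_dispersion, reference_id;
    std::uint64_t reference_ts, origin_ts, receive_ts, transmit_ts;
};

// Reads an n-byte big-endian (network byte order) unsigned integer, n <= 8.
static std::uint64_t read_be(const std::uint8_t *p, std::size_t n) {
    assert(n <= 8);
    std::uint64_t v = 0;
    for (std::size_t i = 0; i < n; ++i) {
        v = (v << 8) | p[i];
    }
    return v;
}

// Decodes the fixed header; buf must hold at least kNtpHeaderBytes bytes.
NtpHeader parse_ntp_header(const std::uint8_t *buf) {
    NtpHeader h;
    h.li              = static_cast<std::uint8_t>(buf[0] >> 6);
    h.vn              = static_cast<std::uint8_t>((buf[0] >> 3) & 0x7);
    h.mode            = static_cast<std::uint8_t>(buf[0] & 0x7);
    h.stratum         = buf[1];
    h.poll            = static_cast<std::int8_t>(buf[2]);
    h.precision       = static_cast<std::int8_t>(buf[3]);
    h.root_delay      = static_cast<std::uint32_t>(read_be(buf + 4, 4));
    h.root_dispersion = static_cast<std::uint32_t>(read_be(buf + 8, 4));
    h.reference_id    = static_cast<std::uint32_t>(read_be(buf + 12, 4));
    h.reference_ts    = read_be(buf + 16, 8);
    h.origin_ts       = read_be(buf + 24, 8);
    h.receive_ts      = read_be(buf + 32, 8);
    h.transmit_ts     = read_be(buf + 40, 8);
    return h;
}
```

A first byte of `0x23`, for instance, decodes to LI 0, version 4, mode 3, which is a client request.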

- Block diagram of the NTP Server implemented in Programmable Logic





## **Implementation Example**

- Implement the NTP server as an "IP core" using Vivado HLS
- The network processing stack, including the NTP server IP core, runs on a Xilinx Zynq-7000 SoC
- Xilinx ZC706 evaluation kit connected via SFP+ with Intel 10GbE NIC inside PC
- PC runs NTP client application to query Zynq-implemented NTP server
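The core of an NTP server reply can be sketched in a few lines of C++ of the kind that is typically fed into HLS. This is not the code of the presented IP core. It is a simplified illustration based on the RFC 5905 header layout, and `make_ntp_reply`, its parameters and the timestamp sources are assumptions. A real server would also fill in precision, root delay, root dispersion, reference ID and reference timestamp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

const std::size_t kNtpBytes = 48;  // fixed NTP header length

// Writes a 64-bit value in network byte order (big-endian).
static void write_be64(std::uint8_t *p, std::uint64_t v) {
    for (int i = 7; i >= 0; --i) {
        p[i] = static_cast<std::uint8_t>(v & 0xFF);
        v >>= 8;
    }
}

// Builds a server reply (mode 4) for a client request (mode 3).
// rx_time and tx_time are 64-bit NTP timestamps supplied by the caller.
// Returns false, leaving rsp untouched, if req is not a client request.
bool make_ntp_reply(const std::uint8_t *req, std::uint64_t rx_time,
                    std::uint64_t tx_time, std::uint8_t stratum,
                    std::uint8_t *rsp) {
    if ((req[0] & 0x7) != 3) {
        return false;
    }
    std::memset(rsp, 0, kNtpBytes);
    rsp[0] = static_cast<std::uint8_t>((req[0] & 0x38) | 4);  // LI 0, VN copied, mode 4
    rsp[1] = stratum;
    rsp[2] = req[2];                      // poll interval copied from the request
    std::memcpy(rsp + 24, req + 40, 8);   // origin = client's transmit timestamp
    write_be64(rsp + 32, rx_time);        // receive timestamp
    write_be64(rsp + 40, tx_time);        // transmit timestamp
    return true;
}
```

The NTP client on the PC uses the origin, receive and transmit timestamps of the reply, together with its own arrival time, to compute round-trip delay and clock offset.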





#### **Conclusion and References**

- Significant productivity increase for protocol oriented or dataflow based design blocks.
- Easy to adopt: known languages (C/C++) combined with a known tool chain.
- → Add this to your bag of tricks!

2015-02-25

- UG998 Introduction to FPGA Design Using High-Level Synthesis
- UG871 Vivado Design Suite Tutorial: High-Level Synthesis
- XAPP1209 Designing Protocol Processing Systems with Vivado High-Level Synthesis
- UG949 UltraFast Design Methodology Guide for the Vivado Design Suite



#### **Contact Information**

Simon Lever

simon.lever@uni-ulm.de

simon.lever@MLEcorp.com

Dr. Endric Schubert

endric.schubert@uni-ulm.de

endric@MLEcorp.com

Phone US: +1 (408) 320-6139

Phone DE: +49 (731) 141149-66



