100/200/400G Fast FPGA RAID (FFRAID)
FFRAID: Accelerate NVMe FPGA RAID Data Recording and Replay
Some high-speed data acquisition systems require storing their data in non-volatile memory. For those cases where the read/write data rate exceeds the capabilities of even the highest-performance NVMe SSDs, MLE has developed an NVMe Fast FPGA RAID (FFRAID) solution:
Now you can transfer bulky data from multiple sensors to a RAID of NVMe SSDs at speeds up to 400 Gbps. MLE’s Fast FPGA RAID implements a channel-based architecture, supports data-in-motion pre- and post-processing, and is highly scalable with regard to bandwidth and recording capacity. Multiple systems can further be cascaded via high-accuracy IEEE time synchronization for faster or deeper recording.
Adaptable signal front-ends support many different I/O standards in a “mix & match” fashion.
MLE’s Fast FPGA RAID is compatible with Linux Software-RAID (via the Linux MD driver). This allows recording at high data rates and replaying at slower speeds, or vice versa.
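Because the on-disk layout is compatible with the Linux MD driver, a completed recording can be assembled as a standard software RAID on any Linux host and read back like any other block device. Below is a minimal read-back sketch, assuming the array has already been assembled with the standard Linux MD tools and appears as /dev/md0; the device path and byte offset are placeholders, not values fixed by FFRAID.

```c
/* Minimal sketch: read back part of an FFRAID recording from a Linux
 * software-RAID block device. The device path and offset are placeholders;
 * the array is assumed to be assembled via the Linux MD driver. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    const char  *dev    = "/dev/md0";  /* hypothetical MD device      */
    const off_t  offset = 0;           /* recording start, in bytes   */
    const size_t chunk  = 1 << 20;     /* read 1 MiB for this example */

    int fd = open(dev, O_RDONLY);
    if (fd < 0) { perror("open"); return EXIT_FAILURE; }

    char *buf = malloc(chunk);
    if (!buf) { close(fd); return EXIT_FAILURE; }

    ssize_t n = pread(fd, buf, chunk, offset);
    if (n < 0)
        perror("pread");
    else
        printf("read %zd bytes from %s at offset %lld\n",
               n, dev, (long long)offset);

    free(buf);
    close(fd);
    return EXIT_SUCCESS;
}
```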
Channel-based Architecture of Fast FPGA RAID (FFRAID)
MLE’s Fast FPGA RAID (FFRAID) implements a channel-based architecture where each data source/sink can be associated with a dedicated RAID engine and a dedicated storage space. Each channel can run at 10/25/50/75/100 Gbps, or combinations thereof.
This channel-based architecture, combined with the FPGA NVMe Recording Stack and a well-tuned PCIe setup, delivers a best-in-class price/performance ratio for high-speed data acquisition, recording and replay. MLE’s multi-core NVMe Host Controller Subsystem supports dedicated NVMe queues per SSD using PCIe Peer-to-Peer communication.
Applications
- Autonomous Vehicle Path Record & Replay
- Automotive / Medical / Industrial Test Equipment
- High-speed Radar / Lidar / Camera Data Acquisition & Storage
- Very Deep Network Packet Capture of Ethernet or IPv4 or TCP/UDP Data
Key Features
- Scalable from 100 Gbps to 400 Gbps
- Cascading of multiple systems with time-sync
- Start-Pause-Stop Data Recording
- Pre-trigger Data Recording in circular buffers
- Adaptable signal front-ends
- Read/write compatible with Linux Software-RAID
Scalability
MLE’s Fast FPGA RAID supports a wide range of NVMe SSDs and can be scaled from M.2 SSDs for small and lightweight embedded systems up to large 19” racks using high-performance U.2 or U.3 SSDs. Scalability also includes selecting from different SSD capacities and Drive-Writes-per-Day (DWPD) models. Here is a table of possible recording times in minutes (a quick estimation sketch follows the table):
Storage (TiB) | 100 Gbps | 150 Gbps | 200 Gbps | 250 Gbps | 300 Gbps | 350 Gbps | 400 Gbps |
---|---|---|---|---|---|---|---|
5 | 7.2 | 4.8 | 3.6 | 2.9 | 2.4 | 2.0 | 1.8 |
10 | 14.3 | 9.5 | 7.2 | 5.7 | 4.8 | 4.1 | 3.6 |
15 | 21.5 | 14.3 | 10.7 | 8.6 | 7.2 | 6.1 | 5.4 |
20 | 28.6 | 19.1 | 14.3 | 11.5 | 9.5 | 8.2 | 7.2 |
25 | 35.8 | 23.9 | 17.9 | 14.3 | 11.9 | 10.2 | 8.9 |
30 | 42.9 | 28.6 | 21.5 | 17.2 | 14.3 | 12.3 | 10.7 |
35 | 50.1 | 33.4 | 25.1 | 20.0 | 16.7 | 14.3 | 12.5 |
40 | 57.3 | 38.2 | 28.6 | 22.9 | 19.1 | 16.4 | 14.3 |
45 | 64.4 | 42.9 | 32.2 | 25.8 | 21.5 | 18.4 | 16.1 |
50 | 71.6 | 47.7 | 35.8 | 28.6 | 23.9 | 20.5 | 17.9 |
Recording time in minutes at the given recording speed (Gbps).
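For sizing purposes, the table can be reproduced approximately with a back-of-the-envelope formula: recording time equals usable capacity divided by ingest rate. The sketch below assumes 1 TiB = 2^40 bytes and 1 Gbps = 10^9 bit/s; the results land close to, but not exactly on, the table values, since the precise unit conventions behind the table are not stated here.

```c
/* Back-of-the-envelope recording time estimate.
 * Assumes 1 TiB = 2^40 bytes and 1 Gbps = 1e9 bit/s. */
#include <stdio.h>

static double recording_minutes(double capacity_tib, double rate_gbps)
{
    double capacity_bits = capacity_tib * 1099511627776.0 * 8.0; /* TiB -> bits   */
    double rate_bps      = rate_gbps * 1e9;                      /* Gbps -> bit/s */
    return capacity_bits / rate_bps / 60.0;
}

int main(void)
{
    /* Example: 20 TiB recorded at 200 Gbps -> roughly 14 to 15 minutes */
    printf("%.1f minutes\n", recording_minutes(20.0, 200.0));
    return 0;
}
```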
FFRAID High Speed Data Recording Use Cases
Besides record/replay of raw data, we support data-in-motion pre- and post-processing that enables you to add your custom algorithms for indexing and metadata generation, on-the-fly data decimation, or running in “spy-mode” as a transparent data proxy.
- Ingress data from the high-speed sensors is transferred and recorded at speed and as-is onto the Fast FPGA RAID.
Exemplary Evaluation Reference Design of Fast FPGA RAID (FFRAID)
The implementation example of the MLE 100/200/400G Fast FPGA RAID is built with an AMD EPYC server-grade motherboard, NVMe FPGA RAID cards, and U.3 SSDs. This evaluation reference design is available now for testing with your front-end devices or SSDs.
Pricing
MLE’s license fee structure reflects the needs for best-in-class price/performance ratio for 100/200/400G Fast FPGA RAID:
Product Name | Deliverables | Example Pricing |
---|---|---|
Evaluation Reference Design (ERD) | Available upon request as FPGA design project, with optional customizations (different target device, different transceivers, etc) | Upon Request |
Turnkey System | Completely hardware-software integrated ready-to-use system, available sizes from medium-sized embedded PC to 19″ rack mount, lab-use or ruggedized. You can bring your own SSDs, or choose SSDs from our many options depending on the storage capacity required. | Starting at $14,800.- (depends on FPGA device, line rate, and SSDs) |
Fast FPGA RAID Card | Fast FPGA RAID Card based on off-the-shelf 3rd party FPGA cards, including AMD Alveo U50 / U55C with Ultrascale+ and HBM, AMD Alveo V80 with Versal and HBM, and Intel/Altera Agilex 7 AGF014 with DDR4. | Starting at $8,000.- |
Documentation
- FPGA Based 400GBit/s Data Recorder – Insight into Different Pitfalls and Design Choices
- Linux ZynqMP PS-PCIe Root Port Driver (A software-only, non-accelerated alternative described by the Xilinx Wiki)
- Example designs on GitHub from Opsero (for PS-based NVMe supporting various FPGA and MPSoC evaluation boards)
Frequently Asked Questions
Does the "100/200/400G Fast FPGA RAID (FFRAID)" data recorder, support file systems?
No, the MLE 100/200/400G Fast FPGA RAID data recorder is so-called Block Storage. So, no file systems are not supported. For each data transfer the user application logic selects a start and maximum end address, and then data is written to flash in a linear fashion. This achieves best performance and avoids write amplifications.
Does the "100/200/400G Fast FPGA RAID (FFRAID)" data recorder support drive partitions?
Partitions are not explicitly supported. However, the user application logic can use the 100/200/400G Fast FPGA RAID data recorder to read the SSD’s partition table and then set up transfers with start and maximum end address to be aligned to partitions.
Does the "100/200/400G Fast FPGA RAID (FFRAID)" data recorder support NVMe namespaces?
Only one single namespace is supported per SSD.
How many SSDs can be connected to the "100/200/400G Fast FPGA RAID (FFRAID)" data recorder?
The standard configurations for the 100/200/400G Fast FPGA RAID data recorder are 4/8/16 SSDs. The number of SSDs can be adjusted to your application within certain limits; for example, the accumulated sustained write speed should be faster than the incoming data stream, and too many SSDs can cause latency issues (see the sizing sketch below). However, we can customize the 100/200/400G Fast FPGA RAID data recorder for your application to support more complex PCIe topologies. Please ask us for more details.
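As a rough sizing sketch for the rule above: the accumulated sustained write bandwidth of all SSDs, with some headroom, must exceed the ingest rate. The per-SSD sustained write figure and the margin below are placeholders that you would replace with numbers from your SSD's datasheet (sustained, not burst, performance).

```c
/* Rule-of-thumb SSD count: accumulated sustained write speed must exceed
 * the incoming data rate. The per-SSD figure and margin are placeholders. */
#include <math.h>
#include <stdio.h>

int main(void)
{
    double ingest_gbps        = 200.0; /* incoming data stream                  */
    double ssd_sustained_gbps = 16.0;  /* e.g. ~2 GB/s sustained write per SSD  */
    double margin             = 1.25;  /* headroom for RAID and flash overheads */

    int ssds = (int)ceil(ingest_gbps * margin / ssd_sustained_gbps);
    printf("at least %d SSDs required for %.0f Gbps\n", ssds, ingest_gbps);
    return 0;
}
```

(Compile with -lm for the math library.)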
How many NVMe IO Queues does the "100/200/400G Fast FPGA RAID (FFRAID)" data recorder support and what is the depth of the NVMe IO Queue?
The 100/200/400G Fast FPGA RAID data recorder currently supports one single IO Queue per SSD. This IO Queue can have up to 128 entries, each with up to 128 KiB of data, i.e. you can have up to 16 MiB of “data in flight” per SSD. If needed, we can change the depth and size of this IO Queue. However, given the needs of streaming applications, increasing the number of IO Queues may not be advantageous.
Does the "100/200/400G Fast FPGA RAID (FFRAID)" data recorder support PCIe Peer-to-Peer?
Yes, this is supported. Peer-to-Peer transfers can be very attractive as they free up the host CPU. Team MLE can customize the 100/200/400G Fast FPGA RAID data recorder for your application to support more complex PCIe topologies, including multiple direct-attached SSDs or multiple SSDs connected via a 3rd-party PCIe switch chip, all with PCIe Peer-to-Peer. Please ask us for more details.
What are the user interfaces of the "100/200/400G Fast FPGA RAID (FFRAID)" data recorder?
The 100/200/400G Fast FPGA RAID data recorder is configured via an AXI4-Lite register space, which can also be accessed through a provided Linux driver. This register space is also used to set up and control streaming transfers. The actual data exchange is then handled via an AXI4-Stream master and slave. Some GPIO-style status signals for informational purposes, such as LEDs, are provided as well. This is documented in our developer's guide.
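The actual register map is defined in the developer's guide; the sketch below only illustrates the general pattern of driving an AXI4-Lite control interface from Linux user space via /dev/mem. All addresses, offsets and bit meanings are hypothetical placeholders, not the FFRAID register map, and a production setup would normally use the provided Linux driver instead.

```c
/* Illustrative only: map a hypothetical AXI4-Lite register block via
 * /dev/mem and program a streaming transfer. Base address, register
 * offsets and bit fields are placeholders, NOT the FFRAID register map. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define REGS_BASE   0xA0000000UL   /* hypothetical AXI4-Lite base address */
#define REGS_SIZE   0x1000UL
#define REG_START   0x00           /* hypothetical start address register */
#define REG_END     0x08           /* hypothetical max end address reg.   */
#define REG_CTRL    0x10           /* hypothetical control register       */
#define CTRL_RECORD 0x1            /* hypothetical 'start recording' bit  */

int main(void)
{
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0) { perror("open /dev/mem"); return 1; }

    volatile uint8_t *regs = mmap(NULL, REGS_SIZE, PROT_READ | PROT_WRITE,
                                  MAP_SHARED, fd, REGS_BASE);
    if (regs == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

    /* Program start address and maximum end address, then start recording;
     * the payload data itself flows over the AXI4-Stream interface. */
    *(volatile uint64_t *)(regs + REG_START) = 0x0ULL;
    *(volatile uint64_t *)(regs + REG_END)   = 0x40000000ULL;
    *(volatile uint32_t *)(regs + REG_CTRL)  = CTRL_RECORD;

    munmap((void *)regs, REGS_SIZE);
    close(fd);
    return 0;
}
```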
How many parallel streams can be processed?
Currently, the 100/200/400G Fast FPGA RAID (FFRAID) data recorder handles 16 independent data streams. To save resources, the number of streams can be reduced without losing overall performance by widening the data paths.
Does the "100/200/400G Fast FPGA RAID (FFRAID)" data recorder support m.2 PCIe connectivity?
Yes. Because the 100/200/400G Fast FPGA RAID data recorder is agnostic to the formfactor of your SSD m.2, u.3/u.3, EDSFF and so on are supported, as long as your SSD “speaks the NVMe protocol” and not SATA nor SAS.
What are the best SSDs to use and from which vendor?
While, again, the 100/200/400G Fast FPGA RAID (FFRAID) data recorder is compatible with any NVMe SSD, there are a couple of other aspects to keep in mind when selecting an NVMe SSD: noise, vibration, harshness, temperature throttling, local RAM buffers, SLC, MLC, TLC, QLC, 3D XPoint, etc. To enable our customers to deliver dependable performance solutions, we have worked with a set of 3rd party SSD vendors and would be happy to give you technical guidance in your project. Please inquire.
Robo/TSN
Robo/TSN: Virtualizing Industrial Networks
Artificial Intelligence algorithms and machine vision with high resolutions and fast frame rates continue to drive bandwidth demands in real-time industrial networks. At the same time, the lines between IT and OT start to blur, as modern datacenter hardware can significantly reduce TCO for OT equipment.
Under the working title “Robo/TSN”, MLE and partners have developed network virtualization technology for multi-Gigabit TSN (Time Sensitive Networking). Solutions are based on open standards and include access points and SmartNICs which scale to 100 Gbps and beyond. Use cases are:
- Connect high speed sensors in the field with AI engines in edge cloud data centers.
- Tunnel multiple protocols such as PCIe, MIPI CSI-2, GMSL, field buses like Ethercat or Profinet via virtual connections over high-speed TSN.
- Virtualize PLCs to run on datacenter infrastructure with advantages like virtualization, containerization, redundancy, backup organization etc. and without any compromises concerning real-time and high-speed requirements.
- Leverage all the advantages of a high-speed, time-sensitive network throughout the whole production plant.
Robo/TSN for Industrial Network with Multi-Gigabit Ethernet & Multi Protocols
AI and video need high data rates and are breaking industrial networking. In industrial environments and control technology, real-time requirements are common practice. As camera technology, image recognition and AI become more and more common, hard real-time requirements combine with extremely high data rates: sensors deliver up to 25 Gbps of data which has to be transferred, processed and evaluated in real time with reliably low latency.
MLE’s backbone-oriented approach makes it possible to tunnel modern multi-Gig sensor data (GigEVision, PCIe, MIPI CSI-2, GMSL, …) as well as industrial protocols like Ethercat or Profinet. Robo/TSN builds on patented and patent-pending technology from MLE (US Patent Nos. 9,209,828; 10,708,199; 10,848,442; 11,356,388; 11,695,708; other patents pending).
Features and Benefits
- Bridging/ Tunneling of several protocols like PCIe, Ethercat, Profinet, Ethernet, CAN, etc.
- Scalable from 1 to 100 Gbps
- Precision time synchronization with IEEE TSN or IEEE 1588 v2 (CERN White Rabbit)
- Hardware accelerated deterministic transport with Ultra Low Latency (RTT < 600 ns)
- Reliable transports via TCP/IP and/or Quad-RP/IP
- Optional security features MACsec, IPSec, TLS
Real-time Compliance
- Time Sync: IEEE 802.1AS (IEEE 1588 profile)
- Bounded Low Latency:
  - IEEE 802.1Qav Credit-Based Traffic Shaper
  - IEEE 802.1Qbu / 802.3br Preemption
  - IEEE 802.1Qbv Scheduled Traffic
  - IEEE 802.1Qcr Async Traffic Shaping
- Reliability:
  - IEEE 802.1Qca Path Control
  - IEEE 802.1CB Frame Replication & Elimination
- Dedicated Resources & API:
  - IEEE 802.1Qcc TSN Configuration
  - IEEE 802.1Qat Stream Reservation
  - IEEE 802.1CS Link-Local Reservation
Robo/TSN Module for Industrial Network
- Linux device drivers (GPL sources)
- Application-specific expert design service
- Appliance implementation
- License for ASIC/ FPGA Full System Stack
- Pre-configured SmartNIC PCIe-Card, ready-to-run
Fraunhofer Heinrich-Hertz Institute
Founded in 1949, the German Fraunhofer-Gesellschaft undertakes applied research of direct utility to private and public enterprise and of wide benefit to society. With a workforce of over 23,000, the Fraunhofer-Gesellschaft is Europe’s biggest organization for applied research, and currently operates a total of 67 institutes and research units. The organization’s core task is to carry out research of practical utility in close cooperation with its customers from industry and the public sector.
Fraunhofer HHI was founded in 1928 as the “Heinrich-Hertz-Institut für Schwingungsforschung” and joined the Fraunhofer-Gesellschaft in 2003 as the “Fraunhofer Institute for Telecommunications, Heinrich-Hertz-Institut”. Today it is the leading research institute for networking and telecommunications technology, “Driving the Gigabit Society”.
Fraunhofer Institute for Photonic Microsystems
Fraunhofer IPMS is a worldwide leader in research and development services for electronic and photonic microsystems in the fields of Smart Industrial Solutions, Medical & Health applications and Improved Quality of Life. Innovative products can be found in all large markets – such as ICT, consumer products, automobile technology, semiconductor technology, measurement and medical technology – products based upon various technologies developed at Fraunhofer IPMS.
Data Diodes
Data Diodes - Unidirectional Security Gateway
A data diode, also known as a unidirectional network bridge or unidirectional security gateway, is a piece of hardware used to connect two separated networks such that data can travel in only one direction, specifically from one network into the other. Applications are found in high-security environments where data diodes connect two or more networks of differing security classifications while making it physically impossible to transfer data in the direction from the lower to the higher security classification.
For this, MLE has partnered with Fraunhofer HHI to provide the industry-proven TCP/UDP/IP Network Protocol Acceleration Platform (NPAP) in the form of NPAC-40G, a PCIe Network Protocol Accelerator Card with quad-port 10G Ethernet. NPAC-40G implements reliable, high-bandwidth, low-latency TCP/UDP/IP transport plus Linux PCIe stream device drivers and can run customizable In-Network Processing, such as red/black network separation, on the integrated FPGA subsystem.
Features and Benefits of Data Diodes
- FHHL PCIe Card, PCIe 3.1 x8
- 4x SFP+ for 10 Gig Ethernet
- Intel Stratix 10 GX 400 FPGA, hardened
- Tx-only and Rx-only (data-diode) network paths disconnect at PCB level or at circuit level
- Optional TCP/IP Tx-only or Rx-only (FPGA-integrated TCP endpoint)
- Optional In-Network Processing for Deep Packet Inspection and/or Firewall
- Optional access logging
- Customizable, Ready-to-Run
Applications
- Sending status information from sensitive industrial plants
- Sending video streams from sensitive video equipment / cameras
- Protecting classified data in high-security networks and preventing it from leaking to low-security networks, e.g. in defense
- Critical Infrastructure and Industrial Internet of Things (IIoT)
  - Power plants and nuclear power plants
  - Power and water utilities and providers
  - Oil and gas deployments
  - Transportation, rail and air
- Intelligence & Defense
  - Data Center
  - Tactical and removable media solutions
- Commercial
  - Financial services
  - Manufacturing
  - Cloud services
  - Telecommunications providers
  - Security Information and Event Management logs
  - Intrusion Detection logs
Availability
MLE Data Diodes are available as a licensable full system stack or delivered as an integrated hardware/firmware/software solution in the form of customizable FPGA-based Network Interface Cards (NICs) or as FPGA-based appliances.
Deliverables include:
- Pre-configured PCIe Card, ready-to-run
- Linux device drivers (GPL sources)
- Application-specific expert design service (optional)
- Appliance implementation (optional)
Documentation
- FPGA-based Data Diodes
- Ultra-Reliable, Low-Latency, Deterministic Networking
- The Function Accelerator Card – NPAC-40G
Auto/TSN
In-Vehicle Network - Auto/TSN
Auto/TSN stands for automotive data over Time-Sensitive Networks, an in-vehicle network infrastructure based on open standards such as IEEE Ethernet. Auto/TSN is the result of a collaborative effort between MLE and MLE partners, Fraunhofer HHI and Fraunhofer IPMS.
Fundamentally, Auto/TSN virtualizes the in-vehicle network infrastructure: the key objective is to reduce costs, increase scalability and enable upgradability for next-generation automotive architectures including electric and/or autonomous vehicles.
By tunneling sensor data along with PCIe and NVMe over real-time multi-Gigabit Automotive Ethernet, Auto/TSN simplifies the wire harness and enables more centralized architectures with higher levels of hardware/software integration. By offering PCIe as a common interface (for sensor-to-CPU and CPU-to-CPU connectivity), Auto/TSN makes different semiconductor SoCs interchangeable. This significantly reduces semiconductor dependencies and infrastructure costs at the same time.
Auto/TSN is highly scalable and supports line-rates up to 50 Gbps in FPGA and 100 Gbps in ASIC. Based on IPv4, the space for addressing nodes is 32 bits wide. The small hardware footprint allows zonal gateways with many ports.
Auto/TSN is “software-defined” and builds on open standards such as IEEE 802.1Q TSN, IEEE 802.3 Ethernet, IETF TCP/IP, MIPI CSI-2, PCIe 4.0 and NVMe 1.4, as well as open-source Linux, which eases hardware / software / system upgradability.
Features & Benefits of In-Vehicle Network Auto/TSN
Auto/TSN is a network infrastructure with a system/software focus which reduces the complexity of connecting sensors and centralized computers because it follows de-facto standards of open source network APIs such as RDMA, Linux netdev or SOME/IP.
Benefits include:
- Significant cost-down for in-vehicle networking and wire harnesses
- Digital circuit implementation for zero CPU load
- Deterministic and very low transport latencies, typically within 5 microseconds
- Low footprint enables ASIC or FPGA implementation
The current implementation of TSN supports time synchronization (IEEE 802.1AS) with 20-nanosecond precision, traffic shaping (IEEE 802.1Qav, 802.1Qbv), frame replication (IEEE 802.1CB) and stream prioritization (IEEE 802.1Qat) for high reliability, low-cost redundancy for functional safety, and real-time behavior. Because “best effort” is not sufficient for PCIe, Auto/TSN implements a reliable transport on top of TSN which is compliant with IETF TCP/IP.
For in-vehicle network security Auto/TSN can be complemented with state-of-the-art IEEE 802.1AE MAC Security Entities (MACsec) and/or IETF RFC 6071 Internet Protocol Security (IPsec) and/or IETF RFC8446 Transport Layer Security (TLS).
Various Connectivity Schemes are supported:
- Single CPU (PCIe Root-Port) to multiple devices (PCIe Endpoints)
- Single CPU to multiple SSDs via NVM Express (NVMe)
- Multiple CPUs to multiple NVMe SSDs (via NVMe proxy)
- Multiple CPUs to multiple CPUs via Inter-System Bridge (a.k.a. PCIe NTB)
- Asymmetric sensor connectivity, e.g. MIPI CSI-2 to PCIe
- IEEE 1722 style video transport
Data-in-motion processing runs on dedicated on-chip full accelerators and frees the CPUs from protocol handling. Our patented and patent-pending Heterogeneous Packet-Based Transport mechanism packetizes and de-packetizes PCIe, MIPI CSI-2 and other packet-based protocols and features low protocol overhead for high bandwidth and low, deterministic, microsecond-scale transport latency.
PCIe Over Auto/TSN
Auto/TSN implements a PCIe switch compliant with PCI-SIG Base Specification 3.0 (or newer) and NVM Express Specification 1.2 (or newer).
PCIe Inter-System-Bridge for Auto/TSN
Integrated PCIe Inter-System Bridges (a.k.a. Non-Transparent Bridges / NTB) enable CPU-to-CPU connectivity. The PCIe Inter-System Bridges use a least-cost, write-only protocol to deliver very high read/write performance. This allows direct connectivity between sensors and multiple CPUs, GPUs, FPGAs, SoCs, peripherals and next-generation storage within the entire vehicle.
MIPI CSI-2 Over Auto/TSN
Image sensors can connect via standard MIPI D-PHY and MIPI CSI-2, or other interfaces. Multicast functionality transports data from each image sensor to one or more central compute units under real-time conditions. Hence, Auto/TSN allows symmetric (e.g. PCIe-to-PCIe) and asymmetric (e.g. MIPI CSI-2-to-PCIe) communication schemes.
IEEE 1722 Video Transport Over Auto/TSN
Complementing the MIPI CSI-2 over Auto/TSN transport, MLE has also implemented a solution that follows the Raw Video PDU Format from IEEE 1722. Similarly, this IEEE 1722 Raw PDU Transport supports point-to-point connectivity or multicast, where one sensor's image data can be sent to multiple CPUs simultaneously.
Availability
Auto/TSN is available as a licensable integrated subsystem stack comprising digital circuit implementations and device driver software. This business model gives OEMs and Tier1s full control over how to integrate, either as a dedicated semiconductor component, or as modular function blocks inside a custom System-on-Chip with additional customer-specified functionality.
MLE has been working with key semiconductor partners to deliver FPGA and ASIC based implementations of Auto/TSN ready for design and for production. Our early access program supports OEMs and Tier1s to perform in-house benchmarking and validation of Auto/TSN.
Current implementations support gateway nodes with PCIe 3.0 and NVMe 1.2 with up to 4 lanes at 5 or 8 GT/s, MIPI D-PHY 2.0 with up to 4 lanes at 2 Gbps, MIPI CSI-2 2.0, and up to eight 1G/10G Ethernet ports over copper or fiber. “Lab Cars” based on professional 3rd party ASIC hardware emulators are available upon request.
Documentation
- Architecture and Performance of Integrated High-speed and Versatile Embedded Networking (presented at Embedded World 2024)
- Zone-Based Automotive Backbones Tunneling PCIe (presented at PCI-SIG Developers Conference 2021)
- PCIe-over-TCP-over-TSN-over-10/25 GigE (presented at the FPGA-for-ADAS Workshop 2020)
- Sensor Fusion and Data-in-Motion Processing for Autonomous Vehicles (presented at PCI-SIG Developers Conference 2019)
- PCIe Range Extension via Robust, Long Reach Protocol Tunnels (presented at PCI-SIG Developers Conference 2018)
Security & Trust
OP-TEE: Open-Source Secure Operating System
OP-TEE (Open Portable Trusted Execution Environment) is a small secure operating system which, after authentication and decryption, gets loaded into a secured area of memory. A Rich OS (e.g. Xilinx PetaLinux) driver can then request, via a Secure Monitor Call, the execution of a trusted application.
OP-TEE is an Open-Source initiative driven by a Linaro team who maintains the code and makes it available for download at GitHub.
MLE has ported OP-TEE to Xilinx Zynq UltraScale+ MPSoC and RFSoC devices and added device-specific optimizations. The outcome is two-fold: tightly integrated open source maintained by experts in Xilinx System-on-Chip and ACAP technology, and additional professional services for customization and product life-cycle support.
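To give a flavor of how a rich-OS application requests a trusted application, here is a minimal client sketch using the GlobalPlatform TEE Client API (libteec) that ships with OP-TEE. The UUID and command ID are placeholders for your own trusted application, and error handling is reduced to the essentials.

```c
/* Minimal OP-TEE client sketch using the GlobalPlatform TEE Client API.
 * TA_UUID and CMD_EXAMPLE are placeholders for your own trusted application. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <tee_client_api.h>

#define TA_UUID { 0x12345678, 0x5b69, 0x11e4, \
                  { 0x9d, 0xbb, 0x10, 0x1f, 0x74, 0xf0, 0x00, 0x99 } }
#define CMD_EXAMPLE 0   /* hypothetical command ID of the trusted application */

int main(void)
{
    TEEC_Context   ctx;
    TEEC_Session   sess;
    TEEC_Operation op;
    TEEC_UUID      uuid = TA_UUID;
    uint32_t       origin;

    if (TEEC_InitializeContext(NULL, &ctx) != TEEC_SUCCESS)
        return 1;
    if (TEEC_OpenSession(&ctx, &sess, &uuid, TEEC_LOGIN_PUBLIC,
                         NULL, NULL, &origin) != TEEC_SUCCESS)
        return 1;

    memset(&op, 0, sizeof(op));
    op.paramTypes = TEEC_PARAM_TYPES(TEEC_VALUE_INOUT, TEEC_NONE,
                                     TEEC_NONE, TEEC_NONE);
    op.params[0].value.a = 42;

    /* The invocation traps into the secure world via a Secure Monitor Call. */
    if (TEEC_InvokeCommand(&sess, CMD_EXAMPLE, &op, &origin) == TEEC_SUCCESS)
        printf("trusted application returned: %u\n", op.params[0].value.a);

    TEEC_CloseSession(&sess);
    TEEC_FinalizeContext(&ctx);
    return 0;
}
```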
Key Features of OP-TEE in FPGAs
- Enables running secure & trusted applications from within a rich Linux operating system
- Utilizes standard ARM Trusted Execution Environment (TEE)
- Utilizes advanced security functions in Xilinx Zynq UltraScale+ MPSoC and RFSoC devices
- Optional hardware acceleration for AES-GCM, RSA, SHA3, etc.
- Optional secure key handling with integrated PUF (Physically Unclonable Function) support
- Optional handling for integrated eFUSE burning
- Secure and non-secure bitstream loading
- Support for custom secure functions in Programmable Logic
Applications
- Secure data storage
- Secure communication
- Secure Over-the-Air (SOTA) updates
- Key to meeting compliance with standards such as IEC 62443, ISO/IEC 27001, etc.
- Protect Functional Safety (SIL, ASIL) related designs
- Secure touch inputs
- Secure key handling
Pricing
MLE OP-TEE is available as pure Open-Source or as a professionally maintained source code deliverable:
Product Name | Deliverables | Pricing |
---|---|---|
OP-TEE Open-Source Edition (OP-TEE Free) | Licensed under BSD / Linaro terms and available for download from GitHub. | Free of charge |
OP-TEE Professional Edition (OP-TEE PRO) | MLE Single-Site or Multi-Site Source Code License. Delivered by MLE in electronic form. | Annual subscription fees starting at $42,800.- |
Application / Project specific Expert Design Services | System-level design, modeling, implementation and test for realizing domain-specific secure applications. | $1,880.- per engineering day (or fixed-price project fee) |
OP-TEE Free
(Open Source Edition)
The OP-TEE Open Source Edition for Zynq UltraScale+ MPSoC and RFSoC is licensed under Linaro / BSD license as Open Source and comes with all source code and necessary packages. This version is ideal to explore the TEE world and develop your own trusted application.
Key Features and Benefits:
- Open Source and Free of Charge
- Runs in external PS-attached DDR memory
- No Hardware acceleration for AES, RSA, SHA3
- No access to PUF
OP-TEE PRO
(Professional Edition)
MLE OP-TEE PRO can be licensed from MLE and will provide all source code and necessary packages to run OP-TEE on Zynq UltraScale+ MPSoC and RFSoC.
Key Features and Benefits:
- Hardware acceleration for AES
- Hardware acceleration for RSA
- Hardware acceleration for SHA3
- With secure boot: access to PUF (Physically Unclonable Function) functionality
- Can load OP-TEE into TCM
Comparison Between OP-TEE Free And OP-TEE PRO
[Comparison table: functionality supported by OP-TEE Free vs. OP-TEE PRO (extended)]
Documentation
- Security / Trusted Execution Environment and Functional Safety with Zynq UltraScale+ MPSoC / RFSoC (presentation at the 3rd Workshop “Programmable Processing for the Autonomous / Connected Vehicle 2019”)
- OP-TEE online documentation
- OP-TEE Security Advisories
- Developer Services for Security from Linaro
- Xilinx Security Website (including presentations by MLE, available under NDA only)
- Whitepaper 513 from Xilinx on IEC 62443 Compliant Product Enablement
Frequently Asked Questions
What is the difference between MLE OP-TEE PRO and the open source edition?
Most of the Zynq UltraScale+ MPSoC / RFSoC (ZynqMP) specific code for OP-TEE is currently going upstream to become part of the free open-source edition.
However, there are functions of ZynqMP which require special handling, like the one-time-programmable eFuses, or support for custom secure PL functions, for example. Such device and/or application specific security functions of ZynqMP will be covered only by the PRO edition. Please refer to the comparison table above.
Is source code for OP-TEE PRO available?
Yes, MLE ships source code for OP-TEE PRO. Those ZynqMP platform specific code portions are available today and have passed review by the Xilinx Security Center of Excellence (COE). The most recent review was November 2019.
What is the cost of OP-TEE Free?
OP-TEE Free is free-of-charge open-source software (FOSS) and can be downloaded from here: https://github.com/OP-TEE/ under a BSD 2-Clause License.
What is the cost of OP-TEE PRO?
Please refer to the Pricing section above.
Function Accelerators
NICs, SmartNICs, and Function Accelerator Cards with FPGA Network Accelerators
A Network Interface Card (NIC) is a component that connects computers via networks, these days mostly via IEEE Ethernet – but what makes a NIC a SmartNIC? And how can an FPGA network accelerator make it operate more efficiently and enhance its performance to deliver deterministic networking?
With the push for Software-Defined Networking, (mostly open-source) software running on standard server CPUs became a more flexible and cost-effective alternative to custom networking silicon and appliances. However, in the post-Dennard-scaling era, server CPU performance improvements cannot keep up with the increasing computational demand of faster network port speeds.
This widening performance gap creates the need for so-called SmartNICs. SmartNICs not only implement a Domain-Specific Architecture for network processing but also offload host CPUs from running portions of the network processing stack and thereby free up CPU cores to run the “real” application.
According to Gartner, Function Accelerator Cards (FACs) incorporate functions on the NIC that would have been done on dedicated network appliances. Hence, all FACs are essentially NICs, but not all NICs/SmartNICs are FACs. When deployed properly, FACs can increase bandwidth performance, can reduce transport latencies and can improve compute efficiency, which translates to less energy consumption.
Features of FPGA Network Accelerator
Ultra-Reliable, Low-Latency, Deterministic Networking
With ultra-reliable, low-latency, deterministic networking we have borrowed a concept from 5G wireless communication (5G URLLC) and have applied this to LAN (Local Area Network) and WAN (Wide Area Network) wired communication:
- Ultra-Reliable means no packets get lost in transport
- Low-Latency means that packets get processed by a FAC at a fraction of CPU processing times
- Deterministic means that there is an upper bound for transport and for processing latency
We do this by combining the TCP protocol, fully accelerated (in FPGA or ASIC using NPAP), with TSN (Time Sensitive Networking) optimized for stream processing at data rates of 10/25/50/100 Gbps. These so-called TCP-TSN-Cores, the FPGA network accelerators, not only give us precise time synchronization but also traffic shaping, traffic scheduling and stream reservation with priorities.
We believe that FPGAs are very well positioned as programmable compute engines for network processing because FPGAs can implement “stream processing” more efficiently than CPUs or GPUs can. In particular, when the networking data stays local to the FPGA fabric, data-in-motion processing can be done within hundreds of clock cycles (which is hundreds of nanoseconds), and results can be sent back a few hundred clock cycles later, an aspect which is referred to as Full-Accelerated In-Network Compute.
While FPGA technology has been on the forefront of Moore’s Law and modern devices such as AMD/Xilinx Versal Prime or Intel Agilex or Achronix Speedster7t can hold millions of gates, FPGA processing resources must be used wisely, when Bill-of-Materials costs are important. Therefore, at MLE we have put together a unique combination of FPGA and open-source software to achieve best-in-class performance while addressing cost metrics more in-line with CPU-based SmartNICs.
Unique and Cost-Efficient Combination of Open Source
The open-source technologies we borrow from:
- Meanwhile highly optimized for networking
- An open-source multi-layer network switch
- An open-source High-Level Synthesis engine
- The GitHub project focusing on AMD/Xilinx Alveo cards
- The High-Level Synthesis frontend for Xilinx FPGAs
High-Level Synthesis plays a vital role in our implementation as it allows MLE and MLE customers to turn algorithms implemented in C/C++/SystemC into efficient FPGA logic which is portable between different FPGA vendors.
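To give a flavor of what this looks like in practice, here is a minimal HLS-style C sketch (not taken from the actual FAC design): a simple function annotated with synthesis pragmas so that an HLS tool such as AMD/Xilinx Vitis HLS can turn it into a pipelined hardware block with AXI interfaces. Function name, interfaces and pragma choices are illustrative assumptions only.

```c
/* Minimal HLS-style C sketch: sum a fixed-size buffer. The pragmas hint the
 * HLS tool to pipeline the loop and to expose AXI interfaces. Names and
 * interfaces are illustrative and not part of the actual FAC design. */
#include <stdint.h>

#define N 1024

void vector_sum(const uint32_t in[N], uint32_t *result)
{
#pragma HLS INTERFACE m_axi     port=in     offset=slave
#pragma HLS INTERFACE s_axilite port=result
#pragma HLS INTERFACE s_axilite port=return

    uint32_t acc = 0;
    for (int i = 0; i < N; i++) {
#pragma HLS PIPELINE II=1
        acc += in[i];
    }
    *result = acc;
}
```

The same source remains plain C, so it can be compiled and verified in software before synthesis.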
To build a high-performance FAC platform, portions of the above have been integrated together with proven 3rd party networking technologies:
- NPAP, the Network Protocol Accelerator Platform which is a TCP/UDP/IP Full Accelerator that comes from Fraunhofer HHI
- TSN, which is Time Sensitive Networking, a collection of IEEE Standards implemented by Fraunhofer IPMS
Corundum In-Network Compute + TCP Full Accelerator
Corundum is an open-source FPGA-based NIC which features a high-performance datapath between multiple 10/25/50/100 Gigabit Ethernet ports and the PCIe link to the host CPU. Corundum has several unique architectural features: For example, transmit, receive, completion, and event queue states are stored efficiently in block RAM or ultra RAM, enabling support for thousands of individually-controllable queues.
MLE is a contributor to the Corundum project. Please visit our Developer Zone for services and downloads for Corundum full system stacks pre-built for various in-house and off-the-shelf FPGA boards.
MLE combines the Corundum NIC with NPAP, the TCP/UDP/IP Full Accelerator from Fraunhofer HHI, via a so-called TCP Bypass which minimizes the processing latency of network packets: each packet gets processed in parallel by the Corundum NIC and by NPAP. The moment it can be determined that the packet shall be handled by NPAP (based on IP address and port number), this packet gets invalidated inside the Corundum NIC. If a packet shall not be processed by NPAP, it gets dropped in NPAP and will solely be processed by the Corundum NIC.
Fundamentally, this implements network protocol processing in multiple stages: latency-sensitive network data gets processed using full acceleration, while all other network traffic is handled by a companion CPU and/or by the host CPU.
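Conceptually, the bypass decision reduces to a per-packet classification: flows that NPAP terminates (identified by IP address and port number) are consumed by the accelerator and invalidated in the Corundum NIC, while everything else stays on the regular NIC path. The sketch below models only that dispatch decision in plain C; the flow table, field names and dispatch mechanism are simplified placeholders, not the actual RTL or register interface.

```c
/* Conceptual model of the TCP Bypass dispatch: packets whose destination
 * IP/port matches an NPAP-terminated flow are consumed by the accelerator,
 * all others remain on the Corundum NIC path. Purely illustrative. */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct flow { uint32_t dst_ip; uint16_t dst_port; };

/* Hypothetical table of flows terminated in the FPGA by NPAP. */
static const struct flow npap_flows[] = {
    { 0xC0A80164u, 5001 },   /* 192.168.1.100:5001 */
};

static bool handled_by_npap(uint32_t dst_ip, uint16_t dst_port)
{
    for (size_t i = 0; i < sizeof(npap_flows) / sizeof(npap_flows[0]); i++)
        if (npap_flows[i].dst_ip == dst_ip && npap_flows[i].dst_port == dst_port)
            return true;
    return false;
}

int main(void)
{
    uint32_t ip   = 0xC0A80164u;  /* example packet header fields */
    uint16_t port = 5001;

    if (handled_by_npap(ip, port))
        printf("packet consumed by NPAP, invalidated in the Corundum NIC\n");
    else
        printf("packet dropped in NPAP, processed by the Corundum NIC\n");
    return 0;
}
```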
Applications of FPGA Network Accelerator
MLE’s Network Accelerators are of particular value where network bandwidth and latency constraints are key:
- Wired and Wireless Networking
- Acceleration of Software-Defined Wide Area Networks (SD-WAN)
- Video Conferencing
- Online Gaming
- Industrial Internet-of-Things (IIoT)
- Handling of Application Oriented Network Services
- Mobile 5G User-Plane Function Acceleration
- Mobile 5G URLLC Core Network Processing with TSN
- Offloading OpenvSwitch (OvS), vRouter, etc
Key Benefits
The following shows the key benefits of MLE’s technology by comparing open-source SD-WAN switching in native CPU software mode against MLE’s FPGA Network Accelerator:
Compared with plain CPU software processing, MLE’s Ultra-Reliable Low-Latency Deterministic Networking increases network bandwidth and throughput close to Ethernet line rates, in particular for smaller packets, which reduces the need for over-provisioning within the backbone. And processing latencies can be shortened significantly, which is important, for example, when delivering a lively audio/video conferencing experience over a WAN.
Availability
MLE’s FPGA Network Accelerator is available as a licensable full system stack and delivered as an integrated hardware/firmware/software solution. In close collaboration with partners in the FPGA ecosystem, MLE has ported and tested variations of the stack on a growing list of FPGA cards. Currently, this list comprises high-performance 3rd party hardware as well as MLE-designed cost-optimized hardware:
FPGA Card | Hardware Description & Features | Status |
---|---|---|
NPAC-Ketch | MLE-designed single-slot FHHL PCIe card | Available (Inquire) |
Alveo U280 | AMD/Xilinx-designed dual-slot FHFL PCIe card | Early Access |
N6000-PL | Intel-designed single-slot FHHL PCIe card | Early Access |
Documentation
- Ultra-Reliable, Low-Latency, Deterministic Networking
- Function Accelerator Card – NPAC-40G
Solutions
FPGA-Based Accelerator and Security Solutions
To meet the growing demands of modern systems for higher data throughputs, lower processing latencies and heightened security, MLE leverages FPGAs to accelerate software-rich system stacks and protocols. Our solutions typically ship as FPGA Full System Stacks comprising software (based on Linux), hardware (based on vetted 3rd party FPGA boards), and FPGA design projects.
Automotive
Modern Zone-based automotive architectures require high data rates and real-time behavior at the same time. MLE solutions include stacks and subsystems for in-vehicle networking, and prototyping systems for efficient development and testing.
Industrial & Robotics
Generative AI and high-data-rate machine vision push bandwidth demands for real-time industrial networks beyond 10 Gbps. MLE solutions include SmartNICs and access points for low-latency, low-jitter real-time networking, and for virtualizing PLCs.
MLE provides FPGA Function Accelerator Cards and SmartNICs for wired and wireless communications. A unique combination of technology from Fraunhofer HHI plus MLE’s patented and patent-pending Full Accelerators accelerates Software-Defined Networking (SDN) and SD-WAN.
Storage Solutions
Next-generation storage protocols such as NVMExpress (NVMe) provide significant performance benefits and, when combined with FPGAs, can be used as storage acceleration IP cores for Computational Storage, Data-in-Motion processing and high-speed data capture and recording.
Test & Measurement
MLE has been working with 3rd party system companies to deliver reliable, customizable turnkey solutions for high-speed data acquisition, recording and replay, as they often are needed in Test & Measurement systems for automotive, aerospace, defense, industrial, etc.
Security and Trust
MLE provides solutions to secure products based on FPGA technology: MLE has ported OP-TEE to AMD/Xilinx Zynq UltraScale+ MPSoC and RFSoC including support for black keys, PUFs, eFUSES, etc. Furthermore, MLE has introduced networking security products for deep packet inspection and Data Diodes.