Latest update: 2021-07-14
This guide illustrates how to create and integrate an accelerator designed in Verilog, SystemVerilog or VHDL with ESP.
Note: This RTL accelerator design flow is a preliminary version. Like the other accelerator design flows in ESP, it automates the integration of the accelerator and generates the Linux device driver and the skeletons of the bare-metal and Linux test applications. However, it generates an empty top module for the accelerator, and implementing the accelerator is left to the designer.
Note: Make sure to complete the prerequisite tutorials before getting started with this one. This tutorial assumes that accelerator designers are familiar with the ESP infrastructure and know how to run basic Make targets to create a simple instance of ESP that integrates a single core.
- 1. Accelerator design
- 2. Accelerator integration
1. Accelerator design
ESP provides an interactive script that generates a skeleton of the accelerator and its software test applications. It also generates the accelerator device driver. In the preliminary version of this flow, the accelerator skeleton simply consists of an empty Verilog top level of the accelerator, which has the correct interface to allow for an automated integration in ESP. You can rewrite the skeleton in another RTL language as long as the interface and functionality remain the same; at the moment ESP supports Verilog, SystemVerilog and VHDL.
Even if the accelerator skeleton is empty, this flow leverages the same interactive script used by other accelerator design flows in ESP.
# Move to the ESP root folder
cd <esp>

# Run the accelerator initialization script and respond as follows
./tools/accgen/accgen.sh

=== Initializing ESP accelerator template ===

  * Enter accelerator name [dummy]: example
  * Select design flow (Stratus HLS, Vivado HLS, hls4ml, RTL) [S]: R
  * Enter ESP path [/home/davide/Repos/esp/esp-rtlflow]:
  * Enter unique accelerator id as three hex digits [04A]: 075
  * Enter accelerator registers
    - register 0 name [size]: reg1
    - register 0 default value : 8
    - register 0 max value : 8
    - register 1 name : reg2
    - register 1 default value : 8
    - register 1 max value : 8
    - register 2 name : reg3
    - register 2 default value : 8
    - register 2 max value : 8
    - register 3 name :
  * Configure PLM size and create skeleton for load and store:
    - Enter data bit-width (8, 16, 32, 64) :
    - Enter input data size in terms of configuration registers (e.g. 2 * reg2}) [reg2]:
      data_in_size_max = 8
    - Enter output data size in terms of configuration registers (e.g. 2 * reg2) [reg2]:
      data_out_size_max = 8
    - Enter an integer chunking factor (use 1 if you want PLM size equal to data size) :
      Input PLM has 8 32-bits words
      Output PLM has 8 32-bits words
    - Enter number of input data to be processed in batch (can be function of configuration registers) :
      batching_factor_max = 1
    - Is output stored in place? [N]:

=== Generated accelerator skeleton for example ===
The entries of this configuration script are described in detail in the guide for the SystemC accelerator flow with Stratus HLS. In this case, however, the generated accelerator is empty. Thus, the default and max values of the configuration registers, as well as the answers to the questions that follow, are only used to create the skeleton of the test applications. The register names, instead, are used in various places, including the interface of the empty accelerator and the accelerator XML file that ESP uses to generate the accelerator tile socket.
Executing the initialization script with the above parameters generates the empty accelerator skeleton, located at the path <esp>/accelerators/rtl/example_rtl/hw/src.
In addition, the accelerator’s device driver, bare-metal application and user-space Linux application are generated at the path <esp>/accelerators/rtl/example_rtl/sw.
# Complete list of generated files
<esp>/accelerators/rtl/example_rtl/
├── hw
│   ├── example.xml                   # Accelerator description and register list
│   ├── hls
│   │   └── Makefile -> ../../../common/hls/Makefile
│   └── src
│       ├── example_rtl_basic_dma32
│       │   └── example_rtl_basic_dma32.v   # Empty top level of the accelerator (32bit SoC)
│       └── example_rtl_basic_dma64
│           └── example_rtl_basic_dma64.v   # Empty top level of the accelerator (64bit SoC)
└── sw
    ├── baremetal                     # Bare metal test application
    │   ├── example.c
    │   └── Makefile
    └── linux
        ├── app                       # Linux test application
        │   ├── cfg.h
        │   ├── example.c
        │   └── Makefile
        ├── driver                    # Linux device driver
        │   ├── example_rtl.c
        │   ├── Kbuild
        │   └── Makefile
        └── include
            └── example_rtl.h
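To quickly confirm that the initialization script ran to completion, you can check for the key files of the tree above. The following is a minimal sketch: `check_rtl_skeleton` is a hypothetical helper, not part of ESP, and it assumes the accelerator was named `example` as in this guide. It prints every missing file and returns a non-zero status if any is absent.

```shell
#!/bin/sh
# Hypothetical helper, not part of ESP: verify that accgen.sh generated the
# skeleton files listed in the tree above for the "example" accelerator.
# Usage: check_rtl_skeleton <esp-root>
check_rtl_skeleton() {
    acc_dir="$1/accelerators/rtl/example_rtl"
    missing=0
    for f in \
        hw/example.xml \
        hw/src/example_rtl_basic_dma32/example_rtl_basic_dma32.v \
        hw/src/example_rtl_basic_dma64/example_rtl_basic_dma64.v \
        sw/baremetal/example.c \
        sw/linux/app/cfg.h \
        sw/linux/app/example.c \
        sw/linux/driver/example_rtl.c \
        sw/linux/include/example_rtl.h
    do
        # Report every missing file instead of stopping at the first one.
        if [ ! -f "$acc_dir/$f" ]; then
            echo "missing: $acc_dir/$f"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_rtl_skeleton /path/to/esp && echo "skeleton OK"`, where `/path/to/esp` stands for your ESP root folder.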
Accelerator behavior implementation
For this preliminary RTL accelerator flow, this step consists of fully
implementing the accelerator, starting from the empty skeleton
generated at the path <esp>/accelerators/rtl/example_rtl/hw/src.
The interface of the empty accelerator must not be modified,
because it is the interface that ESP expects and it allows for an
automated integration. In addition to the interface definition, the
empty accelerator has a few assign statements. These can be removed,
as they are only there to assign a constant value to some of the
outputs and to raise the acc_done signal as soon as the accelerator
receives the configuration. They allow you to run the bare-metal and
Linux test applications to completion even with the empty skeleton;
otherwise, the applications would get stuck because of undefined
outputs and the acc_done signal never being raised.
To design the accelerator body, you must comply with the ESP accelerator interface specifications, which are described in detail in the dedicated ESP accelerator interface guide.
The current implementation of the flow does not generate a testbench to test the accelerator in isolation. The remainder of this guide shows how to test the accelerator as part of a full SoC.
Choose the FPGA board or ASIC technology that you want to target for your SoC. The design paths in this tutorial refer to the Xilinx VC707 evaluation FPGA board, but all instructions are valid for any of the supported boards or ASIC technologies.
After you create the
example_rtl accelerator, ESP automatically
discovers it in the library of components and generates a set of
make targets for it. Here are the instructions to install the accelerator:
# Move to the Xilinx VC707 working folder
cd <esp>/socs/xilinx-vc707-xc7vx485t

# Install (i.e. copy) the accelerator RTL to the `tech/virtex7/acc` folder
make example_rtl-hls
Only after this step will you be able to instantiate the accelerator in an SoC with the ESP SoC configuration GUI.
2. Accelerator integration
It is recommended to try the following steps before editing the
accelerator and software automatically generated in the previous
steps. Since this flow generates an empty accelerator, the full-system
RTL simulation and the FPGA prototyping tests will not perform
meaningful computation. However, the accelerator will still receive
the configuration and raise the acc_done signal to notify the
CPU. Testing this confirms that the accelerator is correctly
integrated, and it provides a good baseline on top of which to start
implementing the accelerator.
Bare-metal and user applications implementation
In this tutorial we select the RISC-V Ariane core and use the corresponding paths to the software source code. Please note, however, that all instructions are valid for the other CPUs available in ESP (e.g. Leon3, Ibex).
Both baremetal and Linux test applications for the
accelerator are generated at the path
<esp>/accelerators/rtl/example_rtl/sw. To complete them, you need
to apply the same edits to both the baremetal and the Linux application:
the changes consist of initializing the inputs and the golden outputs.
More details on this step are given in the guide for the SystemC
accelerator flow with Stratus HLS.
The final steps of the tutorial coincide with those presented in the tutorial about the SystemC accelerator flow with Stratus HLS. We recommend that you review those steps if you are not familiar with ESP. More generally, the SoC design flow is the same regardless of which design flow was used to generate or integrate an accelerator.
# Move to the Xilinx VC707 working folder
cd <esp>/socs/xilinx-vc707-xc7vx485t
Follow the “Debug link configuration” instructions from the “How to: design a single-core SoC” guide. Then configure the SoC using the ESP configuration GUI.
# Run the ESP configuration GUI
make esp-xconfig
Select the processor that you prefer in the “CPU Architecture” frame and enable or disable the caches in the “Cache configuration” frame as you please. In this guide we use Ariane without caches. Select a 2x2 layout and set 1 memory tile, 1 processor tile, 1 I/O tile and 1
EXAMPLE_RTL tile. The implementation for
EXAMPLE_RTL defaults to basic_dma64, because Ariane-based ESP
SoCs are 64-bit systems.
Users can run a full-system RTL simulation of the
EXAMPLE_RTL accelerator driven by the
baremetal application running on the processor tile and invoking the accelerator.
# Compile baremetal application
make example_rtl-baremetal

# Modelsim
TEST_PROGRAM=./soft-build/<cpu>/baremetal/example_rtl.exe make sim[-gui]

# Incisive
TEST_PROGRAM=./soft-build/<cpu>/baremetal/example_rtl.exe make ncsim[-gui]
<cpu> corresponds to
ariane because we selected the Ariane core in the “SoC Configuration” step.
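Because the simulation loads the executable produced by the `example_rtl-baremetal` target, the two commands above must run in that order. As a sketch, they can be wrapped in a small shell function; `run_example_rtl_sim` is a hypothetical name, not an ESP make target, and it assumes you run it from the SoC working folder (e.g. `<esp>/socs/xilinx-vc707-xc7vx485t`) after configuring the SoC.

```shell
#!/bin/sh
# Hypothetical wrapper, not part of ESP: compile the bare-metal test and run
# the full-system RTL simulation with the make targets shown above.
# Usage: run_example_rtl_sim <cpu> [sim | sim-gui | ncsim | ncsim-gui]
#   sim, sim-gui     -> Modelsim
#   ncsim, ncsim-gui -> Incisive
run_example_rtl_sim() {
    cpu="$1"
    target="${2:-sim}"
    if [ -z "$cpu" ]; then
        echo "usage: run_example_rtl_sim <cpu> [sim|sim-gui|ncsim|ncsim-gui]" >&2
        return 2
    fi
    case "$target" in
        sim|sim-gui|ncsim|ncsim-gui) ;;
        *)
            echo "unknown simulation target: $target" >&2
            return 2
            ;;
    esac
    # The simulation loads the executable built by this target,
    # so build it first and stop if the build fails.
    make example_rtl-baremetal || return 1
    # Set TEST_PROGRAM only for this make invocation, as in the commands above.
    TEST_PROGRAM="./soft-build/$cpu/baremetal/example_rtl.exe" make "$target"
}
```

For the configuration used in this guide, `run_example_rtl_sim ariane sim-gui` compiles the test and opens the Modelsim GUI.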
Follow the “FPGA prototyping” instructions from the “How to: design a single-core SoC” guide.
The only difference is that, just like for the RTL simulation, you
need to specify the
TEST_PROGRAM variable when launching the
bare-metal test on FPGA:
TEST_PROGRAM=./soft-build/<cpu>/baremetal/example_rtl.exe make fpga-run
To run the Linux application, log into Linux from the ESP Linux terminal and launch the
example_rtl test application:
cd /applications/test/
./example_rtl.exe