Guide – How to: integrate ASIC technology into ESP
Latest update: 2024-07-07
This guide illustrates how to set up your ASIC project and how to integrate ASIC technology into ESP. At the moment, ESP memory integration supports only accelerators designed with Stratus HLS (How to: design an accelerator in SystemC (Cadence Stratus HLS)) and SoCs that utilize the RISC-V Ariane core.
- 1. Project setup
- 2. Memory integration
- 3. Pad integration
- 4. SoC generation
- 5. Clocking strategy and DCO integration
Note: Make sure to complete the prerequisite tutorials before getting started with this one. This tutorial assumes that designers are familiar with the ESP infrastructure and know how to run basic make targets to create a simple instance of ESP, integrating just a single core.
0. Prerequisites
ASIC technology integration requires at least the Verilog and Liberty models/views for high-level synthesis (HLS) and RTL simulation. In particular, this tutorial uses Skywater 130nm as the example technology, which requires the Skywater PDK and OpenRAM. For installation, please refer to the respective documentation of these projects.
1. Project setup
The ESP FPGA flow uses the FPGA-specific design folders under <esp>/socs as the working directory for SoC configuration, simulation, and FPGA prototyping. The ESP ASIC flow, instead, creates a directory structure outside the ESP repository. This structure has two main folders in addition to the ESP repository: 1) a technology folder and 2) a project folder. The technology folder has five directories: lib for the memory, standard-cell, and gpio liberty files; verilog for the memory, standard-cell, and gpio verilog models; mem_wrappers for the memory wrappers; pad_wrappers for the IO wrappers; and dco_wrappers for the DCO wrappers. The project folder is the working directory for SoC configuration, RTL simulation, and all downstream steps of the ASIC flow.
To create the ASIC directory structure, execute the following steps:
# Create the main folder for your project and move into it
mkdir project_x
cd project_x
# Clone the ESP repository
git clone git@github.com:sld-columbia/esp.git
# Interactively install the submodules
cd esp
./utils/scripts/submodule_init.sh
# Move to the asicgen folder
cd tools/asicgen
# Edit asic_defs.sh with your project version and the target technology
vim asic_defs.sh
DIRTECH_NAME="sky130"
PROJECT_NAME="project_x_v1.0"
# Execute the asic_dirstruct.sh script
./asic_dirstruct.sh
Once you have executed the steps above, the project’s directory structure will be as follows:
project_x
├── project_x_v1.0
│ ├── Makefile
│ ├── ESP_ASIC_TOP.vhd # ESP top level RTL
│ ├── chip_emu_top.vhd # ESP top level wrapper to enable FPGA emulation
│ ├── esp_defconfig # ESP SoC default configuration
│ ├── fpga_proxy_top.vhd # FPGA proxy
│ ├── grlib_config.in
│ ├── grlib_defconfig
│ ├── pads_loc.vhd # default pad locations (horizontal = 1, vertical = 0)
│ ├── vsim.tcl # Modelsim script for simulation using the DCO
│ ├── systest.c # baremetal test program for the main system
│ ├── testbench.vhd # Top level testbench
│ └── top.vhd # Top level composed of chip_emu_top and fpga_proxy_top
├── sky130
│ ├── lib # Technology folder for liberty files
│ ├── verilog # Technology folder for memory verilog models
│ ├── dco_wrappers # DCO verilog wrappers folder
│ ├── mem_wrappers # Memory verilog wrappers folder
│ │ └─ tb # Memory verilog wrappers testbench folder
│ └── pad_wrappers # IO verilog wrappers folder
└── esp
From now on, project_x_v1.0 is the working directory and the esp repository is used as is.
Note: Do not create the directories manually, only use the provided scripts to do so!
2. Memory integration
ESP memory requirements
Although we work hard to make ESP as flexible as possible, the ESP architecture has a few memory requirements listed below for each tile.
IO/AUX tile:
Two dual-port (1RW1R) memories are needed for the Ethernet MAC when Ethernet is enabled in the SoC. The first has a depth of 4096 words and 16-bit words (4096x16). The second has a depth of 256 words and 32-bit words (256x32). Alternatively, these memories can be implemented as register files.
One single-port memory with a depth of 2048 words and 8-bit words (2048x8).
CPU tile:
One single port memory with a depth of 256 words and 64-bit words (256x64)
for the Ariane L1 cache.
One single port memory with depth of user’s choice and 64-bit words for L2
cache lines.
One single port memory with depth of user’s choice and 24-bit words for L2
cache line metadata. This memory must support a bit-granularity write mask.
MEM tile:
One single port memory with depth of user’s choice and 64-bit words for LLC
cache lines.
One single port memory with depth of user’s choice and 28-bit words for the
LLC directory. This memory must support a bit-granularity write mask.
One single port memory with depth of user’s choice and 16-bit words for LLC
sharer’s lists.
Note: The chosen depth for L2 and LLC caches must match. For instance, if a depth of 256 is chosen for the L2 cache lines, then the same address size must be used for the L2 cache line metadata and all LLC memories.
SLM tile:
One single port memory with depth of user’s choice and 64-bit words.
Accelerator tile:
For the Stratus HLS design flow, ESP uses a custom tool (plmgen) to generate the private local memory (PLM) banks for the accelerator. plmgen can use both single-port (1RW) and dual-port (2RW) memories of any size. The tool identifies all available memories and automatically generates memory banks with the objective of minimizing area. Depending on the characteristics of the given accelerator, some memory sizes will yield better performance and area. We suggest making several memory sizes available for both port types, for instance: 256x8, 256x16, 256x32, 256x64, 512x32, 512x64, 1024x32, 1024x64.
Technology setup
Once you have generated all required memory macros for your target technology, copy or link the models to the verilog folder in your project. In this guide we use soft links to avoid keeping multiple copies of the technology files.
Link the memory, standard-cell, and IO verilog models inside project_x/sky130/verilog:
project_x/sky130/verilog > ln -sf /tech/sky130/memories/sky130_sram_dp_256x32/sky130_sram_dp_256x32.v
project_x/sky130/verilog > ln -sf /tech/sky130/memories/sky130_sram_dp_4096x16/sky130_sram_dp_4096x16.v
project_x/sky130/verilog > ln -sf /tech/sky130/memories/sky130_sram_dp_512x32/sky130_sram_dp_512x32.v
project_x/sky130/verilog > ln -sf /tech/sky130/memories/sky130_sram_sp_2048x8/sky130_sram_sp_2048x8.v
project_x/sky130/verilog > ln -sf /tech/sky130/memories/sky130_sram_sp_256x64/sky130_sram_sp_256x64.v
project_x/sky130/verilog > ln -sf /tech/sky130/memories/sky130_sram_sp_512x16/sky130_sram_sp_512x16.v
project_x/sky130/verilog > ln -sf /tech/sky130/memories/sky130_sram_sp_512x64/sky130_sram_sp_512x64.v
project_x/sky130/verilog > ln -sf /tech/sky130/libraries/sky130_fd_sc_hd/latest/verilog/sky130_fd_sc_hd.v
project_x/sky130/verilog > ln -sf /tech/sky130/libraries/sky130_fd_io/latest/verilog/sky130_fd_io__top_gpiov2.behavioral.v
Link the memory, standard-cell, and IO liberty files inside project_x/sky130/lib:
project_x/sky130/lib > ln -sf /tech/sky130/memories/sky130_sram_dp_256x32/sky130_sram_dp_256x32_TT_1p8V_25C.lib
project_x/sky130/lib > ln -sf /tech/sky130/memories/sky130_sram_dp_4096x16/sky130_sram_dp_4096x16_TT_1p8V_25C.lib
project_x/sky130/lib > ln -sf /tech/sky130/memories/sky130_sram_dp_512x32_1r1w/sky130_sram_sp_512x32_TT_1p8V_25C.lib
project_x/sky130/lib > ln -sf /tech/sky130/memories/sky130_sram_sp_2048x8/sky130_sram_sp_2048x8_TT_1p8V_25C.lib
project_x/sky130/lib > ln -sf /tech/sky130/memories/sky130_sram_sp_256x64/sky130_sram_sp_256x64_TT_1p8V_25C.lib
project_x/sky130/lib > ln -sf /tech/sky130/memories/sky130_sram_sp_512x16/sky130_sram_sp_512x16_TT_1p8V_25C.lib
project_x/sky130/lib > ln -sf /tech/sky130/memories/sky130_sram_sp_512x24/sky130_sram_sp_512x24_TT_1p8V_25C.lib
project_x/sky130/lib > ln -sf /tech/sky130/memories/sky130_sram_sp_512x28/sky130_sram_sp_512x28_TT_1p8V_25C.lib
project_x/sky130/lib > ln -sf /tech/sky130/memories/sky130_sram_sp_512x64/sky130_sram_sp_512x64_TT_1p8V_25C.lib
project_x/sky130/lib > ln -sf /tech/sky130/libraries/sky130_fd_sc_hd/latest/timing/sky130_fd_sc_hd__tt_025C_1v80.lib
project_x/sky130/lib > ln -sf /tech/sky130/libraries/sky130_fd_io/latest/timing/sky130_fd_io__top_gpiov2_tt_tt_025C_1v80_3v30.lib
Note 1: In this guide, we use a dual-port memory with a word length of 32 bits to replace the single-port memory with a word length of 28 bits for illustration purposes only.
Note 2: In this guide, we use a single liberty corner for simplicity. We highly recommend, however, linking multiple liberty corners to enable multi-mode multi-corner timing analysis.
In the next section, we will generate the memory wrapper skeletons and edit them to comply with the interfaces expected by ESP and used by the memory macros generated by OpenRAM.
ASIC memory list file
The memory wrapper skeletons are automatically generated based on the
asic_memlist.txt
specification file. The user needs to create this file in
the project working directory (project_x/project_x_v1.0
) with the following
syntax:
<type> <macro_name> <addr_size> <word_size> <area> <port_type>
Where:
<type> is the component type (llc, l2, l1, slm, io, acc)
<macro_name> is the memory model's module name
<addr_size> is the memory depth
<word_size> is the memory word size
<area> is the memory's area
<port_type> is the memory's number of ports (1RW, 2RW, 1RW1R)
The following is an example of asic_memlist.txt for the sky130 memories.
llc sky130_sram_sp_512x64 512 64 283536 1rw
llc sky130_sram_dp_512x32 512 28 294077 2rw
llc sky130_sram_sp_512x16 512 16 110981 1rw
l2 sky130_sram_sp_512x64 512 64 283536 1rw
l2 sky130_sram_sp_512x24 512 24 128492 1rw
l1 sky130_sram_sp_256x64 256 64 175672 1rw
slm sky130_sram_sp_512x64 512 64 283536 1rw
io sky130_sram_dp_4096x16 4096 16 913452 1rw1r
io sky130_sram_dp_256x32 256 32 197460 1rw1r
io sky130_sram_sp_2048x8 2048 8 267554 1rw
acc sky130_sram_dp_256x32 256 32 197460 2rw
acc sky130_sram_dp_512x32 512 32 294077 2rw
acc sky130_sram_sp_512x64 512 64 283536 1rw
Note: Even though a dual-port, 32-bit word memory is being used for the LLC directory, the specified word size is still 28 bits.
Once this file is ready, execute the following command:
project_x/project_x_v1.0 > make mem_wrapper
A file called cache_def_mem_asic.v is generated in the working directory, in addition to the memory wrapper skeletons in project_x/sky130/mem_wrappers.
In the next section we will edit the wrappers connecting the memory macro ports to the memory wrapper ports.
Memory wrapper editing
Move to the directory project_x/sky130/mem_wrappers. Under this directory you will see the following files:
LLC_SRAM_SP_LINE_512x64.v
LLC_SRAM_SP_MIXED_512x28.v
LLC_SRAM_SP_SHARED_512x16.v
L2_SRAM_SP_LINE_512x64.v
L2_SRAM_SP_MIXED_512x24.v
L1_SRAM_SP.v
SLM_SRAM_SP_512x64.v
IO_DP_4096x16.v
IO_DP_256x32.v
IO_SP_2048x8.v
ACC_SRAM_DP_256x32.v
ACC_SRAM_DP_512x32.v
ACC_SRAM_SP_512x64.v
llc_lib.txt
l2_lib.txt
l1_lib.txt
slm_lib.txt
io_lib.txt
acc_lib.txt
The library text files (<mem>_lib.txt) give the user control over which memories are used for each memory type. For instance, if several memories of different sizes are available for accelerators but the user wants a specific memory for a particular accelerator, they only need to comment out the other memories in acc_lib.txt. This forces plmgen to use only the uncommented memories for that accelerator. By default, plmgen selects the memory that gives the smallest area for each tile type.
Each memory wrapper skeleton, shown below for ACC_SRAM_DP_256x32, has the module name, the inputs and outputs expected by ESP, the memory macro instantiation, and commented glue logic that the user can adapt to match the technology-dependent memory macro protocol.
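The skeleton looks roughly as follows (a simplified sketch for illustration; the file generated by make mem_wrapper is the authoritative reference):

// Sketch of the generated skeleton for ACC_SRAM_DP_256x32 (simplified).
module ACC_SRAM_DP_256x32(
  input         CLK0, CE0, WE0,   // clock, chip enable, write enable (port 0)
  input  [7:0]  A0,               // address (port 0)
  input  [31:0] D0, WEM0,         // write data and bit-granularity write mask
  output [31:0] Q0,               // read data (port 0)
  input         CLK1, CE1, WE1,
  input  [7:0]  A1,
  input  [31:0] D1, WEM1,
  output [31:0] Q1
);

  // TODO: instantiate the technology memory macro here and adapt the
  // commented glue logic (enable polarity, write-mask granularity, etc.).
  // sky130_sram_dp_256x32 sram0 ( ... );

endmodule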
ESP expects single-cycle memories compliant with the following protocols for single-port (1RW) and dual-port (1RW1R, 2RW) memories:
1RW Port
Port name | Direction | Active | Description |
---|---|---|---|
CLK0 | Input | N/A | Clock port 0 |
CE0 | Input | High | Chip enable port 0 |
A0 | Input | N/A | Address port 0 |
D0 | Input | N/A | Input data port 0 |
WE0 | Input | High | Write enable port 0 |
WEM0 | Input | High | Write mask enable port 0 |
Q0 | Output | N/A | Output data port 0 |
1RW1R Port
Port name | Direction | Active | Description |
---|---|---|---|
CLK0 | Input | N/A | Clock port 0 |
CE0 | Input | High | Chip enable port 0 |
A0 | Input | N/A | Address port 0 |
Q0 | Output | N/A | Output data port 0 |
CLK1 | Input | N/A | Clock port 1 |
CE1 | Input | High | Chip enable port 1 |
A1 | Input | N/A | Address port 1 |
D1 | Input | N/A | Input data port 1 |
WE1 | Input | High | Write enable port 1 |
WEM1 | Input | High | Write mask enable port 1 |
2RW Port
Port name | Direction | Active | Description |
---|---|---|---|
CLK0 | Input | N/A | Clock port 0 |
CE0 | Input | High | Chip enable port 0 |
A0 | Input | N/A | Address port 0 |
D0 | Input | N/A | Input data port 0 |
WE0 | Input | High | Write enable port 0 |
WEM0 | Input | High | Write mask enable port 0 |
Q0 | Output | N/A | Output data port 0 |
CLK1 | Input | N/A | Clock port 1 |
CE1 | Input | High | Chip enable port 1 |
A1 | Input | N/A | Address port 1 |
D1 | Input | N/A | Input data port 1 |
WE1 | Input | High | Write enable port 1 |
WEM1 | Input | High | Write mask enable port 1 |
Q1 | Output | N/A | Output data port 1 |
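For concreteness, the 1RW protocol corresponds to the following behavioral model (an illustrative sketch, not an ESP source file): reads and writes are synchronous, CE0 gates both, and WEM0 masks individual bits.

// Behavioral reference for the ESP 1RW protocol (illustrative sketch only).
module sram_1rw_behav #(parameter DEPTH = 512, parameter WIDTH = 64) (
  input                      CLK0,
  input                      CE0,   // active-high chip enable
  input  [$clog2(DEPTH)-1:0] A0,
  input  [WIDTH-1:0]         D0,
  input                      WE0,   // active-high write enable
  input  [WIDTH-1:0]         WEM0,  // active-high bit-granularity write mask
  output reg [WIDTH-1:0]     Q0
);

  reg [WIDTH-1:0] mem [0:DEPTH-1];

  always @(posedge CLK0) begin
    if (CE0) begin
      if (WE0)
        mem[A0] <= (D0 & WEM0) | (mem[A0] & ~WEM0); // write masked bits only
      else
        Q0 <= mem[A0]; // synchronous read: data is valid on the next cycle
    end
  end

endmodule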
Below is the final edited wrapper for the memory generated by OpenRAM targeting Skywater 130 nm technology.
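A possible result is sketched here. OpenRAM macros typically expose active-low chip-select (csb) and write-enable (web) pins and a per-byte write mask, so the wrapper inverts ESP's active-high controls and reduces the bit-granularity mask to byte granularity; the exact pin names and mask width must be checked against the Verilog generated by OpenRAM for your macro.

// A possible final wrapper for the OpenRAM dual-port macro (illustrative
// sketch; verify pin names and mask granularity against the generated model).
module ACC_SRAM_DP_256x32(
  input         CLK0, CE0, WE0,
  input  [7:0]  A0,
  input  [31:0] D0, WEM0,
  output [31:0] Q0,
  input         CLK1, CE1, WE1,
  input  [7:0]  A1,
  input  [31:0] D1, WEM1,
  output [31:0] Q1
);

  // ESP drives active-high CE/WE; OpenRAM csb/web are active low.
  // OpenRAM provides a per-byte write mask (4 bits for a 32-bit word), while
  // ESP's WEM is bit-granular: a byte's mask bit is asserted when any bit of
  // that byte is enabled. This is adequate for accelerator PLMs, which write
  // full words, but not for memories requiring true bit-granularity masks.
  wire [3:0] wmask0 = {|WEM0[31:24], |WEM0[23:16], |WEM0[15:8], |WEM0[7:0]};
  wire [3:0] wmask1 = {|WEM1[31:24], |WEM1[23:16], |WEM1[15:8], |WEM1[7:0]};

  sky130_sram_dp_256x32 sram0 (
    .clk0(CLK0), .csb0(~CE0), .web0(~WE0), .wmask0(wmask0),
    .addr0(A0), .din0(D0), .dout0(Q0),
    .clk1(CLK1), .csb1(~CE1), .web1(~WE1), .wmask1(wmask1),
    .addr1(A1), .din1(D1), .dout1(Q1)
  );

endmodule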
The user has the freedom to change the body of all memory wrappers to integrate the generated memory macros. This can include instantiating multiple memory macros inside a single wrapper to create the expected width and depth. The module name and input and output ports, however, must not be changed.
For example, when we use a 32-bit word macro for the LLC directory, which only requires 28-bit words, we set the 4 most-significant bits of the write data to zero and discard the 4 most-significant bits of read data.
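Inside the wrapper this amounts to a couple of lines of glue logic (a sketch; q_macro is a hypothetical name for the macro's 32-bit data output):

// Width adaptation inside LLC_SRAM_SP_MIXED_512x28.v (sketch).
wire [31:0] d_macro = {4'b0000, D0}; // zero-extend the 28-bit write data
assign Q0 = q_macro[27:0];           // drop the 4 unused read bits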
Once the memory wrappers are completed, they can be tested individually using the testbenches automatically generated inside mem_wrappers/tb. These tests ensure that the memory wrappers are compliant with what ESP expects and can prevent errors in full-system tests, which are lengthy to debug. The test succeeds when it prints that zero errors occurred.
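For example, a wrapper can be checked with any Verilog simulator; the testbench file name below is hypothetical, since the actual names are produced by make mem_wrapper:

# Hypothetical run with Icarus Verilog; adapt file names and simulator.
project_x/sky130/mem_wrappers > iverilog -o acc_dp_256x32_test tb/ACC_SRAM_DP_256x32_tb.v ACC_SRAM_DP_256x32.v ../verilog/sky130_sram_dp_256x32.v
project_x/sky130/mem_wrappers > vvp acc_dp_256x32_test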
After all wrappers are successfully tested, the memory integration is complete. The next section details a similar process for integrating the technology's IO cells.
3. Pad integration
For each technology, there are different IO cells according to functionality and vendor. Therefore, supporting all interfaces that can exist for each cell type, vendor, and technology is a challenging task. ESP supports the basic digital IO cell interfaces for input, output, and bi-directional ports. For output ports, ESP provides an additional 3-bit signal (CFG) that can be changed through a control and status register (CSR). This interface was added to configure slew rate and drive strength, but the user is free to use it for other purposes.
The following sections detail the IO cell integration process, starting with the generation of the pad wrapper skeletons and finishing with the population of these wrappers.
ASIC pad list files
The pad wrapper skeletons are automatically generated based on the asic_padlist.txt specification file. The user needs to create this file in the project working directory (project_x/project_x_v1.0) according to the following syntax:
<direction> <macro_name> <location>
Where:
<direction> is the pad type (in, out, io)
<macro_name> is the pad model module name
<location> is the location of the pad (H for horizontal, V for vertical)
The location follows the illustration:
_____________
|\ /|
| \ V(0) / |
| \ / |
| \ / |
| \ / |
| \ / |
| H(1) \ H(1) |
| / \ |
| / \ |
| / \ |
| / \ |
| / V(0) \ |
|/___________\|
The following is an example of asic_padlist.txt using the sky130 gpiov2 pads.
in sky130_fd_io__top_gpiov2 H
in sky130_fd_io__top_gpiov2 V
out sky130_fd_io__top_gpiov2 H
out sky130_fd_io__top_gpiov2 V
io sky130_fd_io__top_gpiov2 H
io sky130_fd_io__top_gpiov2 V
Note: All directions (in/out/io) and both locations (H/V) are required. Since sky130 gpiov2 has a single inout model that can be configured for all three directions, we specify the same macro module in all six entries.
Once the file is created, the user can generate the pad wrapper skeletons by running the following target:
project_x/project_x_v1.0 > make pad_wrapper
We provide a default VHDL file, pads_loc.vhd, that configures whether an H or V pad is used for each IO signal. The default file accounts for a single Memory tile instantiation. If the user would like to use more than one Memory tile, or would like to change the pad locations (H or V), a customized file has to be generated.
We provide a make target that generates the VHDL file based on a text file (pads_loc.txt) with the following format:
<signal> <index> <location>
where:
<signal> is the design port
<index> is the index of the port if it is an array
<location> is the location of the pad (0 for V and 1 for H)
Below is an example of the pads_loc.txt.
clk_div_acc0 0 0
clk_div_acc1 0 0
clk_div_cpu 0 0
dummy 0 0
erx_clk 0 0
erxd 0 0
erxd 1 0
erxd 2 0
erxd 3 0
erx_dv 0 0
etx_clk 0 0
ext_clk_acc0 0 0
ext_clk_acc1 0 0
ext_clk_cpu 0 0
fpga_clk_in 2 0
fpga_clk_in 3 0
...
To generate the customized pads_loc.vhd, run the target:
project_x/project_x_v1.0 > make pad_location
In the next section, we will edit the pad wrapper skeletons to be compliant with the ESP IO protocol.
Pad wrapper editing
Move to the directory project_x/sky130/pad_wrappers. Under this directory you will see the following files:
INPAD_H.v
INPAD_V.v
IOPAD_H.v
IOPAD_V.v
OUTPAD_V.v
OUTPAD_H.v
Each pad wrapper skeleton, shown below for IOPAD_H.v, has the module name, the inputs and outputs expected by ESP, and the pad macro instantiation.
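The skeleton looks roughly as follows (a simplified sketch; the file generated by make pad_wrapper is the authoritative reference):

// Sketch of the generated skeleton for IOPAD_H.v (simplified).
module IOPAD_H(
  inout        PAD,  // pad inout port
  output       Y,    // chip input pin
  input        A,    // chip output pin
  input        OE,   // active-high pad output enable
  input  [2:0] CFG   // pad configuration bits, driven by a CSR
);

  // TODO: instantiate the technology IO cell here and tie its
  // technology-specific control pins appropriately.
  // sky130_fd_io__top_gpiov2 pad0 ( ... );

endmodule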
ESP expects the pads to be compliant with the following protocol for input, output and bi-directional pads:
Input pads
Port name | Direction | Active | Description |
---|---|---|---|
PAD | Inout | N/A | Pad inout port |
Y | Output | N/A | Chip input pin |
Output pads
Port name | Direction | Active | Description |
---|---|---|---|
PAD | Inout | N/A | Pad inout port |
A | Input | N/A | Chip output pin |
CFG | Input | High | Pad configuration (3 bits) |
Bidirectional pads
Port name | Direction | Active | Description |
---|---|---|---|
PAD | Inout | N/A | Pad inout port |
Y | Output | N/A | Chip input pin |
A | Input | N/A | Chip output pin |
OE | Input | High | Pad output enable pin |
CFG | Input | High | Pad configuration (3 bits) |
CFG is controllable through the configuration status registers (CSRs) and can be used to configure the pad drive strength, slew rate, or any other option available in the pad macro. Its default value (on reset) is 011.
Below we illustrate the final edited pad wrappers. Since the H and V pads have the same interface, we illustrate only one of them.
Input pad
Output pad
Bi-directional pad
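As an illustration, a bi-directional wrapper built around sky130_fd_io__top_gpiov2 could look like the sketch below. The mode and tie-off values (DM, INP_DIS, the ENABLE_* pins, etc.) are assumptions based on the SkyWater IO documentation and must be reviewed for your design; power pins are omitted, as in the non-PG behavioral model.

// Possible IOPAD_H implementation (illustrative sketch only).
module IOPAD_H(
  inout        PAD,
  output       Y,
  input        A,
  input        OE,
  input  [2:0] CFG
);

  sky130_fd_io__top_gpiov2 pad0 (
    .PAD(PAD),
    .OUT(A),            // core drives the pad
    .IN(Y),             // pad drives the core
    .OE_N(~OE),         // the gpiov2 output enable is active low
    .INP_DIS(1'b0),     // keep the input buffer enabled
    .DM(3'b110),        // strong push-pull output mode (assumption)
    .SLOW(CFG[0]),      // slew-rate control from the CSR
    .VTRIP_SEL(1'b0),   // CMOS input trip point
    .IB_MODE_SEL(1'b0),
    .HLD_OVR(1'b0),
    .ANALOG_EN(1'b0),
    .ANALOG_SEL(1'b0),
    .ANALOG_POL(1'b0),
    // VDDIO/VSWITCH-domain controls: tied here for RTL simulation only; in
    // the physical flow they must be connected in the proper power domain.
    .HLD_H_N(1'b1),
    .ENABLE_H(1'b1),
    .ENABLE_INP_H(1'b0),
    .ENABLE_VDDA_H(1'b0),
    .ENABLE_VSWITCH_H(1'b0),
    .ENABLE_VDDIO(1'b1),
    // Analog pass-through and ESD tie pins left unconnected.
    .PAD_A_NOESD_H(), .PAD_A_ESD_0_H(), .PAD_A_ESD_1_H(),
    .IN_H(), .TIE_HI_ESD(), .TIE_LO_ESD()
  );

endmodule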
In the example above, some pad inputs are in a different power domain than the core logic (powered by VCCD and VCCHIB). HLD_H_N, ENABLE_H, and ENABLE_INP_H are in the VDDIO power domain; ENABLE_VSWITCH_H is in the VSWITCH power domain. We tied them high or low for correct simulation of the Verilog models. This is done for simulation purposes only; for logic synthesis and the downstream physical design steps, these ports should be connected in the appropriate power domain.
The user has the freedom to change the body of all pad wrappers to integrate the pad macros. The module name and input and output ports, however, must not be changed.
In these examples, we only used one bit of CFG to control the output slew rate. The usage of all 3 bits will depend on the configurability of the pad and the user’s design choices.
4. SoC generation
Once all technology models (Liberty and Verilog) are in the correct folders, and the memory and pad wrappers are completed, the ESP SoC can be generated. In this tutorial, we are going to generate a SoC with the following tiles: IO, CPU (Ariane), MEM, SLM, and Acc (FFT).
The first step is to run HLS on the FFT accelerator design to synthesize the RTL implementation based on the technology standard-cell and memory timing models linked in project_x/sky130/lib. Then, we will generate the ESP SoC as usual. Finally, we will simulate all tiles in our SoC.
HLS using ESP integrated FFT accelerator
To run HLS, execute the following command under project_x/project_x_v1.0:
project_x/project_x_v1.0 > make fft_stratus-hls
ESP will run HLS for the already-integrated FFT accelerator and will use the provided memory wrappers to generate the Private Local Memory (PLM) specified under project_x/esp/accelerators/stratus/fft_stratus/hw/memlist.txt.
The outputs of HLS are saved in project_x/esp/tech/sky130/acc and project_x/esp/tech/sky130/memgen.
SoC configuration
The next step is to configure the SoC. Run the following command under project_x/project_x_v1.0 to launch the ESP graphical interface:
project_x/project_x_v1.0 > make esp-xconfig
Our target SoC contains one IO tile, one MEM tile, one CPU (Ariane) tile, one
SLM tile and one Accelerator (FFT) tile according to the image below.
Here we set the size of the shared local memory tile to 64KB. The user can select from a range of sizes in the graphical interface, and ESP will generate memory banks according to the SLM memory wrappers listed in project_x/sky130/mem_wrappers/slm_lib.txt. With the 512x64 macro from our memory list (4KB per bank), for example, a 64KB SLM results in 16 banks.
Once the SoC is configured, click “Generate” and close the window.
To generate the SLM banks with the size specified in the configuration, run the following step:
project_x/project_x_v1.0 > make mem_slmgen
The user can find the generated banks under project_x/esp/tech/sky130/memgen/slm_gen.
ESP is now ready for RTL simulation.
SoC RTL simulation: main system
The first simulation will confirm that the IO tile, the CPU tile and the MEM tile are working properly. This simulation loads the bootloader into the IO tile. Then, the IO tile starts the CPU tile which reads the bootram and executes read (instruction/data) and write (data) requests to the MEM tile. The MEM tile accesses the external memory according to the CPU tile requests. If the simulation succeeds, a “Hello from ESP” message appears in the prompt.
Run the following command to execute the Modelsim simulation and confirm that all memories were properly integrated.
project_x/project_x_v1.0 > make sim
Once all files have been compiled and loaded, execute the simulation in Modelsim:
VSIM 1> run -all
SoC RTL simulation: accelerator
The second simulation tests the accelerator’s functionality in the SoC. This simulation invokes the accelerator baremetal program which accesses the MEM tile to read and write data. Three accelerator coherence modes are tested: Non-coherent DMA, LLC-coherent DMA, and Coherent DMA.
Run the following commands to simulate the accelerator:
project_x/project_x_v1.0 > make fft_stratus-baremetal
project_x/project_x_v1.0 > TEST_PROGRAM=soft-build/ariane/baremetal/fft_stratus.exe make sim
VSIM 1> run -all
The prompt will state “PASS” for the three coherence modes.
SoC RTL simulation: shared local memory
Lastly, if your SoC configuration has the SLM tile, run the following command to confirm that the integrated memories can be read and written properly:
project_x/project_x_v1.0 > make slm_rw_test-baremetal
project_x/project_x_v1.0 > TEST_PROGRAM=soft-build/ariane/baremetal/slm_rw_test.exe make sim
VSIM 1> run -all
The prompt message “Completed with 0 errors” confirms that the test succeeded.
5. Clocking strategy and DCO integration
There are several ways to provide the clock for the SoC. The simplest one is having a single clock for the whole system driven by an external (off-chip) oscillator. There are other factors in play, however, that can require more complex clocking schemes. For instance, SoCs that must operate at higher frequencies (GHz) need to generate the clock internally. In addition, large SoCs can have blocks operating at different frequencies, thereby demanding a clock divider or separate clock generators.
This section details the clocking strategy options that ESP supports, as well as the specification and integration of a Digital Clock Oscillator (DCO) for high-speed SoCs.
Clocking strategy
ESP supports three main clock strategies:
- Dual external: One clock source for the tiles and one for the NoC; both clocks are driven from external sources
- Multi DCO: Each tile has its own DCO. Similarly, the NoC has its own DCO located inside the IO tile
- Single DCO: There is only one DCO in the IO tile clocking the full SoC
In the ESP configuration GUI, the user can select one of these strategies. A message appears in the white box below detailing the selected configuration.
Except for the “Dual external” clocking strategy, the option we used for RTL simulation above, the user needs to provide a DCO mapped to the target technology and compliant with ESP’s specification. The following sections detail the DCO specification and its integration into ESP for RTL simulation, logic synthesis, and place-and-route.
DCO specification
ESP provides a CSR with 21 configuration bits that the user can leverage to configure their own DCO block. For reference, we provide a technology-independent example of a DCO we used in our projects in esp/rtl/techmap/inferred/. The file DCO_common.v has the primitives that need to be mapped to the target technology. The file DCO.v instantiates the primitives in the DCO top level.
The following figure illustrates our reference DCO architecture:
The following table details the reference DCO interface:
Port name | Direction | Active | Description |
---|---|---|---|
RSTN | Input | Low | Divider reset |
EN | Input | High | Enable DCO oscillation |
CC_SEL | Input | N/A | Coarse tune bits, the larger the value, the lower the CLK frequency (6 bits) |
FC_SEL | Input | N/A | Fine tune bits, the larger the value, the lower the CLK frequency (6 bits) |
EXT_CLK | Input | N/A | External clock source (Backup clock) |
CLK_SEL | Input | N/A | Select clock: 0: DCO clock; 1: External clock |
FREQ_SEL | Input | N/A | Divider for tile clock. 00: DCO clk, 01: DCO÷2, 11: DCO÷4 (2 bits) |
DIV_SEL | Input | N/A | Divider ratio for test clock: 1: ÷2; 2: ÷4; 3: ÷8; …; 7: ÷128 (3 bits) |
CLK | Output | N/A | High-speed clock for tile logic |
CLK_DIV | Output | N/A | Divided clock. Routed off chip for testing |
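At the port level, this interface corresponds to a Verilog module declaration along the following lines (a sketch derived from the table above; the declaration in esp/rtl/techmap/inferred/DCO.v is the authoritative reference):

// Port-level view of the reference DCO (widths follow the table above).
module DCO (
  input        RSTN,      // active-low divider reset
  input        EN,        // enables DCO oscillation
  input  [5:0] CC_SEL,    // coarse tune: larger value -> lower frequency
  input  [5:0] FC_SEL,    // fine tune: larger value -> lower frequency
  input        EXT_CLK,   // external (backup) clock source
  input        CLK_SEL,   // 0: DCO clock, 1: external clock
  input  [1:0] FREQ_SEL,  // tile clock divider: 00: CLK, 01: /2, 11: /4
  input  [2:0] DIV_SEL,   // test clock divider ratio: /2 up to /128
  output       CLK,       // high-speed clock for the tile logic
  output       CLK_DIV    // divided clock, routed off chip for testing
);
  // Body omitted: see DCO_common.v for the primitives to be mapped.
endmodule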
Regardless of the chosen DCO architecture, Spice simulation of the technology-mapped DCO is highly recommended for frequency and jitter characterization in various corners.
DCO integration
ESP treats the DCO as a macro block, similarly to memories and standard cells. Therefore, it is expected that the following views are generated after place and route:
- DCO.v, DCO_common.v -> verilog used for RTL and netlist simulation
- DCO_tt.sdf -> standard delay format (SDF) used for RTL and netlist simulation
- DCO.lib -> liberty view used for logic and physical synthesis
- DCO.lef -> library exchange format (LEF) used for logic and physical synthesis
- DCO_pp.v -> verilog with power connections used for LVS
- DCO.gds -> used for DRC and LVS
Following our example for “project_x” using the Skywater 130nm technology, DCO.v, DCO_common.v, and DCO_tt.sdf must be located in project_x/sky130/dco_wrappers, and DCO.lib should be in project_x/sky130/lib.
As long as the file and module names and the interface match the provided reference, and the views are located in the specified folders, the DCO is automatically integrated into ESP.
The present document targets RTL integration only. A future document will detail the remaining view locations.
SoC RTL simulation: main system using DCO clocking
To enable Modelsim to simulate the DCO with the annotated delays, the user needs to remove the comments inside project_x/project_x_v1.0/vsim.tcl so that it matches the following:
#echo "Restarting simulation with SDF annotation for DCO and delay line"
set sdf ""
set TECHLIB $::env(TECHLIB)
set ESP_ROOT $::env(ESP_ROOT)
set VSIMOPT $::env(VSIMOPT)
foreach inst [find instances -nodu -bydu DCO] {
append sdf "-sdfmax "
append sdf [string map {( [} [string map {) ]} $inst]]
append sdf "=${ESP_ROOT}/rtl/techmap/${TECHLIB}/dco_wrappers/DCO_tt.sdf "
}
append sdf "-suppress 3438"
eval vsim $sdf $VSIMOPT
A similar script can be used for the Xcelium simulator.
The RTL simulation targets are the same as those detailed in the previous section. For the “Hello from ESP” simulation, for instance, execute:
project_x/project_x_v1.0 > make sim