Bambu: examples

The distribution includes several examples under the directory example. The directories currently included are listed below:

add_device_simple (link)

This example shows how to add a non-supported device to the Bambu synthesis flow.
The file xc7z045-2ffg900-VVD.xml was copied from etc/devices/Xilinx_devices/xc7z020-1clg484-VVD.xml in the framework distribution and renamed to xc7z045-2ffg900-VVD.xml.
After copying the file, a few changes were made, all of them related to the characteristics of the new device: model, package and speed grade.
The changed part of the XML file is the following:
<model value="xc7z045"/>
<package value="ffg900"/>
<speed_grade value="-2"/>

Note that the field
<family value="Zynq-VVD"/>
refers to the synthesis script stored in etc/devices/Xilinx_devices/Zynq-VVD.xml.
As a result, bambu.sh first simulates and then synthesizes the C-based description using the Zynq device specified above.

This example also shows another useful feature of the HLS framework. The file module.c contains the C specification of the factorial function in its recursive form.
Bambu cannot synthesize recursive functions directly, but GCC automatically translates the function into its non-recursive form when the -O2 option is passed. To see exactly what has been synthesized, check the file a.c in the sim or synth directory created by bambu.sh.
The new device considered in this example is very similar to one that is already supported. If a new device is not similar to any of the already characterized devices, the user should check the characterization scripts and extend them accordingly. Examples of characterization scripts based on eucalyptus are available in etc/devices.
Note that eucalyptus is built automatically once an RTL synthesis back-end is configured.
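
As an illustration of the recursive-to-iterative translation mentioned above, here is a minimal sketch. It is not the content of module.c or of the generated a.c; the function names are hypothetical, and the loop form only approximates what an optimizing compiler may derive.

```c
#include <assert.h>

/* Recursive form, similar in spirit to the specification described above. */
unsigned int factorial_recursive(unsigned int n)
{
    return n <= 1 ? 1u : n * factorial_recursive(n - 1);
}

/* Loop-based form: roughly the shape an optimizer can produce
 * from the recursion above (hand-written here, not compiler output). */
unsigned int factorial_iterative(unsigned int n)
{
    unsigned int result = 1u;
    while (n > 1) {
        result *= n;
        n--;
    }
    return result;
}

/* Checks that both forms agree for every input whose result fits in 32 bits. */
void check_equivalence(void)
{
    for (unsigned int n = 0; n <= 12; n++)
        assert(factorial_recursive(n) == factorial_iterative(n));
}
```

Only the loop-based shape contains no self-call, which is why it can be mapped to hardware without a call stack.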

arf (link)

This directory includes a simple example of high-level synthesis and of the generation of RTL simulation and synthesis scripts.
The results of the high-level synthesis can be inspected in testbench/hls_summary_0.xml.
The result of the scheduling can be viewed graphically with a dot file viewer (e.g., xdot or dotty).
In particular, Bambu generates several dot files when the option --print-dot is passed.
The scheduling of the arf function is stored in file HLS_output/dot/arf/HLS_scheduling.dot while the FSM of the arf function annotated with the C statements is stored in file HLS_output/dot/arf/HLS_STGraph.dot.

arf_res_sharing (link)

This directory considers the impact of resource sharing on multipliers for the arf benchmark. Two sets of synthesis scripts are provided: constrained and non-constrained.
The devices considered are the ones supported by Bambu.
In all the syntheses performed, the WB4 interface has been used to avoid issues with the high number of IO pins required by the arf function when synthesized alone.
Adding a constraint on the number of multipliers requires passing to Bambu an XML file structured in this way:

<?xml version="1.0"?>
<constraints>
   <HLS_constraints>
      <tech_constraints fu_name="mult_expr_FU" fu_library="STD_FU" n="1"/>
   </HLS_constraints>
</constraints>

crc (link)

This directory collects several scripts to test the multi-bus feature of bambu.
The file test_icrc.xml shows how to write xml testcases for array-based function parameters.

crc_yosys (link)

This directory shows how to write a C-based testbench to test a given kernel.
The kernel function is defined through the option --top-rtldesign-name.

This design flow requires adding two attributes to the kernel function:

  __attribute__ ((noinline)) __attribute__ ((used))

and inserting these two timing functions:

        __builtin_bambu_time_start();
        __builtin_bambu_time_stop();

These two functions start and stop a timer that Bambu uses to compute the total number of cycles spent in the kernel function.
The target device is a Zynq xc7z020 (speed grade -1, package clg484), and the back-end flow is based on the open-source RTL synthesis tool yosys (http://www.clifford.at/yosys/).
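
The following sketch shows one way the two attributes and the two timing functions could fit together. It assumes that the timing calls bracket the kernel invocation in the testbench. The kernel checksum_kernel, the routine run_testbench and the macro HLS_WITH_BAMBU are hypothetical names introduced only for illustration; the macro lets the same file compile with an ordinary C compiler, where the Bambu builtins are not available.

```c
#include <assert.h>

/* HLS_WITH_BAMBU is a hypothetical macro, not defined by the tool:
 * define it by hand when the file is processed by Bambu. */
#ifdef HLS_WITH_BAMBU
#define KERNEL_TIME_START() __builtin_bambu_time_start()
#define KERNEL_TIME_STOP()  __builtin_bambu_time_stop()
#else
#define KERNEL_TIME_START() ((void)0)
#define KERNEL_TIME_STOP()  ((void)0)
#endif

/* Hypothetical kernel: sums the bytes of a buffer.
 * The two attributes keep the function from being inlined or discarded. */
__attribute__((noinline)) __attribute__((used))
unsigned int checksum_kernel(const unsigned char *data, unsigned int len)
{
    unsigned int sum = 0;
    for (unsigned int i = 0; i < len; i++)
        sum += data[i];
    return sum;
}

/* C testbench routine: only the kernel call sits between the timer calls. */
unsigned int run_testbench(void)
{
    const unsigned char input[4] = {1, 2, 3, 4};

    KERNEL_TIME_START();
    unsigned int result = checksum_kernel(input, 4);
    KERNEL_TIME_STOP();

    assert(result == 10u); /* functional check of the kernel result */
    return result;
}
```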

crypto_designs (link)

This example starts from the reference C description of the Keccak crypto function distributed through the website http://keccak.noekeon.org/.
Keccak has been selected by NIST to become the new SHA-3 standard (see http://www.nist.gov/hash-competition and http://ehash.iaik.tugraz.at/wiki/The_SHA-3_Zoo).
Further details can be found at http://ehash.iaik.tugraz.at/wiki/Keccak.
Together with the C implementation optimized for processors, there exist several implementations for FPGA and ASIC.
As a reference, one of the low-area implementations developed by the authors of the Keccak algorithm has been selected (i.e., Guido Bertoni-STMicroelectronics, Joan Daemen-STMicroelectronics, Michaël Peeters-NXP Semiconductors and Gilles Van Assche-STMicroelectronics).

The results reported at this link http://ehash.iaik.tugraz.at/wiki/SHA-3_Hardware_Implementations are:

Altera Cyclone III: 1559 LEs, 47.8 Mbit/s, 181 MHz

Xilinx Virtex 5: 444 slices, 70.1 Mbit/s, 265 MHz

Starting from the C description delivered as a reference, an equivalent C function has been built (equivalent to the VHDL reference design).
After two days of hacking and design space exploration, here are five alternatives targeting different FPGAs:

Altera Cyclone II: 5460 LEs, 66.9 Mbit/s, 107 MHz (directory keccak_CycloneII_10)

Altera Cyclone II: 8681 LEs, 150.8 Mbit/s, 262 MHz (directory keccak_CycloneII_4hl)

Lattice ECP3: 3789 slices, 80.2 Mbit/s, 128 MHz (directory keccak_ECP3_10_09)

Lattice ECP3: 3831 slices, 80.2 Mbit/s, 128 MHz (directory keccak_ECP3_9)

Xilinx Virtex 5: 7015 slices, 152.69 Mbit/s, 252 MHz (directory keccak_V5_4hl)

These results have been obtained with PandA framework 0.9.3.

Along with this example, another one shows how to build an Autotools project for high-level synthesis with bambu: see directory crypto_designs/multi-keccak.

fft_example (link)

This directory includes an example program which computes the FFT of a short pulse in a sample of length 128.

function_pointers (link)

Scripts, updated results, and code related to this paper:
Marco Minutoli, Vito Giovanni Castellana, Antonino Tumeo, Fabrizio Ferrandi: Inter-procedural resource sharing in High Level Synthesis through function proxies. FPL 2015: 1-8.

CHStone (link)

This directory contains the CHStone v1.11 benchmarks taken from http://www.ertl.jp/chstone/ and all the scripts used and results obtained with bambu.

mm (link)

This directory shows how to write a test.xml file when multi-dimensional arrays are used as function parameters.
The example uses the option --memory-allocation-policy=EXT_PIPELINED_BRAM. This option declares that the parameters are allocated on a block RAM memory, so that pipelined accesses are possible.
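
For orientation, a function taking multi-dimensional arrays as parameters might look like the sketch below. The signature, the size N and the helper check_identity are assumptions made for illustration; they are not taken from the code shipped in the mm directory.

```c
#include <assert.h>

#define N 3 /* hypothetical matrix size */

/* Multiplies two N x N integer matrices passed as two-dimensional arrays. */
void mm(int a[N][N], int b[N][N], int out[N][N])
{
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            int acc = 0;
            for (int k = 0; k < N; k++)
                acc += a[i][k] * b[k][j];
            out[i][j] = acc;
        }
    }
}

/* Sanity check: multiplying by the identity matrix returns the input. */
void check_identity(void)
{
    int a[N][N] = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
    int id[N][N] = {{1, 0, 0}, {0, 1, 0}, {0, 0, 1}};
    int out[N][N];

    mm(a, id, out);
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            assert(out[i][j] == a[i][j]);
}
```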

mm_float (link)

This example is very similar to the mm example.
There are mainly two differences:
- the two dimensions of the arrays are passed as parameters;
- the matrix elements are floats.
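
The two differences can be pictured with the following sketch. It assumes row-major flattened arrays and a hypothetical shape convention (a is rows x cols, b is cols x rows, out is rows x rows) so that two dimensions are enough; the actual signature in the mm_float directory may differ.

```c
#include <assert.h>

/* Multiplies a (rows x cols) by b (cols x rows) into out (rows x rows).
 * All matrices are stored row-major in flat float arrays; the two
 * dimensions arrive as run-time parameters instead of compile-time sizes. */
void mm_float(const float *a, const float *b, float *out,
              unsigned int rows, unsigned int cols)
{
    assert(a && b && out); /* all three buffers must be provided */

    for (unsigned int i = 0; i < rows; i++) {
        for (unsigned int j = 0; j < rows; j++) {
            float acc = 0.0f;
            for (unsigned int k = 0; k < cols; k++)
                acc += a[i * cols + k] * b[k * rows + j];
            out[i * rows + j] = acc;
        }
    }
}
```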

libm (link)

This directory contains scripts and results obtained on the libm functions supported by bambu.

VGA (link)

VGA adapter for the Altera DE1 board (Cyclone II EP2C20F484C7N).
The main aim of the project is to develop an application written in C which drives a VGA-compatible screen connected to a DE1 Altera FPGA.
The design includes some Verilog IPs which control the VGA port and shows how Bambu can manage existing IPs described by using hardware description languages.

VGA_Nexys4 (link)

This simple example shows how to integrate C code with low-level interfaces written in Verilog.
The design improves on the VGA example by adapting it to the more capable Nexys4 prototyping board.

file_simulate (link)

This directory shows an example of how Bambu can use the libc IO primitives (open, read, write and close).
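
A minimal sketch of C code built on these four primitives is given below. It is not the code of the file_simulate directory; the function write_then_read and its interface are hypothetical, and the sketch assumes a POSIX environment.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Writes len bytes from src to the file at path, then reads them back
 * into dst. Returns the number of bytes read back, or -1 on error. */
long write_then_read(const char *path, const char *src, char *dst,
                     unsigned long len)
{
    assert(path && src && dst);

    /* Create (or truncate) the file and write the buffer. */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    ssize_t written = write(fd, src, len);
    close(fd);
    if (written != (ssize_t)len)
        return -1;

    /* Reopen the file read-only and fetch the same number of bytes. */
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t got = read(fd, dst, len);
    close(fd);
    return (long)got;
}
```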

IP_integration (link)

This directory contains a simple example describing how to integrate and verify existing IPs with functions written in C that receive structs passed by pointer.
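
On the C side, a function that receives a struct by pointer has the shape sketched below. The type operands_t and the function add_operands are hypothetical and only illustrate the calling convention; they are not the interfaces used in the IP_integration directory.

```c
#include <assert.h>

/* Hypothetical data exchanged between the caller and the function. */
typedef struct {
    int x;
    int y;
    int sum;
} operands_t;

/* Reads two fields through the pointer and writes the result back
 * into the same struct, so the caller sees the update. */
void add_operands(operands_t *op)
{
    assert(op != 0); /* a valid struct must be provided */
    op->sum = op->x + op->y;
}
```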

simple_asm (link)

This simple example shows how to integrate small snippets of Verilog in the HLS flow by making Bambu use Verilog as a third assembler dialect.
Currently, only single-output asm instructions are supported. When outputs are present, the Intel and AT&T asm strings must be included for the simulation to pass. For asm statements having only inputs, these strings can safely be left empty.
A detailed reference on how GCC handles asm statements can be found at https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html.

python-bindings (link)

This directory includes an example showing how to integrate Python for design verification.

led_example (link)

This directory includes an example of a simple GPIO controller developed to show how to integrate Verilog IPs with plain C.

pong (link)

This directory includes the Pong game ported to the Nexys4 prototyping board. Pong was the first game developed by Atari Inc. and was designed and built by Allan Alcorn. Further information can be found at https://en.wikipedia.org/wiki/Pong.
The code has been ported by Fabrizio Ferrandi by adapting an SDL based tutorial to the PandA methodology for the integration of low-level IP cores written in Verilog.
The original SDL code can be found at http://www.aaroncox.net/tutorials/arcade/PaddleBattle.html.
The artificial intelligence used to control the computer paddle is based on a random function described at http://burtleburtle.net/bob/rand/smallprng.html.

breakout (link)

This directory includes the Breakout game ported to the Nexys4 prototyping board. The game was designed by Nolan Bushnell, Steve Wozniak, and Steve Bristow. The history of the Breakout game can be found at https://en.wikipedia.org/wiki/Breakout_%28video_game%29.
The code has been ported by Fabrizio Ferrandi by adapting an SDL based tutorial to the PandA methodology for the integration of low-level IP cores written in Verilog.
The original SDL code can be found at http://www.aaroncox.net/tutorials/arcade/BRICKBreaker.html.

MachSuite (link)

This directory contains the scripts, results and code of the MachSuite benchmark set, which is described in this paper:

Brandon Reagen, Robert Adolf, Sophia Yakun Shao, Gu-Yeon Wei, and David Brooks.
“MachSuite: Benchmarks for Accelerator Design and Customized Architectures.”
2014 IEEE International Symposium on Workload Characterization.

hls_study (link)

This directory includes the scripts, the updated results and the code related to this paper:

R. Nane, V. M. Sima, C. Pilato, J. Choi, B. Fort, A. Canis, Y. T. Chen, H. Hsiao, S. Brown, F. Ferrandi, J. Anderson, and K. Bertels, “A Survey and Evaluation of FPGA High-Level Synthesis Tools,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. PP, iss. 99, pp. 1-1, 2016.

softfloat (link)

This directory includes scripts and code testing single and double precision basic operations: division, subtraction, addition and multiplication.
