Introduction
We evaluate accelerators generated by Bambu on the standard benchmark suite PolyBench:
Louis-Noel Pouchet and Tomofumi Yuki. PolyBench/C 4.2.1. htp://polybench.sourceforge.net.
Two sets of experiments were carried out, one with floating-point kernels and one with integer versions of the same kernels. A comparison with a standard commercial HLS tool is provided when possible.
Setup
– Target hardware: AMD/Xilinx Virtex7 FPGA.
– Target frequency: 200 MHz.
– Source code for the benchmarks is available in examples/PolyBench.
Summary
Speedup over commercial HLS tool for a set of different Bambu configurations across all benchmarks.
Latency is measured in ns (clock cycles * achieved period post-implementation). > 1 is better.
Area consumption over commercial HLS tool for a set of different Bambu configurations across all benchmarks.
Area is measured in Equivalent LUTs (BRAMs * 40 + DRAMs * 40 + DSPs * 40 + Registers * 0.5 + LUTs). < 1 is better.
Trade-offs
We highlight the effect of selecting different Bambu configuration options, which can steer Bambu towards different trade-offs between performance and area. The following plots also show the Pareto front.
Pareto plots (Latency vs. Area) for selected benchmarks. Points marked with x are dominated.
Detailed Results
Post-p&r timing and area metrics for each benchmark and each selected configuration are available in a separate table.