PolyBench results

Introduction

We evaluate accelerators generated by Bambu on the standard benchmark suite PolyBench:

Louis-Noel Pouchet and Tomofumi Yuki. PolyBench/C 4.2.1. http://polybench.sourceforge.net.

Two sets of experiments were carried out, one with floating-point kernels and one with integer versions of the same kernels. A comparison with a standard commercial HLS tool is provided when possible.

Setup

– Target hardware: AMD/Xilinx Virtex-7 FPGA.
– Target frequency: 200 MHz.
– Source code for the benchmarks is available in examples/PolyBench.

Summary

Speedup over the commercial HLS tool for a set of Bambu configurations across all benchmarks.

Latency is measured in ns (clock cycles * achieved post-implementation clock period). A speedup greater than 1 is better.
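As a minimal sketch of how the latency metric and the resulting speedup can be computed: the function names and the sample numbers below are illustrative assumptions, not values from the experiments. The sketch also assumes that speedup is the commercial tool's latency divided by Bambu's latency, which is consistent with values above 1 being better.

```python
def latency_ns(clock_cycles: int, achieved_period_ns: float) -> float:
    """Latency in ns: cycle count times the post-implementation clock period."""
    return clock_cycles * achieved_period_ns


def speedup(reference_latency_ns: float, bambu_latency_ns: float) -> float:
    """Assumed definition: reference latency over Bambu latency (> 1 favors Bambu)."""
    return reference_latency_ns / bambu_latency_ns


# Made-up numbers, for illustration only.
reference = latency_ns(clock_cycles=12_000, achieved_period_ns=5.0)
bambu = latency_ns(clock_cycles=10_000, achieved_period_ns=4.8)
ratio = speedup(reference, bambu)
print(f"speedup = {ratio:.2f}")
```

Because the achieved period is used rather than the 200 MHz target, a design with more clock cycles can still come out ahead if it closes timing at a shorter period.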

Area consumption relative to the commercial HLS tool for a set of Bambu configurations across all benchmarks.

Area is measured in Equivalent LUTs (BRAMs * 40 + DRAMs * 40 + DSPs * 40 + Registers * 0.5 + LUTs). A ratio lower than 1 is better.
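The Equivalent LUTs formula can be written as a small helper. This is a sketch: the function name, parameter names, and resource counts are hypothetical, and only the weights come from the formula above. The relative area is assumed to be Bambu's Equivalent LUTs divided by the commercial tool's.

```python
def equivalent_luts(brams: int, drams: int, dsps: int,
                    registers: int, luts: int) -> float:
    """Weighted resource count using the weights stated in the text."""
    return brams * 40 + drams * 40 + dsps * 40 + registers * 0.5 + luts


# Made-up resource counts, for illustration only.
bambu_area = equivalent_luts(brams=2, drams=0, dsps=4, registers=1000, luts=1500)
reference_area = equivalent_luts(brams=3, drams=0, dsps=4, registers=1200, luts=1800)

# Assumed definition of the area ratio: Bambu over reference (< 1 favors Bambu).
area_ratio = bambu_area / reference_area
print(f"Bambu: {bambu_area:.0f} eq. LUTs, ratio = {area_ratio:.3f}")
```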

Trade-offs

We highlight the effect of selecting different Bambu configuration options, which can steer Bambu towards different trade-offs between performance and area. The following plots also show the Pareto front.

Pareto plots (Latency vs. Area) for selected benchmarks. Points marked with x are dominated.
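For readers who want to reproduce the dominated/non-dominated classification, here is a minimal sketch. It assumes each configuration is a (latency, area) pair where lower is better on both axes; the function names and sample points are made up for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (latency in ns, area in Equivalent LUTs)


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b on both axes and strictly better on one."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Made-up configurations, for illustration only.
configs = [(100.0, 5000.0), (120.0, 4000.0), (130.0, 4500.0), (90.0, 7000.0)]
front = pareto_front(configs)
dominated = [p for p in configs if p not in front]
print("front:", front)
print("dominated:", dominated)
```

In this example, (130.0, 4500.0) is dominated by (120.0, 4000.0), which is both faster and smaller, so it would be drawn with an x.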





Detailed Results

Post-place-and-route timing and area metrics for each benchmark and each selected configuration are available in a separate table.
