PandA 0.9.8 released

We are glad to announce a new release of the PandA-bambu project, an open-source framework for research in high-level synthesis and HW/SW co-design.

Panda Bambu is an open-source framework aimed at assisting the designer during the high-level synthesis of complex applications, supporting most of the C constructs. It is developed for Linux systems, it is written in C++11, and its pre-compiled binaries can be downloaded under GPL license as an AppImage package (link). The source code is also publicly available as a GitHub repository at this link.

Issues, pull requests, and patches can be submitted by using the GitHub website. Install instructions, tutorials, and other info can be found on the PandA website.

New features introduced:

  • Added an LLVM-based tree height reduction step.
  • Updated support for NanoXplore NXmap: NXmap3 support and NG-ULTRA nx2h540tsc device family support.
  • Added support for AXI Master interface.
  • Improved regression test, benchmarking, and integration flow through GitHub Actions workflows.
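
As a rough illustration of what tree height reduction achieves (this is a minimal Python sketch, not the LLVM-based step shipped in Bambu), the snippet below sums the same operands as a left-to-right chain and as a balanced tree, and reports how many additions lie on the longest dependency path. The function names are hypothetical. The rewrite is only value-preserving for associative operations such as integer addition.

```python
def chain_reduce(values):
    """Sum left to right: ((a + b) + c) + d. Returns (result, adder_depth)."""
    total, depth = values[0], 0
    for v in values[1:]:
        total += v
        depth += 1  # each addition has to wait for the previous one
    return total, depth


def tree_reduce(values):
    """Sum pairwise: (a + b) + (c + d). Returns (result, adder_depth)."""
    level = [(v, 0) for v in values]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            (x, dx), (y, dy) = level[i], level[i + 1]
            nxt.append((x + y, max(dx, dy) + 1))
        if len(level) % 2:
            nxt.append(level[-1])  # odd operand passes through unchanged
        level = nxt
    return level[0]


operands = [3, 1, 4, 1, 5, 9, 2, 6]
print(chain_reduce(operands))  # (31, 7)
print(tree_reduce(operands))   # (31, 3)
```

In hardware terms, a shorter dependency path means the additions can be scheduled in fewer sequential steps when enough adders are available.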

Full Changelog: v0.9.7…v0.9.8

Link to the binary AppImage: https://release.bambuhls.eu/appimage/bambu-0.9.8.AppImage

PandA 0.9.7 released

We are glad to announce a new release of the PandA-bambu project, an open-source framework for research in high-level synthesis and HW/SW co-design.

Panda Bambu is an open-source framework aimed at assisting the designer during the high-level synthesis of complex applications, supporting most of the C constructs. It is developed for Linux systems, it is written in C++11, and its pre-compiled binaries can be downloaded under GPL license as an AppImage package (link). The source code is also publicly available as a GitHub repository at this link.

Issues, pull requests, and patches can be submitted by using the GitHub website. Install instructions, tutorials, and other info can be found on the PandA website.

New features introduced:

  • Added support to CLANG/LLVM compiler versions 8, 9, 10, 11, and 12.
  • Added support to Xilinx Vitis HLS LLVM 2020.2. (link)
  • Added a Google Colab notebook with many examples to play with Bambu.
  • Improved support to NanoXplore FPGAs (e.g., NG-Medium and NG-Large) in the context of the hermes-h2020-project.
  • Added initial support to ASIC flow based on Yosys+OpenROAD projects. Nangate45 and ASAP7 PDKs supported.
  • Simplified the integration of simulation/synthesis backends. If the simulator or synthesizer is in the system path, the configuration can be as simple as ../configure --prefix=/opt/panda
  • Added support to multi-thread simulation when Verilator version 4 is used (it is disabled by default).
  • Improved bambu memory consumption.
  • Added support to AppImage bambu binary distribution.
  • Added support to xc7z045-2ffg900-VVD device.
  • Added support to ECP5 Lattice semiconductor devices (e.g., LFE5UM85F8BG756C, LFE5U85F8BG756C).
  • Improved and better integrated value range analysis.
  • Improved support to interface synthesis by allowing references in called functions.
  • Added a first support to axis interface.
  • Simulation and synthesis tools are now detected at runtime. Configure options have been removed; vendor-specific install directories can now be specified using the Bambu option --<vendor>-root=<path> (e.g., --xilinx-root=/opt/Xilinx).
  • PandA-bambu reference paper has been published at DAC: F. Ferrandi, V. G. Castellana, S. Curzel, P. Fezzardi, M. Fiorito, M. Lattuada, M. Minutoli, C. Pilato, and A. Tumeo, “Invited: Bambu: an Open-Source Research Framework for the High-Level Synthesis of Complex Applications,” in 2021 58th ACM/IEEE Design Automation Conference (DAC), 2021, pp. 1327-1330.
  • Added support to Svelto methodology. Reference paper: M. Minutoli, V. Castellana, N. Saporetti, S. Devecchi, M. Lattuada, P. Fezzardi, A. Tumeo, and F. Ferrandi, “Svelto: High-Level Synthesis of Multi-Threaded Accelerators for Graph Analytics,” IEEE Transactions on Computers, iss. 01, pp. 1-14, 2021.
  • Added support to Tensor flow optimization. Reference paper: M. Siracusa and F. Ferrandi, “Tensor Optimization for High-Level Synthesis Design Flows,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Best Paper Candidate of CODES+ISSS 2020, vol. 39, iss. 11, pp. 4217-4228, 2020.
  • Example of integration between Soda-Opt and Bambu: Reference paper: S. Curzel, N. Bohm Agostini, S. Song, I. Dagli, A. Limaye, C. Tan, M. Minutoli, V. G. Castellana, V. Amatya, J. Manzano, A. Das, F. Ferrandi, A. Tumeo, “Automated Generation of Integrated Digital and Spiking Neuromorphic Machine Learning Accelerators”, 2021 40th International Conference on Computer-Aided Design (ICCAD).
  • Improved support for discrepancy analysis. Reference paper: P. Fezzardi and F. Ferrandi, “Automated Bug Detection for High-Level Synthesis of Multi-Threaded Irregular Applications,” ACM Trans. Parallel Comput., vol. 7, iss. 4, 2020.
  • Paper describing the PandA-bambu framework: F. Ferrandi, V.G. Castellana, S. Curzel, P. Fezzardi, M. Fiorito, M. Lattuada, M. Minutoli, C. Pilato, A. Tumeo, “Invited: Bambu: an Open-Source Research Framework for the High-Level Synthesis of Complex Applications,” 2021 58th ACM/IEEE Design Automation Conference (DAC), pp. 1327-1330.

 

Full Changelog: v0.9.6.1…v0.9.7

Link to the binary AppImage: https://release.bambuhls.eu/appimage/bambu-0.9.7.AppImage

PandA 0.9.6 released

New features introduced:

  • Added support to BRAVE FPGAs. Both NG-Medium and NG-Large are supported.
  • Added support to TASTE model-based design flow. A new option has been added to activate the customized HLS flow: --experimental-setup=BAMBU-TASTE.
  • Added support for Hardware Discrepancy Analysis. Reference paper: Pietro Fezzardi, Marco Lattuada, Fabrizio Ferrandi, Using Efficient Path Profiling to Optimize Memory Consumption of On-Chip Debugging for High-Level Synthesis. ACM Trans. Embedded Comput. Syst. 16(5): 149:1-149:19 (2017).
  • Added support for OpenMP for. Reference papers: M. Minutoli, V. G. Castellana, A. Tumeo, M. Lattuada, and F. Ferrandi, “Efficient Synthesis of Graph Methods: A Dynamically Scheduled Architecture,” in Proceedings of the 35th International Conference on Computer-Aided Design, New York, NY, USA, 2016, p. 128:1–128:8. V. G. Castellana, M. Minutoli, A. Morari, A. Tumeo, M. Lattuada, and F. Ferrandi, “High level synthesis of RDF queries for graph analytics,” in Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, 2015, pp. 323-330.
  • Customized Dynamic HLS flow updated. Reference paper: M. Lattuada and F. Ferrandi, “A Design Flow Engine for the Support of Customized Dynamic High Level Synthesis Flows”, ACM Trans. Reconfigurable Technol. Syst. 12, 4, Article 19 (October 2019), 26 pages.
  • Added initial support to C++/Fortran high-level synthesis. Now, Ubuntu distributions require the installation of the g++-multilib package.
  • Added support to CLANG/LLVM compiler versions 4, 5, 6 and 7. On multiple files, LTO is exploited.
  • Added support to high-level synthesis of specifications in LLVM format (i.e., files with the .ll extension).
  • Added to CLANG/LLVM based analysis an interprocedural value range analysis based on the following paper: Fernando Magno Quintao Pereira, Raphael Ernani Rodrigues, and Victor Hugo Sperle Campos. “A fast and low-overhead technique to secure programs against integer overflows”, In Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) (CGO ’13). Washington, DC, USA, 1-11. The implementation started from code available at this link: https://code.google.com/archive/p/range-analysis/. The original code went through a deep revision and it has been changed to be compatible with a recent version of LLVM. Some extensions have been also added:
    • Added anti range support.
    • Redesigned many Range operations to take into account wrapping and to improve the reductions performed.
    • Integrated the LLVM lazy value range analysis.
    • Added support to range value propagation of load from constant arrays.
    • Added support to range value propagation of load and store from generic arrays.
  • Added to CLANG/LLVM based analysis an Andersen based pointer analysis. The specific version used is described in: “The Ant and the Grasshopper: Fast and Accurate Pointer Analysis for Millions of Lines of Code”, by Ben Hardekopf & Calvin Lin, in PLDI 2007. The BDD library used is the Buddy BDD package by Jørn Lind-Nielsen. Users working on a Debian/Ubuntu distribution can install the libbdd-dev package.
  • Added support to GCC compiler version 8.
  • Added support to mingw-w64. A Windows-7 64bit binary distribution is now available. This minimal msys2 binary distribution includes bambu, GCC8 with plugin and multilib enabled and a customized version of clang7.
  • Added support to Mac OSX exploiting Mac Ports project (https://www.macports.org/)
  • Improved the vagrant based virtual machine generation scripts. Such scripts now create an Ubuntu 16.04 32bit VirtualBox image and an Ubuntu 18.04 64bit VirtualBox image.
  • Added Vagrant scripts for Ubuntu precise, trusty, xenial and bionic, Fedora 29, CentOS 7, MacOSX MacPorts.
  • Improved timing constraints and timing reports.
  • Added --registered-inputs=top option.
  • Improved the support for discrepancy when Verilator is used as simulator and a better check of basic floating-point operations.
  • Added an example using C++14 constexpr declaration and a simple GCD example written in C++.
  • The building system is now based on a single configure.ac.
  • Regressions now exploit a Jenkins-based infrastructure.
  • Some performance, style, and c++ improvements have been done following the suggestions coming from cppcheck, clang static analyzer, and Codacy.
  • A single-precision floating-point faithfully rounded powf function has been added. It follows the method published in: Florent De Dinechin, Pedro Echeverria, Marisa Lopez-Vallejo, Bogdan Pasca. Floating-Point Exponentiation Units for Reconfigurable Computing. ACM Transactions on Reconfigurable Technology and Systems (TRETS), ACM, 2013, 6 (1), pp.4:1–4:15. The powf function currently does not support subnormals.
  • Ported some parts of the code to C++11 standard by applying clang-tidy modernize-deprecated-headers, modernize-pass-by-value, modernize-use-auto, modernize-use-bool-literals, modernize-use-equals-default, modernize-use-equals-delete, modernize-loop-convert and modernize-use-override.
  • Added an SRT4 implementation. Added --hls-fpdiv to select which floating-point division will be used. Current options: SRT4 for Sweeney, Robertson, Tocher floating-point division with radix 4 and G for Goldschmidt floating-point division.
  • Added support to ac_types and ac_math library from Mentor Graphics. Compared with the original library, the bambu/PandA version supports both LLVM/Clang and GCC compilers and requires the C++14 standard for compilation. Initial support to ap_* objects used by VIVADO HLS has been added as well.
  • Added support to top design interfaces: none (ap_none), acknowledge (ap_ack), valid (ap_vld), ovalid (ap_ovld), handshake (ap_hs), fifo (ap_fifo) and array (ap_memory). These interfaces are activated by pragmas added to the source code, provided that the option --generate-interface=INFER is passed to bambu. Examples of pragma use can be found in panda_regressions/hls/bambu_specific_test4.
  • Added some examples taken from https://github.com/Xilinx/HLx_Examples
  • Added options --clock-name, --reset-name, --start-name and --done-name to specify the names of the top component control signals.
  • Renamed top component and removed the suffix _minimal_interface. Now, the top component has the same name as the top function.
  • Improved and fixed the synthesis of empty functions.
  • Added option --VHDL-library=libraryname to specify the library in which the synthesized function has to be compiled.
  • Extended the VHDL support to other bambu library components.
  • If it is supported, the compiler used to compile the PandA framework uses the c++17 standard (-std=c++17), otherwise, it uses c++11 standard.
  • Integrated Mockturtle library from EPFL (https://github.com/lsils/mockturtle) to simplify LUT-based expressions.
  • Integrated Abseil C++ libraries (https://abseil.io/)
  • Added examples of parallel_queries from ICCAD15 and ICCAD16 papers.
  • Added FPT tutorial material.
  • Added PNNL19 tutorial material.
  • Added PACT19 tutorial material.
  • Improved makefile parallelization.
  • Improved bambu determinism: two runs with the same specs produce the same HDL output.
  • Fixed COND_EXPR_RESTRUCTURING step
  • Fixed omp tests
  • Fixed libm tests
  • Fixed make check and cpp examples
  • Fixed c++ struct initialization
  • Fixed floating point division
  • Fixed make dist command under Github repository
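
To give an idea of why value range analysis matters for high-level synthesis, the following minimal Python sketch propagates closed integer intervals through an addition and a multiplication and derives how many bits the result needs. It is only an illustration under simplifying assumptions: it ignores wrapping and anti-ranges, which the analysis listed above handles, and all function names are hypothetical.

```python
def add_range(a, b):
    """Interval of x + y for x in a and y in b; intervals are (low, high) pairs."""
    return (a[0] + b[0], a[1] + b[1])


def mul_range(a, b):
    """Interval of x * y: the extremes are always reached at interval corners."""
    products = [x * y for x in a for y in b]
    return (min(products), max(products))


def bits_needed(r):
    """Smallest unsigned (low >= 0) or two's-complement width holding every value in r."""
    lo, hi = r
    if lo >= 0:
        return max(hi.bit_length(), 1)
    return max((-lo - 1).bit_length(), max(hi, 0).bit_length()) + 1


x = (0, 100)   # assumed range of a hypothetical input x
y = (-3, 3)    # assumed range of a hypothetical input y
z = add_range(mul_range(x, y), x)  # z = x * y + x

print(z)               # (-300, 400)
print(bits_needed(z))  # 10
```

Here a result declared as a 32-bit int would need only a 10-bit datapath, which is the kind of saving a range analysis can expose to later bit-width optimizations.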

PandA 0.9.5 released

New features introduced:
– Added support to GCC 6 and GCC 7 (GCC 4.9 is still the preferred GCC compiler).
– Added support to bitfields.
– Added support for pointers and memory operations to the Discrepancy Analysis. Reference paper: Pietro Fezzardi and Fabrizio Ferrandi, “Automated bug detection for pointers and memory accesses in High-Level Synthesis compilers”, in 2016 26th International Conference on Field Programmable Logic and Applications (FPL), 2016.
– Added new option: --discrepancy-only=comma,separated,list,of,function,names
Restricts the discrepancy analysis only to the functions whose name is in the list passed as argument.
– Added new option: --discrepancy-permissive-ptrs
Do not trigger hard errors on pointer variables.
– Added preliminary support to TASTE integration. Reference paper: M. Lattuada, F. Ferrandi, and M. Perrotin, “Computer Assisted Design and Integration of FPGA Accelerators in Aerospace Systems,” in Proceedings of the IEEE Aerospace Conference, 2016, pp. 1-11.
– Added support to OpenMP SIMD. Reference paper: M. Lattuada and F. Ferrandi, “Exploiting Vectorization in High Level Synthesis of Nested Irregular Loops,” Journal of Systems Architecture, vol. 75, pp. 1-14, 2017.
– Default golden reference is now input C code without any modification.
– The options --synthesize and --objectives have been removed. Now the same values passed with --objectives can be directly passed through the option --evaluation.
– Improved the precision and the effectiveness of the Bit Value analysis and optimizations.
– Improved detection of irreducible loops.
– Improved CSE.
– Added a frontend transformation that merges some operations into FPGA LUTs.
– Now frontend explicitly introduces function calls to softfloat functions.
– Added support to block RAM with latency = 3 (--high-latency=4).
– Added bambu option --fsm-encoding=[auto,one-hot,binary].
– Added a new option: --disable-reg-init-value
Used to remove the INIT value from registers (useful for ASIC designs)
– Improved mapping of multiplications on DSPs.
– Added a GCC plugin to apply the whole program optimization starting from the topfname function instead of main function (currently only GCC 4.9 is supported).
– Added further integer division algorithms:
* non-restoring division with unrolling factor equal to 1 (--hls-div=nr1), which becomes the default division algorithm
* non-restoring division with unrolling factor equal to 2 (--hls-div=nr2)
* align divisor shift dividend method (--hls-div=as)
– Added a specialization of the integer division working with 64bits dividend and 32bits divisor.
– Single precision floating point faithfully rounded expf and logf functions implemented following the HOTBM method published by Jeremie Detrey and Florent de Dinechin, “Parameterized floating-point logarithm and exponential functions for FPGAs”, Microprocessors and Microsystems, vol. 31, n. 8, 2007, pp. 537-545.
The code has been exhaustively tested and it supports subnormals.
– Single precision floating point faithfully rounded sin, cos, sincos and tan functions implemented following the HOTBM method published by Jeremie Detrey and Florent de Dinechin, “Floating-point Trigonometric Functions for FPGAs”, FPL 2007.
The code has been exhaustively tested and it supports subnormals.
– Single precision floating point faithfully rounded sqrt function implemented following the method published by Florent de Dinechin, Mioara Joldes, Bogdan Pasca, Guillaume Revy: Multiplicative Square Root Algorithms for FPGAs. FPL 2010: 574-577.
The code has been exhaustively tested and it supports subnormals.
– Implemented the port swapping algorithm as described in the following paper: Hao Cong, Song Chen and T. Yoshimura, “Port assignment for interconnect reduction in high-level synthesis,” Proceedings of Technical Program of 2012 VLSI Design, Automation and Test, Hsinchu, 2012, pp. 1-4.
– Improved support to structs passed by copy.
– Improved ROM identification.
– Added a new option: --rom-duplication
Assume that read-only memories can be duplicated in case timing requires.
– Improved memory initialization.
– Added some transformations that lower some memcpy and memset calls to simple instructions.
– Improved softfloat functions for basic single and double precisions operations: sum, sub, mul and division.
Now addition and subtraction operations correctly manage operand equal to +0 and -0.
– Added three options to control which softfloat and libm libraries are used: --softfloat-subnormal, --libm-std-rounding and --soft-fp.
– Fixed builtin isnanf.
– Added double precision implementation of libm round function.
– Added __builtin_lrint, __builtin_llrint, __builtin_nearbyint to libm library.
– Fixed and improved tgamma and tgammaf function.
– Added support to parallel compilation of bambu libraries.
– Added support to the automatic configuration of newer releases of Quartus for IntelFPGAs.
– Improved verilator detection.
– Improved libicu detection.
– Improved boost filesystem macro.
– Fixed problems due to -m32 under arch linux.
– Fixed compilation problems with glpk and ubuntu 14.04.
– Fixed a problem with long double: it now has the same size as double.
– Added support to Mentor Visualizer.
– Improved components characterization and timing models.
– Extended support to VHDL.
– Now VHDL modelsim simulation uses 2008 standard.
– Extended set of synthesis scripts and synthesis results.
– Improved area reporting for Virtex4 devices.
– Improved characterization of asynchronous RAMs.
– Fixed extraction of slack delay from ISE trce and Lattice reports.
– Fixed yosys backend w.r.t. the newer Vivado releases.
– Added SLICES to the set of data collected by characterization.
– Extended set of regression tests.
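
For readers unfamiliar with non-restoring division, the following minimal Python sketch shows the textbook algorithm for unsigned operands, producing one quotient bit per iteration. It is an illustration only and makes no claim about how the nr1 and nr2 variants are implemented inside bambu; the function name and the width parameter are hypothetical.

```python
def nonrestoring_divide(dividend, divisor, width=32):
    """Unsigned non-restoring division; returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    rem, quo = 0, 0
    for i in range(width - 1, -1, -1):
        bit = (dividend >> i) & 1
        # Shift in the next dividend bit, then subtract the divisor if the
        # partial remainder is non-negative, otherwise add it back.
        if rem >= 0:
            rem = 2 * rem + bit - divisor
        else:
            rem = 2 * rem + bit + divisor
        # The new quotient bit is 1 exactly when the remainder is non-negative.
        quo = (quo << 1) | (1 if rem >= 0 else 0)
    if rem < 0:
        rem += divisor  # single final correction step
    return quo, rem


print(nonrestoring_divide(13, 3, width=4))  # (4, 1)
```

Unlike restoring division, a negative partial remainder is never undone inside the loop; it is compensated by the addition in the next iteration, so each step costs a single add or subtract.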

Quality of results of this release on different target FPGAs can be found at:
CHStone QoR.
libm QoR.
Basic FP operations QoR.

For any information or bug report, please write to panda-info@polimi.it or visit the google group page.

PandAxICT5

2 minutes for a pitch at H2020 Info Day http://panda.dei.polimi.it/wp-content/uploads/ICT05-PandA.pdf https://ec.europa.eu/digital-single-market/en/news/h2020-info-day-factories-future-12-ict-5-and-ict-31-ict-innovation-manufacturing-smes-i4ms #ICT5 #DSMeu #UE #PandA4Design

PandA 0.9.3 released

New features introduced:
– general improvement of performances of generated circuits
– added full support to GCC 4.9 family which is now the default
– improved retrieving of GCC alias analysis information
– added first version of VHDL backend
– added support to CycloneV
– added support to Artix7
– extended support to Virtex7 boards family
– added option --top-rtldesign-name that controls which function is synthesized by the RTL backend
– it is now possible to write the testbench in C instead of using the xml file
– added a first experimental backend to yosys
– added examples/crc_yosys which tests yosys backend and C based testbenches
– improved Verilog testbench generation: it is now fully compliant with cycle based simulators (e.g., VERILATOR)
– added option --backend-script-extensions to pass further constraints to the RTL synthesis (e.g., pin assignment)
– added examples/VGA showing how to integrate existing HDL based IPs in a real FPGA design
– added scripts and results for CHStone synthesis of Lattice based designs
– improved support of complex numbers
– single precision soft-float functions redesigned: now --soft-float is the default and --flopoco becomes optional
– single precision floating point division implemented exploiting Goldschmidt algorithm
– improved synthesis of libm functions
– improved libm regression test
– improved architectural timing model
– improved graphviz representation of FSMs: timing information has been added
– added option --post-rescheduling to further improve the resource usage
– parameter registering is now performed and it can be controlled by using option --registered-inputs
– added a full implementation of Bit Value analysis, coupled with the Value Range analysis performed by GCC
– added option --experimental-setup to control bambu defaults:
* BAMBU-PERFORMANCE-MP – multi-port performance oriented setup
* BAMBU-PERFORMANCE – single port performance oriented setup
* BAMBU-AREA-MP – multi-port area oriented setup
* BAMBU-AREA – single-port area oriented setup
* BAMBU – no specific optimizations enabled
– improved code speculation
– improved memory localization
– added option --do-not-expose-globals, which makes localization of globals possible, as similarly done by some commercial tools
– added support of high latency memories and of distributed memories: zero, one and two delays memories are supported
– added option --aligned-access to drive the memory allocation towards simpler block RAM models: it can be used under some restricted assumptions (e.g., no vectorization and no structs used)
– ported the GCC algorithm which rewrites a division by a constant into adds and shifts
– added option --hls-div that maps integer divisions and modulus on a C based implementation of the Newton-Raphson algorithm
– improved technology libraries management:
* technology libraries and constraints are now managed in an independent way
* multiple technology libraries can be provided to the tool at the same time
– improved and parallelized PandA test regression infrastructure
– added support to CentOS 7, Fedora 21, Ubuntu 14.04 and Ubuntu 14.10 distributions
– complete refactoring of output messages
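
The Goldschmidt algorithm mentioned above for floating point division can be sketched in a few lines. The Python snippet below is a minimal illustration using double-precision floats, not the bambu implementation: the function name and the iteration count are assumptions. It scales the divisor into [0.5, 1) and then repeatedly multiplies numerator and denominator by the same factor until the denominator converges to 1.

```python
from math import frexp, ldexp


def goldschmidt_divide(n, d, iterations=5):
    """Approximate n / d with multiplications only (after an initial scaling)."""
    if d == 0:
        raise ZeroDivisionError("division by zero")
    sign = -1.0 if (n < 0) != (d < 0) else 1.0
    n, d = abs(n), abs(d)
    # Scale so that the denominator lies in [0.5, 1): d = mant * 2**exp.
    mant, exp = frexp(d)
    n = ldexp(n, -exp)
    d = mant
    for _ in range(iterations):
        f = 2.0 - d  # correction factor
        n *= f       # numerator converges to the quotient
        d *= f       # denominator converges to 1
    return sign * n


print(goldschmidt_divide(1.0, 3.0))
```

With the denominator written as 1 - e and e at most 0.5, each iteration squares e, so five iterations leave a relative error of at most 0.5 to the power 32. The two multiplications in each iteration are independent of each other, which is one reason the method suits hardware with parallel multipliers.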

Problems fixed:
– fixed problem related to Bison 2.7
– fixed reinstallation of PandA in a different folder
– fixed installation problems on systems where boost and gcc are not installed in default locations
– removed some implicit conversions from generated verilog circuits

For any information or bug report, please write to panda-info@elet.polimi.it or to