Tactics For Fast, Predictable FPGA Timing Optimization

It is the start of a brand new day. You come to the office, open up a medical imaging design that met timing the day before, and find that a bugfix made by someone else has led to timing failures.

In the demanding world of high-performance FPGA design, there is no free lunch.

We take pains to understand the target device architecture in order to write the appropriate RTL and integrate suitable IP blocks. Subsequently, we craft the design constraints carefully so that synthesis and implementation understand our timing intent well. Finally, we spend time exploring helpful switches in the FPGA tools to converge on a good result.

It would be really nice to have this hard work pay off for all subsequent design iterations, wouldn't it?

However, minor design changes can suddenly degrade performance, especially for heavily utilized or congested designs. FPGA synthesis and implementation involve considerable complexity, and navigating it is a non-trivial task for the tools.

How can you ensure that results are more predictable?

The solution we propose is simple.

Find sets of known-good parameters that are tuned for your design and are likely to yield similar performance across modifications.

A proven method with roots in Machine Learning is to develop an understanding of how design characteristics, FPGA tool parameters and performance statistics are correlated.

1. Gather relevant design characteristics, e.g. logic levels and utilization percentages.
2. Generate sets of synthesis and implementation parameters based on experience and the history of results for this design.
3. Compile the design with each parameter set.
4. Analyze the results and correlate them to the parameters.
5. Reuse good parameters for subsequent design changes.
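
The steps above can be sketched in a few lines of Python. This is only an illustration of the loop, not how InTime is implemented: the parameter names, their values, and the run_build() slack model are all made-up placeholders, and a real flow would launch the FPGA tools and parse their timing reports instead.

```python
import random
from collections import defaultdict
from statistics import mean

# Hypothetical parameter space: placeholder names, not real tool switches.
PARAM_SPACE = {
    "synth_effort": ["low", "medium", "high"],
    "place_strategy": ["default", "spread", "timing"],
    "seed": [1, 2, 3, 4, 5],
}

def generate_param_sets(count, rng):
    """Step 2: draw `count` random parameter sets from the space."""
    return [{name: rng.choice(values) for name, values in PARAM_SPACE.items()}
            for _ in range(count)]

def run_build(params):
    """Step 3 stand-in: return a made-up worst negative slack in picoseconds.

    A real flow would run synthesis and implementation here; this toy
    model exists only so that the sketch is runnable.
    """
    effort = {"low": 0, "medium": 200, "high": 400}[params["synth_effort"]]
    strategy = {"default": 0, "spread": 100, "timing": 300}[params["place_strategy"]]
    return -600 + effort + strategy + 50 * (params["seed"] % 3)

def correlate(results):
    """Step 4: mean slack observed for each (parameter, value) pair."""
    buckets = defaultdict(list)
    for params, wns_ps in results:
        for item in params.items():
            buckets[item].append(wns_ps)
    return {item: mean(values) for item, values in buckets.items()}

def known_good(results, keep=3):
    """Step 5: keep the parameter sets with the best slack for reuse."""
    ranked = sorted(results, key=lambda r: r[1], reverse=True)
    return [params for params, _ in ranked[:keep]]

if __name__ == "__main__":
    rng = random.Random(0)
    results = [(p, run_build(p)) for p in generate_param_sets(20, rng)]
    top = sorted(correlate(results).items(), key=lambda kv: -kv[1])[:3]
    for item, avg in top:
        print(item, round(avg))
    print(known_good(results, keep=1))
```

Step 1 (gathering design characteristics) is left out of the sketch; in practice those characteristics would be stored alongside each result so that the correlation can take them into account.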

This might sound daunting, but it is what we built. Plunify's InTime software trains a database and optimizes the design at the same time. Let's look at two customer designs, used in office automation and medical imaging respectively.

Real-World Case Study #1 - Project Cyclone (Office Automation)

This design has 70% logic utilization and meets timing in only 1 out of 20 placement seed builds, which leaves a little too much to random chance for the designer's comfort. Steps 1-5 above are executed with 20 sets of parameters per iteration until timing is met.
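
To put the 1-in-20 figure in perspective, here is a small sketch of the odds. It assumes that each seed is an independent trial and that 5% is the true pass rate; both are simplifying assumptions for illustration, not measurements from the customer design.

```python
import math

def p_at_least_one_pass(pass_rate, builds):
    """Probability that at least one of `builds` independent runs meets timing."""
    return 1.0 - (1.0 - pass_rate) ** builds

def builds_for_confidence(pass_rate, confidence):
    """Smallest number of independent runs that gives at least one pass
    with probability >= `confidence`."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - pass_rate))

if __name__ == "__main__":
    print(round(p_at_least_one_pass(0.05, 20), 3))  # 0.642
    print(builds_for_confidence(0.05, 0.95))        # 59
```

Under these assumptions, a batch of 20 seeds contains at least one passing build only about 64% of the time, and it would take 59 seed builds to be 95% sure of one. That is the kind of unpredictability the designer wanted to avoid.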

In Figure 1 below, each revision refers to a different version of the RTL.
The Y-axis shows the number of builds taken to meet timing.

Figure 1: Builds to timing closure vs. design revision (Project C)

The initial optimization took 61 builds before timing was met. Subsequent optimization attempts after design changes (Rev 1, Rev 2, and so on) took dramatically fewer builds because the good build parameters found for Revision 1 were reused.

Real-World Case Study #2 - Project Zynq (Medical Imaging)

Smaller than the previous design, this one has 40% utilization and meets timing via a sweep of implementation tool directives.
InTime is used to find good build parameters in anticipation of design changes.

Figure 2: Builds to timing closure per design revision (Project Z)

InTime took 77 builds to meet timing for the first revision. This number dropped significantly after that as the successful parameters were applied to subsequent revisions.

In Revisions 2 and 3, the best synthesis parameters found for Revision 1 were used for synthesis, then InTime was used to explore the implementation parameters.
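
A minimal sketch of that two-stage idea: the synthesis part of each parameter set is frozen and only the implementation part is swept. The parameter names and values below are hypothetical placeholders, not actual tool directives or InTime settings.

```python
from itertools import product

# Hypothetical placeholders, not real tool options.
BEST_SYNTH = {"synth_effort": "high"}  # assumed to be carried over from Revision 1
IMPL_SPACE = {
    "place_strategy": ["default", "spread", "timing"],
    "route_strategy": ["default", "aggressive"],
}

def implementation_sweep(best_synth, impl_space):
    """Yield full parameter sets: fixed synthesis part, varying implementation part."""
    names = list(impl_space)
    for values in product(*(impl_space[name] for name in names)):
        yield {**best_synth, **dict(zip(names, values))}

if __name__ == "__main__":
    for params in implementation_sweep(BEST_SYNTH, IMPL_SPACE):
        print(params)
```

Because the synthesis settings do not change within a revision, the synthesized netlist can in principle be produced once and shared by every implementation run, which narrows the search to the implementation parameters.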

This is an approach we highlight in our timing closure methodology.

Conclusion

The number of compilations needed per revision will still vary, as seen in the charts above. It depends on the extent of the design changes and on how well the Machine Learning database was trained.
With sets of known-good parameters in hand, the turnaround time for performance optimization becomes a lot more consistent.

To find out more about InTime, request a free evaluation here.
