InTime Service promises and delivers results in 3-7 days, often successfully optimizing designs with high Worst Negative Slack (WNS). To be clear about terms, we define "high WNS" as failing timing by more than 1ns. Here we share 5 tips for dealing with such designs. Contrary to popular belief, whether a design closes timing is often less about design skills and more about circumstances such as the following:
- Resource crunch. Most FPGA teams only have a couple of servers, or worse, individual desktop PCs. They also work on multiple versions of the same design in parallel.
- The design is being migrated from an older device to a newer one. Some coding patterns do not take advantage of newer architectural or compiler features, and require time to tweak.
- IP blocks written by 3rd parties or by previous designers who left without leaving sufficient documentation.
- IP blocks which cannot be changed due to complex dependencies, e.g. different variations are used in other designs.
- The design is an ASIC prototype aiming to improve clock frequencies.
As a result, we receive designs with WNS of -1ns, -2ns and even worse. We are happy to have met customers' performance targets, which range from reducing failing slack to under 100ps to meeting timing completely (read more about this service here). Unfortunately, our approach does not fix bad constraints or make up for poor design architectures. Every design that does well in InTime Service has been carefully crafted and meticulously worked on by experienced designers.
The Optimization Hammer
A comment from a US partner on improving performance stands out in particular:
"The optimization hammer is largest at the top"
In other words, you get the most bang for the buck in synthesis. During this step, there are larger variations in timing, area and other design metrics as a result of design changes. These variations get smaller as we proceed to placement and finally routing. (See below)
We could not agree more with his statement. If your design misses timing by nanoseconds of Worst Slack, you first have to satisfy certain intermediate timing estimates before continuing to the later stages of optimization. Otherwise, improving performance will be a moonshot.
In the ideal case, as your design undergoes synthesis followed by placement, the intermediate timing estimates become both more accurate and better, with fewer failing paths. The optimization funnel steadily narrows and timing closes after routing. The fear, however, is that the funnel somehow widens like the one below and timing takes a turn for the worse!
The good news is that we don't see the funnel widen that often, based on builds done for InTime Service. Each project may take up to 200 builds to close timing (running in parallel on multiple machines). We monitor every timing estimate. 200 builds might feel like a lot, but it is a drop in the ocean compared to a quadrillion possible combinations in the design space.
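To put the "drop in the ocean" comparison in numbers, here is a rough sketch. The figure of 50 independent on/off settings is purely an assumption chosen for illustration (2 to the power of 50 happens to be about one quadrillion); real tools mix settings with different numbers of choices, and the function names are hypothetical.

```python
def design_space_size(choices_per_setting):
    """Multiply the number of choices of every setting to get the
    total number of distinct setting combinations."""
    size = 1
    for n in choices_per_setting:
        size *= n
    return size


def coverage(builds, space):
    """Fraction of the design space explored by a given number of builds."""
    return builds / space


# Assumption for illustration only: 50 settings, each with 2 possible values.
space = design_space_size([2] * 50)
print(space)                  # 1125899906842624, roughly one quadrillion
print(coverage(200, space))   # a vanishingly small fraction of the space
```

Under these assumed numbers, exhaustive search is clearly out of reach, which is why choosing the 200 builds carefully matters more than running more of them.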
Here are some tips for designs with high WNS. Assumption: Your project has been properly designed and constrained. The following is not about how to write better RTL! (That is already well covered by your FPGA vendor.)
Tip 1: Abandon poor builds early
As many already know, the optimization funnel can widen towards the end. You may have good post-synthesis timing estimates, but everything can go south the moment placement or routing kicks in. In fact, the actual situation is a bit more complicated: timing estimates fluctuate widely depending on the compilation stage you are in. One technique we use to exploit these estimates is to stop a build whose post-placement estimates look poor, saving up to 50% of the build time. This lets us run up to twice as many builds, so InTime can learn more about what is working and what is not.
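The early-abandon idea can be sketched roughly as follows. This is not InTime's actual logic; the `StageEstimate` type, the cutoff value and both function names are hypothetical, and how you obtain a post-placement WNS estimate depends on your tool flow.

```python
from dataclasses import dataclass


@dataclass
class StageEstimate:
    stage: str     # "synthesis", "placement" or "routing"
    wns_ns: float  # estimated WNS in ns; negative means timing fails


def should_abandon(estimate, placement_cutoff_ns):
    """Stop a build only when its post-placement estimate is worse
    than the chosen cutoff. Other stages never trigger an abort here."""
    if estimate.stage != "placement":
        return False
    return estimate.wns_ns < placement_cutoff_ns


def builds_in_budget(budget_hours, full_build_hours,
                     abandon_fraction, saved_fraction):
    """Number of builds that fit in a machine-time budget when a fraction
    of builds is abandoned early, each saving part of a full build."""
    average_hours = full_build_hours * (1 - abandon_fraction * saved_fraction)
    return int(budget_hours // average_hours)


# Assumed cutoff for illustration: abandon anything estimated worse than -1.0ns.
print(should_abandon(StageEstimate("placement", -1.5), -1.0))  # True
print(builds_in_budget(100, 1.0, 0.0, 0.5))  # 100 builds with no early aborts
print(builds_in_budget(100, 1.0, 1.0, 0.5))  # 200 builds if every build stops halfway
```

The doubling only appears in the limiting case where every build is stopped at the halfway point; in practice some builds run to completion, so the gain lies somewhere between the two figures.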
Tip 2: Go hard on synthesis + placement settings
There is a strong correlation between post-placement timing estimates and the final timing result. Getting good estimates is crucial. To be specific, if your WNS is very poor, the most effective "hammer" is swung during synthesis. However, there is no good way to extrapolate post-synthesis timing estimates to the final result. This is because synthesis estimates are not accurate enough (understandably due to the lack of physical timing information and yet-to-be-optimized hardware elements). The key is to find good combinations of synthesis and placement parameters - InTime does this for you.
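If you want to check how well post-placement estimates track final results on your own designs, one simple option is the Pearson correlation coefficient over a set of completed builds. The sketch below is a generic statistics helper, not part of InTime; the function name and the idea of feeding it two lists of WNS values (one post-placement, one post-routing, in the same build order) are assumptions for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length
    sequences, e.g. post-placement WNS and final WNS per build."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

A value close to 1 on your own build history would suggest that post-placement estimates are a usable early signal for that design; a much lower value would suggest they are not.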
An interesting customer anecdote: InTime found that the MAX_DSP* synthesis option for a design "should be higher in order to pass timing". After looking at the results, the customer informed us that their design actually did not use many DSPs at all, except for a very small block! The MAX_DSP value should not have mattered. This suggests two possibilities - either MAX_DSP had some tangible effect after all, or changing MAX_DSP triggered variations in the build algorithms which InTime detected!
*MAX_DSP is a synthesis setting found in the Vivado tool.
Tip 3: Don't crank all your knobs up to maximum
Many customers provide us with designs that already use the most aggressive tool settings such as the highest Effort Levels. One designer even tried to "reverse-engineer" InTime by turning all compiler knobs to maximum values to see if that yielded better results. Our advice - don't do that. Runtime and Quality-of-Results do not seem to be strongly correlated at the extremes. Another point of view is that if there were a "golden" combination of settings, your FAE or FPGA vendor would have provided it. Every knob cranked to its maximum is not the "golden" combination. Designs are different and so are the optimal settings for each one of them.
Tip 4: Seeds are only for low WNS
This is probably the most well-known and documented tidbit. From the document "AN 584: Timing Closure Methodology for Advanced FPGA Designs":
Using different seed values causes a variance in the Fitter optimization results for compilations on the same design. For example, a seed value of 2 might have the best results for one of your designs, but when you make other changes in the design, either in the source or in the settings, that value might not produce the best results. Also, in a different design, a seed value of 1 might give the best results. On average, you can expect to see about ±5% variance in the results across different seed values.
Remember the optimization hammer? Seeds have a lower range of variation as they take effect during the placement stage. If your WNS is off the charts, finding the perfect seed is akin to looking for the proverbial needle in a haystack - you will waste a lot more build time than necessary.
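A quick back-of-the-envelope check makes the point concrete. The sketch below assumes a single clock domain and treats the quoted ±5% seed variance as applying to Fmax; both are simplifying assumptions, and the function names are hypothetical.

```python
def required_fmax_gain(target_period_ns, wns_ns):
    """Fractional Fmax improvement needed to close timing.
    With failing slack, the achieved period is target_period - wns."""
    if wns_ns >= 0:
        return 0.0
    achieved_period_ns = target_period_ns - wns_ns
    return achieved_period_ns / target_period_ns - 1.0


def seeds_plausible(target_period_ns, wns_ns, seed_variance=0.05):
    """True if the required gain is within the assumed seed variance."""
    return required_fmax_gain(target_period_ns, wns_ns) <= seed_variance


# Assumed example: a 10ns target period (100MHz).
print(seeds_plausible(10.0, -0.2))  # True: a 2% gain is within ~5%
print(seeds_plausible(10.0, -1.5))  # False: a 15% gain is far outside ~5%
```

Under these assumptions, a design failing by more than about 0.5ns on a 10ns clock is already beyond what seed variation alone is likely to recover.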
Tip 5: Consider using the cloud and machine learning
We all know that not having enough compute resources slows things down. What if we give you an unlimited number of build servers? How will you use them effectively?
This is where InTime comes in handy. InTime uses machine learning, which means it needs to generate and analyze a lot of data. In the cloud, we can start tens to hundreds of servers for several hours, learn from the results, shut them down and start them up again. This quick-fire repetition enables us to explore and learn at a rapid pace. With Plunify Cloud as a platform, you have all the necessary licenses, software, and resources to make this optimization flow hassle-free. Instead of leaving builds languishing in a job queue, use the cloud for higher productivity and better results on demand.
We hope these tips help.
Plunify aggregates optimization approaches learned from your design and uses the insights for the next revision or project. With machine learning, we have steadily reduced the total number of runs needed for subsequent versions of the same design. Our experience with InTime Service tells us that you don't need 200 servers each time to solve your timing problems. You need a disciplined way to reuse what you have learned. Check it out at https://www.plunify.com/en/service/.
If you have more tips on the optimization build process, let us know and help make life easier for everyone!