How Much Do You Really Know About Placement Seed Sweep?

This whitepaper compares the effectiveness of two timing optimization methods: the InTime Default recipe, provided by the InTime FPGA design optimization tool, and a method commonly known as a “Seed Sweep”.

InTime Default is a machine learning approach that finds good synthesis and place-and-route setting combinations for a design. It shares data insights across different designs and produces predictable effects.

A Seed Sweep varies the Quartus Fitter SEED value, which affects the initial placement of a design. Changing the seed changes the starting conditions for place-and-route and causes the Fitter results to fluctuate. Design teams worldwide use this well-known approach to optimize their designs. However, the effects are random, and there is no “golden” seed value that works for all designs.

The experiment uses a Stratix V design and an Arria 10 design. For each design, 200 compilations were run with the InTime Default recipe and 200 with a Seed Sweep. The best Worst Slack (WS), Total Negative Slack (TNS), Fmax and runtimes were then compared.

For the Stratix V design, InTime Default improved the Worst Slack by 57.38% while Seed Sweep only had 13.46% improvement.

For the Arria 10 design, the InTime Default recipe improved the Worst Slack by 34.98% while Seed Sweep only had 11.44% improvement.

In summary, the InTime Default recipe performed better than a Seed Sweep with respect to timing performance.

Design Details

The two designs used for this experiment are listed below.

Table 2.1: Stratix V design

Design Info
  Quartus Prime Version: 17.1.0 Build 590 10/25/2017 SJ Standard Edition
  Revision Name: jesdsv
  Top-level Entity Name: jesdrx
  Family: Stratix V
  Device: 5SGSMD3H3F35I4
Timing Info
  Clock name: rxlink_clk
  Corner: Slow 850mV -40C Model
  Worst Slack, WS (ns): -0.765
  Total Negative Slack, TNS (ns): -897.292
  Fmax (MHz): 306.28
Utilization Info
  Logic utilization (in ALMs): 42,553 / 89,000 (48%)
  Total registers: 47,330
  Total pins: 138 / 544 (25%)
  Total virtual pins: 6,540
  Total block memory bits: 81,920 / 14,090,240 (< 1%)
  Total DSP Blocks: 0 / 600 (0%)
  Total HSSI STD RX PCSs: 0 / 24 (0%)
  Total HSSI 10G RX PCSs: 0 / 24 (0%)
  Total HSSI GEN3 RX PCSs: 0 / 24 (0%)
  Total HSSI PMA RX Deserializers: 0 / 24 (0%)
  Total HSSI STD TX PCSs: 0 / 24 (0%)
  Total HSSI 10G TX PCSs: 0 / 24 (0%)
  Total HSSI GEN3 TX PCSs: 0 / 24 (0%)
  Total HSSI PMA TX Serializers: 0 / 24 (0%)
  Total HSSI PIPE GEN1_2s: 0 / 24 (0%)
  Total HSSI GEN3s: 0 / 24 (0%)
  Total PLLs: 0 / 52 (0%)
  Total DLLs: 0 / 4 (0%)

Table 2.2: Arria 10 design

Design Info
  Quartus Prime Version: 17.1.0 Build 240 10/25/2017 SJ Pro Edition
  Revision Name: jesda10
  Top-level Entity Name: jesdrx
  Family: Arria 10
  Device: 10AX027E4F27E3LG
Timing Info
  Clock name: rxlink_clk
  Corner: Slow 900mV 100C Model
  Worst Slack, WS (ns): -0.769
  Total Negative Slack, TNS (ns): -977.742
  Fmax (MHz): 305.90
Utilization Info
  Logic utilization (in ALMs): 52,860 / 101,620 (52%)
  Total registers: 56,480
  Total pins: 66 / 296 (22%)
  Total virtual pins: 8,262
  Total block memory bits: 102,400 / 15,360,000 (< 1%)
  Total DSP Blocks: 0 / 830 (0%)
  Total HSSI RX channels: 0 / 12 (0%)
  Total HSSI TX channels: 0 / 12 (0%)
  Total PLLs: 0 / 32 (0%)
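
The WS and Fmax values in Tables 2.1 and 2.2 can be cross-checked against each other. Assuming the reported WS and Fmax both come from the same register-to-register path on rxlink_clk, Fmax = 1000 / (T - WS) MHz, where T is the required clock period in ns. The sketch below solves that relationship for T. The function names are illustrative, and the resulting 2.5 ns (400 MHz) period is an inference from the published numbers, not a constraint stated in this document.

```python
def implied_period_ns(ws_ns: float, fmax_mhz: float) -> float:
    """Solve Fmax = 1000 / (T - WS) for the required clock period T (ns).

    Assumption: the reported WS and Fmax describe the same same-clock path.
    """
    achieved_period_ns = 1000.0 / fmax_mhz  # period the worst path can meet
    return achieved_period_ns + ws_ns        # negative slack => tighter constraint


def fmax_from_ws(ws_ns: float, period_ns: float) -> float:
    """Predict Fmax (MHz) from worst slack under a fixed constraint period."""
    return 1000.0 / (period_ns - ws_ns)


# Original results from Tables 2.1 and 2.2
for name, ws, fmax in [("Stratix V", -0.765, 306.28), ("Arria 10", -0.769, 305.90)]:
    print(f"{name}: implied period = {implied_period_ns(ws, fmax):.3f} ns")
# Both lines print an implied period of 2.500 ns.
```

Under the same assumption, `fmax_from_ws(-0.326, 2.5)` gives about 353.86 MHz, close to the 353.89 MHz InTime Default result for the Stratix V design in Table 4.3.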

Test Procedures

The experiment was performed via the following steps:

  1.  Open the Stratix V design in InTime.
  2.  Run the InTime Default recipe for 10 rounds, with 20 compilations per round.
  3.  Run a Seed Sweep of 200 compilations, each with a different seed value.
  4.  Compare the Worst Slack (WS), Total Negative Slack (TNS) and Fmax of the two methods.
  5.  Repeat Steps 1 to 4 for the Arria 10 design.
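
Step 4 reduces each method's 200 compilations to its best WS, TNS and Fmax. Below is a minimal sketch of that reduction, assuming each compilation's timing numbers are held in a hypothetical `Result` record. It takes the best of each metric independently. That is an assumption, since the whitepaper does not say whether its best values all come from a single compilation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """Timing results of one compilation (hypothetical record type)."""
    ws_ns: float     # Worst Slack: higher (closer to or above zero) is better
    tns_ns: float    # Total Negative Slack: higher (closer to zero) is better
    fmax_mhz: float  # higher is better


def best_of(results: list[Result]) -> Result:
    """Best value of each metric, each taken independently across all runs."""
    if not results:
        raise ValueError("no compilation results")
    return Result(
        ws_ns=max(r.ws_ns for r in results),
        tns_ns=max(r.tns_ns for r in results),
        fmax_mhz=max(r.fmax_mhz for r in results),
    )
```

Calling `best_of` once on the Seed Sweep results and once on the InTime Default results yields the two rows that Step 4 compares.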

Results

Tables 4.1, 4.2 and 4.3 below compare the WS, TNS and Fmax results of the Seed Sweep and InTime Default for the Stratix V and Arria 10 designs. Each percentage is the improvement relative to the original result.

Table 4.1: Worst Slack (WS) comparison

Best Worst Slack, WS (ns)

Design | Original WS (ns) | Seed Sweep WS (ns) | Seed Sweep improvement (%) | InTime Default WS (ns) | InTime Default improvement (%)
Stratix V design | -0.765 | -0.662 | 13.46 | -0.326 | 57.38
Arria 10 design | -0.769 | -0.681 | 11.44 | -0.5 | 34.98

Table 4.2: Total Negative Slack (TNS) comparison

Best Total Negative Slack, TNS (ns)

Design | Original TNS (ns) | Seed Sweep TNS (ns) | Seed Sweep improvement (%) | InTime Default TNS (ns) | InTime Default improvement (%)
Stratix V design | -897.292 | -745.228 | 16.95 | -39.736 | 95.57
Arria 10 design | -977.742 | -635.613 | 34.99 | -75.44 | 92.28

Table 4.3: Fmax comparison

Fmax (MHz)

Design | Original Fmax (MHz) | Seed Sweep Fmax (MHz) | Seed Sweep improvement (%) | InTime Default Fmax (MHz) | InTime Default improvement (%)
Stratix V design | 306.28 | 316.25 | 3.26 | 353.89 | 15.54
Arria 10 design | 305.90 | 314.37 | 2.76 | 333.33 | 8.97
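
The improvement percentages in Tables 4.1 to 4.3 follow from one formula: (new - original) / |original| × 100. For negative slack values, moving toward zero is an improvement. For Fmax, a higher value is an improvement. The sketch below recomputes the tables' percentages from their raw values.

```python
def improvement_pct(original: float, new: float) -> float:
    """Percentage improvement relative to the original result.

    Works for slack (negative; closer to zero is better) and for Fmax
    (positive; larger is better), because both improve as the value rises.
    """
    return (new - original) / abs(original) * 100.0


# (metric, design, original, best Seed Sweep, best InTime Default)
TABLE = [
    ("WS", "Stratix V", -0.765, -0.662, -0.326),
    ("WS", "Arria 10", -0.769, -0.681, -0.5),
    ("TNS", "Stratix V", -897.292, -745.228, -39.736),
    ("TNS", "Arria 10", -977.742, -635.613, -75.44),
    ("Fmax", "Stratix V", 306.28, 316.25, 353.89),
    ("Fmax", "Arria 10", 305.90, 314.37, 333.33),
]

for metric, design, orig, seed, intime in TABLE:
    print(f"{metric:4} {design:9}  Seed Sweep {improvement_pct(orig, seed):6.2f}%"
          f"  InTime Default {improvement_pct(orig, intime):6.2f}%")
```

The recomputed percentages agree with the tables to within 0.01 percentage points. The small differences presumably come from how the published figures were rounded.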

Optimization Process and Run Time

Running a Seed Sweep simply means setting the “Fitter Seed” value to each of 1 to 200 in turn; the default value is 1. The InTime Default recipe runs 10 rounds of 20 compilations each, and each round starts only after the previous round has ended. As a result, the overall runtimes of the two approaches differ.
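
A Seed Sweep is easy to script. The sketch below only builds the command lines and does not run them. It makes several assumptions:

- `quartus_fit` accepts `-c <revision>` and `--seed=<n>`. Check `quartus_fit --help` for your Quartus version.
- Synthesis has already completed.
- Each seed runs in its own copy of the project directory, so that concurrent fits do not write to the same project database.

The project name and the `seed_<n>` directory layout are hypothetical.

```python
import shlex


def seed_sweep_commands(project: str, revision: str,
                        seeds: range = range(1, 201)) -> list[tuple[str, list[str]]]:
    """Return (working directory, command) pairs for a Fitter seed sweep.

    Dry run only: nothing is executed. Assumes quartus_fit accepts
    `-c <revision>` and `--seed=<n>`, and that each seed has its own
    (hypothetical) copy of the synthesized project in seed_<n>/.
    """
    return [
        (f"seed_{seed}", ["quartus_fit", project, "-c", revision, f"--seed={seed}"])
        for seed in seeds
    ]


for workdir, cmd in seed_sweep_commands("jesdsv", "jesdsv")[:2]:
    print(workdir, shlex.join(cmd))
# seed_1 quartus_fit jesdsv -c jesdsv --seed=1
# seed_2 quartus_fit jesdsv -c jesdsv --seed=2
```

To execute the commands, one option is `subprocess.run(cmd, cwd=workdir, check=True)` inside a process pool sized to the number of concurrent runs.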

Table 5.1 below shows the runtime differences. Note that the average runtime of InTime Default is higher than that of the Seed Sweep. Compilation times also vary more for InTime Default, because the settings it tries vary more than seed values do. The total runtime is also longer for InTime Default, because results must be analyzed at the end of each round to generate the settings for the next.

Table 5.1: Run Time comparison based on different concurrent runs and rounds

Run Time (h)

Seed Sweep: 1 round of 200 compilations. InTime Default: 10 rounds of 20 compilations.

Design | Original run time (h) | Concurrent runs | Seed Sweep total run time (h) | Seed Sweep avg run time (h) | InTime Default total run time (h) | InTime Default avg run time (h)
Stratix V design | 0.5 | 5 | 26 | 0.6 | 37 | 0.7
Arria 10 design | 0.5 | 3 | 37 | 0.8 | 38 | 1.0
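
The effect of the round structure on wall-clock time can be estimated with a simple model. The sketch below makes two assumptions: every compilation takes exactly the average time, and up to `concurrent` compilations run at once. The function name and the optional per-round analysis term are hypothetical.

With the Stratix V figures above and no analysis time, the model gives about 24 h for the Seed Sweep (200 compilations, 0.6 h each, 5 at a time). For InTime Default (10 rounds of 20, 0.7 h each) it gives about 28 h. The measured totals are 26 h and 37 h. Treat this as a rough model only: it ignores the analysis between rounds and scheduling effects, and it does not necessarily reproduce every row of the table.

```python
import math


def ideal_total_hours(compiles_per_round: int, rounds: int, concurrent: int,
                      avg_hours: float, analysis_hours_per_round: float = 0.0) -> float:
    """Idealized wall-clock hours for `rounds` sequential rounds of compilations.

    Assumes every compilation takes exactly `avg_hours` and that at most
    `concurrent` compilations run at the same time.
    """
    waves = math.ceil(compiles_per_round / concurrent)  # batches within one round
    return rounds * (waves * avg_hours + analysis_hours_per_round)


print(ideal_total_hours(200, 1, 5, 0.6))   # Seed Sweep, Stratix V: about 24 h
print(ideal_total_hours(20, 10, 5, 0.7))   # InTime Default, Stratix V: about 28 h
```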

5.1 Stratix V design results

Here are the results for the Stratix V design.

5.1.1 Seed Sweep

Figure 5.1.1: TNS (ns) values for 200 seeds

Figure 5.1.2: WS (ns) values for 200 seeds

5.1.2 InTime Default

(Note: The Y-axis uses a logarithmic scale because of the larger fluctuations.)

Figure 5.1.3: TNS values for InTime Default

Figure 5.1.4: WS values for InTime Default

The charts below show how the results improved across rounds. The X-axis shows job numbers, with one job per round, and each column represents the timing results of one job. The green line marks the best result in each job and the red line the worst.
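
The green and red lines reduce each job's results to two numbers. Below is a minimal sketch for WS, assuming the results are grouped by job number in a dict (a hypothetical input format). The same function works for TNS, since higher is also better there.

```python
def per_job_best_worst(ws_by_job: dict[int, list[float]]) -> dict[int, tuple[float, float]]:
    """Return {job: (best WS, worst WS)} in job order.

    Best is the largest slack (closest to or above zero); worst is the
    smallest. Jobs with no results are skipped.
    """
    return {job: (max(ws), min(ws)) for job, ws in sorted(ws_by_job.items()) if ws}
```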

Figure 5.1.5: TNS values for InTime Default (10 rounds)

Figure 5.1.6: WS values for InTime Default (10 rounds)

5.2 Arria 10 design results

Here are the results for the Arria 10 design.

5.2.1 Seed Sweep
The following charts show the results for seed values 1 to 200. Green marks results that are better than the original result, and blue marks results that are worse.
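
The green/blue coloring compares each seed's result with the original WS. Below is a sketch that assumes WS values keyed by seed (a hypothetical input format). Ties with the original go into a separate bucket, because the chart legend does not say how they are colored.

```python
def classify_against_original(ws_by_seed: dict[int, float],
                              original_ws: float) -> dict[str, list[int]]:
    """Group seeds by whether their WS beat, matched or missed the original."""
    buckets: dict[str, list[int]] = {"better": [], "same": [], "worse": []}
    for seed, ws in sorted(ws_by_seed.items()):
        if ws > original_ws:        # higher slack is better
            buckets["better"].append(seed)
        elif ws < original_ws:
            buckets["worse"].append(seed)
        else:                       # exact tie; not covered by the chart legend
            buckets["same"].append(seed)
    return buckets
```

For the Arria 10 design, `len(classify_against_original(ws_by_seed, -0.769)["better"])` would count the green bars in the WS chart.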

Figure 5.2.1: TNS (ns) values for 200 seeds

Figure 5.2.2: WS (ns) values for 200 seeds

5.2.2 InTime Default
InTime runs 10 rounds of 20 compilations each. When a round completes, InTime analyzes and learns from its results to generate the parameters for the next round.
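
The round structure can be sketched as a batch-sequential loop. Each round proposes a batch of settings based on every result so far and compiles the batch; only then does the next round start. This is a generic skeleton, not InTime's actual machine learning algorithm. The toy proposer, the single `effort` setting and the scoring function are all hypothetical stand-ins.

```python
import random
from typing import Callable

Settings = dict[str, int]
History = list[tuple[Settings, float]]


def run_rounds(propose: Callable[[History, int], list[Settings]],
               compile_fn: Callable[[Settings], float],
               rounds: int = 10, per_round: int = 20) -> History:
    """Run `rounds` batches of `per_round` compilations, one batch at a time."""
    history: History = []
    for _ in range(rounds):
        batch = propose(history, per_round)  # "learn" from all results so far
        # In a real flow these compilations run concurrently; the next round
        # cannot start until every result of this round is available.
        history.extend((s, compile_fn(s)) for s in batch)
    return history


def toy_propose(history: History, n: int) -> list[Settings]:
    """Hypothetical proposer: random at first, then small steps around the best."""
    rng = random.Random(len(history))  # deterministic for this example
    if not history:
        return [{"effort": rng.randint(0, 9)} for _ in range(n)]
    best, _ = max(history, key=lambda item: item[1])
    return [{"effort": min(9, max(0, best["effort"] + rng.choice((-1, 0, 1))))}
            for _ in range(n)]


def toy_compile(s: Settings) -> float:
    """Hypothetical stand-in for a compilation: returns a WS-like score."""
    return -0.1 * abs(s["effort"] - 6)


history = run_rounds(toy_propose, toy_compile)
print(len(history), max(score for _, score in history))
```

Because each call to `propose` needs the previous round's results, the rounds cannot overlap. This is one reason the total runtime of InTime Default in Table 5.1 is longer than that of the Seed Sweep.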

The following charts show the results of all 200 InTime compilations. Note that the Y-axis uses a logarithmic scale because of the larger fluctuations in timing results.

Figure 5.2.3: TNS values for InTime Default

Figure 5.2.4: WS values for InTime Default

The charts below show how the results improved across rounds. The X-axis shows job numbers, with one job per round, and each column represents one job. The green line marks the best result in each job and the red line the worst.

Figure 5.2.5: TNS improvements for InTime Default (10 rounds)

Figure 5.2.6: WS values for InTime Default (10 rounds)

Conclusion

InTime is more effective at delivering higher timing performance. For the same number of compilation runs (200), InTime Default improved the Worst Slack by about 35% to 57%, compared with only about 11% to 13% for a Seed Sweep.

Furthermore, even greater improvements are possible when both methods are used together, namely by running a Seed Sweep on good InTime Default results. The InTime tool uses this approach to enhance timing performance and accelerate time-to-market. The key is to first get a sufficiently good result and then use seeds to push it over the finishing line.
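
The combined approach can be sketched in two stages: pick the best settings first, then sweep Fitter seeds with those settings held fixed. Below is a minimal sketch that assumes a hypothetical `compile_fn(settings, seed)` returning a WS-like score (higher is better). Stage 1 here simply evaluates the candidates at the default seed of 1, standing in for a full InTime run.

```python
from typing import Callable

Settings = dict[str, int]


def two_stage(candidates: list[Settings], seeds: range,
              compile_fn: Callable[[Settings, int], float]) -> tuple[Settings, int, float]:
    """Stage 1: best settings at the default seed; stage 2: seed sweep on them."""
    default_seed = 1  # the Quartus Fitter default seed value
    best_settings = max(candidates, key=lambda s: compile_fn(s, default_seed))
    # Stage 2: each seed is one more compilation of the same settings.
    scores = {seed: compile_fn(best_settings, seed) for seed in seeds}
    best_seed = max(scores, key=scores.get)
    return best_settings, best_seed, scores[best_seed]
```

Only stage 2 relies on seed randomness, so the sweep starts from a result that is already close to timing closure instead of from the original design.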

To read more about InTime and its capabilities, please go to https://www.plunify.com/en/intime/
