Meeting the timing requirements of a design is inherently difficult, and achieving complete repeatability of the overall timing of a design is sometimes impossible. Fortunately, designers have access to design flow concepts that help achieve repeatable timing results. The four areas with the greatest impact are HDL design practices, synthesis optimization, floorplanning, and implementation.
Designs with high resource utilization and high frequency requirements are the hardest to reproduce, and they are also the designs that most need a repeatable results flow. The first step toward repeatable results is to apply sound design practices during the HDL design phase. Following good hierarchical boundary practices helps maintain logical integrity, which in turn helps maintain repeatable results when the design changes. A good rule of thumb is to keep logic that must be optimized, implemented, and verified together at the same level of hierarchy. In addition, register the inputs and outputs of each module. This keeps the timing paths inside the module and avoids interaction with other modules when the module changes. Finally, keep all logic that needs to be packed into a larger FPGA resource, such as block RAM or DSP, at the same level of hierarchy.
It is difficult to obtain repeatable results from designs that have too many levels of look-up table (LUT) logic for the desired QoR. The LUT delay is generally not the problem; the problem is the routing delay between LUTs. This is critical in the high-performance parts of a design.
Too many logic levels can often be traced to large if/else structures and long case statements. Where appropriate, case statements can be implemented with less logic by using the Verilog directives full_case and parallel_case, a technique that generally reduces logic levels. Large multiplexers or decoders can cause routing congestion, resulting in non-reproducible results. Registering the multiplexer/decoder path in multiple stages helps solve this problem. For adders, replacing a registered adder tree with a registered adder chain can improve performance. If the adders are all registered, the chain has a longer latency than the tree. For more information on coding best practices, refer to the Xilinx white paper, HDL Coding Practices to Accelerate Design Performance (WP231), http://www.xilinx.com/support/documentation/white_papers/wp231.pdf.
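As an illustration of registering a large multiplexer in multiple stages, the following is a minimal sketch rather than code from the white paper. The module name, port names, and the 16:1 size are hypothetical. It splits the multiplexer into four registered 4:1 multiplexers followed by a registered 4:1 stage, so each stage stays within a small amount of LUT logic at the cost of two clock cycles of latency.

```verilog
// Hypothetical example: 16:1 multiplexer split into two registered stages.
module mux16_pipelined #(
    parameter WIDTH = 8
) (
    input  wire                clk,
    input  wire [16*WIDTH-1:0] din,   // sixteen WIDTH-bit inputs, packed
    input  wire [3:0]          sel,
    output reg  [WIDTH-1:0]    dout   // valid two clock cycles after din/sel
);
    reg [WIDTH-1:0] stage1 [0:3];     // outputs of four registered 4:1 muxes
    reg [1:0]       sel_hi_q;         // upper select bits, delayed to match stage 1
    integer i;

    always @(posedge clk) begin
        // Stage 1: each group of four inputs is reduced using sel[1:0].
        for (i = 0; i < 4; i = i + 1)
            stage1[i] <= din[(4*i + sel[1:0])*WIDTH +: WIDTH];
        sel_hi_q <= sel[3:2];

        // Stage 2: select one of the four stage-1 results.
        dout <= stage1[sel_hi_q];
    end
endmodule
```

The upper select bits are delayed by one cycle so that they stay aligned with the stage-1 data. Whether two stages are enough depends on the multiplexer size and the target frequency.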
Reset and other control signals
The choice of reset affects the performance, area, and power of the design. A global reset is not required to initialize the circuit at power-up, yet it can significantly restrict the types of resources that the design can use. For example, a shift register cannot be inferred if there is a global reset in the HDL. One shift register produces more repeatable results than ten registers.
Additionally, the DSP and block RAM registers support only synchronous resets. If asynchronous resets are included in the code, these registers cannot be used, forcing the design to use Configurable Logic Block (CLB) registers instead. Results are easier to maintain when the registers are absorbed into the DSP, the block RAM, or both. Using synchronous resets in general logic can also reduce logic levels. Slice registers can have asynchronous or synchronous resets. If the design uses a synchronous reset, the synchronous set becomes available to combinational logic, which can reduce the logic levels by one LUT.
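The difference between the two reset styles is small in the HDL. The sketch below is illustrative only; the module name, port names, and 18-bit operand widths are assumptions. It shows a registered multiplier with a synchronous reset, which leaves the synthesis tool free to absorb the output register into a DSP block.

```verilog
// Hypothetical example: registered multiplier with a synchronous reset.
module mult_sync_rst (
    input  wire               clk,
    input  wire               rst,   // synchronous, active high
    input  wire signed [17:0] a,
    input  wire signed [17:0] b,
    output reg  signed [35:0] p
);
    // rst is not in the sensitivity list, so the reset is synchronous.
    // Writing "always @(posedge clk or posedge rst)" instead would describe
    // an asynchronous reset and keep the register out of the DSP block.
    always @(posedge clk) begin
        if (rst)
            p <= 36'sd0;
        else
            p <= a * b;
    end
endmodule
```

Whether the register is actually absorbed depends on the synthesis tool and its settings, so the synthesis report should be checked.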
A control set consists of a unique set of clock, clock enable, set, and reset signals and, for distributed RAM, write enable signals. Control set information is important because registers must share the same control set to be packed into the same slice. This can affect packing and utilization, creating problems with repeatable results. For more information on resets, see WP272 Get Smart About Reset: Think Local, Not Global (http://www.xilinx.com/support/documentation/white_papers/wp272.pdf). For more information on control sets, see WP309 Targeting and Retargeting Guide for Spartan®-6 FPGAs (http://www.xilinx.com/support/documentation/white_papers/wp309.pdf). Although this white paper is specific to Spartan-6 devices, it also includes useful general FPGA information.
It is critical to understand which FPGA resources are available and when it is best to use them. Synthesis directives generally exist to define which resources are used. For example, block RAM is best suited to deep memory requirements, while distributed RAM is suitable for wide buses, especially when a local clock is clocking high-speed data. Both block RAM and distributed RAM can cause problems when a control signal has a large fanout. Replicating the control signals and using floorplanning techniques to keep blocks that share the same signals together helps maintain repeatable results.
Shift registers reduce the utilization of the design and promote repeatability, but there are some performance issues worth noting. The clock-to-output delay of the SRL is longer than that of a flip-flop; therefore, it is better to use a flip-flop as the last stage of the shift register. Most synthesis tools do this automatically, but if there is a problem with a path involving a shift register, it is a good idea to verify that its last stage is a register.
A similar issue exists with the first stage. Placing a flip-flop at the front end of the SRL gives the placer more options to meet timing requirements and maintain results. Again, most synthesis tools do this automatically, but if there is a problem with a path involving a shift register, it is a good idea to verify that its first stage is a register.
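A delay line written along these lines makes the intent explicit. This is a minimal sketch, not the inference template of any particular tool: the module and signal names are hypothetical, and whether the middle stages map to an SRL while the first and last stages stay in flip-flops is decided by the synthesis tool.

```verilog
// Hypothetical example: delay line with flip-flops at both ends.
module delay_line #(
    parameter DEPTH = 16              // total delay in clock cycles, must be >= 4
) (
    input  wire clk,
    input  wire d,
    output reg  q                     // last stage: ordinary flip-flop
);
    reg             d_q;              // first stage: ordinary flip-flop
    reg [DEPTH-3:0] mid;              // middle stages: candidate for SRL inference

    // No reset anywhere in the chain, so shift register inference stays possible.
    always @(posedge clk) begin
        d_q <= d;
        mid <= {mid[DEPTH-4:0], d_q};
        q   <= mid[DEPTH-3];
    end
endmodule
```

If a timing report shows the SRL directly driving or being driven by a long route, the synthesis result should be checked to confirm that the end stages were not merged into the SRL.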
FPGAs contain many registers, which allows pipelining to play an important role in improving performance. When pipelining, it is important to disable SRL inference for flip-flops that are meant to act as multi-stage pipeline registers. The above-referenced white paper on HDL coding practices (WP231) provides more information on block RAM. For more information on shift registers, see WP271 Saving Costs with the SRL16E (http://www.xilinx.com/support/documentation/white_papers/wp271.pdf).
Clock domain issues
Designers must be careful to properly constrain paths that cross unrelated clock domains. The tools automatically relate clocks derived from the same source clock (e.g., a DCM). PERIOD constraints can also be used to relate external clocks. Unrelated clocks that are not derived inside the device require special consideration. By default, the tools do not constrain paths between such clocks. If there are special timing considerations, the designer must constrain the relevant paths with FROM:TO constraints. The DATAPATHONLY keyword instructs the tool not to include clock skew in the equation.
For more information, see UG625 Xilinx Constraints Guide (http://www.xilinx.com/support/documentation/sw_manuals/xilinx11/cgd.pdf) or the “Asynchronous Clock Domain” chapter of WP257 What Are PERIOD Constraints? (http://www.xilinx.com/support/documentation/white_papers/wp257.pdf).
Another key point is to ensure that race conditions do not occur. FIFOs can be used when crossing from one clock domain to another. Otherwise, the designer needs to double-synchronize one (and only one) control signal and use it to capture the other signals in the receiving clock domain.
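A double synchronizer for a single control signal can be sketched as follows. The module and signal names are hypothetical, and the sketch assumes the control signal is held long enough to be sampled by the receiving clock.

```verilog
// Hypothetical example: two-flop synchronizer for one control signal.
module sync_2ff (
    input  wire clk_dst,    // receiving clock domain
    input  wire async_in,   // single control signal from the other domain
    output wire sync_out    // safe to use in the clk_dst domain
);
    reg [1:0] sync_q = 2'b00;

    // The first flip-flop may go metastable; the second gives it a full
    // clock period to resolve before the value is used.
    always @(posedge clk_dst)
        sync_q <= {sync_q[0], async_in};

    assign sync_out = sync_q[1];
endmodule
```

The synchronized signal is then used as the qualifier for capturing the accompanying data in the receiving domain, so the data bits themselves are never synchronized individually.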
High fanout signals
High fanout signals are often the limiting factor in a design. Even though most synthesis tools support fanout control, it is wise to duplicate these signals in the HDL for more repeatable results. Designers should combine this strategy with directives that prevent the synthesis tool from removing the duplicates. If a high fanout signal is at the top level of the logic, it can be duplicated so that each top-level block is driven by a separate copy.
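Manual duplication might look like the sketch below. The module and signal names are hypothetical. The attribute shown is the XST-style equivalent_register_removal setting and should be treated as an assumption: the directive that prevents duplicate removal differs between synthesis tools, so the name and syntax must be checked against the documentation of the tool in use.

```verilog
// Hypothetical example: one enable duplicated so that each top-level
// block is driven by its own register.
module enable_fanout (
    input  wire clk,
    input  wire enable_in,
    output wire enable_a,   // drives top-level block A
    output wire enable_b    // drives top-level block B
);
    // Tool-specific attribute (assumed XST style) that asks synthesis
    // not to merge the two logically identical registers.
    (* equivalent_register_removal = "no" *) reg enable_a_q;
    (* equivalent_register_removal = "no" *) reg enable_b_q;

    always @(posedge clk) begin
        enable_a_q <= enable_in;
        enable_b_q <= enable_in;
    end

    assign enable_a = enable_a_q;
    assign enable_b = enable_b_q;
endmodule
```

Because each copy can be placed next to the block it drives, the placer has less reason to move the driver from run to run.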
If the fanout control in the synthesis tool does not give the expected results and the HDL cannot be modified, using the register duplication option in MAP together with MAX_FANOUT constraints will often yield better register duplication choices than synthesis. For more information, see MAX_FANOUT in the Constraints Guide (UG625). As a general debugging aid, keeping signal names consistent across the hierarchy makes it easier to trace a path back to the problem. Timing reports and other debug output can be difficult to follow if signal names change frequently. It is also helpful to include the signal direction in the port definitions of all modules or entities.
Synthesis can have a huge impact on reproducible results. If synthesis does not produce an optimal netlist, the implementation tools do not start from ideal conditions. Designers can employ a variety of synthesis techniques to help improve implementation results.
It is important to use timing constraints when performing synthesis. A common approach is to over-constrain the design during synthesis and then relax the timing constraints in the Xilinx implementation tools. This shifts more of the burden onto the synthesis tool and reduces the burden on the implementation tools.
Next, use the timing report generated by the synthesis tool. If a path fails timing in both synthesis and implementation, modify the HDL or the synthesis options so that the path meets timing after synthesis. This saves time during the implementation phase.
Getting repeatable results from synthesis is the best way to get repeatable results from the implementation tools. Most synthesis tools support a bottom-up flow, which sets up independent synthesis projects for the top level of the design and for each lower-level module. This lets users control which netlists are updated when the HDL changes. Most commercial synthesis tools also offer incremental flows.
The importance of floorplanning
Floorplanning places components in a specific location or area of the device. This reduces placement variation and improves design repeatability. Higher performance can often be achieved by floorplanning, by applying location constraints, or both.
That said, poor floorplanning or poor location constraints can prevent a design from meeting timing. Floorplanning is an advanced technique that requires in-depth knowledge of both the tools and the design. An implementation that already meets timing can be used as a guide for creating a good floorplan.
If board requirements are the major factor in choosing the pin layout, the FPGA implementation tools may have difficulty maintaining timing with repeatable results. However, there are a number of techniques that designers can use to help achieve repeatability.
The first thing to do is to be clear about the data flow. For example, data might flow from the center I/O to the side I/O. All pins associated with a bus should be kept in the same area of the FPGA to limit the routing distance of control signals. I/O bus control signals should be placed adjacent to the associated address and data buses. Signals that need to be optimized together should be placed together. If board layout takes priority, adding pipeline registers at the I/O can help the FPGA routing cope with a poor pin layout.
Area group floorplanning
Area group floorplanning is an advanced floorplanning technique that defines where logic is placed within the FPGA. While this technique is easy to use, it is often misused, resulting in a poor floorplan that creates more problems than it solves. Some general guidelines for good floorplanning help avoid these pitfalls. All area groups should be kept at a similar utilization. For example, avoid having one area group at 60% utilization and another at 99%. Do not overlap area groups. The only exception is when two different area groups contain logic that needs to be placed together, in which case overlapping by one or two rows or columns of CLBs is allowed. It is then the user's responsibility to ensure that sufficient resources are available for both area group constraints.
If two different logical parts of the design need to be placed in the same physical location, they should be placed in the same area group. Generally, one level of nesting is allowed, that is, a child area group within a parent area group. This nesting is needed when a small part of a large area group must be confined to a narrow area. It is important to floorplan only the critical parts of the design and let the tools determine the placement of non-critical logic. Logic connected to fixed resources such as I/O, transceivers, or processor blocks may benefit from floorplanning. The results of a successful implementation can be used as a guide to identify placement or timing issues. Tools such as Xilinx PlanAhead™ software (Figure 1) and Timing Analyzer help to visualize these issues.
It is generally beneficial to minimize the number of clock regions used by each global clock and the number of clocks (regional and global) in each clock region. If more logic is going to be added to a clock region, do not over-constrain it, but plan accordingly. If all the clocks of a clock region are in use, it can be difficult to find an efficient placement. The clock region alignment feature of PlanAhead software simplifies this kind of floorplanning. For Virtex® FPGA designs with ten or more clock domains, the clock regions used by the current implementation are listed in the .map report file, along with the corresponding UCF constraints.
For more information on area group floorplanning, see UG632 PlanAhead User Guide (http://www.xilinx.com/support/documentation/sw_manuals/xilinx11/PlanAhead_UserGuide.pdf) and UG633 Floorplanning Methodology Guide (http://www.xilinx.com/support/documentation/sw_manuals/xilinx12_1/Floorplanning_Methodology_Guide.pdf).
Block, module, and path placement
Locking the placement of core components such as block RAM, FIFO, DSP, DCM, and global clocking resources often helps achieve repeatability. The best practice is to start from a known good placement and apply knowledge of the design when floorplanning. A clock region report can be created using reportgen -clock_regions design.ncd. PlanAhead software can lock all placement information for critical modules. In the next implementation run the placement remains unchanged, but routing information is not preserved. For more information on PlanAhead software location constraints, see UG632 PlanAhead User Guide, UG633 Floorplanning Methodology Guide, and the Floorplanning for Design chapter of the relevant PlanAhead manual.
If locking an entire module is too restrictive, a critical path can be locked in PlanAhead software instead. However, caution should be exercised when using this method. If a particular path is causing a major problem, it is best to fix the timing problem by modifying the HDL. Xilinx SmartGuide™ technology is another option for maintaining repeatable results and is best for designs that do not demand the highest QoR or the highest utilization. If neither design preservation nor SmartGuide technology is suitable for a design, SmartXplorer or PlanAhead software strategies can be used to maintain timing.
For designs with high QoR requirements, there are advanced implementation options that can help maintain timing. Controlling utilization is often the key to maintaining repeatable results. As the size of the design increases, so does the difficulty of maintaining reproducible results. Using the same software version throughout the design phase helps achieve repeatable results.
The design preservation flow in PlanAhead software uses partitions and is the only flow that guarantees repeatable results for the preserved logic. Control signals and data flow (bus alignment) need to be considered when locating block RAM, FIFO, and DSP components. The constraints used to locate clock regions for an existing design can be found in the associated .map report file. Keeping the same clock regions prevents the placer from changing the clock region assignment, which would change the floorplan of the design. Locking the placement of specific timing paths should be used sparingly.
Various options in the implementation tools can improve repeatability. Partition-based design preservation is the best way to preserve an implementation, but it is not suitable for all designs, and it imposes requirements on the HDL design. The main purpose of design preservation is to keep module performance consistent in order to reduce the time spent in the timing closure phase. It also requires users to follow good design practices as closely as possible.
Partitions preserve the unchanged parts of a previously implemented design. If the netlist of a partition has not changed, the implementation tools use a copy-and-paste process to ensure that the implementation data for that partition is preserved. By preserving implementation results, partitions allow the modified parts of the design to be implemented without affecting the preserved parts. In Figure 2, the red module has been modified and re-implemented, while the remaining modules are locked in place.
Starting with version 12.1, the PlanAhead software and the command-line tools support the design preservation flow. For more information, see WP362 Repeatable Results with Design Preservation (http://www.xilinx.com/support/documentation/white_papers/wp362.pdf) and UG748 Hierarchical Design Methodology Guide (http://www.xilinx.com/support/documentation/sw_manuals/xilinx12_1/Hierarchical_Design_Methodology_Guide.pdf).
SmartGuide technology uses the results of a previous implementation as the starting point for a new implementation run, and its main purpose is to reduce runtime. Guided placement, guided routing, or both can be changed by the tools in order to route the design or to meet timing. SmartGuide technology is best for designs that do not push QoR or utilization.
Previous versions of the tools provided an exact guide mode and a leveraged guide mode. In the past, exact guide mode often resulted in unroutable designs. If exact preservation is required, the recommended flow is design preservation. SmartGuide technology replaces leveraged guide mode.
Designers often ask whether to use SmartGuide technology or partition technology, and the answer depends on where you are in the design flow.
SmartGuide technology works best at the end of the design cycle, when small design changes are made. With this flow, it is easy to try a proposed change and see whether it works for the design. Partitions require more up-front attention to good design hierarchy rules. The decision to use a partition-based design preservation flow should be made at the beginning, when the HDL is being organized. The exception is a design that already follows the partition hierarchy rules.
For more information, see UG748 Hierarchical Design Methodology Guide (http://www.xilinx.com/support/documentation/sw_manuals/xilinx12_1/Hierarchical_Design_Methodology_Guide.pdf).
SmartXplorer and PlanAhead software strategies are similar tools that help achieve timing closure by running different sets of implementation options to find the results that best suit the design. Based on these results, designers can determine which placements are likely to produce better timing and use them to create a good area group floorplan. The spread of results can also point to a design problem. If the same path fails in every run, the timing problem should be eliminated by modifying the HDL.
In the initial stages of a design, it is best to use the default effort levels for MAP and PAR. Using too many advanced options early on can hide timing problems that are easily solved by modifying the HDL. As device utilization increases, it becomes increasingly difficult for the tools to find a solution that meets timing. If the defaults are used early, the more advanced options remain available late in the design flow to recover the last few picoseconds and maintain timing. Designs with low (<25%) or high (>75%) LUT/FF utilization have difficulty achieving consistent placement and routing. For designs with high utilization, pay attention to slice control sets, to resets (synchronous resets/sets are generally not required in an FPGA), and to modules whose logic utilization or SRL/DSP48 inference differs from expectations; module utilization is easy to check in PlanAhead software.
Low utilization poses the opposite problem. For designs in which the utilization of every component type stays below 25%, a low-utilization algorithm takes effect and places the components tightly together. However, if the I/O utilization exceeds 25%, the implementation tools may spread the design out to keep logic close to the I/O. Careful I/O placement and the use of area groups can mitigate these issues.
The same major software version should be used throughout timing closure whenever possible. Because the algorithms change between versions, a set of options that works in one version may not work in another. Also, flows that rely on previous results (partitions and SmartGuide technology) may not work across major releases.
The best way to promote design repeatability is to follow good design practices in the HDL and to fix timing issues by modifying the HDL. If this is not feasible, synthesis, floorplanning, and implementation techniques can be used. Partition-based design preservation is the flow that guarantees the performance of preserved instances. SmartGuide technology is another solution that reuses previously achieved results.