In 5G-NR communications, interleaving protects data against burst errors. The 3GPP specification dictates writing the coded bits row-wise into an isosceles triangle and reading them out column-wise.
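As an illustration, the row-wise write and column-wise read can be traced on a small block; the bit labels `b0`..`b5` are hypothetical and chosen only to make the permutation visible:

```python
# Worked example: E = 6 bits fill a T = 3 triangle exactly, since T*(T+1)/2 = 6.
bits = ["b0", "b1", "b2", "b3", "b4", "b5"]

# Row-wise write: row i holds T - i entries.
triangle = [
    ["b0", "b1", "b2"],  # row 0
    ["b3", "b4"],        # row 1
    ["b5"],              # row 2
]

# Column-wise read: top to bottom within a column, columns left to right.
out = [triangle[r][c] for c in range(3) for r in range(3) if c < len(triangle[r])]
print(out)  # → ['b0', 'b3', 'b5', 'b1', 'b4', 'b2']
```

Reading down the columns scatters originally adjacent bits, which is what breaks up burst errors on the channel.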
The optimization path focused on aligning 3GPP algorithms with FPGA hardware primitives for maximum concurrency.
The Naive Approach: Standard algorithms first determine the smallest side length $T$ such that $T(T+1)/2 \geq E$, where $E$ is the number of coded bits, then execute nested loops that dynamically skip the null padding bits.
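A minimal software sketch of this naive scheme; the function name and indexing conventions are our own, not taken from the source design:

```python
import math

def interleave_naive(bits):
    """Naive triangular interleaver: pick the smallest side length T with
    T*(T+1)/2 >= E, write row-wise (row i holds T - i slots, padded with
    None), then read column-wise while dynamically skipping the padding."""
    E = len(bits)
    # Smallest integer T satisfying T*(T+1)/2 >= E.
    T = math.ceil((math.sqrt(8 * E + 1) - 1) / 2)
    it = iter(bits)
    # Row-wise write; next(it, None) pads the tail with null markers.
    triangle = [[next(it, None) for _ in range(T - i)] for i in range(T)]
    # Column-wise read with dynamic null skipping.
    return [triangle[r][c]
            for c in range(T)
            for r in range(T - c)
            if triangle[r][c] is not None]
```

The null check inside the innermost loop is exactly the data-dependent control flow that synthesizes poorly, as discussed next.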
The Bottleneck: Dynamic coordinate tracking is expensive to implement in hardware. Direct synthesis of the nested loops would produce deep combinational logic and multi-microsecond computation stalls. This approach was rejected during the design phase.
The Optimization: We designed a baseline that precomputed structural offsets to avoid dynamic division at runtime. The data path used a state machine to track coordinates and resolve destination indices sequentially.
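A sequential sketch of this baseline, assuming a fixed $T$ per block size; `interleave_offsets` and its counter scheme are illustrative, but each loop iteration mirrors one step of a one-bit-at-a-time state machine:

```python
def interleave_offsets(bits, T):
    """Baseline sketch: row start offsets are precomputed once, so the
    per-bit loop needs only adds and compares (no runtime division).
    T is assumed fixed for the block size."""
    E = len(bits)
    # Offset of each row's first element in the row-wise write order:
    # row i starts T - (i - 1) positions after row i - 1.
    row_start = [0] * T
    for i in range(1, T):
        row_start[i] = row_start[i - 1] + (T - (i - 1))
    out = []
    # Coordinate-tracking "state machine": walk down each column in turn.
    r, c = 0, 0
    for _ in range(T * (T + 1) // 2):
        src = row_start[r] + c      # source index in the write order
        if src < E:                 # padding always sits past index E - 1
            out.append(bits[src])
        r += 1
        if r == T - c:              # column exhausted: move to next column
            r = 0
            c += 1
    return out
```

Note that every iteration depends on the `r`/`c` counters updated by the previous one, which is the sequential dependency described below.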
The Limit: While mathematically efficient, the state machine created a sequential data dependency. Processing one bit per cycle capped effective throughput and left the AXI4-Stream bus underutilized.
Further Optimizations: By analyzing the deterministic nature of the triangle for specific block sizes, we restructured the data path to support concurrent bit-routing. The sequential index calculation was replaced with a parallel architecture.
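The parallel restructuring can be sketched as a precomputed source-index permutation; the function names are illustrative. In hardware such a table reduces to fixed routing or a small ROM, so many output bits resolve in the same clock cycle:

```python
import math

def build_permutation(E):
    """For a fixed block size E, precompute offline which source index
    feeds each output position. Stepping down a column from row r adds
    T - r to the source index, so no division is needed."""
    T = math.ceil((math.sqrt(8 * E + 1) - 1) / 2)
    perm = []
    for c in range(T):
        src = c                     # top of column c
        for r in range(T - c):
            if src < E:             # skip null padding past index E - 1
                perm.append(src)
            src += T - r            # next row down in this column
    return perm

def interleave_parallel(bits, perm):
    # Every output bit is an independent table lookup; on the FPGA each
    # lookup is a dedicated routing path evaluated concurrently.
    return [bits[i] for i in perm]
```

Because the lookups share no state, the data path can route as many bits per cycle as the bus width allows, unlike the counter-driven baseline.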
The Result: The FPGA now resolves multiple bit positions simultaneously, achieving higher throughput and significantly lower latency than traditional iterative approaches.
| Architecture Phase | Clock Target | Complexity | Resource Usage | Timing Status | Latency |
|---|---|---|---|---|---|
| Phase 1 (Naive) | - | High | - | Rejected (design phase) | N/A |
| Phase 2 (Baseline) | 100 MHz | Moderate | Low | MET | > 5 µs |
| Phase 3 (Parallel) | 100 MHz | Optimized | Efficient | MET | Sub-µs |
→ Constraints: The targets were standard-compliant latency and minimal resource utilization. Phase 3 provides significant headroom for real-time 5G baseband processing.
The final architecture balances hardware footprint against execution speed on the critical path. By focusing on parallel data routing, we minimize the latency added to the radio path. This approach also offloads the mathematical heavy lifting from DSP slices, freeing them for other physical-layer tasks.