Triangular Interleaver (5G-NR)

Hardware acceleration of physical layer (L1) error correction mapping
FPGA_ACCELERATED Verilog HLS Spartan-7 BRAM 3GPP_TS_38.212 TIMING_MET
[tldr]
<1.00µs Optimized Latency
Verified 3GPP Compliance
Lean Logic Utilization
100 MHz Core Clock
[context] what is the triangular interleaver?

In 5G-NR communications, interleaving protects data against burst errors by spreading adjacent bits apart in the transmitted sequence. The 3GPP specification (TS 38.212) dictates writing the data row-wise into an isosceles triangle and reading it out column-wise.

[SOFTWARE CONCEPT]
IN  --> [ D0 D1 D2 D3 ]
        [ D4 D5 D6 ]
        [ D7 D8 ]
        [ D9 ]
OUT <-- [ D0 D4 D7 D9 D1 D5 D8 D2 D6 D3 ]
[architecture_evolution]

The optimization path focused on aligning 3GPP algorithms with FPGA hardware primitives for maximum concurrency.

Phase 1: The 3GPP Spec (Math Bottleneck)

The Naive Approach: Standard algorithms require determining the smallest side length $T$ such that $T(T+1)/2 \ge E$, where $E$ is the number of input bits, then executing nested loops that dynamically skip null bits.

The Bottleneck: Dynamic coordinate tracking maps poorly to hardware, because the loop bounds and the null-skip decisions both depend on the runtime block size. Direct synthesis would generate high logic depth and multi-microsecond computation stalls, so this approach was rejected during the design phase.
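For reference, the row-write / column-read procedure described above can be sketched as a small software model. This is a minimal illustration in Python, not the project's code: the names `side_length` and `triangular_interleave` are hypothetical, and the linear search for $T$ is chosen for clarity rather than speed.

```python
_NULL = object()  # sentinel for the unused (null) cells of the triangle


def side_length(e):
    """Smallest T with T*(T+1)/2 >= e, found by linear search."""
    t = 0
    while t * (t + 1) // 2 < e:
        t += 1
    return t


def triangular_interleave(bits):
    """Write bits row-wise into a triangle, read column-wise, skip nulls."""
    e = len(bits)
    t = side_length(e)

    # Row i of the triangle holds T - i cells.
    v = [[_NULL] * (t - i) for i in range(t)]

    # Write phase: fill row by row; cells past the payload stay null.
    k = 0
    for i in range(t):
        for j in range(t - i):
            if k < e:
                v[i][j] = bits[k]
            k += 1

    # Read phase: column j holds T - j cells; null cells are skipped.
    out = []
    for j in range(t):
        for i in range(t - j):
            if v[i][j] is not _NULL:
                out.append(v[i][j])
    return out
```

With the ten-element example from the context section (`D0` to `D9`, so $T = 4$ and no null cells), the model returns `D0 D4 D7 D9 D1 D5 D8 D2 D6 D3`. The data-dependent `if` inside both loop nests is the kind of dynamic control flow that is costly to synthesize directly.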

Phase 2: The Algorithmic Baseline (FSM Bottleneck)

The Optimization: We designed a baseline that precomputed structural offsets to avoid dynamic division at runtime. The data path used a state machine to track coordinates and resolve destination indices sequentially.

The Limit: While mathematically efficient, the state machine created a sequential data dependency. Processing bits one at a time capped the effective throughput of the AXI4-Stream bus.
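One way to picture the Phase 2 idea is a table of row start offsets built by a running sum, walked by a two-counter state machine that emits one source index per step. The sketch below is an assumption about what "structural offsets" could look like, not the actual RTL; `row_starts` and `read_order` are hypothetical names.

```python
def row_starts(t):
    """Start index of each triangle row, built with additions only."""
    starts = []
    acc = 0
    for i in range(t):
        starts.append(acc)
        acc += t - i  # row i holds T - i cells
    return starts


def read_order(e, t):
    """Source indices in output order, one per FSM step.

    (i, j) are the row and column counters of the state machine.
    """
    starts = row_starts(t)
    order = []
    i = j = 0
    while j < t:
        src = starts[i] + j      # table lookup plus one add, no division
        if src < e:              # indices past the payload are null cells
            order.append(src)
        i += 1
        if i == t - j:           # column j holds T - j cells
            i = 0
            j += 1
    return order
```

For `e = 10, t = 4` this yields `[0, 4, 7, 9, 1, 5, 8, 2, 6, 3]`. Each iteration depends on the counters updated by the previous one, which illustrates the sequential dependency described above.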

Phase 3: Final Optimized Architecture

Further Optimizations: By analyzing the deterministic nature of the triangle for specific block sizes, we restructured the data path to support concurrent bit-routing. The sequential index calculation was replaced with a parallel architecture.

The Result: The FPGA now resolves multiple bit positions simultaneously, achieving higher throughput and a latency significantly lower than that of traditional iterative approaches.
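The concurrency argument can be illustrated with a software model in which the permutation for a fixed block size is computed once, after which every output position is an independent lookup. This is a hedged sketch only: the source does not state how many bits are resolved per cycle, so the `lanes` parameter, the function names, and the cycle count are illustrative assumptions.

```python
def build_permutation(e):
    """Source index for each output position, for a fixed block size e."""
    t = 0
    while t * (t + 1) // 2 < e:
        t += 1
    perm = []
    for j in range(t):
        for i in range(t - j):
            # Row i starts at i*T - i*(i-1)/2 (i*(i-1) is always even).
            src = i * t - i * (i - 1) // 2 + j
            if src < e:
                perm.append(src)
    return perm


def route_parallel(bits, perm, lanes):
    """Route `lanes` bits per modeled clock cycle; returns (output, cycles)."""
    out = [None] * len(perm)
    cycles = 0
    for base in range(0, len(perm), lanes):
        # Lanes in one group share no state, so hardware could
        # evaluate all of them in the same cycle.
        for lane in range(min(lanes, len(perm) - base)):
            out[base + lane] = bits[perm[base + lane]]
        cycles += 1
    return out, cycles
```

Because no lane reads a value produced by another lane, the modeled cycle count shrinks with the lane width, whereas a one-index-per-step state machine needs one step per triangle cell.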

[synthesis_results]
Architecture Phase | Clock Target | Complexity | Resource Usage | Timing Status | Latency
Phase 1 (Naive)    | -            | High       | -              | -             | Rejected
Phase 2 (Baseline) | 100 MHz      | Moderate   | Low            | MET           | A few µs (5+)
Phase 3 (Parallel) | 100 MHz      | Optimized  | Efficient      | MET           | Sub-µs

Constraints: The targets were standard-compliant latency and minimal resource utilization. Phase 3 provides significant headroom for real-time 5G baseband processing.

[engineering_deep_dives]
Trade-offs: Logic Area vs. Latency

The final architecture balances hardware footprint against execution speed on the critical path. By focusing on parallel data routing, we minimize the nanoseconds lost in the radio path. This approach also offloads the mathematical heavy lifting from DSP slices, freeing them for other physical layer tasks.

Verification Methodology
3GPP Standard Validation
Design correctness was verified against the standard test cases defined by the 3GPP specifications. The Verilog RTL output was bit-matched through C/RTL co-simulation across diverse payload sizes, with an exact match required for every triangular null-skipping condition.
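The bit-matching idea can be sketched in software as a sweep that compares two independently written models over many payload sizes, including sizes that leave null cells in the triangle. This is only an analogy to the C/RTL co-simulation flow: `golden_model`, `dut_model`, and `sweep` are hypothetical names, and `dut_model` merely stands in for the RTL output.

```python
_NULL = object()  # marks unused triangle cells


def side_length(e):
    t = 0
    while t * (t + 1) // 2 < e:
        t += 1
    return t


def golden_model(bits):
    """Row-write / column-read model built from explicit padded rows."""
    e, t = len(bits), side_length(len(bits))
    padded = list(bits) + [_NULL] * (t * (t + 1) // 2 - e)
    rows, pos = [], 0
    for i in range(t):
        rows.append(padded[pos:pos + t - i])
        pos += t - i
    return [rows[i][j] for j in range(t) for i in range(t - j)
            if rows[i][j] is not _NULL]


def dut_model(bits):
    """Stand-in for the device under test: closed-form index arithmetic."""
    e, t = len(bits), side_length(len(bits))
    out = []
    for j in range(t):
        for i in range(t - j):
            src = i * t - i * (i - 1) // 2 + j
            if src < e:
                out.append(bits[src])
    return out


def sweep(max_e):
    """Return every payload size at which the two models disagree."""
    mismatches = []
    for e in range(1, max_e + 1):
        payload = list(range(e))
        got = dut_model(payload)
        # Outputs must match exactly and must be a permutation of the input.
        if got != golden_model(payload) or sorted(got) != payload:
            mismatches.append(e)
    return mismatches
```

An empty list from `sweep` means every tested size matched exactly. Sweeping consecutive sizes exercises both full triangles (for example `E = 10`) and partially filled ones (for example `E = 8`).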