FPGA • Systems • Low-Latency Computing
Integrated M.Tech student (graduating 2027). Building high-performance hardware-software co-designs through FPGA/HLS implementation, systems performance analysis, and operating systems optimization. Empirical problem-solver focused on latency-critical computing.
Kernel-level teardown of TCP, REST, and gRPC on Ubuntu 24.04: 9.9M latency measurements across 75 experiments. Exposed 121× tail amplification from a page-fault step function, traced to the Go allocator constant _MaxSmallSize = 32768.
Dual implementation on a 128-bit AXI-Stream interface: a table-based design using O(1) BRAM lookups (0.41 µs, 13.6× speedup) versus an algorithmic approach. Bypassed 3GPP bottlenecks through compile-time optimization. Verified against test vectors.
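The table-versus-algorithmic trade-off can be sketched in a small software model. The Python below is an illustration under stated assumptions, not the project's HLS code: it uses a generic row-column interleaver as a stand-in permutation, and the function names and the parameters N = 12 and ROWS = 3 are hypothetical. It shows that once the parameters are fixed, the index arithmetic can be run once to build a lookup table, after which each output position costs a single read.

```python
def interleave_algorithmic(bits, rows):
    """Row-column interleave: write row by row, read column by column.

    Index arithmetic is performed for every output position.
    """
    cols = len(bits) // rows
    if rows * cols != len(bits):
        raise ValueError("length must be a multiple of rows")
    return [bits[r * cols + c] for c in range(cols) for r in range(rows)]


def build_table(n, rows):
    """Precompute the source index of each output position (done once)."""
    return interleave_algorithmic(list(range(n)), rows)


def interleave_table(bits, table):
    """Apply the permutation with one table read per output position."""
    return [bits[src] for src in table]


# Hypothetical parameters, chosen only to keep the example small.
N, ROWS = 12, 3
TABLE = build_table(N, ROWS)

data = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0]
assert interleave_table(data, TABLE) == interleave_algorithmic(data, ROWS)
```

In hardware terms, the table would plausibly live in BRAM or be resolved into fixed wiring when the parameters are known at synthesis time, while the algorithmic form computes indices on the fly.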
Recognized the 3GPP permutation as chunk-level wire assignments resolvable at compile time. Achieved 2–5 cycle latency, 166 LUTs, 0 DSP/BRAM, ~540 MHz Fmax. Demonstrates spec-to-silicon optimization depth.
Hardware implementation of bit interleaving for the PDSCH. Navigated Vivado RTL export bugs and LUT over-utilization. Latency: 18.8 µs • Score: 9.5/10
9×4 SRAM array for CNN inference via systolic topology. Novel in-memory compute architecture with tiling-based scalability methods. Status: Reviewer feedback → post-layout results in progress
Interested in FPGA design, systems optimization, low-latency computing, or algorithmic engineering? Let's connect.