Woofun AI reports that the semiconductor landscape is pivoting toward AI inference, where memory and storage have emerged as the primary bottlenecks. Industry attention has shifted to two questions: whether the GPU architecture roadmap, which relies on exponential growth in High Bandwidth Memory (HBM), will stall, and when ChangXin's capacity expansion might alter market dynamics. As the market structure evolves, the interplay between fast-moving demand and rigid supply cycles will define hardware availability over the next decade.
Memory cyclicality remains potent because long production cycles frequently create mismatches between rapid capacity expansion and periods of demand scarcity. To break free from this traditional cycle, the market requires customization, structural exponential demand growth, or rapid technological iteration. HBM currently satisfies approximately two and a half of these criteria. While HBM incorporates elements of customization through packaging and base dies, the DRAM layers remain standardized under JEDEC specifications. This standardization allows for some capacity reallocation, such as Samsung supplying HBM3E to Google and AMD after failing NVIDIA qualification.
However, post-HBM4, customization is expected to increase significantly with the integration of custom logic and cache on base dies. The primary driver remains hardware upgrade demand for higher token throughput on NVIDIA platforms, where token throughput is taken as HBM size multiplied by HBM bandwidth. HBM size per GPU grows by more than 40% annually, a rate that far outpaces DRAM supply-side wafer growth of 14% and density improvement of 9%.
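The gap between these growth rates compounds quickly. The following is a minimal Python sketch of that arithmetic, not a forecast. It assumes that supply-side bit growth is the product of wafer growth and density improvement (an illustrative assumption, not a figure from the source) and uses 40% as the floor of the "over 40%" HBM figure. All function names are hypothetical.

```python
# Growth rates quoted in the text (annual).
HBM_SIZE_GROWTH = 0.40   # HBM size per GPU, "over 40%" so treated as a floor
WAFER_GROWTH = 0.14      # DRAM supply-side wafer growth
DENSITY_GROWTH = 0.09    # DRAM bit density improvement


def supply_bit_growth(wafer_growth: float, density_growth: float) -> float:
    """Combined bit growth, ASSUMING bits = wafers x bits per wafer."""
    return (1.0 + wafer_growth) * (1.0 + density_growth) - 1.0


def demand_supply_ratio(years: int) -> float:
    """How far HBM-size demand outruns supply bits after `years` of compounding.

    A value of 1.0 means demand and supply have grown by the same factor.
    """
    demand = (1.0 + HBM_SIZE_GROWTH) ** years
    supply = (1.0 + supply_bit_growth(WAFER_GROWTH, DENSITY_GROWTH)) ** years
    return demand / supply


if __name__ == "__main__":
    combined = supply_bit_growth(WAFER_GROWTH, DENSITY_GROWTH)
    print(f"combined supply bit growth: {combined:.1%}")
    for n in range(1, 6):
        print(f"year {n}: demand/supply index = {demand_supply_ratio(n):.2f}")
```

Under these assumptions, combined supply bit growth comes to roughly 24% a year, so per-GPU HBM demand still pulls away from supply every year even before any wafers are diverted to HBM.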
HBM holds a unique status in the market because the KV cache in the attention stage requires high bandwidth. Alternative paths, including SRAM, HBF, CXL, and PIM, cannot compete in the main HBM track for at least five years. HBM upgrades occur every two years, doubling both speed and size, which renders older generations economically impractical for new deployments. This dynamic shifts manufacturer competition from a race for quantity to a contest of stability and speed, specifically over qualification share on NVIDIA platforms. Consequently, manufacturers avoid the prisoner's dilemma that often plagues commodity markets. Memory cyclicality stems from the long supply cycle: a fab takes three years to build, which leaves capacity misaligned with unstable demand patterns.
However, structural exponential growth in AI dampens this cyclicality. Additionally, DRAM bit density growth per wafer has slowed from 45% in 2000 to 9% currently, meaning expansion now relies more on new fabs than on density improvements. HBM is also increasingly wafer-intensive relative to commodity DRAM bits: HBM3e requires approximately three times the DRAM wafers and HBM4 requires four times, creating a deflationary trend in bits per wafer.
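The deflationary effect of these wafer ratios can be illustrated with a short Python sketch. It assumes the 3x and 4x figures mean wafers consumed per bit relative to commodity DRAM, and the HBM wafer shares used below are hypothetical values chosen for illustration, not data from the source.

```python
# Wafers needed per bit relative to commodity DRAM, as quoted in the text.
WAFER_RATIO = {"HBM3e": 3.0, "HBM4": 4.0}


def effective_bit_output(hbm_wafer_share: float, wafer_ratio: float) -> float:
    """Bits produced by a fixed wafer pool, relative to an all-commodity pool.

    hbm_wafer_share: fraction of wafers allocated to HBM (hypothetical input).
    wafer_ratio: wafers HBM needs per bit relative to commodity DRAM.
    Returns 1.0 when no wafers go to HBM.
    """
    if not 0.0 <= hbm_wafer_share <= 1.0:
        raise ValueError("hbm_wafer_share must be between 0 and 1")
    commodity_bits = 1.0 - hbm_wafer_share
    hbm_bits = hbm_wafer_share / wafer_ratio
    return commodity_bits + hbm_bits


if __name__ == "__main__":
    for generation, ratio in WAFER_RATIO.items():
        for share in (0.1, 0.2, 0.3):  # hypothetical HBM wafer shares
            output = effective_bit_output(share, ratio)
            print(f"{generation}, {share:.0%} of wafers: {output:.1%} of baseline bits")
```

The same wafer pool yields fewer total bits as either the HBM share or the per-bit wafer ratio rises, which is why a move from HBM3e to HBM4 tightens bit supply even with no change in wafer starts.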
The sustainability of HBM growth depends entirely on the persistence of the Transformer architecture and KV cache mechanisms. Attention mechanisms are likely to remain primitive operations for dynamic routing, ensuring HBM remains central to inference cores. DRAM is also undergoing structural changes. While not customized, it benefits from structural exponential growth driven by CPU demand from agentic workloads. The CPU-to-GPU ratio is shifting from 1:4 to 1:2 or even 1:1. In agentic flows, CPU processing delays are significant bottlenecks, so CPU capacity must expand in step with GPU capacity. AI coding increases code volume and API calls, exponentially increasing CPU hours. Sandboxes for data security replicate databases, consuming additional memory and CPU cores. Consequently, CPU TAM forecasts have been repeatedly raised, with AMD, ARM, NVIDIA, and Bernstein projecting $60B to $223B by 2030, potentially reaching $400B by 2031.
Agent tasks are stateful, keeping message history and context in DRAM. Context windows are expanding from 32K to 1M tokens, significantly increasing the memory footprint per session. By 2030, with a conservative $300B CPU TAM and 16GB per core, incremental DRAM demand could reach 96EB, far exceeding current global production of approximately 47-60EB. This creates a significant supply-demand gap. DRAM technological iteration is accelerating due to server and edge AI demands, such as Apple's LPDDR needs. DDR6 and LPDDR6 are being adopted eagerly because they deliver tangible performance gains.
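The 96EB estimate can be unpacked by back-solving from the figures quoted above. The Python sketch below is illustrative only: the core count and the CPU revenue per core are implied values derived from the quoted numbers, not figures stated in the source, and decimal units (1 EB = 10^18 bytes, 1 GB = 10^9 bytes) are assumed.

```python
GB = 10**9    # decimal gigabyte (assumption)
EB = 10**18   # decimal exabyte (assumption)


def implied_core_count(demand_eb: float, gb_per_core: float) -> float:
    """Number of CPU cores implied by a DRAM demand figure and GB per core."""
    return (demand_eb * EB) / (gb_per_core * GB)


def implied_revenue_per_core(cpu_tam_usd: float, cores: float) -> float:
    """CPU revenue per core implied by the TAM (a derived value, not sourced)."""
    return cpu_tam_usd / cores


def demand_to_production_ratio(demand_eb: float, production_eb: float) -> float:
    """Incremental demand as a multiple of current annual production."""
    return demand_eb / production_eb


if __name__ == "__main__":
    cores = implied_core_count(demand_eb=96, gb_per_core=16)
    print(f"implied cores: {cores:.2e}")
    print(f"implied USD per core: {implied_revenue_per_core(300e9, cores):.0f}")
    for production in (47, 60):  # range of current production quoted in the text
        ratio = demand_to_production_ratio(96, production)
        print(f"vs {production} EB production: {ratio:.2f}x")
```

With the quoted inputs, the estimate implies about 6 billion cores at roughly $50 of CPU revenue each, and incremental demand of about 1.6x to 2x current global production. Changing either the revenue-per-core assumption or the memory per core moves the result proportionally.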
Furthermore, HBM production diverts wafers from commodity DDR, an effect known as the 'HBM bit tax', which reduces non-HBM DDR bit growth to approximately 20% annually.
Woofun AI data shows that ChangXin Memory's expansion, which reaches 500k wafers per month by 2028, has limited impact on the global industry because its bit density is roughly 50% of that of the top three manufacturers. Its impact on DRAM bit capacity CAGR is estimated at only 1.5 percentage points, shifting industry CAGR from 12.7% to 14.2%. ChangXin is also constrained by lithography limitations for high-speed DDR6.
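A quick way to see why the expansion moves the industry so little is to adjust wafer counts for bit density and compare the two CAGR figures. The Python sketch below uses only the numbers quoted above. The helper names and the three-year horizon in the example are hypothetical choices for illustration.

```python
def density_adjusted_wafers(wafers_per_month: float, relative_bit_density: float) -> float:
    """Wafer capacity expressed in top-three-equivalent wafers per month."""
    return wafers_per_month * relative_bit_density


def cagr_shift_pp(baseline_cagr: float, new_cagr: float) -> float:
    """Difference between two CAGRs, in percentage points."""
    return (new_cagr - baseline_cagr) * 100.0


def extra_capacity_after(years: int, baseline_cagr: float, new_cagr: float) -> float:
    """Fractional extra bit capacity after `years` at the higher CAGR."""
    return ((1.0 + new_cagr) / (1.0 + baseline_cagr)) ** years - 1.0


if __name__ == "__main__":
    equivalent = density_adjusted_wafers(500_000, 0.5)
    print(f"density-adjusted capacity: {equivalent:,.0f} wafers per month")
    print(f"CAGR shift: {cagr_shift_pp(0.127, 0.142):.1f} percentage points")
    # Three years is a hypothetical horizon chosen for illustration.
    extra = extra_capacity_after(3, 0.127, 0.142)
    print(f"extra industry bits after 3 years: {extra:.1%}")
```

At half the bit density, 500k wafers per month count for about 250k top-three-equivalent wafers, and the 1.5-point CAGR shift adds only a few percent to industry bit capacity over a three-year horizon.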
The DRAM supply-demand gap will widen through 2030. Demand growth, estimated at a 50% CAGR based on CPU TAM projections, far exceeds supply growth of approximately 20%. A 'reservoir' of suppressed demand, including edge AI, consumer electronics, and low-value tasks, ensures that prices do not collapse even if supply temporarily catches up. HBM and DRAM price floors are linked: if DRAM margins drop, HBM profitability drives further capacity conversion to HBM, which safeguards DRAM prices. NAND SSDs also face structural growth from AI applications, including KV cache offloading, AI video generation, and sandbox usage. Although its structural momentum is weaker than HBM's, NAND is cost-effective relative to DRAM, with a projected price of $0.8 per GB by 2027. The shortage is severe because major players are keeping production disciplined. This round of NAND growth represents a supercycle, delaying any downturn until 2030. The convergence of these factors suggests a prolonged period of tight supply across the memory sector.