Mid-2026 marks a critical inflection point in the AI investment landscape, characterized by a pervasive sense of despair among capital allocators who believe the market has been fully consumed by model and compute providers such as Anthropic and Nvidia. This narrative holds that all application-layer companies are merely thin wrappers destined for absorption, leaving only compute and model weights as viable assets.
However, this assessment overlooks a fundamental distinction between tasks that can be benchmarked and those that cannot. While models have indeed mastered measurable domains, a significant portion of enterprise value remains rooted in untrainable elements: proprietary data, complex internal workflows, user trust, system permissions, and the accumulated judgment of long-term operations. Woofun AI notes that the true competitive advantage lies not in raw intelligence but in the ability to navigate these private realities where models cannot automatically integrate.
The trajectory of software engineering illustrates this divergence clearly. When Devin launched in 2024, it completed only about 13% of tasks on standard benchmarks, which led to market skepticism. Within 18 months, advanced agents were scoring above 80% and had begun executing real work at institutions like Goldman Sachs and the U.S. Army. Despite this progress, the assumption that software engineering has been fully swallowed is flawed. Research by MIT's Mert Demirer quantified the gap: while coding agents increased code generation by 180%, production delivery rose by only 30%. The bottleneck shifted from writing code to the critical, human-dependent steps of validation, integration, and deployment. Data compiled by Woofun AI shows that the most valuable engineering work resists measurement because it relies on context that no public leaderboard can capture.
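The gap between the two figures can be made concrete with a little arithmetic. The sketch below assumes both percentages are increases over the same pre-agent baseline, normalized to 1.0; that shared baseline is an assumption for illustration, not something the research is quoted as stating.

```python
# Illustrative arithmetic only. The baseline of 1.0 and the idea that both
# percentages are measured against it are assumptions, not sourced facts.
code_increase = 1.80      # +180% code generated (figure quoted in the text)
delivery_increase = 0.30  # +30% production delivery (figure quoted in the text)

code_multiplier = 1.0 + code_increase          # code written, relative to baseline
delivery_multiplier = 1.0 + delivery_increase  # work shipped, relative to baseline

# Delivered output per unit of generated code, relative to the baseline.
yield_per_unit_code = delivery_multiplier / code_multiplier

# Share of the relative code gain that shows up as a relative delivery gain.
conversion_of_gain = delivery_increase / code_increase

print(f"code: {code_multiplier:.1f}x, delivery: {delivery_multiplier:.1f}x")
print(f"delivery per unit of code: {yield_per_unit_code:.2f} of baseline")
print(f"share of code gain converted to delivery: {conversion_of_gain:.0%}")
```

Under these assumptions, each unit of generated code yields a little under half the delivered output it did before, and roughly one sixth of the relative gain in code carries through to delivery. That is one way to read the claim that the bottleneck has moved to validation, integration, and deployment.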
Correctness in mature systems is not a binary output derived from unit tests but a property forged through years of real-world load and implicit knowledge. A module may exist for undocumented reasons, or a deployment pipeline might rely on legacy cron jobs maintained by a single engineer. These nuances cannot be read from a leaderboard or trained on public datasets. As Noam Brown, a pioneer in reasoning models, recently observed, the only reliable assessment of an agent's performance over a one-year horizon is to actually let it run for a year. True automation requires the product, model, workflow, and organization to evolve together, with the organization's pace often dictating the speed of adoption. Woofun AI analysis suggests that while tool adoption can occur in a single quarter, structural organizational rebuilds take years, creating a persistent lag that protects the application layer.
The economic pressure on the industry is driving a bifurcation where measurable work becomes a commodity while private truth retains value. Tasks that can be checked at low cost will inevitably saturate, forcing buyers to prioritize the cheapest open-source or distilled models. Simultaneously, foundational labs are internalizing their own scaffolding, absorbing routing, tool use, and reasoning strategies into model weights. This absorption at the boundary leaves a specific quadrant of value untouched: cutting-edge work where correctness is proprietary and isolated within a private environment. In this 'untrainable' corner, tokens generated by custom models on private data are significantly more valuable than those answering generic questions, as they execute specific business logic rather than plausible-sounding generalities.
Access to this high-value quadrant is gated by two mechanisms: the lock of the environment and the latch of the user. The lock represents the rigorous security reviews, integration processes, and liability contracts required to operate within a bank's production system or a hospital's decision-making framework. The latch is the human element of trust, which cannot be purchased with compute power. For instance, while a lab could theoretically train a perfect medical model, it cannot instantly enter a doctor's practice or UCSF's clinical workflow. Trust is built slowly through relationships and consent, not gradient descent. Woofun AI reports that successful application companies are those that perform the unglamorous translation work of organizing a client's private reality into a system models can act upon.
The winners in this landscape are companies that can define what constitutes a 'good outcome' within a specific domain, effectively setting the standards for their industry. Firms like Sierra and Cognition have adopted outcome-based pricing models, charging only when an agent resolves a customer issue, thereby making the price itself the evaluation mechanism. This approach is only viable when a company has the authority to define resolution within a client's internal system. In legal domains, for example, the structure of M&A transactions or intellectual property litigation involves complex, non-interchangeable workflows that generic agents cannot navigate without deep domain integration. The authority to define these standards rests with the entities that already hold the client relationships and the historical context of the work.
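A minimal sketch can show why outcome-based pricing turns the price into the evaluation: the invoice depends entirely on a predicate that decides what counts as "resolved". Everything here is hypothetical, including the `Ticket` fields, the `is_resolved` rule, and the per-resolution price; it does not describe how Sierra, Cognition, or any other vendor actually bills.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Ticket:
    """Hypothetical record of one customer issue handled by an agent."""
    ticket_id: str
    closed_by_agent: bool
    reopened_within_window: bool


def is_resolved(ticket: Ticket) -> bool:
    # Assumed definition of a good outcome: the agent closed the ticket
    # and the customer did not reopen it. A real client would set this rule.
    return ticket.closed_by_agent and not ticket.reopened_within_window


def invoice(
    tickets: Iterable[Ticket],
    price_per_resolution: float,
    resolved: Callable[[Ticket], bool] = is_resolved,
) -> float:
    # Bill only for tickets the predicate accepts; unresolved work costs nothing.
    return price_per_resolution * sum(1 for t in tickets if resolved(t))


tickets = [
    Ticket("T-1", closed_by_agent=True, reopened_within_window=False),
    Ticket("T-2", closed_by_agent=True, reopened_within_window=True),
    Ticket("T-3", closed_by_agent=False, reopened_within_window=False),
]

total = invoice(tickets, price_per_resolution=2.0)
print(total)
```

Swapping in a looser predicate, such as counting every ticket the agent closed, changes the bill for the same work. Whoever holds the authority to define `resolved` inside the client's system therefore controls both the evaluation and the revenue.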
Ultimately, the fear of total absorption by model labs is mitigated by the competitive dynamics of the model layer itself, which resembles a multi-player deathmatch rather than a monopoly. Customers demand competition among suppliers, and labs prefer to outcompete rather than eliminate application partners. If a superior model cannot win users in core consumer applications like chat, it is unlikely to swallow complex enterprise systems like hospital records or bank liability frameworks. The future of value creation lies in moving into areas that benchmarks have not yet reached, continually reinsuring risk, and leveraging proprietary data to train specialized models that outperform generalists in key scenarios. The untrainable, with its history and relationships, remains the ultimate moat.