November 12, 2025

Agentic systems are just query engines for unstructured data

A systems engineer’s view of the new AI stack

by Jay Chia

“Imperative” and “declarative” are paradigms we use to describe programming languages. An imperative language requires you to lay out the steps; a declarative language doesn’t. 

A declarative language allows developers to declare the intended results – the “what” – while allowing the underlying engine to determine the actual steps to execute – the “how”.
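
A minimal sketch makes the difference concrete. The data here is made up, and Python's built-in sqlite3 module stands in as the query engine:

    import sqlite3

    orders = [("shipped", 10.0), ("pending", 5.0), ("shipped", 2.5)]

    # Imperative: spell out every step of the "how".
    total = 0.0
    for status, amount in orders:
        if status == "shipped":
            total += amount

    # Declarative: state the "what" and let the engine decide the "how".
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (status TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", orders)
    declarative_total = conn.execute(
        "SELECT SUM(amount) FROM orders WHERE status = 'shipped'"
    ).fetchone()[0]

    assert total == declarative_total  # same result, very different contracts

Both produce the same number, but only the second leaves the execution strategy up to the engine.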

I am struck by how familiar this pattern feels when working with AI systems today. In modern LLMs and agentic systems, the user declares the intended output by supplying context and a prompt. Perhaps LLM prompts are, like SQL, a shift towards the declarative model – this time for operating over unstructured and multimodal data.

SQL for unstructured data

The textbook example of a declarative language is SQL, in which a programmer declares the transformations to perform on tabular data – JOIN, SUM, SELECT – without spelling out how to execute them. That shift from imperative to declarative paradigms mirrors what's happening with modern LLMs today.

An example query/result pair might look like this:

Query: Given the PDF available at https://arxiv.org/pdf/1706.03762, retrieve the list of authors as a JSON object of {author_name: institution}.

Result: {"Ashish Vaswani": "Google Brain", "Noam Shazeer": "Google Brain", …}
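
Squint and this is just a declarative call: the prompt is the declaration, and the engine is responsible for everything in between. The sketch below is purely illustrative – ask_agent is a made-up stand-in for an agentic endpoint, with the example result canned in:

    import json

    def ask_agent(prompt: str) -> str:
        # A real engine would download the PDF, OCR it, and extract the authors;
        # here we just return a canned response mirroring the example above.
        return '{"Ashish Vaswani": "Google Brain", "Noam Shazeer": "Google Brain"}'

    query = (
        "Given the PDF available at https://arxiv.org/pdf/1706.03762, "
        "retrieve the list of authors as a JSON object of {author_name: institution}."
    )
    authors = json.loads(ask_agent(query))
    print(authors["Ashish Vaswani"])  # Google Brain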

The analogy feels unnatural because both the input data and the result sets are unstructured (often images, videos, or documents). But there are more parallels here between a tabular data system and an LLM than one might first expect!

Think of how both operate by declaration. SQL leads from a SQL string to a static logical plan to a result. LLMs lead from prompt instructions to a plan (including tool calls, code sandboxes, and inferences) to a result. The path is different, but the declarative nature is similar. 

Think, too, of the primitives. SQL provides the primitives for working on relations. These primitives are composed into a Logical Plan by the SQL planner:

  • Projections

  • Filters

  • Scans

  • Joins

  • Sorts
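
You can see this composition by asking a real planner to explain itself. Here SQLite's EXPLAIN QUERY PLAN is used on a toy schema (the exact output varies by SQLite version):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE orders (user_id INTEGER, amount REAL)")

    # One declarative query exercising scan, filter, join, projection, and sort.
    plan = conn.execute("""
        EXPLAIN QUERY PLAN
        SELECT users.name, SUM(orders.amount)
        FROM orders
        JOIN users ON users.id = orders.user_id
        WHERE orders.amount > 10
        GROUP BY users.name
        ORDER BY users.name
    """).fetchall()

    for row in plan:
        print(row)  # e.g. rows containing "SCAN orders", "SEARCH users USING INTEGER PRIMARY KEY", ...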

The primitives of AI agentic systems, in rough parallel, allow for working with “documents” (I use this term very loosely here to include everything from JSONs to PDFs and video). These primitives are composed into a workflow by a reasoning loop:

  • Prompt (summarize/structured extraction, score/evaluation)

  • Running a code sandbox

  • OCR

  • Chunking/embedding

  • Performing arithmetic

  • Tool Calling

  • File downloads
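
Here is a minimal sketch of such a reasoning loop in Python. Every function is a hypothetical stub – a real agent runtime would back these primitives with actual tools (a downloader, an OCR model, an LLM endpoint):

    def download_file(url: str) -> bytes:  # file-download primitive (stub)
        return b"%PDF- fake bytes"

    def ocr(raw: bytes) -> str:  # OCR primitive (stub)
        return "Attention Is All You Need ..."

    def prompt_extract(text: str, instruction: str) -> str:  # prompt primitive (stub)
        return '{"Ashish Vaswani": "Google Brain"}'

    def llm_decide(state: dict) -> str:
        """The reasoning step: pick the next primitive based on current state."""
        if "raw" not in state:
            return "download"
        if "text" not in state:
            return "ocr"
        return "extract"

    def run_agent(url: str, instruction: str, max_steps: int = 10) -> str:
        state: dict = {"url": url}
        for _ in range(max_steps):
            op = llm_decide(state)
            if op == "download":
                state["raw"] = download_file(state["url"])
            elif op == "ocr":
                state["text"] = ocr(state["raw"])
            else:  # "extract" terminates the loop with a result
                return prompt_extract(state["text"], instruction)
        raise RuntimeError("agent did not converge")

    print(run_agent("https://arxiv.org/pdf/1706.03762", "authors as JSON"))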

In the same way that a SQL compiler composes these primitives into a relational plan, LLMs need a query planner that composes their parallel primitives to handle the volume and variety of data they process. Otherwise, your GPUs are just sitting idle and lighting money on fire while your machine is busy with file downloads or Python for loops.
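
A toy example of what that planning buys you: overlapping slow I/O with inference instead of strictly alternating the two in a loop. The timings are made up, and asyncio.sleep stands in for real downloads and GPU work:

    import asyncio

    async def download(doc_id: int) -> str:
        await asyncio.sleep(1.0)  # pretend network I/O
        return f"doc-{doc_id}"

    async def infer(payload: str) -> str:
        await asyncio.sleep(0.5)  # pretend GPU inference
        return f"result for {payload}"

    async def pipeline(n_docs: int) -> list:
        # Start all downloads at once and run inference as each one lands,
        # rather than download -> infer -> download -> infer in a for loop.
        downloads = [asyncio.create_task(download(i)) for i in range(n_docs)]
        results = []
        for finished in asyncio.as_completed(downloads):
            results.append(await infer(await finished))
        return results

    print(asyncio.run(pipeline(4)))  # ~3s here vs ~6s fully sequentially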

The SQL analogy helps explain the shift from imperative to declarative. But here’s where we leave it behind: Modern AI systems already behave like query planners because of reasoning.

Models as query engines

LLMs might expose their functionality via APIs, but they are no longer merely prompt endpoints. These models – here, we're talking about GPT-4o, Gemini, Claude 3.5, and similarly advanced ones – are execution engines.

The APIs they expose hide entire workflows — graphs of the decisions made to fulfill your requests. “Inference” doesn’t really cover it. They reason, decompose, and execute. 

Think about what happens behind the scenes when you ask a question that requires referencing documentation. A simple prompt leads the LLM to download a PDF (maybe with curl), run OCR to parse the text, and then perform more complicated reasoning over that text.

At every step, the LLM operates as an engine, not as an endpoint. It makes decisions, uses tools, builds plans, and iterates those plans as it works. 

The model names are familiar – GPT, Claude, etc. – but the move from endpoint to execution engine is a paradigm shift. The model endpoint isn't really a function anymore; it's more like an orchestrator.

Why closed models are currently winning

Back in 2023, a Google researcher’s leaked memo claimed, “We have no moat, and neither does OpenAI.” The researcher described the competition between model developers as an “arms race,” one they would inevitably lose to open-source models.

“Who would pay for a Google product with usage restrictions if there is a free, high-quality alternative without them?”, they asked. 

Fast forward to 2025, and the question seems foolish. 

(Chart: ChatGPT is still by far the most popular AI chatbot.)

ChatGPT and Gemini have never stopped growing, even amid the rise of open-source models, and ChatGPT receives around 9 times as much traffic as the open-source model DeepSeek.

Open source models are not yet poised to replace closed models, but the researcher’s diagnosis was right, even if the prediction wasn’t: Model quality in isolation is not a moat. 

LLMs shifted from endpoints to execution engines and from engines to ecosystems of apps and tools. The primitives have expanded in parallel. Now, instead of just Prompt, Embed, Chunk, OCR, Download File, etc., LLMs can search the web, retrieve a Notion page from an internal wiki, search through a Google Drive for specific documents, and more.
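
As a sketch of what that expansion looks like at the API surface, tools are declared to the model and dispatched by the runtime. The tool names and schemas below are invented for illustration and don't correspond to any particular vendor's tool-calling format:

    # Tools the runtime advertises to the model (invented names and schemas).
    TOOLS = [
        {"name": "web_search", "description": "Search the public web.",
         "parameters": {"query": "string"}},
        {"name": "fetch_wiki_page", "description": "Retrieve an internal wiki page by title.",
         "parameters": {"title": "string"}},
        {"name": "search_drive", "description": "Find documents in a shared drive.",
         "parameters": {"query": "string"}},
    ]

    def dispatch(tool_call: dict) -> str:
        """Route a model-issued tool call to a (stubbed) implementation."""
        handlers = {
            "web_search": lambda p: f"results for {p['query']!r}",
            "fetch_wiki_page": lambda p: f"contents of {p['title']!r}",
            "search_drive": lambda p: f"documents matching {p['query']!r}",
        }
        return handlers[tool_call["name"]](tool_call["parameters"])

    print(dispatch({"name": "web_search", "parameters": {"query": "query engines"}}))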

As a result, the nature of competition between models shifted, too. Now, the moat that surrounds a closed model isn’t raw capability, but workflow integration. The orchestration, reasoning, and tool calling are what actually make them magical. 

Raw benchmarks may describe meaningful capability differences, but they have little bearing on market dominance. They're the megapixel count on the latest iPhone: the big new number makes it into the ad, but the surrounding system is what actually matters.

Closed models secretly own the runtime: the “compiler” (reasoning), the “CPU” (LLM), the “memory” (filesystem), and the “VMs” (sandboxed execution environment). Open models today really expose only the raw operator, the forward pass. Beneath the surface, modern, closed models are infrastructure bundles that hide complexity. 

This is why open models are lagging in the race. A true, open-source operating system for AI hasn't been built yet; everyone is still competing over who can build the best-performing CPU.

The open query engine for data

My cofounder, Sammy, and I spent years in computer vision research, building self-driving car systems and designing data platforms. At some level, all of these efforts hit the same wall: it's incredibly difficult to build workloads over petabytes of multimodal data declaratively. Instead, we've always resorted to imperatively training and running custom models and C++ code to perform very specific processing tasks on very specific hardware.

In contrast, AI exposes a declarative interface in natural language and has access to all the tools required to actually run the required processing (ffmpeg, Python, VLMs, etc.). 

Unfortunately, all of this is currently hidden behind the complexity of private model endpoints. No, really – try running multimodal workloads on any of the open source providers. You'll start running into all kinds of problems with pixel padding, image decoding, and URL downloads from external systems. It's an order of magnitude more expensive and difficult to do work on anything that isn't just a string of text tokens.

Stacks such as vLLM, SGLang, and HuggingFace are primitives. They're building blocks for more complex solutions, not the solutions themselves. This is why I'm so excited to be building a query engine for data, a layer that does for AI what Apache Spark did for big data analytics – making it efficient and scalable.

Much like SQL, this engine will transform imperative work into declarative work. A declarative pipeline allows us to optimize execution, delivering better performance at larger scales.
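
As a purely hypothetical sketch (this is not Daft's actual interface), a declarative document pipeline might be declared once as a logical plan that an optimizer can then rearrange, batch, and parallelize:

    from dataclasses import dataclass, field

    @dataclass
    class DocPipeline:
        """A toy logical plan over documents (illustrative API, not a real library)."""
        source: str
        steps: list = field(default_factory=list)

        def ocr(self) -> "DocPipeline":
            self.steps.append(("ocr", ""))
            return self

        def prompt(self, instruction: str) -> "DocPipeline":
            self.steps.append(("prompt", instruction))
            return self

        def explain(self) -> str:
            # A real optimizer would reorder, batch, and parallelize these steps;
            # here we just render the logical plan.
            ops = [f"scan({self.source})"] + [f"{op}({arg})" for op, arg in self.steps]
            return " -> ".join(ops)

    pipeline = (
        DocPipeline("s3://my-bucket/papers/*.pdf")  # made-up path
        .ocr()
        .prompt("extract {author_name: institution} as JSON")
    )
    print(pipeline.explain())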

AI is a systems engineering problem

SQL unlocked structured data. Before SQL, working with data was painful and slow – too difficult to support the use cases that would come later.

A data analyst cares about joining Table A and B, not about the specialized sort-merge join algorithm that does it for them. Similarly, an AI engineer doesn’t really care that the system wrote efficient code to crawl a webpage in parallel. They just care about asking a question over a website.

How do we get there?

  1. Build better primitives: models writing and executing code is an increasingly important primitive (see: Anthropic's Code Execution with MCP blogpost), but there will be many more.

  2. Compose them into plans and workflows: build towards agentic abstractions (workflows vs. agents), not just LLM inference endpoints.

  3. Build better systems: execution engines that can execute these workflows with maximum efficiency. (There has been a lot of talk recently about Durable Workflows – that feels like the right abstraction for resilience, but the wrong one for efficiency.)

This feels a lot like building a query engine. Systems engineering is so back, and I couldn’t be more excited for it. If you want to learn more about what we are working on, please check out our documentation.
