
Prompting with DataFrames: Massively Parallel LLM Generation is Here
Structure and scale LLM inference with prompt, a new function for Daft DataFrames.
by Everett Kleven
TLDR
We're introducing a new way to work with language models at scale. Daft's prompt function brings massively parallel text generation to DataFrames, making it trivial to run LLM operations over thousands or millions of rows with automatic batching and parallelization. Whether you're doing synthetic data generation, knowledge extraction, or batch tool-calling, you can now scale these workloads efficiently without wrestling with API rate limits or building complex batching logic.
import daft
from daft.functions import prompt

# read some sample data
df = daft.from_pydict({
    "input": [
        "What sound does a cat make?",
        "What sound does a dog make?",
        "What sound does a cow make?",
        "What does the fox say?",
    ]
})

# generate responses using a chat model
df = df.with_column("response", prompt(daft.col("input"), model="gpt-5.1"))

df.show()
With Daft’s prompt function, prompting becomes a first‑class DataFrame operation. Instead of building bespoke pipelines, you compose expressions: pass text, files, or images as columns; emit structured outputs as nested types; and let Daft take care of automatic batching, parallelization, retries, caching, and provider abstraction. Whether you’re generating synthetic data, extracting knowledge from PDFs, calling tools at scale, or steering reasoning models, you can do it across thousands—or millions—of rows using the same mental model you already use for analytics.
The Problem: Prompt Engineering Doesn't Scale
If you've worked with LLMs in production, you've likely faced these challenges:
- Manual batching logic - Writing custom code to batch requests, handle rate limits, and retry failed calls (sketched after this list)
- Inefficient resource utilization - GPUs sitting idle between batches or during data preprocessing
- Context management complexity - Trying to pass images, PDFs, and structured data to models without a unified interface
- Data accessibility - The context you want to curate doesn't just live on your laptop; it also lives in cloud storage buckets, databases, and repositories
- Poor cache utilization - Missing opportunities to reuse computation when prompts share common prefixes
These inconveniences translate directly to higher costs and slower iteration cycles. A data scientist shouldn't need to become a distributed systems engineer just to run a model over a dataset.
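For a sense of what that custom code usually looks like, here is a hand-rolled sketch of the batching-and-retry loop using the OpenAI Python SDK; the model name, concurrency limit, and backoff are illustrative:

# The scaffolding you end up writing by hand: concurrency caps, retries, backoff.
# Illustrative only; model name and limits are placeholders.
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()
semaphore = asyncio.Semaphore(8)  # crude rate limiting

async def call_llm(text: str, retries: int = 3) -> str:
    async with semaphore:
        for attempt in range(retries):
            try:
                resp = await client.chat.completions.create(
                    model="gpt-5-mini",
                    messages=[{"role": "user", "content": text}],
                )
                return resp.choices[0].message.content
            except Exception:
                await asyncio.sleep(2 ** attempt)  # back off, then retry
        raise RuntimeError("exhausted retries")

async def run_batch(rows: list[str]) -> list[str]:
    return await asyncio.gather(*(call_llm(row) for row in rows))

answers = asyncio.run(run_batch(["What sound does a cat make?", "What does the fox say?"]))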
The Solution: Prompt as a DataFrame Operation
Daft's prompt function treats prompting as a first-class DataFrame operation. This seemingly simple abstraction unlocks powerful capabilities:
1. Automatic Parallelization
Just like how you don't think about parallelizing a SELECT statement in SQL, you shouldn't have to think about parallelizing LLM calls. Daft handles this automatically:
import daft
from daft.functions import prompt

# Read any data source
df = daft.read_parquet("s3://my-bucket/customer-reviews/*.parquet")

# Generate sentiment analysis at scale
df = df.with_column(
    "sentiment",
    prompt(
        daft.col("review_text"),
        system_message="Classify the sentiment as positive, negative, or neutral.",
        model="gpt-5-mini"
    )
)

# Daft automatically batches and parallelizes across your cluster
df.write_parquet("s3://my-bucket/analyzed-reviews/")
2. Native Multimodal Support
Daft is purpose-built for multimodal AI workloads. Pass text, images, and files as a list of expressions with no manual message building required:
import daft
from daft.functions import prompt, download, decode_image

# Load images and text together
df = daft.from_glob_path("s3://my-bucket/product-images/*.jpg")
df = df.with_column("image", decode_image(download(daft.col("path"))))

# Prompt with both text and images
df = df.with_column(
    "description",
    prompt(
        messages=[
            daft.lit("Describe this product image in detail:"),
            daft.col("image")
        ],
        model="gpt-5"
    )
)
This works seamlessly with PDFs, Markdown files, HTML, CSV, and any other file type:
import daft
from daft.functions import prompt, file

df = daft.from_glob_path("hf://datasets/Eventual-Inc/sample-files/papers/*.pdf")
df = df.with_column("file", file(daft.col("path")))

df = df.with_column(
    "summary",
    prompt(
        messages=[daft.lit("Summarize this research paper:"), daft.col("file")],
        model="gpt-5-nano"
    )
)
3. Structured Outputs with Native Struct Support
Daft natively converts Pydantic models to DataFrame columns with nested struct types. This makes structured generation feel natural in a vectorized context:
import daft
from daft.functions import prompt, unnest
from pydantic import BaseModel, Field

class Anime(BaseModel):
    show: str = Field(description="The name of the anime show")
    character: str = Field(description="The name of the character")
    explanation: str = Field(description="Why the character says the quote")

df = daft.from_pydict({
    "quote": [
        "I am going to be the king of the pirates!",
        "I'm going to be the next Hokage!",
    ]
})

df = df.with_column(
    "classification",
    prompt(
        daft.col("quote"),
        system_message="Classify the anime based on the quote.",
        return_format=Anime,
        model="gpt-5-nano",
    )
).select("quote", unnest(daft.col("classification")))

df.show(format="fancy", max_width=80)

4. Flexible Template Composition
Build dynamic prompts using Daft's format function or user-defined functions:
import daft
from daft.functions import prompt, format

def answer_in_language(language: str, column_name: str) -> daft.Expression:
    return format(
        "Answer the following question in {}: {}",
        daft.lit(language),
        daft.col(column_name)
    )

df = daft.from_pydict({
    "question": [
        "What is the capital of France?",
        "Who invented the telephone?",
    ]
})

df = df.with_column(
    "spanish_answer",
    prompt(
        answer_in_language("Spanish", "question"),
        model="gpt-5"
    )
)
Or use row-wise @daft.func() user-defined functions for more complex logic:
@daft.func
def build_prompt(context: str, question: str, max_words: int) -> str:
    return f"""
Context: {context}

Question: {question}

Answer in at most {max_words} words.
"""

df = df.with_column(
    "answer",
    prompt(
        build_prompt(daft.col("context"), daft.col("question"), daft.lit(50)),
        model="gpt-5"
    )
)
Swap Providers Without Changing Code
One of the most powerful aspects of Daft's AI functions is the provider abstraction. The same prompt call works across OpenAI, local models, vLLM servers, and more; just change the provider:
OpenAI (Default)
import daft
from daft.functions import prompt

# OpenAI is the default - just set your API key
df = df.with_column("result", prompt(daft.col("input"), model="gpt-5"))
OpenAI-Compatible Providers (OpenRouter, etc.)
import os
import daft
from daft.functions import prompt

daft.set_provider(
    "openai",
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ.get("OPENROUTER_API_KEY")
)

df = df.with_column(
    "result",
    prompt(daft.col("input"), model="nvidia/nemotron-nano-9b-v2:free")
)
Local Models with LM Studio
df = df.with_column(
    "result",
    prompt(
        daft.col("input"),
        model="google/gemma-3-4b",
        provider="lm_studio"
    )
)
vLLM Online Serving
daft.set_provider(
    "openai",
    api_key="none",
    base_url="http://localhost:8000/v1"
)

df = df.with_column(
    "result",
    prompt(daft.col("input"), model="google/gemma-3-4b-it")
)
vLLM Offline Serving w/ Prefix Caching (Beta)
For batch inference workloads, our new vLLM Prefix Caching provider can cut inference time in half:
df = df.with_column(
    "result",
    prompt(
        daft.col("input"),
        provider="vllm-prefix-caching",
        model="Qwen/Qwen-8B"
    )
)
This provider implements Dynamic Prefix Bucketing and Streaming-Based Continuous Batching to maximize GPU utilization and cache hit rates.
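The intuition behind prefix bucketing: rows whose prompts share a long common prefix (a repeated system message or document header, for example) are grouped so that consecutive requests reuse the same KV-cache entries. A toy sketch of that grouping step, purely illustrative and not Daft's actual implementation:

from collections import defaultdict

def bucket_by_prefix(prompts: list[str], prefix_len: int = 64) -> list[list[str]]:
    # Group prompts that share a leading prefix so cached prefill work is reused
    # across consecutive requests. Toy version; Daft's provider does this
    # dynamically while streaming batches to vLLM.
    buckets: dict[str, list[str]] = defaultdict(list)
    for p in prompts:
        buckets[p[:prefix_len]].append(p)
    # Serve the largest buckets first so the warm cache gets the most reuse.
    return sorted(buckets.values(), key=len, reverse=True)

batches = bucket_by_prefix([
    "You are a helpful assistant. Q: What is 2 + 2?",
    "You are a helpful assistant. Q: Name a prime number.",
    "Summarize: The quick brown fox jumps over the lazy dog.",
])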
Advanced Features
Tool Calling at Scale
Use OpenAI's built-in tools like web search, or define custom functions for agentic workflows:
import daft
from daft.functions import prompt

df = daft.from_pydict({
    "query": ["Buy one get one free burritos in SF right now."]
})

df = df.with_column(
    "search_results",
    prompt(
        daft.col("query"),
        model="gpt-5",
        tools=[{"type": "web_search"}]
    )
)
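Custom function tools can go in the same list. The sketch below assumes the tools argument is forwarded unchanged to the OpenAI Responses API; the tool name and JSON schema are illustrative:

# A custom function tool in OpenAI Responses API format (illustrative schema).
get_weather_tool = {
    "type": "function",
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

df = df.with_column(
    "tool_calls",
    prompt(
        daft.col("query"),
        model="gpt-5",
        tools=[get_weather_tool],
    )
)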
Reasoning Models
Control compute allocation with the `reasoning` parameter for GPT-5 and other reasoning-capable models:
df = df.with_column(
    "deep_analysis",
    prompt(
        daft.col("complex_question"),
        model="gpt-5.1",
        reasoning={"effort": "high"},
    )
)
Working with Different OpenAI APIs
Daft supports both the new Responses API (default for GPT-5) and the Chat Completions API:
# Responses API (default for GPT-5)
df = df.with_column(
    "result",
    prompt(
        daft.col("input"),
        model="gpt-5",
        max_output_tokens=200  # Note: max_output_tokens, not max_tokens
    )
)

# Chat Completions API (for GPT-4.1 and earlier)
df = df.with_column(
    "result",
    prompt(
        daft.col("input"),
        model="gpt-4.1-2025-04-14",
        max_tokens=200,  # Note: max_tokens
        temperature=0.7,
        use_chat_completions=True
    )
)
Structured Outputs with vLLM Online Serving
vLLM's guided decoding for classification and regex-constrained generation works seamlessly:
# Classification with guided choice
df = df.with_column(
    "sentiment",
    prompt(
        daft.col("review"),
        model="google/gemma-3-4b-it",
        extra_body={"structured_outputs": {"choice": ["positive", "negative"]}}
    )
)

# Regex-constrained generation
df = df.with_column(
    "email",
    prompt(
        format("Generate an email for {}", daft.col("name")),
        model="google/gemma-3-4b-it",
        extra_body={"structured_outputs": {"regex": r"\w+@\w+\.com\n"}}
    )
)
Conclusion
With massively parallel LLM generation now possible thanks to Daft's prompt function, you can scale LLM operations from a single row to millions without changing your code. Whether you're generating synthetic data, building document intelligence pipelines, or running batch inference, Daft provides the foundation for production-grade AI workloads.
The future of prompt engineering is declarative, scalable, and multimodal.
Try it out and let us know what you build!