
Stateful UDFs with daft.cls
Daft's stateful UDFs provide a clean, Pythonic interface for custom operations.
by Daft Team

Last week we covered @daft.func: four patterns for turning stateless Python functions into distributed operations. Row-wise, async, generator, batch. If you missed it, that post is worth reading first, because everything here builds directly on what you learned there.
But @daft.func has a limitation: every invocation starts from scratch. There's no way to hold onto something between rows. If your function needs a database connection, an API client with a session, or any resource that's expensive to create, you're paying that cost on every call. That's where @daft.cls comes in.
@daft.cls turns a Python class into a distributed operator. Your __init__ runs once per worker. Your methods run on every row, reusing whatever __init__ set up. Decorate a class, and Daft handles instantiation, scheduling, and parallelism across your cluster. No other distributed data framework has anything like it.
1. The anatomy of a stateful UDF
A stateful UDF has four parts: imports, decorator, __init__, and a method. If you've written a Python class before, this will look familiar.
```python
import daft
from transformers import pipeline


@daft.cls
class SentimentClassifier:
    def __init__(self):
        # Runs ONCE per worker -- download weights, allocate memory
        self.pipe = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")

    def __call__(self, text: str) -> str:
        # Runs on EVERY ROW -- reuses the loaded model
        return self.pipe(text)[0]["label"]


classifier = SentimentClassifier()

df = daft.from_pydict({"review": [
    "This product is amazing",
    "Worst purchase I've ever made",
    "It's okay, nothing special",
]})
df = df.select(classifier(df["review"]).alias("sentiment"))
df.show()
```
When you write SentimentClassifier(), Daft doesn't instantiate the class immediately. During query execution, each worker calls __init__ once, then reuses that instance for every row it processes. The heavy setup of downloading weights, allocating memory, or opening connections happens once and the per-row work reuses it.
This is the same deferred execution model you saw with @daft.func in week 1's intro. Daft builds a plan, then executes it. The difference is that @daft.cls gives your operator persistent state.
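The mechanics can be pictured in plain Python. This is a simplified stand-in, not Daft's actual implementation: the wrapper captures the constructor arguments, and the real instance is built lazily on a worker's first call, then reused for every row after that.

```python
class LazyInstance:
    """Simplified stand-in for deferred instantiation (not Daft's real code)."""

    def __init__(self, cls, *args, **kwargs):
        self._cls, self._args, self._kwargs = cls, args, kwargs
        self._instance = None
        self.init_count = 0  # track how many times __init__ actually ran

    def __call__(self, row):
        if self._instance is None:          # first call on this worker
            self._instance = self._cls(*self._args, **self._kwargs)
            self.init_count += 1
        return self._instance(row)          # every later call reuses it


class Upper:
    def __init__(self):
        self.prefix = ">> "                 # pretend this is expensive setup

    def __call__(self, text):
        return self.prefix + text.upper()


op = LazyInstance(Upper)
results = [op(row) for row in ["a", "b", "c"]]  # three rows, one __init__
```

Three rows flow through, but `Upper.__init__` runs exactly once. That is the core bargain of a stateful UDF: per-worker setup, per-row reuse.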
2. It looks like Python because it is Python
In the stateless UDFs post, one of the key takeaways was that @daft.func requires almost no changes to your programming mindset. You write a normal Python function, add a decorator, and it scales. @daft.cls extends that same philosophy to classes.
If your function needs to hold onto something between rows, like a loaded model, a database connection pool, or an API client with authentication, you wrap it in a class. The mental model is simple: write any Python class you'd normally write, then add @daft.cls on top.
```python
import daft
import aiohttp


@daft.cls
class APIClient:
    def __init__(self, api_key: str):
        self.api_key = api_key

    async def fetch(self, url: str) -> str:
        async with aiohttp.ClientSession() as session:
            headers = {"Authorization": f"Bearer {self.api_key}"}
            async with session.get(url, headers=headers) as resp:
                return await resp.text()


client = APIClient("sk-my-secret-key")
df = daft.from_pydict({"endpoint": [
    "https://httpbin.org/get",
    "https://httpbin.org/ip",
]})
df = df.select(client.fetch(df["endpoint"]).alias("response"))
```
Notice how the APIClient stores the API key in __init__ and reuses it across every fetch call. With @daft.func, you'd either pass the key as a column (awkward) or hardcode it in the function (inflexible). The class gives you a natural place to hold configuration and resources.
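The ergonomic difference shows up even without Daft. In plain Python, configuration threaded through every call competes with configuration held once as state (a trivial sketch, no real networking):

```python
# Option 1: thread the key through every call (awkward at scale)
def fetch_with_key(url: str, api_key: str) -> str:
    return f"GET {url} auth={api_key}"


# Option 2: hold it as state -- set once in __init__, reused everywhere
class Client:
    def __init__(self, api_key: str):
        self.api_key = api_key

    def fetch(self, url: str) -> str:
        return f"GET {url} auth={self.api_key}"


client = Client("sk-test")
a = fetch_with_key("https://example.com", "sk-test")  # key repeated per call
b = client.fetch("https://example.com")               # key stored once
```

Both produce the same request; only the class keeps the credential out of every call site, which is exactly what @daft.cls lets you do at cluster scale.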
Async, sync, classes, functions -- write Python the way you already do and Daft handles the distribution.
3. Same four patterns, now with state
Remember the four stateless patterns from last week's post? Row-wise, async, generator, and batch. Every one of them carries over to @daft.cls through @daft.method. The decorator changes from @daft.func to @daft.method, but the signatures and behavior stay the same. You're just adding persistent state on top.
Sync: row-by-row, the default.
Here the class holds precomputed statistics that get reused on every row:
```python
@daft.cls
class Normalizer:
    def __init__(self, mean: float, std: float):
        self.mean = mean
        self.std = std

    @daft.method
    def normalize(self, value: float) -> float:
        return (value - self.mean) / self.std
```
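Stripped of the decorator, normalize is just a z-score. A quick plain-Python check of the arithmetic, no Daft required:

```python
from statistics import mean, pstdev

values = [10.0, 20.0, 30.0]
mu, sigma = mean(values), pstdev(values)   # precompute once, like __init__


def normalize(value: float, mu: float, sigma: float) -> float:
    return (value - mu) / sigma            # same formula as the method body


normalized = [normalize(v, mu, sigma) for v in values]
# the middle value equals the mean, so it normalizes to exactly 0.0
```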
Async: for I/O-bound work where you want concurrency.
The class holds credentials and can maintain persistent connections across requests:
```python
import aiohttp


@daft.cls
class TranslationClient:
    def __init__(self, api_key: str):
        self.api_key = api_key

    @daft.method
    async def translate(self, text: str) -> str:
        async with aiohttp.ClientSession() as session:
            resp = await session.post(
                "https://api.deepl.com/v2/translate",
                data={"text": text, "target_lang": "DE",
                      "auth_key": self.api_key},
            )
            result = await resp.json()
            return result["translations"][0]["text"]
```
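The payoff of the async variant is concurrency: while one row waits on the network, the others make progress. A self-contained asyncio sketch with a stubbed client -- `FakeTranslator` is hypothetical, standing in for a real API client:

```python
import asyncio


class FakeTranslator:
    """Hypothetical stand-in: holds 'credentials' like the real client."""

    def __init__(self, api_key: str):
        self.api_key = api_key

    async def translate(self, text: str) -> str:
        await asyncio.sleep(0.05)   # simulate network latency
        return text.upper()         # pretend this is the translation


async def main():
    client = FakeTranslator("fake-key")
    rows = ["hallo", "welt", "daft"]
    # gather runs all three "requests" concurrently,
    # so total wall time is roughly one latency, not three
    return await asyncio.gather(*(client.translate(t) for t in rows))


results = asyncio.run(main())
```

Daft applies the same idea across a partition: async methods on a @daft.cls instance are awaited concurrently, with the instance's state shared between them.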
Generator: one input, many outputs.
The class loads a tokenizer once and reuses it to split every input into multiple rows:
```python
from typing import Iterator


@daft.cls
class Tokenizer:
    def __init__(self, model_name: str):
        from transformers import AutoTokenizer
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)

    @daft.method
    def tokenize(self, text: str) -> Iterator[str]:
        for token in self.tokenizer.tokenize(text):
            yield token
```
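The key behavior of a generator method is fan-out: one input row becomes several output rows. In plain Python the flattening looks like this, with a whitespace splitter standing in for a real tokenizer:

```python
from typing import Iterator


def tokenize(text: str) -> Iterator[str]:
    # stand-in for self.tokenizer.tokenize: one input, many outputs
    yield from text.split()


rows = ["hello world", "daft scales generators"]
# each yielded token becomes its own output row, paired with its source row
flattened = [(row, token) for row in rows for token in tokenize(row)]
```

Two input rows become five output rows; Daft performs the same flattening for you and repeats the other columns alongside each yielded value.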
Batch: when your operation expects tensors or arrays, not individual scalars.
The class loads the model once, then processes entire batches at a time:
```python
from daft import DataType, Series


@daft.cls
class BatchEmbedder:
    def __init__(self, model_name: str):
        from sentence_transformers import SentenceTransformer
        self.model = SentenceTransformer(model_name)

    @daft.method.batch(return_dtype=DataType.list(DataType.float64()))
    def embed(self, texts: Series) -> list:
        return self.model.encode(texts.to_pylist()).tolist()
```
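The contract for a batch method is simple: it receives the whole batch at once and must return one result per input. A plain-Python sketch with a stubbed model -- `FakeModel` is hypothetical, standing in for sentence-transformers:

```python
class FakeModel:
    """Hypothetical stand-in for SentenceTransformer: embeds a whole batch."""

    def encode(self, texts):
        # one vectorized pass over the batch, not a call per row
        return [[float(len(t)), float(t.count(" "))] for t in texts]


model = FakeModel()                # "loaded" once, like __init__
batch = ["hello world", "daft", "stateful UDFs"]
embeddings = model.encode(batch)   # one call for the entire batch
# the invariant Daft relies on: one output row per input row
assert len(embeddings) == len(batch)
```

Real embedding models amortize GPU transfers and matrix multiplies across the batch, which is why this pattern exists at all.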
If you followed along with the @daft.func patterns, these should feel immediately familiar. The only new concept is that your class holds state between calls. Everything else -- the type hints, the return behavior, the way Daft inspects your signature -- works the same way.
4. Real patterns: transcription, inference, clients
To make this concrete, here's an example drawn from a production voice AI pipeline. FasterWhisperTranscriber loads a Whisper model once per worker, then transcribes audio files across the cluster:
```python
import daft
from faster_whisper import WhisperModel, BatchedInferencePipeline


@daft.cls()
class FasterWhisperTranscriber:
    def __init__(self, model="distil-large-v3", compute_type="float32", device="auto"):
        self.model = WhisperModel(model, compute_type=compute_type, device=device)
        self.pipe = BatchedInferencePipeline(self.model)

    @daft.method(return_dtype=daft.DataType.struct({
        "transcript": daft.DataType.string(),
        "segments": daft.DataType.list(daft.DataType.struct({
            "start": daft.DataType.float64(),
            "end": daft.DataType.float64(),
            "text": daft.DataType.string(),
        })),
    }))
    def transcribe(self, audio_file: daft.File):
        with audio_file.to_tempfile() as tmp:
            segments_iter, info = self.pipe.transcribe(
                str(tmp.name), vad_filter=True, batch_size=16
            )
            segments = [{"start": s.start, "end": s.end, "text": s.text}
                        for s in segments_iter]
            text = " ".join(seg["text"] for seg in segments)
            return {"transcript": text, "segments": segments}


transcriber = FasterWhisperTranscriber()
df = daft.from_glob_path("s3://my-bucket/audio/*.mp3")
df = df.with_column("result", transcriber.transcribe(daft.col("path")))
df.show()
```
The __init__ downloads and initializes Whisper once per worker. The transcribe method runs on every audio file, reusing the loaded model each time. A single hour of 48 kHz stereo audio can consume 518 MB of memory -- reinitializing for every file would make the pipeline unusable.
The same pattern applies broadly: image classification (ByteDance processes 300K+ images without OOM using @daft.cls), document embedding, API clients that maintain sessions, database connectors that pool connections. Anywhere you have a resource that's expensive to create and cheap to reuse, @daft.cls is the right tool.
Next week we'll go deeper on the AI/ML inference use cases specifically -- GPU allocation, memory management, and production patterns for running models at scale. This week is about understanding the building block: the stateful UDF itself.
5. Configure the worker, not the code
One more thing that @daft.cls gives you over @daft.func: resource configuration at the decorator level. Need a GPU? Want to cap concurrency? Need to escape the GIL? You declare it once on the class, and Daft enforces it across every worker:
```python
import torch


@daft.cls(
    gpus=1,              # Reserve 1 GPU per worker instance
    max_concurrency=4,   # Limit to 4 concurrent instances
    use_process=True,    # Run in a separate process (escape the GIL)
)
class ImageClassifier:
    def __init__(self, model_name: str):
        self.model = torch.hub.load("pytorch/vision", model_name, pretrained=True)
        self.model.cuda().eval()

    def __call__(self, image_path: str) -> str:
        # load_and_preprocess and decode_prediction are user-defined helpers
        image = load_and_preprocess(image_path)
        with torch.no_grad():
            output = self.model(image.cuda())
        return decode_prediction(output)
```
Three lines of configuration. Daft handles GPU allocation, concurrency limits, and process isolation. The configuration lives on the decorator -- not scattered through your pipeline code, not in a separate config file, not in your cluster orchestrator. It's right next to the class it applies to, where you'd expect to find it.
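A concurrency cap like max_concurrency can be pictured as a semaphore around the per-row work: no more than N rows in flight at once. A small asyncio sketch of that idea (illustrative only, not how Daft enforces the limit internally):

```python
import asyncio


async def run_with_cap(rows, max_concurrency):
    sem = asyncio.Semaphore(max_concurrency)
    in_flight = 0
    peak = 0

    async def process(row):
        nonlocal in_flight, peak
        async with sem:                 # at most max_concurrency holders
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)   # simulate per-row work
            in_flight -= 1
            return row * 2

    results = await asyncio.gather(*(process(r) for r in rows))
    return results, peak


results, peak = asyncio.run(run_with_cap(list(range(10)), max_concurrency=4))
# peak concurrency never exceeds the configured cap of 4
```

Declaring the cap on the decorator means this bookkeeping never appears in your pipeline code; Daft applies it wherever the class runs.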
We'll cover these resource options in detail in next week's post on AI/ML inference. For now, the key insight is that @daft.cls isn't just about state -- it's about giving you a single place to define what your operator needs and how it should run.