LLM
The wippy/llm module provides a unified interface for working with Large Language Models from multiple providers (OpenAI, Anthropic, Google, local models). It supports text generation, tool calling, structured output, embeddings, and streaming.
Setup
Add the module to your project:
```bash
wippy add wippy/llm
wippy install
```
Declare the dependency in your _index.yaml. The LLM module requires an environment storage entry (for API keys) and a process host:
version: "1.0"
namespace: app
entries:
- name: os_env
kind: env.storage.os
- name: processes
kind: process.host
lifecycle:
auto_start: true
- name: dep.llm
kind: ns.dependency
component: wippy/llm
version: "*"
parameters:
- name: env_storage
value: app:os_env
- name: process_host
value: app:processes
The env.storage.os entry exposes OS environment variables to the LLM providers. Set your API keys as environment variables (e.g. OPENAI_API_KEY, ANTHROPIC_API_KEY).
Text Generation
Import the llm library into your entry and call generate():
```yaml
entries:
  - name: ask
    kind: function.lua
    source: file://ask.lua
    method: handler
    imports:
      llm: wippy.llm:llm
```
local llm = require("llm")
local function handler()
local response, err = llm.generate("What are the three laws of robotics?", {
model = "gpt-4o"
})
if err then
return nil, err
end
return response.result
end
return { handler = handler }
The first argument to generate() can be a string prompt, a prompt builder, or a table of messages. The second argument is an options table.
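The three forms side by side, as a minimal sketch (the builder API is covered under Prompt Builder below):

```lua
local llm = require("llm")
local prompt = require("prompt")

local conversation = prompt.new()
conversation:add_user("Name three prime numbers.")

-- 1) plain string prompt
local r1 = llm.generate("Name three prime numbers.", { model = "gpt-4o" })

-- 2) prompt builder
local r2 = llm.generate(conversation, { model = "gpt-4o" })

-- 3) message table, here the { messages = ... } table produced by the builder's :build()
local r3 = llm.generate(conversation:build(), { model = "gpt-4o" })
```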
Generate Options
| Option | Type | Description |
|---|---|---|
| model | string | Model name or class (required) |
| temperature | number | Randomness control, 0-1 |
| max_tokens | number | Maximum tokens to generate |
| top_p | number | Nucleus sampling parameter |
| top_k | number | Top-k filtering |
| thinking_effort | number | Thinking depth 0-100 (models with thinking capability) |
| tools | table | Array of tool definitions |
| tool_choice | string | "auto", "none", "any", or tool name |
| stream | table | Streaming config: { reply_to, topic, buffer_size } |
| timeout | number | Request timeout in seconds (default 600) |
Response Structure
| Field | Type | Description |
|---|---|---|
| result | string | Generated text content |
| tokens | table | Token usage: prompt_tokens, completion_tokens, thinking_tokens, total_tokens |
| finish_reason | string | Why generation stopped: "stop", "length", "tool_call" |
| tool_calls | table? | Array of tool calls (if model invoked tools) |
| metadata | table | Provider-specific metadata |
| usage_record | table? | Usage tracking record |
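For example, a quick sketch that prints the generated text alongside token usage and the stop reason, using the fields listed above:

```lua
local response, err = llm.generate("Summarize Moby-Dick in one sentence.", { model = "gpt-4o" })
if not err then
  print(response.result)
  print("prompt tokens: " .. response.tokens.prompt_tokens)
  print("completion tokens: " .. response.tokens.completion_tokens)
  print("finish reason: " .. response.finish_reason)
end
```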
Prompt Builder
For multi-turn conversations and complex prompts, use the prompt builder:
```yaml
imports:
  llm: wippy.llm:llm
  prompt: wippy.llm:prompt
```
local llm = require("llm")
local prompt = require("prompt")
local conversation = prompt.new()
conversation:add_system("You are a helpful assistant.")
conversation:add_user("What is the capital of France?")
local response, err = llm.generate(conversation, {
model = "gpt-4o",
temperature = 0.7,
max_tokens = 500
})
Builder Methods
| Method | Description |
|---|---|
| prompt.new() | Create empty builder |
| prompt.with_system(content) | Create builder with system message |
| :add_system(content, meta?) | Add system message |
| :add_user(content, meta?) | Add user message |
| :add_assistant(content, meta?) | Add assistant message |
| :add_developer(content, meta?) | Add developer message |
| :add_message(role, content_parts, name?, meta?) | Add message with role and content parts |
| :add_function_call(name, args, id?) | Add tool call from assistant |
| :add_function_result(name, result, id?) | Add tool execution result |
| :add_cache_marker(id?) | Mark cache boundary (Claude models) |
| :get_messages() | Get message array |
| :build() | Get { messages = ... } table for llm.generate() |
| :clone() | Deep copy the builder |
| :clear() | Remove all messages |
All add_* methods return the builder for chaining.
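For example:

```lua
-- add_* calls can be chained because each returns the builder
local conversation = prompt.new()
  :add_system("You are a terse assistant.")
  :add_user("Define recursion in one sentence.")
```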
Multi-Turn Conversations
Build up context across turns by appending messages:
```lua
local conversation = prompt.new()
conversation:add_system("You are a helpful assistant.")

-- first turn
conversation:add_user("What is Lua?")
local r1 = llm.generate(conversation, { model = "gpt-4o" })
conversation:add_assistant(r1.result)

-- second turn with full context
conversation:add_user("What makes it different from Python?")
local r2 = llm.generate(conversation, { model = "gpt-4o" })
```
Multimodal Content
Combine text and images in a single message:
```lua
local conversation = prompt.new()
conversation:add_message(prompt.ROLE.USER, {
  prompt.text("What's in this image?"),
  prompt.image("https://example.com/photo.jpg")
})
```
| Function | Description |
|---|---|
| prompt.text(content) | Text content part |
| prompt.image(url, mime_type?) | Image from URL |
| prompt.image_base64(mime_type, data) | Base64-encoded image |
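A sketch of attaching a locally encoded image with prompt.image_base64 (image_data is a hypothetical base64 string produced elsewhere):

```lua
local conversation = prompt.new()
conversation:add_message(prompt.ROLE.USER, {
  prompt.text("Describe this diagram."),
  -- image_data: base64-encoded PNG bytes, obtained elsewhere
  prompt.image_base64("image/png", image_data)
})
```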
Role Constants
| Constant | Value |
|---|---|
| prompt.ROLE.SYSTEM | "system" |
| prompt.ROLE.USER | "user" |
| prompt.ROLE.ASSISTANT | "assistant" |
| prompt.ROLE.DEVELOPER | "developer" |
| prompt.ROLE.FUNCTION_CALL | "function_call" |
| prompt.ROLE.FUNCTION_RESULT | "function_result" |
Cloning
Clone a builder to create variations without modifying the original:
```lua
local base = prompt.new()
base:add_system("You are a helpful assistant.")

local conv1 = base:clone()
conv1:add_user("What is AI?")

local conv2 = base:clone()
conv2:add_user("What is ML?")
```
Streaming
Stream responses in real time using process communication. This requires a process.lua entry:
local llm = require("llm")
local TOPIC = "llm_stream"
local function main()
local stream_ch = process.listen(TOPIC)
local response = llm.generate("Write a short story", {
model = "gpt-4o",
stream = {
reply_to = process.pid(),
topic = TOPIC,
},
})
while true do
local chunk, ok = stream_ch:receive()
if not ok then break end
if chunk.type == "chunk" then
io.write(chunk.content)
elseif chunk.type == "thinking" then
io.write(chunk.content)
elseif chunk.type == "error" then
io.print("Error: " .. chunk.error.message)
break
elseif chunk.type == "done" then
break
end
end
process.unlisten(stream_ch)
end
Chunk Types
| Type | Fields | Description |
|---|---|---|
| "chunk" | content | Text content fragment |
| "thinking" | content | Model thinking process |
| "tool_call" | name, arguments, id | Tool invocation |
| "error" | error.message, error.type | Stream error |
| "done" | meta | Stream complete |
Streaming requires a process.lua entry because it uses Wippy's process communication system (process.pid(), process.listen()).
Tool Calling
Define tools as inline schemas and pass them to generate():
local llm = require("llm")
local prompt = require("prompt")
local json = require("json")
local tools = {
{
name = "get_weather",
description = "Get current weather for a location",
schema = {
type = "object",
properties = {
location = { type = "string", description = "City name" },
},
required = { "location" },
},
},
}
local conversation = prompt.new()
conversation:add_user("What's the weather in Tokyo?")
local response = llm.generate(conversation, {
model = "gpt-4o",
tools = tools,
tool_choice = "auto",
})
if response.tool_calls and #response.tool_calls > 0 then
for _, tc in ipairs(response.tool_calls) do
-- execute the tool and get a result
local result = { temperature = 22, condition = "sunny" }
-- add the exchange to the conversation
conversation:add_function_call(tc.name, tc.arguments, tc.id)
conversation:add_function_result(tc.name, json.encode(result), tc.id)
end
-- continue generation with tool results
local final = llm.generate(conversation, { model = "gpt-4o" })
print(final.result)
end
Tool Call Fields
| Field | Type | Description |
|---|---|---|
| id | string | Unique call identifier |
| name | string | Tool name |
| arguments | table | Parsed arguments matching the schema |
Tool Choice
| Value | Behavior |
|---|---|
| "auto" | Model decides when to use tools (default) |
| "none" | Never use tools |
| "any" | Must use at least one tool |
| "tool_name" | Must use the specified tool |
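To force a specific tool, pass its name as tool_choice; a short sketch reusing the get_weather definition above:

```lua
local response = llm.generate("What's the weather in Paris?", {
  model = "gpt-4o",
  tools = tools,               -- the get_weather definition from above
  tool_choice = "get_weather", -- the model must call this tool
})
```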
Structured Output
Generate validated JSON matching a schema:
local llm = require("llm")
local schema = {
type = "object",
properties = {
name = { type = "string" },
age = { type = "number" },
hobbies = {
type = "array",
items = { type = "string" },
},
},
required = { "name", "age", "hobbies" },
additionalProperties = false,
}
local response, err = llm.structured_output(schema, "Describe a fictional character", {
model = "gpt-4o",
})
if not err then
print(response.result.name)
print(response.result.age)
end
List every property in the required array. Use union types for optional fields: type = {"string", "null"}. Set additionalProperties = false.
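A sketch of a schema that follows these rules, with a hypothetical optional nickname field expressed as a union type:

```lua
local schema = {
  type = "object",
  properties = {
    name = { type = "string" },
    nickname = { type = { "string", "null" } }, -- optional: may be null
  },
  required = { "name", "nickname" }, -- every property is listed
  additionalProperties = false,
}
```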
Model Configuration
Models are defined as registry entries with meta.type: llm.model:
```yaml
entries:
  - name: gpt-4o
    kind: registry.entry
    meta:
      name: gpt-4o
      type: llm.model
      title: GPT-4o
      comment: OpenAI's flagship model
      capabilities:
        - generate
        - tool_use
        - structured_output
        - vision
      class:
        - balanced
      priority: 100
    max_tokens: 128000
    output_tokens: 16384
    pricing:
      input: 2.5
      output: 10
    providers:
      - id: wippy.llm.openai:provider
        provider_model: gpt-4o
```
Model Entry Fields
| Field | Description |
|---|---|
| meta.name | Model identifier used in API calls |
| meta.type | Must be llm.model |
| meta.capabilities | Feature list: generate, tool_use, structured_output, embed, thinking, vision, caching |
| meta.class | Class membership: fast, balanced, reasoning, etc. |
| meta.priority | Numeric priority for class-based resolution (higher wins) |
| max_tokens | Maximum context window |
| output_tokens | Maximum output tokens |
| pricing | Cost per million tokens: input, output |
| providers | Array with id (provider entry) and provider_model (provider-specific model name) |
Local Models
For locally hosted models (LM Studio, Ollama), define a separate provider entry with a custom base_url:
```yaml
- name: local_provider
  kind: registry.entry
  meta:
    name: ollama
    type: llm.provider
    title: Ollama Local
  driver:
    id: wippy.llm.openai:driver
    options:
      api_key_env: none
      base_url: http://127.0.0.1:11434/v1

- name: local-llama
  kind: registry.entry
  meta:
    name: local-llama
    type: llm.model
    title: Local Llama
    capabilities:
      - generate
  max_tokens: 4096
  output_tokens: 4096
  pricing:
    input: 0
    output: 0
  providers:
    - id: app:local_provider
      provider_model: llama-3.2
```
Model Resolution
Models can be referenced by exact name, class, or explicit class prefix:
```lua
-- exact model name
llm.generate("Hello", { model = "gpt-4o" })

-- model class (picks highest priority in that class)
llm.generate("Hello", { model = "fast" })

-- explicit class syntax
llm.generate("Hello", { model = "class:reasoning" })
```
Resolution order:
1. Match by exact meta.name
2. Match by class name (highest meta.priority wins)
3. With the class: prefix, search only within that class
Model Discovery
Query available models and their capabilities at runtime:
local llm = require("llm")
-- all models
local models = llm.available_models()
-- filter by capability
local tool_models = llm.available_models("tool_use")
local embed_models = llm.available_models("embed")
-- list model classes
local classes = llm.get_classes()
for _, c in ipairs(classes) do
print(c.name .. ": " .. c.title)
end
Embeddings
Generate vector embeddings for semantic search:
local llm = require("llm")
-- single text
local response = llm.embed("The quick brown fox", {
model = "text-embedding-3-small",
dimensions = 512,
})
-- response.result is a float array
-- multiple texts
local response = llm.embed({
"First document",
"Second document",
}, { model = "text-embedding-3-small" })
-- response.result is an array of float arrays
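As an illustration of semantic search on top of llm.embed, a plain-Lua cosine similarity between a query vector and each document vector (no additional module APIs assumed):

```lua
local function cosine(a, b)
  local dot, na, nb = 0, 0, 0
  for i = 1, #a do
    dot = dot + a[i] * b[i]
    na = na + a[i] * a[i]
    nb = nb + b[i] * b[i]
  end
  return dot / (math.sqrt(na) * math.sqrt(nb))
end

local docs = llm.embed({
  "Cats are small felines",
  "The stock market fell today",
}, { model = "text-embedding-3-small" })

local query = llm.embed("kittens and pets", { model = "text-embedding-3-small" })

-- rank documents by similarity to the query
for i, vec in ipairs(docs.result) do
  print(i, cosine(query.result, vec))
end
```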
Error Handling
Errors are returned as the second return value. On error, the first return value is nil:
```lua
local response, err = llm.generate("Hello", { model = "gpt-4o" })
if err then
  print("Error: " .. tostring(err))
  return
end

print(response.result)
```
Error Types
| Constant | Description |
|---|---|
| llm.ERROR_TYPE.INVALID_REQUEST | Malformed request |
| llm.ERROR_TYPE.AUTHENTICATION | Invalid API key |
| llm.ERROR_TYPE.RATE_LIMIT | Provider rate limit exceeded |
| llm.ERROR_TYPE.SERVER_ERROR | Provider server error |
| llm.ERROR_TYPE.CONTEXT_LENGTH | Input exceeds context window |
| llm.ERROR_TYPE.CONTENT_FILTER | Content filtered by safety systems |
| llm.ERROR_TYPE.TIMEOUT | Request timed out |
| llm.ERROR_TYPE.MODEL_ERROR | Invalid or unavailable model |
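One possible pattern is a bounded retry loop; this sketch retries on any error and only notes where an error-type check could go, since the exact shape of the error value is provider-dependent:

```lua
-- retry up to 3 times; if your error values expose a type, compare it against
-- the llm.ERROR_TYPE constants (e.g. RATE_LIMIT) before deciding to retry
local response, err
for attempt = 1, 3 do
  response, err = llm.generate("Hello", { model = "gpt-4o" })
  if not err then break end
  print("attempt " .. attempt .. " failed: " .. tostring(err))
end
```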
Finish Reasons
| Constant | Description |
|---|---|
| llm.FINISH_REASON.STOP | Normal completion |
| llm.FINISH_REASON.LENGTH | Reached max tokens |
| llm.FINISH_REASON.CONTENT_FILTER | Content filtered |
| llm.FINISH_REASON.TOOL_CALL | Model made a tool call |
| llm.FINISH_REASON.ERROR | Error during generation |
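For example, detecting a truncated response via finish_reason and retrying with a larger token budget (a sketch):

```lua
local response, err = llm.generate("List 50 prime numbers.", {
  model = "gpt-4o",
  max_tokens = 50,
})

if not err and response.finish_reason == llm.FINISH_REASON.LENGTH then
  -- the reply was cut off at max_tokens; retry with a larger budget
  response, err = llm.generate("List 50 prime numbers.", {
    model = "gpt-4o",
    max_tokens = 500,
  })
end
```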
Capabilities
| Constant | Description |
|---|---|
| llm.CAPABILITY.GENERATE | Text generation |
| llm.CAPABILITY.TOOL_USE | Tool/function calling |
| llm.CAPABILITY.STRUCTURED_OUTPUT | JSON structured output |
| llm.CAPABILITY.EMBED | Vector embeddings |
| llm.CAPABILITY.THINKING | Extended thinking |
| llm.CAPABILITY.VISION | Image understanding |
| llm.CAPABILITY.CACHING | Prompt caching |
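The constants can be used in place of string literals when filtering models:

```lua
-- equivalent to llm.available_models("vision")
local vision_models = llm.available_models(llm.CAPABILITY.VISION)
```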
See Also
- Agents - Agent framework with tools, delegates, and memory
- Building an LLM Agent - Step-by-step tutorial
- Framework Overview - Framework module usage