# LLM
_Path: en/framework/llm_
The `wippy/llm` module provides a unified interface for working with Large Language Models from multiple providers (OpenAI, Anthropic, Google, local models). It supports text generation, tool calling, structured output, embeddings, and streaming.
## Setup
Add the module to your project:
```bash
wippy add wippy/llm
wippy install
```
Declare the dependency in your `_index.yaml`. The LLM module requires an environment storage (for API keys) and a process host:
```yaml
version: "1.0"
namespace: app
entries:
  - name: os_env
    kind: env.storage.os
  - name: processes
    kind: process.host
    lifecycle:
      auto_start: true
  - name: dep.llm
    kind: ns.dependency
    component: wippy/llm
    version: "*"
    parameters:
      - name: env_storage
        value: app:os_env
      - name: process_host
        value: app:processes
```
The `env.storage.os` entry exposes OS environment variables to the LLM providers. Set your API keys as environment variables (e.g. `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`).
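For example, on a Unix shell the keys can be exported before starting the application (the values below are placeholders):

```bash
# Placeholder values -- substitute your real provider keys.
export OPENAI_API_KEY="your-openai-key"
export ANTHROPIC_API_KEY="your-anthropic-key"
```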
## Text Generation
Import the `llm` library into your entry and call `generate()`:
```yaml
entries:
  - name: ask
    kind: function.lua
    source: file://ask.lua
    method: handler
    imports:
      llm: wippy.llm:llm
```
```lua
local llm = require("llm")

local function handler()
  local response, err = llm.generate("What are the three laws of robotics?", {
    model = "gpt-4o"
  })
  if err then
    return nil, err
  end
  return response.result
end

return { handler = handler }
```
The first argument to `generate()` can be a string prompt, a prompt builder, or a table of messages. The second argument is an options table.
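As a sketch of the message-table form (the exact message shape here is an assumption; the prompt builder described below produces this table for you via `:build()`):

```lua
local llm = require("llm")

-- Hypothetical raw message array; role/content shape is assumed.
local messages = {
  { role = "system", content = "You are a helpful assistant." },
  { role = "user", content = "What is the capital of France?" },
}
local response, err = llm.generate({ messages = messages }, { model = "gpt-4o" })
```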
### Generate Options
| Option | Type | Description |
|--------|------|-------------|
| `model` | string | Model name or class (required) |
| `temperature` | number | Randomness control, 0-1 |
| `max_tokens` | number | Maximum tokens to generate |
| `top_p` | number | Nucleus sampling parameter |
| `top_k` | number | Top-k filtering |
| `thinking_effort` | number | Thinking depth 0-100 (models with thinking capability) |
| `tools` | table | Array of tool definitions |
| `tool_choice` | string | `"auto"`, `"none"`, `"any"`, or tool name |
| `stream` | table | Streaming config: `{ reply_to, topic, buffer_size }` |
| `timeout` | number | Request timeout in seconds (default 600) |
### Response Structure
| Field | Type | Description |
|-------|------|-------------|
| `result` | string | Generated text content |
| `tokens` | table | Token usage: `prompt_tokens`, `completion_tokens`, `thinking_tokens`, `total_tokens` |
| `finish_reason` | string | Why generation stopped: `"stop"`, `"length"`, `"content_filter"`, `"tool_call"`, `"error"` |
| `tool_calls` | table? | Array of tool calls (if model invoked tools) |
| `metadata` | table | Provider-specific metadata |
| `usage_record` | table? | Usage tracking record |
## Prompt Builder
For multi-turn conversations and complex prompts, use the prompt builder:
```yaml
imports:
  llm: wippy.llm:llm
  prompt: wippy.llm:prompt
```
```lua
local llm = require("llm")
local prompt = require("prompt")

local conversation = prompt.new()
conversation:add_system("You are a helpful assistant.")
conversation:add_user("What is the capital of France?")

local response, err = llm.generate(conversation, {
  model = "gpt-4o",
  temperature = 0.7,
  max_tokens = 500
})
```
### Builder Methods
| Method | Description |
|--------|-------------|
| `prompt.new()` | Create empty builder |
| `prompt.with_system(content)` | Create builder with system message |
| `:add_system(content, meta?)` | Add system message |
| `:add_user(content, meta?)` | Add user message |
| `:add_assistant(content, meta?)` | Add assistant message |
| `:add_developer(content, meta?)` | Add developer message |
| `:add_message(role, content_parts, name?, meta?)` | Add message with role and content parts |
| `:add_function_call(name, args, id?)` | Add tool call from assistant |
| `:add_function_result(name, result, id?)` | Add tool execution result |
| `:add_cache_marker(id?)` | Mark cache boundary (Claude models) |
| `:get_messages()` | Get message array |
| `:build()` | Get `{ messages = ... }` table for `llm.generate()` |
| `:clone()` | Deep copy the builder |
| `:clear()` | Remove all messages |
All `add_*` methods return the builder for chaining.
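Since each `add_*` call returns the builder, a conversation can also be built fluently:

```lua
local prompt = require("prompt")

local conversation = prompt.new()
  :add_system("You are a helpful assistant.")
  :add_user("What is the capital of France?")
```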
### Multi-Turn Conversations
Build up context across turns by appending messages:
```lua
local conversation = prompt.new()
conversation:add_system("You are a helpful assistant.")
-- first turn
conversation:add_user("What is Lua?")
local r1 = llm.generate(conversation, { model = "gpt-4o" })
conversation:add_assistant(r1.result)
-- second turn with full context
conversation:add_user("What makes it different from Python?")
local r2 = llm.generate(conversation, { model = "gpt-4o" })
```
### Multimodal Content
Combine text and images in a single message:
```lua
local conversation = prompt.new()
conversation:add_message(prompt.ROLE.USER, {
  prompt.text("What's in this image?"),
  prompt.image("https://example.com/photo.jpg")
})
```
| Function | Description |
|----------|-------------|
| `prompt.text(content)` | Text content part |
| `prompt.image(url, mime_type?)` | Image from URL |
| `prompt.image_base64(mime_type, data)` | Base64-encoded image |
### Role Constants
| Constant | Value |
|----------|-------|
| `prompt.ROLE.SYSTEM` | `"system"` |
| `prompt.ROLE.USER` | `"user"` |
| `prompt.ROLE.ASSISTANT` | `"assistant"` |
| `prompt.ROLE.DEVELOPER` | `"developer"` |
| `prompt.ROLE.FUNCTION_CALL` | `"function_call"` |
| `prompt.ROLE.FUNCTION_RESULT` | `"function_result"` |
### Cloning
Clone a builder to create variations without modifying the original:
```lua
local base = prompt.new()
base:add_system("You are a helpful assistant.")
local conv1 = base:clone()
conv1:add_user("What is AI?")
local conv2 = base:clone()
conv2:add_user("What is ML?")
```
## Streaming
Stream responses in real time using process communication. This requires a `process.lua` entry:
```lua
local llm = require("llm")

local TOPIC = "llm_stream"

local function main()
  local stream_ch = process.listen(TOPIC)
  local response = llm.generate("Write a short story", {
    model = "gpt-4o",
    stream = {
      reply_to = process.pid(),
      topic = TOPIC,
    },
  })
  while true do
    local chunk, ok = stream_ch:receive()
    if not ok then break end
    if chunk.type == "chunk" then
      io.write(chunk.content)
    elseif chunk.type == "thinking" then
      io.write(chunk.content)
    elseif chunk.type == "error" then
      print("Error: " .. chunk.error.message)
      break
    elseif chunk.type == "done" then
      break
    end
  end
  process.unlisten(stream_ch)
end
```
### Chunk Types
| Type | Fields | Description |
|------|--------|-------------|
| `"chunk"` | `content` | Text content fragment |
| `"thinking"` | `content` | Model thinking process |
| `"tool_call"` | `name`, `arguments`, `id` | Tool invocation |
| `"error"` | `error.message`, `error.type` | Stream error |
| `"done"` | `meta` | Stream complete |
Streaming requires a `process.lua` entry because it uses Wippy's process communication system (`process.pid()`, `process.listen()`).
## Tool Calling
Define tools as inline schemas and pass them to `generate()`:
```lua
local llm = require("llm")
local prompt = require("prompt")
local json = require("json")

local tools = {
  {
    name = "get_weather",
    description = "Get current weather for a location",
    schema = {
      type = "object",
      properties = {
        location = { type = "string", description = "City name" },
      },
      required = { "location" },
    },
  },
}

local conversation = prompt.new()
conversation:add_user("What's the weather in Tokyo?")

local response = llm.generate(conversation, {
  model = "gpt-4o",
  tools = tools,
  tool_choice = "auto",
})

if response.tool_calls and #response.tool_calls > 0 then
  for _, tc in ipairs(response.tool_calls) do
    -- execute the tool and get a result
    local result = { temperature = 22, condition = "sunny" }
    -- add the exchange to the conversation
    conversation:add_function_call(tc.name, tc.arguments, tc.id)
    conversation:add_function_result(tc.name, json.encode(result), tc.id)
  end
  -- continue generation with tool results
  local final = llm.generate(conversation, { model = "gpt-4o" })
  print(final.result)
end
```
### Tool Call Fields
| Field | Type | Description |
|-------|------|-------------|
| `id` | string | Unique call identifier |
| `name` | string | Tool name |
| `arguments` | table | Parsed arguments matching the schema |
### Tool Choice
| Value | Behavior |
|-------|----------|
| `"auto"` | Model decides when to use tools (default) |
| `"none"` | Never use tools |
| `"any"` | Must use at least one tool |
| `"tool_name"` | Must use the specified tool |
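For example, to force the model to call the weather tool defined in the example above, pass that tool's name as `tool_choice` (a sketch reusing `conversation` and `tools` from that example):

```lua
local response = llm.generate(conversation, {
  model = "gpt-4o",
  tools = tools,
  tool_choice = "get_weather",
})
```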
## Structured Output
Generate validated JSON matching a schema:
```lua
local llm = require("llm")

local schema = {
  type = "object",
  properties = {
    name = { type = "string" },
    age = { type = "number" },
    hobbies = {
      type = "array",
      items = { type = "string" },
    },
  },
  required = { "name", "age", "hobbies" },
  additionalProperties = false,
}

local response, err = llm.structured_output(schema, "Describe a fictional character", {
  model = "gpt-4o",
})
if not err then
  print(response.result.name)
  print(response.result.age)
end
```
For OpenAI models, all properties must be listed in the `required` array. Use union types for optional fields: `type = {"string", "null"}`. Set `additionalProperties = false`.
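For instance, a sketch of an OpenAI-compatible schema where `nickname` is optional: it stays in `required` but is declared nullable instead of being omitted.

```lua
local schema = {
  type = "object",
  properties = {
    name = { type = "string" },
    nickname = { type = { "string", "null" } },
  },
  required = { "name", "nickname" },
  additionalProperties = false,
}
```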
## Model Configuration
Models are defined as registry entries with `meta.type: llm.model`:
```yaml
entries:
  - name: gpt-4o
    kind: registry.entry
    meta:
      name: gpt-4o
      type: llm.model
      title: GPT-4o
      comment: OpenAI's flagship model
      capabilities:
        - generate
        - tool_use
        - structured_output
        - vision
      class:
        - balanced
      priority: 100
    max_tokens: 128000
    output_tokens: 16384
    pricing:
      input: 2.5
      output: 10
    providers:
      - id: wippy.llm.openai:provider
        provider_model: gpt-4o
```
### Model Entry Fields
| Field | Description |
|-------|-------------|
| `meta.name` | Model identifier used in API calls |
| `meta.type` | Must be `llm.model` |
| `meta.capabilities` | Feature list: `generate`, `tool_use`, `structured_output`, `embed`, `thinking`, `vision`, `caching` |
| `meta.class` | Class membership: `fast`, `balanced`, `reasoning`, etc. |
| `meta.priority` | Numeric priority for class-based resolution (higher wins) |
| `max_tokens` | Maximum context window |
| `output_tokens` | Maximum output tokens |
| `pricing` | Cost per million tokens: `input`, `output` |
| `providers` | Array with `id` (provider entry) and `provider_model` (provider-specific model name) |
### Local Models
For locally hosted models (LM Studio, Ollama), define a separate provider entry with a custom `base_url`:
```yaml
- name: local_provider
  kind: registry.entry
  meta:
    name: ollama
    type: llm.provider
    title: Ollama Local
  driver:
    id: wippy.llm.openai:driver
    options:
      api_key_env: none
      base_url: http://127.0.0.1:11434/v1
- name: local-llama
  kind: registry.entry
  meta:
    name: local-llama
    type: llm.model
    title: Local Llama
    capabilities:
      - generate
  max_tokens: 4096
  output_tokens: 4096
  pricing:
    input: 0
    output: 0
  providers:
    - id: app:local_provider
      provider_model: llama-3.2
```
## Model Resolution
Models can be referenced by exact name, class, or explicit class prefix:
```lua
-- exact model name
llm.generate("Hello", { model = "gpt-4o" })
-- model class (picks highest priority in that class)
llm.generate("Hello", { model = "fast" })
-- explicit class syntax
llm.generate("Hello", { model = "class:reasoning" })
```
Resolution order:
1. Match by exact `meta.name`
2. Match by class name (highest `meta.priority` wins)
3. With `class:` prefix, search only in that class
## Model Discovery
Query available models and their capabilities at runtime:
```lua
local llm = require("llm")
-- all models
local models = llm.available_models()
-- filter by capability
local tool_models = llm.available_models("tool_use")
local embed_models = llm.available_models("embed")
-- list model classes
local classes = llm.get_classes()
for _, c in ipairs(classes) do
print(c.name .. ": " .. c.title)
end
```
## Embeddings
Generate vector embeddings for semantic search:
```lua
local llm = require("llm")

-- single text
local response = llm.embed("The quick brown fox", {
  model = "text-embedding-3-small",
  dimensions = 512,
})
-- response.result is a float array

-- multiple texts
local batch = llm.embed({
  "First document",
  "Second document",
}, { model = "text-embedding-3-small" })
-- batch.result is an array of float arrays
```
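To rank documents against a query, the returned vectors are typically compared with cosine similarity. A minimal plain-Lua sketch, independent of the module:

```lua
-- Cosine similarity between two equal-length float arrays.
local function cosine_similarity(a, b)
  local dot, norm_a, norm_b = 0, 0, 0
  for i = 1, #a do
    dot = dot + a[i] * b[i]
    norm_a = norm_a + a[i] * a[i]
    norm_b = norm_b + b[i] * b[i]
  end
  return dot / (math.sqrt(norm_a) * math.sqrt(norm_b))
end

-- Identical vectors score 1.0; orthogonal vectors score 0.0.
print(cosine_similarity({ 1, 0 }, { 1, 0 }))
print(cosine_similarity({ 1, 0 }, { 0, 1 }))
```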
## Error Handling
Errors are returned as the second return value. On error, the first return value is `nil`:
```lua
local response, err = llm.generate("Hello", { model = "gpt-4o" })
if err then
  print("Error: " .. tostring(err))
  return
end
print(response.result)
```
### Error Types
| Constant | Description |
|----------|-------------|
| `llm.ERROR_TYPE.INVALID_REQUEST` | Malformed request |
| `llm.ERROR_TYPE.AUTHENTICATION` | Invalid API key |
| `llm.ERROR_TYPE.RATE_LIMIT` | Provider rate limit exceeded |
| `llm.ERROR_TYPE.SERVER_ERROR` | Provider server error |
| `llm.ERROR_TYPE.CONTEXT_LENGTH` | Input exceeds context window |
| `llm.ERROR_TYPE.CONTENT_FILTER` | Content filtered by safety systems |
| `llm.ERROR_TYPE.TIMEOUT` | Request timed out |
| `llm.ERROR_TYPE.MODEL_ERROR` | Invalid or unavailable model |
### Finish Reasons
| Constant | Description |
|----------|-------------|
| `llm.FINISH_REASON.STOP` | Normal completion |
| `llm.FINISH_REASON.LENGTH` | Reached max tokens |
| `llm.FINISH_REASON.CONTENT_FILTER` | Content filtered |
| `llm.FINISH_REASON.TOOL_CALL` | Model made a tool call |
| `llm.FINISH_REASON.ERROR` | Error during generation |
## Capabilities
| Constant | Description |
|----------|-------------|
| `llm.CAPABILITY.GENERATE` | Text generation |
| `llm.CAPABILITY.TOOL_USE` | Tool/function calling |
| `llm.CAPABILITY.STRUCTURED_OUTPUT` | JSON structured output |
| `llm.CAPABILITY.EMBED` | Vector embeddings |
| `llm.CAPABILITY.THINKING` | Extended thinking |
| `llm.CAPABILITY.VISION` | Image understanding |
| `llm.CAPABILITY.CACHING` | Prompt caching |
## See Also
- [Agents](agents.md) - Agent framework with tools, delegates, and memory
- [Building an LLM Agent](../tutorials/llm-agent.md) - Step-by-step tutorial
- [Framework Overview](overview.md) - Framework module usage
## Navigation
Previous: Overview (framework/overview)
Next: Agents (framework/agents)