Workers AI
Monitor and inspect AI model inference calls from your worker
Workers AI lets you run machine learning models directly on Cloudflare's network. FlareDesk captures every AI call your worker makes, showing you the model used, input prompt, response, token counts, and latency, all in real time.
Overview
The Workers AI page in FlareDesk gives you full visibility into your AI inference pipeline:
Live Trace Feed
See every AI call in real time as your worker processes requests
Input & Output Inspector
View the full prompt sent and the complete model response
Token Counts
Track input and output tokens per inference call
Latency Tracking
Measure how long each model inference takes
Requirements
Before you start
1. Add the [ai] binding to your wrangler.toml
2. Run your worker with wrangler dev --remote (Workers AI requires remote execution)
3. Enable Profiling in FlareDesk (click Enable Profiling on the Workers AI page)
```toml
# Add this to your wrangler.toml
[ai]
binding = "AI"
```
```typescript
interface Env {
  AI: Ai;
  // ... other bindings
}
```
Viewing AI Traces
1. Navigate to Workers AI in the sidebar under Bindings
2. Click Enable Profiling if it's not already active
3. Make a request to your worker that calls env.AI.run()
4. The trace appears instantly in the list. Click it to inspect the full input and output
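The steps above assume your worker already calls the AI binding. A minimal handler might look like the sketch below; the model name and prompt are illustrative, and the `Ai` interface is a simplified stand-in for the real type from @cloudflare/workers-types:

```typescript
// Minimal shape of the AI binding; the real `Ai` type comes from
// @cloudflare/workers-types. Model name and prompt below are illustrative.
interface Ai {
  run(model: string, input: unknown): Promise<unknown>;
}

interface Env {
  AI: Ai;
}

const worker = {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Every env.AI.run() call made while profiling is enabled shows up
    // in the FlareDesk trace feed, with its input, output, and latency.
    const result = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
      messages: [
        { role: "system", content: "You are a helpful assistant." },
        { role: "user", content: "Say hello in one sentence." },
      ],
    });
    return Response.json(result);
  },
};

export default worker;
```

Each request to this worker produces one trace entry per `env.AI.run()` call.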
Live mode: FlareDesk auto-refreshes every 3 seconds when profiling is enabled. You can pause live updates by clicking the Live toggle in the header.
Inspecting a Trace
Click any trace in the list to open the detail drawer on the right side:
Trace Details Include
- Model: The full model identifier (e.g. @cf/meta/llama-3.1-8b-instruct)
- Duration: Total inference latency in milliseconds
- Timestamp: Exact time the call was made
- Input Tokens: Number of tokens in the prompt
- Output Tokens: Number of tokens in the response
- Request Input: The full prompt or messages array sent to the model
- Response Output: The complete model response
Supported Models
FlareDesk captures traces for all Workers AI model categories:
Text Generation
@cf/meta/llama-3.1-8b-instruct
Text Classification
@cf/huggingface/distilbert-sst-2-int8
Text Embeddings
@cf/baai/bge-base-en-v1.5
Translation
@cf/meta/m2m100-1.2b
Summarization
@cf/facebook/bart-large-cnn
Image Classification
@cf/microsoft/resnet-50
Handling Errors
Failed AI calls are clearly marked with an Error badge in the trace list. Clicking the trace shows the full error message in the detail drawer.
Common error: AI not available in local mode
Workers AI cannot run in local-only mode. If you see this error, restart your worker with wrangler dev --remote.
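Beyond fixing the root cause, it can help to handle inference failures gracefully in the worker itself. This is a sketch, not FlareDesk-specific code: the fallback text is illustrative, and the `Ai` interface is a simplified stand-in for the real type from @cloudflare/workers-types:

```typescript
// Minimal shape of the AI binding; the real type comes from @cloudflare/workers-types.
interface Ai {
  run(model: string, input: unknown): Promise<unknown>;
}

// Wrap the inference call so a failed model call degrades gracefully
// instead of becoming an unhandled exception.
async function runWithFallback(ai: Ai, prompt: string): Promise<string> {
  try {
    const result = (await ai.run("@cf/meta/llama-3.1-8b-instruct", {
      messages: [{ role: "user", content: prompt }],
    })) as { response: string };
    return result.response;
  } catch (err) {
    // The failed call is still recorded by FlareDesk with an Error badge,
    // so the full message remains inspectable in the detail drawer.
    console.error("AI call failed:", err);
    return "Sorry, the model is unavailable right now.";
  }
}
```

The caught call still appears in the trace list as an error, so catching it does not hide it from FlareDesk.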
Tips & Best Practices
Use structured messages
Pass a messages array (OpenAI-style) rather than a raw prompt string for better visibility in the inspector.
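For example, an OpenAI-style messages array renders each conversation turn as a separate entry in the inspector, unlike a single concatenated prompt string. `buildMessages` below is a hypothetical helper, not part of any API:

```typescript
// Chat-style message shape accepted by Workers AI text-generation models.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Hypothetical helper: build a structured messages array instead of
// concatenating everything into one raw prompt string.
function buildMessages(systemPrompt: string, userInput: string): ChatMessage[] {
  return [
    { role: "system", content: systemPrompt },
    { role: "user", content: userInput },
  ];
}

// Passed to env.AI.run() as `{ messages }`, each turn then appears as a
// separate entry in the FlareDesk input inspector.
const messages = buildMessages("You are a support bot.", "How do I reset my password?");
```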
Monitor token usage
Keep an eye on input/output token counts to understand your usage patterns and optimise prompt lengths for cost and speed.
Use the Profiler for full context
The Profiler shows AI calls alongside all other binding calls (D1, KV, R2) in a waterfall view, which makes it easy to see the full latency breakdown of a request.
Next Steps
Profiler
See AI calls alongside all other bindings in a waterfall view