Exploring Claude Agent Skills

Introduction

I've been generating a lot of images lately. The workflow goes like this: think of what I need, metaprompt it with another AI, generate, see it's 80% there, iterate, tweak the prompt, regenerate. Repeat for anywhere from 5 to 20 minutes per image until something acceptable emerges.

It works. But it's tedious, and it doesn't scale. If I need twenty hero shots for a campaign, that's potentially hours of manual back-and-forth. What if Claude could handle this entire loop? Not just generate the image, but critique it, iterate on it, and keep going until it hits quality thresholds - all autonomously.

Skills + Tools = The Hybrid Pattern

Claude Skills are folders with markdown files that teach Claude how to coordinate image generation. The Skill provides brand guidelines and critique frameworks, while custom tools handle the actual API calls (generation via Replicate, critique via Claude Vision). Think of it as: Skills = Brain, Tools = Hands.

This is what I am building and exploring. A system that uses Claude Skills to coordinate image generation - providing the knowledge (brand guidelines, critique framework) while custom tools execute the actual work (calling Replicate's API for generation, Claude Vision for critique). The whole 5-20 minute loop compressed into something that runs autonomously while you get coffee.

Development Timeline

Final stages • Repository publishing soon

17/10 Specification & initial concept design

18/10 Core implementation & iterative testing

19/10 Blog post writing & interactive demo

20/10 Initial Critical Misunderstanding → Revelation (blog post v2)

❌ Initial belief: SKILL.md could directly call external APIs (Replicate, Claude Vision) - "just write Python in the Skill and it runs everything!"

✓ Reality discovered: Skills run in a sandboxed environment with NO network access. External API calls must happen through custom tools outside the sandbox.

This led to the hybrid pattern: Skills = knowledge (brand guidelines, workflows), Custom Tools = execution (API calls). Claude reads the Skill and coordinates tool usage.

23/10 Taking a bit of a break, working on this blog post again on the weekend

Meanwhile checkout my progress on the AI branding project

30/10 Currently busy with other priorities

If you're returning to this post: I'm currently a little busy with client work, the AI branding project, and the AI storyboarding case study. Please excuse the slow updates - check back in November for fuller progress. Of course do feel to send me a question if you have them, I'll see if i can answer them.

**/11 Final refinements & repository publication

Full source code, documentation, and implementation guide coming within the next few days. Follow along or check back soon for the complete repository.

So let's build this, a mini project, contained within this website, for a static demo, but trust me, it's working on my end (I am just not going to let you use my API credits... for now).

The Initial Vision (And Why It Didn't Quite Work)

When first exploring Claude Skills, I envisioned a pure SKILL.md approach: a single markdown file orchestrating everything - tool calls, workflow logic, task coordination. Beautiful in theory. But Skills run in sandboxed environments designed for local file operations, not network-connected workflows.

So we needed a middle ground: not a complex multi-agent architecture, but not pure Skills either. Enter the hybrid pattern - where Skills provide intelligence and Custom Tools provide execution.

But Wait... What CAN Pure Skills Build?

Turns out, quite a lot! Here's what works beautifully without any external APIs.

Codebase Intelligence & Automation

Everything that runs locally on your machine without external services.

Project Setup

• Create FastAPI/Django projects with auth boilerplate
• Generate .env.example from actual .env
• Setup pre-commit hooks automatically
• Initialize git repos with proper .gitignore

Code Analysis

• Find all TODO/FIXME comments → task list
• Identify functions longer than N lines
• Generate dependency graphs from imports
• Detect unused variables and imports

Batch Operations

• Resize all images in folder to specific width
• Convert YAML configs to JSON (or vice versa)
• Minify JavaScript/CSS files in bulk
• Rename files with intelligent patterns

Report Generation

• Weekly commit summary from git log
• Test coverage reports with trends
• Build changelog from git history
• Generate API documentation from code

The Pattern: Local files → Python/bash processing → Output files. No network needed!

TL;DR: Pure Skills work for file-based workflows (input → process → output) and local computation. Need external APIs? That's when you add Custom Tools.

The Hybrid Pattern Explained

Skills provide the knowledge. Custom tools provide the actions. Claude coordinates everything.

Why Not Pure Skills?

You might wonder: "Why not write everything in SKILL.md with Python and have Skills do it all?"

I mean, it would be beautiful right? Just markdown files orchestrating everything. But we aren't there yet. That said, I do think this is a taste of things to come - the direction is clear, even if the implementation has practical constraints today.

The Sandbox Constraints

• No network access - Skills run in a sandboxed environment

• No runtime package installation - Only pre-installed packages available

• Isolated execution - Can't call external APIs directly

What This Means for Our Use Case:

CAN Live in Skills

• Brand guidelines (markdown)
• Scoring algorithms (Python)
• Composition analysis logic
• Local file operations

NEEDS Custom Tools

• External API calls
• Image generation (Replicate)
• Vision critique (Claude API)
• Network operations

Let me reiterate: You cannot call external APIs from Skills!

Skills cannot make network calls to external services, install packages at runtime, or access APIs outside the sandbox. This includes Replicate, Claude API, or any image generation service. Only pre-installed packages and local operations are permitted.

Source: Official Documentation

Runtime Environment Constraints (docs.claude.com):

• No network access - Skills cannot make external API calls
• No runtime package installation - only pre-installed packages available

Security Model:

• Sandboxed computing environment with limited internet access
• Internet restricted to small allowlist (npm, pip, GitHub, Ubuntu repos)
• NOT on allowlist: Replicate.com, image generation APIs, or Claude API itself

How Skills Execute:

• Skills run in isolated containers via bash/code execution
• Claude reads files via bash commands within the container
• No network calls possible from within the execution environment

What does work!

Following a single generation request through the system

User Request

Generate brand-consistent image for spring campaign

SANDBOX ENVIRONMENT

Claude loads SKILL.md (in sandbox)

Reads brand guidelines, critique framework, workflow instructions

Claude calls generate_brand_image tool

EXITS sandbox

Custom tool with network access can call external APIs

EXTERNAL NETWORK CALLS

Tool → Replicate API (external)

Generates image using Nano Banana model

API → Tool → Claude (as tool_result)

Returns generated image path in conversation context

Key: Tool results live in Claude's conversation context, NOT written to sandbox files automatically. Claude receives the data and can then decide whether to save it to the sandbox filesystem.

Claude calls critique_brand_image tool

EXITS sandbox

Analyzes image against brand guidelines using Claude Vision API

EXTERNAL NETWORK CALLS

Tool → Claude Vision API (external)

Scores: color, composition, style, brand alignment

API → Tool → Claude (as tool_result)

Returns scores in conversation context (color: 85, composition: 90, style: 88, brand: 92)

SKILL ACTIVE (decision time)

Claude evaluates: Skill instructions + tool results

Reads scores from conversation context + applies SKILL.md threshold: average ≥85 = approved

Result: Score 88.75 → Approved! Either finalize or call more tools if Skill says so.

Inside sandbox (no network)

Tool call (exits sandbox)

External API operations

The Pattern: Skills = Brain, Tools = Hands

Skills provide knowledge and coordination logic (brand guidelines, critique thresholds, workflow steps). Custom tools provide execution capabilities (API calls, file operations, external services). Claude acts as the orchestrator, loading Skills when relevant and calling tools when actions are needed.

This is why Anthropic provides both: pre-built Skills (xlsx, pptx) for sandboxed operations + Tool Use API for external integrations.

Three Approaches Compared

Understanding what each approach can and can't do - and why the hybrid pattern emerged.

Pure Skills

SKILL.md files with built-in code execution - all operations happen in Claude's sandboxed environment.

Can Do

• Read/write local files (workspace only)
• Execute Python code (pre-installed packages)
• Process data with pandas, numpy, PIL
• Analyze images, manipulate files
• Encode workflows & best practices

Can't Do

• Network access - no external API calls
• Install packages - sandbox is pre-configured
• Database connections - isolated environment
• System operations - limited file access

Example: Excel/PowerPoint Skills that manipulate files locally

Multi-Agent

Frameworks like LangChain, LangSmith, CrewAI - multiple specialized agents coordinated by orchestration logic.

Can Do

• Complex multi-step workflows
• Parallel task execution
• Specialized agents per domain
• State management & memory
• Full network & tool access

Trade-offs

• Complex setup - orchestration logic required
• High token cost - multiple LLM calls per task
• Debugging difficulty - distributed execution
• Maintenance burden - coordination code grows

Example: Research agent → Writer agent → Editor agent → Publisher agent

See multi-agent example →

Hybrid Pattern

This Project

Skills provide knowledge + Custom tools provide execution - best of both worlds for external API workflows.

Sweet Spot

• Knowledge in Skills (guidelines, workflows)
• External APIs via custom tools
• Single Claude instance (no orchestration)
• Simple tool definitions (JSON schema)
• Context-efficient (Skills load on-demand)

Requirements

• SKILL.md with guidelines
• Custom tools for external operations
• Tool schemas (input/output definitions)
• Claude API or Extended Tools feature

Example: This project - brand guidelines in Skill, image generation via Replicate API tool

When to Use Each Approach

Scenario	Pure Skills	Multi-Agent	Hybrid
Local file operations only	✓	-	-
External API calls required	✗	✓	✓
Need domain guidelines/workflows	✓	Manual	✓
Complex multi-agent coordination	✗	✓	Maybe
Simple setup & maintenance	✓	✗	✓
Token efficiency	✓	✗	✓

Why Hybrid Wins (For This Use Case)

→ Brand guidelines live in Skills (easy to edit, version control)
→ External APIs (Replicate, Claude Vision) handled by custom tools
→ Single Claude coordinates everything (no multi-agent overhead)
→ Simple debugging - one conversation thread, clear tool calls

When You'd Still Need Multi-Agent

→ Parallel execution - multiple independent tasks at once
→ Specialized models - different LLMs for different domains
→ Complex state - long-running workflows with checkpoints
→ Human-in-loop - approval gates between agent actions

Real Example: Brand Image Generation

Pure Skills ✗

1. Load brand guidelines ✓

2. Construct prompt ✓

3. Call Replicate API ✗ (no network)

4. Critique image ✗

5. Iterate ✗

Fails at step 3 - can't escape sandbox

Multi-Agent ⚠️

1. Orchestrator → Guidelines Agent

2. Guidelines Agent → Generator Agent

3. Generator → Replicate API ✓

4. Orchestrator → Critic Agent

5. Loop until threshold ✓

Works but 5+ LLM calls, complex setup

Hybrid ✓

Claude loads Skill (guidelines) ✓

Claude → generate_brand_image tool ✓

→ Tool calls Replicate API

Claude → critique_brand_image tool ✓

→ Tool calls Claude Vision

Claude decides: iterate or done ✓

Simple, efficient, maintainable

The Bottom Line

Pure Skills are perfect for local operations. Multi-agent systems excel at complex, parallel workflows. The hybrid pattern (Skills + Custom Tools) hits the sweet spot for workflows that need both domain knowledge and external API access - simpler than multi-agent, more capable than pure Skills.

For this brand consistency project, hybrid is the clear winner: brand guidelines stay in version-controlled markdown, external APIs get clean tool wrappers, and Claude coordinates everything in a single, debuggable conversation thread.

A Note on Multi-Agent Systems

I've spent considerable time exploring multi-agent frameworks - building orchestration layers, debugging coordination logic, managing state across agents. It always gets complex, and maintenance becomes a burden. Worse, when new models drop or APIs change, the entire architecture often needs reworking because the new models don't play nice with old orchestration patterns.

Claude Skills signal a new direction: simpler patterns that don't require multi-agent complexity for most use cases. Instead of coordinating multiple LLM instances, you coordinate knowledge (Skills) and execution (tools) around a single Claude conversation. Less moving parts, clearer debugging, easier maintenance.

This doesn't make multi-agent obsolete - there are still legitimate use cases. But for workflows like this? The simpler pattern wins.

The Spec

A Claude Skill + Custom Tools system that coordinates image generation, critique, and refinement until meeting production standards.

I do quite like this ...

Instead of manually iterating on images in a web interface for 5-20 minutes, Claude coordinates the entire workflow - reading brand guidelines from a Skill, calling custom tools to generate and critique images, and refining until quality thresholds are met. One prompt, one command, production-ready output.

System Architecture

Core Components

→ SKILL.md - Brand guidelines and workflow instructions (in sandbox)
→ Generate Tool - Custom tool that calls Replicate's Nano Banana API (exits sandbox)
→ Critique Tool - Custom tool that uses Claude Vision API to score images (exits sandbox)
→ Project Structure - Organizes campaigns with briefs, style guides, and versioned scenes

Autonomous Workflow

1 Initialize Project - Load brand guidelines and campaign brief
2 Generate Image - Call generate_brand_image tool with brand-informed prompt
3 Critique & Score - Call critique_brand_image tool to evaluate technical (8/10), brand (9/10), functional (7/10)
4 Iterate or Finalize - Refine if below thresholds (max 5 iterations), or save with metadata

Quality Thresholds

Technical (≥8/10)

Lighting, focus, composition, artifacts, color accuracy

Brand (≥9/10)

Color palette, visual style, tone consistency, guidelines adherence

Functional (≥7/10)

Brief alignment, use case fit, messaging support

Progressive Asset Library

Each project builds a library of versioned scenes with full metadata tracking - prompt, scores, iteration count, refinement history. Over time, you develop a rich corpus of brand-consistent assets that inform future generations.

The smallest useful pieces

One markdown file with instructions, two tiny Python scripts, and a bit of UI glue. That's the whole system.

SKILL.md (excerpt)

---
name: Brand Consistency Engine
description: Coordinates image generation workflow with brand guidelines and quality control through custom tool integration
---

# Brand Consistency Engine

## Overview
This Skill provides brand guidelines and workflow instructions for image generation. When creating brand visuals, Claude reads these guidelines and coordinates custom tools to generate images, critique them, and iterate until quality thresholds are met.

## Brand Guidelines

### Visual Identity
- **Primary Colors**: Emerald (#10b981), Teal (#14b8a6)
- **Typography**: Clean sans-serif, high contrast
- **Composition**: Minimalist, breathing room, strategic negative space
- **Mood**: Professional yet approachable, modern, trustworthy

Show full SKILL.md


## Workflow

When asked to create brand-consistent images:

1. **Review Guidelines**
   - Read brand-guidelines/ for colors, composition rules, tone
   - Note any project-specific requirements
   
2. **Generate Initial Image**
   - Construct prompt incorporating brand rules
   - Call custom tool: generate_brand_image (handles Replicate API call)
   
3. **Critique Against Standards**
   - Call custom tool: critique_brand_image (uses Claude Vision API)
   - Analyze: color accuracy, composition, brand alignment
   
4. **Iterate if Needed**
   - If quality score < 85%: refine prompt and regenerate
   - Maximum 5 iterations to prevent runaway loops
   - Track improvements across iterations
   
5. **Finalize**
   - Save approved image to: projects/<name>/scenes/
   - Generate metadata: prompt, scores, iteration count
   - Log for future reference and learning

## Quality Thresholds

Images must meet these criteria before approval:
- Color Accuracy: Brand colors present and prominent (>80%)
- Composition: Clean, uncluttered, professional (>85%)
- Brand Alignment: Matches tone and visual identity (>90%)
- Overall Quality: Combined score >85%

## Custom Tools

### generate_brand_image
Custom tool (outside sandbox) that calls Replicate API for image generation.
Claude invokes this tool with brand-aware prompts.

# Tool Parameters:
{
  "scene_description": "minimalist workspace, emerald accents",
  "project_name": "campaign-2024",
  "generation_brief": "Modern, professional office setting"
}

### critique_brand_image
Custom tool that uses Claude Vision API to analyze brand alignment.
Returns scores for color, composition, style, and overall brand fit.

# Tool Parameters:
{
  "image_path": "projects/campaign-2024/scenes/v1.jpg",
  "project_name": "campaign-2024"
}

## When to Apply

Use this Skill when:
- Creating hero images for campaigns
- Generating social media visuals
- Producing marketing collateral
- Developing website imagery
- Any brand-facing visual asset

## Resources

- brand-guidelines/ - Complete visual identity documentation
- tools/ - Generation and critique scripts
- projects/ - Output directory for approved images

So... I was saying, we need a "Product" here. Static code examples are so 2019, what is this? A coding blog? No, this is me rambling, and building products.

Interactive demo

Prototype UI to illustrate the flow.

Brand Consistency Engine · Live Demo

Skills + Custom Tools Hybrid Pattern

v2.0

What would you like to create?

Step 1 of 6

Just describe your vision - AI handles generation, critique, and refinement automatically.

Project Name

Image Description

Brand Colors

3 colors

#4A90E2

Blue

#7B68EE

Purple

#E0E0E0

Gray

✓ Brand guidelines loaded · Minimalist aesthetic · 30%+ negative space

Style References

5 assets loaded

Loading Brand Guidelines

Step 2 of 6

SANDBOX

Loading Skill into knowledge environment...

Claude can now read and reason about brand standards

SKILL.md Workflow instructions

color-systems.md #10b981, #14b8a6

composition-rules.md 30%+ negative space

style-references.md Scandinavian modern

tone-and-mood.md Professional yet approachable

Knowledge layer active

· No network access yet

The Skill provides instructions and context - but can't execute actions like API calls. That's where Custom Tools come in next.

Generating Image

Step 3 of 6

EXTERNAL

Tool call: generate_brand_image()

Prompt enriched with brand requirements from Skill

→ Exiting sandbox environment...

→ Calling Replicate API (nano-banana model)

→

Prompt:

Waiting for image generation...

Generating with brand colors #10b981, #14b8a6... 65%

EXITS SANDBOX · External network call to Replicate

Current iteration of

Critiquing Against Brand

Step 4 of 6

EXTERNAL

Tool call: critique_brand_image()

Claude Vision analyzes image with Skill context loaded

→ Exiting sandbox environment...

→ Calling Claude Vision API

→ Brand guidelines loaded in context

Analyzing brand alignment...

Color Accuracy

Composition

Style Match

Brand Alignment

EXITS SANDBOX · Vision API with Skill context

The critique tool uses Claude Vision API with the Skill loaded, so it has access to brand context while analyzing the image.

Critique Results

Step 5 of 6

SANDBOX

Claude analyzing critique results

Color /100

Comp /100

Style /100

Brand /100

Overall Score

Target: 85+

/100

Issues

Fixes

Generation Complete

Step 6 of 6

Brand-compliant image generated!

Score: /100 · iteration(s)

Color

Comp

Style

Brand

Strengths

Work in Progress

In Progress

If you made it all the way here, thanks! You're a part of an active deep-dive. I am exploring the concept of Claude skills right now at this very moment, while you're reading this. You might want to check back in a few days, or a week from now, to see where this went. Stay tuned!

So, the above of course is a mini‑product; what is missing now is making it stable, adaptable as well as public. Without too much time spent on it, because of course, in a few weeks from now some free tool will come out that does the same. Opportunity cost.... We are here to learn.

However, I am starting to get a sense for what I am really hoping to build. As you can see, this blog post, and my own website has a lot of AI generated images - these are all hand generated; often taking a lot of time to get what I really want. It is A LOT harder to generate them even with the small tool built above. Now, what if I can build something to automate something like that?

What about animations?

Edwin

Think in frames: generate 24–60 images per second, and every few frames audit the shot for scene stability and temporal consistency. If drift creeps in, auto‑correct prompts/controls before continuing. This can also hand off to models like VEO or Sora for motion interpolation/upsampling.

Generate at 24–60 fps; keep camera/lighting locked unless the shot list calls for change.
Every 3–5 frames, run “scene stability” + “temporal consistency” checks to catch drift early.
Claude Skill runs a generator⇄critique loop to adjust prompt/seed/control‑nets automatically.
Optionally pass clean keyframes to VEO/Sora for smoother motion and timing.

24–60 fps stability checks critique loop

What we learned

The initial dream: Pure Skills that handle everything - markdown files with embedded Python calling APIs directly. Simple, elegant, "just instructions."

The reality: Skills run in a sandboxed environment with no network access. They can't call external APIs like Replicate or Claude Vision directly. That's a hard constraint, not a temporary limitation.

The solution: The hybrid pattern - Skills provide knowledge (brand guidelines, workflows, decision logic) while custom tools provide execution (API calls, external operations). Claude reads the Skill and coordinates tool usage.

Why this matters: This pattern is simpler than multi-agent orchestration (single Claude instance, no coordination overhead), more maintainable (guidelines in version-controlled markdown, tools are focused API wrappers), and more debuggable (one conversation thread, clear tool call boundaries).

Not "just markdown" as we hoped, but still a significant improvement over complex multi-agent setups. The future might bring fully autonomous Skills with network access - but for now, the hybrid pattern is the right tool for the job.

Links

Have thoughts or questions? Get in touch.

Exploring Claude Agent Skills

Skills + Tools = The Hybrid Pattern

Development Timeline

The Initial Vision (And Why It Didn't Quite Work)

But Wait... What CAN Pure Skills Build?

Codebase Intelligence & Automation

Project Setup

Code Analysis

Batch Operations

Report Generation

Transform, Clean, Analyze (All Locally)

Data Cleaning & Transform

Statistical Analysis

Data Validation

Documentation & Content Processing

Documentation Generation

Content Migration

Markdown Processing

Template-Based Generation

Testing, Analysis & Refactoring

Test Generation & Coverage

Code Complexity Analysis

Automated Refactoring

The Hybrid Pattern Explained

Why Not Pure Skills?

The Sandbox Constraints

What This Means for Our Use Case:

What does work!

The Pattern: Skills = Brain, Tools = Hands

Three Approaches Compared

Pure Skills

Can Do

Can't Do

Multi-Agent

Can Do

Trade-offs

Hybrid Pattern

Sweet Spot

Requirements

When to Use Each Approach

Why Hybrid Wins (For This Use Case)

When You'd Still Need Multi-Agent

Real Example: Brand Image Generation

Pure Skills ✗

Multi-Agent ⚠️

Hybrid ✓

The Bottom Line

A Note on Multi-Agent Systems

The Spec

I do quite like this ...

Progressive Asset Library

The smallest useful pieces

Interactive demo

Brand Consistency Engine · Live Demo

What would you like to create?

Loading Brand Guidelines

Generating Image

Critiquing Against Brand

Critique Results

Issues

Fixes

Generation Complete

Strengths

Work in Progress

What about animations?

Links

Want to discuss AI integration for your project?