Human-AI Collaboration

From Prompts to Training: The Next Frontier


Throughout 2024-2025, most of my effort in building AI systems has relied heavily on prompt engineering. Spending hours per day with AI (mostly Cursor or Claude Desktop), I feel I have become quite proficient at it. My concern, however, is that I am hitting a wall. Sure, I can steer around what a model can and cannot do with a series of prompts, guardrails and what-not. But at the very core of an agent there is an LLM whose tendencies run somewhat against the grain of what I am trying to enforce.

AI Training Journey: From Prompts to Models

Tracking the progression from prompt engineering mastery to recognizing the need for deeper model training approaches. Each step represents a growing understanding of AI system limitations and possibilities.

  1. The Realization: prompt engineering alone isn't sufficient for sophisticated AI systems
  2. What I Keep Running Into: fighting against the model's core training with elaborate workarounds
  3. (In progress) Starting to train my first model

Part 1: The Realization

This became crystal clear while researching adversarial AI prompting techniques. The deeper I dug into the research, the more obvious it became: prompt engineering alone isn’t sufficient for the kind of AI systems I want to build.

The research is pretty clear on this:

  • Prompts are volatile and model-dependent—what works on one model fails on others
  • They can’t overcome fundamental model limitations or architectural constraints
  • There’s always this human bottleneck where I’m fighting against the model’s base training

When I’m trying to create AI agents that genuinely challenge thinking or behave in sophisticated ways, I keep running into the same issue: I’m essentially asking the model to act against its core training. It’s like trying to convince someone to argue against their deeply held beliefs—they might do it, but it never feels authentic or robust.

What I Keep Running Into

The most frustrating part is when I build these elaborate multi-agent systems with careful orchestration, and they work beautifully in demos. But then real-world complexity hits, and I’m back to tweaking prompts, adding more guardrails, writing longer instructions. It’s like building increasingly complex workarounds instead of addressing the fundamental issue.

The models want to be helpful, harmless, and honest in very specific ways that often conflict with what I’m trying to achieve. When I want an AI that:

  • Challenges assumptions rather than validates them
  • Maintains uncertainty instead of providing confident answers
  • Prioritizes truth-seeking over user satisfaction
  • Adapts its behavior based on performance patterns

I’m essentially fighting the RLHF training that made these models useful in the first place.

The Fine-Tuning Path

From what I’ve been reading, fine-tuning offers something fundamentally different. Instead of instructing a model to behave differently, you’re actually changing what it considers "natural" behavior.

Constitutional AI particularly interests me—the idea of embedding principles directly into the model’s behavior rather than trying to enforce them through prompts. Instead of telling the model "be more challenging," you train it so that intellectual challenge becomes part of its core response patterns.
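
To make this concrete, here's a minimal sketch of what constitutional-style data generation can look like: the model drafts a response, critiques its own draft against a written principle, and then revises it, and the (prompt, revision) pairs become supervised fine-tuning data. The `chat` helper and the principle text below are placeholders of my own, not any particular library's API.

```python
# Sketch of constitutional-style data generation (draft -> critique -> revise).
# `chat` is a hypothetical helper that sends a prompt to whatever model you use.

PRINCIPLE = "Challenge the user's assumptions instead of simply validating them."

def chat(prompt: str) -> str:
    raise NotImplementedError("wire this up to your local or hosted model")

def constitutional_pair(user_prompt: str) -> dict:
    draft = chat(user_prompt)
    critique = chat(
        f"Principle: {PRINCIPLE}\n\nResponse:\n{draft}\n\n"
        "Point out where this response violates the principle."
    )
    revision = chat(
        f"Principle: {PRINCIPLE}\n\nOriginal response:\n{draft}\n\n"
        f"Critique:\n{critique}\n\nRewrite the response so it follows the principle."
    )
    # Each (prompt, revision) pair becomes one supervised fine-tuning example,
    # so the principle ends up baked into the weights rather than the prompt.
    return {"prompt": user_prompt, "response": revision}
```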

Adversarial fine-tuning is another approach that caught my attention. The research on Multi-round Automatic Red-Teaming (MART) suggests you can iteratively improve model robustness by training it to both generate and defend against adversarial inputs. That feels much more robust than my current approach of anticipating problems in prompts.

The RLHF Question

RLHF is where things get really interesting. The research shows it can double accuracy on adversarial questions, but there’s a paradox: it can also increase hallucination compared to supervised fine-tuning alone.

What excites me about RLHF is the possibility of training models on my specific definition of "helpful" behavior. Instead of the generic helpfulness that current models optimize for, I could potentially train for:

  • Constructive intellectual challenge
  • Appropriate uncertainty acknowledgment
  • Truth-seeking over satisfaction
  • Context-aware behavioral adaptation

But the resource requirements are substantial—around 50,000 labeled preference samples and significant human annotation costs. That’s not exactly weekend project territory.
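
To ground what those preference samples actually look like, here is a small PyTorch sketch of the reward-modeling step at the core of RLHF: each labeled sample is a (prompt, chosen, rejected) pair, and the reward model is trained with a pairwise (Bradley-Terry style) loss so the preferred response scores higher. The toy scoring model below is just a stand-in so the snippet runs; a real reward model is a full language model with a scalar head.

```python
import torch
import torch.nn.functional as F

# One labeled preference sample: the annotator preferred `chosen` over `rejected`.
sample = {
    "prompt":   "Is my startup idea good?",
    "chosen":   "A few assumptions here are worth stress-testing before you commit...",
    "rejected": "Yes, it sounds great, you should definitely go for it!",
}

class ToyRewardModel(torch.nn.Module):
    """Stand-in reward model: a bag-of-embeddings scorer instead of a real LM."""
    def __init__(self, vocab_size=50_000, dim=64):
        super().__init__()
        self.emb = torch.nn.EmbeddingBag(vocab_size, dim)
        self.head = torch.nn.Linear(dim, 1)

    def forward(self, token_ids):                            # (batch, seq_len) int tensor
        return self.head(self.emb(token_ids)).squeeze(-1)    # one scalar reward per sequence

def pairwise_loss(reward_model, chosen_ids, rejected_ids):
    # Bradley-Terry objective: push the chosen response's reward above the rejected one's.
    margin = reward_model(chosen_ids) - reward_model(rejected_ids)
    return -F.logsigmoid(margin).mean()

# Toy usage with random token ids, just to show the shapes involved.
model = ToyRewardModel()
chosen_ids = torch.randint(0, 50_000, (4, 32))
rejected_ids = torch.randint(0, 50_000, (4, 32))
print(pairwise_loss(model, chosen_ids, rejected_ids))
```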

The Hardware Curiosity

Here’s what really gets me excited: the possibility of not just working with the software, but eventually understanding the hardware layer too. There’s something appealing about going all the way down to first principles—understanding not just how to train models, but how the computational substrate actually works.

I keep thinking about the people who built the early internet infrastructure. They didn’t just use existing protocols; they understood networking at the packet level, built their own routers, designed new architectures. That’s the kind of deep understanding I want with AI systems.

Maybe it’s naive, but I’d love to get to a point where I’m not just a sophisticated user of AI tools, but someone who actually understands the full stack—from hardware acceleration to training algorithms to behavioral modification techniques.

What I’m Actually Planning

For now, I’m thinking about starting small:

  • Experiment with LoRA fine-tuning on specific behavioral patterns (see the sketch after this list)
  • Try constitutional AI approaches with simple principles
  • Maybe explore RLAIF (RL from AI Feedback) as a more accessible alternative to full RLHF
  • Document what actually works versus what sounds good in research papers
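
For that first LoRA item, here is roughly what the starting point looks like with Hugging Face's `peft` library: wrap a small base model in low-rank adapters so only a tiny fraction of the parameters train, which is what makes this plausible on a 12GB card. The base model name and hyperparameters are placeholder assumptions, not tested settings.

```python
# Minimal LoRA starting point with Hugging Face peft -- a sketch, not tested settings.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "meta-llama/Llama-3.2-1B"  # placeholder: any small causal LM that fits in local VRAM

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")

lora_config = LoraConfig(
    r=8,                                   # rank of the low-rank update matrices
    lora_alpha=16,                         # scaling applied to the update
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections are a common default
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the full parameter count
```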

The goal isn’t to compete with OpenAI or Anthropic. It’s to understand these systems deeply enough that I can build exactly what I envision rather than settling for clever workarounds.

So this page will be dedicated to my continued attempts to work from first principles and see where model training and tuning take me. Quite frankly, it would excite me enormously to not just work with the software, but eventually to get my hands on the hardware side as well.

Let me hold this thought, to be continued…


Part 2: Initial Model Training

Now that I’ve established why prompt engineering has its limits, it’s time to move beyond theory and start actually training models. This section will document my journey from complete beginner to someone who understands the fundamentals of model training.

The transition from prompting existing models to training my own feels like moving from being a sophisticated user to becoming a creator. It’s exciting, intimidating, and exactly the kind of challenge that gets me up in the morning.

Evaluating My Options

Before diving into training, I need to decide between using my local hardware or exploring cloud options. Here’s what I’m weighing:

| Option | Cost | Hardware | VRAM | Pros | Cons |
| --- | --- | --- | --- | --- | --- |
| Local RTX 3060 | $0/month | RTX 3060 | 12GB | No ongoing costs; full control; privacy | Limited VRAM; ties up my main machine; no scalability |
| Google Colab Pro | $10/month | T4/V100 (varies) | 16GB+ | Easy setup; Jupyter notebooks; better GPUs available | Session limits; data transfer overhead; inconsistent hardware |
| RunPod | ~$0.50/hour | RTX 4090 | 24GB | High-end GPUs; pay per use; scalable | Costs add up; setup complexity; data management |
| AWS Spot Instances | ~$1-3/hour | Various | 16-80GB | Enterprise grade; massive scalability; spot pricing | Complex setup; can be terminated; steep learning curve |
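
As a sanity check on the VRAM column, here is the back-of-the-envelope arithmetic I keep in mind for a 7B-parameter model; the numbers are rough approximations that ignore activations, KV cache, and framework overhead.

```python
# Rough VRAM arithmetic for a 7B-parameter model (approximate, ignores activations).
params_billion = 7.0

fp16_weights   = params_billion * 2             # 2 bytes/param -> ~14 GB just to load in fp16
full_finetune  = params_billion * (2 + 2 + 12)  # + fp16 grads + fp32 master copy and Adam states -> ~112 GB
qlora_finetune = params_billion * 0.5 + 1       # 4-bit weights (~3.5 GB) + small adapters/optimizer

print(f"fp16 weights only:       ~{fp16_weights:.0f} GB")
print(f"full fine-tune (Adam):   ~{full_finetune:.0f} GB")
print(f"QLoRA-style fine-tune:   ~{qlora_finetune:.1f} GB")
```

Which is why full fine-tuning is off the table locally, while quantized LoRA on a 12GB card is at least in the right ballpark.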

The $25 Reality Check

With a $25 budget, that works out to a couple of months of Colab Pro ($10/month) plus some RunPod experimentation, or about 8-25 hours of cloud GPU time depending on the instance type. That’s not a lot, but it might be enough to learn the fundamentals and decide if I want to invest more seriously.
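
For what it's worth, those hour estimates fall straight out of the hourly rates in the table above:

```python
# How far $25 goes at the hourly rates from the comparison table.
budget = 25.0  # USD
rates = {"RunPod RTX 4090": 0.50, "AWS spot (low end)": 1.00, "AWS spot (high end)": 3.00}
for name, per_hour in rates.items():
    print(f"{name}: ~{budget / per_hour:.0f} GPU hours")
# ~50 h on RunPod, ~8-25 h on AWS spot, before any storage or data-transfer costs
```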

My Current Thinking

Start local with the RTX 3060 for basic LoRA experiments, then use cloud resources for anything that needs more VRAM or compute power. This hybrid approach lets me learn without breaking the bank while still having access to better hardware when needed.

Actually, let me be honest about something: over the past 3 months, my AI costs have been steadily climbing and are now closing in on $500 per month. At this rate, my AI expenses are going to approach the price of a mortgage. So maybe it’s time to get serious about using my RTX 3060 for local training before my cloud computing bills get completely out of hand.

And then I have some excuses to buy $5000 in hardware šŸ˜‚

What I’m Planning to Document

Here’s what I’ll be tracking:

  • My first attempts at fine-tuning (probably starting with LoRA)
  • The hardware and software setup process
  • What actually works vs. what sounds good in research papers
  • The inevitable failures and what I learn from them
  • Cost analysis and resource management strategies

This isn’t going to be a polished tutorial—it’s going to be a real-time learning log. Expect mistakes, false starts, and hopefully some genuine breakthroughs along the way.

Coming soon: First training experiments and setup documentation…