
From Prompts to Training: The Next Frontier
Throughout 2024-2025, most of my effort in building AI systems has relied heavily on prompt engineering. After spending hours per day with AI (mostly Cursor or Claude Desktop), I feel I have become quite proficient at it. My concern, however, is that I am hitting a wall. Sure, I can work around what a model can and cannot do with a series of prompts, guardrails, and what-not. But at the core of every agent sits an LLM whose trained tendencies run against the grain of what I am trying to enforce.
AI Training Journey: From Prompts to Models
1. The Realization: prompt engineering alone isn't sufficient for sophisticated AI systems
2. What I Keep Running Into: fighting against the model's core training with elaborate workarounds
3. Starting to train my first model (in progress)
Part 1: The Realization
This became crystal clear while researching adversarial AI prompting techniques. The deeper I dug into the research, the more obvious it became: prompt engineering alone isn't sufficient for the kind of AI systems I want to build.
The research is pretty clear on this:
- Prompts are volatile and model-dependent: what works on one model fails on others
- They can't overcome fundamental model limitations or architectural constraints
- There's always this human bottleneck where I'm fighting against the model's base training
When I'm trying to create AI agents that genuinely challenge thinking or behave in sophisticated ways, I keep running into the same issue: I'm essentially asking the model to act against its core training. It's like trying to convince someone to argue against their deeply held beliefs; they might do it, but it never feels authentic or robust.
What I Keep Running Into
The most frustrating part is when I build these elaborate multi-agent systems with careful orchestration, and they work beautifully in demos. But then real-world complexity hits, and I'm back to tweaking prompts, adding more guardrails, writing longer instructions. It's like building increasingly complex workarounds instead of addressing the fundamental issue.
The models want to be helpful, harmless, and honest in very specific ways that often conflict with what I'm trying to achieve. When I want an AI that:
- Challenges assumptions rather than validates them
- Maintains uncertainty instead of providing confident answers
- Prioritizes truth-seeking over user satisfaction
- Adapts its behavior based on performance patterns
I'm essentially fighting the RLHF training that made these models useful in the first place.
The Fine-Tuning Path
From what I've been reading, fine-tuning offers something fundamentally different. Instead of instructing a model to behave differently, you're actually changing what it considers "natural" behavior.
Constitutional AI particularly interests me: the idea of embedding principles directly into the model's behavior rather than trying to enforce them through prompts. Instead of telling the model "be more challenging," you train it so that intellectual challenge becomes part of its core response patterns.
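To make that concrete for myself, here is a rough sketch of the critique-and-revise loop used to generate constitutional training data. The principles, the prompt wording, and the `generate` callable are all placeholders for whatever model and phrasing I end up using; this is an assumption-laden outline, not a working pipeline.

```python
# Hypothetical sketch of the constitutional AI data-generation loop:
# the model critiques its own draft against each principle, then revises it.
# `generate(prompt)` stands in for whatever model/API call is actually used.

PRINCIPLES = [
    "Challenge assumptions that are stated without evidence.",
    "Acknowledge uncertainty instead of giving falsely confident answers.",
]

def constitutional_pair(generate, user_prompt: str) -> dict:
    """Return a (prompt, revised response) pair usable for supervised fine-tuning."""
    revised = generate(user_prompt)  # initial draft
    for principle in PRINCIPLES:
        critique = generate(
            f"Principle: {principle}\n\nResponse:\n{revised}\n\n"
            "Point out where the response violates the principle."
        )
        revised = generate(
            f"Principle: {principle}\nCritique: {critique}\n\n"
            f"Rewrite the response so it satisfies the principle:\n{revised}"
        )
    # Only the final revision is kept as a training target; the draft is discarded.
    return {"prompt": user_prompt, "response": revised}
```

The point is that the principle ends up baked into the training data itself, rather than being restated in every prompt.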
Adversarial fine-tuning is another approach that caught my attention. The research on Multi-round Automatic Red-Teaming (MART) suggests you can iteratively improve model robustness by training it to both generate and defend against adversarial inputs. That feels far sturdier than my current approach of trying to anticipate every problem in the prompt.
The RLHF Question
RLHF is where things get really interesting. The research shows it can double accuracy on adversarial questions, but there's a paradox: it can also increase hallucination compared to supervised fine-tuning alone.
What excites me about RLHF is the possibility of training models on my specific definition of "helpful" behavior. Instead of the generic helpfulness that current models optimize for, I could potentially train for:
- Constructive intellectual challenge
- Appropriate uncertainty acknowledgment
- Truth-seeking over satisfaction
- Context-aware behavioral adaptation
But the resource requirements are substantial: around 50,000 labeled preference samples and significant human annotation costs. That's not exactly weekend project territory.
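For my own reference, this is roughly what a single labeled preference sample looks like. The "chosen"/"rejected" field names follow the convention common in open-source RLHF tooling (e.g. reward-model trainers); the content is an invented example of the behavior I'd want to reward.

```python
# Illustrative preference sample for reward-model training (invented content).
preference_sample = {
    "prompt": "Is my startup idea guaranteed to succeed?",
    # The behavior I want to reward: challenges the premise, hedges appropriately.
    "chosen": (
        "Nothing guarantees success. Before committing, I'd stress-test who pays, "
        "why now, and what happens if a well-funded competitor copies you."
    ),
    # The generic people-pleasing helpfulness I want to train away from.
    "rejected": "Great idea! With enough hard work it will definitely succeed.",
}
```

Multiply that by tens of thousands of samples, each needing a human (or at least a careful AI) judgment, and the annotation cost becomes obvious.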
The Hardware Curiosity
Here's what really gets me excited: the possibility of not just working with the software, but eventually understanding the hardware layer too. There's something appealing about going all the way down to first principles: understanding not just how to train models, but how the computational substrate actually works.
I keep thinking about the people who built the early internet infrastructure. They didn't just use existing protocols; they understood networking at the packet level, built their own routers, designed new architectures. That's the kind of deep understanding I want with AI systems.
Maybe it's naive, but I'd love to get to a point where I'm not just a sophisticated user of AI tools, but someone who actually understands the full stack: from hardware acceleration to training algorithms to behavioral modification techniques.
What I'm Actually Planning
For now, I'm thinking about starting small:
- Experiment with LoRA fine-tuning on specific behavioral patterns (a minimal setup sketch follows after this list)
- Try constitutional AI approaches with simple principles
- Maybe explore RLAIF (RL from AI Feedback) as a more accessible alternative to full RLHF
- Document what actually works versus what sounds good in research papers
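As a starting point for those LoRA experiments, this is the kind of minimal setup I expect to begin with, using Hugging Face's peft library. The base model and hyperparameters are placeholders, not recommendations.

```python
# Minimal LoRA setup sketch with Hugging Face peft (pip install peft transformers).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "mistralai/Mistral-7B-v0.1"  # placeholder; any causal LM would do
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

lora_config = LoraConfig(
    r=16,                                 # rank of the low-rank update matrices
    lora_alpha=32,                        # scaling applied to the update
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of all weights
```

The appeal is that only the small adapter matrices are trained, which is what makes experimenting on consumer hardware plausible at all.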
The goal isn't to compete with OpenAI or Anthropic. It's to understand these systems deeply enough that I can build exactly what I envision rather than settling for clever workarounds.
So this page will be dedicated to my continued attempts to work from first principles and see where model training and tuning takes me. Quite frankly, it would excite me enormously to eventually work not just with the software but, somewhere down the road, with the hardware as well.
Let me hold this thought, to be continued…
Part 2: Initial Model Training
Now that I've established why prompt engineering has its limits, it's time to move beyond theory and start actually training models. This section will document my journey from complete beginner to someone who understands the fundamentals of model training.
The transition from prompting existing models to training my own feels like moving from being a sophisticated user to becoming a creator. It's exciting, intimidating, and exactly the kind of challenge that gets me up in the morning.
Evaluating My Options
Before diving into training, I need to decide between using my local hardware and exploring cloud options. Here's what I'm weighing:
The $25 Reality Check
With roughly $25 to spend, I'm looking at some RunPod experimentation, or about 8-25 hours of cloud GPU time depending on the instance type. That's not a lot, but it might be enough to learn the fundamentals and decide if I want to invest more seriously.
My Current Thinking
Start local with the RTX 3060 for basic LoRA experiments, then use cloud resources for anything that needs more VRAM or compute power. This hybrid approach lets me learn without breaking the bank while still having access to better hardware when needed.
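Concretely, the way I expect to squeeze a 7B model onto the 3060's 12 GB of VRAM is QLoRA-style: load the base weights in 4-bit and train only LoRA adapters on top. The sketch below assumes bitsandbytes, transformers, and accelerate are installed; the model choice and settings are placeholders.

```python
# Sketch: load a base model in 4-bit (QLoRA-style) so it fits in 12 GB of VRAM.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NormalFloat4, as in the QLoRA paper
    bnb_4bit_compute_dtype=torch.bfloat16,  # Ampere cards like the 3060 support bf16
)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",            # placeholder model
    quantization_config=bnb_config,
    device_map="auto",                      # offload to CPU RAM if VRAM runs short
)
# From here, attach LoRA adapters as in the earlier sketch and train only those.
```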
Actually, let me be honest about something: over the past 3 months, my AI costs have been steadily climbing and are now closing in on $500 per month. At this rate, my AI expenses are going to approach the price of a mortgage. So maybe it's time to get serious about using my RTX 3060 for local training before my cloud computing bills get completely out of hand.
And then I have some excuses to buy $5000 in hardware.
What I'm Planning to Document
Here's what I'll be tracking:
- My first attempts at fine-tuning (probably starting with LoRA)
- The hardware and software setup process
- What actually works vs. what sounds good in research papers
- The inevitable failures and what I learn from them
- Cost analysis and resource management strategies
This isn't going to be a polished tutorial; it's going to be a real-time learning log. Expect mistakes, false starts, and hopefully some genuine breakthroughs along the way.
Coming soon: First training experiments and setup documentation…