Human-AI Collaboration

From Prompts to Training: The Next Frontier


Throughout 2024-2025, most of my effort in building AI systems has relied heavily on prompt engineering. Spending hours per day with AI (mostly Cursor or Claude Desktop), I feel I have become quite proficient at it. My concern, however, is that I am hitting a wall. Sure, I can steer around what a model can and cannot do with a series of prompts, guardrails and what-not. But at the very core of an agent there is an LLM whose tendencies run somewhat against the grain of what I am trying to enforce.

AI Training Journey: From Prompts to Models

Tracking the progression from prompt engineering mastery to recognizing the need for deeper model training approaches. Each step represents a growing understanding of AI system limitations and possibilities.

  1. The Realization: prompt engineering alone isn't sufficient for sophisticated AI systems
  2. What I Keep Running Into: fighting against the model's core training with elaborate workarounds
  3. (In progress) Starting to train my first model

Part 1: The Realization

This became crystal clear while researching adversarial AI prompting techniques. The deeper I dug into the research, the more obvious it became: prompt engineering alone isn’t sufficient for the kind of AI systems I want to build.

The research is pretty clear on this:

  • Prompts are volatile and model-dependent—what works on one model fails on others
  • They can’t overcome fundamental model limitations or architectural constraints
  • There’s always this human bottleneck where I’m fighting against the model’s base training

When I’m trying to create AI agents that genuinely challenge thinking or behave in sophisticated ways, I keep running into the same issue: I’m essentially asking the model to act against its core training. It’s like trying to convince someone to argue against their deeply held beliefs—they might do it, but it never feels authentic or robust.

What I Keep Running Into

The most frustrating part is when I build these elaborate multi-agent systems with careful orchestration, and they work beautifully in demos. But then real-world complexity hits, and I’m back to tweaking prompts, adding more guardrails, writing longer instructions. It’s like building increasingly complex workarounds instead of addressing the fundamental issue.

The models want to be helpful, harmless, and honest in very specific ways that often conflict with what I’m trying to achieve. When I want an AI that:

  • Challenges assumptions rather than validates them
  • Maintains uncertainty instead of providing confident answers
  • Prioritizes truth-seeking over user satisfaction
  • Adapts its behavior based on performance patterns

I’m essentially fighting the RLHF training that made these models useful in the first place.

The Fine-Tuning Path

From what I’ve been reading, fine-tuning offers something fundamentally different. Instead of instructing a model to behave differently, you’re actually changing what it considers "natural" behavior.

Constitutional AI particularly interests me—the idea of embedding principles directly into the model’s behavior rather than trying to enforce them through prompts. Instead of telling the model "be more challenging," you train it so that intellectual challenge becomes part of its core response patterns.
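
To make this concrete, here's a minimal sketch of what constitutional-style data generation can look like: the model drafts a response, critiques its own draft against a written principle, and then revises it, and the (prompt, revision) pairs become supervised fine-tuning data. The `chat` helper and the principle text below are placeholders of my own, not any particular library's API.

```python
# Sketch of constitutional-style data generation (draft -> critique -> revise).
# `chat` is a hypothetical helper that sends a prompt to whatever model you use.

PRINCIPLE = "Challenge the user's assumptions instead of simply validating them."

def chat(prompt: str) -> str:
    raise NotImplementedError("wire this up to your local or hosted model")

def constitutional_pair(user_prompt: str) -> dict:
    draft = chat(user_prompt)
    critique = chat(
        f"Principle: {PRINCIPLE}\n\nResponse:\n{draft}\n\n"
        "Point out where this response violates the principle."
    )
    revision = chat(
        f"Principle: {PRINCIPLE}\n\nOriginal response:\n{draft}\n\n"
        f"Critique:\n{critique}\n\nRewrite the response so it follows the principle."
    )
    # Each (prompt, revision) pair becomes one supervised fine-tuning example,
    # so the principle ends up baked into the weights rather than the prompt.
    return {"prompt": user_prompt, "response": revision}
```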

Adversarial fine-tuning is another approach that caught my attention. The research on Multi-round Automatic Red-Teaming (MART) suggests you can iteratively improve model robustness by training it to both generate and defend against adversarial inputs. That feels much more robust than my current approach of anticipating problems in prompts.

The RLHF Question

RLHF is where things get really interesting. The research shows it can double accuracy on adversarial questions, but there’s a paradox: it can also increase hallucination compared to supervised fine-tuning alone.

What excites me about RLHF is the possibility of training models on my specific definition of "helpful" behavior. Instead of the generic helpfulness that current models optimize for, I could potentially train for:

  • Constructive intellectual challenge
  • Appropriate uncertainty acknowledgment
  • Truth-seeking over satisfaction
  • Context-aware behavioral adaptation

But the resource requirements are substantial—around 50,000 labeled preference samples and significant human annotation costs. That’s not exactly weekend project territory.
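
To ground what those preference samples actually look like, here is a small PyTorch sketch of the reward-modeling step at the core of RLHF: each labeled sample is a (prompt, chosen, rejected) pair, and the reward model is trained with a pairwise (Bradley-Terry style) loss so the preferred response scores higher. The toy scoring model below is just a stand-in so the snippet runs; a real reward model is a full language model with a scalar head.

```python
import torch
import torch.nn.functional as F

# One labeled preference sample: the annotator preferred `chosen` over `rejected`.
sample = {
    "prompt":   "Is my startup idea good?",
    "chosen":   "A few assumptions here are worth stress-testing before you commit...",
    "rejected": "Yes, it sounds great, you should definitely go for it!",
}

class ToyRewardModel(torch.nn.Module):
    """Stand-in reward model: a bag-of-embeddings scorer instead of a real LM."""
    def __init__(self, vocab_size=50_000, dim=64):
        super().__init__()
        self.emb = torch.nn.EmbeddingBag(vocab_size, dim)
        self.head = torch.nn.Linear(dim, 1)

    def forward(self, token_ids):                            # (batch, seq_len) int tensor
        return self.head(self.emb(token_ids)).squeeze(-1)    # one scalar reward per sequence

def pairwise_loss(reward_model, chosen_ids, rejected_ids):
    # Bradley-Terry objective: push the chosen response's reward above the rejected one's.
    margin = reward_model(chosen_ids) - reward_model(rejected_ids)
    return -F.logsigmoid(margin).mean()

# Toy usage with random token ids, just to show the shapes involved.
model = ToyRewardModel()
chosen_ids = torch.randint(0, 50_000, (4, 32))
rejected_ids = torch.randint(0, 50_000, (4, 32))
print(pairwise_loss(model, chosen_ids, rejected_ids))
```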

The Hardware Curiosity

Here’s what really gets me excited: the possibility of not just working with the software, but eventually understanding the hardware layer too. There’s something appealing about going all the way down to first principles—understanding not just how to train models, but how the computational substrate actually works.

I keep thinking about the people who built the early internet infrastructure. They didn’t just use existing protocols; they understood networking at the packet level, built their own routers, designed new architectures. That’s the kind of deep understanding I want with AI systems.

Maybe it’s naive, but I’d love to get to a point where I’m not just a sophisticated user of AI tools, but someone who actually understands the full stack—from hardware acceleration to training algorithms to behavioral modification techniques.

What I’m Actually Planning

For now, I’m thinking about starting small:

  • Experiment with LoRA fine-tuning on specific behavioral patterns (see the sketch after this list)
  • Try constitutional AI approaches with simple principles
  • Maybe explore RLAIF (RL from AI Feedback) as a more accessible alternative to full RLHF
  • Document what actually works versus what sounds good in research papers
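
For that first LoRA item, here is roughly what the starting point looks like with Hugging Face's `peft` library: wrap a small base model in low-rank adapters so only a tiny fraction of the parameters train, which is what makes this plausible on a 12GB card. The base model name and hyperparameters are placeholder assumptions, not tested settings.

```python
# Minimal LoRA starting point with Hugging Face peft -- a sketch, not tested settings.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "meta-llama/Llama-3.2-1B"  # placeholder: any small causal LM that fits in local VRAM

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")

lora_config = LoraConfig(
    r=8,                                   # rank of the low-rank update matrices
    lora_alpha=16,                         # scaling applied to the update
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections are a common default
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the full parameter count
```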

The goal isn’t to compete with OpenAI or Anthropic. It’s to understand these systems deeply enough that I can build exactly what I envision rather than settling for clever workarounds.

So this page will be dedicated to my continued attempts to work from first principles and see where model training and tuning take me. Quite frankly, it would excite me enormously to not just work with the software, but eventually to get my hands on the hardware side as well.

Let me hold this thought, to be continued…


Part 2: Initial Model Training

Now that I’ve established why prompt engineering has its limits, it’s time to move beyond theory and start actually training models. This section will document my journey from complete beginner to someone who understands the fundamentals of model training.

The transition from prompting existing models to training my own feels like moving from being a sophisticated user to becoming a creator. It’s exciting, intimidating, and exactly the kind of challenge that gets me up in the morning.

Evaluating My Options

Before diving into training, I need to decide between using my local hardware or exploring cloud options. Here’s what I’m weighing:

| Option | Cost | Hardware | VRAM | Pros | Cons |
| --- | --- | --- | --- | --- | --- |
| Local RTX 3060 | $0/month | RTX 3060 | 12GB | No ongoing costs; full control; privacy | Limited VRAM; ties up my main machine; no scalability |
| Google Colab Pro | $10/month | T4/V100 (varies) | 16GB+ | Easy setup; Jupyter notebooks; better GPUs available | Session limits; data transfer overhead; inconsistent hardware |
| RunPod | ~$0.50/hour | RTX 4090 | 24GB | High-end GPUs; pay per use; scalable | Costs add up; setup complexity; data management |
| AWS Spot Instances | ~$1-3/hour | Various | 16-80GB | Enterprise grade; massive scalability; spot pricing | Complex setup; can be terminated; steep learning curve |
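
As a sanity check on the VRAM column, here is the back-of-the-envelope arithmetic I keep in mind for a 7B-parameter model; the numbers are rough approximations that ignore activations, KV cache, and framework overhead.

```python
# Rough VRAM arithmetic for a 7B-parameter model (approximate, ignores activations).
params_billion = 7.0

fp16_weights   = params_billion * 2             # 2 bytes/param -> ~14 GB just to load in fp16
full_finetune  = params_billion * (2 + 2 + 12)  # + fp16 grads + fp32 master copy and Adam states -> ~112 GB
qlora_finetune = params_billion * 0.5 + 1       # 4-bit weights (~3.5 GB) + small adapters/optimizer

print(f"fp16 weights only:       ~{fp16_weights:.0f} GB")
print(f"full fine-tune (Adam):   ~{full_finetune:.0f} GB")
print(f"QLoRA-style fine-tune:   ~{qlora_finetune:.1f} GB")
```

Which is why full fine-tuning is off the table locally, while quantized LoRA on a 12GB card is at least in the right ballpark.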

The $25 Reality Check

With a $25 budget, that works out to a couple of months of Colab Pro ($10/month) plus some RunPod experimentation, or about 8-25 hours of cloud GPU time depending on the instance type. That’s not a lot, but it might be enough to learn the fundamentals and decide if I want to invest more seriously.
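
For what it's worth, those hour estimates fall straight out of the hourly rates in the table above:

```python
# How far $25 goes at the hourly rates from the comparison table.
budget = 25.0  # USD
rates = {"RunPod RTX 4090": 0.50, "AWS spot (low end)": 1.00, "AWS spot (high end)": 3.00}
for name, per_hour in rates.items():
    print(f"{name}: ~{budget / per_hour:.0f} GPU hours")
# ~50 h on RunPod, ~8-25 h on AWS spot, before any storage or data-transfer costs
```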

My Current Thinking

Start local with the RTX 3060 for basic LoRA experiments, then use cloud resources for anything that needs more VRAM or compute power. This hybrid approach lets me learn without breaking the bank while still having access to better hardware when needed.

Actually, let me be honest about something: over the past 3 months, my AI costs have been steadily climbing and are now closing in on $500 per month. At this rate, my AI expenses are going to approach the price of a mortgage. So maybe it’s time to get serious about using my RTX 3060 for local training before my cloud computing bills get completely out of hand.

And then I have some excuses to buy $5000 in hardware šŸ˜‚

What I’m Planning to Document

Here’s what I’ll be tracking:

  • My first attempts at fine-tuning (probably starting with LoRA)
  • The hardware and software setup process
  • What actually works vs. what sounds good in research papers
  • The inevitable failures and what I learn from them
  • Cost analysis and resource management strategies

This isn’t going to be a polished tutorial—it’s going to be a real-time learning log. Expect mistakes, false starts, and hopefully some genuine breakthroughs along the way.

Coming soon: First training experiments and setup documentation…