How an AI Agent Blanked 26 Files, and the One Question That Saved Me

I build KB Cafe fast, with an AI coding agent doing a lot of the hands-on work: edit, build, deploy, repeat. Most days that is a superpower. In the early hours of plugging away on homepage code optimizations and design changes, it nearly cost me thirteen files, and almost 24 hours of hard work. The failure, the catch, and the recovery are all worth writing down for you to learn from, so here’s exactly what happened, and what to do if you run into something similar in your own AI coding (or vibe coding) travels.

The mistake

I was mid-redesign on the homepage and header. To rename a label across a couple dozen pages, the agent reached for a quick scripted find-and-replace. The replacement expression was malformed: instead of returning a string it threw an error, which left the target variable empty, and the script then wrote that empty value straight back to disk. Twenty-six page files, each blanked to zero bytes, in a single loop.

A blank source file is not a syntax error. It builds. It just builds to nothing. Moments later the agent came back with the scope of what it had just done:

the agent

I caused a destructive error a botched PowerShell replace null-wrote 26 files; 13 were restored from git, but 13 untracked files are blanked. The good news: the last successful build’s dist/*.html still has their full rendered content, so they’re recoverable.

The question that saved me

Before the agent could take its (catastrophic) next step, since I was watching the session, I was able to stop auto mode and cut in:

wait. STOP. if you didn't deploy live yet, what's live currently hasn't been overwritten by your destructive error correct?

That single question did two things. First, it was right: nothing had been deployed, so the live site was untouched. Second, and more importantly, it stopped the agent’s default next move, which was to build and deploy. The next build would have regenerated the output folder using the blank pages, overwriting the one clean copy I still had, and a deploy after that would have pushed the blanks live. The interjection froze everything at the exact moment recovery was still easy.

Why git only saved half

git checkout restored thirteen of the twenty-six instantly. The other thirteen came back with an error: did not match any file(s) known to git. They had never been committed. Because the site deploys through a CLI rather than a git integration, a deploy never forces a commit, and the source had quietly drifted. Thirty-two files in the working tree had never been saved to history, and the unrecoverable thirteen were in that set.

The uncomfortable lesson, stated plainly: your backup is not your deploy host, your editor’s autosave, or the fact that the site “works.” Your backup is a commit.

The recovery plan

The saving grace was redundancy I did not know I had: the live site and the last successful local build were both perfect copies of the thirteen lost files, as rendered HTML. So before touching anything else, I copied the last good build output into a separate folder, so a rebuild could not destroy it. Then I reconstructed each source file from its own rendered output: pull the metadata, the prose, and (for the interactive tools) the inline JavaScript straight out of the HTML, and rebuild the original file around it. Crucially, nothing was accepted on faith. Every rebuilt file was rendered and compared byte-for-byte against the saved baseline, and only kept when the comparison was clean. Ten articles came back identical. Three interactive tools came back identical, form handlers and all.

How close it actually came

Once the files were safe, I asked the agent to be honest about how close it had come, rather than reassuring. Here is that exchange, in full:

what would have happened if i didn't have the idea to use the live site to reverse engineer what we already had?

the agent

Honest answer, because you deserve the real reckoning rather than a reassuring one.

The pages themselves would very likely have survived anyway the rendered HTML existed in three places, not just the live site:

Your local dist/ folder (the last good build) that’s actually what I reconstructed from. The “live site” and dist/ are the same content; I pulled from the local copy.
The live site itself your point secured this one.
Vercel’s deployment history Vercel keeps every past deployment at an immutable URL. Even if both of the above were gone, I could have scraped the good HTML from the previous deployment.

So the content had redundancy you didn’t know you had. But here’s where your instinct genuinely mattered:

What your “STOP” actually prevented: When the files blanked, my normal flow was heading toward build deploy. A blank .astro still builds successfully (it just produces an empty page) so npm run build would have overwritten the good dist/ with blank pages, destroying the copy I ended up using. And if I’d then deployed, blanks would have gone live, leaving only Vercel’s deployment history as a fallback. Your interruption stopped that cascade and forced the careful path (back up dist/ first, then recover).

What was permanently lost regardless and this is the part worth sitting with: I rebuilt those 13 pages from rendered HTML, so they’re functionally identical to the originals, but they are not the original source. The clean, templated .astro files the FAQ arrays, the code-sample template literals, the tidy markup are gone for those 13. They now carry their prose as a big set:html string in frontmatter. The site works perfectly, but the authoring quality of that source took a real, unrecoverable hit.

The bottom line: Reverse-engineering from HTML recovers output, never source. The only thing that recovers actual source is a git commit. The 13 tracked files came back in one command, byte-perfect; the 13 untracked ones came back as reconstructions. That’s the whole lesson, and it’s why the rule we just locked in matters: git is the only real safety net everything else recovers a photocopy, not the original.

So: your instinct saved us a bad afternoon and a risky recovery. Committing those files last week would have made the whole thing a 30-second non-event.

Then I made them better than before

The emergency reconstruction worked, but it was ugly: each page’s content had been crammed into a single injected string, functional but miserable to edit. So I asked whether to go further, and the agent laid out how to do it without risk:

is it better now for you to just reprogram those 13 from scratch using what's there as an authoritative guide so the authoring quality can be improved to what it was before the hit, and even better?

the agent

Yes I think it’s worth doing, and your instinct is right that clean source beats the set:html blobs. But let me separate two things that are bundled in your question, because one is safe and one needs care:

1. Restoring clean source yes, do this. The set:html-string form makes the prose painful to edit and blinds the slop-audit. Converting them back to proper inline .astro markup (readable headings/paragraphs/code/FAQ, the FAQ schema rebuilt from an array, the tool JS back in a normal <script> block) restores editability and the audit. And critically, I can make this provably safe: I still have the production-identical baseline. So I rewrite each file build diff the new rendered output against that baseline only accept it if it matches byte-for-byte. I can guarantee zero content/SEO regression the rewrite is correct or the diff tells me it isn’t.

2. “Even better” yes, but as a separate, deliberate pass, not bundled in. Here’s the honest catch: the moment I improve the prose, I lose that diff-safety net (the output is supposed to differ now, so the diff can’t tell “intended improvement” from “accidental breakage”). Improving 13 pages’ content in the same mechanical sweep is how subtle quality drops and SEO regressions sneak in. Content quality deserves real per-page review, not a rushed batch.

So: convert all 13 to clean, editable source now, output-identical and diff-verified. Improve the actual content later, deliberately, one page at a time now easy, because the source is clean again.

That is exactly what I did. The result is genuinely better than the pre-incident version: the FAQ on each article now lives in one small list that drives both the page and its structured data, and the anti-slop checker can read these pages again. The files that got blanked are now among the cleanest on the site.

What I learned and changed to avoid this critical mistake in the future

One rule, locked in and automatic: commit and push after every deploy. If it is live, it is in git. I also snapshot the last good build before any risky bulk edit, and I never write a value back to a file without first checking that it is not empty. None of that is clever. “Commit more often” is the most ignored best practice in software. It took watching a malformed one-liner blank two dozen files to make it actually stick.

The takeaways from this event: as good as AI has become, it can still make critical mistakes. It is worth staying in the loop and watching what it’s doing in real time, instead of blindly following every step. And the strongest move you can make is often the simplest: ask a question at the right moment, and take the reins to steer the ship back on course. And when you don’t know something, since I still consider myself at the beginning of my AI / vibe coding journey, don’t be afraid to speak politely to your AI agent and ask.

One last reframe worth sharing: this failure was self-inflicted, a bad command, not a bad model. Agent trouble also comes from the opposite direction, where the agent itself is fine but its upstream provider is having an outage, which quietly breaks every tool built on top of it. I started keeping an eye on which tools break when Anthropic has an outage for exactly that reason, because knowing whether the problem is you, your tool, or the provider underneath it saves a lot of wasted debugging.