Building favicon.dog with Shelley on exe.dev
I’ve needed a reliable favicon service more times than I’d like to admit. Every time I build something that displays a list of domains (bookmarks, link aggregators, dashboards) I end up writing the same janky favicon-fetching code. I finally decided to make a proper, reusable solution. And I decided to build it entirely through conversation with Shelley, the AI coding agent built into exe.dev.
The result is favicon.dog. A simple API that fetches, caches, and serves favicons. Good boy fetches your favicons. Caches them too.
First Contact
exe.dev gives you a persistent VM accessible via SSH with an AI agent (Shelley) that can write code, run commands, manage files, take browser screenshots, and deploy, all from the same shared workspace. You talk to the agent in natural language, it does the work on the VM, and you can see the results immediately.
My first prompt was deliberately broad:
“Create a lightweight tool in Go, using SQLite for data and S3 for image storage. Users should be able to query the URL with a query parameter that has the target URL which provides the favicon for that website.”
Within 12 minutes, the first commit landed. A working Go service with favicon discovery (HTML parsing for <link> tags, /favicon.ico fallback), SQLite storage, and S3 image storage.
From there it was rapid-fire iteration, each prompted by a single question:
- "What happens if a site doesn't exist?" → Negative caching
- "Can you add a placeholder?" → Placeholder SVG for missing favicons
- "Would it be reasonable to return available favicons in the header?" → RFC 8288 Link headers
- "Could you set a capture time and keep the history?" → Stale-while-revalidate with icon history
6 commits in 40 minutes. Each one building on the previous.
Zero to Production in an Afternoon
By the end of Day 1 the service had:
- Favicon fetching with multiple discovery strategies
- SQLite caching with stale-while-revalidate refresh
- SSRF protection (blocking private IP ranges, loopback, link-local)
- Two-tier rate limiting (aggressive on fetch, generous on serving cached icons)
- A batch endpoint that accepts up to 100,000 domains
- Cloudflare R2 CDN for serving icons via redirect
- A branded UI with the warm, earthy favicon.dog design
- GitHub Actions deploy pipeline to a DigitalOcean droplet
- 10,000 domains pre-loaded
25 commits. One afternoon.
The Process
The thing that surprised me most wasn’t the speed. It was how natural the workflow felt.
Conversations as Architecture
The best design decisions came from open-ended questions. Not “implement X” but “what do you think about X?” Shelley would assess the situation, propose options, and we’d land on something together.
- "What are the current protections?" → Audited the codebase, found missing SSRF protection
- "I expect people will embed these URLs in their front end, so a private API key won't work" → Two-tier rate limit design
- "Would it be better if I used Cloudflare instead?" → CDN migration from DO Spaces to R2
That SSRF one is a good example. I didn’t know I had a gap. I just asked “what are the current protections?” and Shelley audited the whole codebase, identified the missing coverage, and implemented it. The question surfaced the problem.
The Persistent VM Changes Everything
exe.dev gives you a VM that sticks around. Shelley runs on it, builds on it, deploys from it. There’s no “works on my machine” because we’re literally on the same machine.
This matters more than it sounds. When I shared a screenshot of the mobile layout being cramped, Shelley could see the same dev server I was looking at, inspect the CSS, and push a fix. When we hit SQLite concurrency issues under batch load, Shelley could reproduce it, fix it, and retest without me describing the environment. The shared context is the whole point.
Short Tasks, Short Conversations
Not every conversation was a big architecture session. Most were small:
- Paw print color change? One conversation, 3 minutes.
- Rate limit bump from 10 to 100? One conversation, 1 minute.
- Remove last name from footer? Done before I finished my coffee.
This felt natural. Like asking a colleague sitting next to you to make a quick change. The conversation overhead is low enough that it’s not worth batching things up.
While I praised the shared context earlier, controlling it is still essential. Small conversations that reference specific code, features, or earlier conversations are a way of disclosing exactly the context the agent needs, and no more.
Screenshot-Driven Development
I shared screenshots constantly. Design mockups, mobile layouts, logo issues, production bugs. Shelley could see what I was seeing and respond precisely. “The border is a bit too tight to the image” and it was fixed by scaling proportionally. No back-and-forth describing the problem.
This is a strength of Shelley compared with other AI harnesses. If you explore the browser implementation, you can see it's a minimal CDP wrapper that exposes tools in a very precise way. Orders of magnitude smaller than your typical Playwright or Selenium MCPs.
Where I Still Needed to Pay Attention
Shelley is pretty great, but I still had to stay engaged. A few things came up that required human judgment:
- Code review catches what generation misses. The rate limiter config used `0` to mean both "use default" and "unlimited" in different contexts. The code was clean and well-structured; the semantic conflict was subtle. I caught it in review.
- Security code needs slow, careful attention. The SSRF protection had an IPv6 edge case that was too broad. Fast iteration is great for features; security benefits from a slower loop.
- Creative assets required more iteration. Logos, branding, visual design. These took more rounds of back-and-forth than pure code changes. The subjective stuff is harder to get right in one shot.
By the Numbers
| Metric | Value |
|---|---|
| Conversations | 40 |
| Commits | 43 |
| Active days | 5 |
| Day 1 commits | 25 |
| Lines of Go | ~5,400 |
| Lines of React/TS | ~850 |
| DB migrations | 5 |
| Domains batch-loaded | 20,000+ |
Under the Hood with Shelley
| Metric | Value |
|---|---|
| API requests | 1,807 |
| Tokens processed | ~80M |
| Output tokens generated | ~490K |
| Primary model | Claude Opus 4.6 (96% of requests) |
| Day 1 alone | 968 requests, 254K output tokens |
Cache reads dominate the token count; most of the ~80M tokens are prompt caching rather than new generation.
Over Days 2 through 5 the project continued to evolve: a full React + Vite + Tailwind migration (replacing a 924-line inline HTML file), a dedicated docs page, an async analytics system, and a proper rate limiting package with per-referrer overrides.
The Takeaway
The combination of a persistent VM and Shelley brings about a certain gestalt quality that I haven't been able to fully pin down. My theory is that the combination of prompts, tool design, a controlled ecosystem (the exe.dev container), and a guided user experience creates a whole greater than the sum of its parts.
40 conversations over 5 days. Some were 30-second asks, some were hour-long architecture sessions. The mix felt right. I stayed in the driver’s seat on design decisions and code review while Shelley handled the bulk of implementation.
I’m pretty happy with how this turned out. Check it out at favicon.dog. The API is simple: GET https://favicon.dog/icon?url=github.com and you get a favicon back. Good boy.