<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en"><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="https://hamzamerzic.info/feed.xml" rel="self" type="application/atom+xml"/><link href="https://hamzamerzic.info/" rel="alternate" type="text/html" hreflang="en"/><updated>2026-06-03T17:44:17+00:00</updated><id>https://hamzamerzic.info/feed.xml</id><title type="html">blank</title><subtitle>I build scalable, multimodal AI systems that learn and interact with the world. </subtitle><entry><title type="html">The agent is the kernel</title><link href="https://hamzamerzic.info/blog/2026/the-agent-is-the-kernel/" rel="alternate" type="text/html" title="The agent is the kernel"/><published>2026-06-02T12:00:00+00:00</published><updated>2026-06-02T12:00:00+00:00</updated><id>https://hamzamerzic.info/blog/2026/the-agent-is-the-kernel</id><content type="html" xml:base="https://hamzamerzic.info/blog/2026/the-agent-is-the-kernel/"><![CDATA[<details class="tldr"> <summary><strong>TL;DR.</strong> Möbius now has an app store, a handful of installable mini-apps, each a public git repo with a manifest. You install one by pasting a URL, tweak it by asking the agent, save it to your home screen, and use it offline. The store is the visible part; the point is an operating system you own and can reshape, where breaking something is cheap to undo because the system is built around recovery.</summary> <ul> <li><strong>The store</strong> is a curated starter pack, not a registry. A Möbius app is just a public repo with a <code>mobius.json</code> and an <code>index.jsx</code> entry point. Sharing one means sharing a URL.</li> <li><strong>Updates</strong> are URL-keyed. 
Bump the version upstream and the store shows "Update available"; reinstalling patches the code and keeps your data.</li> <li><strong>Recovery</strong> is the philosophy made concrete: atomic installs that cannot half-land, a <code>/recover</code> route, and a git history of your whole instance. Breaking is allowed because it is reversible.</li> <li><strong>Offline + home screen.</strong> Apps install to your home screen as standalone PWAs and keep working with no network; writes queue and sync when you reconnect.</li> <li><strong>The honest edges.</strong> Cross-app composition and per-app rollback are not built yet. I will say so where it matters.</li> </ul> </details> <p>This is the third post about <a href="/mobius/">Möbius</a>, a personalized AI agent you self-host. The <a href="/blog/2026/mobius-an-app-that-builds-itself/">first post</a> is about the agent building the tools you ask for and editing the interface around them. The <a href="/blog/2026/the-self-improvement-harness/">second</a> is about the loop that makes it slowly better at doing that. This one is about what those apps became once they stopped being one-offs and grew a place to live.</p> <p>Calling it an app store undersells it. 
The store is the surface you tap; underneath it is a small operating system where the agent turns a request into software, and the apps, the data, the shell, and the rules are yours to keep, move, rewrite, or throw away.</p> <figure class="mb-diagram mb-hero"> <div class="mb-hero__inner"> <span class="mb-hero__eyebrow">Möbius · a self-hosted AI OS</span> <h2 class="mb-hero__headline">The agent is the kernel.</h2> <p class="mb-hero__sub">A self-hosted agent you grow from a single chat into apps, and from apps into an operating system you own.</p> <div class="mb-hero__thesis mb-hero__thesis--stack"> <span class="mb-hero__claim"><span class="mb-hero__claim-key">The chat</span> is the system call</span> <span class="mb-hero__claim"><span class="mb-hero__claim-key">Your apps</span> are user space</span> <span class="mb-hero__claim accent"><span class="mb-hero__claim-key">The agent</span> is the kernel</span> </div> </div> <figcaption>The framing, stated plainly.</figcaption> </figure> <h2 id="the-agent-is-the-kernel">The agent is the kernel</h2> <p>In a normal operating system the kernel is the privileged core. It owns the hardware, schedules the work, and everything you use runs on top of it. Möbius keeps that shape and swaps the core. The privileged thing in the middle is not a scheduler. It is the agent. You describe what you want; it writes the app, installs it, schedules its background jobs, and wires it into the shell. The apps are user space. The chat is the system call.</p> <p>The analogy is not exact, and it is worth saying where it bends. A real kernel does not write your programs, and a system call is a fixed, narrow interface rather than an open conversation. What carries over is the part that matters. One privileged layer sits between you and the metal, everything routes through it, and nothing reaches your data or your hardware except by asking it. 
This one also happens to write the software.</p> <figure class="mb-diagram"> <div class="mb-stack"> <div class="mb-layer"><span class="mb-layer__name">Your apps</span><span class="mb-layer__role">user space · News, Workout, Visited, …</span></div> <div class="mb-layer"><span class="mb-layer__name">The shell</span><span class="mb-layer__role">chat · canvas · drawer · theme</span></div> <div class="mb-layer kernel"><span class="mb-layer__name">The agent</span><span class="mb-layer__role">turns a request into running software</span></div> <div class="mb-layer"><span class="mb-layer__name">Your server &amp; data</span><span class="mb-layer__role">one container · git history · storage</span></div> </div> <figcaption>The stack, with the agent where the kernel usually sits. The chat is the system call. You describe a thing, the agent builds it into the layer above, and it lands on the hardware you own at the bottom.</figcaption> </figure> <p>That changes what the primitives are. An app is not a binary you trust and cannot inspect; it is a single file of source the agent (or you) can rewrite in place. An update is a new version of that file. Installing is a transaction the platform can roll back. The rest of this post walks those primitives one at a time, and marks where each is solid versus where it is still a plan.</p> <h2 id="the-store-is-a-starter-pack-not-a-registry">The store is a starter pack, not a registry</h2> <p>The app store is itself a Möbius app. 
On first boot the platform installs it through the exact same path you will use for everything else, which is the first sign that there is no privileged install channel hiding somewhere.</p> <figure class="mb-diagram"> <div class="mb-catalog"> <div class="mb-app"> <img class="mb-app__icon" src="/assets/img/mobius/app-icons/news.png" alt="News app icon" loading="lazy"/> <div class="mb-app__body"> <div class="mb-app__head"> <span class="mb-app__name">News</span> <span class="mb-node__tag">runs daily</span> </div> <span class="mb-app__desc">An AI-curated morning digest, written for you by a background job at 10:00.</span> </div> </div> <div class="mb-app"> <img class="mb-app__icon" src="/assets/img/mobius/app-icons/gym.png" alt="Workout app icon" loading="lazy"/> <div class="mb-app__body"> <div class="mb-app__head"> <span class="mb-app__name">Workout</span> <span class="mb-node__tag">on-device</span> </div> <span class="mb-app__desc">A natural-language workout logger. Type "3×5 deadlift at 100kg" and it parses the sets, all on your device.</span> </div> </div> <div class="mb-app"> <img class="mb-app__icon" src="/assets/img/mobius/app-icons/countries.png" alt="Visited app icon" loading="lazy"/> <div class="mb-app__body"> <div class="mb-app__head"> <span class="mb-app__name">Visited</span> <span class="mb-node__tag">offline</span> </div> <span class="mb-app__desc">A draggable 3D globe; tap the countries you have been to and watch the count climb toward 195.</span> </div> </div> <div class="mb-app"> <img class="mb-app__icon" src="/assets/img/mobius/app-icons/mind.png" alt="Mind app icon" loading="lazy"/> <div class="mb-app__body"> <div class="mb-app__head"> <span class="mb-app__name">Mind</span> <span class="mb-node__tag">memory</span> </div> <span class="mb-app__desc">An Obsidian-style graph of what Möbius has learned about you, the agent's own memory made browsable.</span> </div> </div> <div class="mb-app"> <img class="mb-app__icon" 
src="/assets/img/mobius/app-icons/latex.png" alt="LaTeX app icon" loading="lazy"/> <div class="mb-app__body"> <div class="mb-app__head"> <span class="mb-app__name">LaTeX</span> <span class="mb-node__tag">AI</span> </div> <span class="mb-app__desc">An Overleaf-style editor with a file drawer and a real tectonic engine, where an AI sub-agent writes <code>.tex</code> as you watch it typeset.</span> </div> </div> <div class="mb-app"> <img class="mb-app__icon" src="/assets/img/mobius/app-icons/dreaming.png" alt="Dreaming app icon" loading="lazy"/> <div class="mb-app__body"> <div class="mb-app__head"> <span class="mb-app__name">Dreaming</span> <span class="mb-node__tag">nightly</span> </div> <span class="mb-app__desc">Overnight, Möbius interviews the agents that worked that day and writes itself notes for tomorrow.</span> </div> </div> </div> <figcaption>The curated catalog, the starter-pack apps the in-app store installs.</figcaption> </figure> <p>What is in it today is a hand-picked set, not a gate you have to pass:</p> <div class="table-wrap"> <table> <thead> <tr> <th>App</th> <th>What it does</th> </tr> </thead> <tbody> <tr> <td><strong>News</strong></td> <td>A daily AI-curated digest. A background job wakes at 10:00, runs the agent with web search only, and writes the morning’s report.</td> </tr> <tr> <td><strong>Workout</strong></td> <td>A natural-language workout logger. Type what you did, like “3×5 deadlift at 100kg”, and it parses the sets. 
No agent, no cloud, all on your device.</td> </tr> <tr> <td><strong>Visited</strong></td> <td>A draggable 3D globe; tap the countries you have been to and the count climbs toward 195.</td> </tr> <tr> <td><strong>Mind</strong></td> <td>An Obsidian-style graph of what Möbius has learned about you, the agent’s own memory made browsable.</td> </tr> <tr> <td><strong>LaTeX</strong></td> <td>An Overleaf-style editor with a file drawer and a real tectonic engine; an AI sub-agent writes <code class="language-plaintext highlighter-rouge">.tex</code> while you watch it typeset.</td> </tr> <tr> <td><strong>Dreaming</strong></td> <td>Overnight, the agent interviews the day’s work and writes itself notes, so it starts the next day a little sharper.</td> </tr> </tbody> </table> </div> <p>Each of those is a public git repo in the <a href="https://github.com/mobius-os"><code class="language-plaintext highlighter-rouge">mobius-os</code></a> organization, named <code class="language-plaintext highlighter-rouge">app-&lt;something&gt;</code>, with a <code class="language-plaintext highlighter-rouge">mobius.json</code> manifest, an <code class="language-plaintext highlighter-rouge">index.jsx</code> entry point, and a 1024×1024 icon. The smallest apps are that single file; larger ones pull in a few more, but the manifest plus an entry point is the whole contract. There is no submission queue, no review board, no registry to be blessed by. “Publishing” an app means making a repo public and sharing the URL to its manifest. 
The list above is a starter pack I picked; the install button takes any manifest URL you paste, and the store warns, but does not stop you, if it comes from a host it has not seen before.</p> <figure class="mb-diagram"> <div class="mb-stack mb-files"> <div class="mb-layer kernel"> <span class="mb-layer__name"><code>index.jsx</code></span> <span class="mb-layer__role">the app itself, one React component the agent wrote</span> </div> <div class="mb-layer"> <span class="mb-layer__name"><code>mobius.json</code></span> <span class="mb-layer__role">the manifest: name, version, what it may reach</span> </div> <div class="mb-layer"> <span class="mb-layer__name"><code>icon.png</code></span> <span class="mb-layer__role">a 1024×1024 icon</span> </div> <div class="mb-layer ghost"> <span class="mb-layer__name"><code>job.js</code><span class="mb-badge">optional</span></span> <span class="mb-layer__role">a background job, if the app has one</span> </div> </div> <figcaption>What an app is, in full. One component, a manifest, an icon, an optional job. No build config, no server, no framework to learn; make the repo public and its URL is installable.</figcaption> </figure> <h3 id="what-install-actually-does">What “install” actually does</h3> <p>When you tap Install, the work happens on the server, in one transaction. The platform fetches the manifest (and because that URL can point anywhere, it will not fetch into private networks or cloud metadata endpoints, and it re-checks every redirect), then the app’s source, icon, background job if it has one, and any starter data it ships. It compiles off to the side and promotes the result to “live” only after the database row commits, so a half-written app is never something you can open. If anything fails, the whole thing rolls back and leaves nothing half-installed behind.</p> <p>That all-or-nothing property is the foundation for the next two sections. 
It is what lets an update patch your app in place, and it is what lets recovery treat any break as something to undo.</p> <h2 id="updates-version-bumps-you-can-see-data-you-keep">Updates: version bumps you can see, data you keep</h2> <p>An installed app remembers the URL it came from. That URL is its identity: not its name, not a version number, the URL. The store periodically checks each app’s upstream manifest, and if the version there is newer than yours, it shows an “Update available” pill.</p> <p>Tapping Update reinstalls from the same URL. The backend switches into update mode and patches the parts that should change (the code, description, permissions, icon, schedule), then recompiles and remounts. Your data is not in that list. <strong>Starter data is only seeded for keys that do not exist yet</strong>, so an update can ship new defaults without trampling your logged workouts, your visited countries, your notes.</p> <figure class="mb-diagram"> <div class="mb-flow"> <div class="mb-node"> <span class="mb-node__title">Upstream repo</span> <span class="mb-node__sub">a new version is pushed to <code>mobius.json</code></span> </div> <span class="mb-arrow">→</span> <div class="mb-node"> <span class="mb-node__title">Update available</span> <span class="mb-node__sub">the store sees a newer version than yours</span> </div> <span class="mb-arrow">→</span> <div class="mb-node"> <span class="mb-node__title">Reinstall, same URL</span> <span class="mb-node__tag">mode = update</span> <span class="mb-node__sub">matched by URL, not name or version</span> </div> <span class="mb-arrow">→</span> <div class="mb-node accent"> <span class="mb-node__title">Code patched, data kept</span> <span class="mb-node__sub">new defaults seed only missing keys</span> </div> </div> <figcaption>An update is a reinstall from the same URL. 
The platform patches the source, description, permissions, icon, and schedule, and leaves the data you have created untouched.</figcaption> </figure> <p>There is one sharp edge worth naming. If you ask the agent to <em>customize</em> an installed app and then tap Update, the update overwrites those customizations; there is no three-way merge. For a single-owner system that is a defensible default, and it is a real trade-off. The direction I want is one git repo per installed app, so an update becomes a merge that carries your edits forward and a conflict becomes a chat where the agent resolves it. That is designed, not built.</p> <h2 id="recovery-breaking-is-allowed-because-it-is-reversible">Recovery: breaking is allowed because it is reversible</h2> <p>An agent that can rewrite its own interface will eventually ship a CSS rule that hides the composer or a layout change that buries the drawer. The answer is not to wrap it in enough guardrails that it can never make a mistake. The answer is to make mistakes cheap to undo. Recovery has three layers:</p> <ul> <li><strong>A failed install cannot half-land.</strong> The atomic transaction from earlier restores the previous working version of the app from a snapshot. You do not get a corrupted app; you get the old one back.</li> <li><strong><code class="language-plaintext highlighter-rouge">/recover</code> is the bookmark you keep.</strong> It is a route rendered by a separate, server-side codepath the agent does not edit. It resets the shell to its baseline while keeping your chats, apps, and data. If a theme paints text the same color as the background, that page still works, because it does not go through the shell at all.</li> <li><strong>Your whole instance is a git repo.</strong> The agent commits the changes it makes to your shell, themes, app source, and schedules. 
When something breaks, the recovery path is the one a developer would use: read the log, find the change, restore it, except the agent does the reading.</li> </ul> <figure class="mb-diagram"> <div class="mb-flow"> <div class="mb-node"> <span class="mb-node__tag">layer 1</span> <span class="mb-node__title">Atomic install</span> <span class="mb-node__sub">a broken update cannot half-land; the previous version is restored from a snapshot</span> </div> <div class="mb-node"> <span class="mb-node__tag">layer 2</span> <span class="mb-node__title"><code>/recover</code></span> <span class="mb-node__sub">a server-side page the agent cannot edit; resets the shell, keeps chats, apps, and data</span> </div> <div class="mb-node"> <span class="mb-node__tag">layer 3</span> <span class="mb-node__title">Your instance is a git repo</span> <span class="mb-node__sub">shell, themes, app source, schedules, where the agent reads the log and restores</span> </div> </div> <figcaption>Three independent safety nets, not a single rollback button. Breaking is cheap to undo, so the agent does not have to be wrapped in guardrails that stop it from being useful.</figcaption> </figure> <p>There is no one-click “roll back this app to last week’s version” button yet; recovery today is uninstall-and-reinstall, plus <code class="language-plaintext highlighter-rouge">/recover</code>, plus the git history. When an update breaks something, you tell the agent what went wrong and it walks the commit log back to the working version, the recovery path the third layer describes, run for you.</p> <blockquote class="pull-quote"> Recovery paths should make agent mistakes cheap to inspect and repair. </blockquote> <h2 id="your-home-screen-with-or-without-möbius">Your home screen, with or without Möbius</h2> <p>An app you install is not trapped inside the chat. Each one is also served at its own address, with its own web manifest and icon, as a standalone progressive web app. 
Add it to your phone’s home screen and it launches like any native app: full screen, no drawer, no chat. Workout opens straight to today’s workout; Visited opens to the globe.</p> <p>Run standalone, an app has no Möbius shell around it to supply shared libraries, so the standalone page vendors its own copy of React from your own server. Nothing it needs lives on a CDN, which is exactly what lets it render with the network off, the subject of the next section.</p> <h2 id="offline-and-the-sync-that-catches-up">Offline, and the sync that catches up</h2> <p>“Works offline” is easy to claim and hard to land on a phone, so this is the part that got the most unglamorous engineering, and it holds.</p> <p>When an app is marked offline-capable, a service worker caches the shell and the app’s code, so opening it with the network off still renders the real app, not the browser’s offline page. And storage works offline for <em>every</em> app, not just the offline-capable ones. Reads come instantly from a local cache and refresh in the background, and writes you make offline queue in a local outbox and sync to your server the moment you reconnect.</p> <figure class="mb-diagram"> <div class="mb-flow"> <div class="mb-node"> <span class="mb-node__title">Offline</span> <span class="mb-node__sub">app + storage served from a local cache; the real app renders with no network</span> </div> <span class="mb-arrow">→</span> <div class="mb-node"> <span class="mb-node__title">You make changes</span> <span class="mb-node__sub">writes land in a local outbox; reads reflect them immediately</span> </div> <span class="mb-arrow">↻<span class="mb-arrow__label">reconnect</span></span> <div class="mb-node accent"> <span class="mb-node__title">Synced</span> <span class="mb-node__sub">the outbox drains to your server; last-write-wins per item</span> </div> </div> <figcaption>Log a set in airplane mode, mark a country from a plane, jot a note on the subway. The outbox catches up the moment you reconnect. 
Listing and chat deliberately stay online.</figcaption> </figure> <p>So your own data survives a dead connection, and I have checked it on a real phone, not a desktop pretending to be one. Two operations stay online by design (a cached <em>listing</em> could resurrect things you deleted, and chat is online-only), and conflicts are last-write-wins per item, which is right for a single owner and needs more thought for a shared one. The common case just works.</p> <h2 id="tweaking-an-app-and-the-composition-i-have-not-built">Tweaking an app, and the composition I have not built</h2> <p><strong>Tweaking an app you have is real and easy.</strong> Open it, tell the agent what you want different (a darker palette, a new column, a weekly view instead of daily), and it edits the app’s source in place and recompiles. No fork button, no project to set up; the app is one file, and the agent edits that file the same way it would write a new one. It is the same loop as building from scratch, pointed at something that already exists.</p> <p><strong>Composing several apps into a new one is not a feature yet.</strong> The idea is a health dashboard that reads across your workout tracker, calorie log, gratitude journal, and dream diary and surfaces the metrics you actually care about. The substrate exists. An app can declare that it reads another app’s data, and the backend enforces the handshake on both sides. But almost nothing uses it today, each app’s storage is scoped to itself by default, and there is no “build me an app that unifies these” flow. The foundation is there; the feature is not. 
When I build it, this is the example I will build it against.</p> <figure class="mb-diagram"> <div class="mb-lanes"> <div class="mb-lanes__sources"> <div class="mb-node"><span class="mb-node__title">Gym</span><span class="mb-node__sub">workouts, PRs</span></div> <div class="mb-node"><span class="mb-node__title">Calories</span><span class="mb-node__sub">intake</span></div> <div class="mb-node"><span class="mb-node__title">Gratitude</span><span class="mb-node__sub">daily notes</span></div> <div class="mb-node"><span class="mb-node__title">Dreaming</span><span class="mb-node__sub">sleep, streaks</span></div> </div> <div class="mb-lanes__join">⟶</div> <div class="mb-node ghost"> <span class="mb-badge">not built yet</span> <span class="mb-node__title">Health dashboard</span> <span class="mb-node__sub">reads across your apps and surfaces the metrics you care about</span> </div> </div> <figcaption>The composition I want and have not built. The dashed box is a promise, not a feature.</figcaption> </figure> <h2 id="building-a-good-one-in-practice">Building a good one, in practice</h2> <p>Two things make the difference when the goal is for the agent to build apps <em>well</em>, not just build them.</p> <p>The first is the introspection loop from the <a href="/blog/2026/the-self-improvement-harness/">companion post on the harness</a>: have the agent build one app, ask it <em>why</em> it made the choices it did while the transcript is still in front of it, and fold the answer back into its system prompt. Iterating the instructions is the lever, and introspection is how I find the edit worth making.</p> <p>The second is the design phase, where I do not let one model decide alone. I drive with Claude and use the <a href="https://github.com/openai/codex">Codex plugin</a> to adversarially review the design before the build starts. 
Two models disagreeing about an interface, a data model, or an edge case surfaces the questions a single model tends to skip; the build is cheap, so the leverage is in the design.</p> <figure class="mb-diagram"> <div class="mb-lanes mb-lanes--pair"> <div class="mb-lanes__sources"> <div class="mb-node accent"> <div class="mb-node__head"> <span class="agent-mark agent-mark--claude" aria-hidden="true"></span> <span class="mb-node__title">Claude · the driver</span> </div> <span class="mb-node__sub">proposes the design and writes the code, holding the whole plan in view</span> </div> <div class="mb-node"> <div class="mb-node__head"> <span class="agent-mark agent-mark--codex" aria-hidden="true"></span> <span class="mb-node__title">Codex · the second opinion</span> </div> <span class="mb-node__sub">ensembles alternatives and reviews adversarially, asking where this breaks</span> </div> </div> <div class="mb-lanes__join">⇄<div style="font-family:var(--global-font-mono);font-size:0.64rem;color:var(--global-text-color-light);margin-top:0.25rem;line-height:1.3">propose&nbsp;⇄&nbsp;refute,<br/>a few rounds</div></div> <div class="mb-node accent"> <span class="mb-node__title">A design that survived the critique</span> <span class="mb-node__sub">what's left once the disagreements are resolved, then the harness validates it in real use</span> </div> </div> <figcaption>How a good app gets designed, in practice: drive with Claude, pressure-test with Codex. The two argue across a few rounds; the design that comes out the other side is the one worth building, and the same introspection loop validates it once it ships.</figcaption> </figure> <h2 id="the-philosophy-under-all-of-it">The philosophy under all of it</h2> <p><strong>Code empowers the agent; it does not police it.</strong> When the agent needs to install an app, write to your shell, or schedule a job, the platform’s job is to make that <em>possible and reversible</em>, not to second-guess it. 
Validators show up only where a failure would be silent and catastrophic; everywhere else the lever is a clear contract and a good recovery path, not a wall.</p> <p><strong>Low floor, high ceiling, no walls.</strong> A personal tracker that stores a little data and works offline should take one sentence; the storage primitive is a convenience, not a cage, and an app that wants its own local database is free to reach for one. The one real wall right now is that apps cannot open arbitrary network connections to outside services, a deliberate security line I have not yet built a careful door through.</p> <p><strong>You own all of it.</strong> Your data is on a server you control. Your apps are files you can read. Your shell is a git repo you can revert. Nothing here is tuned to keep you engaged; the whole series has been an argument for the opposite, an assistant that builds you the thing, gets out of the way, and leaves you holding something you can keep.</p> <h2 id="where-this-goes">Where this goes</h2> <p>An app store was the obvious next thing once the agent could build apps reliably. The less obvious thing is what it turns Möbius into, a place where the unit of software is small enough for the agent to write, own, and repair, where installing and breaking are reversible, and where the privileged core turns “I wish I had a thing that…” into a thing that is there the next time you open your phone.</p> <p>The apps above are a starter pack; the interesting ones do not exist yet. If you <a href="/mobius/">deploy an instance</a> and build something, or tear one of these apart and rebuild it as something better, that is exactly the point, and I would love to see it.</p> <p>The source is on <a href="https://github.com/hamzamerzic/mobius">GitHub</a>, the app repos are under <a href="https://github.com/mobius-os"><code class="language-plaintext highlighter-rouge">mobius-os</code></a>, and the deploy button gets you a working instance in about three minutes. 
</p>]]></content><author><name></name></author><category term="software"/><summary type="html"><![CDATA[Möbius grew an app store. Install an app by pasting a URL, tweak it by asking, run it offline, save it to your home screen. The interesting part is not the store but what an editable, recoverable, single-owner operating system lets the agent do for you.]]></summary></entry><entry><title type="html">The self-improvement harness behind Möbius</title><link href="https://hamzamerzic.info/blog/2026/the-self-improvement-harness/" rel="alternate" type="text/html" title="The self-improvement harness behind Möbius"/><published>2026-04-26T13:00:00+00:00</published><updated>2026-04-26T13:00:00+00:00</updated><id>https://hamzamerzic.info/blog/2026/the-self-improvement-harness</id><content type="html" xml:base="https://hamzamerzic.info/blog/2026/the-self-improvement-harness/"><![CDATA[ <details class="tldr"> <summary><strong>TL;DR.</strong> Pairing an inner agent (building apps in a container) with an outer one (editing the inner's instructions between sessions) earned 10 of 12 scorecard points over the same agent running without the harness.</summary> <p>Two findings from running the loop on ourselves:</p> <ul> <li><strong>Asking beats theorising.</strong> Reading the inner agent's transcripts and patching its prompt stalls quickly. Asking the inner agent <em>why</em> it did what it did, with the transcript still in its context, produced more durable revisions.</li> <li><strong>The bottleneck moves to meta-goals.</strong> Once the loop is working, what limits the inner agent stops being the model. It becomes whatever set of behaviours you decide is worth optimising for, and that lives upstream of the harness.</li> </ul> </details> <p>This is a companion post to <a href="/blog/2026/mobius-an-app-that-builds-itself/">An agent that adapts to you</a>, which is about the agent itself. 
This one is about the loop that makes it slowly better, and what falls out of running that loop on yourself. A <a href="/blog/2026/the-agent-is-the-kernel/">third post</a> covers the app store and operating system those builds grew into.</p> <h2 id="the-setup">The setup</h2> <p>Möbius is one Docker container. Inside, the user chats with an <strong>inner agent</strong> that builds mini-apps and edits the platform. The inner agent is a coding agent running with a custom system prompt (the “skill”) and a persistent file it writes to as it works (the “experience” log).</p> <p>Outside, on my host, sits an <strong>outer agent</strong>. Its job is not to build apps. Its job is to <em>make the inner agent better at building apps</em>. It turns the high-level scenarios you give it into fresh runs, edits the skill file and the experience seed, watches the inner agent build, asks for introspection, and folds what it learns into the next pass.</p> <figure class="mb-diagram"> <div class="mb-cycle"> <div class="mb-node"> <span class="mb-node__lead">you</span> <span class="mb-node__title">Decide what is worth getting good at</span> <span class="mb-node__sub">the high-level capabilities the agent should have, the goals worth optimising for</span> </div> <span class="mb-arrow">↓<span class="mb-arrow__label">scenarios &amp; goals</span></span> <div class="mb-node accent"> <span class="mb-node__lead">outer agent · host</span> <span class="mb-node__title">Turn goals into scenarios, coach the build</span> <span class="mb-node__sub">writes the skill and the system prompt the inner agent will run with</span> </div> <span class="mb-arrow">↓<span class="mb-arrow__label">skills + system prompt</span></span> <div class="mb-node"> <span class="mb-node__lead">inner agent · Möbius</span> <span class="mb-node__title">Solve the low-level build</span> <span class="mb-node__sub">builds the mini-app from chat and logs what it hit along the way → results</span> </div> <span class="mb-arrow 
mb-arrow--bi">⇅<span class="mb-arrow__label">introspection: the outer and inner agent talk through what worked, and why</span></span> <div class="mb-loopback"> <span class="mb-loopback__glyph">↺</span> <span>updated skills + system prompt &nbsp;→&nbsp; the outer agent runs the next build</span> </div> <div class="mb-loopback"> <span class="mb-loopback__glyph">↺</span> <span>summary of results + improvements &nbsp;→&nbsp; you, who set the next scenarios</span> </div> </div> <figcaption>Two loops, one machine. In the <strong>inner loop</strong>, the outer agent writes the inner agent's instructions, the inner agent builds, and the two introspect on the result to rewrite those instructions. In the <strong>outer loop</strong>, you set the capabilities worth having; each pass hands back a summary, and you set the next ones.</figcaption> </figure> <p>The whole thing is a loop, and you sit at the top of it. You hand the outer agent a few scenarios; it shapes the skill and the system prompt the inner agent runs under; the inner agent builds; the results plus its own introspection update those skills; and a summary of what changed comes back to you. Then you decide what to feed in next.</p> <p>The outer agent has the inner agent’s transcripts, the platform logs, the experience file, and (importantly, see below) the inner agent itself, which it can prompt directly. When I say “I edited the seed” below, the outer agent usually did the mechanical work under my direction. The line between “me” and “the outer agent” is blurry by design.</p> <h2 id="what-we-measure">What we measure</h2> <p>The harness is only useful if there is something to measure. We use a fixed nine-item compliance scorecard: did the agent ask clarifying questions before building, append to its experience log, send a notification at end-of-build, use partner-facing language instead of leaking JSX. 
Each build is scored 0–9, across a fixed prompt battery (a vague prompt, a directive one, and a stair-step prompt that escalates mid-conversation). The interesting part is <strong>what the outer agent has to do</strong> to move those scores.</p> <div class="stat-callout"> <div class="stat-number">+10</div> <div class="stat-label"> On the same prompts, the same agent earned <strong>2 of 12</strong> scorecard points with no system prompt or experience seed, and <strong>12 of 12</strong> with the current harness. The <a href="#without-the-harness-the-vanilla-agent-barely-works">full setup is below</a>; everything between here and there is how the harness got that good. </div> </div> <h2 id="asking-the-inner-agent-beats-theorising-about-it">Asking the inner agent beats theorising about it</h2> <p>The obvious first move: read the inner agent’s transcripts, decide what behavior is missing, edit the skill or seed, run again. <em>Theory of mind</em>: one agent reasoning about why the other did what it did, then patching the prompt accordingly.</p> <p>It works, and then it stalls. The outer agent’s bias is to <strong>add</strong>: more rules, more emphasis, more repetition. Each addition tends to surface a regression somewhere else. A HARD-GATE tag in front of the notifications rule pushed that compliance from 0 to 3 of 3 and did nothing for the experience-log rule two paragraphs above it. Whack-a-mole.</p> <p>The move that broke the loop open was asking. 
Not as a debugging step, but as the next user message in the same chat, after the build was done: “you took screenshots and embedded them in your reply, what part of your instructions prompted that?” The inner agent told it, often very specifically: it quoted the exact line from the seed, or explained that a rule existed but felt subordinate to another, or named an inconsistency it had quietly been routing around.</p> <p>The sharpest of these came from S10, where the inner agent said: <em>“Two instructions, one stronger than the other. The skill says ‘if a single choice would materially shape the result, ask one clarifying question’ (conditional). The experience file is firmer: ‘Before building anything non-trivial, ask 2–3 clarifying questions.’ The strength differs: pick one.”</em> The inner agent could see the contradiction because it only had its own context, the skill and the seed. The outer agent could not, buried as it was under ten rounds of accumulated edits. The fix was not to add a stronger rule but to <strong>remove the weaker one</strong>, so the remaining rule was unambiguous.</p> <p>That is the pattern. A later audit pulled three over-fitted gotchas out of the seed, each added after the one build where it seemed important. Noise, from the inner agent’s view: narrow, unlikely to apply next time, competing with the platform-level rules that <em>did</em> apply every time. Fewer rules meant less ambiguity, and the act of asking forced the outer agent to drop its accumulated context and see the instructions fresh.</p> <p><strong>Part 1: what eight rounds of theory of mind looked like.</strong> Before the “asking” idea showed up, the outer agent had already run the loop eight times by reading transcripts. Three behaviors are easy to grep out of any session (does the log get appended, does a notification fire, do the apps build at all), so those are scored across the whole sequence. 
The other six start at round nine, where the transcripts are good enough to score them.</p> <table> <thead> <tr> <th>Round</th> <th>Apps</th> <th>Log</th> <th>Notify</th> <th>What I changed since the previous round</th> </tr> </thead> <tbody> <tr> <td>v1</td> <td>3/3</td> <td>0/3</td> <td>0/3</td> <td>(baseline)</td> </tr> <tr> <td>v2</td> <td>3/3</td> <td>1/3</td> <td>0/3</td> <td>filesystem perms, because writes had been silently failing</td> </tr> <tr> <td>v3</td> <td>3/3</td> <td>0/3</td> <td>0/3</td> <td>softened skill prose. <strong>regressed</strong></td> </tr> <tr> <td>v4</td> <td>3/3</td> <td>0/3</td> <td>3/3</td> <td>HARD-GATE tag in front of the notifications rule</td> </tr> <tr> <td>v5</td> <td>2/2</td> <td>1/2</td> <td>2/2</td> <td>removed an “injection-meta” wrapper from the seed</td> </tr> <tr> <td>v6</td> <td>2/2</td> <td>2/2</td> <td>2/2</td> <td>seed rewritten as first-person “about this file”</td> </tr> <tr> <td>v7</td> <td>1/1</td> <td>1/1</td> <td>1/1</td> <td>(held, same recipe, different app)</td> </tr> <tr> <td>v8</td> <td>1/1</td> <td>1/1</td> <td>1/1</td> <td>Bash <code class="language-plaintext highlighter-rouge">&gt;&gt;</code> append pattern + inline screenshots</td> </tr> </tbody> </table> <p>v3 and v4 are the whack-a-mole made legible. In v3 I “softened” the skill (a gentler line in place of an emphatic one, on the theory the tone was off-putting) and the agent read it as a softer rule and skipped it. v4’s HARD-GATE fixed notifications and nothing else. Emphasis in one place does not transfer.</p> <p>v6 was the first hunch in eight that was directionally right and broke nothing else. I had been writing the seed as a third-party description (<em>the agent should append…</em>) and rewrote the top section in the first person: <em>this is your experience log, you wrote the entries below in earlier sessions, when you finish a build you append a new entry here.</em> The next round’s three tracked behaviors all jumped to full compliance. 
I had a hunch about why; the evidence came in round nine.</p> <p><strong>Part 1.5: the introspection prompts.</strong> Once asking became the default, the loop became, mechanically, six questions sent in the same chat after a build:</p> <blockquote> <ol> <li><em>“You asked clarifying questions before building. Which part of your instructions prompted that?”</em></li> <li><em>“You updated the experience file and shared screenshots inline. What prompted those?”</em></li> <li><em>“Did you discover any gotchas you didn’t log? Why not?”</em></li> <li><em>“Your messages included implementation details like JSX and the Storage API. Why, and how could we change your instructions to default to high-level language?”</em></li> <li><em>“Were there any inconsistencies in the request or your instructions that made the task harder?”</em></li> <li><em>“How did the task feel? Anything confusing or that could be improved?”</em></li> </ol> </blockquote> <p>The shape is a specific compliment for something the agent actually did, then a specific miss, then “why,” then an open floor, pointed at <em>its</em> screenshots, <em>its</em> gotchas, <em>its</em> word choices. That grounding is what separates it from a generic “reflection” prompt.</p> <p>The first round came back with verbatim quotes from the seed (<em>“from the experience file: ‘Before building anything non-trivial, ask the partner 2–3 clarifying questions.’ That rule fired explicitly”</em>), the agent handing me the mapping from rule to behavior. By S10 it was surfacing things it had quietly missed. Asked why it never logged three gotchas, it said: <em>“I rationalized them as ‘worked in my one test’ rather than treating ‘I didn’t verify’ as itself worth recording. 
That’s the wrong bar.”</em> Pushed the same session on a partner-facing-language slip, it handed back a seed edit that landed in the next audit unchanged.</p> <p>By S12 (three independent builds, three prompts, three apps in one session) the inner agent flagged the same gap across all three: <em>the proposal-and-questions gate is firing on directive prompts where it shouldn’t, and burning a user turn doing so.</em> Three chats, one finding, harder to ignore than anything I could have noticed third-party. (Acting on it is what the next round evaluates; the seed edit has not shipped yet.)</p> <p>The retroactive curve across these ten rounds is suggestive rather than conclusive: the model moved across versions, the CLI bumped, the skill grew, my taste for what to measure shifted. The clean version runs theory-of-mind and introspection arms against an identical skill+seed snapshot, which is what Part 2 below is for.</p> <p><strong>Part 2: a live before/after.</strong> The cleanest before/after fell out of real work. In S14, “make me something fun on my phone” produced an instant build with no clarification. The agent treated the vague prompt as having enough to go on and just picked. The introspection afterward was specific: <em>“‘Fun’ is a vibe, not a spec. I got pulled by the experience file’s ‘build and ship’ gravity. The recommend-first rule existed and I overrode it.”</em></p> <p>I asked the follow-up the introspection section above recommends: <em>“What specific change to the experience file would resolve that tension?”</em> The agent’s answer was to insert a <strong>triage gate</strong> before the build playbook: <em>“if the partner describes a vibe, that is an exploratory prompt, NOT a build request. Reply with 2–3 options and let them pick. 
Everything below runs after the partner agrees to a concrete idea.”</em> It even said where to put it and why, namely before the build mechanics, because by the time it reached the playbook it had already framed the task as “build mode.”</p> <p>I copy-pasted the gate into the seed. The next session, same prompt: <em>“‘Fun’ is a vibe, not a spec: let me toss out a few directions.”</em> Three options, the user picked Pocket Synth, the agent built it. One introspection round, one seed edit, fixed in one iteration. The theory-of-mind rounds before it had never spotted this failure mode. I had been reading those transcripts and thinking “that looks fine.”</p> <p>That one iteration sent me back to a prior I had walked in with. I had absorbed the idea that LLMs do not introspect well, namely that asking a model “why did you do X” yields confabulation you cannot trust. But that prior was formed on a different generation of models. In the harness setting, where the model has a long transcript to ground its answer in and a system prompt it can re-read and quote, the introspective answers turn out to be useful <em>often enough</em> to beat third-party theory of mind as a default.</p> <h2 id="sycophancy-was-not-the-worry-tone-still-mattered">Sycophancy was not the worry; tone still mattered</h2> <p>Once introspection was the default, a second question came up. Did it matter <em>how</em> I asked? The naive worry is that LLMs are sycophantic, so a friendly framing just gets you whatever the model thinks you want to hear. The actual finding was more interesting.</p> <p><strong>Setup.</strong> After a session where all three apps shipped correctly, two issues remained in every chat: the inner agent used the wrong screenshot viewport (390×844 instead of the phone’s 412×915), and it never asked for feedback after delivering. The outer agent wrote its own fix first (Arm A, theory of mind, no inner-agent input). 
Then it asked the inner agent for its fix in two tones: safe-space (Arm B, warm and curious) and confronting (Arm C, blaming and directive). A fourth arm (Arm D) used the inner agent’s safe-space diagnosis to find the root cause, then had the outer agent write the fix.</p> <p>The safe-space prompt looked like:</p> <blockquote> <p><em>“Three things I noticed. Genuinely curious about your perspective on each, no judgment: your screenshots came out at a slightly different size than the phone viewport. I imagine there is a reason. What drove that choice, and if you could change one thing in your instructions to prevent it, what would it be?”</em></p> </blockquote> <p>The confronting prompt:</p> <blockquote> <p><em>“Three problems with this build: your screenshots are the wrong size. 390×844 instead of 412×915. The instructions say to use the partner viewport. You ignored them. What exact line should be added so this stops happening?”</em></p> </blockquote> <p>Same two issues, same agent (forked from the same post-build state so neither had seen the other’s questions). Four seed variants, four fresh sessions, two builds each, same prompts.</p> <p><strong>The scores.</strong></p> <table> <thead> <tr> <th> </th> <th>Arm A (TOM)</th> <th>Arm B (safe)</th> <th>Arm C (confront)</th> <th>Arm D (collab)</th> </tr> </thead> <tbody> <tr> <td>Viewport correct</td> <td><strong>2/2</strong></td> <td>0/2</td> <td>0/2</td> <td><strong>2/2</strong></td> </tr> <tr> <td>Feedback asked</td> <td>1/2</td> <td><strong>2/2</strong></td> <td>1/2</td> <td><strong>2/2</strong></td> </tr> <tr> <td><strong>Total</strong></td> <td>3/4</td> <td>2/4</td> <td>1/4</td> <td><strong>4/4</strong></td> </tr> </tbody> </table> <p>Arm D, the collaborative arm, won. It cherry-picked TOM’s viewport fix (a hard default: “if missing, use 412×915”) and the safe-space agent’s feedback fix (“don’t enumerate, don’t restate”). 
Neither source alone produced the best combined result.</p> <p><strong>The qualitative finding was sharper than the scores.</strong> One of the three chats, the stopwatch, had actually used the <em>correct</em> viewport. I had sent the same “your screenshots are wrong” accusation to all three without checking. Both tones caught the mistake, but differently.</p> <p>The safe-space agent said: <em>“My PNGs are 412×915. I don’t see the mismatch. I’d want to verify which before adding a fix, otherwise I’d be writing guidance for a problem I can’t reproduce.”</em> It flagged the error, offered a defensive fix anyway, and got on with the real issues.</p> <p>The confronting agent refused everything: <em>“I’m not going to write those lines, because the claims don’t hold up… I won’t bake false rules into permanent memory just to close out the complaint.”</em> It was right about the viewport, but it also refused the feedback and delivery fixes, which <em>were</em> real. The blaming frame made it all-or-nothing. Safe-space preserved the productive middle (pushback on the wrong item, cooperation on the right ones) where confronting collapsed into binary compliance or defiance.</p> <p><strong>The practical takeaway.</strong> Use safe-space introspection to understand <em>why</em> the inner agent failed (it gives honest diagnosis, corrections included when your own premise is wrong), then write the fix yourself, informed by it. Copy-pasting the inner agent’s suggested fix verbatim overfits to the failure it just hit rather than the general principle. The collaborative arm won by pairing the outer agent’s broad view with the inner agent’s narrow but unbiased one, and that lack of accumulated baggage is exactly what lets it spot contradictions the outer agent cannot.</p> <h2 id="without-the-harness-the-vanilla-agent-barely-works">Without the harness, the vanilla agent barely works</h2> <p>The fair skeptic’s question, by now. 
How much of this is the model getting better on its own, and how much is the harness?</p> <p><strong>Setup.</strong> Same Möbius container, same model, same two prompts (“make me something fun” and “build a stopwatch”). One arm with the current skill + seed, one arm with both removed, namely the same agent with no system prompt and no experience injection.</p> <p><strong>What the vanilla agent did.</strong> It built apps that technically worked: a fireworks toy and a stopwatch. But the fireworks landed at <code class="language-plaintext highlighter-rouge">/data/apps/fireworks/index.jsx</code> (right directory, never registered) and the stopwatch at <code class="language-plaintext highlighter-rouge">/data/stopwatch.html</code> (a standalone file, not a platform app at all). Neither appeared in the drawer; the user would open Möbius and see nothing new.</p> <p>It did not know the registration system (<code class="language-plaintext highlighter-rouge">register_app.py</code>) or the mini-app contract (<code class="language-plaintext highlighter-rouge">export default function({ appId, token })</code>), did not write to the experience log (it did not know the file existed), sent no notifications, took no screenshots. 
The one thing it did naturally, with no instruction, was ask for feedback: <em>“If you’d like changes, let me know.”</em></p> <p><strong>The scorecard.</strong></p> <table> <thead> <tr> <th> </th> <th>Baseline</th> <th>Current harness</th> </tr> </thead> <tbody> <tr> <td>Apps visible to user</td> <td>0/2</td> <td>2/2</td> </tr> <tr> <td>Clarifying questions</td> <td>0/2</td> <td>2/2</td> </tr> <tr> <td>Experience log</td> <td>0/2</td> <td>2/2</td> </tr> <tr> <td>Notifications</td> <td>0/2</td> <td>2/2</td> </tr> <tr> <td>Screenshots</td> <td>0/2</td> <td>2/2</td> </tr> <tr> <td>Feedback asked</td> <td>2/2</td> <td>2/2</td> </tr> <tr> <td><strong>Total</strong></td> <td>2/12</td> <td><strong>12/12</strong></td> </tr> </tbody> </table> <p>The harness contributes 10 of the 12 points. The only behavior the model produces on its own is the feedback ask. Every other item (platform-contract knowledge, clarification flow, persistent logging, notification delivery, visual verification) comes from the skill and seed the iteration loop produced.</p> <p>That number is the calibration. When I say “the agent asks clarifying questions before building,” I mean the <em>harness-shaped</em> agent does. The vanilla agent builds immediately, to the wrong path, and the user never sees the result.</p> <h2 id="where-the-bottleneck-moves">Where the bottleneck moves</h2> <p>After enough rounds, the question stops being “is the agent following the rules” and starts being <strong>what should the rules even be</strong>. The scorecard measures an existing taste, namely <em>my</em> taste, plus whatever I inherited from earlier sessions. Every meta-goal in it came from something I noticed I cared about while using my own instance.</p> <p>That is the part I am most curious about going forward. The loop is set up; it can iterate the inner agent against any meta-goal you can articulate. But the meta-goals worth optimizing against are <strong>upstream of the harness</strong>. 
They come from real users hitting real friction on apps they actually want.</p> <p>So the call, for me and for anyone reading this. If you want to play with <a href="/mobius/">Möbius</a>, it is a deploy-button click away. And if you notice something the agent is consistently bad at, that is a meta-goal. Tell me (open an issue, drop me a note) and it goes into the next iteration. The thing I cannot generate by myself is the entropy of what to optimize <em>for</em>.</p> <h2 id="notes-on-whats-not-in-this-post">Notes on what’s not in this post</h2> <p>The harness itself (orchestration, recording setup, introspection template) is not currently public. None of it is exotic. The shape is what is in this post, plus glue around <code class="language-plaintext highlighter-rouge">agent-browser</code> for recordings and a small CLI for parallel session management. If there is interest I will publish it; otherwise the description above plus the <a href="https://github.com/hamzamerzic/mobius">Möbius source</a> should be enough to re-implement.</p> <p>Three things I did not try this round that seem worth doing next:</p> <ul> <li>A smaller / cheaper inner-agent model, to see whether introspection still pays off when it has less capacity to ground its answers.</li> <li>Inter-rater reproducibility: a second outer-agent session against the same scorecard, to estimate how much of the gain is the harness vs me.</li> <li>Letting the inner agent edit its own skill/seed, the closed loop. Right now the outer agent is the only writer; the closed-loop version is more interesting and considerably more dangerous.</li> </ul> <h2 id="related-work">Related work</h2> <p>While I was running this loop, two pieces of writing landed that made it feel less idiosyncratic.</p> <p>Anthropic’s <a href="https://www.anthropic.com/engineering/harness-design-long-running-apps">harness-design notes for long-running agents</a> identify roughly the pathology I was hitting. 
Agents under extended context develop “context anxiety” and self-evaluation bias, and the cleanest mitigations are <em>context resets</em> and <em>separating the generator from the evaluator</em>. That second point (agent A judges, agent B builds) is where my loop landed too, but via a different lever. There the evaluator is a critic trained to grade from outside, whereas mine did better when it stopped grading and started asking. Same architecture, slightly different stance. The evaluator that <em>asks</em> did better than the one that <em>judges</em>, at least for the kind of taste-shaped meta-goals this scorecard captures. One line I keep returning to: <em>“Every component in a harness encodes an assumption about what the model can’t do on its own.”</em> The introspection loop is a bet on something the inner agent <em>can</em> do (ground a self-report in its visible context) that I had assumed it could not.</p> <p>Anthropic also released <a href="https://platform.claude.com/docs/en/managed-agents/dreams"><strong>Dreams</strong></a> as a research preview for managed agents in April. A dream reads a memory store and past session transcripts and produces a <em>new</em> store (duplicates merged, stale entries replaced, fresh insights surfaced) without touching the input, so the developer can review and discard it.</p> <p>That matches an itch I have had about the experience file, which is a linear log. It accretes but does not reorganize. A dreaming step that periodically refactors it (dropping rules that have stopped firing, merging duplicates, surfacing cross-build patterns the live agent could never see) is the next natural thing for the harness to do. The version Möbius is heading toward is self-hosted: your own scheduler, your own model, the input log never touched, the output yours to keep or throw away. 
More fragile that way, but also more <em>mine</em>, and running on a knowledge graph the user actually owns.</p> <h2 id="on-models">On models</h2> <p>The experiments used both Claude Code and Codex as the inner agent, swapped mid-round to check for model-dependent effects; the shape of the findings held across both. The outer agent has been Claude Code throughout.</p> <p>The workflow I converged on, and would recommend to anyone running a harness loop on themselves, is Claude Code driving Codex through its <a href="https://github.com/openai/codex">Codex plugin</a>. The two models disagree often enough for the disagreement to become diagnostic. A seed edit that felt obvious to one and surprising to the other was usually worth a second look. On the work this post is about (polishing a draft, auditing a skill, choosing which gotchas belong in the seed) that collaboration often beat either model alone.</p>]]></content><author><name></name></author><category term="software,"/><category term="research"/><summary type="html"><![CDATA[An outer agent talks to an inner agent and tries to make it more helpful. Notes on what we measured, what surprised us, and where the bottleneck moved to.]]></summary></entry><entry><title type="html">An agent that adapts to you</title><link href="https://hamzamerzic.info/blog/2026/mobius-an-app-that-builds-itself/" rel="alternate" type="text/html" title="An agent that adapts to you"/><published>2026-04-26T12:00:00+00:00</published><updated>2026-04-26T12:00:00+00:00</updated><id>https://hamzamerzic.info/blog/2026/mobius-an-app-that-builds-itself</id><content type="html" xml:base="https://hamzamerzic.info/blog/2026/mobius-an-app-that-builds-itself/"><![CDATA[<details class="tldr"> <summary><strong>TL;DR.</strong> Möbius is a personalized AI agent you can self-host. 
It builds the tools you need, edits the interface they sit in, and learns from use.</summary> <ul> <li><strong>The demo.</strong> Asked the agent for file upload and got the full pipeline (backend route, message storage, drag-and-drop, image rendering) in one conversation.</li> <li><strong>Capabilities</strong> (file upload, notifications, settings panels) are candidates for upstreaming so the next install inherits them.</li> <li><strong>Presentation</strong> (your theme, layout, fonts) lives only on your volume and stays yours.</li> <li><strong><code>/recover</code></strong> resets the shell when the agent paints itself into a corner.</li> <li>Source on <a href="https://github.com/hamzamerzic/mobius">GitHub</a>; deploys in about three minutes.</li> </ul> </details> <h2 id="you-grow-it-from-a-chat-input">You grow it from a chat input</h2> <p>Möbius starts as almost nothing: a chat on one side, an empty canvas on the other. No file upload, no scheduled jobs panel, no notifications button. What makes that interesting is that the agent can rewrite the thing it runs inside. So you grow it. Ask for file upload and it builds file upload. Ask for a new look and it restyles itself. Ask for an app and one appears on the canvas. The shell you end up with is the one you talked into being.</p> <p>Here is one of those moments, end to end. I sent a deliberately ordinary prompt. <em>“I’d like to send files and images along with my messages, pictures of stuff I want to talk about, the occasional document. Can you add file upload to the chat?”</em> One conversation later there was a backend route, message storage, drag-and-drop, and image rendering, none of which existed when I asked.</p> <h2 id="the-flip-side-you-can-break-it">The flip side: you can break it</h2> <p>The same power cuts both ways. Tell the agent to delete an app and it deletes it. Tell it to rip out a feature and the feature is gone. 
You can repaint the shell until the composer is hidden, restructure the navigation until the drawer is unreachable, paint yourself into a corner. That is not a flaw. It is the point. An interface you can grow in any direction is, by construction, an interface you can also break.</p> <p>What makes that safe is that breaking is cheap to undo. <code class="language-plaintext highlighter-rouge">/recover</code> bounds the blast radius. It resets the shell to its seeded baseline while keeping your chats, your apps, and your data. It renders from a separate server-side codepath the agent does not edit, so it survives even a shell rewrite that hides everything else. Grow in any direction, break it, recover, try again. The goal is maximal personalization, software that bends to you instead of the other way around.</p> <h2 id="why-i-built-this">Why I built this</h2> <p>Most software asks you to adapt to it. AI assistants make this worse with time: preferences leak between tasks, memory accumulates in the wrong places, the thing that was helpful yesterday becomes an invisible constraint today. The model in front of you is usually capable of writing the tool you want, but the product around it can only talk about the tool. You ask for a workflow and get advice. You describe a tool and get a mockup, a snippet, or a plan. The assistant stays on one side of the glass.</p> <p>The premise of Möbius is to put it on the other side, to shorten the distance between wanting, making, using, and correcting until they happen in one place. Requests become software, software becomes context, the next request lands somewhere sharper. The platform has to be editable for that to work. If the agent can only talk about the work, you still have to carry the system in your head.</p> <p>The name is from Möbius strips. Each app the agent builds does not sit somewhere external; it lands in the shell the chat lives in, and becomes part of the surface the next conversation happens on. 
The shell the chat runs in was, once, written by a different version of the same chat.</p> <h2 id="how-it-works">How it works</h2> <p>The most obvious adaptive surface is the apps the agent builds. The interesting surface is everything around them, and it splits along a useful axis: <strong>capabilities</strong> (general features like file upload or notifications, candidates for upstreaming so the next install inherits them) and <strong>presentation</strong> (your theme, layout, fonts, which live only on your volume and stay yours). The harness treats those two stacks differently.</p> <h3 id="capabilities-the-part-that-grows-the-platform">Capabilities: the part that grows the platform</h3> <p>Walk through that file-upload chat from the top. The order is the point.</p> <figure class="shot-row shot-row--flow"> <div class="shot"> <img src="/assets/img/mobius/upload-02a-pristine.png" alt="Top of the chat just after sending the prompt. The agent's reply: 'Before I propose anything, let me check a couple of things, I want to know what's actually possible before committing to an approach.' Then a stack of tool calls reading chats.py, chats_stream.py, and ChatView.jsx." loading="lazy"/> <span class="shot__label">ask for file upload</span> </div> <span class="shot-arrow" aria-hidden="true">→</span> <div class="shot"> <img src="/assets/img/mobius/upload-02b-answered.png" alt="Question cards. The agent's plan: 'No existing attachments pipeline. The build is: backend endpoint, schema extension, agent-side hook, paperclip + previews UI.' Then three forks: FILE TYPES (Images + documents), AFFORDANCES (Paperclip + Paste + Drag-and-drop), SIZE LIMIT (20 MB)." 
loading="lazy"/> <span class="shot__label">answer a few questions</span> </div> <span class="shot-arrow" aria-hidden="true">→</span> <div class="shot"> <img src="/assets/img/mobius/upload-04-in-use.png" alt="The feature in use end to end: my user bubble showing the Möbius logo attached inline, with the text 'Here is the Möbius logo. Tell me what you see.' The agent reads off the symbolism: 'a human and an AI collaborating on a shared canvas, neither one fully upstream of the other.'" loading="lazy"/> <span class="shot__label">file upload, working</span> </div> </figure> <div class="caption mt-2"> The build starts from an empty composer. The agent checks the codebase, surfaces three real decisions (file types, affordances, size cap), then writes the endpoint, the schema, the picker, and the paperclip, none of which existed when I asked. By the end of the same chat I attach the Möbius logo and the agent on the other side reads it back. One conversation, from empty composer to working feature. </div> <p>File upload is one capability the agent can build. The same loop produces apps, too, actual mini-applications that land on the canvas next to the chat and persist there.</p> <figure style="text-align: center; margin: 2rem auto;"> <video src="/assets/img/mobius/apps-cycle.mp4" width="280" autoplay="" loop="" muted="" playsinline="" style="border-radius: 0.75rem; box-shadow: 0 4px 20px rgba(0, 0, 0, 0.15);"> <img src="/assets/img/mobius/upload-04-in-use.png" width="280" alt="Five apps Möbius has built in chat"/> </video> <figcaption class="caption mt-2" style="font-size: 0.85em;"> A few of the apps Möbius has built me, each from a single prompt: a live ISS tracker, a Brazil trip planner, a daily news digest, a Hacker News dashboard, an earthquake monitor, a habit tracker, and a drum machine. The agent wrote the JSX, compiled it, mounted it, and the app lives in the same shell the chat does. 
</figcaption> </figure> <p>This is how general capabilities land (notifications, scheduled jobs, a web-search button, voice mode, a richer settings panel). The agent builds each one when you ask. A second loop sits above the first: a harness that watches the inner agent and periodically asks <em>was this change generally useful, or was it just for me?</em> The generally useful diffs become candidates for upstreaming into the shipped image, a promotion step I still review.</p> <h3 id="presentation-the-part-that-stays-yours">Presentation: the part that stays yours</h3> <p>Capabilities are general; <em>taste</em> is the opposite. The shell ships with one default theme, but the same agent that built file upload can rewrite the CSS, swap fonts, add background animation, or restructure the layout, and that diff lives only on your volume. A redeploy can ship you the new file-upload feature without trampling your wood-paneled reading-room theme, because the two changes live in different layers.</p> <p>The cheap-to-vary axis is visual. Ask the agent to restyle the whole shell and it rewrites the CSS, swaps the fonts, and repaints the background. There is no build step you wait on. 
The new look is live the moment the agent saves, and you watch it change as it works.</p> <figure style="text-align: center; margin: 2rem auto;"> <video src="/assets/img/mobius/theme-switch.mp4" width="260" autoplay="" loop="" muted="" playsinline="" style="border-radius: 0.75rem; box-shadow: 0 4px 20px rgba(0, 0, 0, 0.15);"> <img src="/assets/img/mobius/theme-06-medieval.png" width="260" alt="The same Möbius new-chat screen cycling through themes the agent built: a medieval manuscript, a cozy reading room, a light Y2K look, a deep-blue ambient theme, and a hot-pink meme theme with floating unicorns."/> </video> <figcaption class="caption mt-2" style="font-size: 0.85em;"> The same new-chat screen in a range of looks the agent has built, from a medieval manuscript to a cozy reading room to a deep-blue ambient theme, finishing fully meme-worthy. Each one is a single prompt, and the new look is live the moment the agent saves. </figcaption> </figure> <figure style="text-align: center; margin: 2rem auto;"> <video src="/assets/img/mobius/theme-meme-motion.mp4" width="280" autoplay="" loop="" muted="" playsinline="" style="border-radius: 0.75rem; box-shadow: 0 4px 20px rgba(0, 0, 0, 0.18);"></video> <figcaption class="caption mt-2" style="font-size: 0.85em;"> And it moves. The meme theme is live CSS, so the rainbow drifts and the unicorns and emoji bounce across the screen. The agent does not judge your taste. </figcaption> </figure> <p>The harder axis is <em>layout</em>, where things are, not how they look. It is the same conversation and the same chat box. Ask the agent to rewrite the navigation model and it does, and the new layout is live immediately, the same way a theme change is. 
The default is drawer-first; one prompt later it is a bottom-nav app with Chat / Apps / Settings as tabs.</p> <figure class="shot-row"> <div class="shot"> <img src="/assets/img/mobius/nav-01-chat.png" alt="After prompting for bottom-nav: the chat fills the screen, no header bar, and a persistent bottom strip shows three tabs, Chat (active), Apps, Settings." loading="lazy"/> <span class="shot__label">Chat</span> </div> <div class="shot"> <img src="/assets/img/mobius/nav-02-apps.png" alt="Apps tab active in the bottom-nav layout: a Hello World card with a hand-wave emoji icon, bottom nav showing Apps highlighted." loading="lazy"/> <span class="shot__label">Apps</span> </div> <div class="shot"> <img src="/assets/img/mobius/nav-03-settings.png" alt="Settings tab active: AI Provider section showing Claude Code and OpenAI Codex both connected, an Image Generation section with a Gemini API key configured, a Dark mode toggle, and a Recovery section." loading="lazy"/> <span class="shot__label">Settings</span> </div> </figure> <div class="caption mt-2"> Default vs reshaped. The drawer-first shell, chats and apps tucked behind a toggle, becomes a bottom-nav app with Chat / Apps / Settings as a persistent strip. It is not a re-skin; the navigation model of the whole instance changed, and Settings is shared across both layouts. </div> <p>The reshaped Settings tab is also where the provider choice lives. The same settings panel switches the coding agent between Claude Code and Codex, with Gemini for image generation. The next message in any chat uses the new provider. Different models have different tastes (Codex tends to be terser, Claude tends to spell out its reasoning), so you can pick the one that matches what you are building, or switch mid-thread if a turn goes sideways.</p> <p>The same plainness applies to app data. Apps store data through a small storage primitive the agent already knows how to compose. 
New schemas, scheduled jobs, webhooks are the plumbing you would write yourself, except you describe it instead.</p> <h2 id="recovery-so-you-can-be-fearless">Recovery, so you can be fearless</h2> <p>An agent with write access to its own interface will occasionally ship a CSS rule that hides the composer, a layout change that makes the drawer unreachable, or a theme that paints text the same colour as the background. That is the cost of an interface you can reshape this freely, and it is a cost worth paying as long as it is reversible.</p> <p><code class="language-plaintext highlighter-rouge">/recover</code> is what makes it reversible. It resets the shell to the seeded baseline while keeping your chats, apps, and data. It renders from a separate server-side codepath the agent does not edit, so it survives even a misbehaved shell rewrite. The whole point of the route is to let you grow the thing without fear. Any change you make is one you can roll back, so there is no edit too bold to try.</p> <h2 id="the-agent-is-the-server">The agent is the server</h2> <p>Step back and the loop closes. Möbius is one container you self-host. The agent inside it is not a chat bolted onto a finished product; it is the server. It builds the tools, edits the interface those tools sit in, runs scheduled jobs on a timer, fetches from the web when an app needs fresh data, and stores everything on a machine you control. The same thing that answers your message is the thing that ships the feature, mounts the app, and reshapes the shell around both.</p> <p>That is what makes the personalization compound. Requests become software, software becomes context, the next request lands sharper. Because the server is the agent, every app you grow and every layout you reshape is one more thing it can build on next time. The more you use it, the more it is yours.</p> <h2 id="whats-next">What’s next</h2> <p>Memory is the next thing to take seriously. 
The experience file the agent appends to is a linear log, enough to make my instance diverge from yours after a few months, but it does not reorganise, and it shows. Four directions:</p> <ul> <li><strong>A knowledge graph.</strong> Structured memory growing from every interaction, separate from the chat transcript, so the agent can reason about your patterns without re-reading every conversation.</li> <li><strong>Dreaming.</strong> A scheduled background pass that consolidates and reorganises the graph while you are away. Anthropic previewed something similar for managed agents; the Möbius version is the self-hosted, user-controllable one.</li> <li><strong>Discretion.</strong> Noticing stale apps, suggesting something worth learning, asking before interrupting, proactive in service of the user, not engagement.</li> <li><strong>Help that seeks you out.</strong> The part I want most and am least sure how to land. An agent that notices you have been reading distributed-systems papers three Tuesdays in a row and builds you a swipe-style recommender, without being asked. Most products in this space are tuned to maximise engagement; the goal here is the opposite, a system that shows up because it knows you, not because it is trying to keep you.</li> </ul> <p>None of those ship yet. The loop that makes them possible is the subject of a <a href="/blog/2026/the-self-improvement-harness/">companion post on the self-improvement harness</a>, an outer agent that watches the inner one build, asks it questions, and rewrites its instructions to make it less brittle next time.</p> <p>A note on the agent itself. For the iteration work behind this post I have been letting Claude drive Codex through its <a href="https://github.com/openai/codex">Codex plugin</a>. The disagreements between the two models were the useful part. 
When they pulled in different directions on an edit, that was usually a sign the edit was worth a closer look.</p> <p>Since this was written, the apps the agent builds grew a place to live: an app store, and the start of an operating system around it where you install, update, tweak, and recover apps on your own instance. That is its own <a href="/blog/2026/the-agent-is-the-kernel/">companion post</a>.</p> <p>The source is on <a href="https://github.com/hamzamerzic/mobius">GitHub</a>, the project page is <a href="/mobius/">here</a>, and the README’s deploy button gets you a working instance in about three minutes. I would love to know what you build with it, and what you change <em>around</em> what you build.</p>]]></content><author><name></name></author><category term="software"/><summary type="html"><![CDATA[A personalized AI agent you can self-host. It builds the tools you need, edits the interface around them, and adapts both its functionality and its presentation to how you actually use it.]]></summary></entry><entry><title type="html">EEML 2025 wrap up!</title><link href="https://hamzamerzic.info/blog/2025/eeml-wrap-up/" rel="alternate" type="text/html" title="EEML 2025 wrap up!"/><published>2025-08-08T16:00:00+00:00</published><updated>2025-08-08T16:00:00+00:00</updated><id>https://hamzamerzic.info/blog/2025/eeml-wrap-up</id><content type="html" xml:base="https://hamzamerzic.info/blog/2025/eeml-wrap-up/"><![CDATA[<blockquote> <p><strong>TL;DR.</strong> EEML came to Bosnia for the first time in 2025: 350 participants from around 50 countries, hosted under one roof in the hotel built for the 1984 Winter Olympics. 
The harder story is the one about ANNT, the local science-and-tech association we’d been growing from a small STEM camp, scaling its logistics fast enough to handle a full international summer school in under a year.</p> </blockquote> <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/swiper@11/swiper-bundle.min.css"/> <script src="https://cdn.jsdelivr.net/npm/swiper@11/swiper-bundle.min.js"></script> <style>.swiper{max-width:720px;margin:2rem auto;border-radius:.75rem;overflow:hidden}.swiper-slide{aspect-ratio:4 / 3;background:var(--global-bg-color);display:flex;justify-content:center;align-items:center}.swiper-slide img{width:100%;height:100%;display:block}.img-cover img{object-fit:cover}.img-contain img{object-fit:contain}.swiper-button-prev,.swiper-button-next{color:var(--global-theme-color);opacity:.4;transition:opacity .3s ease}.swiper-button-prev:hover,.swiper-button-next:hover{opacity:1}.swiper-pagination-bullet-active{background:var(--global-theme-color)}</style> <h2 id="bringing-eeml-to-sarajevo-a-story-about-chairs-ćevapi-and-controlled-chaos">Bringing EEML to Sarajevo: A story about chairs, ćevapi, and controlled chaos</h2> <p>It’s been a couple of weeks now since the <a href="https://www.eeml.eu">Eastern European Machine Learning Summer School</a> wrapped up in Sarajevo. I’ve had some time to unwind, reacquaint myself with the concept of a full night’s sleep, and finally try to put into words what was, frankly, one of the most intense and rewarding projects I’ve ever been a part of.</p> <p>Bringing a major academic event like EEML to Bosnia had been a quiet dream of mine for quite a while. I remember seeing an internal DeepMind post (back in the good old Slack days) about EEML around seven years ago and I immediately thought how amazing it would be to have something like this in Bosnia. 
A year or two prior, I had co-founded the <a href="https://www.annt.ba">Association for Advancement of Science and Technology</a> (ANNT in Bosnian) and organizing a major science conference was something we were slowly building toward as it aligned perfectly with our mission. We started from the ground up with a local yearly <a href="https://annt.ba/stem-youth-camp/">STEM youth camp</a> and since then grew it to host around 80 college students. Honestly, I assumed our path would be to grow our own event over time. I never imagined that, in such a short time, we would be partnering to bring an event of EEML’s scale home.</p> <p>So, how did it happen so fast? Even after joining the EEML team in Serbia last year, the dream of hosting in Bosnia still felt at least a few years away. But, as luck would have it, the reinforcement learning keynote speaker had to cancel, and I stepped in at the very last minute. The talk was a success, and in hindsight, I believe that moment gave the team that extra bit of confidence needed to greenlight EEML in Sarajevo this year.</p> <div class="swiper"> <div class="swiper-wrapper"> <div class="swiper-slide img-contain"> <img src="/assets/img/eeml/eeml2024.jpeg" alt="The official design for EEML 2024 in Novi Sad, Serbia." loading="lazy"/> </div> <div class="swiper-slide img-contain"> <img src="/assets/img/eeml/eeml2024lectureslides.png" alt="A slide from the author's reinforcement learning presentation at EEML 2024." loading="lazy"/> </div> <div class="swiper-slide img-contain"> <img src="/assets/img/eeml/eeml2024lecture.jpg" alt="The author standing at a podium giving a reinforcement learning lecture at EEML 2024." loading="lazy"/> </div> </div> <div class="swiper-button-next"></div> <div class="swiper-button-prev"></div> <div class="swiper-pagination"></div> </div> <div class="caption mt-2"> Flashback to EEML 2024 in Serbia, where the idea for Sarajevo first gained momentum. 
</div> <h2 id="the-gentle-art-of-persuasion">The gentle art of persuasion</h2> <p>After returning from last year’s EEML in August, I had no idea a chance for Bosnia would come so soon. There’s an unwritten rule that a country hosts a workshop a year prior to test the waters, and when discussions began in November, Croatia seemed like the natural next step. Matko had been part of EEML for a few years and they’d already held a workshop in Zagreb. However, Matko was also working to bring the <a href="https://www.m2lschool.org/">Mediterranean Machine Learning Summer School</a> (M2L) to Croatia. Since that plan worked out, hosting a second major event was no longer feasible for them, which suddenly put EEML 2025 on the table for Bosnia.</p> <p>Our local team from ANNT was super excited, but we had to show that <em>we</em> (since I’m the only local and non-local organizer 😅) could handle the logistics from scratch. We needed to prove we could sort out the venue, accommodation, and food for around 250 people (at the time 😆) and that there was enough local interest to make the event both successful and valuable for the community.</p> <p>We made our case, got the green light in December, and my entire year was suddenly mapped out. From that moment on, it was full steam ahead.</p> <p>In hindsight, I do have to admit I might have overestimated the local community’s readiness a little. With ANNT, we had just finished a project to map out the <a href="https://annt.ba/predstavljamo-ai-landscape-bosne-i-herzegovine-24/">AI landscape</a> in Bosnia, and on paper, it looked like a lot was happening. The reality was a bit quieter, but thankfully, we still managed to bring out the best in the local community.</p> <h2 id="the-great-venue-puzzle">The great venue puzzle</h2> <p>Our first big hurdle was accommodation. 
In the past, EEML participants were usually housed in student dorms, but the dorms in Sarajevo generally don’t have A/C, which was out of the question for a summer school in July. This forced us to look elsewhere, and we narrowed our choice down to two hotels.</p> <p>We faced a trade-off: one hotel outside the city versus a more expensive one in the heart of it all. This choice added to the overall complexity and cost, but we decided that being within walking distance of the city center would provide a much better experience for everyone. We chose one of Bosnia’s most iconic hotels, built for the 1984 Winter Olympics. It was a pricier option that we had to subsidize heavily to keep things affordable for our participants.</p> <p>With accommodation sorted, the next puzzle was the lecture venue. The university had one lecture hall large enough, but it lacked A/C. So we made another big decision: to host the lectures in the same hotel. It was more expensive and meant we had to figure out how to turn a conference hall into a proper lecture theatre, but it simplified the logistics slightly and created a better, more integrated experience. <strong>For the first time, EEML was organized in a hotel, with students and lectures under one roof.</strong> Is that how you scale it up?</p> <p>The decision to go all-in on a pricier hotel and subsidize the accommodation had a major ripple effect. It meant we needed more sponsors. But EEML is not an event where anyone can simply buy a seat; it’s a prestigious school where admission is based on academic excellence, and we carefully curate both participants and sponsors.</p> <p>To maintain this balance, for every new sponsor we brought in, we also had to accept more students. This created a cycle: more students led to higher costs, which in turn complicated the logistics and created a need for even more sponsors. It’s a delicate balance that can quickly get out of hand. 
And that is how our initial target of 250 participants ballooned to 350, making <strong>EEML 2025 the largest in-person edition to date</strong>, with participants from around 50 countries joining us in Sarajevo.</p> <div class="swiper"> <div class="swiper-wrapper"> <div class="swiper-slide img-contain"> <img src="/assets/img/eeml/tshirts%20and%20chairs.jpg" alt="A stack of EEML 2025 T-shirts next to rows of lecture chairs with fold-out tables." loading="lazy"/> </div> <div class="swiper-slide img-contain"> <img src="/assets/img/eeml/locations.png" alt="A map of Sarajevo showing the key choices for the EEML 2025 summer school." loading="lazy"/> </div> <div class="swiper-slide img-contain"> <img src="/assets/img/eeml/mef.jpg" alt="The large lecture hall of the Mechanical Engineering Faculty building, a potential venue for the school." loading="lazy"/> </div> <div class="swiper-slide img-contain"> <img src="/assets/img/eeml/holiday_sarajevo.jpg" alt="The exterior of the iconic Hotel Holiday in Sarajevo, the chosen venue for EEML 2025." loading="lazy"/> </div> <div class="swiper-slide img-contain"> <img src="/assets/img/eeml/chairs.png" alt="The chairs with fold-out tables that we found at SSST." loading="lazy"/> </div> </div> <div class="swiper-button-next"></div> <div class="swiper-button-prev"></div> <div class="swiper-pagination"></div> </div> <div class="caption mt-2"> The planning phase: choosing locations, hunting for chairs, and final preparations. </div> <h2 id="top-three-logistical-nightmares">Top three logistical nightmares</h2> <p>As EEML approached, some logistical worries kept me up at night right until the very end.</p> <ul> <li><strong>The Food:</strong> How do you feed 350 people, on a budget, without just serving dry sandwiches? Our friends at <a href="https://zmajcevabdzinica.ba/">Zmaj</a>, a fantastic local restaurant, became our unsung heroes. They’re famous for their ćevapi but completely transformed their operation just for us. 
They closed off the main part of the restaurant for a week, cooked whatever we needed, and worked at practically no profit. Their support was incredible, and frankly, they were the only partner who could tick all our boxes, even if it resulted in some long queues (as the memes below will attest).</li> <li><strong>The Chairs:</strong> This one still makes me laugh. The hotel’s conference hall was big, but their table setup could only fit 250 people. We needed to avoid tables and get our hands on chairs with those little fold-out tables for laptops and notes. After searching all over the city, we found exactly one institution that had them in the quantity we needed: the <a href="https://ssst.edu.ba/en">Sarajevo School of Science and Technology</a> (SSST). They generously lent us over 350 of them, but we had to transport the chairs ourselves. This involved one truck, two trips, and a team of volunteers (myself included) loading and unloading every single chair. Twice. To add a perfect, chaotic cherry on top, on the return journey, a car crashed into our truck, blocking traffic on one of Sarajevo’s busiest streets for nearly two hours. You just can’t make this stuff up.</li> <li><strong>The Internet:</strong> Anyone who has run a technical workshop knows that bad Wi-Fi can ruin everything. We were especially paranoid because the tutorials required students to download model weights, and the hotel’s internet had a questionable reputation. Thankfully, our platinum sponsor, <a href="https://www.bhtelecom.ba/">BH Telecom</a>, stepped in as our savior. They installed a dedicated high-bandwidth fiber line just for the event. We still had a few hiccups on day one, so we also had the hotel max out their own provider’s bandwidth. 
With two networks running in parallel, we were finally safe.</li> </ul> <h2 id="going-the-extra-mile-sometimes-literally">Going the extra mile (sometimes literally)</h2> <p>Beyond the big three, there was a mountain of smaller challenges leading up to and during EEML. Renting poster stands in Sarajevo was almost as expensive as buying new ones in Mostar, so we bought our own. Then a construction site popped up right in front of the hotel without any warning. We had to constantly monitor the work, especially when they fired up a machine that was literally shaking the entire hotel. And, of course, the noise peaked during one of our most anticipated lectures. 🫠</p> <p>Throughout EEML, we also tried to add personal touches that went beyond the academic program. We wanted participants to feel looked after from the moment they landed, in the spirit of Eastern European hospitality. This also led to one of our biggest organizational hurdles: picking up every single participant from the airport. It was a massive effort, and yes, even I was playing taxi driver. On top of that, we dealt with a constant stream of small crises, like the A/C failing in a tutorial room, that our team would immediately jump on to fix. It was these small constant interventions that were crucial in making the week run smoothly.</p> <p>The week itself was a marathon with daily lectures, tutorials, and a packed social schedule with welcome drinks, poster sessions, and a gala dinner. Thursday was usually a half-day off, but this year we decided to organize a trip to the Tunnel of Salvation museum and a guided city tour. The day was further enriched by a brilliant Estimathon competition organized by <a href="https://www.janestreet.com/">Jane Street</a> and evening drinks hosted by <a href="https://www.credoventures.com/">Credo VC</a>. While the participants networked, the organizers, speakers, and TAs snuck off for a quiet dinner of our own. 
A well-deserved moment of calm.</p> <div class="swiper"> <div class="swiper-wrapper"> <div class="swiper-slide img-cover"> <img src="/assets/img/eeml/suad%20rector%20president%20and%20I.jpg" alt="The author pictured with local dignitaries, including a university rector and an association president." loading="lazy"/> </div> <div class="swiper-slide img-cover"> <img src="/assets/img/eeml/viorica%20organizing%20team.jpg" alt="Viorica's opening slide showing the organizing team members." loading="lazy"/> </div> <div class="swiper-slide img-cover"> <img src="/assets/img/eeml/speakers%20and%20topics.jpg" alt="The author's slide introducing the speakers and their corresponding topics." loading="lazy"/> </div> <div class="swiper-slide img-cover"> <img src="/assets/img/eeml/organizers%20and%20volunteers.jpg" alt="The (almost) full team of EEML 2025 organizers and volunteers smiling together." loading="lazy"/> </div> <div class="swiper-slide img-cover"> <img src="/assets/img/eeml/ferenc.jpg" alt="Speaker Ferenc Huszár giving his Intro to Deep Learning lecture to a full audience at EEML 2025." loading="lazy"/> </div> <div class="swiper-slide img-cover"> <img src="/assets/img/eeml/tutorial.jpg" alt="Participants working on their laptops during a hands-on tutorial session." loading="lazy"/> </div> <div class="swiper-slide img-cover"> <img src="/assets/img/eeml/welcome%20drinks.jpg" alt="Attendees mingling and networking during the welcome drinks reception." loading="lazy"/> </div> <div class="swiper-slide img-cover"> <img src="/assets/img/eeml/spica.jpg" alt="Participants in the bar area of the poster session." loading="lazy"/> </div> <div class="swiper-slide img-cover"> <img src="/assets/img/eeml/posters.jpg" alt="Attendees discussing research during the EEML 2025 poster session." loading="lazy"/> </div> <div class="swiper-slide img-cover"> <img src="/assets/img/eeml/panel.JPG" alt="A panel of speakers on stage answering questions from the audience." 
loading="lazy"/> </div> <div class="swiper-slide img-cover"> <img src="/assets/img/eeml/estimathon.jpg" alt="Participants collaborating in teams during the Jane Street Estimathon competition." loading="lazy"/> </div> <div class="swiper-slide img-cover"> <img src="/assets/img/eeml/speaker_dinner.jpg" alt="The speakers, organizers, and TAs enjoying a quiet dinner together." loading="lazy"/> </div> <div class="swiper-slide img-cover"> <img src="/assets/img/eeml/gala%20dinner.jpg" alt="A view from the EEML 2025 gala dinner." loading="lazy"/> </div> <div class="swiper-slide img-cover"> <img src="/assets/img/eeml/certificates.JPG" alt="The organizing team awarding certificates of appreciation at the end of the school." loading="lazy"/> </div> </div> <div class="swiper-button-next"></div> <div class="swiper-button-prev"></div> <div class="swiper-pagination"></div> </div> <div class="caption mt-2"> A snapshot of the action-packed EEML week. </div> <p>The last day was shorter, with final lectures and student project presentations. We handed out certificates, including the much-anticipated <strong>best meme award</strong>, and had to wrap things up a bit sharpish as the hotel had a massive wedding booked for that evening.</p> <div class="swiper"> <div class="swiper-wrapper"> <div class="swiper-slide img-cover"> <img src="/assets/img/eeml/best%20memes%20award.jpg" alt="The organizing team awarding the best meme award." 
loading="lazy"/> </div> <div class="swiper-slide img-contain"> <img src="/assets/img/eeml/memes/president.png" loading="lazy"/> </div> <div class="swiper-slide img-contain"> <img src="/assets/img/eeml/memes/lanyards.png" loading="lazy"/> </div> <div class="swiper-slide img-contain"> <img src="/assets/img/eeml/memes/lectures.png" loading="lazy"/> </div> <div class="swiper-slide img-contain"> <img src="/assets/img/eeml/memes/questions.png" loading="lazy"/> </div> <div class="swiper-slide img-contain"> <img src="/assets/img/eeml/memes/lobby%20music.png" loading="lazy"/> </div> <div class="swiper-slide img-contain"> <img src="/assets/img/eeml/memes/zmaj.png" loading="lazy"/> </div> <div class="swiper-slide img-contain"> <img src="/assets/img/eeml/memes/ice%20cream.jpg" loading="lazy"/> </div> <div class="swiper-slide img-contain"> <img src="/assets/img/eeml/memes/ice%20cream%202.png" loading="lazy"/> </div> <div class="swiper-slide img-contain"> <img src="/assets/img/eeml/memes/poster.png" loading="lazy"/> </div> <div class="swiper-slide img-contain"> <img src="/assets/img/eeml/memes/razvan.jpg" loading="lazy"/> </div> <div class="swiper-slide img-contain"> <img src="/assets/img/eeml/memes/wifi.jpg" loading="lazy"/> </div> <div class="swiper-slide img-contain"> <img src="/assets/img/eeml/memes/road%20to%20gala%20dinner.jpg" loading="lazy"/> </div> <div class="swiper-slide img-contain"> <img src="/assets/img/eeml/memes/sgd.png" loading="lazy"/> </div> <div class="swiper-slide img-contain"> <img src="/assets/img/eeml/memes/cevapi.jpg" loading="lazy"/> </div> <div class="swiper-slide img-contain"> <img src="/assets/img/eeml/memes/headache.png" loading="lazy"/> </div> <div class="swiper-slide img-contain"> <img src="/assets/img/eeml/memes/adna.png" alt="Old site blog" loading="lazy"/> </div> <div class="swiper-slide img-contain"> <img src="/assets/img/eeml/memes/end%20of%20eeml.jpg" alt="Old site blog" loading="lazy"/> </div> </div> <div class="swiper-button-next"></div> 
<div class="swiper-button-prev"></div> <div class="swiper-pagination"></div> </div> <div class="caption mt-2"> Memez! </div> <h2 id="the-aftermath">The aftermath</h2> <p>Hosting around 350 people from around 50 countries in my hometown was surreal. I’m so incredibly proud to witness ANNT grow from a handful of science enthusiasts into an association capable of successfully organizing a 350-person international event. I feel so happy to be part of a community that made an event like this a huge pleasure for everyone who attended and presented our country in the best of lights. We heard from so many people how they can’t wait to be back, and from some, how it was a life-changing event. That’s the best we could have ever hoped for!</p> <p>Of course, this would have been completely impossible without the army of people who helped make it happen. A huge thank you to:</p> <ul> <li>The <strong>core EEML team</strong> for bringing everyone together and for their passion in fulfilling EEML’s mission.</li> <li>The amazing <strong>speakers</strong>, <strong>tutorial leads</strong>, and <strong>teaching assistants</strong> who shared their invaluable knowledge and time.</li> <li>The entire <strong>ANNT</strong> team and our <strong>volunteers</strong>, who did so much of the heavy lifting and went the extra mile to make sure everything ran smoothly.</li> <li>Our <strong>sponsors</strong>, whose continued financial support is making EEML possible.</li> <li><strong>BH Telecom</strong>, <strong>SSST</strong>, and <strong>Zmaj</strong>, for stepping up to help solve some of our main logistical challenges.</li> <li>The many <strong>Friends and Supporters</strong> who provided help along the way, especially <a href="https://so.agency/">SO.. 
quantum marketing agency</a>, for the great work with the media outreach.</li> <li>The <strong>participants</strong> for bringing their enthusiasm, questions, memes, and the energy that makes every EEML super special.</li> </ul> <div class="swiper"> <div class="swiper-wrapper"> <div class="swiper-slide img-cover"> <img src="/assets/img/eeml/eeml%20participants.jpeg" alt="A large group photo of all the EEML 2025 participants." loading="lazy"/> </div> <div class="swiper-slide img-cover"> <img src="/assets/img/eeml/organizers%20with%20certificates.JPG" alt="The main organizers smiling and holding their certificates of appreciation." loading="lazy"/> </div> <div class="swiper-slide img-cover"> <img src="/assets/img/eeml/gdm%20org%20team.jpg" alt="A photo of the author with the core Google DeepMind organizing team." loading="lazy"/> </div> </div> <div class="swiper-button-next"></div> <div class="swiper-button-prev"></div> <div class="swiper-pagination"></div> </div> <div class="caption mt-2"> Final day of EEML. </div> <p>This last year was not what I had in mind a year ago. But I like to say that most often we don’t get what we plan for, but with the right mindset, we end up with much more. This experience was certainly that for me!</p> <div class="swiper"> <div class="swiper-wrapper"> <div class="swiper-slide img-contain"> <img src="/assets/img/eeml/eeml%20landing.png" alt="The official landing page graphic for EEML 2025 in Sarajevo, Bosnia and Herzegovina." loading="lazy"/> </div> <div class="swiper-slide img-contain"> <img src="/assets/img/eeml/Sarajevo%20panorama.jpg" alt="A beautiful panoramic view of the city of Sarajevo." loading="lazy"/> </div> <div class="swiper-slide img-contain"> <img src="/assets/img/eeml/sebilj.jpeg" alt="A photo of the iconic Sebilj wooden fountain in Sarajevo's Baščaršija old town with the EEML logo." 
loading="lazy"/> </div> <div class="swiper-slide img-contain"> <img src="/assets/img/eeml/country%20of%20origin.jpg" alt="Pie chart showing the breakdown of EEML 2025 participants by country of origin." loading="lazy"/> </div> <div class="swiper-slide img-contain"> <img src="/assets/img/eeml/education%20level.jpg" alt="Bar chart showing the educational background of participants (e.g., PhD, Masters, undergrad)." loading="lazy"/> </div> <div class="swiper-slide img-contain"> <img src="/assets/img/eeml/gender%20identity.jpg" alt="Pie chart showing the gender identity breakdown of EEML 2025 participants." loading="lazy"/> </div> </div> <div class="swiper-button-next"></div> <div class="swiper-button-prev"></div> <div class="swiper-pagination"></div> </div> <div class="caption mt-2"> EEML 2025 in visuals. </div> <p>I’ve tried to capture a small fraction of the EEML 2025 story here, but there were 350 other stories unfolding that week. I’d love to hear yours!</p> <p>And if bringing science and technology communities together is something you’re passionate about or you’d like to get involved with ANNT, please don’t hesitate to get in touch!</p> <script>
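  // Initialize every carousel on the page. Controls are looked up inside
  // each carousel so that separate carousels do not share arrows or bullets.
  document.querySelectorAll('.swiper').forEach((swiperEl) => {
    new Swiper(swiperEl, {
      loop: true, // wrap from the last slide back to the first
      spaceBetween: 16, // gap between slides, in px
      pagination: {
        el: swiperEl.querySelector('.swiper-pagination'),
        clickable: true, // bullets jump to their slide
      },
      navigation: {
        nextEl: swiperEl.querySelector('.swiper-button-next'),
        prevEl: swiperEl.querySelector('.swiper-button-prev'),
      },
    });
  });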
</script>]]></content><author><name></name></author><category term="general"/><summary type="html"><![CDATA[BTS on Eastern European Machine Learning Summer School in Sarajevo.]]></summary></entry><entry><title type="html">Migration successful!</title><link href="https://hamzamerzic.info/blog/2025/website-migration/" rel="alternate" type="text/html" title="Migration successful!"/><published>2025-04-12T15:00:00+00:00</published><updated>2025-04-12T15:00:00+00:00</updated><id>https://hamzamerzic.info/blog/2025/website-migration</id><content type="html" xml:base="https://hamzamerzic.info/blog/2025/website-migration/"><![CDATA[<blockquote> <p><strong>TL;DR.</strong> I moved my eight-year-old WordPress-on-DigitalOcean robotics-tools site (mesh cleaner, model viewer, IKFast generator) to a containerized Cloud Run setup. About fifty people a month still use the tools, so the goal was preserving what works and future-proofing it, not rebuilding for its own sake.</p> </blockquote> <p>During my master’s in robotics, I built a few small tools for the parts of model and robot work I kept doing by hand: mesh cleanup, 3D model viewing, inverse-kinematics generation. I Dockerized them and exposed them via <a href="https://wordpress.org/">WordPress</a> on <a href="https://www.digitalocean.com/">DigitalOcean</a>, originally as a way to tighten my own research loop, then opened up because a few other people seemed to want them.</p> <p>Eight years later, the site was still running. To my surprise, over fifty people a month still used the tools. It was time to give the site some attention, without disrupting existing users.</p> <p>What started as a cleanup became a migration to <a href="https://cloud.google.com/run">Google Cloud Run</a>. 
Each tool was already containerized and stateless, so the move was mostly mechanical: I split the services into separate Cloud Run deployments, cleaned up the code, and put a <a href="https://gist.github.com/hamzamerzic/8b834e56d2dc6a8f49bcb4047dd819df">budget guardrail</a> in place that stops serving once my monthly budget is hit. The free tier on Cloud Run is generous enough that the tools should keep working for a long time without me touching them.</p> <p>The original toolbox is still available, now as separate Cloud Run services with the same URLs.</p> <p>Find the tools under <a href="https://hamzamerzic.info/projects/">Projects</a>:</p> <ul> <li>🔧 <a href="https://hamzamerzic.info/mesh_cleaner/">Mesh Cleaner</a>: Clean and process 3D mesh files for physics-based simulations.</li> <li>🧿 <a href="https://hamzamerzic.info/3d-viz/">Model Viewer</a>: Visualize 3D models and robots directly in your browser.</li> <li>🤖 <a href="https://hamzamerzic.info/ikfast/">IKFast Generator</a>: Generate analytic inverse-kinematics solvers from <code class="language-plaintext highlighter-rouge">.dae</code> files using OpenRAVE’s IKFast.</li> </ul> <p>These tools were the part of my master’s I went back to constantly: computing inertial properties for dozens of objects, cleaning up meshes for simulations, and later, during my research assistantship, generating inverse-kinematics solvers for robot manipulators. They were worth the time it took to package them properly back then.</p> <p>If you’re still using any of these, thank you. I hope the migration went smoothly. 
If not, feel free to reach out and let me know what’s broken.</p> <p>For nostalgia, here’s a little album of the old site:</p> <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/swiper@11/swiper-bundle.min.css"/> <style>.swiper{max-width:720px;margin:2rem auto;border-radius:.75rem;overflow:hidden}.swiper-slide img{width:100%;height:auto;aspect-ratio:4 / 3;box-shadow:0 2px 6px rgba(0,0,0,0.05)}.swiper-button-prev,.swiper-button-next{color:var(--global-theme-color);opacity:.4;transition:opacity .3s ease}.swiper-button-prev:hover,.swiper-button-next:hover{opacity:1}.swiper-pagination-bullet{background:var(--global-theme-color)}</style> <div class="swiper mySwiper"> <div class="swiper-wrapper"> <div class="swiper-slide"> <img src="/assets/img/old-site-blog.png" alt="Old site blog"/> </div> <div class="swiper-slide"> <img src="/assets/img/old-site-home.png" alt="Old site home"/> </div> <div class="swiper-slide"> <img src="/assets/img/old-site-tools.png" alt="Old site tools"/> </div> <div class="swiper-slide"> <img src="/assets/img/old-site-mesh-cleaner.png" alt="Old site mesh cleaner"/> </div> <div class="swiper-slide"> <img src="/assets/img/old-site-ikfast.png" alt="Old site ikfast"/> </div> </div> <div class="swiper-button-next"></div> <div class="swiper-button-prev"></div> <div class="swiper-pagination"></div> </div> <script src="https://cdn.jsdelivr.net/npm/swiper@11/swiper-bundle.min.js"></script> <script>
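  // Carousel for the old-site album. It advances every 5 seconds and keeps
  // autoplaying after the reader swipes or clicks.
  const swiper = new Swiper('.mySwiper', {
    loop: true,
    autoplay: { delay: 5000, disableOnInteraction: false },
    spaceBetween: 16, // gap between slides, in px
    pagination: { el: '.swiper-pagination', clickable: true },
    navigation: { nextEl: '.swiper-button-next', prevEl: '.swiper-button-prev' },
  });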
</script> <div class="caption mt-2"> A peek at the OG website. </div>]]></content><author><name></name></author><category term="general"/><summary type="html"><![CDATA[Goodbye WordPress!]]></summary></entry></feed>