The Experience
You are lying on your bed. Your laptop has four terminal windows arranged in a 2x2 grid. Top-left is refactoring authentication. Top-right is writing content. Bottom-left is debugging a deploy. Bottom-right is investigating a production bug. You pick up your phone and open the dashboard. Four tiles, same 2x2 grid. Top-left tile maps to top-left terminal. Bottom-right tile maps to bottom-right terminal. Each tile has a dot. Green means that agent is working. Red means it is done. Yellow means it needs you.
You see a red dot in the top-right tile. You already know what that is without reading the label. It is the content agent, because that is where the content terminal sits on your screen. You tap it and type: "Start the next chapter." The dot turns green. You glance at the other three tiles. All green. You put the phone down.
Twenty minutes later you check again. Two green, one red, one yellow. The yellow one is bottom-left. You know it is the deploy debugger because that is where you put it. You read its question, type a two-sentence answer, and the dot turns green. The red one finished its task. You type a new one. Four agents, all visible, all reachable, from your phone.
Open the live dashboard to see it running.
What It Is
Hive is a local daemon that runs on your machine. It discovers every Claude Code instance you have open, shows their status on a dashboard, and gives you a way to message any of them from your phone or a second screen. The dashboard tiles mirror your terminal layout. Top-left terminal, top-left tile. You manage agents by where they sit, not by what they are called.
The thing that makes running four agents useful is that each one builds up its own context over time. One is an hour deep into an authentication refactor and knows every file it has touched. Another has been debugging a deploy for thirty minutes with a mental model of the infrastructure. These are not four workers splitting one task. They are four independent contexts, each deeply embedded in a different problem. A fresh agent starts from zero. An agent that has been working for an hour starts from everything it already knows. The dashboard is how you keep track of all of them without re-reading everything each time you switch your attention.
The whole system was built using the agents it manages. Four Claude Code instances working on the daemon, the dashboard, and each other's output while I directed what to build and resolved conflicts. They wrote all the code.
The Gap
Open Claude Code right now. It works well. You give it a task, it reads your codebase, runs commands, writes code, iterates. Within a single terminal, the experience is strong and getting stronger every quarter. Now open a second instance. And a third. And a fourth.
There is no management layer. No way to see which agent is working and which one finished. No collision prevention when two agents edit the same config file. No mechanism for one agent to ask another about the API schema it just designed. You are left alt-tabbing between terminals, manually copying context from one window to another, hoping nothing breaks while you are looking at a different screen.
Every major AI lab is racing to build smarter agents. Better reasoning, longer context windows, more autonomy. Anthropic, OpenAI, Google, all of them. The models improve every quarter. But none of them are shipping the infrastructure for running fleets of these agents across real projects with real coordination needs. They build the worker. Nobody builds the foreman's tools.
This is the same gap that existed before Kubernetes. Docker made containers easy to run. Running twenty of them across a distributed system with health checks, restart policies, and resource isolation required infrastructure Docker never shipped. Containers got smarter every year. Kubernetes got more necessary, not less. The same dynamic is playing out with AI agents right now.
How It Works
The daemon runs locally, managed by launchd. It has an API for coordination, a WebSocket server for the live dashboard, and a tunnel that makes the dashboard accessible from any device. It snapshots its state every 30 seconds, so if you restart your computer, the routing and queues restore automatically.
Auto-Discovery
Open a terminal and run Claude Code. Within three seconds, it shows up on the dashboard. No registration, no config. The daemon scans running processes, finds each one's working directory and session file, and adds it. Close the terminal, it disappears. Open a new one, it appears.
Spatial Mapping
Arrange four terminal windows in a 2x2 grid. Open Claude Code in each one. Hive assigns each terminal a quadrant based on when you opened it: first becomes Q1 (top-left tile), second Q2 (top-right), third Q3 (bottom-left), fourth Q4 (bottom-right). The dashboard mirrors your screen.
When you see a yellow dot in the bottom-left tile on your phone, you already know which terminal that is because you can see the bottom-left terminal on your laptop. The position is the label. Close a terminal and its tile disappears. Open a new one and it fills the next slot.
Status Detection
Each tile shows green, red, or yellow. Getting the color right takes a three-layer detection system: real-time hooks from Claude Code's tool lifecycle, pattern analysis of session log files, and CPU-based idle detection as a fallback. The layers cross-validate each other. One method alone produces false positives. Three together produce a reliable signal.
Auto-Pilot
Claude Code pauses for permission prompts. "Allow this bash command?" "Approve this file edit?" With one terminal, you click yes and move on. With four agents, these prompts stack up. One agent pauses, you do not notice for ten minutes, and an hour of potential output is lost.
Auto-pilot handles it. Routine prompts auto-approve within a three-second grace window. The dashboard shows the prompt briefly so you can override if needed. If you do nothing, the agent continues. A watchdog also monitors for stuck loops where an agent keeps calling the same tool repeatedly, and flags it on the dashboard so you can step in.
Coordination
Four agents working in shared codebases will eventually collide. Two editing the same file. One designing an API another needs to consume.
Hive handles this with a few primitives. Inter-agent messaging lets any agent send a prompt to any other. Advisory file locks let an agent claim a file and get a warning if someone else holds it. A shared scratchpad stores working context that auto-expires. An artifact tracker records which agent modified which files. And a conflict detection API checks whether another agent recently touched a file you are about to edit.
There is also a task queue. Push a task and the daemon dispatches it to the next idle agent. You can tag related tasks with the same workflow ID, and when one step finishes, the next agent automatically gets a summary of what was done and which files changed before starting its task.
Compound Learning
An agent spends twenty minutes debugging a path resolution bug. It finds the fix. The session ends. Tomorrow, a different agent hits the same problem and spends another twenty minutes arriving at the same solution.
Hive's learning system prevents this. When an agent solves something non-obvious, it writes the lesson to a file in the project. Every future agent reads that file before starting. The lessons compound across sessions and across agents. After months of running, the system knows things about your projects that no fresh agent could replicate.
Everything Else
Talk to agents from anywhere. Tap any tile, type a message, and it goes straight to that agent's terminal. If the agent is busy, messages queue up and drain automatically when it is ready. The 2x2 grid on your phone matches your screen, so you always know which agent you are talking to.
Flag agents for later. Each tile has a small circle in the corner. Tap it and the tile turns orange. Use it to mark which agent you want to come back to, or which one has something you have not reviewed yet. Tap again to unflag.
Walk away. Start four agents, close your laptop, go do something else. Come back and the dashboard shows you what happened. Green tiles kept working. Yellow tiles are waiting. Red tiles finished. You pick up where things paused without re-reading anything.
Put it on any screen. Open the dashboard on a tablet, a second monitor, or your phone. The colored dots update in real time. When one turns yellow, you notice it without switching windows. Install it as an app on your phone and it runs full-screen like a native app.
Push notifications. When an agent goes yellow, macOS sends a native notification with the project name and what it needs. You do not have to be looking at the dashboard to know something needs attention.
Four Contexts, Not Four Workers
Watch someone use a single Claude Code instance on a hard problem. It spawns internal subagents to parallelize: one reading files, one searching the codebase, one running tests. All of them sharing one context window, one session, one project. When the task finishes, every subagent's context disappears. This is good. Claude already handles internal delegation well and will only get better at it.
Now watch someone run four instances across four projects. One is an hour deep into an authentication refactor, with full context on every file it has touched. Another has been debugging a deployment pipeline for thirty minutes, with a mental model of the infrastructure built step by step. A third is writing content, tuned to the site's voice after reading the style guide and ten existing articles. A fourth is iterating on a video processing pipeline, aware of every edge case from previous runs.
These are not four workers splitting one task. They are four independent contexts, each deeply embedded in a different problem. A subagent spawned fresh starts from zero. An agent that has been working in a codebase for an hour starts from everything it already knows. That distinction is the entire point.
The spatial mapping is what makes moving between them fast. Q1 is not a name you memorize. It is the top-left terminal on your screen and the top-left tile on your phone. You see a yellow dot in the top-right, and you already know it is Q2 because that is where you put the content project an hour ago. The position carries the context. You do not look up which agent is which. You just look at where it sits.
The human moves between them. You check the dashboard, see that Q2 is stuck, read its question, and realize the answer is in Q1's project. You tell Q1: "What is the API schema you designed for the auth endpoint?" Q1 already knows, because it spent the last hour building it. It responds from deep context, not from a cold read of the codebase. You relay that to Q2. Or Q2 messages Q1 directly through Hive's coordination layer, and Q1 answers from the same deep context. Either way, the knowledge transfer happens between two agents that each have genuine understanding of their own domain. Not between a parent and a disposable subagent.
This is where the delegation question resolves. Yes, agents can dispatch tasks to each other through the task queue. But the valuable dispatch is not "split this task into four subtasks." A single Claude instance already does that internally. The valuable dispatch is: Q3 just changed the data schema in one project. Q1 is building the dashboard that consumes that data in a different project. Q1 needs to know. That is a cross-context bridge. No single instance holds both contexts. The human does, or the coordination layer does, but the agents individually do not.
Eventually, AI labs will ship native multi-instance awareness. Agents might share context across sessions automatically, know what other instances are working on, avoid collisions without advisory locks. That eliminates the manual bridging, not the workflow. You still want four deep contexts running across your portfolio. You still want visibility into all of them. You still want the learning to persist across sessions. What changes is how much the system handles automatically versus how much you handle by hand. The infrastructure stays. The overhead shrinks. The pattern holds.
Orchestration Is the Edge
Watch what happens during a real session. Four terminals are open. You type a single prompt into Q1: "Act as a full-scale auditor. Tell Q2 to act as a senior developer. Tell Q3 to act as an investor. They should each go deep into their roles, conduct their analysis, and feed their findings back to you. Your audit should read their responses, synthesize, and pass the complete package to Q4 for final formatting."
One sentence from a human. The system decomposes it. Q1 receives its role assignment and dispatches focused messages to Q2 and Q3 through the coordination API. Q2 drops into the codebase with developer eyes, auditing architecture, scalability bottlenecks, security gaps, and migration paths. Q3 reads the published articles and competitive landscape with investor eyes, evaluating market position, defensibility, and gaps. Both feed structured reports back to Q1. Q1 reads both, reconciles the technical reality with the market thesis, and dispatches a synthesis prompt to Q4. Q4 takes all of it and produces a formatted deliverable. Four agents, four roles, one coordinated output. The human typed one sentence.
No existing tool does this. Omnara, the closest competitor, is a monitoring dashboard. It shows you what agents are doing. It sends you a push notification when one needs attention. You respond manually. There is no dispatch, no role assignment, no inter-agent message routing, no auto-approval of routine prompts, no task queue, no file collision prevention. It watches agents. Hive directs them.
CrewAI lets you define agent roles and task flows in Python. You write code to set up roles, define delegation chains, specify output formats. That works for developers building automated pipelines. It does not work for someone who wants to type "audit this from four angles" and have it happen. The orchestration layer that takes natural language, decomposes it into structured directives, and routes each part to the right agent with the right role does not exist in any shipping product.
This is the real gap. Not visibility. Not status detection. The ability to take a complex directive, split it across multiple agents with different expertise, coordinate their outputs, and synthesize the results. A prompt compiler that turns one human sentence into a multi-agent workflow. The dashboard is how you see it happening. The orchestration engine is what makes it happen.
Where It Fits
There are three layers in the emerging agent stack. The bottom layer is the agents themselves: Claude, GPT, Gemini. Anthropic, OpenAI, and Google are pouring resources into making these smarter. Better reasoning, longer context windows, tool use, computer use. This layer is improving fast and will continue to improve. It is not something any individual needs to build.
The top layer is the thesis for why agent fleets matter at all. The argument that one person directing multiple agents produces the output of a team. That the PM role becomes an agent operator role. That the human serves as the bridge between independent AI workers who cannot see each other. This layer is written and published.
The middle layer is the infrastructure for actually running fleets of agents on real projects with real coordination needs. Status visibility, collision prevention, dispatch automation, compound learning. This is the layer that does not exist yet. The AI labs are focused on making individual agents better. The application layer is building single-agent products. The operational infrastructure for multi-agent work is an open gap.
Hive sits in that middle layer. Not smarter agents. Not a new AI product. The plumbing that lets one person run as many agents as possible and not lose track of what is happening.
The broader thesis connects here. Every person around you has signal that would make AI output better. Someone in finance sees incentives. Someone in operations sees bottlenecks. Someone in design feels what is wrong before naming it. Tech Is Not Tech Anymore traces the adoption gap, The Compound traces the extraction pattern, The Positive Loop traces the acceleration. None of those people use AI because the interface is a terminal. Fleet orchestration is the mechanism for channeling all of that signal, different agents with different roles synthesizing different perspectives into one output. But orchestration is invisible if the interface requires engineering. Hive is the layer that makes the mechanism accessible. Stoplights instead of terminals. A phone instead of a laptop. The orchestration disappears. What remains is a person directing work in plain English.
The system gets smarter from two directions at once. From below, the models improve. Context windows grow from 128k to 200k to a million tokens. Each agent can hold more of the project in its head, work longer without losing track, handle more complex tasks autonomously. That is the AI labs' job. They pour billions into making the engine faster. Every quarter, the agents you are running become more capable without you changing anything.
From above, the personal context compounds. Every editing session adds corrections to the writing-voice file. Every debugging session writes a lesson back to the project's knowledge base. Every article benchmarks against the gold standard from two months ago. The agents start each session from everything that was learned before. A fleet that runs for six months knows things about your projects, your voice, your patterns that no fresh agent could replicate. The model gets smarter because the lab improved it. The system gets smarter because it remembers what happened last Tuesday.
These two forces multiply. A smarter model reading a thicker knowledge base produces better output than either force alone. And both forces make the management layer more necessary, not less. The smarter the agents get, the more of them you run. The more personal context accumulates, the more valuable each session becomes. Four agents today, ten next year, twenty when context windows and hardware allow it. The infrastructure that lets you manage the fleet is what makes scaling the fleet possible.
The End State
Right now Hive runs four Claude Code instances on one machine. That is the proof of concept. The end state is bigger than Claude, bigger than Anthropic, bigger than any single model provider.
Think about Find My iPhone. You open one app and see every Apple device you own on a map. Green dot, it is online. Grey dot, it is offline. You do not open a separate app for your MacBook, a different one for your iPad, a third for your AirPods. One visual layer shows you everything. The value is not in the devices. It is in the map.
Now replace devices with AI agents. You are running a Claude instance refactoring your backend. An OpenAI Codex agent writing tests in a different repo. A Gemini agent researching competitors. Each one came from a different provider, uses a different API, runs in a different environment. Today, you alt-tab between three separate interfaces with three separate mental models. No shared view. No coordination. No way to glance at one screen and know what all of them are doing.
The end state is a universal visual layer. One dashboard. You hit "Add Instance" and pick a provider: Anthropic, OpenAI, Google, whoever ships next quarter. The agent appears as a tile. Green means working. Red means idle. You type into the tile and your message goes to that agent, regardless of which company built it. The coordination layer underneath handles status detection, file locks, conflict prevention, compound learning, all of it, across providers.
This is not a wrapper around APIs. It is the management interface that sits above all of them. The same way Kubernetes does not care whether your container runs Node or Python or Go. It cares whether the container is healthy, whether it needs to be restarted, whether it is consuming the resources you allocated. Hive does not care whether the agent is Claude or GPT or Gemini. It cares whether the agent is working, whether it is stuck, whether it is about to collide with another agent on the same file.
Every AI lab is racing to build the best individual agent. None of them are building the layer that lets you run agents from multiple providers simultaneously and manage them as a fleet. They have no incentive to. Anthropic wants you using Claude. OpenAI wants you using GPT. Google wants you using Gemini. The visual layer that treats all of them as interchangeable workers in a single fleet is not a product any of them will ship. It is a product that sits above all of them.
Think about it like driving. Before cars existed, nobody knew how to drive. You did not fix that by making the engine more powerful. You fixed it by building the car: a steering wheel, a speedometer, mirrors, a dashboard. The engine matters. But the interface between the human and the engine is what turned driving from a mechanic's skill into something anyone could do. Every AI lab is building a faster engine. Nobody is building the car.
There is a more recent version of this pattern. Robinhood did not invent options trading. Options existed for decades. Wall Street understood them. The instruments were complex, the interfaces were Bloomberg terminals, and the barrier to entry was expertise. Robinhood wrapped the same complexity in a visual layer anyone could use. Green means up, red means down, swipe to trade. The underlying financial instrument did not get simpler. The interface did. Duolingo did the same thing with language learning. The grammar did not change. Streaks and XP made it feel like a game instead of homework. The learning stayed the same. The surface transformed who could access it.
Agent orchestration is in that exact moment right now. The orchestration exists. File locks, task queues, status detection, conflict prevention, message routing. Engineers access it through Python frameworks and CLI tools. Everyone else opens four terminal windows and hopes nothing breaks. The dashboard is the Robinhood layer for AI orchestration. It does not simplify the orchestration. It wraps it in a visual surface that determines who can use it.
That is where this goes. The proof of concept is four Claude instances on a MacBook with a stoplight dashboard on your phone. The product is the first car for AI labor. Provider-agnostic. A steering wheel that turns natural language into fleet operations. A speedometer that shows status as color, not text. Mirrors that give you spatial memory across every running agent. The interface that lets any person, not just engineers, direct any combination of agents from any company through one screen. Green means driving. Red means stopped. Type into the tile. The agent moves.
What It Does Not Do
It does not make the AI smarter. The models still hallucinate. They still drift when context fills up. They still need restarting when a task runs too long. The dashboard does not fix any of that. It just makes those problems visible sooner. You see the drift happening instead of discovering it twenty minutes later in a log. You see the stall instead of wondering why nothing is moving.
The biggest limitation is context. When an agent's context window fills, it compacts memory and starts losing the thread. You can see this on the dashboard because the agent starts behaving differently. Longer yellow states, more frequent stops, output that drifts. You learn to recognize that pattern. Something is off. You restart the agent with fresh context. It picks up.
This is also not for everyone. You need to be running multiple AI agents to get anything out of it. If you are using AI one conversation at a time, you do not need this. It is for the situation where you have crossed into running parallel work and you are drowning in terminal output.
Try It
It has helped my workflow a lot. I can see what my agents are doing while they work, tell when something looks off, and send a message to correct it from my phone. The whole loop is describe, watch, adjust.