Inline Relay

A Claude Code plugin that moves code review conversations into the source file itself.

Most code review tools assume the conversation lives somewhere other than the code: GitHub PRs and issues, Linear comments, Slack threads, Google Docs. The discussion is separated from the artifact, and the artifact is what you actually need to change. inline-relay moves the conversation into the file.

You leave a question as a tagged comment. Claude reads it, decides what to do, and adds a response in another tagged comment indented underneath. The whole thread is plain text inside the source file, versioned with git, visible in diffs, and mechanically formatted by an MCP server so it cannot get into a corrupt state.

This page walks through how the plugin works, the design choices that surprised me, and the closely related variant of this idea that uses draft pull requests as the review surface instead.

Why an MCP Server, Not a Prompt

I tried to do this with prompting first. A careful system prompt, explicit formatting rules, reminders on every turn. It did not work. No amount of instruction got the agent to follow the thread procedure exactly and every time. Getting it slightly wrong was close to the default outcome, not the exception: the formatting would drift, threads would land in states the parser could not read, and the conversation would quietly corrupt.

So the procedure moved out of the prompt and into an MCP server that owns thread formatting outright. The agent never edits thread structure directly. It calls a tool, and the tool writes the canonical form. Deterministic enforcement succeeded exactly where careful prompting failed.

Newer models follow instructions better than the ones I started with, but that has not made the enforcement redundant. I still use inline-relay at work, and the mechanical guarantee still earns its place. When the cost of one corrupt thread is a confused agent and a manual cleanup, “usually formats it right” is not good enough.

What a Thread Looks Like

[Figure: an inline-relay source file showing an AUTHOR question with an AGENT reply threaded beneath it, and a second thread requesting a code change.]

A thread starts as a single // AUTHOR: comment followed by a trailing empty // cursor. When Claude processes it, it appends an // AGENT: reply and a fresh empty cursor, exactly like the top thread above. If you write anything in that trailing cursor, the thread is awaiting_agent again; leave it empty and it is awaiting_author, and Claude leaves it alone.

The bottom thread is a change request. A past-tense rule forces the agent to do the work before it answers, so the reply is a report (“done. extracted…”) rather than a promise.

A Few Ways I Use It

Ask a design question right next to the code and let the agent answer in place, as in the top thread above.
Request a change. The agent makes the edit and reports back in the same file, in past tense.
Resolve and commit. Writing done, reset, or commit in the cursor triggers a tool action instead of a reply. commit clears the resolved threads and stages the cleanup in one step.

Architecture

Content-Addressable Thread IDs

Thread IDs are derived from SHA256(file_path + first_comment_text)[:8], not from line numbers. This keeps IDs stable across file modifications. When lines shift up or down because of edits above a thread, find_thread_location() re-locates the thread by searching for the text of its first comment rather than relying on cached positions.

When two threads in the same file open with identical first-comment text, their IDs collide. A salt tag ([salt:xxxx]) is injected automatically to make the IDs unique, and the threads are then re-scanned.
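A minimal sketch of the ID scheme, to show why line drift does not matter. The hash formula comes from the description above; everything else is an assumption for illustration: the function names, the UTF-8 encoding, the way the salt tag is appended to the comment text, and locate_by_text, which is a hypothetical stand-in for find_thread_location() and not its real signature.

```python
import hashlib
from typing import Optional


def thread_id(file_path: str, first_comment_text: str) -> str:
    """First 8 hex characters of SHA-256 over path + first comment text."""
    payload = (file_path + first_comment_text).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:8]


def locate_by_text(lines: list[str], first_comment_text: str) -> Optional[int]:
    """Return the index of the first line containing the text, or None."""
    for index, line in enumerate(lines):
        if first_comment_text in line:
            return index
    return None


def salted(first_comment_text: str, salt: str) -> str:
    """Assumed salting scheme: append a [salt:xxxx] tag to the comment text."""
    return f"{first_comment_text} [salt:{salt}]"


question = "why is this cached?"
before = ["x = 1", f"// AUTHOR: {question}", "//"]
after = ["import os", "", *before]  # two lines inserted above the thread

# The ID depends only on path and text, so it is unchanged by the edit,
# while the location is recomputed by searching for the text.
same_id = thread_id("src/app.js", question)
moved_from, moved_to = locate_by_text(before, question), locate_by_text(after, question)
```

Here moved_from is 1 and moved_to is 3, while same_id is identical for both versions of the file. Salting the text changes the hash input, which is what separates two threads that would otherwise collide.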

Status Detection

Three thread states determine what the agent does next:

Thread ending with non-empty author text means awaiting_agent, and Claude should respond.
Thread ending with an empty author cursor means awaiting_author, and Claude does nothing.
Exact matches for done, reset, or commit in the cursor trigger action_required with tool-specific guidance.
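The three rules above can be sketched as a single function over the text of the trailing cursor. This is an illustration, not the plugin's code: detect_status is a hypothetical name, and stripping whitespace and matching commands case-sensitively are assumptions.

```python
ACTION_COMMANDS = {"done", "reset", "commit"}


def detect_status(cursor_text: str) -> str:
    """Classify a thread by the text in its trailing author cursor."""
    text = cursor_text.strip()  # assumption: surrounding whitespace is ignored
    if not text:
        return "awaiting_author"  # empty cursor: Claude does nothing
    if text in ACTION_COMMANDS:
        return "action_required"  # exact match only, never a prefix match
    return "awaiting_agent"  # any other text is a message for Claude
```

The set lookup is what keeps "done with refactoring" an ordinary message: only the whole cursor text is compared, so a command word inside a longer sentence never triggers an action.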

MCP Server

A FastMCP-based server with four tools:

get_threads: scan files, find all conversation threads, detect status, report cross-repo thread counts.
respond_to_thread: add a response with mechanical formatting (correct indentation, trailing placeholder).
dismiss_thread: remove a completed thread from the code.
clear_and_commit: remove all resolved threads and commit the cleanup.
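To make "mechanical formatting" concrete, here is a rough sketch of the kind of pure function a tool like respond_to_thread could wrap. The exact layout is an assumption (a // prefix, a two-space indent before the AGENT tag, deeper indentation for continuation lines), and format_reply is a hypothetical name. The point is only that the agent supplies text and the tool decides every character of structure.

```python
def format_reply(reply: str, indent: str = "  ", prefix: str = "//") -> list[str]:
    """Return comment lines for an agent reply followed by an empty cursor."""
    body = [line.strip() for line in reply.splitlines() if line.strip()]
    if not body:
        raise ValueError("reply must not be empty")

    # First line carries the role tag, indented under the author comment.
    lines = [f"{prefix} {indent}AGENT: {body[0]}"]
    # Continuation lines get one more indent level (assumed layout).
    lines.extend(f"{prefix} {indent}{indent}{line}" for line in body[1:])
    # A fresh empty cursor always closes the thread.
    lines.append(prefix)
    return lines


formatted = format_reply("done. extracted the helper.\nadded a test for it.")
```

Because the trailing cursor is appended unconditionally, a reply can never leave the thread without one, which is the class of drift that prompting alone did not prevent.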

Edit Guards

Pre-tool-use hooks prevent Claude from accidentally modifying or deleting thread markers during normal editing. The guards recognize the role-tagged comment patterns and block Edit/Write tool calls that would destroy in-progress conversations. The MCP server has exclusive write access to thread content.

Plugin self-editing is detected and exempted, which is necessary for developing the plugin’s own documentation.
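The core check behind a guard like this can be sketched as a comparison of role-tagged lines before and after a proposed edit. This is a simplified illustration under stated assumptions: only //-style comments are recognized, the AUTHOR and AGENT tags are the only roles, and edit_touches_thread is a hypothetical helper rather than the plugin's hook code. Wiring it into a pre-tool-use hook is left out.

```python
import re

# Assumed marker shape: optional indentation, //, then a role tag.
THREAD_MARKER = re.compile(r"\s*//\s*(?:AUTHOR|AGENT):")


def marker_lines(text: str) -> list[str]:
    """Collect every role-tagged comment line, ignoring indentation."""
    return [line.strip() for line in text.splitlines() if THREAD_MARKER.match(line)]


def edit_touches_thread(old_text: str, new_text: str) -> bool:
    """True if an edit adds, removes, or rewrites any role-tagged line."""
    return marker_lines(old_text) != marker_lines(new_text)


original = "total = 0\n// AUTHOR: why zero?\n//\n"
safe_edit = "total = 1\n// AUTHOR: why zero?\n//\n"
destructive_edit = "total = 1\n"
```

With these inputs, edit_touches_thread(original, safe_edit) is False and edit_touches_thread(original, destructive_edit) is True. This sketch does not protect the empty cursor line, which a fuller guard would also need to track.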

Quality Enforcement

Three rules enforce a writing quality bar at format-check time:

Past-tense forcing. Future-tense detection warns when responses contain “will add” or “going to fix.” This encourages a “do the work first, then respond” workflow rather than promises.
Duplicate response blocking. If a thread is awaiting the developer and Claude tries to respond with the same text as its previous response, the duplicate is blocked.
Inline comment normalization. Cases where a role-tagged comment appears on the same line as code are mechanically moved to a new line above.
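Two of these rules are easy to picture as small text checks. The sketch below is an approximation under stated assumptions: the future-tense pattern covers only the two phrasings quoted above, comments are assumed to use //, and both function names are hypothetical.

```python
import re

# Only the two phrasings named in the rule; a real check may cover more.
FUTURE_TENSE = re.compile(r"\b(?:will|going to)\s+\w+", re.IGNORECASE)

# Code followed by a role-tagged comment on the same line.
INLINE_TAG = re.compile(
    r"^(?P<indent>\s*)(?P<code>\S.*?)\s*(?P<comment>//\s*(?:AUTHOR|AGENT):.*)$"
)


def future_tense_phrases(reply: str) -> list[str]:
    """Return each phrase that reads as a promise rather than a report."""
    return [match.group(0) for match in FUTURE_TENSE.finditer(reply)]


def normalize_inline(line: str) -> list[str]:
    """Move a trailing role-tagged comment onto its own line above the code."""
    match = INLINE_TAG.match(line)
    if match is None or match.group("code").startswith("//"):
        return [line]  # no inline tag, or the line is already a comment
    indent = match.group("indent")
    return [indent + match.group("comment"), indent + match.group("code")]


warnings = future_tense_phrases("I will add a test and am going to fix the bug.")
moved = normalize_inline("    total = 0  // AUTHOR: why zero?")
```

Here warnings holds the two flagged phrases and moved places the comment above the code at the same indentation. A string literal that happens to contain a role tag would fool this pattern, so treat it as a starting point.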

Testing

85 tests across 21 test classes cover thread scanning, response formatting, dismissal logic, and edge cases. The interesting ones:

Line drift resilience. Thread operations survive line number changes from edits above.
Duplicate detection and auto-salting. Content-addressable IDs with collision handling.
Action command parsing. Exact match vs. partial match. “done” triggers an action, “done with refactoring” does not.
Plugin directory protection. Prevents the scanner from picking up the plugin’s own files, which would risk an infinite loop.
Hook plugin detection. Edit guard logic for allowing plugin self-edits during development.

The PR-Based Variant

In a previous role, I worked on a variant of this same concept that uses draft pull requests as the review surface instead of inline source comments. The reviewer opens a draft PR, leaves review comments through GitHub’s UI, and Claude responds through the GitHub review API. This turns out to be even more natural in some ways. The PR diff view is already designed for exactly this kind of conversation, and the threading is built into GitHub’s UI.

Both approaches share the same core insight: code review should happen where the code is, with mechanical formatting that prevents thread corruption. inline-relay is the open-source expression of that pattern, simplified to what fits inside a source file.

What to Take From This

If you are designing review tooling for an AI-augmented codebase, two things from inline-relay are worth lifting.

Put the conversation in the artifact. Anywhere else, and the conversation goes stale or gets lost. In the file, it is naturally next to the code being discussed, naturally versioned with the code, and naturally visible in diffs.

Make the format mechanical, not editorial. Humans are bad at consistent formatting, and in my experience agents are too. Hooks and MCP servers are good at it. If you let the agent freely format thread responses, threads drift into states the parser cannot understand. Lock the format down at the tool layer.

The full source is on GitHub at synodic-studio/inline-relay, MIT-licensed.