Field equipment for AI agents

Inference got 100× faster. That doesn’t make software faster — it makes new software possible. Six open-source CLIs built for the new physics.

recon ask "what is prospera" --json

Talk to the page

Inference is now fast enough to redesign a page while you're looking at it. gemma-4-31b on Cerebras is wired into this one — type a command below and it restyles the page in real time, with an honest latency receipt on every run. Real inference, live, no tricks.

// tool-call trace — real model output, real elapsed-ms offsets from submit

Speed is a material

60 tokens a second is a chatbot. 1,500 is a page that repaints before your finger leaves the key.

Typical GPU stream · 60 tok/s

done in 2.47 s

Inference got 100× faster. That doesn’t make software faster — it makes new software possible. Six open-source CLIs built for the new physics. 1,500 tok/s — Cerebras measured throughput. 14.4 s — recon: source-verified research brief, 19 supported claims, $0.15. 98 s — lens: 1,100/1,100 images indexed for $2.21; search in ~2 s. <$20 — inference cost to build each tool, one evening each. 528 ms — ambient copilot tick — full context read + suggestion. A research brief in 14.4 seconds. Claims verified, sources attached. 1,100 images indexed in 98 seconds. Any photo found in two. Six tools that assume inference is effectively free and instant. JSON out, stable exit codes, self-describing offline.
Cerebras, measured · 1,500 tok/s

done in 0.10 s


Side-by-side word-chunked streams calibrated to true token rates: a typical GPU at 60 tokens per second versus Cerebras, measured at 1,500 tokens per second, streaming identical sample text. The fast pane completes in 0.10 s; the slow pane completes in 2.47 s.

Word-chunked rendering at ≈1.3 tokens per word; rates are true token rates.
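
The two completion times follow from simple arithmetic on the stated rates. Here is a minimal sketch, assuming the sample text is 114 whitespace-separated words (a count of the demo text, not a figure published on this page) and that the ≈1.3 tokens-per-word factor is applied uniformly. `stream_seconds` is a hypothetical helper name.

```python
def stream_seconds(words: int, tokens_per_second: float,
                   tokens_per_word: float = 1.3) -> float:
    """Time to stream `words` words at a given true token rate."""
    tokens = words * tokens_per_word
    return tokens / tokens_per_second


# Assumption: whitespace-split word count of the sample text.
SAMPLE_WORDS = 114

slow = stream_seconds(SAMPLE_WORDS, 60)      # typical GPU stream
fast = stream_seconds(SAMPLE_WORDS, 1500)    # Cerebras, measured

print(f"{slow:.2f} s vs {fast:.2f} s")  # prints "2.47 s vs 0.10 s"
```

Under those assumptions the text is about 148 tokens, and the ratio between the two panes is 1,500 / 60 = 25.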

live run unavailable — simulation

The kit

Six tools, open source, measured. Numbers below are from live acceptance runs — nothing projected.

P1

recon

A research brief in 14.4 seconds. Claims verified, sources attached.

14.4 s · $0.15 · 19 supported claims (live)

cargo install --git https://github.com/treygoff24/recon

GitHub

P2

lens

1,100 images indexed in 98 seconds. Any photo found in two.

1,100/1,100 images · 98 s · $2.21 · search ~2 s

cargo install --git https://github.com/treygoff24/lens

GitHub

P3

exa-agent

The full Exa API — search, contents, research, websets — as one token-lean CLI. 68 commands where the MCP hands your agent a dozen, at a fraction of the context cost.

68 commands · self-describing offline

GitHub

P4

elv

The entire ElevenLabs API as a CLI — speech, cloning, dubbing, agents, 300+ operations. The MCP gives your agent ten of them; elv gives it all of them, for fewer tokens.

300+ operations, one envelope

GitHub

P5

delegate

Spawn subagents on any model from any provider — Codex, Gemini, Grok, DeepSeek, more — from one dispatcher. Isolated workspaces, reviewable diffs, token-efficient by design.

6+ model families · safe/work isolation

pip install "delegate-agent @ git+https://github.com/treygoff24/delegate-agent.git"

GitHub

P6

law

Legal research built for agent workflows: case law and statutes at tool speed, every citation verified against cases that actually exist.

agent-first citations, checked

GitHub

The fastest tools your agent will touch

Speed is the whole advantage: a tool that answers in seconds stays in the loop; one that answers in minutes gets designed around. So we timed everything. Every figure below is from a live acceptance run: real data, real APIs, nothing rounded up.

All figures from live acceptance runs, 2026-07-01. Local builds, real data, real APIs.

One contract

Every Fieldcraft tool speaks the same envelope: JSON out, stable exit codes, self-describing offline, budget-metered. An agent learns the whole interface in one read.

  • One JSON envelope for every response
  • Stable, documented exit codes
  • Self-describing offline — schema, flags, errors from the binary itself
  • Budget-metered — every call reports what it cost
  • The agent is the primary user; the human reviews

recon ask "what is prospera" --json

{
  "schema": "recon.cli.response.v1",
  "ok": true,
  "command": "ask",
  "data": {
    "question": "what is prospera",
    "outcome": "answered",
    "claims": [
      {
        "claim": "Próspera is a ZEDE in Honduras.",
        "sourceUrl": "https://example.com/source",
        "verdict": "supported",
        "published": "2026-07-01"
      }
    ],
    "searchTrail": [{ "query": "prospera law", "results": 4 }],
    "uncertainties": []
  },
  "costDollars": { "model": 0.09, "search": 0.04, "total": 0.13 },
  "diagnostics": { "durationMs": 12100, "retries": 0 }
}

Exit codes:

  • 0: ok
  • 2: auth
  • 4: network
  • 6: rate limit
  • 10: partial (budget hit, work reported)
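
An agent-side caller might consume this contract roughly as follows. This is a sketch under stated assumptions, not the tools' own client code. The field names and exit codes are taken from the sample above. The helper names `interpret` and `ask`, the "unknown" fallback, and the assumption that an envelope is printed on both ok (0) and partial (10) runs are illustrative.

```python
import json
import subprocess

# Exit codes as listed on this page; anything else is treated as unknown.
EXIT_MEANINGS = {0: "ok", 2: "auth", 4: "network", 6: "rate limit", 10: "partial"}


def interpret(exit_code: int, stdout: str) -> dict:
    """Reduce one CLI result to a summary an agent loop can branch on."""
    summary = {
        "status": EXIT_MEANINGS.get(exit_code, "unknown"),
        "cost": 0.0,
        "supported": [],
    }
    # Assumption: an envelope is printed on ok (0) and partial (10) runs.
    if exit_code in (0, 10) and stdout.strip():
        envelope = json.loads(stdout)
        summary["cost"] = envelope.get("costDollars", {}).get("total", 0.0)
        claims = envelope.get("data", {}).get("claims", [])
        summary["supported"] = [
            c["claim"] for c in claims if c.get("verdict") == "supported"
        ]
    return summary


def ask(question: str) -> dict:
    """Run `recon ask <question> --json`; requires recon on PATH."""
    proc = subprocess.run(
        ["recon", "ask", question, "--json"],
        capture_output=True, text=True,
    )
    return interpret(proc.returncode, proc.stdout)


# Offline check against a trimmed copy of the sample envelope.
SAMPLE = json.dumps({
    "schema": "recon.cli.response.v1",
    "ok": True,
    "command": "ask",
    "data": {"claims": [
        {"claim": "Próspera is a ZEDE in Honduras.", "verdict": "supported"},
    ]},
    "costDollars": {"model": 0.09, "search": 0.04, "total": 0.13},
})

result = interpret(0, SAMPLE)
print(result["status"], result["cost"], len(result["supported"]))
```

Keeping `interpret` separate from the subprocess call lets the parsing logic be tested without the binary installed. Which statuses are worth retrying (for example network or rate limit) is a policy choice left to the caller.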

The method

The tools build the tools

Fieldcraft started as a bet in January 2025: an agent loop that couldn't declare itself done until the work actually was. That became the autonomous loop — the /goal architecture, completion enforced by a Stop hook — shipped long before Anthropic or OpenAI offered anything like it natively.
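
The idea of a loop that cannot declare itself done can be illustrated with a small sketch. This is not the Fieldcraft implementation or any vendor's hook API. `run_until_verified`, `step`, and `verify` are hypothetical names, and the gate is simply an independent check that must pass before the loop is allowed to stop.

```python
from typing import Callable


def run_until_verified(
    step: Callable[[int], bool],
    verify: Callable[[], bool],
    max_turns: int = 20,
) -> int:
    """Run `step` until an independent `verify` passes; return turns used.

    `step` returns True when the agent *claims* it is done. The claim alone
    never ends the loop: the stop gate re-checks the work and sends the
    agent back for another turn if the check fails.
    """
    for turn in range(1, max_turns + 1):
        claims_done = step(turn)
        if claims_done and verify():
            return turn
    raise RuntimeError(f"not verified after {max_turns} turns")


# Toy example: the agent claims completion on every turn, but the work
# (three items) is only really finished on turn 3.
finished_items = []


def toy_step(turn: int) -> bool:
    finished_items.append(turn)
    return True  # premature "done" claim


def toy_verify() -> bool:
    return len(finished_items) >= 3


turns = run_until_verified(toy_step, toy_verify)
print(turns)  # prints 3
```

The `max_turns` cap is an added safety bound for the sketch, so that a check that can never pass raises an error instead of looping forever.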

The loop grew into an autonomous dev kit. It built delegate, which put every model family on tap. delegate grew into foundry: waves of different models implement, review, and fix each other's work while a coordinator re-verifies every claim. Then foundry started shipping products — recon and lens were each built in one evening, for under $20 of inference, receipts attached.

The tools build the tools. This site was built the same way — and the log of what broke is in the repo.

Built by Trey Goff and the team behind Praxient.

wright

Builds and maintains the agents.

memorum

Memory that survives across sessions.

agentlinters

Lint rules for agent-written code.

llm-council

Multiple models deliberate hard calls.

Work with the team

Fieldcraft is the workshop: tools we build for our own agents, published with their receipts. Praxient is the practice — the same discipline, bounded, measured, owned, pointed at a business’s operation.

Work with Praxient