What does Arklow want to accomplish?

  • Arklow is a traffic light company: agent work can be infinite, but hardware is constrained.
  • We help that work get done durably, and at a rate that matches downstream demand.
  • By being the brains that decide when work can get done, and when it should, we let agents and developers focus on building.
  • We focus on near-realtime and long-running workloads.

Vision

I like to think of Arklow as the control rod. As we build more and more powerful models and run increasingly long stretches of work, it weighs on me that we need ways to make these systems properly respect the constraints of our current time. An agent that decides to spawn hundreds of subtasks is usually unaware of the broader limits of the system; even if it is aware, nothing forces it to respect them other than hard determinism. Hard determinism here is the traffic light: a sentinel against a growing volume of work, or cars in this case.

If the road buckles and clogs up with cars, nothing moves. Similarly, how does an agent reason its way out of a highly fluid problem if the very resources it needs to reason with are unavailable or overloaded?

I want Arklow to build these control rods: to help people keep building these systems, and to give those systems a watchdog.


Background

2024 - Replit

Just before the real takeoff of LLM-based agents, I joined Replit to work on anti-abuse engineering. In short, this meant finding engineering solutions that reduce malicious actors' abuse of the platform through excessive resource usage or fraud. Surprising things happen when you're tasked with cleaning up or banning thousands of users all at once: it puts real strain on your infrastructure.

To account for this, a large part of the engineering effort went into building, from first principles, a queue system to decide when work should get done. It also needed to be easy for other developers on the team to add new types of work for the queue to route.

At the time, our constraint was the database: we were suffocating it.

2025 - Bland

At Bland, the same problem emerged. Our customers each wanted to make thousands of calls per day. If we had tried to satisfy all of that load at all times, our GPU spend would have been considerably higher. Customers cared that their calls went out and were made durably; a few minutes here or there didn't matter. Some customers cared about calls going out very fast. Before this, we would regularly get overloaded, and call quality would suffer on overloaded GPUs.

The same demand landed on me here again: build something that performs the critical job of deciding when work can be done.

In this case, our constraint was GPU cost and availability. With an easy way to spread out work, our GPU needs became smoother.

So we see the common thread that runs through both AI and non-AI workloads: it's hard to focus on what you're building when you're stuck deciding when it's safe to perform work. Some work needs to prioritize latency, some throughput. You want a smooth experience no matter your constraints.

What if there was something that took in your workload, listened to your infrastructure, and told you when it was safe to perform the work? That’s Arklow.


Product

Arklow consists of four pieces of the product visible to the end user.

Ingress

A highly durable input endpoint: HTTP, MCP, gRPC, whatever.

  • Enriches inputs: token count, work size, weighting (a submission sketch follows below).
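
To make this concrete, here is a minimal sketch of what submitting work to the Ingress over HTTP could look like. The endpoint URL, payload shape, and enrichment fields are all hypothetical placeholders, not a committed API.

```ts
// Hypothetical sketch: submitting a unit of work to the Arklow Ingress over HTTP.
// The URL, payload shape, and enrichment fields are illustrative assumptions.
interface WorkSubmission {
  kind: string;            // what type of work this is, e.g. "outbound-call"
  payload: unknown;        // opaque body handed back to you at egress time
  weight?: number;         // optional hint about how heavy this work is
}

interface AcceptedWork {
  id: string;              // durable identifier for the accepted work
  enrichment: {
    tokenCount?: number;   // estimated token count, if the payload is text
    workSize: number;      // serialized size in bytes
    weight: number;        // final weighting after enrichment
  };
}

async function submitWork(work: WorkSubmission): Promise<AcceptedWork> {
  const res = await fetch("https://ingress.arklow.example/v1/work", {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(work),
  });
  if (!res.ok) throw new Error(`ingress rejected work: ${res.status}`);
  return (await res.json()) as AcceptedWork;
}
```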

Metrics

Collects metrics about your infrastructure, or lets you submit them (a sketch follows below). Potentially makes decisions on your behalf.

  • Planned: Integrate with K8s, Datadog, or hosted GPU providers.
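
As an illustration, a metric submission might look something like the sketch below. The endpoint, metric names, and field shapes are assumptions for the sake of the example.

```ts
// Hypothetical sketch: pushing an infrastructure metric sample to Arklow.
// The endpoint and field names are illustrative, not a committed API.
interface MetricSample {
  name: string;                     // e.g. "gpu.utilization"
  value: number;                    // observed value
  unit?: string;                    // e.g. "percent", "ms"
  at: string;                       // ISO-8601 timestamp of the observation
  labels?: Record<string, string>;  // e.g. { cluster: "us-east", pool: "a100" }
}

async function reportMetric(sample: MetricSample): Promise<void> {
  const res = await fetch("https://metrics.arklow.example/v1/samples", {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(sample),
  });
  if (!res.ok) throw new Error(`metric rejected: ${res.status}`);
}

// Example: report GPU utilization so the Engine can throttle releases when it climbs.
reportMetric({
  name: "gpu.utilization",
  value: 87.5,
  unit: "percent",
  at: new Date().toISOString(),
  labels: { pool: "a100" },
}).catch(console.error);
```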

Engine

The brain. It combines the outstanding work received by the Ingress with your Metrics to decide how to prioritize the work and when to perform it.

  • You tell us the ideal state you want to be in (sketched after this list), or let us discover it for you.
  • We'll take into account things like: how long you take to acknowledge that work was done, whether you're failing to respond in time, and whether your Pxx latencies are going weird.
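
A rough sketch of what declaring that ideal state could look like is below; every field name is an assumption about what such a policy might contain, not a finished schema.

```ts
// Hypothetical sketch: describing the "ideal state" you want the Engine to hold you to.
// Every field here is an assumption about what such a policy could contain.
interface EnginePolicy {
  target: {
    maxGpuUtilization: number;        // e.g. keep GPUs under 80% busy
    maxP99LatencyMs: number;          // back off if tail latency drifts past this
  };
  acknowledgement: {
    deadlineMs: number;               // how long you get to ack completed work
    maxMissedAcks: number;            // consecutive misses before releases slow down
  };
  priorities: Array<{
    kind: string;                     // work type, matching what the Ingress received
    mode: "latency" | "throughput";   // what this work type should optimize for
  }>;
}

const policy: EnginePolicy = {
  target: { maxGpuUtilization: 0.8, maxP99LatencyMs: 2_000 },
  acknowledgement: { deadlineMs: 30_000, maxMissedAcks: 3 },
  priorities: [
    { kind: "outbound-call", mode: "latency" },
    { kind: "batch-cleanup", mode: "throughput" },
  ],
};
```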

Egress

Pull or push work to be performed.

  • Pull: the end user asks us what work is available and safe to perform (a worker sketch follows below).
  • Push: we push work over a variety of connectors; MCP, again, whatever.
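
Below is a minimal sketch of a pull-mode worker under these assumptions; the endpoints, shapes, and acknowledgement flow are illustrative only. The point is the division of labor: Arklow decides when it is safe to run work, the worker only decides how.

```ts
// Hypothetical sketch of a pull-mode worker: ask Arklow what is safe to run,
// do it, and acknowledge completion. Endpoints and shapes are assumptions.
interface ReleasedWork {
  id: string;
  kind: string;
  payload: unknown;
}

async function pullWork(max: number): Promise<ReleasedWork[]> {
  const res = await fetch(`https://egress.arklow.example/v1/work?max=${max}`);
  if (!res.ok) throw new Error(`pull failed: ${res.status}`);
  return (await res.json()) as ReleasedWork[];
}

async function ackWork(id: string): Promise<void> {
  await fetch(`https://egress.arklow.example/v1/work/${id}/ack`, { method: "POST" });
}

// A simple worker loop: Arklow decides *when*; the handler only decides *how*.
async function runWorker(handle: (work: ReleasedWork) => Promise<void>) {
  while (true) {
    const batch = await pullWork(10);
    for (const work of batch) {
      await handle(work);
      await ackWork(work.id); // the Engine watches ack latency (see above)
    }
    if (batch.length === 0) {
      await new Promise((r) => setTimeout(r, 1_000)); // idle back-off
    }
  }
}
```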