AI Reliability Engineering (AIRE) - Creating Dependable Humans
It’s a little after 5p, and I’m about to wrap up for the day. As I’m starting to shut things down, I get a message from my boss:
It’s a little after 5p, and I’m about to wrap up for the day. As I’m starting to shut things down, I get a message from my boss:
As organizations start to deploy AI agents in earnest, we are discovering just how easy it is to attack these kind of systems. I went into quite some detail about how “natural language” introduces new attack vect...
The Model Context Protocol (MCP) and Agent 2 Agent (A2A) specification are similar RPC style protocols that specify interaction between Agents and Tools (MCP) and Agents and other Agents (A2A). They both focus on...
I was recently chatting with Matt McLarty and Mike Amundsen on their podcast about a recent blog I wrote about describing APIs in terms of capabilities. One thing that came up was the idea of describing APIs with...
Enterprise application architecture is once again on the verge of transformation. We’ve moved from mainframes to client-server, and recently from monoliths to microservices. Each evolution has been driven by the ...
The Model Context Protocol has created quite the buzz in the AI ecosystem at the moment, but as enterprise organizations look to adopt it, they are confronted with a hard truth: it lacks important security functi...
Anthropic introduced the Model Context Protocol (MCP) to standardize the way an LLM communicates with the “outside world” to extend its capabilities through tool/function support. The idea is if we could simplify...
The way LLMs run in Kubernetes is quite a bit different than running web apps or APIs. Recently I was digging into the benefits of the Inference Extensions for the Kubernetes Gateway API and I needed to generate ...
Recently, I’ve been building AI agents to help automate some parts of my workflow such as deep, meaningful technical research to contribute to technical material that I build. I am using the AutoGen framework fro...
Organizations need to think about what data gets sent to any AI services. They also need to consider the LLM may respond with some unexpected or risky results. This is where guardrails come in. There are a number...
NVIDIA NIM is a great way to run AI inference workloads in containers. I deploy primarily to Kubernetes, so I wanted to dig into deploying NIM using the Kubernetes NIM Operator and use GPUs in Google Cloud. I act...
Things change quickly in the land of technology. AI is the “hot” thing. I feel for the platform engineers out there struggling with technologies like Docker, Kubernetes, Prometheus, Istio, ArgoCD, Zipkin, Backsta...
You probably wouldn’t be surprised if I told you modern networking based on open source projects like Istio, SPIFFE, Cilium and others (See my paper about the CAKES stack) are typically consumed by what we now ca...
Platform engineering has emerged recently in part because organizations recognize the value in improving developer experience and the need to improve app developer delivery speed. And in typical organizational fa...
It’s been a while since I’ve blogged, and just like other posts in the past, this one is meant as a way to dig into something and for me to catalog my own thoughts for later. While digging into some issues for so...
Continuing on with my series about microservices implementations (see “Why Microservices Should Be Event Driven”, “Three things to make your microservices more resilient”, “Carving the Java EE Monolith: Prefer Ve...
Some of this I cover in my book “Microservices for Java Developers” O’Reilly June 2016 (launching soon!), but I want to give a more specific treatment of it here. I get questions from folks about NetflixOSS (it’s...
One of the advantages of building distributed systems as microservices is the ability of the system as a whole to withstand faults and unexpected failures of components, networks, compute resources, etc. These s...
I just delivered a 4-day deep-dive training course on Docker and Kubernetes to a customer in Atlanta. In true open-source spirit, I’d like to publish the source/slides and allow other people to benefit from it an...
A lot of teams I talk to recently are very interested in “DevOps” (whatever that means… seems to mean different things to different people?) and when we sit down and talk about what that really means, the directi...