What Happens When You Build an Industrial AI Agent in 48 Hours

Tags: AI Agents, Industrial AI, Hackathon, RAG, Building

I spent a weekend building an industrial AI agent for an offshore oil and gas platform. The data was simulated. The problem was not. Six months of operations. 96 assets. 175 sensors producing 3 million rows of 15-minute readings. 84 maintenance work orders. 7 real failures hidden in the timeline, seasoned with misleading anomalies. A folder of SOPs, P&IDs, and equipment manuals in PDF form.

The hackathon was called Hackazona, hosted by Cognite. The brief was some version of "build a next-generation industrial AI agent." I took it as "build something a field manager would actually open on Monday morning."

I built it solo. Here is what shipped, what mattered, and what I would take to a paying customer.

If you want to see it end to end, the Hackazona case study has the visualization I used to frame the problem and a walkthrough of the finished dashboard.


The Problem Underneath the Problem

The obvious problem in industrial operations is alarm fatigue. Too many sensors, too many thresholds, too many alerts. The interesting problem is one level down.

A field manager running a platform does not know what they do not know. The expertise that matters, the kind that tells you a temperature drift on one exchanger is real but the same signature on another is a known artifact of that loop's control strategy, lives in the heads of a small number of subject matter experts. Those experts only get pulled in reactively, after something has broken.

I started calling it the tribal engineer problem. A refinery on the coast has no way to reach into a venue ten states away where an SME is sitting with all the context. By the time the SME is looped in, production is down.

An agent does not solve that by trying to be the expert. It solves it by narrowing the field to a handful of good questions and handing the field manager something concrete to react to.


What I Built

A dashboard with one main loop: from 3 million sensor readings to a grounded root cause report to a scoped Telegram channel where the technicians on the ground can chime in.

The top left of the UI shows the top six questions the system has generated. Each question is a thesis written in the language a human would use. "Is E-301 fouled? Outlet temperature trending above normal while flow is dropping." Not a reading, not a threshold, a question you could read out loud in a shift handoff.

Click a question and the right side fills. A 3D sensor timeline over the breach window. The raw sensor logs one click away. A Run Root Cause Analysis button that reads the event context, pulls the relevant maintenance history, includes the right SOP, and comes back with a structured markdown report.

A Telegram panel in the right sidebar creates a scoped channel for the asset if one does not exist, posts the question, and opens the conversation.


The Architecture

Three decisions did most of the work.

DuckDB Instead of a Hosted Time Series Database

The time series file is heavy: 3 million rows. The reflex is to stand up TimescaleDB or a cloud equivalent and index it. DuckDB reads it straight off disk with zero setup. I pre-built a platform.duckdb file once and the backend opens it read-only on startup.

Sub-second queries against arbitrary sensor windows. No infrastructure to manage. No Postgres to tune. For a 48-hour build this was the single highest-leverage call.
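
A minimal sketch of that setup, for illustration only. The table name readings and its columns (sensor_id, ts, value) are assumptions, since the actual schema is not shown here; only the platform.duckdb filename and the read-only open come from the build itself.

```python
def open_platform_db(path="platform.duckdb"):
    """Open the pre-built DuckDB file read-only, once, at backend startup."""
    import duckdb  # third-party: pip install duckdb

    return duckdb.connect(path, read_only=True)


def query_window(con, sensor_id, start, end):
    """Return (ts, value) rows for one sensor inside [start, end), oldest first.

    Assumes a hypothetical table readings(sensor_id, ts, value).
    """
    sql = (
        "SELECT ts, value FROM readings "
        "WHERE sensor_id = ? AND ts >= ? AND ts < ? "
        "ORDER BY ts"
    )
    # Parameters are bound positionally, so no string formatting touches the SQL.
    return con.execute(sql, [sensor_id, start, end]).fetchall()
```

Passing datetime objects for start and end keeps the comparison typed. Opening read-only also means several backend processes can share the same file without lock contention, which is part of why there is nothing to operate.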

Questions as the Primary Abstraction

Every breach detection system I have seen surfaces every breach. That is the wrong default. The signal-to-noise ratio is terrible and the interface feels like a security monitor instead of a briefing.

I ranked events by severity and breach count, rolled up the top six across the whole platform, and rewrote each one into natural language. A field manager reads six lines and knows what to ask about. Everything else is available, but nothing else is loud.
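
The ranking step can be sketched roughly as follows. The BreachEvent shape and its field names are hypothetical, and the template-based phrasing is a stand-in: the rewrite into natural language could just as well be an LLM call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BreachEvent:
    # Hypothetical event shape; field names are assumptions, not the real schema.
    asset: str
    severity: int      # higher means more severe
    breach_count: int  # threshold breaches inside the event window
    hypothesis: str    # short failure-mode guess, e.g. "fouled"
    evidence: str      # what the sensors are doing, in plain words


def top_questions(events, limit=6):
    """Rank by severity, then breach count, and phrase the top `limit` as questions."""
    ranked = sorted(events, key=lambda e: (e.severity, e.breach_count), reverse=True)
    return [f"Is {e.asset} {e.hypothesis}? {e.evidence}." for e in ranked[:limit]]


example = BreachEvent(
    asset="E-301",
    severity=3,
    breach_count=12,
    hypothesis="fouled",
    evidence="Outlet temperature trending above normal while flow is dropping",
)
print(top_questions([example])[0])
```

The hard cap is the point of the design: everything past the limit stays queryable but never reaches the briefing.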

RAG Folded Into the RCA Prompt

The instinct is to stand up a separate retrieval pipeline: vectorize the PDFs, build an index, run a nearest-neighbor search at query time. For 11 PDFs and a 48-hour clock, that was overkill.

I built the event context directly, pulled the related sensors, the recent maintenance, the nearby failure records, and the matching SOP, and packed all of it into a single prompt. Slower calls, richer answers, no vector database to maintain. This does not scale to 10,000 documents. It is exactly right for a hackathon surface area.
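
As a sketch of what "packed all of it into a single prompt" can look like, assuming the SOP text has already been extracted from its PDF and each input arrives as plain strings. The function name, section titles, and instruction wording are all illustrative, not the prompt that actually shipped.

```python
def build_rca_prompt(event, sensors, maintenance, failures, sop_text):
    """Assemble one prompt from pre-fetched context; no vector index involved."""
    sections = [
        ("EVENT", event),
        ("RELATED SENSORS", "\n".join(sensors)),
        ("RECENT MAINTENANCE", "\n".join(maintenance)),
        ("NEARBY FAILURE RECORDS", "\n".join(failures)),
        ("MATCHING SOP", sop_text),
    ]
    # Empty sections are labeled explicitly so the model does not guess at them.
    body = "\n\n".join(
        f"## {title}\n{text or '(none found)'}" for title, text in sections
    )
    return (
        "You are assisting a field manager with root cause analysis.\n"
        "Use only the context below and name the section you relied on.\n\n"
        f"{body}\n\n"
        "Write a structured markdown root cause report."
    )
```

Selection happens upstream with ordinary lookups (by asset, by time window, by SOP title), which is why this stops working once the document set is too large to pick from deterministically.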


The Thing I Did Not Expect

The hardest part was not the LLM call. It was the prompt's context section.

I spent more time on the build_event_context and format_context_for_llm functions than on anything else. The model is only as good as what it reads, and the context for an offshore platform event is not clean. You have sensors with different units, quality flags that mean different things, asset hierarchies that skip levels, maintenance records written in shorthand by a dozen different technicians.

Getting the context right meant making decisions I had no subject matter expertise to make. I leaned on the simulated data's schema and the SOP language to stay grounded. The lesson that generalized: when you are building an agent over a domain you do not own, the context builder is the domain model. The prompt is just the last mile.
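
To make "the context builder is the domain model" concrete, here is a deliberately small sketch. Only the two function names come from the build; the signatures, the record shapes, the "good" quality flag, and the pressure-only unit table are assumptions made for illustration.

```python
# Conversion factors to bar. Illustrative and pressure-only, not exhaustive.
TO_BAR = {"bar": 1.0, "kPa": 0.01, "psi": 0.0689476}


def build_event_context(asset, readings, work_orders):
    """Collect normalized, trustworthy inputs for one event.

    readings: dicts with sensor, value, unit, quality (hypothetical shape).
    work_orders: dicts with asset and note (free-text technician shorthand).
    """
    usable = []
    dropped = 0
    for r in readings:
        if r["quality"] != "good" or r["unit"] not in TO_BAR:
            dropped += 1  # counted so the model is told data was excluded
            continue
        usable.append((r["sensor"], round(r["value"] * TO_BAR[r["unit"]], 3)))
    notes = [w["note"] for w in work_orders if w["asset"] == asset]
    return {"asset": asset, "pressures_bar": usable, "dropped": dropped, "notes": notes}


def format_context_for_llm(ctx):
    """Render the context dict as plain text for the prompt's context section."""
    lines = [f"Asset: {ctx['asset']}"]
    lines += [f"- {sensor}: {value} bar" for sensor, value in ctx["pressures_bar"]]
    lines.append(f"Readings excluded (bad quality or unknown unit): {ctx['dropped']}")
    lines += [f"Maintenance note: {note}" for note in ctx["notes"]]
    return "\n".join(lines)
```

Every branch in the builder is a domain decision: which quality flags count as trustworthy, which units are comparable, which work orders are relevant. None of that lives in the prompt text.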


What I Would Build Next

Three things, in order.

Feedback writes back into the agent. When the SME responds in the Telegram channel, that response should update the question's data section and feed into the RCA prompt for the next similar event. The agent should get smarter inside a single shift, not over a quarterly retraining cycle.

A case library. Every resolved anomaly becomes a retrievable case. New events get similarity-matched against resolved ones before the LLM call. The agent gets to ride on the work that has already been done, and the expensive LLM calls get cheaper over time.

A handoff artifact. At the end of every RCA, the system should generate a one-page SOP update proposal. The tribal knowledge that got surfaced in the chat becomes a durable document instead of evaporating into channel history.

None of these are hackathon features. They are what I would scope with a customer in week one.


The Transferable Pattern

The tribal engineer problem is not about refineries. Every organization has a small number of people who hold the working knowledge of how the business actually runs, and the rest of the team pays a tax every time those people are not in the room.

The pattern Hackazona made obvious: AI agents are good at generating hypotheses, pulling the right context, and handing a human a starting point. They are not good at being the final answer. The system works because the agent narrows the field to six questions and hands the expert something concrete to react to. The expert does the last mile.

That shape transfers to sales ops, legal review, customer support triage, and any domain where the judgment calls are the bottleneck. If you are wondering whether this shape applies to your business, the answer is almost always yes, and the interesting question is where the tribal knowledge is hiding.


About

I am Brian Sowards. I run an AI consulting practice at sowards.ai. If you want the full walkthrough of what I built at Hackazona, the case study has the visualization and the architecture in one place. If the tribal knowledge pattern resonates for a domain your team is wrestling with, book a call and we can talk through where it is hiding.