Writing

Technical deep-dives, architecture decisions, and lessons from building production systems.

GPU Spot Market Arbitrage: Finding Value in Real-Time

Architecture

The dashboard said "cheapest available: RTX 4090 at $0.35/hr." But the API showed 24 offers under $0.30. The dashboard was optimizing for the wrong thing.

8 min read

The SOCKS5 Escape Hatch: Networking from Restricted Containers

Architecture

The GPU rental started. Tailscale installed. But nothing could connect out. The container was sandboxed. I needed a tunnel through the only hole they left open.

7 min read

Speed vs. Completeness

ML Pipelines

When your enrichment service goes down, should your entire index stop updating? The two-phase ingestion pattern solves it.

8 min read

Your Batch Size Shouldn't Be a Constant

ML Pipelines

I hardcoded batch_size=8. Then I rented an A100 with 80GB of VRAM. My code used 24GB and left 56GB sitting idle. Throughput could have been 4x higher.

6 min read

Building a Form Templating Engine

Architecture

Every hospital had its own forms. Every department its own fields. Doctors requested modifications. Developers drowned in tickets. I built a templating engine. Doctors now configure their own forms.

9 min read

When Your Sources Lie

Data Architecture

What happens when three different systems claim different truths about the same product? Trust hierarchies solve it.

5 min read

Concurrency Limits Aren't Just for CPUs

Architecture

The queue said process 10 messages. My GPU said I can handle 4 at once. The queue won. The service crashed with OOM.

7 min read

Why I Version My ML Outputs

MLOps

I upgraded the embedding model. Hundreds of thousands of items needed re-embedding. I kicked off the job. Three hours later, I realized half of them had already been processed by v2.

7 min read

When Your Taxonomy Doesn't Match Reality

ML Pipelines

The input said 'robe.' My taxonomy said 'Dresses.' String matching said 'no match.' But they meant the same thing.

7 min read

Turning an Internal Tool into a PWA

Frontend

The internal chatbot worked well. Too well. Teams wanted to access it from their phones, on the go, between client meetings. 'Can you make a mobile app?' I said yes. But not the one they imagined.

7 min read

When Your GPU Bill Becomes the Autoscaler

DevOps

The queue had 10,000 items. I spun up a GPU. The queue emptied. The GPU kept running. For 6 hours. At $0.50/hour. I paid $3 to process nothing.

8 min read

Why Culture Beats Tech

Architecture

The biggest win wasn't technical; it was cultural. When consultants started answering each other's questions through the platform instead of email chains, we knew it worked.

6 min read

Not All Errors Deserve the Same Retry

DevOps

Rate limited? Wait and retry. Blocked? Rotate proxy immediately. Parsing failed? Don't retry at all.

8 min read

When Stage 4 Fails

ML Pipelines

Your GPU ran out of memory. Should 10,000 items disappear into a retry queue, or get processed anyway, just worse?

9 min read

Adopting a Design System Mid-Project

Frontend

Every PR contained a new button variant. Every form had its own styles. Juniors spent more time on CSS than on business logic. I proposed adopting Element Plus. Six months later, our development velocity had doubled.

6 min read

Why Your Search Needs Both Keywords and Vectors

Search

Semantic search understood intent perfectly but missed brand names. Keyword search found exact terms but missed concepts. The solution combines both using Reciprocal Rank Fusion.

8 min read

The Database Lock You Didn't Need

Architecture

Two workers grabbed the same job. Both started processing. One overwrote the other's work. I almost added Redis. Then I remembered: my database already has locks.

7 min read

The Case for Internal Accelerators

Architecture

Every SaaS needs auth, mailing, and background jobs. Instead of building these from scratch each time, I invested upfront in a battle-tested template.

7 min read