Ahmed Waregh

Hi, I'm Ahmed

I design and ship backends for complex systems.

Software Engineer — Backend · Platform · AI Systems.

I prototype in code; I care about latency budgets, reliability, and clean system design.

Experience

5+ years building production web systems

Specializes in

Frontend systems • Design systems • Distributed architecture • AI product UX

Open to

Senior Frontend • Staff Engineer • Platform Engineer roles

fraud-detection

Production-grade ML pipeline for real-time financial transaction fraud detection with model training, drift monitoring, and scoring API.

Python
ai-workflow-platform

Multi-tenant SaaS orchestrating AI workflows and integrations at scale.

TypeScript
saas-website-builder

Backend platform powering dynamic site generation and multi-tenant publishing.

TypeScript
ai-customer-support

Conversational AI backend with RAG pipeline and vector search.

Python
real-time-data-pipeline

High-throughput streaming pipeline for ingesting, transforming, and routing millions of events per second.

Go
distributed-rate-limiter

Token-bucket and sliding-window rate limiting as a standalone service, supporting multi-region consistency.

Go
schema-evolution

Research implementation of eight real-world schema evolution scenarios across a three-service microservices architecture, covering PostgreSQL migrations, REST API versioning, and event schema evolution with backward compatibility patterns.

Python
change-data-pipeline

CDC pipeline that streams database changes into an event log, supports consumers, replay, and schema evolution with a demo consumer that builds projections.

Python
retrieval-experiment-platform

A tool for testing and evaluating RAG retrieval pipelines by comparing chunking strategies, embedding models, and reranking methods using metrics like Precision@K and nDCG.

Python
designing-idempotent-apis

Production-quality research system comparing six idempotency strategies for a payments API domain, built with FastAPI, PostgreSQL, Redis, and RabbitMQ.

Python
hallucination-mitigation

Production-grade research system for evaluating, benchmarking, and mitigating hallucinations in enterprise LLM applications with multiple RAG variants and guardrail frameworks.

Python
config-service

Centralized configuration service used across internal systems for managing application settings and feature flags.

TypeScript
failure-recovery-patterns

Platform simulating microservice failures to evaluate retries, circuit breakers, bulkheads, and idempotency. Measures reliability, latency, and duplicate prevention to guide resilient system design.

Python
iac-maintainability-study

Comprehensive empirical study examining how structural design decisions in Terraform infrastructure-as-code affect long-term maintainability, drift susceptibility, and change management complexity.

HCL
llm-gateway

Production-ready unified API gateway for routing requests across multiple LLM providers with built-in rate limiting, response caching, cost tracking, and OpenTelemetry observability.

Python

How I work

Prototype in code, not mockups

I reach for a coded prototype before a finished design. It surfaces real constraints earlier.

Latency is a feature

I track render budgets and hydration cost. Perf profiling is part of the design review, not a post-launch task.

Write for the next engineer

Readable code, documented trade-offs, decision logs. Systems outlive their authors.

Projects

Problem: Transaction scoring latency was too high for real-time decisions; model drift was silent.
Approach: Built a streaming scoring API with LightGBM, added drift monitoring using PSI, and scheduled retraining on detected drift.
Result: Scoring p95 under 12ms. Drift caught 2 weeks before accuracy would have degraded measurably.
ML pipeline • real-time scoring • drift monitoring • cost optimization
Python • LightGBM • FastAPI • scikit-learn • Docker
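
For readers unfamiliar with PSI (Population Stability Index), here is a minimal sketch of how such a drift score can be computed. It is not the project's code: the function name `psi`, the equal-width binning, the `eps` floor for empty bins, and the 0.2 alert threshold in the usage note (a common rule of thumb) are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def psi(expected: Sequence[float], actual: Sequence[float],
        bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index between a baseline sample and a recent sample."""
    lo, hi = min(expected), max(expected)
    # Equal-width bins over the baseline range; fall back to 1.0 if the range is empty.
    width = (hi - lo) / bins or 1.0

    def fractions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the baseline range are clamped into the edge bins.
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor each fraction at eps so the logarithm is always defined.
        return [max(c / len(values), eps) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

A monitoring job could compare `psi(training_scores, last_day_scores)` against a threshold such as 0.2 and trigger retraining when it is exceeded; both variable names are hypothetical.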
Problem: Workflow steps were tightly coupled; one failure cascaded into lost jobs with no recovery path.
Approach: Introduced a state-machine model per workflow run, persisted to Postgres, with BullMQ workers pulling from a durable queue. Idempotent step handlers allow safe retry.
Result: Job failure rate dropped from ~4% to under 0.1%. Recovery from worker crashes became automatic.
event-driven • distributed workers • multi-tenant • state machine
Node.js • PostgreSQL • Prisma • BullMQ • OpenAI
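
The idea of a per-run state machine with idempotent steps can be sketched in a few lines. This is an illustration only, written in Python rather than the project's Node.js stack: `WorkflowRun`, `advance`, and the in-memory `completed` dict (standing in for the Postgres-persisted state) are hypothetical names, and no queue is modelled.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class WorkflowRun:
    run_id: str
    steps: List[str]
    state: str = "pending"  # pending -> running -> completed | failed
    # Step name -> recorded result; a stand-in for persisted run state.
    completed: Dict[str, object] = field(default_factory=dict)


def advance(run: WorkflowRun,
            handlers: Dict[str, Callable[[WorkflowRun], object]]) -> WorkflowRun:
    """Run the remaining steps; recorded steps are skipped, so a retry is safe."""
    run.state = "running"
    for step in run.steps:
        if step in run.completed:
            continue  # never re-execute a step whose result is already recorded
        try:
            run.completed[step] = handlers[step](run)
        except Exception:
            run.state = "failed"  # progress so far is kept for the next attempt
            return run
    run.state = "completed"
    return run
```

Calling `advance` again on a failed run resumes at the first unrecorded step, which is what makes a worker crash recoverable without repeating side effects.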
Problem: Site builds were synchronous and blocking; concurrent publishes caused database contention.
Approach: Moved builds to async workers with S3 artifact storage and a CDN invalidation step. Added per-tenant build queues to prevent noisy-neighbour problems.
Result: Median build time fell from 8s to 1.4s. P99 dropped from 40s to under 6s.
versioned rendering • build workers • storage pipeline • multi-tenant
Node.js • PostgreSQL • S3 • CDN • Docker
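
One way per-tenant queues can prevent a noisy neighbour is round-robin draining: each tenant gets one build per turn regardless of backlog. The sketch below assumes that scheduling policy, which the project description does not specify; `TenantQueues` and its methods are hypothetical names.

```python
from collections import OrderedDict, deque
from typing import Deque, Optional, Tuple


class TenantQueues:
    """One FIFO queue per tenant, drained round-robin across tenants."""

    def __init__(self) -> None:
        self._queues: "OrderedDict[str, Deque[str]]" = OrderedDict()

    def enqueue(self, tenant: str, build_id: str) -> None:
        self._queues.setdefault(tenant, deque()).append(build_id)

    def next_build(self) -> Optional[Tuple[str, str]]:
        if not self._queues:
            return None
        # Take the tenant at the front of the rotation.
        tenant, queue = next(iter(self._queues.items()))
        build_id = queue.popleft()
        del self._queues[tenant]
        if queue:
            # Tenants with remaining work rejoin at the back of the rotation.
            self._queues[tenant] = queue
        return tenant, build_id
```

With three builds queued for tenant `a` and one for tenant `b`, the drain order is `a1, b1, a2, a3`, so `b` waits behind one build rather than three.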
Problem: LLM responses cited wrong sources; hallucinated product details caused support escalations.
Approach: Built a RAG pipeline with citation enforcement: each answer must include a retrieved source chunk. Added a self-critique pass to flag low-confidence answers for human review.
Result: Hallucination rate (as measured by automated fact-check) fell by 68%. Escalation rate down 31%.
RAG pipeline • vector search • ingestion pipeline • conversation orchestration
Python • PostgreSQL • Pinecone • OpenAI • FastAPI
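
The gating logic behind citation enforcement plus a confidence check might look like the sketch below. Everything here is assumed for illustration: the `Answer` shape, the `confidence` score (imagined as the output of the self-critique pass), the 0.7 threshold, and the three decision labels.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Answer:
    text: str
    cited_chunk_ids: List[str]
    confidence: float  # hypothetical self-critique score in [0, 1]


def review_decision(answer: Answer, retrieved_ids: Set[str],
                    min_confidence: float = 0.7) -> str:
    """Return 'accept', 'human_review', or 'reject' for a generated answer."""
    if not answer.cited_chunk_ids:
        return "reject"  # an answer with no citation is never sent
    if any(cid not in retrieved_ids for cid in answer.cited_chunk_ids):
        return "reject"  # cites a chunk that was not actually retrieved
    if answer.confidence < min_confidence:
        return "human_review"
    return "accept"
```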

High-throughput streaming pipeline for ingesting, transforming, and routing millions of events per second.

stream processing • exactly-once delivery • schema registry • backpressure control
Go • Kafka • ClickHouse • Kubernetes • gRPC

Token-bucket and sliding-window rate limiting as a standalone service, supporting multi-region consistency.

token bucket • sliding window • multi-region sync • sidecar-ready
Go • Redis • gRPC • Prometheus • Docker
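
As a reference point for the sliding-window half of this project, here is a single-process sketch of a sliding-window log limiter. It is written in Python rather than Go, keeps state in memory instead of Redis, and ignores multi-region concerns; the class name and the explicit `now` parameter are assumptions chosen to keep the example deterministic.

```python
from collections import deque
from typing import Deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests in any trailing window of `window_seconds`."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window = window_seconds
        self._hits: Deque[float] = deque()  # timestamps of accepted requests

    def allow(self, now: float) -> bool:
        # Drop timestamps that have fallen out of the trailing window.
        while self._hits and self._hits[0] <= now - self.window:
            self._hits.popleft()
        if len(self._hits) >= self.limit:
            return False
        self._hits.append(now)
        return True
```

The log variant is exact but stores one timestamp per accepted request; counter-based approximations trade some accuracy for constant memory.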

Research implementation of eight real-world schema evolution scenarios across a three-service microservices architecture, covering PostgreSQL migrations, REST API versioning, and event schema evolution with backward compatibility patterns.

backward compatibility • API versioning • event schema evolution • database migrations
Python • PostgreSQL • FastAPI • Kafka • Docker

CDC pipeline that streams database changes into an event log, supports consumers, replay, and schema evolution with a demo consumer that builds projections.

CDC • event sourcing • stream processing • schema evolution
Python • PostgreSQL • Kafka • Docker

A tool for testing and evaluating RAG retrieval pipelines by comparing chunking strategies, embedding models, and reranking methods using metrics like Precision@K and nDCG.

retrieval evaluation • chunking strategies • embedding comparison • reranking
Python • RAG • Embeddings • NLP
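
For context, the two metrics named above can be computed as in this sketch. The function names and the input shapes (a ranked list of document ids, a set of relevant ids, and a dict of graded gains) are assumptions; the tool's own interfaces may differ.

```python
import math
from typing import Dict, List, Set


def precision_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k results that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def dcg(gains: List[float]) -> float:
    """Discounted cumulative gain: rank i (0-based) is discounted by log2(i + 2)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg_at_k(ranked: List[str], gains: Dict[str, float], k: int) -> float:
    """DCG of the ranking divided by the DCG of the ideal ordering."""
    actual = dcg([gains.get(doc, 0.0) for doc in ranked[:k]])
    ideal = dcg(sorted(gains.values(), reverse=True)[:k])
    return actual / ideal if ideal > 0 else 0.0
```

Precision@K ignores order within the top k, while nDCG rewards placing the most relevant chunks first, which is why the two are usually reported together.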

Production-quality research system comparing six idempotency strategies for a payments API domain, built with FastAPI, PostgreSQL, Redis, and RabbitMQ.

idempotency patterns • distributed systems • saga pattern • outbox pattern
Python • FastAPI • PostgreSQL • Redis • RabbitMQ • Docker
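
One widely used idempotency strategy, the client-supplied idempotency key, is sketched below; whether it matches any of the six strategies compared in the project is an assumption. The in-memory dict stands in for a durable store, the class and exception names are hypothetical, and the sketch is not safe under concurrent requests (a real system would need an atomic insert, for example a unique constraint).

```python
import hashlib
import json
from typing import Callable, Dict, Tuple


class IdempotencyConflict(Exception):
    """The same key was reused with a different request payload."""


class IdempotentExecutor:
    def __init__(self) -> None:
        # key -> (request fingerprint, stored response)
        self._store: Dict[str, Tuple[str, dict]] = {}

    def execute(self, key: str, payload: dict,
                handler: Callable[[dict], dict]) -> dict:
        fingerprint = hashlib.sha256(
            json.dumps(payload, sort_keys=True).encode()
        ).hexdigest()
        if key in self._store:
            stored_fingerprint, response = self._store[key]
            if stored_fingerprint != fingerprint:
                raise IdempotencyConflict(key)
            return response  # replay the first response; no second side effect
        response = handler(payload)
        self._store[key] = (fingerprint, response)
        return response
```

A retried payment request with the same key and body returns the original response, so the charge handler runs once.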

Production-grade research system for evaluating, benchmarking, and mitigating hallucinations in enterprise LLM applications with multiple RAG variants and guardrail frameworks.

RAG pipeline • guardrails • citation enforcement • self-critique
Python • RAG • LLM • NLP • pytest

Centralized configuration service used across internal systems for managing application settings and feature flags.

configuration management • internal tooling
TypeScript • Node.js

Platform simulating microservice failures to evaluate retries, circuit breakers, bulkheads, and idempotency. Measures reliability, latency, and duplicate prevention to guide resilient system design.

circuit breakers • retry patterns • bulkhead isolation • outbox pattern
Python • FastAPI • PostgreSQL • Redis • Docker • Prometheus • Grafana
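
Of the patterns listed, the circuit breaker is the easiest to show compactly. The sketch below is a simplified single-threaded version with assumed names and defaults (`CircuitBreaker`, `failure_threshold`, `reset_timeout`, an injectable `clock`); it is not the platform's implementation.

```python
import time
from typing import Any, Callable, Optional


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None  # None means the circuit is closed

    def call(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; call rejected")
            # Timeout elapsed: half-open, so let one trial call through.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._failures += 1
            # A failed trial call, or too many consecutive failures, (re)opens the circuit.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Rejecting calls while open is what protects a struggling dependency from retry storms; the half-open trial call is how the breaker discovers recovery.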

Comprehensive empirical study examining how structural design decisions in Terraform infrastructure-as-code affect long-term maintainability, drift susceptibility, and change management complexity.

infrastructure as code • drift detection • maintainability metrics • reference architectures
Terraform • HCL • AWS • Python

Production-ready unified API gateway for routing requests across multiple LLM providers with built-in rate limiting, response caching, cost tracking, and OpenTelemetry observability.

API gateway • model routing • rate limiting • cost tracking
Python • FastAPI • Redis • PostgreSQL • Docker • OpenTelemetry

Focus

Event-Driven Architecture

Async events, queues, reliable delivery. Designing systems where out-of-order messages and partial failures are expected.

Distributed Systems

Retries, idempotency, circuit breakers, eventual consistency. Building for failure rather than hoping for success.

Multi-Tenant SaaS

Schema-per-tenant isolation, billing metering, usage tracking, scoped workspaces.

AI Infrastructure

Ingestion pipelines, vector search, RAG architectures, model orchestration. Latency and cost budgets matter.

Platform Engineering

Developer tooling, build systems, CI/CD abstractions, internal platforms that reduce toil.

Production Reliability

Observability, SLOs, alerting, incident response. Systems fail — the question is how they fail.

Writing

Technical papers and deep-dives on systems I've built and problems I've solved in production.

Hierarchical Chunking Strategies for Production RAG Systems: Balancing Retrieval Precision and Context Coherence

2024

Internal Technical Report

Retrieval-Augmented Generation systems degrade in precision as knowledge bases grow. This paper examines chunking strategies — fixed-size, paragraph-level, and hierarchical parent–child — across corpora of varying size and domain density. We introduce a re-ranking layer using cross-encoder models and show it recovers precision lost at scale while remaining compatible with standard vector-search backends. Benchmarks are run against a golden dataset of 2,400 support queries across four enterprise tenants.

RAG • LLM • Vector Search • Information Retrieval
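
To make the parent-child idea concrete: small child chunks are what gets matched, and the larger parent is what gets returned as context. The sketch below assumes paragraphs as parents and fixed word windows as children, which may not match the report's exact scheme; all names are hypothetical and no embedding or re-ranking step is shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Chunk:
    chunk_id: str
    parent_id: str
    text: str


def build_hierarchy(doc_id: str, text: str,
                    child_words: int = 40) -> Tuple[Dict[str, str], List[Chunk]]:
    """Split a document into paragraph parents and fixed-size child chunks."""
    parents: Dict[str, str] = {}
    children: List[Chunk] = []
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    for p_idx, paragraph in enumerate(paragraphs):
        parent_id = f"{doc_id}:p{p_idx}"
        parents[parent_id] = paragraph
        words = paragraph.split()
        for start in range(0, len(words), child_words):
            child_id = f"{parent_id}:c{start // child_words}"
            children.append(
                Chunk(child_id, parent_id, " ".join(words[start:start + child_words]))
            )
    return parents, children


def expand_to_parents(hits: List[Chunk], parents: Dict[str, str]) -> List[str]:
    """Map matched children to their parents, de-duplicated, preserving hit order."""
    seen, contexts = set(), []
    for hit in hits:
        if hit.parent_id not in seen:
            seen.add(hit.parent_id)
            contexts.append(parents[hit.parent_id])
    return contexts
```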

Multi-Tenant Event Sourcing at Scale: Schema Isolation, Replay Semantics, and Operational Lessons

2024

Internal Technical Report

Event sourcing in multi-tenant SaaS systems introduces tension between tenant isolation and operational simplicity. We describe our experience migrating a 30-tenant workflow platform from a shared event log to a namespace-isolated architecture, covering schema-per-tenant trade-offs, aggregate snapshot strategies to bound replay time, and the tooling required to safely replay tenant event streams without cross-tenant interference.

Event Sourcing • Multi-Tenant • Distributed Systems • CQRS
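
How a snapshot bounds replay time, and how a replay can be kept inside one tenant, is sketched below under simplifying assumptions: events arrive ordered by version, and the aggregate is a hypothetical running `balance`. The `Event`, `Snapshot`, and `replay` names are illustrative and not taken from the report.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Event:
    tenant: str
    version: int
    amount: int


@dataclass(frozen=True)
class Snapshot:
    tenant: str
    version: int  # last event version folded into this snapshot
    balance: int


def replay(tenant: str, events: Iterable[Event],
           snapshot: Optional[Snapshot] = None) -> Snapshot:
    """Rebuild one tenant's aggregate, starting from a snapshot when available."""
    if snapshot is not None and snapshot.tenant != tenant:
        raise ValueError("snapshot belongs to another tenant")
    version = snapshot.version if snapshot is not None else 0
    balance = snapshot.balance if snapshot is not None else 0
    for event in events:
        # Skip other tenants' events and anything the snapshot already covers.
        if event.tenant != tenant or event.version <= version:
            continue
        balance += event.amount
        version = event.version
    return Snapshot(tenant, version, balance)
```

Replaying from a snapshot at version 2 only applies events 3 and later, so replay cost depends on events since the last snapshot rather than on total history.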

Exactly-Once Delivery in Heterogeneous Sink Pipelines: Lessons from a High-Throughput Kafka Consumer Fleet

2025

Internal Technical Report

Exactly-once semantics in streaming pipelines are well-studied within a single system but become subtle when events must be durably committed to multiple heterogeneous sinks — analytics stores, billing aggregators, and alerting systems — in a single logical transaction. We detail the rebalance-listener pattern, idempotency key design, and per-sink commit protocols that enabled zero duplicate charges across 40M+ daily events on a Kafka-backed pipeline.

Apache Kafka • Stream Processing • Exactly-Once Delivery • Distributed Systems

Clock-Independent Rate Limiting: Eliminating Skew Drift in Distributed Token-Bucket Implementations

2025

Internal Technical Report

Token-bucket rate limiters that compute refill amounts using client-side timestamps accumulate systematic drift when hosts have clock skew. This paper quantifies the drift under realistic NTP conditions and proposes using authoritative server-side timestamps — specifically Redis server time via Lua scripts — to eliminate client clock dependence entirely. We compare bucket accuracy across five implementations under 50ms and 200ms of injected skew.

Rate Limiting • Distributed Systems • Redis • Algorithms
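
The core of the clock-independence argument is that refill is computed from one authoritative timestamp rather than from each caller's local clock. The sketch below models only that idea in plain Python: `server_now` is assumed to come from a single time source (the report uses Redis server time inside a Lua script, which is not reproduced here), and the class and parameter names are hypothetical.

```python
class TokenBucket:
    """Token bucket whose refill depends only on timestamps passed in by the caller."""

    def __init__(self, capacity: float, refill_per_second: float, now: float) -> None:
        self.capacity = capacity
        self.rate = refill_per_second
        self.tokens = capacity
        self.updated_at = now

    def allow(self, server_now: float, cost: float = 1.0) -> bool:
        # Never refill for negative elapsed time, even if a stale timestamp arrives.
        elapsed = max(0.0, server_now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated_at = max(self.updated_at, server_now)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Because every caller passes a timestamp from the same source, two hosts with skewed local clocks still see the same refill schedule.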

About

Former IC / lead at B2B platforms. I work end-to-end: interaction models, design systems, perf profiling.

My work spans event-driven architectures, multi-tenant SaaS products, and AI-powered systems. I care about systems that scale cleanly, fail gracefully, and are a pleasure for teams to operate.

I've worked across the full lifecycle of production systems — initial design through to observability and incident response.

React + TypeScript (SSR, hydration budgets) • Systems design for client data flows • Design systems governance.

Stack

Languages & Runtimes

Node.js + TypeScript (primary), Python for ML pipelines, Go for high-throughput services.

Frontend

React + TypeScript (SSR, hydration budgets), Next.js, design systems governance.

Databases

PostgreSQL as default, Redis for ephemeral state and rate-limiting, vector DBs (Pinecone, Weaviate) for similarity search.

Infrastructure

Docker-first local dev, Kubernetes for orchestration, AWS / GCP for managed services.

AI / ML

OpenAI API integration, LangChain for RAG orchestration, embedding pipelines, hallucination mitigation patterns.

Queues & Observability

BullMQ + RabbitMQ for async work, Kafka for streaming. Prometheus + Grafana for metrics; OpenTelemetry traces.

Contact

Let's talk.

Building something in backend infrastructure, platform engineering, or AI systems? I'd like to hear about it.

ahmedwaregh@gmail.com

I respond within a few days.