Hi, I'm Ahmed

I design and ship
backends for
complex systems.

Software Engineer — Backend · Platform · AI Systems.

I prototype in code; I care about latency budgets, reliability, and clean system design.

reconcile.ts

// reconciliation view — event-sourced cache
async function reconcile(accountId: string) {
  const events = await eventLog.since(
    lastCheckpoint(accountId)
  );
  return events.reduce(applyEvent, currentState);
}

// result: p95 render -180ms, tickets -42%

Experience

5+ years building production web systems

Specializes in

Frontend systems • Design systems • Distributed architecture • AI product UX

Open to

Senior Frontend • Staff Engineer • Platform Engineer roles

Pinned · github.com/awaregh

fraud-detection

Production-grade ML pipeline for real-time financial transaction fraud detection with model training, drift monitoring, and scoring API.

Python

ai-workflow-platform

Multi-tenant SaaS orchestrating AI workflows and integrations at scale.

TypeScript

saas-website-builder

Backend platform powering dynamic site generation and multi-tenant publishing.

TypeScript

ai-customer-support

Conversational AI backend with RAG pipeline and vector search.

Python

real-time-data-pipeline

High-throughput streaming pipeline for ingesting, transforming, and routing millions of events per second.

distributed-rate-limiter

Token-bucket and sliding-window rate limiting as a standalone service, supporting multi-region consistency.

schema-evolution

Research implementation of eight real-world schema evolution scenarios across a three-service microservices architecture, covering PostgreSQL migrations, REST API versioning, and event schema evolution with backward compatibility patterns.

Python

change-data-pipeline

CDC pipeline that streams database changes into an event log, supports consumers, replay, and schema evolution with a demo consumer that builds projections.

Python

retrieval-experiment-platform

A tool for testing and evaluating RAG retrieval pipelines by comparing chunking strategies, embedding models, and reranking methods using metrics like Precision@K and nDCG.

Python

designing-idempotent-apis

Production-quality research system comparing six idempotency strategies for a payments API domain, built with FastAPI, PostgreSQL, Redis, and RabbitMQ.

Python

hallucination-mitigation

Production-grade research system for evaluating, benchmarking, and mitigating hallucinations in enterprise LLM applications with multiple RAG variants and guardrail frameworks.

Python

config-service

Centralized configuration service used across internal systems for managing application settings and feature flags.

TypeScript

failure-recovery-patterns

Platform simulating microservice failures to evaluate retries, circuit breakers, bulkheads, and idempotency. Measures reliability, latency, and duplicate prevention to guide resilient system design.

Python

iac-maintainability-study

Comprehensive empirical study examining how structural design decisions in Terraform infrastructure-as-code affect long-term maintainability, drift susceptibility, and change management complexity.

HCL

llm-gateway

Production-ready unified API gateway for routing requests across multiple LLM providers with built-in rate limiting, response caching, cost tracking, and OpenTelemetry observability.

Python

How I work

Prototype in code, not mockups

I reach for a coded prototype before a finished design. It surfaces real constraints earlier.

Latency is a feature

I track render budgets and hydration cost. Perf profiling is part of the design review, not a post-launch task.

Write for the next engineer

Readable code, documented trade-offs, decision logs. Systems outlive their authors.

Projects

Fraud Detection System

Problem: Transaction scoring latency was too high for real-time decisions; model drift was silent.

Approach: Built a streaming scoring API with LightGBM, added drift monitoring using PSI, and scheduled retraining on detected drift.

Result: Scoring p95 under 12ms. Drift caught 2 weeks before accuracy would have degraded measurably.

ML pipeline • real-time scoring • drift monitoring • cost optimization

Python • LightGBM • FastAPI • scikit-learn • Docker

Case study Live demo Source
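
The PSI (Population Stability Index) drift check named in the approach above can be sketched as follows. This is an illustrative version, not the project's code: the equal-width binning, the bin count of 10, the epsilon floor, and the function names binProportions and psi are all assumptions. A commonly quoted rule of thumb treats PSI above roughly 0.25 as a significant shift, but the right threshold depends on the feature.

```typescript
// Share of values falling into each of `bins` equal-width bins over [min, max].
// Values outside the range are clamped into the first or last bin.
function binProportions(values: number[], min: number, max: number, bins: number): number[] {
  const counts = new Array<number>(bins).fill(0);
  const width = (max - min) / bins || 1; // avoid a zero width when min === max
  for (const v of values) {
    const idx = Math.min(bins - 1, Math.max(0, Math.floor((v - min) / width)));
    counts[idx] += 1;
  }
  return counts.map((c) => c / values.length);
}

// PSI = sum over bins of (actual - expected) * ln(actual / expected).
// Bin edges come from the reference (training-time) sample.
// Both samples are assumed to be non-empty.
function psi(reference: number[], live: number[], bins = 10, eps = 1e-6): number {
  const min = Math.min(...reference);
  const max = Math.max(...reference);
  const expected = binProportions(reference, min, max, bins);
  const actual = binProportions(live, min, max, bins);
  let total = 0;
  for (let i = 0; i < bins; i++) {
    // Floor each proportion at eps so empty bins do not produce log(0).
    const e = Math.max(expected[i], eps);
    const a = Math.max(actual[i], eps);
    total += (a - e) * Math.log(a / e);
  }
  return total;
}
```

Identical samples give a PSI of zero; a live sample concentrated in one bin gives a large value, which is the signal a retraining job would key off.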

AI Workflow Automation Platform

Problem: Workflow steps were tightly coupled; one failure cascaded into lost jobs with no recovery path.

Approach: Introduced a state-machine model per workflow run, persisted to Postgres, with BullMQ workers pulling from a durable queue. Idempotent step handlers allow safe retry.

Result: Job failure rate dropped from ~4% to under 0.1%. Recovery from worker crashes became automatic.

event-driven • distributed workers • multi-tenant • state machine

Node.js • PostgreSQL • Prisma • BullMQ • OpenAI

Case study Live demo Source
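
The per-run state machine described above can be sketched in a few lines. This is a simplified, synchronous illustration under stated assumptions: an in-memory Map stands in for the Postgres table, there is no queue, and the names WorkflowRun, createRun, and executeRun are hypothetical. Completed steps are recorded, so a retry resumes at the step that failed; that step runs again, which is why its handler needs to be idempotent.

```typescript
type RunStatus = "pending" | "running" | "completed" | "failed";

interface WorkflowRun {
  id: string;
  status: RunStatus;
  steps: string[];
  completedSteps: Set<string>; // steps whose handler finished successfully
}

type StepHandler = (run: WorkflowRun) => void;

// In-memory stand-in for the persisted run table (assumption for this sketch).
const runStore = new Map<string, WorkflowRun>();

function createRun(id: string, steps: string[]): WorkflowRun {
  const run: WorkflowRun = { id, status: "pending", steps, completedSteps: new Set<string>() };
  runStore.set(id, run);
  return run;
}

// Executes the remaining steps of a run. Safe to call again after a failure:
// steps already recorded as completed are skipped.
function executeRun(id: string, handlers: Record<string, StepHandler>): RunStatus {
  const run = runStore.get(id);
  if (!run) throw new Error(`unknown run: ${id}`);
  if (run.status === "completed") return run.status;
  run.status = "running";
  for (const step of run.steps) {
    if (run.completedSteps.has(step)) continue;
    try {
      handlers[step](run);
      run.completedSteps.add(step);
    } catch {
      run.status = "failed"; // stop here; a later retry picks up at this step
      return run.status;
    }
  }
  run.status = "completed";
  return run.status;
}
```

Calling executeRun a second time on a failed run is the retry path: earlier steps are not repeated, and only the failed step and its successors execute.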

SaaS Website Builder Infrastructure

Problem: Site builds were synchronous and blocking; concurrent publishes caused database contention.

Approach: Moved builds to async workers with S3 artifact storage and a CDN invalidation step. Added per-tenant build queues to prevent noisy-neighbour problems.

Result: Median build time fell from 8s to 1.4s. P99 dropped from 40s to under 6s.

versioned rendering • build workers • storage pipeline • multi-tenant

Node.js • PostgreSQL • S3 • CDN • Docker

Case study Live demo Source
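
One way per-tenant queues can prevent noisy-neighbour problems is round-robin scheduling across tenants, sketched below. The round-robin policy and the class name TenantFairQueue are assumptions made for illustration; the project description does not say how its queues are drained.

```typescript
// One FIFO queue per tenant, drained round-robin so that a tenant with a
// large backlog cannot starve the others.
class TenantFairQueue<T> {
  private queues = new Map<string, T[]>();
  private order: string[] = []; // tenants in first-seen order
  private cursor = 0; // index of the tenant to try first on the next dequeue

  enqueue(tenantId: string, job: T): void {
    let queue = this.queues.get(tenantId);
    if (!queue) {
      queue = [];
      this.queues.set(tenantId, queue);
      this.order.push(tenantId);
    }
    queue.push(job);
  }

  // Returns the next job, or undefined when every tenant queue is empty.
  dequeue(): T | undefined {
    for (let i = 0; i < this.order.length; i++) {
      const idx = (this.cursor + i) % this.order.length;
      const queue = this.queues.get(this.order[idx]);
      if (queue && queue.length > 0) {
        this.cursor = (idx + 1) % this.order.length;
        return queue.shift();
      }
    }
    return undefined;
  }
}
```

With three builds queued for tenant A and one for tenant B, tenant B's build is served second rather than fourth.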

AI Customer Support Platform

Problem: LLM responses cited wrong sources; hallucinated product details caused support escalations.

Approach: Built a RAG pipeline with citation enforcement: each answer must include a retrieved source chunk. Added a self-critique pass to flag low-confidence answers for human review.

Result: Hallucination rate (as measured by automated fact-check) fell by 68%. Escalation rate down 31%.

RAG pipeline • vector search • ingestion pipeline • conversation orchestration

Python • PostgreSQL • Pinecone • OpenAI • FastAPI

Case study Live demo Source
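
The citation-enforcement rule above (every answer must reference a chunk that was actually retrieved) can be sketched as a post-generation check. The inline marker format [chunk:ID] and the names enforceCitations, RetrievedChunk, and CitationVerdict are hypothetical; the project itself is listed as Python, so this shows the shape of the check only.

```typescript
interface RetrievedChunk {
  id: string;
  text: string;
}

interface CitationVerdict {
  accepted: boolean;
  reason: string;
  citedIds: string[];
}

// Accepts an answer only if it cites at least one chunk and every cited
// chunk id belongs to the set retrieved for this query.
function enforceCitations(answer: string, retrieved: RetrievedChunk[]): CitationVerdict {
  const pattern = /\[chunk:([\w-]+)\]/g; // assumed citation marker format
  const citedIds: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(answer)) !== null) {
    citedIds.push(match[1]);
  }
  if (citedIds.length === 0) {
    return { accepted: false, reason: "no citation", citedIds };
  }
  const known = new Set(retrieved.map((chunk) => chunk.id));
  const unknown = citedIds.filter((id) => !known.has(id));
  if (unknown.length > 0) {
    return { accepted: false, reason: `cites unretrieved chunk: ${unknown.join(", ")}`, citedIds };
  }
  return { accepted: true, reason: "ok", citedIds };
}
```

A rejected verdict would be the natural trigger for regenerating the answer or routing it to human review.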

Real-Time Data Processing Pipeline

High-throughput streaming pipeline for ingesting, transforming, and routing millions of events per second.

stream processing • exactly-once delivery • schema registry • backpressure control

Go • Kafka • ClickHouse • Kubernetes • gRPC

Case study Live demo Source

Distributed Rate Limiter Service

Token-bucket and sliding-window rate limiting as a standalone service, supporting multi-region consistency.

token bucket • sliding window • multi-region sync • sidecar-ready

Go • Redis • gRPC • Prometheus • Docker

Case study Live demo Source
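
Of the two algorithms named above, the sliding window is the simpler to show. The sketch below is a single-process sliding-window log, assuming an in-memory Map and a caller-supplied clock; it does not attempt the multi-region consistency the service describes, and the class name SlidingWindowLimiter is hypothetical.

```typescript
// Sliding-window log: keep the timestamps of accepted requests per key and
// allow a new request only if fewer than `limit` fall inside the window.
class SlidingWindowLimiter {
  private hits = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  allow(key: string, nowMs: number): boolean {
    const cutoff = nowMs - this.windowMs;
    // Drop timestamps that have aged out of the window.
    const recent = (this.hits.get(key) ?? []).filter((t) => t > cutoff);
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(nowMs);
    this.hits.set(key, recent);
    return allowed;
  }
}
```

Passing the time in as an argument keeps the limiter deterministic under test; memory grows with the limit per key, which is the usual trade-off against counter-based approximations.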

Schema Evolution in Long-Lived Systems

backward compatibility • API versioning • event schema evolution • database migrations

Python • PostgreSQL • FastAPI • Kafka • Docker

Case study Live demo Source

Change Data Capture Pipeline

CDC pipeline that streams database changes into an event log, supports consumers, replay, and schema evolution with a demo consumer that builds projections.

CDC • event sourcing • stream processing • schema evolution

Python • PostgreSQL • Kafka • Docker

Case study Live demo Source

Retrieval Experiment Platform

A tool for testing and evaluating RAG retrieval pipelines by comparing chunking strategies, embedding models, and reranking methods using metrics like Precision@K and nDCG.

retrieval evaluation • chunking strategies • embedding comparison • reranking

Python • RAG • Embeddings • NLP

Case study Live demo Source
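
The two metrics named above can be computed as in the sketch below. It assumes document ids as strings, graded relevance as non-negative numbers, and the linear-gain form of DCG (gain equals the relevance grade, discounted by log2 of rank plus one); the exponential-gain variant is also common, and the platform may use either. Function names are hypothetical.

```typescript
// Fraction of the top-k results that are relevant.
function precisionAtK(ranked: string[], relevant: Set<string>, k: number): number {
  const hits = ranked.slice(0, k).filter((id) => relevant.has(id)).length;
  return hits / k;
}

// Discounted cumulative gain over the first k gains (linear gain).
function dcgAtK(gains: number[], k: number): number {
  return gains.slice(0, k).reduce((sum, gain, i) => sum + gain / Math.log2(i + 2), 0);
}

// DCG of the ranking divided by the DCG of the ideal (best possible) ordering.
function ndcgAtK(ranked: string[], relevance: Map<string, number>, k: number): number {
  const gains = ranked.map((id) => relevance.get(id) ?? 0);
  const ideal = Array.from(relevance.values()).sort((a, b) => b - a);
  const idealDcg = dcgAtK(ideal, k);
  return idealDcg === 0 ? 0 : dcgAtK(gains, k) / idealDcg;
}
```

Precision@K ignores order within the top k, while nDCG rewards placing relevant documents earlier, so the two can disagree when comparing rerankers.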

Designing Idempotent APIs at Scale

Production-quality research system comparing six idempotency strategies for a payments API domain, built with FastAPI, PostgreSQL, Redis, and RabbitMQ.

idempotency patterns • distributed systems • saga pattern • outbox pattern

Python • FastAPI • PostgreSQL • Redis • RabbitMQ • Docker

Case study Live demo Source

Hallucination Mitigation in Enterprise LLM Apps

Production-grade research system for evaluating, benchmarking, and mitigating hallucinations in enterprise LLM applications with multiple RAG variants and guardrail frameworks.

RAG pipeline • guardrails • citation enforcement • self-critique

Python • RAG • LLM • NLP • pytest

Case study Live demo Source

Config Service

Centralized configuration service used across internal systems for managing application settings and feature flags.

configuration management • internal tooling

TypeScript • Node.js

Case study Live demo Source

Failure Recovery Patterns in Microservices

Platform simulating microservice failures to evaluate retries, circuit breakers, bulkheads, and idempotency. Measures reliability, latency, and duplicate prevention to guide resilient system design.

circuit breakers • retry patterns • bulkhead isolation • outbox pattern

Python • FastAPI • PostgreSQL • Redis • Docker • Prometheus • Grafana

Case study Live demo Source

IaC Maintainability Study

Comprehensive empirical study examining how structural design decisions in Terraform infrastructure-as-code affect long-term maintainability, drift susceptibility, and change management complexity.

infrastructure as code • drift detection • maintainability metrics • reference architectures

Terraform • HCL • AWS • Python

Case study Live demo Source

LLM Gateway — AI Infrastructure

Production-ready unified API gateway for routing requests across multiple LLM providers with built-in rate limiting, response caching, cost tracking, and OpenTelemetry observability.

API gateway • model routing • rate limiting • cost tracking

Python • FastAPI • Redis • PostgreSQL • Docker • OpenTelemetry

Case study Live demo Source

View all projects

Focus

Event-Driven Architecture

Async events, queues, reliable delivery. Designing systems where out-of-order messages and partial failures are expected.

Distributed Systems

Retries, idempotency, circuit breakers, eventual consistency. Building for failure rather than hoping for success.
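
As one concrete instance of "building for failure", a circuit breaker can be sketched as below. This is a minimal synchronous version with a caller-supplied clock; the consecutive-failure threshold, the single-probe half-open behaviour, and the class name CircuitBreaker are assumptions chosen for illustration.

```typescript
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private state: BreakerState = "closed";
  private failures = 0; // consecutive failures since the last success
  private openedAt = 0;
  private readonly threshold: number;
  private readonly cooldownMs: number;

  constructor(threshold: number, cooldownMs: number) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
  }

  // Moves open -> half-open once the cooldown has elapsed, then reports state.
  currentState(nowMs: number): BreakerState {
    if (this.state === "open" && nowMs - this.openedAt >= this.cooldownMs) {
      this.state = "half-open";
    }
    return this.state;
  }

  // Runs fn unless the circuit is open. A failure while half-open, or the
  // threshold-th consecutive failure, opens the circuit.
  call<T>(fn: () => T, nowMs: number): T {
    const state = this.currentState(nowMs);
    if (state === "open") throw new Error("circuit open");
    try {
      const result = fn();
      this.failures = 0;
      this.state = "closed";
      return result;
    } catch (err) {
      this.failures += 1;
      if (state === "half-open" || this.failures >= this.threshold) {
        this.state = "open";
        this.openedAt = nowMs;
      }
      throw err;
    }
  }
}
```

While the circuit is open the wrapped function is never invoked, which is what protects a struggling dependency from retry storms.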

Multi-Tenant SaaS

Schema-per-tenant isolation, billing metering, usage tracking, scoped workspaces.

AI Infrastructure

Ingestion pipelines, vector search, RAG architectures, model orchestration. Latency and cost budgets matter.

Platform Engineering

Developer tooling, build systems, CI/CD abstractions, internal platforms that reduce toil.

Production Reliability

Observability, SLOs, alerting, incident response. Systems fail — the question is how they fail.

Writing

Technical papers and deep-dives on systems I've built and problems I've solved in production.

Hierarchical Chunking Strategies for Production RAG Systems: Balancing Retrieval Precision and Context Coherence

2024

Internal Technical Report

Retrieval-Augmented Generation systems degrade in precision as knowledge bases grow. This paper examines chunking strategies — fixed-size, paragraph-level, and hierarchical parent–child — across corpora of varying size and domain density. We introduce a re-ranking layer using cross-encoder models and show it recovers precision lost at scale while remaining compatible with standard vector-search backends. Benchmarks are run against a golden dataset of 2,400 support queries across four enterprise tenants.

RAG • LLM • Vector Search • Information Retrieval

Multi-Tenant Event Sourcing at Scale: Schema Isolation, Replay Semantics, and Operational Lessons

2024

Internal Technical Report

Event sourcing in multi-tenant SaaS systems introduces tension between tenant isolation and operational simplicity. We describe our experience migrating a 30-tenant workflow platform from a shared event log to a namespace-isolated architecture, covering schema-per-tenant trade-offs, aggregate snapshot strategies to bound replay time, and the tooling required to safely replay tenant event streams without cross-tenant interference.

Event Sourcing • Multi-Tenant • Distributed Systems • CQRS

Exactly-Once Delivery in Heterogeneous Sink Pipelines: Lessons from a High-Throughput Kafka Consumer Fleet

2025

Internal Technical Report

Exactly-once semantics in streaming pipelines are well-studied within a single system but become subtle when events must be durably committed to multiple heterogeneous sinks — analytics stores, billing aggregators, and alerting systems — in a single logical transaction. We detail the rebalance-listener pattern, idempotency key design, and per-sink commit protocols that enabled zero duplicate charges across 40M+ daily events on a Kafka-backed pipeline.

Apache Kafka • Stream Processing • Exactly-Once Delivery • Distributed Systems

Clock-Independent Rate Limiting: Eliminating Skew Drift in Distributed Token-Bucket Implementations

2025

Internal Technical Report

Token-bucket rate limiters that compute refill amounts using client-side timestamps accumulate systematic drift when hosts have clock skew. This paper quantifies the drift under realistic NTP conditions and proposes using authoritative server-side timestamps — specifically Redis server time via Lua scripts — to eliminate client clock dependence entirely. We compare bucket accuracy across five implementations under 50ms and 200ms of injected skew.

Rate Limiting • Distributed Systems • Redis • Algorithms
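
The core idea of the abstract above, refilling from one authoritative clock rather than each client's own, can be sketched as follows. The paper names Redis server time read inside a Lua script as that clock; here it is reduced to a serverNowMs parameter so the refill arithmetic is visible on its own. The Bucket shape and the function name takeToken are hypothetical.

```typescript
interface Bucket {
  tokens: number;
  lastRefillMs: number; // timestamp from the authoritative clock
}

// Refills the bucket from elapsed authoritative time, then tries to take
// one token. No client-side timestamp is involved.
function takeToken(
  bucket: Bucket,
  capacity: number,
  refillPerSec: number,
  serverNowMs: number, // assumed to come from a single time source
): boolean {
  const elapsedMs = Math.max(0, serverNowMs - bucket.lastRefillMs);
  bucket.tokens = Math.min(capacity, bucket.tokens + (elapsedMs / 1000) * refillPerSec);
  // Never move the refill marker backwards, even if a stale time is passed in.
  bucket.lastRefillMs = Math.max(bucket.lastRefillMs, serverNowMs);
  if (bucket.tokens >= 1) {
    bucket.tokens -= 1;
    return true;
  }
  return false;
}
```

Because every caller's request is evaluated against the same clock, two hosts with skewed local clocks cannot disagree about how many tokens have been refilled.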

View all papers

About

Former IC / lead at B2B platforms. I work end-to-end: interaction models, design systems, perf profiling.

My work spans event-driven architectures, multi-tenant SaaS products, and AI-powered systems. I care about systems that scale cleanly, fail gracefully, and are a pleasure for teams to operate.

I've worked across the full lifecycle of production systems — initial design through to observability and incident response.

React + TypeScript (SSR, hydration budgets) • Systems design for client data flows • Design systems governance.

Stack

Languages & Runtimes

Node.js + TypeScript (primary), Python for ML pipelines, Go for high-throughput services.

Frontend

React + TypeScript (SSR, hydration budgets), Next.js, design systems governance.

Databases

PostgreSQL as default, Redis for ephemeral state and rate-limiting, vector DBs (Pinecone, Weaviate) for similarity search.

Infrastructure

Docker-first local dev, Kubernetes for orchestration, AWS / GCP for managed services.

AI / ML

OpenAI API integration, LangChain for RAG orchestration, embedding pipelines, hallucination mitigation patterns.

Queues & Observability

BullMQ + RabbitMQ for async work, Kafka for streaming. Prometheus + Grafana for metrics; OpenTelemetry traces.

Contact

Let's talk.

Building something in backend infrastructure, platform engineering, or AI systems? I'd like to hear about it.

ahmedwaregh@gmail.com

I respond within a few days.

GitHub LinkedIn
