Things I'm building

Open source work, side projects, and infrastructure experiments.

Active

● Active

Kafka → Iceberg Streaming Pipeline

Real-time event streaming from Kafka topics into Apache Iceberg tables using Flink, with automatic compaction and schema evolution.

kafka, apache-iceberg, flink, kubernetes
● Active GitHub ↗

Data Platform Modernisation

Migrating our batch pipelines to streaming with Kafka and dbt.

kafka, dbt, data-engineering
● Active

MLflow on Kubernetes

Self-hosted MLflow tracking server on EKS with S3 artifact storage, PostgreSQL backend, and Keycloak authentication. Helm-based deployment.

mlflow, kubernetes, aws, ml-platform

Shipped

● Shipped

Cloudera CDP Health Monitor

Internal Grafana dashboard stack for Cloudera CDP cluster health: HDFS, YARN, Impala, and Kafka metrics unified in one view, with alerting.

cloudera, grafana, prometheus, observability