Data Platform Lead, Lakehouse & AI
Jan 2025 – Present Customer: US adtech company
Lead the data platform and applied-AI initiatives for a US adtech company, owning lakehouse modernization, production reliability, and agentic AI systems while leading data engineering delivery.
Responsibilities:
- Spearheaded the end-to-end migration of a business-critical analytics platform from a legacy cloud warehouse and third-party analytical database to a Databricks Lakehouse, establishing a unified serving layer and removing vendor dependencies from the critical data path.
- Architected medallion-style data pipelines using Spark and Delta Lake: incremental ingestion, snapshot-consistent merges, granular fact models, pre-aggregations, and probabilistic counting for large-scale analytical workloads.
- Led controlled cutovers across data pipelines, backend APIs, frontend clients, and downstream services using automated source-versus-target parity testing and stable interface contracts; retired obsolete infrastructure and removed thousands of lines of legacy code.
- Delivered substantial efficiency improvements: 4–5× faster production processing, 100×+ acceleration in validated analytical prototypes, order-of-magnitude corrections to sizing logic, and recurring cloud-cost reductions.
- Restored correctness in complex measurement and attribution pipelines by resolving temporal-window, slowly-changing-dimension, canonicalization, deduplication, status-management, and aggregation defects.
- Led root-cause analysis and remediation of high-impact production incidents spanning schema inference, Unicode normalization, database connection lifecycle, distributed storage configuration, cache semantics, and accidental Cartesian joins.
- Built and launched an LLM-powered natural-language data discovery service from zero to production using FastAPI, agent frameworks, MCP tools, embeddings, and vector search over tens of thousands of domain attributes.
- Designed and productionized a Claude-powered autonomous incident-triage agent that performs permission-gated, read-only investigations, maintains an auditable activity trail, posts structured incident summaries, and opens draft documentation pull requests.
- Led the architecture (not implementation) of a resilient, self-learning agent-based LLM parser for adtech postlogs: scoped the stack, intermediate-result storage, off-the-shelf vs hosted OCR / bounding-box models (e.g., Datalab), clear responsibility separation from other agents in the flow, and an assessment of current SOTA approaches.
- Hardened agentic systems for production with structured-output validation, permission boundaries, bounded concurrency, token controls, observability, secrets management, container orchestration, and Git-based episodic memory.
- Applied Claude and Claude Code as engineering force multipliers for repository-scale analysis, migration planning, debugging, implementation, and documentation, backed by automated tests, data-parity checks, and human review.
- Automated several data-team and sprint/planning processes as reusable agent skills, converting runbooks into codified workflows with built-in confirmation steps and invariant checks.
Messaging Project
Oct 2021 – Dec 2024 Customer: US SaaS company
Developed core components of an omnichannel messaging platform, enabling enterprise customers to reach users across WhatsApp Business, RCS, and SMS channels through unified APIs.
Responsibilities:
- Implemented key parts of a sender registration system supporting multiple messaging channels (WhatsApp, RCS, and potentially others), handling provider-specific requirements and compliance rules.
- Optimized message delivery latency and reliability while maintaining complex business rules across different OTT providers.
- Resolved critical customer escalations through deep technical investigation and edge case analysis.
- Updated critical-path billing logic, ensuring accurate and timely billing for customers.
- Contributed to cross-team Scala, Golang, and Java projects.
Marketing Project
Mar 2021 – Oct 2021 Customer: US technology company
Developed and optimized Apache Spark pipelines processing cross-service engagement data with strict privacy preservation requirements across multiple digital entertainment and subscription platforms.
Responsibilities:
- Engineered high-performance Spark jobs processing TB-scale user engagement data.
- Built privacy-preserving data aggregation pipelines enabling anonymous cross-service analytics.
- Optimized data processing pipelines, reducing job completion times while adhering to data minimization principles.
- Documented dataset lineage and data flow for compliance and reproducibility.
Healthcare Project
May 2020 – Mar 2021 Customer: US healthcare technology company
Played a key role in modernizing a healthcare data processing platform, enabling efficient transformation of diverse medical records into analytics-ready formats for business intelligence.
Responsibilities:
- Transformed legacy Python batch jobs into scalable Apache Spark workflows, migrating complex healthcare data processing logic.
- Designed and implemented production-grade ETL pipelines handling diverse healthcare data formats from multiple source systems.
- Optimized large-scale data reprocessing jobs, reducing execution time while maintaining HIPAA compliance.
- Developed an automated AWS S3-to-Redshift data pipeline using PySpark and boto3, enabling real-time BI reporting.
- Contributed significant technical input to architecture decisions affecting multiple project initiatives.
Adtech Project
Oct 2019 – May 2020 Customer: Israeli adtech company
Developed a real-time ad analytics platform processing high-volume impression data (670+ MPS) to enable automated bidding decisions and campaign optimization.
Responsibilities:
- Implemented Spark Structured Streaming jobs for real-time ad performance analysis.
- Optimized the data processing architecture, reducing operational costs while maintaining system reliability.
- Modernized legacy Python ETL pipelines to improve maintainability and processing efficiency.
Communications Project
Feb 2018 – Oct 2019 Customer: US SaaS company
Developed high-throughput streaming applications for SMS data processing, implementing complex rate calculation logic and generating business intelligence insights.
Responsibilities:
- Optimized the messaging metadata enrichment pipeline during a Kinesis-to-Kafka migration, achieving 830+ MPS by eliminating redundant processing.
- Implemented comprehensive cross-stream validation to ensure data consistency between source systems during the transition.
- Built cost-effective internal services replacing external API dependencies, resulting in significant operational savings.
- Led Scala knowledge-sharing initiatives, including mentoring sessions and technical workshops.
Digital Transformation Project
Jun 2017 – Dec 2017 Customer: US energy company
Contributed to an enterprise-wide data modernization initiative, migrating traditional database workloads to a cloud-based big data processing platform.
Responsibilities:
- Replaced legacy Oracle and MySQL batch processes with scalable Apache Spark pipelines.
- Built new data processing workflows using Spark SQL and Hive, eliminating dependency on legacy SQL jobs.
IoT Monitoring Platform
Nov 2015 – Jun 2017 Customers: European food and beverage companies
Developed a real-time monitoring system processing data from IoT sensors across warehouse and retail locations, enabling predictive maintenance and inventory optimization.
Responsibilities:
- Implemented scalable data processing pipelines using Apache Spark and Akka Streams to handle real-time sensor data.
- Built diagnostic tools enabling QA and hardware teams to validate device performance and data accuracy.
- Developed an anomaly detection system for early identification of hardware issues.