Nazarii Melnychuk – Lead Data & Applied AI Engineer

Major Points

Lead Data Engineer and Applied AI Engineer specializing in data-platform modernization, lakehouse architecture, and production agentic (LLM) systems
10 years of experience across distributed systems, large-scale data pipelines, and applied AI
Data Lead for 6 data engineers; grew the team from 2 to 6 through hands-on hiring, onboarding, and development
Also lead a 9-person cross-functional delivery team, keeping technical direction aligned with business goals
Generalist engineer: primary language Python; reads Scala, Go, and TypeScript
Driven by architecture, performance, business correctness, and design elegance

Specialized in

Data-platform modernization and lakehouse architecture (Databricks, Delta Lake, medallion pipelines)
Large-scale batch and streaming data processing with Apache Spark
Production applied-AI and agentic systems: LLM services, agent frameworks, MCP tools, embeddings, and vector search
Scalable pipelines and services in Python on AWS, with a focus on performance and business correctness

Leadership and mentoring

Serve as Data Lead for 6 data engineers, growing the team from 2 to 6 across sourcing, interviewing, technical assessment, hiring, onboarding, and development
Lead a 9-person cross-functional delivery team (including 4 data engineers), coordinating technical direction, priorities, dependencies, and execution
Own a multi-quarter data-platform roadmap, translating business goals into architecture decisions, implementation plans, and launch criteria
Navigate complex organizational change and shifting priorities while preserving delivery reliability and measurable business outcomes
Technical interviewer; previously ran Scala bootcamp lectures and internal workshops to upskill mid-to-senior engineers

Industries

Cloud communications (WhatsApp, RCS, SMS)
Marketing/Adtech (user engagement data, ad analytics)
Healthcare data processing and HIPAA compliance
IoT monitoring for food/beverage companies
Energy sector digital transformation

Achievements

Applied AI & automation

Built and launched an LLM-powered natural-language data discovery service from zero to production (FastAPI, agent frameworks, MCP, embeddings, and vector search).
Designed and productionized a Claude-powered autonomous incident-triage agent with permission-gated, read-only investigations and an auditable activity trail.
Automated data-team and sprint/planning processes as reusable agent skills, codifying runbooks into workflows with confirmation steps and invariant checks.

Communications project

Developed in-house services replacing costly external APIs, generating $100k–$250k in customer savings.
Optimized the message routing and billing pipeline during the Kinesis-to-Kafka migration.
Raised throughput to 830+ MPS by eliminating redundant processing.
Implemented comprehensive cross-stream validation to ensure data consistency between source systems during this critical transition.

Healthcare project

Transformed legacy Python batch jobs into scalable Apache Spark workflows, migrating complex healthcare data-processing logic.
Shaped system architecture decisions across multiple projects and built production-grade ETL pipelines handling diverse healthcare data formats.

Skills

Languages

Python (primary), Scala, Go, TypeScript

Data & Lakehouse

Databricks, Delta Lake, Apache Spark, Apache Kafka, Airflow

Applied AI

LLMs, Claude / Claude Code, MCP, FastMCP, Embeddings, Vector search, FastAPI

Agent SDKs

OpenAI Agents SDK, Claude Agent SDK, Inngest AgentKit

Data Warehousing & Analytics

Amazon Redshift, BigQuery, Amazon Athena, Elasticsearch, pandas

Databases

PostgreSQL, Amazon RDS, ElastiCache (Redis / Valkey)

Cloud & Infrastructure

AWS (EMR, Lambda, SQS, Aurora), Docker, Kubernetes (K8s), Datadog

Experience

Data Platform Lead, Lakehouse & AI

Jan 2025 – Present

Customer: US adtech company

Lead data-platform and applied-AI initiatives for a US adtech company, owning lakehouse modernization, production reliability, and agentic AI systems, and directing data engineering delivery.

Responsibilities:

Spearheaded the end-to-end migration of a business-critical analytics platform from a legacy cloud warehouse and third-party analytical database to a Databricks Lakehouse, establishing a unified serving layer and removing vendor dependencies from the critical data path.
Architected medallion-style data pipelines using Spark and Delta Lake: incremental ingestion, snapshot-consistent merges, granular fact models, pre-aggregations, and probabilistic counting for large-scale analytical workloads.
Led controlled cutovers across data pipelines, backend APIs, frontend clients, and downstream services using automated source-versus-target parity testing and stable interface contracts; retired obsolete infrastructure and removed thousands of lines of legacy code.
Delivered substantial efficiency improvements: 4–5× faster production processing, 100×+ acceleration in validated analytical prototypes, order-of-magnitude corrections to sizing logic, and recurring cloud-cost reductions.
Restored correctness in complex measurement and attribution pipelines by resolving temporal-window, slowly-changing-dimension, canonicalization, deduplication, status-management, and aggregation defects.
Led root-cause analysis and remediation of high-impact production incidents spanning schema inference, Unicode normalization, database connection lifecycle, distributed storage configuration, cache semantics, and accidental Cartesian joins.
Built and launched an LLM-powered natural-language data discovery service from zero to production using FastAPI, agent frameworks, MCP tools, embeddings, and vector search over tens of thousands of domain attributes.
Designed and productionized a Claude-powered autonomous incident-triage agent that performs permission-gated, read-only investigations, maintains an auditable activity trail, posts structured incident summaries, and opens draft documentation pull requests.
Led the architecture (not the implementation) of a resilient, self-learning, agent-based LLM parser for adtech postlogs: defined the technology stack and intermediate-result storage, evaluated off-the-shelf versus hosted OCR and bounding-box models (e.g., Datalab), drew clear responsibility boundaries with the other agents in the flow, and assessed current state-of-the-art approaches.
Hardened agentic systems for production with structured-output validation, permission boundaries, bounded concurrency, token controls, observability, secrets management, container orchestration, and Git-based episodic memory.
Applied Claude and Claude Code as engineering force multipliers for repository-scale analysis, migration planning, debugging, implementation, and documentation, backed by automated tests, data-parity checks, and human review.
Automated several data-team and sprint/planning processes as reusable agent skills, converting runbooks into codified workflows with built-in confirmation steps and invariant checks.

Messaging Project

Oct 2021 – Dec 2024

Customer: US SaaS company

Developed core components of an omnichannel messaging platform, enabling enterprise customers to reach users across WhatsApp Business, RCS, and SMS channels through unified APIs.

Responsibilities:

Implemented key parts of a sender registration system supporting multiple messaging channels (WhatsApp, RCS, and extensible to others), handling provider-specific requirements and compliance rules.
Optimized message delivery latency and reliability while maintaining complex business rules across different OTT providers.
Resolved critical customer escalations through deep technical investigation and edge case analysis.
Updated critical-path billing logic to ensure accurate and timely billing for customers.
Contributed to cross-team Scala, Go, and Java projects.

Marketing Project

Mar 2021 – Oct 2021

Customer: US technology company

Developed and optimized Apache Spark pipelines processing cross-service engagement data with strict privacy preservation requirements across multiple digital entertainment and subscription platforms.

Responsibilities:

Engineered high-performance Spark jobs processing TB-scale user engagement data.
Built privacy-preserving data aggregation pipelines enabling anonymous cross-service analytics.
Optimized data processing pipelines, reducing job completion times while upholding data-minimization principles.
Documented dataset lineage and data flow for compliance and reproducibility.

Healthcare Project

May 2020 – Mar 2021

Customer: US healthcare technology company

Played a key role in modernizing a healthcare data-processing platform, enabling efficient transformation of diverse medical records into analytics-ready formats for business intelligence.

Responsibilities:

Transformed legacy Python batch jobs into scalable Apache Spark workflows, migrating complex healthcare data processing logic.
Designed and implemented production-grade ETL pipelines handling diverse healthcare data formats from multiple source systems.
Optimized large-scale data reprocessing jobs, reducing execution time while maintaining HIPAA compliance.
Developed an automated AWS S3-to-Redshift data pipeline using PySpark and boto3, enabling real-time BI reporting.
Contributed significant technical input to architecture decisions affecting multiple projects.

Adtech Project

Oct 2019 – May 2020

Customer: Israel adtech company

Developed a real-time ad analytics platform processing high-volume impression data (670+ MPS) to enable automated bidding decisions and campaign optimization.

Responsibilities:

Implemented Spark Structured Streaming jobs for real-time ad performance analysis.
Optimized the data processing architecture, reducing operational costs while maintaining system reliability.
Modernized legacy Python ETL pipelines to improve maintainability and processing efficiency.

Communications Project

Feb 2018 – Oct 2019

Customer: US SaaS company

Developed high-throughput streaming applications for SMS data processing, implementing complex rate calculation logic and generating business intelligence insights.

Responsibilities:

Optimized messaging metadata enrichment pipeline during Kinesis-to-Kafka migration, achieving 830+ MPS through elimination of processing redundancies.
Implemented comprehensive cross-stream validation to ensure data consistency between source systems during the transition.
Built cost-effective internal services replacing external API dependencies, resulting in significant operational savings.
Led Scala knowledge-sharing initiatives including mentoring sessions and technical workshops.

Digital Transformation Project

Jun 2017 – Dec 2017

Customer: US energy company

Contributed to an enterprise-wide data modernization initiative, migrating traditional database workloads to a cloud-based big data processing platform.

Responsibilities:

Replaced legacy Oracle and MySQL batch processes with scalable Apache Spark pipelines.
Built new data processing workflows using Spark SQL and Hive, eliminating dependency on legacy SQL jobs.

IoT Monitoring Platform

Nov 2015 – Jun 2017

Customers: European food and beverage companies

Developed a real-time monitoring system processing data from IoT sensors across warehouse and retail locations, enabling predictive maintenance and inventory optimization.

Responsibilities:

Implemented scalable data processing pipelines using Apache Spark and Akka Streams to handle real-time sensor data.
Built diagnostic tools enabling QA and hardware teams to validate device performance and data accuracy.
Developed an anomaly detection system for early identification of hardware issues.

Education

MSc in Computer Science, Lviv Polytechnic National University, Ukraine

2016 – 2017

Other Professional Development

Natural Language Processing & AI

2023

Contributed Ukrainian localization and dataset improvements to the OpenAssistant/oasst1 open-source project
Built practical applications using transformer models and HuggingFace libraries
Explored large language models and their enterprise applications

Deep Learning & Time Series Analysis

2022

Completed fast.ai's Practical Deep Learning for Coders course
Implemented time-series forecasting solutions using Facebook's Prophet
Applied deep learning techniques to real-world data problems

Personal Projects

I enjoy designing practical, custom-built apps with Claude and OpenAI Codex:

A real-time FTMS (Bluetooth Fitness Machine Service) logger and workout tracker for my air bike, with heart-rate and cadence support and data export.
A high-performance 180° and 360° panorama and stereo-image viewer for the Meta Quest VR headset, with mipmap support and low latency.
Automation of personal workflows with a Nous Hermes agent, through custom-built skills and purpose-built mini-apps.

Languages

English – Advanced