# Nazarii Melnychuk – Lead Data & Applied AI Engineer

> Lead Data Engineer and Applied AI Engineer specializing in data-platform modernization, lakehouse architecture, and production agentic systems. Data Lead for a team of 6 engineers (grown from 2 through hands-on hiring) and lead of a 9-person cross-functional delivery team. A generalist who works primarily in Python and reads Scala, Go, and TypeScript, focused on architecture, performance, business correctness, and design elegance.

This file contains the complete CV published at https://nmeln.com, provided in Markdown for automated and LLM consumption. The CV is a single page; everything on it is reproduced below, so no other pages need to be fetched.

## Contact & links

- Website: https://nmeln.com
- LinkedIn: https://www.linkedin.com/in/nazarii-melnychuk-84581111b
- GitHub: https://github.com/nmeln
- Contacts (Linktree): https://linktr.ee/nmeln
- JSON Resume: https://nmeln.com/resume.json

## Major Points

- Lead Data Engineer and Applied AI Engineer specializing in data-platform modernization, lakehouse architecture, and production agentic (LLM) systems
- 10 years of experience across distributed systems, large-scale data pipelines, and applied AI
- Directly manage 6 data engineers, including hiring, onboarding, regular 1:1s, development planning, performance feedback, and workload allocation; grew the team from 2 to 6
- Also lead a 9-person cross-functional delivery team, keeping technical direction aligned with business goals
- Experienced in roadmap ownership, capacity and dependency management, automated data validation, root-cause analysis, privacy-sensitive processing, and production AI systems
- Strong technical partner to ML engineers and data scientists, with a current focus on experiment tracking, data lineage/registries, data quality, and synthetic data
- Generalist engineer: primary language **Python**; reads Scala, Go, and TypeScript
- Driven by architecture, performance, business correctness, and design elegance

## Specialized in

- Data-platform modernization and **lakehouse** architecture (**Databricks**, **Delta Lake**, medallion pipelines)
- Large-scale batch and streaming data processing with **Apache Spark**
- Production applied-AI and agentic systems: **LLM** services, agent frameworks, **MCP** tools, embeddings, and vector search
- Scalable pipelines and services in **Python** on **AWS**, with a focus on performance and business correctness

## Leadership and mentoring

- Serve as Data Lead for 6 data engineers, growing the team from 2 to 6 across sourcing, interviewing, technical assessment, hiring, onboarding, and development
- Lead a 9-person cross-functional delivery team (including 4 data engineers), coordinating technical direction, priorities, dependencies, and execution
- Own a multi-quarter data-platform roadmap, translating business goals into architecture decisions, implementation plans, and launch criteria
- Navigate complex organizational change and shifting priorities while preserving delivery reliability and measurable business outcomes
- Technical interviewer; previously ran Scala bootcamp lectures and internal workshops to upskill mid-to-senior engineers

## Industries

- Cloud communications (WhatsApp, RCS, SMS)
- Marketing/Adtech (user engagement data, ad analytics)
- Healthcare data processing and HIPAA compliance
- IoT monitoring for food/beverage companies
- Energy sector digital transformation

## Achievements

### Applied AI & automation

- Built and launched an LLM-powered natural-language data discovery service from zero to production (FastAPI, agent frameworks, MCP, embeddings, and vector search).
- Designed and productionized a Claude-powered autonomous incident-triage agent with permission-gated, read-only investigations and an auditable activity trail.
- Automated data-team and sprint/planning processes as reusable agent skills, codifying runbooks into workflows with confirmation steps and invariant checks.
### Communications project

- Developed in-house services replacing costly external APIs, generating $100k-$250k customer savings
- Optimized the message routing and billing pipeline during a Kinesis-to-Kafka migration
- Achieved 830+ MPS throughput by eliminating processing redundancies
- Implemented comprehensive cross-stream validation to ensure data consistency between source systems during this critical transition

### Healthcare project

- Transformed legacy Python batch jobs into scalable Apache Spark workflows, successfully migrating complex healthcare data processing logic
- Contributed to system architecture decisions across multiple projects and built production-grade ETL pipelines handling diverse healthcare data formats

## Skills

- Languages: Python (primary), Scala, Go, TypeScript
- Data & Lakehouse: Databricks, Delta Lake, Apache Spark, Apache Kafka, Airflow
- Applied AI: LLMs, Claude / Claude Code, MCP, FastMCP, Embeddings, Vector search, FastAPI
- Agent SDKs: OpenAI Agents SDK, Claude Agent SDK, Inngest AgentKit
- Data Warehousing & Analytics: Amazon Redshift, BigQuery, Amazon Athena, Elasticsearch, pandas
- Databases: PostgreSQL, Amazon RDS, ElastiCache (Redis / Valkey)
- Cloud & Infrastructure: AWS (EMR, Lambda, SQS, Aurora), Docker, K8S, Datadog

## Experience

### Data Platform Lead, Lakehouse & AI (Jan 2025 – Present)

**Customer:** US adtech company

Lead the data platform and applied-AI initiatives for a US adtech company, owning lakehouse modernization, production reliability, and agentic AI systems while leading data engineering delivery.

Responsibilities:

- Spearheaded the end-to-end migration of a business-critical analytics platform from a legacy cloud warehouse and third-party analytical database to a Databricks Lakehouse, establishing a unified serving layer and removing vendor dependencies from the critical data path.
- Architected medallion-style data pipelines using Spark and Delta Lake: incremental ingestion, snapshot-consistent merges, granular fact models, pre-aggregations, and probabilistic counting for large-scale analytical workloads.
- Led controlled cutovers across data pipelines, backend APIs, frontend clients, and downstream services using automated source-versus-target parity testing and stable interface contracts; retired obsolete infrastructure and removed thousands of lines of legacy code.
- Delivered substantial efficiency improvements: 4–5× faster production processing, 100×+ acceleration in validated analytical prototypes, order-of-magnitude corrections to sizing logic, and recurring cloud-cost reductions.
- Restored correctness in complex measurement and attribution pipelines by resolving temporal-window, slowly-changing-dimension, canonicalization, deduplication, status-management, and aggregation defects.
- Led root-cause analysis and remediation of high-impact production incidents spanning schema inference, Unicode normalization, database connection lifecycle, distributed storage configuration, cache semantics, and accidental Cartesian joins.
- Built and launched an LLM-powered natural-language data discovery service from zero to production using FastAPI, agent frameworks, MCP tools, embeddings, and vector search over tens of thousands of domain attributes.
- Designed and productionized a Claude-powered autonomous incident-triage agent that performs permission-gated, read-only investigations, maintains an auditable activity trail, posts structured incident summaries, and opens draft documentation pull requests.
- Led the architecture (not implementation) of a resilient, self-learning agent-based LLM parser for adtech postlogs: scoped the stack, intermediate-result storage, off-the-shelf vs hosted OCR / bounding-box models (e.g., Datalab), clear responsibility separation from other agents in the flow, and an assessment of current SOTA approaches.
- Hardened agentic systems for production with structured-output validation, permission boundaries, bounded concurrency, token controls, observability, secrets management, container orchestration, and Git-based episodic memory.
- Applied Claude and Claude Code as engineering force multipliers for repository-scale analysis, migration planning, debugging, implementation, and documentation, backed by automated tests, data-parity checks, and human review.
- Automated several data-team and sprint/planning processes as reusable agent skills, converting runbooks into codified workflows with built-in confirmation steps and invariant checks.

### Messaging Project (Oct 2021 – Dec 2024)

**Customer:** US SaaS company

Developed core components of an omnichannel messaging platform, enabling enterprise customers to reach users across WhatsApp Business, RCS, and SMS channels through unified APIs.

Responsibilities:

- Implemented key parts of a sender registration system supporting multiple messaging channels (WhatsApp, RCS, and potentially others), handling provider-specific requirements and compliance rules.
- Optimized message delivery latency and reliability while maintaining complex business rules across different OTT providers.
- Resolved critical customer escalations through deep technical investigation and edge-case analysis.
- Worked on updating billing logic on a critical path, ensuring accurate and timely billing for customers.
- Contributed to cross-team Scala, Golang, and Java projects.

### Marketing Project (Mar 2021 – Oct 2021)

**Customer:** US technology company

Developed and optimized Apache Spark pipelines processing cross-service engagement data with strict privacy preservation requirements across multiple digital entertainment and subscription platforms.

Responsibilities:

- Engineered high-performance Spark jobs processing TB-scale user engagement data.
- Built privacy-preserving data aggregation pipelines enabling anonymous cross-service analytics.
- Optimized data processing pipelines, reducing job completion times while adhering to data minimization principles.
- Documented dataset lineage and data flow for compliance and reproducibility.

### Healthcare Project (May 2020 – Mar 2021)

**Customer:** US healthcare technology company

Played a key role in modernizing a healthcare data processing platform, enabling efficient transformation of diverse medical records into analytics-ready formats for business intelligence.

Responsibilities:

- Transformed legacy Python batch jobs into scalable Apache Spark workflows, migrating complex healthcare data processing logic.
- Designed and implemented production-grade ETL pipelines handling diverse healthcare data formats from multiple source systems.
- Optimized large-scale data reprocessing jobs, reducing execution time while ensuring HIPAA compliance.
- Developed an automated AWS S3 to Redshift data pipeline using PySpark and boto3, enabling real-time BI reporting.
- Contributed significant technical input to architecture decisions affecting multiple project initiatives.

### Adtech Project (Oct 2019 – May 2020)

**Customer:** Israeli adtech company

Developed a real-time ad analytics platform processing high-volume impression data (670+ MPS) to enable automated bidding decisions and campaign optimization.

Responsibilities:

- Implemented Spark Structured Streaming jobs for real-time ad performance analysis
- Optimized data processing architecture, reducing operational costs while maintaining system reliability
- Modernized legacy Python ETL pipelines to improve maintainability and processing efficiency

### Communications Project (Feb 2018 – Oct 2019)

**Customer:** US SaaS company

Developed high-throughput streaming applications for SMS data processing, implementing complex rate calculation logic and generating business intelligence insights.

Responsibilities:

- Optimized the messaging metadata enrichment pipeline during a Kinesis-to-Kafka migration, achieving 830+ MPS by eliminating processing redundancies.
- Implemented comprehensive cross-stream validation to ensure data consistency between source systems during the transition.
- Built cost-effective internal services replacing external API dependencies, resulting in significant operational savings
- Led Scala knowledge-sharing initiatives, including mentoring sessions and technical workshops

### Digital Transformation Project (Jun 2017 – Dec 2017)

**Customer:** US energy company

Contributed to an enterprise-wide data modernization initiative, migrating traditional database workloads to a cloud-based big data processing platform.

Responsibilities:

- Replaced legacy Oracle and MySQL batch processes with scalable Apache Spark pipelines
- Built new data processing workflows using Spark SQL and Hive, eliminating dependency on legacy SQL jobs

### IoT Monitoring Platform (Nov 2015 – Jun 2017)

**Customers:** European food and beverage companies

Developed a real-time monitoring system processing data from IoT sensors across warehouse and retail locations, enabling predictive maintenance and inventory optimization.

Responsibilities:

- Implemented scalable data processing pipelines using Apache Spark and Akka Streams to handle real-time sensor data
- Built diagnostic tools enabling QA and hardware teams to validate device performance and data accuracy
- Developed an anomaly detection system for early identification of hardware issues

## Education

**MSc Degree, Computer Science**, Lviv Polytechnic National University, Ukraine (2016 – 2017)

## Other Professional Development

### Natural Language Processing & AI (2023)

- Contributed Ukrainian localization and dataset improvements to the OpenAssistant/oasst1 open-source project
- Built practical applications using transformer models and HuggingFace libraries
- Explored large language models and their enterprise applications

### Deep Learning & Time Series Analysis (2022)

- Completed fast.ai's Practical Deep Learning for Coders course
- Implemented time-series forecasting solutions using Facebook's Prophet
- Applied deep learning techniques to real-world data problems

## Personal Projects

I enjoy designing practical, custom-built apps with Claude and OpenAI Codex:

- A real-time FTMS logging and workout tracker for my air bike, with heart-rate and cadence support and data export.
- A high-performance 180° and 360° panorama and stereo-image viewer for the Meta Quest VR headset, with mipmap support and low latency.
- Automation of personal workflows with a Nous Hermes agent, through custom-built skills and purpose-built mini-apps.

## Languages

**English** - Advanced