# Nazarii Melnychuk – Lead Data & Applied AI Engineer

> Lead Data Engineer and Applied AI Engineer specializing in data-platform modernization, lakehouse architecture, and production agentic systems. Data Lead for a team of 6 engineers (grown from 2 through hands-on hiring) and lead of a 9-person cross-functional delivery team. A generalist who works primarily in Python and reads Scala, Go, and TypeScript, focused on architecture, performance, business correctness, and design elegance.

This file contains the complete CV published at https://nmeln.com, provided in Markdown for automated and LLM consumption. The CV is a single page; everything on it is reproduced below, so no other pages need to be fetched.

## Contact & links

- Website: https://nmeln.com
- LinkedIn: https://www.linkedin.com/in/nazarii-melnychuk-84581111b
- GitHub: https://github.com/nmeln
- Contacts (Linktree): https://linktr.ee/nmeln
- JSON Resume: https://nmeln.com/resume.json

## Major Points

- Lead Data Engineer and Applied AI Engineer specializing in data-platform modernization, lakehouse architecture, and production agentic (LLM) systems
- 10 years of experience across distributed systems, large-scale data pipelines, and applied AI
- Directly manage 6 data engineers, including hiring, onboarding, regular 1:1s, development planning, performance feedback, and workload allocation; grew the team from 2 to 6
- Also lead a 9-person cross-functional delivery team, keeping technical direction aligned with business goals
- Experienced in roadmap ownership, capacity and dependency management, automated data validation, root-cause analysis, privacy-sensitive processing, and production AI systems
- Strong technical partner to ML engineers and data scientists, with a current focus on experiment tracking, data lineage/registries, data quality, and synthetic data
- Generalist engineer: primary language **Python**; reads Scala, Go, and TypeScript
- Driven by architecture, performance, business correctness, and design elegance

## Specialized in

- Data-platform modernization and **lakehouse** architecture (**Databricks**, **Delta Lake**, medallion pipelines)
- Large-scale batch and streaming data processing with **Apache Spark**
- Production applied-AI and agentic systems: **LLM** services, agent frameworks, **MCP** tools, embeddings, and vector search
- Scalable pipelines and services in **Python** on **AWS**, with a focus on performance and business correctness

## Leadership and mentoring

- Serve as Data Lead for 6 data engineers, growing the team from 2 to 6 across sourcing, interviewing, technical assessment, hiring, onboarding, and development
- Lead a 9-person cross-functional delivery team (including 4 data engineers), coordinating technical direction, priorities, dependencies, and execution
- Own a multi-quarter data-platform roadmap, translating business goals into architecture decisions, implementation plans, and launch criteria
- Navigate complex organizational change and shifting priorities while preserving delivery reliability and measurable business outcomes
- Technical interviewer; previously ran Scala bootcamp lectures and internal workshops to upskill mid-to-senior engineers

## Industries

- Cloud communications (WhatsApp, RCS, SMS)
- Marketing/Adtech (user engagement data, ad analytics)
- Healthcare data processing and HIPAA compliance
- IoT monitoring for food/beverage companies
- Energy sector digital transformation

## Achievements

### Applied AI & automation

- Built and launched an LLM-powered natural-language data discovery service from zero to production (FastAPI, agent frameworks, MCP, embeddings, and vector search).
- Designed and productionized a Claude-powered autonomous incident-triage agent with permission-gated, read-only investigations and an auditable activity trail.
- Automated data-team and sprint/planning processes as reusable agent skills, codifying runbooks into workflows with confirmation steps and invariant checks.

### Communications project

- Developed in-house services replacing costly external APIs, generating $100k–$250k in customer savings
- Optimized the message routing and billing pipeline during the Kinesis-to-Kafka migration
- Raised throughput to 830+ messages per second (MPS) by eliminating processing redundancies
- Implemented comprehensive cross-stream validation to ensure data consistency between source systems during this critical transition

### Healthcare project

- Transformed legacy Python batch jobs into scalable Apache Spark workflows, migrating complex healthcare data processing logic
- Contributed to system architecture decisions across multiple projects and built production-grade ETL pipelines handling diverse healthcare data formats

## Skills

- Languages: Python (primary), Scala, Go, TypeScript
- Data & Lakehouse: Databricks, Delta Lake, Apache Spark, Apache Kafka, Airflow
- Applied AI: LLMs, Claude / Claude Code, MCP, FastMCP, Embeddings, Vector search, FastAPI
- Agent SDKs: OpenAI Agents SDK, Claude Agent SDK, Inngest AgentKit
- Data Warehousing & Analytics: Amazon Redshift, BigQuery, Amazon Athena, Elasticsearch, pandas
- Databases: PostgreSQL, Amazon RDS, Amazon ElastiCache (Redis / Valkey)
- Cloud & Infrastructure: AWS (EMR, Lambda, SQS, Aurora), Docker, Kubernetes, Datadog

## Experience

### Data Platform Lead, Lakehouse & AI (Jan 2025 – Present)

**Customer:** US adtech company

Lead the data platform and applied-AI initiatives for a US adtech company, owning lakehouse modernization, production reliability, and agentic AI systems while leading data engineering delivery.

Responsibilities:

- Spearheaded the end-to-end migration of a business-critical analytics platform from a legacy cloud warehouse and third-party analytical database to a Databricks Lakehouse, establishing a unified serving layer and removing vendor dependencies from the critical data path.
- Architected medallion-style data pipelines using Spark and Delta Lake: incremental ingestion, snapshot-consistent merges, granular fact models, pre-aggregations, and probabilistic counting for large-scale analytical workloads.
- Led controlled cutovers across data pipelines, backend APIs, frontend clients, and downstream services using automated source-versus-target parity testing and stable interface contracts; retired obsolete infrastructure and removed thousands of lines of legacy code.
- Delivered substantial efficiency improvements: 4–5× faster production processing, 100×+ acceleration in validated analytical prototypes, order-of-magnitude corrections to sizing logic, and recurring cloud-cost reductions.
- Restored correctness in complex measurement and attribution pipelines by resolving temporal-window, slowly-changing-dimension, canonicalization, deduplication, status-management, and aggregation defects.
- Led root-cause analysis and remediation of high-impact production incidents spanning schema inference, Unicode normalization, database connection lifecycle, distributed storage configuration, cache semantics, and accidental Cartesian joins.
- Built and launched an LLM-powered natural-language data discovery service from zero to production using FastAPI, agent frameworks, MCP tools, embeddings, and vector search over tens of thousands of domain attributes.
- Designed and productionized a Claude-powered autonomous incident-triage agent that performs permission-gated, read-only investigations, maintains an auditable activity trail, posts structured incident summaries, and opens draft documentation pull requests.
- Led the architecture (not implementation) of a resilient, self-learning agent-based LLM parser for adtech postlogs: scoped the stack, intermediate-result storage, off-the-shelf vs hosted OCR / bounding-box models (e.g., Datalab), clear responsibility separation from other agents in the flow, and an assessment of current SOTA approaches.
- Hardened agentic systems for production with structured-output validation, permission boundaries, bounded concurrency, token controls, observability, secrets management, container orchestration, and Git-based episodic memory.
- Applied Claude and Claude Code as engineering force multipliers for repository-scale analysis, migration planning, debugging, implementation, and documentation, backed by automated tests, data-parity checks, and human review.
- Automated several data-team and sprint/planning processes as reusable agent skills, converting runbooks into codified workflows with built-in confirmation steps and invariant checks.

### Messaging Project (Oct 2021 – Dec 2024)

**Customer:** US SaaS company

Developed core components of an omnichannel messaging platform, enabling enterprise customers to reach users across WhatsApp Business, RCS, and SMS channels through unified APIs.

Responsibilities:

- Implemented key parts of a sender registration system supporting multiple messaging channels (WhatsApp, RCS, and potentially others), handling provider-specific requirements and compliance rules.
- Optimized message delivery latency and reliability while maintaining complex business rules across different OTT providers.
- Resolved critical customer escalations through deep technical investigation and edge case analysis.
- Updated billing logic on the critical path, ensuring accurate and timely billing for customers.
- Contributed to cross-team Scala, Go, and Java projects.

### Marketing Project (Mar 2021 – Oct 2021)

**Customer:** US technology company

Developed and optimized Apache Spark pipelines processing cross-service engagement data with strict privacy preservation requirements across multiple digital entertainment and subscription platforms.

Responsibilities:

- Engineered high-performance Spark jobs processing TB-scale user engagement data.
- Built privacy-preserving data aggregation pipelines enabling anonymous cross-service analytics.
- Optimized data processing pipelines, reducing job completion times while adhering to data-minimization principles.
- Documented dataset lineage and data flow for compliance and reproducibility.

### Healthcare Project (May 2020 – Mar 2021)

**Customer:** US healthcare technology company

Played a key role in modernizing a healthcare data processing platform, enabling efficient transformation of diverse medical records into analytics-ready formats for business intelligence.

Responsibilities:

- Transformed legacy Python batch jobs into scalable Apache Spark workflows, migrating complex healthcare data processing logic.
- Designed and implemented production-grade ETL pipelines handling diverse healthcare data formats from multiple source systems.
- Optimized large-scale data reprocessing jobs, reducing execution time while maintaining HIPAA compliance.
- Developed an automated Amazon S3-to-Redshift data pipeline using PySpark and boto3, enabling real-time BI reporting.
- Contributed significant technical input to architecture decisions affecting multiple project initiatives.

### Adtech Project (Oct 2019 – May 2020)

**Customer:** Israeli adtech company

Developed a real-time ad analytics platform processing high-volume impression data (670+ MPS) to enable automated bidding decisions and campaign optimization.

Responsibilities:

- Implemented Spark Structured Streaming jobs for real-time ad performance analysis.
- Optimized the data processing architecture, reducing operational costs while maintaining system reliability.
- Modernized legacy Python ETL pipelines to improve maintainability and processing efficiency.

### Communications Project (Feb 2018 – Oct 2019)

**Customer:** US SaaS company

Developed high-throughput streaming applications for SMS data processing, implementing complex rate calculation logic and generating business intelligence insights.

Responsibilities:

- Optimized messaging metadata enrichment pipeline during Kinesis-to-Kafka migration, achieving 830+ MPS through elimination of processing redundancies.
- Implemented comprehensive cross-stream validation to ensure data consistency between source systems during the transition.
- Built cost-effective internal services replacing external API dependencies, resulting in significant operational savings.
- Led Scala knowledge-sharing initiatives, including mentoring sessions and technical workshops.

### Digital Transformation Project (Jun 2017 – Dec 2017)

**Customer:** US energy company

Contributed to an enterprise-wide data modernization initiative, migrating traditional database workloads to a cloud-based big data processing platform.

Responsibilities:

- Replaced legacy Oracle and MySQL batch processes with scalable Apache Spark pipelines.
- Built new data processing workflows using Spark SQL and Hive, eliminating dependency on legacy SQL jobs.

### IoT Monitoring Platform (Nov 2015 – Jun 2017)

**Customers:** European food and beverage companies

Developed a real-time monitoring system processing data from IoT sensors across warehouse and retail locations, enabling predictive maintenance and inventory optimization.

Responsibilities:

- Implemented scalable data processing pipelines using Apache Spark and Akka Streams to handle real-time sensor data.
- Built diagnostic tools enabling QA and hardware teams to validate device performance and data accuracy.
- Developed an anomaly detection system for early identification of hardware issues.

## Education

**MSc Degree, Computer Science**, Lviv Polytechnic National University, Ukraine (2016 – 2017)

## Other Professional Development

### Natural Language Processing & AI (2023)

- Contributed Ukrainian localization and dataset improvements to the OpenAssistant/oasst1 open-source project
- Built practical applications using transformer models and Hugging Face libraries
- Explored large language models and their enterprise applications

### Deep Learning & Time Series Analysis (2022)

- Completed fast.ai's Practical Deep Learning for Coders course
- Implemented time-series forecasting solutions using Facebook's Prophet
- Applied deep learning techniques to real-world data problems

## Personal Projects

I enjoy designing practical, custom-built apps with Claude and OpenAI Codex:

- A real-time FTMS logging and workout tracker for my air bike, with heart-rate and cadence support and data export.
- A high-performance 180° and 360° panorama and stereo-image viewer for the Meta Quest VR headset, with mipmap support and low latency.
- Automation of personal workflows with a Nous Hermes agent, through custom-built skills and purpose-built mini-apps.

## Languages

**English** - Advanced