Generative AI Engineer

Morpheus Talent Solutions • Full-time • Remote (Toronto, ON, CA) • C$ 140k - C$ 160k / year

AI/ML Engineer

LLM Inference & Deployment

$150k CAD

Morpheus has partnered with a leader in AI-powered digital transformation to find an exceptional AI/ML Engineer. Our client converts complex, unstructured data, including sensitive medical information, into structured digital assets for clients in life insurance, healthcare, and other regulated industries. This is a hands-on engineering role at the intersection of LLM deployment, inference optimization, and enterprise AI.

You will own the end-to-end AI/ML lifecycle, from prompt engineering and fine-tuning through inference optimization, scalable deployment on NVIDIA GPU infrastructure, and production monitoring. The work is technically deep and the stakes are real: these are production-grade AI systems running in regulated environments where reliability and accuracy matter.

What you'll own:

Prompt engineering strategies for summarization, extraction, classification, reasoning, and tool-augmented workflows
Fine-tuning and adapting open-source LLMs (Llama, Mistral, Gemma, Qwen) using LoRA, QLoRA, PEFT, and supervised fine-tuning
LLM inference deployment and optimization using vLLM, TensorRT-LLM, TGI, SGLang, or equivalent frameworks
Inference performance tuning: continuous batching, PagedAttention, KV-cache optimization, quantization, and long-context handling
MLOps and LLMOps pipelines covering training, deployment, monitoring, versioning, and continuous delivery
Model evaluation frameworks tracking quality, hallucination risk, latency, throughput, GPU utilization, and drift

What we're looking for:

Hands-on experience deploying and serving LLMs in production using vLLM or comparable inference engines
Strong experience optimizing inference for latency, throughput, memory efficiency, and cost on NVIDIA GPU infrastructure
Practical knowledge of LLM fine-tuning, including LoRA, QLoRA, SFT, RLHF, DPO, and GRPO
Proficiency in Python with strong PyTorch or TensorFlow skills
Familiarity with RAG, LangChain or LlamaIndex, and vector databases
Experience with MLOps tooling: Ray Serve, MLflow, Weights & Biases, Evidently AI, Prometheus, Grafana
Understanding of data privacy and security principles, particularly in regulated or PHI environments
Bachelor's or Master's degree in Computer Science, Data Science, or a related quantitative field

Nice to have:

Oracle Cloud Infrastructure (OCI) experience
Background working with sensitive or regulated data in healthcare or life insurance contexts
Experience with distributed training and GPU quantization at scale