AI/ML Engineer
LLM Inference & Deployment
$150k CAD
Morpheus is partnered with a leader in AI-powered digital transformation to find an exceptional AI/ML Engineer. Our client converts complex, unstructured data, including sensitive medical information, into structured digital assets for clients in life insurance, healthcare, and other regulated industries. This is a hands-on engineering role at the intersection of LLM deployment, inference optimization, and enterprise AI.
You will own the end-to-end AI/ML lifecycle, from prompt engineering and fine-tuning through inference optimization, scalable deployment on NVIDIA GPU infrastructure, and production monitoring. The work is technically deep and the stakes are real: these are production-grade AI systems running in regulated environments where reliability and accuracy matter.
What you'll own:
- Prompt engineering strategies for summarization, extraction, classification, reasoning, and tool-augmented workflows
- Fine-tuning and adapting open-source LLMs (Llama, Mistral, Gemma, Qwen) using LoRA, QLoRA, and other PEFT methods, as well as supervised fine-tuning
- LLM inference deployment and optimization using vLLM, TensorRT-LLM, TGI, SGLang, or equivalent frameworks
- Inference performance tuning: continuous batching, PagedAttention, KV-cache optimization, quantization, and long-context handling
- MLOps and LLMOps pipelines covering training, deployment, monitoring, versioning, and continuous delivery
- Model evaluation frameworks tracking quality, hallucination risk, latency, throughput, GPU utilization, and drift
What we're looking for:
- Hands-on experience deploying and serving LLMs in production using vLLM or comparable inference engines
- Strong experience optimizing inference for latency, throughput, memory efficiency, and cost on NVIDIA GPU infrastructure
- Practical knowledge of LLM fine-tuning including LoRA, QLoRA, SFT, RLHF, DPO, and GRPO
- Proficiency in Python with strong PyTorch or TensorFlow skills
- Familiarity with RAG, LangChain or LlamaIndex, and vector databases
- Experience with MLOps tooling: Ray Serve, MLflow, Weights & Biases, Evidently AI, Prometheus, Grafana
- Understanding of data privacy and security principles, particularly in regulated or PHI environments
- Bachelor's or Master's degree in Computer Science, Data Science, or a related quantitative field
Nice to have:
- Oracle Cloud Infrastructure (OCI) experience
- Background working with sensitive or regulated data in healthcare or life insurance contexts
- Experience with distributed training and GPU quantization at scale
This is a role for someone who wants to go deep on the infrastructure that makes enterprise AI actually work in production, not just in notebooks.