Zendesk's people have one goal in mind: to make Customer Experience better. Our products help more than 125,000 global brands make their billions of customers happy, every day.
Role Overview
The AI/ML Platform team is at the forefront of this mission. We build the foundation that powers every AI-driven experience at Zendesk, enabling product teams to build, evaluate, and deploy state-of-the-art Large Language Model (LLM) applications reliably and at scale.
Responsibilities
* Lead the design, rollout, and optimization of ML Platform components including model serving, feature management, training pipelines, and evaluation frameworks.
* Own the design, rollout, and optimization of Zendesk's LLM Proxy, enabling safe, observable, and cost-efficient access to multiple foundation models.
* Champion best practices around service resilience, observability, cost efficiency, and performance optimization.
* Partner with stakeholders to ensure platform investments align with Zendesk's AI/ML roadmap and business priorities.
* Foster a culture of technical excellence, inclusion, and mentorship within the engineering team.
Requirements
* 7+ years of hands-on experience developing and deploying ML models or Generative AI applications.
* Proven success in production deployments with a focus on scalability, reliability, and availability.
* Familiarity with MLOps best practices (CI/CD for ML, model monitoring, automated retraining pipelines).
* Deep understanding of LLM systems, Gen AI applications, or ML/AI platform components such as vector databases, serving layers, and orchestration tools.
* Experience provisioning and deploying services on a major cloud provider (GCP, AWS, Azure).
* Fluency in a server-side programming language (Python, Java, Scala, Golang, Ruby) and its testing frameworks.
* Sound understanding of software architecture and design patterns for server-side systems.
* Experience owning the full lifecycle of ML/AI platform components from early design to production deployment.