Platform team lead - remote

Lisbon

Zyte

Posted on 24 October

Description

Overview

At Zyte, we eat data for breakfast and you can eat your breakfast anywhere and work for Zyte. Founded in 2010, we are a globally distributed team of over 250 Zytans working from over 28 countries who are on a mission to enable our customers to extract the data they need to continue to innovate and grow their businesses. Zyte builds powerful, easy-to-use tools to collect, format, and deliver web data, helping thousands of organizations make smarter business decisions and drive sustainable growth.

Zyte is seeking an experienced Team Lead to manage our Core & MLOps Squad, responsible for building the bedrock infrastructure that powers Zyte at scale. This hands-on technical leadership role requires expertise across MLOps, systems programming, and orchestration to lead a cross-functional team in designing and maintaining the scalable foundation that enables all Zyte teams to build and run their services with confidence.

Location: Lisbon, Portugal (remote-friendly)

Responsibilities

* Technical Leadership: design and evolve the core platform (Kubernetes, Mesos, GPU scheduling/autoscaling, distributed compute); own the model platform (registry, experiment tracking, training orchestration, evaluation, serving, monitoring); build the Golden Path with reference repos, a scaffold CLI, opinionated CI/CD pipelines, runtime contracts (health/metrics/tracing/SLOs), high-performance clients, and production-ready defaults.
* MLOps Excellence: operate a secure, multi-tenant model registry and training platform with standardized experiment/evaluation harnesses; provide turnkey serving patterns (online + batch), drift/quality monitoring, and rollback playbooks; integrate public/open-source AI capabilities as managed platform services with cost and data-governance guardrails.
* Team Management: run the squad (roadmap/prioritization, delivery, mentoring, and high engineering standards); partner with product engineering (Zyte API, Scrapy Cloud), Prod Ops, and Security on adoption and rollout plans; mentor the team and foster a platform-thinking mindset.
* Ownership Areas: container orchestration (Kubernetes/Knative), GPU provisioning & autoscaling, environment & secret management; develop operators, sidecars, and internal SDKs/libraries that enforce the golden path contract; manage model platform components (registry, training, serving, monitoring); establish observability pipelines; maintain billing/metering/cost-tracking; uphold reliability (SRE), cost governance, and SBOM/image signing.

Qualifications

Required
* 5+ years of experience building distributed systems; 3+ years in MLOps/ML platform engineering (or equivalent impact)
* Knowledge of Linux/OS internals, networking, concurrency, and performance profiling
* Deep understanding of Kubernetes (bonus: Mesos)
* Proficiency developing high-performance services in Java, Rust, Go or C++ (bonus: vert.x and Netty); strong Python skills
* Experience with GPU infrastructure (scheduling, containerization, optimization)
* Track record of designing and operating model platforms in production
* Demonstrated success leading technical teams and implementing organization-wide platform solutions

Preferred
* Streaming & workflows (Kafka + Argo/Temporal/Airflow or equivalents)
* eBPF-based observability, perf tooling, or io_uring experience
* Cost optimization for ML/AI; multi-tenant quotas and fairness
* Hands-on experience authoring Golden Paths (service chassis/templates, CI/CD blueprints, CLI scaffolds)
* SRE practices (SLIs/SLOs, incident management)

Benefits

* Remote-friendly culture with flexibility to work from anywhere
* Opportunity to work with cutting-edge open-source technologies
* Join a self-motivated, progressive, multi-cultural team

Seniority level

* Mid-Senior level

Employment type

* Contract

Job function

* Other

Industries

* IT Services and IT Consulting