Senior Platform Engineer

Maia

Sybilion

Posted on 15 January

Description

About Sybilion

Sybilion builds AI-driven market forecasting for process industries (chemicals, packaging, pulp & paper, textiles, and broader manufacturing). We help procurement, supply chain, and commercial teams make better buy/sell decisions by turning messy external signals and internal operational data into clear, defensible forecasts that teams trust and act on.

Our stack includes Python-based microservices, PostgreSQL data infrastructure, and ML/AI workflows that support forecasting models and decision tooling.

About the Role

We're hiring someone to own both our platform and data infrastructure: Kubernetes administration, Linux systems, CI/CD, observability, and PostgreSQL administration for our data lakes and ML pipelines. You'll keep production reliable, fast, secure, and scalable, while supporting the day-to-day needs of our engineers and ML workflows.

This is an on-site role in Maia (Porto). We value in-person collaboration and move quickly.

What You'll Do

Platform / Kubernetes / Systems

* Design, deploy, and operate Kubernetes clusters in production (networking, storage, security)
* Operate Linux server infrastructure (Ubuntu/RHEL), including patching, hardening, and reliability work
* Manage Docker image lifecycle (builds, optimisation, registry management, security scanning)
* Implement and maintain CI/CD pipelines for microservices deployments and infrastructure changes
* Build and maintain Infrastructure as Code (Terraform, Ansible, Helm) and Git workflows
* Operate and improve monitoring, logging, and alerting (Prometheus/Grafana, ELK/EFK/Loki, etc.)
* Manage secrets and credentials securely (Vault, Sealed Secrets, or equivalent)
* Own high availability, capacity planning, incident response, and disaster recovery readiness
* Support GPU-enabled workloads and ML/LLM deployments (resource allocation, utilisation, scaling)

PostgreSQL / Data Infrastructure

* Administer and optimise PostgreSQL databases and data lake infrastructure (performance, reliability, cost)
* Own backup/recovery and disaster recovery procedures (including point-in-time recovery)
* Design schemas, indexing strategies, and query optimisation approaches; analyse execution plans
* Manage migrations and versioning (schema changes, rollout strategies, rollback plans)
* Implement replication/failover/clustering patterns for high availability
* Own database security: access controls, encryption at rest and in transit, audit logging, and compliance requirements

Python Microservices / Data Pipelines / ML Workflows

* Support deployment and troubleshooting of Python microservices (FastAPI/Flask/Django or similar)
* Help maintain Python environments and dependency management (pip/poetry/conda/mamba)
* Support ETL/ELT pipelines feeding our data lake and ML training workflows
* Implement data quality checks and validation where needed
* Partner with engineers and the ML team to improve runtime performance, reliability, and operational visibility

Must-Have Experience (Required)

* 5+ years of hands-on production experience with Linux, Docker, Kubernetes, and PostgreSQL
* Strong Kubernetes administration skills (clusters, networking, ingress, storage, RBAC, security)
* Strong PostgreSQL administration skills (performance tuning, backups, replication/HA, security)
* Strong Linux systems skills (operations, troubleshooting, hardening)
* CI/CD experience (GitHub Actions/GitLab CI/Jenkins or similar)
* Infrastructure as Code experience (Terraform and/or Ansible; Helm for Kubernetes)
* Observability experience (metrics, logs, alerting; root-cause analysis)
* Solid Python literacy for debugging services and automating operational tasks
* Strong communication skills in English and comfort working independently end-to-end
* Willingness to participate in an on-call rotation for critical systems

Preferred (Nice to Have)

* Startup background (you've worked in small teams, moved fast, and owned outcomes end-to-end)
* Experience running ML infrastructure (MLflow, Kubeflow, Airflow, KServe/TorchServe, etc.)
* GPU cluster experience (NVIDIA GPU Operator or similar) and model serving optimisation
* Experience with service mesh (Istio/Linkerd)
* Experience with cloud managed databases (AWS RDS, GCP Cloud SQL, Azure Database)
* Familiarity with data lake / warehouse patterns and data versioning (DVC/MLflow tracking)
* Experience with Redis/MongoDB or other complementary data systems

Soft Skills We Value

* Strong problem-solving and analytical mindset
* Calm, structured incident handling and good judgement under pressure
* Proactive improvement orientation (you spot issues before they become outages)
* High bar for security, documentation, and operational hygiene
* Collaborative approach with product, engineering, and ML teams

What We Offer

* €40,000–€70,000 salary range (depending on experience)
* Professional development and training budget
* Modern office environment in Maia, Porto
* Opportunity to work with cutting-edge ML/AI infrastructure and real-world data systems
* Career growth path within a growing technology organisation
* Coffee and snacks

Work Environment

This is an on-site position based in our Maia office. We value in-person collaboration and believe being together improves speed, clarity, and ownership.
