High-tech systems reliability specialist

Viana do Castelo

beBeeReliability

Posted on 27 August

Description

Job Overview:

We are seeking an experienced site reliability engineer to serve as technical lead of our operations team. You will provide guidance and mentorship while remaining hands-on.

This role offers a unique opportunity to combine leadership with engineering excellence. You will design and implement scalable systems, ensuring the operations team has the necessary expertise and guidance to succeed.

Key Responsibilities:

* Act as technical lead for the operations team, setting standards for reliability, automation, and scalability.
* Mentor and guide engineers, fostering knowledge sharing and technical growth.
* Lead incident response and root cause analysis, and ensure postmortem learnings are translated into improvements.
* Collaborate closely with development and product teams to balance agility with operational stability.

Technical Requirements:

* Infrastructure as Code (IaC): Build and manage infrastructure with Terraform; maintain and support legacy Ansible where needed.
* Kubernetes & Orchestration: Operate and optimize Kubernetes clusters, leveraging Argo CD and Argo Workflows for GitOps.
* CI/CD: Develop GitHub Actions pipelines and oversee the migration away from legacy Octopus Deploy.
* Systems Administration: Manage Linux and Windows Server systems, ensuring performance, reliability, and security.
* Monitoring & Observability: Own monitoring and observability solutions with Prometheus, Grafana, and OpenTelemetry; define and track SLOs/SLIs.
* Databases & Caching: Operate MSSQL, PostgreSQL, and Redis in production environments.
* Networking & Security: Manage WAF and CDN services (Cloudflare) and drive secure infrastructure practices.

Requirements:

* Proven experience as a Site Reliability Engineer, DevOps Engineer, or Infrastructure Engineer with technical leadership responsibilities.
* Strong cloud platform experience with Azure.
* Strong expertise in Terraform; Ansible familiarity a plus.
* Hands-on with Kubernetes and GitOps workflows (Argo CD/Workflows).
* Skilled in both Linux and Windows Server environments.
* Experienced with CI/CD pipelines, particularly GitHub Actions.
* Deep understanding of monitoring/observability (Prometheus, Grafana, OpenTelemetry).
* Strong incident management and troubleshooting skills in distributed systems.
* Experience maintaining and scaling high-traffic web applications.
* Excellent collaboration and communication skills, with experience mentoring other engineers.
* Ability to work in a fully remote setup.

Benefits:

* Opportunity to grow professionally and technically.
* Flexible working environment.
* Competitive compensation package.
