We are looking for a Site Reliability Engineer to join the team of our client – a company specializing in the technology sector.
What will be your main tasks and responsibilities?
* Operate and support the production environment, responding to incidents and ensuring systems remain highly available;
* Triage and troubleshoot production issues across services, infrastructure and network layers;
* Monitor systems using observability tools, contributing to alert tuning and service level objectives;
* Collaborate with platform teams to improve reliability, operability, and scalability;
* Execute standard operational procedures (e.g., deployments, rollbacks, failovers);
* Identify common business-as-usual (BAU) operational tasks and automate them in a safe, auditable, and scalable way.
What will be required from you?
* Degree in Computer Science, Engineering, or a related field;
* 2-3 or more years of experience in a similar role;
* Solid understanding of Linux systems administration (troubleshooting, permissions, system services);
* Experience with AWS services (e.g., VPCs, EC2, S3, IAM, EKS) and Kubernetes;
* Hands-on experience with production environments, preferably in roles such as SRE, Cloud Support Engineer or Production Support Engineer;
* Familiarity with incident response and operational runbooks;
* Scripting or programming skills in Bash, Go, Python, or similar languages;
* Familiarity with CI/CD pipelines and deployment automation;
* Knowledge of monitoring and logging tools such as Prometheus, Grafana, and the ELK stack;
* Exposure to security and compliance practices in cloud environments;
* Strong communication and collaboration skills;
* Calm under pressure, particularly during incident response;
* Eagerness to learn and to continuously improve operational practices;
* Fluency in English, written and spoken.
Sounds like you? Send us your CV and let's talk!