Overview
Vodafone is hiring a System Reliability Engineer. Join our IoT Platform Engineering team to help ensure resilience, uptime, and scalable performance across Vodafone's IoT platform.
What You’ll Do
* Develop and govern resilience strategies that span system architecture, deployment, monitoring, and incident response
* Define and track stability KPIs (e.g., MTTD, MTTR, error budgets), partnering with performance and operations teams to meet or exceed targets
* Design and implement fault injection testing, chaos engineering practices, and scenario-based simulations to validate platform robustness
* Collaborate with product, infrastructure, architecture and development teams to re-design services with built-in redundancy, failover, and graceful degradation
* Drive automation and observability improvements to reduce noise, increase fault detection speed, and support predictive failure mitigation
* Contribute to the design and maintenance of our Business Continuity and Disaster Recovery (BC/DR) Plan, ensuring IoT systems remain resilient and recoverable
* Own the resilience roadmap and continuously assess emerging threats, technologies, and architectural shifts to guide evolution of stability practices
* Evangelize a culture of resilience through internal communication, workshops, and post-incident learning programs
* Engineering excellence – Deliver new capabilities and services efficiently while continuously enhancing the resilience, scalability, and cost-effectiveness of our IoT platform
* Delivery focus – Consistently meet or exceed delivery expectations—ensuring the right customer experience, delivering tangible business outcomes, and achieving financial targets
* Stakeholder management – Foster trusted, transparent, and outcome-driven relationships with business and technical stakeholders
Who You Are
* Degree in Software Engineering, Computer Science, or a related discipline
* Good understanding of the DevSecOps methodology and mindset
* Good understanding of information security
* Scripting experience in languages such as Bash, Python, Perl, Groovy, or PowerShell
* Proven experience with high-availability system design, chaos engineering principles and proactive failure mitigation strategies
* Experience with ISO 22301
* Good understanding of system monitoring tools and automated testing frameworks
* Industry experience with software platforms on Linux, across on-premises and cloud server technologies
* Deep understanding of SRE principles including SLOs/SLIs, error budgets, observability, toil reduction, and automation
* Demonstrated ability to balance operational stability with delivery velocity
* Understanding of security principles, practices and standards and how they translate into real-world technical solutions
* Hands-on experience with infrastructure provisioning and configuration management tools such as Terraform or Ansible; scripting to automate manual processes (e.g., Python, Bash)
* Strong command of telemetry, logging, and alerting stacks (e.g., Prometheus, Grafana, ELK, Datadog, Splunk)
* Experience defining meaningful SLIs and building dashboards that drive actionable insight
* Skilled in leading and participating in incident response with a calm, structured approach
* Experience driving blameless postmortems, root cause analysis, and continuous improvement across teams
* Expertise in identifying and resolving system bottlenecks, latency issues, and throughput constraints
* Proficient in forecasting demand and managing system growth in a cost-efficient manner
* Proven ability to work closely with software engineers, infrastructure teams, product owners, and business stakeholders to embed reliability into the development lifecycle
* Consultative, customer-focused design mindset
* Strong presentation and communication skills for technical, business, and senior management audiences
* Strong work planning and time management skills
* Willingness to learn, with a strong sense of ownership and autonomy
Not a perfect fit?
Worried that you don’t meet all the desired criteria exactly? At Vodafone we are passionate about empowering people and creating a workplace where everyone can thrive. If you’re excited about this role but your experience doesn’t align exactly with every part of the job description, we encourage you to still apply as you may be the right candidate for this role or another opportunity.
What’s In It For You
* Hybrid Work Model - Flexible hybrid work model with 8-10 in-office days per month
* Vodafone Products and Services - Mobile phone, free communication plan, data card, and discounts on services and products
* Recognition - Programs for innovative, high-potential employees and exemplary behaviors
* Health and Well-being - Well-being program with nutrition and psychological consultations, webinars, workshops, and discounts
* Learning - Access to Communities of Practice and digital training content
* Local and International Mobility - Internal recruitment with local and international rotation opportunities
Who We Are
We are a leading international Telco. Vodafone believes connectivity is a force for good and aims to improve lives through technology while protecting the planet. We value diversity and foster an inclusive environment. If you require accessibility adjustments during recruitment, refer to the Vodafone careers site for guidance.
Together we can.
Job Details
* Seniority level: Not Applicable
* Employment type: Full-time
* Job function: Other
* Industries: Information Services, IT Services and IT Consulting, and Telecommunications