Our mission is to improve developers' experience by giving them the tools to manage the entire software lifecycle and to be self-sufficient. To help with this, we are building our own internal PaaS using the latest technologies, such as Kubernetes, Prometheus, Kotlin, and others. This platform is an important pillar of Talkdesk's engineering effort and helps us deliver better, faster, and more reliable solutions for our customers.

Responsibilities
- Design, build, harden, and maintain key infrastructure parts of our platform (from the lifecycle of the infrastructure to each one of our Kubernetes clusters)
- Support the processes that enable the safe upgrade and update of each component of our compute infrastructure
- Work with industry-leading GitOps tools such as Spacelift and/or Atlantis
- Help automate safe deployment practices using industry-leading tools such as GitHub Actions, ArgoCD, Argo Rollouts, Helm charts, etc.
- Help automate infrastructure provisioning and other engineering processes by working on automations built on top of an engineering platform written in GitHub Actions
- Coach and up-skill other engineering team members
- Solve challenging technical problems and put your skills to the test every day; see the immediate impact of your work and the value you've created for other engineers
- Automate every aspect of our infrastructure to remove as much human intervention as possible
- Develop effective tooling, alerts, and responses to both identify and address reliability risks
- Drive and promote protocols on production readiness and operational excellence
- Partner with product engineering teams to debug production outages and carry out action items to improve the reliability of those systems
- Advocate for automated testing, continuous integration and delivery, feature toggles, and progressive rollouts
- Plan for the growth of Talkdesk's infrastructure

Skills and Qualifications
- Understanding of large-scale, complex systems from a reliability perspective
- Passion for producing clean, standards-compliant, secure code
- A developer mindset that you apply to infrastructure
- Know your way around Linux/Unix systems
- Experience with Kubernetes
- Experience with infrastructure-as-code tools like Terraform and Ansible
- Experience building software with a programming language such as Java, Kotlin, Scala, or any other JVM-based language
- Experience writing scripts to automate tasks in a language like Ruby, Python, Bash, or any other scripting language
- Experience with at least one relational and one non-relational database (e.g., PostgreSQL, MySQL, MongoDB, Redis, Elasticsearch)
- Ability to identify time-consuming and error-prone manual tasks and then build or leverage tooling to automate them
- Ability to identify root causes of instability in a large-scale distributed system across stacks

Nice to haves / Pluses
- Experience with cloud-based solutions such as Amazon Web Services (AWS), Google Cloud, or Microsoft Azure
- Experience with the Go programming language

Additional Notes
This position will follow a hybrid work model.

Work Environment and Physical Requirements
Primarily office-environment work, with extended periods of sitting or standing and computer-based work. Limited lifting; equipment usage is limited to computer-related equipment (keyboards, mice, etc.).

The Talkdesk story hinges on empathy and acceptance. It is the shared goal among all Talkdeskers to empower a new kind of customer hero through our innovative software solution, and we firmly believe that the best path to success for our mission is inclusivity, diversity, and genuine acceptance. To that end, we will hire, promote, work alongside, cheer for, bond with, and warmly welcome into the Talkdesk family all persons without regard to ethnic and racial identity, indigenous heritage, national origin, religion, gender, gender identity, gender expression, sexual orientation, age, disability, marital status, veteran status, genetic information, or any other legally protected status.