Engineer – Infrastructure SRE
Job Description
Reporting to the Manager – Infrastructure SRE, the role holder will be responsible for building, automating, operating, and continuously improving on-premises, hybrid, and multi-cloud infrastructure platforms using Site Reliability Engineering (SRE) and automation-first principles. The role ensures the availability, performance, scalability, and resilience of critical systems through engineering, automation, and data-driven reliability practices.
Working closely with the Manager – Infrastructure SRE and platform teams, the role focuses on eliminating manual operational toil, improving system reliability, and embedding automation-first and reliability-by-design principles across on-premises, hybrid, and multi-cloud environments (AWS, Azure, GCP, OCI).
Responsibilities
- Uphold the company code of conduct, policies and procedures, ensuring integrity and accountability in every aspect of your work.
- Adhere to safety, health, and wellbeing policies, guidelines, and procedures in all actions and decisions.
- Implement and support SLIs, SLOs, and error budgets for assigned platforms and services.
- Monitor platform health, availability, latency, and error rates, and participate in on-call rotations, incident response, and major incident recovery.
- Design and implement end-to-end infrastructure automation across on-premises data centers, private cloud, and public cloud environments.
- Build and maintain Infrastructure as Code (IaC) using Terraform, Ansible, and Helm. Automate infrastructure provisioning, scaling, patching, recovery, and decommissioning.
- Develop scripts and tooling (Bash, Python, PowerShell) to reduce manual operational tasks and contribute to self-healing and auto-remediation workflows.
- Engineer and operate on-premises infrastructure, including virtualization, compute, storage, backup, and network platforms.
- Engineer and operate hybrid cloud environments, ensuring seamless integration between data centers and public cloud platforms.
- Engineer and operate infrastructure across AWS, Azure, GCP, and OCI under defined enterprise standards.
- Support Kubernetes platforms (EKS/AKS/GKE/OpenShift), including upgrades, scaling, and reliability tuning.
- Support DevSecOps practices by integrating security checks into pipelines.
- Assist with DR/BCP testing, backup validation, and recovery procedures.
Qualifications
- Bachelor’s degree in Computer Science, Information Technology, Engineering, or a related technical field.
- Proven hands-on experience supporting production infrastructure and cloud platforms.
- Strong automation mindset with demonstrated reduction of manual operational tasks.
- Experience working within ITIL / DevOps / SRE operating models.
Preferred Certifications (Added Advantage)
- Cloud Associate or Professional certifications (AWS, Azure, or GCP).
- Kubernetes certifications (CKA / CKAD).
- Linux certifications (RHCSA / RHCE).
- DevOps or SRE-related certifications.
