Context of the role
In this role, you will be responsible for designing, implementing, deploying, and maintaining Ezra Platforms, which encompass IaaS, containerized platforms, and microservices. You will play a crucial role in managing the infrastructure and driving DevOps practices.
Key Responsibilities
Architecture and Design
- Design and implement scalable, resilient, and secure platform solutions
- Develop and maintain infrastructure-as-code using tools like Terraform, CloudFormation, and Ansible
- Create and optimize CI/CD pipelines for efficient software delivery
- Architect cloud-native solutions leveraging containerization and microservices
- Implement disaster recovery and business continuity strategies
Infrastructure Management
- Manage and optimize our public cloud infrastructure (AWS, Azure, or GCP)
- Manage and optimize private cloud infrastructure on partner premises
- Implement best practices for cloud security, compliance, and cost optimization
- Design and implement multi-region and multi-cloud strategies
- Design and maintain containerized application environments using Docker
- Architect, deploy, and manage Kubernetes clusters for container orchestration
Automation and DevOps
- Develop automation scripts and tools to streamline operations and reduce manual tasks
- Integrate monitoring, alerting, and logging systems
- Ensure standardized QA and production environments by implementing proper branching strategies
- Configure and manage load balancers (e.g., NGINX, HAProxy, cloud-native solutions)
- Implement and manage service mesh technologies (e.g., Istio, Linkerd) for microservices architectures
Performance Optimization
- Analyze and optimize system performance, identifying and resolving bottlenecks
- Conduct capacity planning and implement auto-scaling solutions
- Optimize container resource allocation and performance
Team Leadership and Collaboration
- Mentor junior engineers and provide technical guidance to the team
- Collaborate with cross-functional teams to align platform capabilities with business needs
- Contribute to technical decision-making and architectural reviews
Documentation and Knowledge Sharing
- Maintain comprehensive technical documentation for platform components and processes
- Contribute to internal knowledge bases and conduct knowledge-sharing sessions
L2 Support and Escalation Management
- Provide expert-level troubleshooting and resolution for critical platform and infrastructure problems
- Analyze recurring issues and implement long-term solutions to prevent future occurrences
- Collaborate with the operations team to improve support processes and knowledge transfer
- Conduct post-incident reviews and implement lessons learned to enhance system reliability
Required Qualifications
- Bachelor's degree in Computer Science, Engineering, or a related field
- 5+ years of experience in platform engineering, DevOps, or similar roles
- Strong proficiency in at least one cloud platform (AWS, Azure, or GCP)
- Expert-level knowledge of containerization technologies (Docker, Kubernetes)
- Extensive experience with infrastructure-as-code tools (e.g., Terraform, CloudFormation, Pulumi)
- Proficiency in scripting languages (e.g., Bash)
- Strong understanding of networking concepts, load balancing, and CDNs
- Experience with monitoring and observability tools (e.g., Prometheus, Grafana, ELK stack)
- Excellent problem-solving skills and ability to troubleshoot complex systems
Preferred Qualifications
- Experience with multi-cloud architectures
- Knowledge of service mesh technologies (e.g., Istio, Linkerd)
- Familiarity with serverless computing platforms
- Relevant certifications (e.g., AWS Certified Solutions Architect, CKAD, CKA)