Manage the availability and capacity of the core applications. Provide support for the applications and ensure their optimal performance. Set up new applications in the company’s environment.
- Design, implement, and maintain highly available and scalable infrastructure.
- Collaborate with development teams to ensure applications are built with reliability and performance in mind.
- Monitor system performance, identify bottlenecks, and proactively implement optimizations to improve system efficiency.
- Develop and maintain automation tools for deployment, configuration, and monitoring of systems and services.
- Conduct system capacity planning and provide recommendations for scaling resources to meet growing demands.
- Identify and resolve complex technical issues related to infrastructure, networking, and application performance.
- Implement and improve monitoring, alerting, and logging systems to ensure timely detection and resolution of incidents.
- Collaborate with cross-functional teams to define and implement best practices for infrastructure, deployment, and operational processes.
- Participate in on-call rotation and provide timely response and resolution to production incidents.
- Stay up to date with industry trends and emerging technologies in cloud computing and infrastructure automation.
- Strong experience with on-premises infrastructure and cloud platforms such as AWS and Azure.
- Proficiency in infrastructure-as-code (IaC) tools like Terraform or CloudFormation.
- Solid understanding of containerization technologies such as Docker and orchestration tools like Kubernetes.
- Experience with configuration management tools like Ansible, Puppet, or Chef.
- Strong scripting skills in languages such as Python, Bash, or PowerShell.
- Deep knowledge of Linux systems administration and networking concepts.
- Familiarity with monitoring and logging tools such as Prometheus, Grafana, and the ELK Stack.
EDUCATION AND EXPERIENCE
Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent work experience).
Service management certifications (e.g., ITIL) are an advantage.
Experience (number of relevant years):
Minimum of 3-4 years of experience in a similar SRE or infrastructure engineering role.