Senior Site Reliability Engineer

Engineering · Full-time · Tamil Nādu, India

Job description

Who are we?

We're a technology company laser-focused on improving how people learn the language and behavioral skills they need to thrive in intercultural working environments.

Our presence in 14 countries across three continents and a staff representing 30 nationalities help us find solutions that enable enterprises to break down cultural barriers and unleash the full potential of their international teams. Around 400 Learnshippers worldwide have already decided to join us, and we hope that with your passion for languages, technology, and lifelong learning, you will join us soon, too.

We are looking for a Senior Site Reliability Engineer based in Chennai, India, to start as soon as possible.

Please read the description below, and don't hesitate to reach out in case of any questions.

We can't wait to meet you!

Who You Are:

You're a seasoned Site Reliability Engineer passionate about building and maintaining highly reliable, scalable, cost-effective, and secure systems. You're experienced with containerization technologies such as Docker and adept at using infrastructure-as-code tools to automate your workflows.

Your software engineering background shows in your ability to troubleshoot complex issues. You have a keen eye for monitoring and observability, using cloud-based observability platforms to gain insight into system performance and identify potential problems before they affect users.

You thrive in collaborative environments, working closely with development teams to improve application reliability and operability. You're not afraid to roll up your sleeves and dive into incident response, applying your expertise to resolve issues and prevent future occurrences. You're also a strategic thinker, able to forecast resource needs, conduct capacity planning, and optimize the utilization of cloud infrastructure.

You're passionate about security and about implementing best practices to protect systems and data. You're a continuous learner who stays up to date with the latest technologies and trends in the SRE space. Ideally, you hold certifications in AWS or Azure, Kubernetes, and Linux (RHCSA/RHCE) that demonstrate your commitment to professional development.

Core Responsibilities:

  • System Reliability: Ensure the high availability, performance, scalability, and security of our AWS/Azure cloud-based production systems.
  • Observability: Develop comprehensive monitoring and alerting systems to identify and address issues proactively.
  • Incident Response: Participate in on-call rotation, troubleshoot complex issues, and drive post-incident reviews for continuous improvement.
  • Performance Optimization: Analyze system performance, identify bottlenecks, and implement optimizations to improve efficiency and user experience.
  • Collaboration: Work closely with development teams to improve the reliability and operability of our applications.
  • Capacity Planning: Forecast resource needs, conduct capacity planning, and ensure optimal utilization of cloud infrastructure.
  • Security: Implement and maintain best practices to protect our systems and data.

Why Join Us:

  • Impact: Make a real difference to the experience of users across the globe.
  • Growth: Learn and develop your skills in a cutting-edge tech environment.
  • Culture: Enjoy a collaborative and inclusive workplace with a focus on innovation.

If you're passionate about reliability engineering, love solving complex problems, and want to contribute to a meaningful mission, we encourage you to apply!

We look forward to receiving your application!

Learnship is an equal opportunity employer and values diversity. Therefore, we do not discriminate based on race, religion, color, national origin, gender, sexual orientation, age, marital status, or disability status.

Visit https://www.learnship.com and the Learnship company page on LinkedIn to learn more.