Infrastructure DevOps Engineer

Engineering · Full-time · Global

Rocket.Chat is the world’s largest open-source communication platform, trusted by 12M+ users in 150 countries. It is a flexible and transparent hub that lets companies centralize communication and customer support in a single place, boosting team productivity.

This is a remote position.

(This is a worldwide position, with preference for new Rocketeers in the Eastern Europe and APAC time zones.)

At Rocket.Chat:

Our mission at Rocket.Chat is to empower organizations to own their communication by developing the world’s most open, flexible, and secure collaboration platform.

We are a technology company that builds an integrated collaboration platform for businesses and organizations of every size. We are committed to open source and giving companies power over their communication to control privacy, security, integrations, and everything that determines how they connect and communicate.

We do this by building the world’s largest and most flexible open-source communication platform, powered by a robust and globally distributed network of independent servers supercharged by a vibrant ecosystem of apps, developers, and service providers.

About this position:

We are excited to grow our infrastructure team!

Come join us to work on exciting challenges and make a lasting impact on Rocket.Chat’s future.

These are some of the challenges you can expect working at Rocket.Chat in this position:

  • Deliver a cloud experience designed for service availability, with fault-tolerant components
  • Participate in the development process, architecture discussions, and other important designs related to agile IT infrastructure environments
  • Be directly engaged in improving the reliability and management of infrastructure services
  • Work with Infrastructure as Code and Configuration as Code to enable infrastructure automation and orchestration
  • Monitor services, identify and remove bottlenecks, manage and balance connections, handle data replication and ensure data resiliency
  • Handle incident management, metrics collection, and alert routing, and write post-mortems
  • Unpack problems into smaller pieces, identify possible causes and troubleshoot
  • Work with software-defined storage and software-defined networking
  • Manage secrets, encrypt data, track vulnerabilities, and meet security standards
  • Build tooling to help other teams safely deploy to production or other critical environments
  • Map the organization’s security policies to infrastructure solutions
  • Support multi-data center, multi-cloud deployments and implement disaster recovery strategies

About you:

  • We are looking for someone who has a natural drive for problem solving
  • Enjoys designing and improving systems
  • Has deep knowledge of OS internals
  • Knows their way around networking and its complexities
  • Has enough programming experience to write software to solve infrastructure problems
  • Feels confident working with critical environments, even when they go down
  • Has a focus on automation, orchestration and building resilience into systems so we don’t have to repeatedly fix the same problems
  • Embraces randomness and trickiness, and feels comfortable working in a very agile environment
  • Is a highly autonomous individual and also a team player :)

Desired Skills:

  • Understanding of Unix/Linux operating systems
  • Experience with container and cluster management tools such as Rancher, Kubernetes, Docker
  • Experience with monitoring and visualization platforms such as Prometheus, Grafana, ELK stack
  • Experience automating repetitive operational tasks using Go, Python and/or shell
  • Experience managing database engines such as MongoDB
  • Deep, hands-on experience with foundational infrastructure capabilities, including HA, DR, Network, Routing, Firewall, DNS
  • Experience responding to system outages and monitoring alerts, resolving incidents to ensure system uptime and expected service levels
  • Experience with bare metal servers
  • Experience developing with Terraform, Ansible, Puppet, and/or Salt
  • Experience analyzing cloud systems issues and providing recommendations for long-term solutions
  • Experience executing change requests that impact production systems used by our diverse software products


Wherever you are, our goal is to make your routine as a Rocketeer feel enjoyable, exciting, and comfortable. Whether you are remote or working from our office in Porto Alegre (Brazil), you’ll receive a set of benefits to improve your work experience!

About Rocket.Chat:

Today one of the largest open-source projects in the world, with more than 1,000 developers, Rocket.Chat has advanced as a platform that empowers people to collaborate with others while letting individual teams fully customize the platform to meet their unique needs.

At Rocket.Chat, we believe in collaborating to create a more collaborative world! See yourself in that? Then apply now!

