Site Reliability Engineer

Engineering · Full-time · United States · Remote possible

Job description

We are a leading cybersecurity company with multiple offices (San Francisco and Redwood City, CA; Herndon, VA; and Washington, D.C.) working to fundamentally disrupt the way organizations deliver and access the web, as well as conduct digital investigations. The world’s most at-risk organizations rely on Authentic8 to completely eliminate the risk of using the web. More than 700 government agencies and commercial enterprises trust Authentic8’s 100% cloud-native Silo Web Isolation Platform to protect their most at-risk data and missions. Silo is designed from the ground up with zero-trust principles, delivered via globally scaled infrastructure, and configurable to solve a wide range of security challenges. Our flagship product, Silo for Research, is the market-leading solution for conducting secure and anonymous digital investigations and intelligence work across the surface, deep, and dark web.

We have an immediate need for multiple Site Reliability Engineers. The successful candidates will have a passion for automating day-to-day operational tasks on the infrastructure. We are embarking upon a significant scale expansion that requires a new low-touch platform to support our current and future growth. You will have the opportunity to redefine and rebuild everything from our build pipeline to live site management, and to strategically identify, plan, and deploy significant changes to existing processes and infrastructure.

Key Areas of Responsibility:

  • Manage all aspects of our production service: Google Compute Engine, Chef, Kubernetes, Docker, HAProxy, RDP, AWS, GitLab, Active Directory.
  • Participate in the design and implementation of a low-touch infrastructure platform expansion to support current and future scale.
  • Participate in a tier one/two on-call rotation.
  • Automate common operator actions to minimize the need for human interaction to resolve common incidents.
  • Develop and implement new monitoring strategies to ensure awareness of any customer-impacting incidents.
  • Collaborate with Eng/QA/Operations to support the effort for zero-downtime deployments.
  • Work closely with our Software Developer and QA teams as part of the SDLC to provide bug feedback into future releases.
  • Work closely with our Software Developer teams to build self-service tools that assist in debugging and fixing broken builds and provide proactive visibility into the build/CI service.
  • Work closely with the QA team to integrate/leverage, as needed, existing QA test suites as part of the build pipeline.
  • Integrate our deployment tools (and collaborate to enhance them) into the pipeline to automatically build sandbox environments for Engineering, Operations, and QA.
  • Review, analyze, and recommend solutions and tools to improve the overall software development process.

Qualifications:

  • BS or MS in Computer Science or equivalent degree.
  • 1 year of industry experience as a Site Reliability Engineer, or similar hands-on experience automating and managing all aspects of a large-scale production web service.
  • Excellent communication and interpersonal skills.
  • Excellent problem-solving and debugging skills.
  • UNIX or Linux system administration background.

Experience With:

  • Experience with configuration management systems (Chef, Ansible, etc.).
  • Competent scripting experience: Ruby, Python, and shell.
  • Experience with containerization technology: Docker, Kubernetes, Amazon ECS, AKS.
  • Experience with on-prem and cloud-based monitoring and visualization: Icinga, Splunk, Grafana, ThousandEyes, Pingdom, New Relic, Datadog.
  • Experience with tools/frameworks: Git, GitLab, Django, Nginx, and PostgreSQL.
  • Experience with cloud services platforms: GCP, Azure, and AWS.
  • Experience building Windows, macOS, Linux, and iOS packages.
  • Some background with virtualization technologies: VirtualBox, VMware, or Citrix.

Salary Range

  • $100,000 - $130,000 plus bonus & equity