Platform Services Engineer Principal

Engineering · HI, United States

Job description

Requisition #: 395
Job Title: Platform Services Engineer Principal
Location: 613th AOC, Hickam AFB, Hawaii 96853
Clearance Level: Active DoD - Secret
Required Certification(s):
· Security Plus or above

SUMMARY
Maintain and operate current and future platform and platform-related services, utilizing AOC locations and/or DoD Data Centers outside the Continental United States (OCONUS) as well as within the Continental United States (CONUS), to provide 24/7/365 support for classified workloads.

The Platform Services Engineer shall have senior software engineering experience building and operating hybrid-infrastructure developer platforms (private and commercial cloud infrastructure) at an individual contributor level. The most successful candidate has prior experience building automation for deploying and managing Cloud Foundry or Kubernetes platforms and services, such as databases, message queues, and authentication providers, at enterprise scale. The engineer shall have experience developing pipelines that make common software operations scalable and repeatable. The engineer will feel comfortable responding to and troubleshooting high-impact software outages and failures. The engineer will be at least a mid- to senior-level engineer capable of mentoring junior developers and engineers. The engineer will also possess effective communication skills to interact with various stakeholders internal and external to the organization. The engineer shall be able and willing to regularly work a 2nd or 3rd shift in support of 24/7 operations.

JOB DUTIES AND RESPONSIBILITIES

Maintain and Operate Current and Future Platform
· Maintain and operate current and future platform and platform-related services, utilizing AOC locations and/or DoD Data Centers outside the Continental United States (OCONUS) as well as within the Continental United States (CONUS), to provide 24/7/365 support for classified workloads.
· Ability to quickly move personnel to OCONUS and CONUS locations in order to offer continuous support through multiple access points and to cross-train the workforce as required.
· Utilize containerization solutions such as PCF or Kubernetes, and employ CI/CD tools such as GitLab, Concourse, and Vault to host and deploy applications and services.
· Configure the platform for logging, monitoring, and alerting to manage the performance and health of applications and the platform itself.
· Inform and observe Service Level Agreements and enforce Error Budgets on production systems.
· Maximize the use of automation in platform support services to support rapid application deployment and management.
· Facilitate DevSecOps by developing self-service capabilities for platform tenants to operate and interact with applications and services.
· Ability to develop and implement policy and technical controls to support a multi-tenant architecture, including services such as authentication, databases, database backups, and logs.

Manage Outages and Incident Response
· Contribute to the existing incident management process through the use of playbooks, failover automation, and rigorous post-mortem investigations.
· Maintain and improve existing incident detection and reporting automation.
· Support Site Reliability Engineers, infrastructure personnel, application developers, and other staff in troubleshooting issues.

Manage Transition from Current Platform to Modernized Platform
· Manage the transition from the current platform to the modernized platform with minimal disruption to platform tenants.
· Contractor to provide Transition Plan to include supporting Labor Category (LCAT) requirements.
· Ensure functional service parity for all tenants through current and modernized platform transition and sustainment.

· Work collaboratively with product managers, designers, and documentation leads to support division-wide objectives and outcomes.
· Execute the engineering best practices of Extreme Programming in daily operations.
· Communicate clearly through written and verbal mediums to ensure critical information is always shared across the enterprise, with stakeholders, and across overlapping shifts within the team supporting 24/7/365 continuous operations.
· Demonstrate the KR core values of Ideas Over Rank, Intense Customer Focus, Bias for Action, and Continuous Evolution in daily practice.

QUALIFICATIONS

Required Certifications
· Security Plus or above

Education, Background, and Years of Experience
· At least 2 years of experience mentoring agile teams as a team lead or managing personnel.
· At least 2 years facilitating extreme programming or lean product development.
· At least 5 years of experience working with multiple cloud or on-premises infrastructures such as AWS, Azure, Google Cloud, VMware, OpenShift, Linux, and Windows.
· At least 5 years deploying, configuring, and administering PaaS solutions such as Cloud Foundry, Tanzu Kubernetes Grid, or Elastic Kubernetes Service at a global scale.
· At least 5 years of experience building, maintaining, securing, and/or integrating classified systems and services for DoD networks, including but not limited to SIPR/JWICS/Coalition.
· Demonstrated experience with software incident response, security best practices, and DevOps execution on large-scale projects spanning multiple data centers or availability zones.
· Demonstrated understanding of event-sourced, microservice, and multi-tenant architectures.
· Five-plus (5+) years of experience developing and maintaining containerized services deployed in production on orchestration platforms such as Cloud Foundry or Kubernetes.
· Five-plus (5+) years of experience responding to and troubleshooting high-impact software outages and failures.
· Five-plus (5+) years of experience building and supporting production platform services using cloud infrastructure providers or local infrastructure such as vSphere.
· Five-plus (5+) years of experience in Identity and Access Lifecycle Management Operations and Controls.
· Five-plus (5+) years of experience working in Operations, DevOps, or Site Reliability Engineering.
· Resume demonstrates deep understanding of modern microservices architectures, cloud-native design patterns, resiliency techniques, and delivery optimizations.

ADDITIONAL SKILLS & QUALIFICATIONS

Required Skills
The Platform Services Engineer shall have senior software engineering experience building and operating hybrid-infrastructure developer platforms (private and commercial cloud infrastructure) at an individual contributor level. The most successful candidate has prior experience building automation for deploying and managing Cloud Foundry or Kubernetes platforms and services, such as databases, message queues, and authentication providers, at enterprise scale. The engineer shall have experience developing pipelines that make common software operations scalable and repeatable. The engineer will feel comfortable responding to and troubleshooting high-impact software outages and failures. The engineer will be at least a mid- to senior-level engineer capable of mentoring junior developers and engineers. The engineer will also possess effective communication skills to interact with various stakeholders internal and external to the organization. The engineer shall be able and willing to regularly work a 2nd or 3rd shift in support of 24/7 operations.

WORKING CONDITIONS

Environmental Conditions
· Contractor site with 0%-10% travel possible.
· Possible off-hours work to support releases and outages.
· General office environment. Work is generally sedentary in nature, but may require standing and walking for up to 10% of the time.
· The working environment is generally favorable. Lighting and temperature are adequate, and there are no hazardous or unpleasant conditions caused by noise, dust, etc.
· Work is generally performed within an office environment, with standard office equipment available.

Strength Demands
· Sedentary: 10 lbs. maximum lifting, occasional lift/carry of small articles. Some occasional walking or standing may be required. Jobs are sedentary if walking and standing are required only occasionally, and all other sedentary criteria are met.

Physical Requirements
· Stand or Sit; Walk; Repetitive Motion; Use Hands / Fingers to Handle or Feel; Stoop, Kneel, Crouch, or Crawl; See; Push or Pull; Climb (stairs, ladders) or Balance (ascend / descend, work atop, traverse).