Senior AI Engineer

Engineering · Full-time · CA, United States

Job description

Observe is a SaaS observability product that enables businesses to investigate modern distributed applications 10x faster. Observe ingests anything with a timestamp (e.g., system and application logs, metrics, and traces) and then structures that data so that it is correlated and easy to navigate. This lets engineers spend more time coding features and less time investigating incidents. Finally, because of Observe’s unique elastic architecture, it is priced based on usage, making it 10x less expensive than incumbents.

Traditional approaches to this problem have resulted in fragmented tooling and fragile dashboards, which in turn have led to exploding costs and complexity. At Observe, we believe the core challenge lies in organizing and relating the telemetry data those applications emit, even though that data is constantly changing. Solving this data problem makes observability an order of magnitude easier, faster, and less expensive. We didn’t set out to build another monitoring tool company; we set out to build a data company. Observe was founded by top enterprise VC firm Sutter Hill Ventures and has a founding team from leading enterprise SaaS companies Snowflake, Splunk, and Wavefront.

To learn more about Observe, visit www.observeinc.com or join the conversation on Twitter @Observe_Inc.

Team Overview:

The AI team is a small team within Observe responsible for using generative AI to help our customers understand their incidents faster. We have incorporated recent advances in generative AI into our product to make sense of huge quantities of unstructured data, extracting, summarizing, and retrieving the information that is relevant to engineers investigating incidents.

These use cases include:

  1. A chatbot with hallucination guardrails that lets users easily and reliably ask questions about our product, powered by an in-house RAG solution that searches over our documentation.
  2. An AI-powered extraction tool that automatically adds structure to unstructured machine data with the click of a button.
  3. Fine-tuning 7B+ LLMs to power our OPAL copilot (OPAL is our SQL-like temporal query language).
  4. Automatic and efficient incident summarization.

We value collaboration, willingness to learn, and the ability to solve immediate problems quickly while building towards a long-term vision.

The Role:

  • Design, improve, and maintain AI features at Observe.
  • Fine-tune and serve LLMs in production.
  • Build agentic AI workflows to improve the observability of users’ systems.

Requirements:

  • 3+ years of experience working in generative AI
  • Master’s in Computer Science or related field
  • Experience with RAG
  • Experience evaluating LLMs and RAG systems
  • Experience with Hugging Face Transformers
  • Experience working on a model that has been deployed in production
  • Python expertise
  • Strong written and spoken English communication skills on technical topics
  • Knowledge of the Python data science stack (scikit-learn, NumPy, pandas)

Preferred:

  • Experience building AI agents
  • Experience pre-training and/or fine-tuning LLMs at the 1B+ scale
  • Experience building scalable applications with LLMs, using frameworks such as LangChain, LlamaIndex, etc.
  • Experience working with adapters, e.g. LoRA
  • Experience with PyTorch
  • Experience with FastAPI or Flask
  • Experience with LLM-powered synthetic data labelling and evaluation
  • Experience in the observability domain
  • Experience with data processing tools such as Apache Spark, Hugging Face Datasets, or similar
  • Experience with Go

Feel free to apply even if you don’t meet 100% of the requirements!