🤖

AI/ML Engineer

Build the core intelligence that gives Spirit companions their personality and memory. Design and ship production-grade ML services, LLM pipelines, memory architecture, and safety layers used by real people daily.

Location

Remote (Washington D.C., DMV area preferred)

Job Type

Intern or Full-time

Experience

1+ years of solid AI/ML engineering experience

Education

Bachelor's degree or equivalent experience

About Spirit

Spirit builds emotionally intelligent AI therapists that feel supportive, trustworthy, and human. We combine advanced conversational LLMs, a privacy-first memory engine, expressive 3D/AR experiences, and ethical product design to help users reflect, heal, and grow. Our mission is to make mental and emotional wellness more accessible through technology that is both technically excellent and emotionally responsible.

About the Role

As an AI/ML Engineer at Spirit, you'll be central to building the systems that make our companions feel alive: the LLM pipelines, memory architecture, safety layers, fast inference systems, and model-training workflows. You'll design and ship production-grade ML services used by real people daily, owning the end-to-end lifecycle from data collection and training experiments to deployment, monitoring, and iteration. This role demands practical ML engineering, strong product sense, and careful attention to privacy and latency — because memory, speed, and trust are core to our product experience.

Responsibilities

  • Design and implement the memory architecture — build hierarchical memory systems (episodic, semantic, short-term, long-term) and retrieval logic so companions remember the right things at the right time. The intended design is layered: a short-term session cache (Redis) for immediate context, semantic memory (vector DB) for user facts and stories, and encrypted long-term stores with explicit user controls for archival or “memorialized” companions
  • Build and maintain LLM inference pipelines — integrate multiple LLM providers and self-hosted models using a hybrid approach for routing, latency control, and cost optimization
  • Develop RAG and embedding search workflows — create efficient retrieval-augmented generation using embeddings, vector DBs, and smart caching to ground responses in user context. In practice: use cloud APIs for general conversational quality, route sensitive or high-frequency requests to local models via Ollama, and combine caching with RAG to reduce overall calls and cost while keeping latency low
  • Create model training & fine-tuning pipelines — prepare labeled datasets, run fine-tuning/LoRA, support continual learning experiments, and track model metrics
  • Implement safety, privacy, and consent mechanics — build moderation filters, data access controls, encryption-at-rest/in-transit, and consent-driven memory UI hooks
  • Optimize for latency & cost — design batching, quantization, and local inference fallbacks to deliver sub-second interactions where possible
  • Monitor, evaluate, and iterate — implement observability, A/B tests, and feedback loops with tooling to detect drift, regressions, and user-impacting issues
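
To make the hybrid routing idea above concrete, here is a minimal sketch of one possible policy: answer from a cache when possible, send sensitive or high-frequency requests to a local model, and send everything else to a cloud API. The `Request` and `Router` names, the `sensitive` flag, and the threshold value are illustrative assumptions, not Spirit's actual implementation, and the sketch only decides where a request should go rather than calling any model.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Request:
    user_id: str
    prompt: str
    # Hypothetical flag, assumed to be set upstream (e.g. by a classifier).
    sensitive: bool = False


@dataclass
class Router:
    # Hypothetical cutoff: requests per user before traffic counts as high-frequency.
    high_frequency_threshold: int = 20
    _counts: dict = field(default_factory=dict)
    _cache: dict = field(default_factory=dict)

    def _key(self, req: Request) -> str:
        # The cache key includes the user ID so entries are never shared across users.
        return hashlib.sha256(f"{req.user_id}:{req.prompt}".encode()).hexdigest()

    def route(self, req: Request) -> str:
        """Return 'cache', 'local', or 'cloud' for this request."""
        if self._key(req) in self._cache:
            return "cache"
        count = self._counts.get(req.user_id, 0) + 1
        self._counts[req.user_id] = count
        if req.sensitive or count > self.high_frequency_threshold:
            return "local"
        return "cloud"

    def store(self, req: Request, response: str) -> None:
        """Remember a response so an identical request can skip the model call."""
        self._cache[self._key(req)] = response
```

A production router would likely also weigh latency budgets, provider availability, and whether sensitive responses should be cached at all; those choices are left open here.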

Tech Stack

LLMs & Inference

  • OpenAI / Anthropic / Google Gemini APIs for rapid iteration
  • Ollama for on-prem/local inference
  • Llama 2/3, Mistral, or other open models for privacy-sensitive workloads

Model Serving & Orchestration

  • Triton / TorchServe / gRPC endpoints for scalable inference
  • Docker + Kubernetes or AWS ECS for multi-model routing

Vector & Memory Stores

  • Pinecone, Milvus, Weaviate, or FAISS for embeddings
  • Redis for short-term session caching
  • Supabase for semantic memory
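
As a rough illustration of how the short-term and semantic layers listed above could fit together, the sketch below uses in-memory stand-ins: a bounded deque in place of a Redis session cache and a plain list with a toy bag-of-words similarity in place of a vector database and embedding model. All names (`LayeredMemory`, `embed`, `build_context`) are hypothetical, and the encrypted long-term store is deliberately left out.

```python
import math
from collections import Counter, deque


def embed(text):
    """Toy bag-of-words 'embedding'; a stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class LayeredMemory:
    def __init__(self, session_size=4):
        # Short-term layer: only the most recent turns are kept (stand-in for Redis).
        self.session = deque(maxlen=session_size)
        # Semantic layer: (vector, fact) pairs (stand-in for a vector DB).
        self.semantic = []

    def add_turn(self, text):
        self.session.append(text)

    def remember_fact(self, fact):
        self.semantic.append((embed(fact), fact))

    def build_context(self, query, top_k=2):
        """Combine recent turns with the stored facts most similar to the query."""
        q = embed(query)
        ranked = sorted(self.semantic, key=lambda item: cosine(q, item[0]), reverse=True)
        facts = [fact for vec, fact in ranked[:top_k] if cosine(q, vec) > 0]
        return {"recent_turns": list(self.session), "facts": facts}
```

The returned dictionary is the kind of context a prompt builder could consume; consent checks and user-controlled deletion would sit in front of both layers in a real system.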

Training & Experimentation

  • PyTorch, Hugging Face Transformers
  • LoRA adapters for parameter-efficient tuning
  • Weights & Biases for experiment tracking

Backend & APIs

  • Python (FastAPI) or Node.js (NestJS) for model orchestration
  • GraphQL/REST for client access
  • WebSocket for streaming responses

Cloud Infra & Storage

  • AWS (S3, RDS, DynamoDB, ECS/Fargate) or GCP equivalents
  • Terraform for IaC
  • GitHub Actions for CI/CD

Monitoring & Safety Tooling

  • Prometheus / Grafana / OpenTelemetry for observability
  • Policy & moderation frameworks for content filtering and safety checks

Requirements

  • Solid ML engineering experience — practical background in deploying and maintaining machine learning models in production (not just research)
  • Hands-on LLM experience — familiarity with model APIs (OpenAI, Anthropic, etc.) and at least one open-source LLM workflow (Hugging Face or local inference)
  • Experience with embeddings and RAG — you know how to generate embeddings, store & search vectors, and combine retrieval with generation effectively
  • Strong backend skills — experience building reliable API services (Python/Node), WebSockets, and asynchronous systems
  • Performance & optimization skills — understanding quantization, batching, caching, and strategies to drive down latency and inference cost
  • Security & privacy mindset — experience handling PII/GDPR considerations, encryption, consent flows, and secure data lifecycle
  • Collaboration & product sense — you can translate product needs into technical designs, collaborate with designers, and prioritize user-facing impact

Nice to Have

  • Experience with Ollama or similar local inference managers — running models locally to reduce latency and increase privacy
  • Prior work on memory systems — building episodic/semantic memory in conversational agents or recommender systems
  • Familiarity with audio & multimodal pipelines — speech-to-text, text-to-speech, and multimodal grounding (vision + text)
  • GPU orchestration or Triton experience — deploying models on multi-GPU nodes with inference optimization
  • Background in safety/moderation — hands-on experience implementing content filters, classifier-based safety checks, or human-in-the-loop flows

Why Join Spirit?

  • You'll build the core intelligence that gives Spirit companions their personality and memory — the most impactful part of the product — and you'll ship directly into user experiences rather than just internal experiments
  • This is a unique opportunity to work at the intersection of LLMs, privacy-preserving engineering, and emotionally resonant product design
  • You'll learn advanced model deployment practices (local inference vs API tradeoffs), design responsible memory systems, and shape safety-first defaults used by real people
  • If you care about product impact, ethical AI, and building systems that are fast, private, and meaningful, you'll have a huge scope to contribute and grow at Spirit
  • You'll be part of a small, multidisciplinary team shipping fast, get direct feedback from real users, and influence product decisions from day one

Compensation Note

We're an angel-backed startup preparing to raise seed funding, so we're mostly bootstrapping for now. Roles are initially unpaid for all members of the founding team, but team members will be well rewarded and compensated for their time and effort. This includes:

  • Reimbursement compensation post-launch
  • Full-time employment options post-launch
  • Competitive salary
  • Equity benefits and stock options will be considered based on performance
  • Eligibility for additional leadership roles within the company

Ready to Join Spirit?

Click below to fill out our application form