Location
Singapore
Job Type
Full-time
Posted
June 17, 2026
Job Description
AI Site Reliability Engineer | Singapore | 1-Year Contract
Our client, a leading consulting services organisation, is hiring an AI SRE to own the reliability of their AI platform from the ground up.
The Role
You'll embed SLOs, observability, deployment safety, and incident response into AI platform services as they're built. Own the enterprise AI gateway (LLM + MCP), set reliability standards across all AI products, and partner with platform engineering and security to ensure everything ships production-ready.
What We're Looking For
3–8 years in SRE or software engineering
Deep SRE expertise: SLOs, error budgets, chaos engineering, incident management
Experience owning a critical gateway or high-throughput API at ≥99.9% availability
Hands-on with AI/ML in production: LLM workloads, agent loops, provider outages
AWS/Kubernetes, Terraform/CDK, CI/CD pipelines
Observability tools: Datadog, Grafana, OpenTelemetry or equivalent
Python proficiency
Ready to Apply?
Submit your application for AI Site Reliability Engineer - Contract at Argyll Scott