Location
Remote
Job Type
Full-time
Posted
June 09, 2026
Job Description
ROLE & RESPONSIBILITIES:
Define and build agentic system architectures that leverage Amazon Bedrock AgentCore and agent frameworks to enable multi-step reasoning and automated workflows.
Lead technical strategy for model selection, fine‑tuning, and inference, advising on cost vs. performance tradeoffs.
Design and implement containerized deployment standards using Docker and Kubernetes to ensure consistent, scalable, and fault‑tolerant ML operations.
Architect secure, low‑latency networking for model‑to‑service and service‑to‑service communication across private and public networks.
Perform systems‑level performance engineering: select appropriate compute accelerators, run load and stress tests, and conduct capacity planning for production readiness.
Establish and operate MLOps and Gen AI Ops practices, including CI/CD pipelines, model versioning, and deployment automation.
Implement observability, logging, monitoring, and incident response for production AI systems to ensure operation...