Location: Remote
Job Type: Full-time
Posted: June 08, 2026
Job Description
Responsibilities
- Deploy, monitor, and recover containerized AI training environments.
- Troubleshoot infrastructure bottlenecks and resolve system failures in real time.
- Build and manage resilient systems for stability and performance optimization.
- Collaborate with engineering teams to improve CI/CD pipelines and automation.
- Manage filesystem structures, storage, and process scheduling in containerized environments.
- Replan work dynamically when runtime issues or system failures occur.
- Document system processes, solutions, and best practices.
Requirements
- Strong experience with terminal-based system administration and troubleshooting.
- Expertise with container platforms such as Docker or Kubernetes.
- Strong Python skills for scripting, automation, and debugging.
- Proficiency in Bash and familiarity with additional programming languages.
Ready to Apply?
Submit your application for Site Reliability Engineer | $70/hr Remote at Crossing Hurdles