Location
Singapore
Job Type
Full-time
Posted
June 08, 2026
Job Description
What The Role Entails
- System Monitoring & Incident Response
  a. Monitor production systems using tools like Prometheus/Grafana; identify and troubleshoot outages.
  b. Participate in on-call rotations to resolve real-time incidents (with mentor guidance).
- Automation & DevOps Practices
  a. Develop scripts (Python/Shell) to automate deployment, scaling, and recovery tasks.
  b. Assist in CI/CD pipeline optimization using GitLab, Docker, and Kubernetes.
- Infrastructure Optimization
  a. Analyze system performance metrics; propose solutions to enhance reliability and cost efficiency.
  b. Support cloud infrastructure management (Tencent Cloud/AWS/Azure).
- Collaboration & Documentation
  a. Work with cross-functional teams (Dev, Data, Security) to design SLOs/SLIs for critical services.
  b. Document system configurations, runbooks, an...