Location
Kuala Lumpur
Job Type
Full-time
Posted
June 02, 2026
Job Description
Preferred Location
- Taiwan
- Malaysia
Responsibilities
- Provide first- and second-line technical support to customers for AI infrastructure, including GPU/CPU nodes, networking, storage, orchestration, and platform services. Support is delivered via ticketing systems, email, Slack, or other messaging platforms.
- Support GPU cluster delivery, including system provisioning, image deployment, network validation, BIOS/firmware updates, and GPU driver/runtime installation.
- Monitor system health and service-level indicators using alerts and dashboards; respond to alerts around the clock (24x7) according to the assigned schedule.
- Triage incidents by gathering context, verifying scope and impact, and following standard operating procedures and runbooks to perform immediate mitigations.
- Escalate incidents to the global SRE engineers with clear, concise incident notes and the relevant logs and traces.
- Maintain incident logs, update status p...