Location
San Francisco
Job Type
Full-time
Posted
June 03, 2026
Job Description
We are developing high-quality training and evaluation datasets to improve how Large Language Models (LLMs) perform on real software engineering problems. The core of this project involves identifying and curating verifiable coding tasks from public GitHub repositories, supported by a human-in-the-loop review process.
As a contractor on this project, you will review code written by AI to solve real software tasks. Your feedback will help improve how future AI models learn to write and understand code.
Key Responsibilities
- Review and compare 3–4 model-generated code responses for each task using a structured ranking framework
- Assess code changes (diffs) for correctness, quality, readability, and performance
- Provide clear, concise explanations for your ranking decisions
- Maintain consistency and fairness across all evaluations
- Identify and document...