Princeton University – SAgE Research Group
The Science of Agent Evaluation (SAgE) group at Princeton studies the systematic evaluation of AI agents. Our work includes developing benchmarks, building open-source infrastructure for agent evaluations, and researching the impact of AI on science. Our recent projects include the Holistic Agent Leaderboard (HAL), CORE-bench, and research on the limits of inference scaling.
As an IC Research Software Engineer, you will take a leading role in maintaining the Holistic Agent Leaderboard (HAL), including its backend infrastructure, evaluation harness, and public leaderboards. This position also offers the opportunity to work closely with our group on other ongoing projects that aim to shape emerging evaluations for AI systems.
Time Commitment – We estimate the work will take around 20 hours a week, but you’re free to manage your own schedule and workload.
The contractor will be paid $100 per hour, or approximately $8,000 per month at 20 hours per week (about 80 hours per month). If fewer than 20 hours are worked in any given week, payment for that month will be prorated accordingly.
Required: Strong programming skills (Python and web development)
Desired:
Email your resume, GitHub, and a brief statement of interest to sayashk AT princeton DOT edu. Please include [HAL application] in the subject line. Applications will be reviewed on a rolling basis, and we will close the search as soon as we find a suitable candidate.