Advanced Agent Evaluation Runner

Instructions:

  1. Log in to your Hugging Face account using the button below.
  2. Click 'Run Evaluation & Submit All Answers' to:
    • Fetch all questions.
    • Run the full agent on every question (this will take a long time).
    • Save answers to answers.json.
    • Submit all answers and get your score.
  3. Click 'Submit from answers.json (no re-run)' to:
    • Load answers from the answers.json file (if it exists).
    • Submit those answers without re-running the agent. This is much faster.
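
The save-then-resubmit flow described in steps 2 and 3 can be sketched roughly as follows. This is a minimal illustration, not the app's actual code: the function names (run_and_save, load_saved_answers), the question fields (task_id, question), and the record layout written to answers.json are all assumptions made for the example. Only the file name answers.json comes from the instructions above. Fetching questions and submitting answers are left out because their endpoints are not given here.

```python
import json
from pathlib import Path

# File name taken from the instructions; everything else here is hypothetical.
ANSWERS_FILE = Path("answers.json")


def run_and_save(questions, agent, path=ANSWERS_FILE):
    """Run the agent on every question and write the answers to disk.

    `questions` is assumed to be a list of dicts with "task_id" and
    "question" keys; `agent` is any callable mapping a question string
    to an answer string.
    """
    answers = []
    for q in questions:
        answers.append(
            {"task_id": q["task_id"], "submitted_answer": agent(q["question"])}
        )
    # Persist the answers so they can be submitted later without a re-run.
    path.write_text(json.dumps(answers, indent=2), encoding="utf-8")
    return answers


def load_saved_answers(path=ANSWERS_FILE):
    """Return previously saved answers, or None if the file does not exist."""
    if not path.exists():
        return None
    return json.loads(path.read_text(encoding="utf-8"))
```

With helpers like these, the first button would call run_and_save and then submit the result, while the second would call load_saved_answers and submit only if it returns a list. Skipping the agent run is what makes the second path faster.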

Questions and Agent Answers