Benchmark #1 - Rails Blog

    • One of the simplest web development tasks there is, and very well documented online
    • Follows the first steps of the GoRails tutorial
    • Learn more about the testing methodology in the Methodology section
    Prompts

    As described in the Issues section on GitHub.

    See on GitHub
    Runs

    All runs are visible as pull requests.

    See on GitHub
    Leaderboard

    Agent                                 Overall score   Approx. cost
    Amazon Q Pro (IntelliJ plugin)        72.5%           $38 ($2.24 / solved)
    SWE-agent v0.6.0 (Docker, CLI)        25%             $20 ($3.33 / solved)
    Aider v0.37.0 (GPT-4o)                Non-functional
    ByteDance MarsCode Agent (in IDE)     No file insertion feature
    OpenDevin v0.6.2 (GPT-4o)             Ruby not supported
    OpenCSG StarShip                      Access requested
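
    The per-solved figures look like total spend divided by the number of solved steps: $38 / $2.24 is about 17, and $20 / $3.33 is about 6. The sketch below checks that reading against the published numbers. It is an assumption inferred from the leaderboard, not the benchmark's documented formula, and the helpers `cost_per_solved` and `implied_solved` are hypothetical.

```python
# Assumption: "cost per solved" = total cost / number of solved steps.
# Both helpers are hypothetical illustrations, not part of the benchmark.

def cost_per_solved(total_cost: float, solved: int) -> float:
    """Average spend per solved step, rounded to cents."""
    if solved <= 0:
        raise ValueError("solved must be a positive count")
    return round(total_cost / solved, 2)


def implied_solved(total_cost: float, per_solved: float) -> int:
    """Back out the (rounded) number of solved steps from the two published figures."""
    return round(total_cost / per_solved)


# Total cost and cost per solved, as listed in the leaderboard.
leaderboard = {
    "Amazon Q Pro (IntelliJ plugin)": (38.0, 2.24),
    "SWE-agent v0.6.0": (20.0, 3.33),
}

for name, (total, per) in leaderboard.items():
    n = implied_solved(total, per)
    # Recompute the per-solved cost from the inferred count to confirm it matches.
    print(f"{name}: ~{n} solved, recomputed ${cost_per_solved(total, n):.2f} / solved")
```

    Under this assumption the script infers about 17 solved steps for Amazon Q Pro and 6 for SWE-agent, and both recomputed per-solved costs match the listed values.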

    Amazon Q Pro (IntelliJ plugin)

    Per-step results: Step #1 to Step #9 and Step #9 Bonus, for Run 1, Run 2 and Run 3

    SWE-agent v0.6.0 (GPT-4o)

    Per-step results: Step #1 to Step #9 and Step #9 Bonus, for Run 1, Run 2 and Run 3