SWE-bench/SWE-bench
Benchmark for evaluating LLMs on real-world GitHub issues
