Evaluation Reveals GPT-5.5's Advanced Cybersecurity Capabilities
An April evaluation of Anthropic's Claude Mythos Preview found a significant improvement in cyber performance: the model completed a corporate network attack simulation that would typically take a human approximately 20 hours. That result raised the question of whether such advances were unique to one model or indicative of a broader trend. Results from an early checkpoint of OpenAI's GPT-5.5 suggest the latter, showing comparable performance on cybersecurity tasks. The evaluation used a suite of 95 cyber tasks designed to assess a range of skills, including vulnerability research and exploitation. GPT-5.5 achieved an average pass rate of 71.4% on the advanced tasks, outperforming previous models, with notable strengths in reverse engineering and exploit development.
