
On May 15, OpenAI announced that it will publish the safety evaluation results of its in-house artificial intelligence models more frequently in an effort to increase transparency. The company officially launched its "Safety Evaluations Hub" webpage on Wednesday, a page designed to show how its models score on tests for harmful content generation, jailbreaks, and hallucinations.
OpenAI said it plans to use the Safety Evaluations Hub to publish model metrics on an ongoing basis and to update the page whenever it ships major model updates in the future. In a blog post, OpenAI wrote: "As the science of AI evaluation continues to advance, we are committed to sharing our progress in developing more scalable methods for evaluating model capability and safety." The company also emphasized that by publishing a subset of its safety evaluation results here, it hopes not only to make it easier for users to understand how the safety performance of OpenAI systems has changed over time, but also to support industry-wide efforts to increase transparency. In addition, OpenAI said it may add more evaluations to the hub over time.
OpenAI has previously drawn criticism from some ethicists for rushing the safety testing of certain flagship models and for failing to publish technical reports for others. The company's CEO, Sam Altman, has also faced controversy over allegations that he misled company executives about model safety reviews before he was briefly ousted in November 2023.
Late last month, OpenAI was forced to roll back an update to GPT-4o, ChatGPT's default model, after users reported that it was responding in an overly sycophantic, flattering manner and endorsing problematic and even dangerous decisions and ideas. In response to the incident, OpenAI said it would make a series of fixes and changes to prevent similar incidents from happening again, including introducing an opt-in "alpha phase" for some models that would allow a subset of ChatGPT users to test them and provide feedback before official release.