Language Models Track Internal Value and Confidence in Goal Achievement

This research investigates how language models, specifically Qwen3-8B, internally assess the value of their current trajectory, which is defined as the likelihood of achieving their goals. Utilizing synthetic in-context reinforcement learning data, the study constructs a 'value' axis that differentiates between various performance indicators such as verbalized confidence and the outcomes of rollouts with and without backtracking. Findings reveal that steering towards high-value actions can suppress self-correction and verbosity, while low-value steering encourages exploration. The study demonstrates that direct preference optimization increases the internal value associated with rewarded behaviors, enhancing the model's confidence. Additionally, the research applies the value axis in real-world settings, showing that Qwen assigns low value to politically sensitive queries post-training, and highlights the role of supervised fine-tuning in boosting internal confidence within the training domain.

Generative AI, AI Models, Collaboration, Anthropic

US and Europe Explore AI Model Access Following Anthropic Dispute

The United States and Europe are in discussions regarding a 'trusted partner' scheme aimed at granting US allies the opportunity to test advanced artificial intelligence models. This initiative comes in the wake of a dispute involving Anthropic, highlighting the strategic importance of collaboration between the two regions in the AI sector. The partnership is designed to enhance access to cutting-edge AI technologies for allied nations.

Generative AI, AI Safety, NLP, Model Evaluation

OpenAI Unveils Deployment Simulation to Enhance AI Model Safety and Evaluation

OpenAI has launched Deployment Simulation, a novel approach designed to predict the behavior of AI models prior to their release. This method utilizes real conversation data to enhance the safety and accuracy of model evaluations, allowing developers to make informed decisions before deploying AI technologies in real-world scenarios.

OpenAI

Data Warehousing, Software, Databricks, Snowflake

Databricks Reports More Than 100% Growth in Data Warehousing Business, Now at $1.5 Billion Annual Run Rate

Databricks Inc., a software company that competes with Snowflake Inc. and Alphabet Inc.'s Google, announced that its data warehousing business has more than doubled over the past year, reaching a $1.5 billion annual run rate. The growth underscores Databricks' expanding presence in the data warehousing market.

Bloomberg Technology

Generative AI, Mistral, Disinformation, Open Source

Study Reveals Mistral's Vulnerability to Russian Disinformation in AI Models

Estonian researchers have found that open-source generative AI models, including those developed by Europe's AI champion Mistral, are less effective than other models at filtering out false news. The study highlights the difficulty these models have in combating disinformation and raises concerns about their reliability in contexts where accurate information is critical.

Financial Times Tech

Reinforcement Learning, LLM, Multimodal, Fine-Tuning

Introducing ContextRL: A Reinforcement Learning Method for Enhanced Multimodal LLM Performance

Large language models (LLMs) often struggle with identifying critical evidence within complex contexts. Researchers propose ContextRL, a context-aware reinforcement learning method designed to enhance long-horizon reasoning and multimodal performance. This approach rewards the model for selecting the most relevant context that supports a given query-answer pair, thereby promoting fine-grained grounding. The study constructs contrastive context data for coding agents and multimodal reasoning, achieving average performance gains of +2.2% over standard GRPO on long-horizon benchmarks and +1.8% on visual question answering tasks. The results indicate that improvements stem from the context-selection objective rather than merely from additional data.

arXiv AI

Startup Funding, Generative AI, AI Landscape, M&A

SpaceX Surpasses Amazon in Value Following $60 Billion Cursor Acquisition

SpaceX's valuation has surpassed that of Amazon as the company successfully completes its $60 billion acquisition of Cursor, according to Bloomberg’s Ed Ludlow. In related news, Anthropic is in discussions with US officials to address a national security issue concerning its advanced AI models amid ongoing scrutiny from the Trump administration. Additionally, Sequoia Partner Shaun Maguire shares insights into the AI landscape and expresses a commitment to retain his SpaceX shares.