FOD#50: The Rise of Self-Evolving Language Models

The most complete and structured overview of the week (we read over 170 newsletters so you don’t have to)

Ksenia Se
May 1, 2024

Large language models (LLMs) have advanced remarkably, but their improvement has traditionally depended heavily on external datasets and human supervision. A shift is now underway: the emergence of self-evolving LLMs, which improve by learning from experience they generate themselves. A growing body of research aims to push LLMs toward a new level of autonomy and intelligence through this approach.

Researchers from Peking University, Alibaba Group, and Nanyang Technological University have proposed a comprehensive framework for understanding this evolution (A Survey on Self-Evolution of Large Language Models). The framework describes an iterative cycle of four phases: experience acquisition, experience refinement, updating, and evaluation. At its core is the ability of LLMs to learn from their own experiences and improve their capabilities, a mode of learning inspired by how humans acquire knowledge and skills.
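To make the four-phase cycle concrete, here is a toy sketch in Python. It is our own illustration, not the survey's method: the "model" is a hypothetical `ToyModel` that answers addition problems, an exact arithmetic check stands in for a critic, and "updating" simply stores verified answers instead of fine-tuning weights.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Task = Tuple[int, int]


@dataclass
class ToyModel:
    """Hypothetical stand-in for an LLM: it answers addition tasks from memory
    when it can and otherwise makes a noisy guess."""
    memory: Dict[Task, int] = field(default_factory=dict)
    rng: random.Random = field(default_factory=lambda: random.Random(0))

    def answer(self, task: Task) -> int:
        if task in self.memory:
            return self.memory[task]
        a, b = task
        return a + b + self.rng.choice([-1, 0, 1])  # right about 1/3 of the time


def acquire(model: ToyModel, tasks: List[Task]) -> List[Tuple[Task, int]]:
    """Experience acquisition: the model attempts tasks on its own."""
    return [(t, model.answer(t)) for t in tasks]


def refine(experiences: List[Tuple[Task, int]]) -> List[Tuple[Task, int]]:
    """Experience refinement: keep only attempts that pass a verifier
    (an exact arithmetic check stands in for a critic or reward model)."""
    return [(t, ans) for t, ans in experiences if ans == sum(t)]


def update(model: ToyModel, refined: List[Tuple[Task, int]]) -> None:
    """Updating: fold the filtered experience back into the model
    (a real system would fine-tune weights or edit a memory store)."""
    for t, ans in refined:
        model.memory[t] = ans


def evaluate(model: ToyModel, tasks: List[Task]) -> float:
    """Evaluation: measure accuracy to decide whether another cycle is worthwhile."""
    return sum(model.answer(t) == sum(t) for t in tasks) / len(tasks)


def self_evolve(model: ToyModel, tasks: List[Task], rounds: int) -> List[float]:
    """Run acquire -> refine -> update -> evaluate for a fixed number of rounds."""
    history = []
    for _ in range(rounds):
        update(model, refine(acquire(model, tasks)))
        history.append(evaluate(model, tasks))
    return history


tasks = [(a, b) for a in range(5) for b in range(5)]
model = ToyModel()
history = self_evolve(model, tasks, rounds=50)
print(history[0], history[-1])
```

Because refinement keeps only verified attempts, the set of tasks the model reliably solves can only grow, so accuracy trends toward 1.0 across rounds. In a real self-evolving LLM, each phase is far more involved (self-generated prompts, reward-model filtering, gradient updates).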

Techniques for Self-Improvement

Several recently published techniques are driving this self-evolutionary trend:

  • Imagination, Search, and Criticism: The AlphaLLM approach lets an LLM improve its own reasoning by imagining new training prompts, searching over candidate solutions with Monte Carlo Tree Search, and criticizing the results with critic models (Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing).
  • Self-Play and Reinforcement Learning: Researchers designed an adversarial language game, Adversarial Taboo, in which an LLM plays both the attacker and the defender (Self-playing Adversarial Language Game Enhances LLM Reasoning). Reinforcement learning on the game outcomes improved the model's performance across a broad range of reasoning benchmarks.
  • Optimizing Inference and Decoding: The LayerSkip framework makes inference computationally lighter (LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding). The model drafts tokens by exiting early from its layer stack, then verifies and corrects those drafts with the remaining layers, which maintains accuracy while reducing memory and compute requirements.
  • Reasoning about Code Execution: The NExT method teaches LLMs to inspect program execution traces and reason about run-time behavior (NExT: Teaching Large Language Models to Reason about Code Execution). NExT uses self-training to build a synthetic dataset of execution-aware rationales, which improved the program fix rate by 26.1% (absolute) on MBPP and 14.3% on HumanEval, even when traces are not available at test time.
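The draft-then-verify idea behind LayerSkip's self-speculative decoding can be sketched in a few lines. This is a hedged toy, not LayerSkip's implementation: `next_token` is a hypothetical stand-in for a transformer whose layers share one output head, and the layer counts, vocabulary size, and draft length `k` are arbitrary. The property it demonstrates is that, with greedy decoding, verifying early-exit drafts against the full model yields exactly the same tokens as running the full model alone.

```python
from typing import List, Tuple

VOCAB = 10       # hypothetical toy vocabulary size
NUM_LAYERS = 8   # hypothetical depth of the toy model


def next_token(context: List[int], num_layers: int) -> int:
    """Toy model: run a hidden state through `num_layers` layers, then apply
    one shared output head (here, modulo VOCAB) to pick the next token."""
    h = sum(context) * 7 + len(context)
    for layer in range(num_layers):
        # Early layers always contribute; later layers only sometimes change
        # the state, so an early exit is often, but not always, correct.
        if layer < 4 or h % 3 == 0:
            h += layer + 1
    return h % VOCAB


def greedy_decode(prompt: List[int], n_new: int) -> List[int]:
    """Baseline: ordinary greedy decoding using every layer."""
    ctx = list(prompt)
    for _ in range(n_new):
        ctx.append(next_token(ctx, NUM_LAYERS))
    return ctx[len(prompt):]


def self_speculative_decode(prompt: List[int], n_new: int,
                            draft_layers: int = 4, k: int = 4) -> Tuple[List[int], int]:
    """Draft k tokens with an early exit, then verify them with all layers."""
    ctx = list(prompt)
    verify_rounds = 0
    while len(ctx) - len(prompt) < n_new:
        # 1. Draft: cheap predictions from only the first `draft_layers` layers.
        draft: List[int] = []
        for _ in range(k):
            draft.append(next_token(ctx + draft, draft_layers))
        # 2. Verify: keep the prefix the full model agrees with and replace the
        #    first disagreeing token with the full model's choice. (A real
        #    implementation checks all k drafts in one batched pass and reuses
        #    the early layers' work; this loop is sequential for clarity.)
        accepted: List[int] = []
        for tok in draft:
            target = next_token(ctx + accepted, NUM_LAYERS)
            accepted.append(target)
            if tok != target:
                break
        verify_rounds += 1
        ctx.extend(accepted)
    return ctx[len(prompt):len(prompt) + n_new], verify_rounds


out, rounds = self_speculative_decode([1, 2, 3], 20)
print(out == greedy_decode([1, 2, 3], 20), rounds)
```

The savings come from the draft step: the early layers are part of the same model, so no separate draft model has to be held in memory.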

Exploring LLM Values and Ethical Alignment

It is also notable that LLMs appear to exhibit value systems of their own. The ValueLex framework, developed by researchers from Tsinghua University and Microsoft Research Asia, aims to uncover these values, which can differ from human norms. By analyzing LLMs systematically, the researchers identified three core value dimensions: competence, character, and integrity. This line of research helps explain how model design shapes the values a model exhibits, which in turn can inform ethical considerations in AI development.

The Future of Self-Evolving Systems

The prospect of self-evolving LLMs is exciting but raises many open questions. As these models gain autonomy, keeping them aligned with human goals and values will become crucial. Continuous research, interdisciplinary collaboration, and rigorous evaluation will be essential to unlocking the full potential of self-evolving LLMs and ensuring their safe and beneficial integration into our world.

It might also be a good time to reread John von Neumann’s Theory of Self-Reproducing Automata.


Last Week’s Models from the US

(Every week now brings new, powerful models. Last week was especially fruitful. Here is our list of models with additional reading recommendations.)

  • Phi-3 Mini — Developed by Microsoft

Phi-3 Mini, a 3.8-billion-parameter model from Microsoft, rivals much larger models such as Mixtral 8x7B and GPT-3.5 on academic benchmarks while remaining small enough to run on a phone. Trained on heavily filtered web data and synthetic data, it can handle advanced language processing locally on the device →read the paper

Additional reading: Compare Llama-3 and Phi-3 using RAG (lightning.ai)

  • OpenELM — Developed by Apple

Apple’s OpenELM uses a layer-wise scaling strategy to allocate parameters efficiently across the layers of the transformer. With a budget of about one billion parameters, it is 2.36% more accurate than OLMo while requiring half as many pre-training tokens. Apple released the complete open-source framework, including training and evaluation code, to support transparent, reproducible research in natural language processing →read the paper
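Here is a hedged sketch of layer-wise scaling as we understand it from the OpenELM paper: instead of giving every transformer layer the same width, the number of attention heads and the FFN hidden size grow linearly from the first layer to the last. The function name and the `alpha`/`beta` ranges are illustrative assumptions, and real configurations likely round widths differently.

```python
from typing import List, Tuple


def layerwise_scaling(num_layers: int, d_model: int, head_dim: int,
                      alpha: Tuple[float, float] = (0.5, 1.0),
                      beta: Tuple[float, float] = (0.5, 4.0)) -> List[Tuple[int, int]]:
    """Return (num_heads, ffn_dim) for each layer.

    `alpha` scales attention width and `beta` is the FFN multiplier; both are
    interpolated linearly from their first-layer value to their last-layer value.
    """
    configs = []
    for i in range(num_layers):
        t = i / (num_layers - 1) if num_layers > 1 else 0.0  # position in [0, 1]
        a = alpha[0] + (alpha[1] - alpha[0]) * t
        b = beta[0] + (beta[1] - beta[0]) * t
        num_heads = max(1, round(a * d_model / head_dim))
        ffn_dim = round(b * d_model)
        configs.append((num_heads, ffn_dim))
    return configs


for layer, (heads, ffn) in enumerate(layerwise_scaling(4, 1024, 64)):
    print(f"layer {layer}: {heads} heads, FFN dim {ffn}")
```

With these settings the first layer gets 8 heads and a 512-wide FFN while the last gets 16 heads and a 4096-wide FFN, so more of the parameter budget goes to deeper layers than in a uniform design.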

  • Snowflake Arctic — Developed by Snowflake AI Research

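Snowflake Arctic is tailored for enterprise applications. Its Dense-MoE Hybrid transformer architecture combines a 10B dense transformer with a residual 128×3.66B MoE MLP, for 480B total parameters of which only 17B are active per token (top-2 gating), which keeps training and inference costs low. It performs well on enterprise tasks such as SQL generation and coding, and it is released under the Apache 2.0 license and available on multiple platforms →read the paper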
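A Dense-MoE hybrid block can be sketched as a dense MLP path running in parallel with a residual mixture-of-experts path whose router activates only the top-k experts for each token. This pure-Python toy is our own illustration under that assumption, not Snowflake's code; the expert functions, router, and sizes are hypothetical, and `k=2` mirrors top-2 gating.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]
Layer = Callable[[Vector], Vector]


def top_k_gate(logits: Vector, k: int) -> List[Tuple[int, float]]:
    """Select the k highest-scoring experts and softmax-normalize their scores."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def hybrid_block(x: Vector, dense_mlp: Layer, experts: List[Layer],
                 router: Layer, k: int = 2) -> Vector:
    """Residual Dense-MoE hybrid: x + dense(x) + sum of gated top-k expert outputs."""
    dense_out = dense_mlp(x)
    moe_out = [0.0] * len(x)
    for idx, weight in top_k_gate(router(x), k):
        expert_out = experts[idx](x)  # only the selected experts are evaluated
        moe_out = [m + weight * e for m, e in zip(moe_out, expert_out)]
    return [xi + d + m for xi, d, m in zip(x, dense_out, moe_out)]


# Toy usage with hypothetical components.
def dense_mlp(v: Vector) -> Vector:
    return [0.5 * vi for vi in v]


def router(v: Vector) -> Vector:
    return [0.0, 0.0, 10.0, 10.0]  # strongly prefers experts 2 and 3


experts = [lambda v, s=s: [s * vi for vi in v] for s in (1.0, 2.0, 3.0, 4.0)]
print(hybrid_block([1.0, 2.0], dense_mlp, experts, router))  # [5.0, 10.0]
```

Only the k selected experts run, which is why an MoE model's compute per token tracks its active parameters rather than its total parameter count.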

Additional reading: Snowflake’s Mission: Demolishing Data Limitations in the Era of Enterprise AI

  • Pegasus-1 — Developed by Twelve Labs

Pegasus-1 is a multimodal LLM designed for video understanding; it interprets spatiotemporal information to comprehend many types of video content. It performs well on tasks such as video conversation and summarization, and the accompanying technical report describes its architecture and capabilities →read the paper

Models from China:

  • SenseNova 5.0 — Developed by SenseTime

SenseNova 5.0, unveiled on April 24, 2024, in Shanghai, is a major update to SenseTime’s large model series. This iteration improves linguistic, creative, and scientific capabilities and adds multimodal interaction. It was trained on over 10TB of token data and supports a 200K-token context window, with gains in knowledge, math, reasoning, and coding. Most notably, SenseTime reports that SenseNova 5.0 matches or exceeds GPT-4 Turbo across various benchmarks →more details

  • Tele-FLM — Developed by Beijing Academy of AI and Institute of AI of China Telecom Corp Ltd

Tele-FLM, a 52-billion-parameter multilingual LLM, is optimized for factual judgment and a low carbon footprint. The paper shares detailed insights into model design and training dynamics, and the model performs competitively with larger open models →read the paper

  • InternVL 1.5 — Developed by Shanghai AI Laboratory

InternVL 1.5 aims to close the gap between open-source and commercial multimodal models. It features a strong vision encoder, dynamic high-resolution input, and a high-quality bilingual (English and Chinese) dataset, and it achieves competitive results, particularly on OCR and Chinese-related tasks →read the paper

Enjoyed This Story?

I write a weekly analysis of the AI world in the Turing Post newsletter. Subscribe for free and be the first to read the latest stories.

The goal of Turing Post is to equip you with comprehensive knowledge and historical insights so you can make informed decisions about AI and ML. Join over 50,000 readers from leading AI labs, forward-thinking startups, and major universities.

Hugging Face’s FineWeb:

Meta: Meta’s executives were left out of the Department of Homeland Security’s Artificial Intelligence Safety and Security Board.

Cohere: Cohere has launched Cohere Toolkit, an open-source collection of prebuilt components designed to simplify building AI applications across various platforms, with an emphasis on ease of use and customization.

Meta and Cohere, along with several other institutions, also contributed to the PRISM dataset, which maps how a diverse, global set of participants interact with large language models (LLMs). PRISM links each participant’s detailed survey responses (demographics and stated preferences) to their conversation transcripts and feedback, making it possible to study how personal and cultural differences shape preferences for AI behavior and user experiences →read the paper and →check the dataset
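Linking surveys to transcripts means feedback can be grouped by who gave it. The sketch below shows the idea as a join on a participant ID; the field names (`user_id`, `region`, `rating`) and the records are hypothetical toy data, not PRISM's actual schema or results.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List


def mean_rating_by(surveys: List[dict], conversations: List[dict],
                   attribute: str) -> Dict[str, float]:
    """Average conversation rating for each value of a survey attribute.

    Every record carries a participant ID, so each conversation can be joined
    back to the survey answers of the person who had it.
    """
    profile = {s["user_id"]: s for s in surveys}
    ratings = defaultdict(list)
    for conv in conversations:
        person = profile.get(conv["user_id"])
        if person is None:
            continue  # skip conversations with no matching survey record
        ratings[person[attribute]].append(conv["rating"])
    return {group: mean(vals) for group, vals in ratings.items()}


surveys = [
    {"user_id": "u1", "region": "Europe"},
    {"user_id": "u2", "region": "Asia"},
]
conversations = [
    {"user_id": "u1", "rating": 80},
    {"user_id": "u1", "rating": 60},
    {"user_id": "u2", "rating": 90},
    {"user_id": "u3", "rating": 50},  # no survey on file, so it is ignored
]
print(mean_rating_by(surveys, conversations, "region"))
```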

OpenAI’s Memory Upgrade: OpenAI has introduced a memory feature for ChatGPT that lets it remember details across conversations, which can make interactions more personalized and useful.

Last week, a few exciting research papers were published. We categorize them for your convenience 👇🏼

Thank you for reading! 🤍 Please send this article to your colleagues to help them enhance their understanding of AI and stay ahead of the curve.

I build Turing Post, equipping you with in-depth knowledge and analysis to make smarter decisions about AI & ML -> https://www.turingpost.com/subscribe