Home Nieuws Verder gaan dan het contextvenster: recursieve taalmodellen in actie

Verder gaan dan het contextvenster: recursieve taalmodellen in actie

Door

28 januari 2026

context is echt alles. De kwaliteit van de output van een LLM is nauw verbonden met de kwaliteit en hoeveelheid informatie die u verstrekt. In de praktijk komen veel gebruiksscenario’s uit de echte wereld met enorme contexten: het genereren van code over grote codebases, het bevragen van complexe kennissystemen, of zelfs lange, kronkelende chats terwijl we de perfecte vakantiebestemming onderzoeken (we zijn er allemaal geweest).

Helaas kunnen LLM’s alleen efficiënt werken met een beperkte hoeveelheid context. En dit gaat niet alleen over de harde grenzen van het contextvenster, vooral nu grensmodellen honderdduizenden of zelfs miljoenen tokens ondersteunen. En die grenzen worden steeds groter. De grotere uitdaging is een fenomeen dat bekend staat als context rotwaarbij de modelprestaties afnemen naarmate de contextlengte toeneemt.

Dit effect wordt duidelijk aangetoond in het artikel “RULER: What’s the Real Context Size of Your Long-Context Language Models?” door Hsieh et al. De auteurs introduceren RULER, een nieuwe maatstaf voor het evalueren van prestaties in een lange context, en testen een reeks modellen. De resultaten laten een consistent patroon zien: naarmate de context langer wordt, nemen de prestaties bij alle modellen aanzienlijk af. In veel gevallen bedraagt de effectieve contextlengte (waarbij de prestaties vergelijkbaar blijven met die van Lama 2–7B) slechts ongeveer 50% van het geadverteerde contextvenster, of zelfs minder.

Figuur uit het artikel Hsieh et al, 2024 | bron

In hun recente artikel “Recursieve taalmodellen”, Zhang et al. een veelbelovende aanpak voorstellen om het contextrotprobleem aan te pakken. In dit artikel wil ik dit idee nader bekijken en onderzoeken hoe het in de praktijk werkt, waarbij ik gebruik maak van de onlangs toegevoegde ondersteuning van DSPy voor deze gevolgtrekkingsstrategie.

Recursieve taalmodellen

Er zijn recursieve taalmodellen (RLM’s) geïntroduceerd om de achteruitgang van de prestaties aan te pakken naarmate de contextlengte toeneemt, en om LLM’s in staat te stellen met zeer grote contexten te werken (tot twee ordes van grootte buiten het oorspronkelijke contextvenster van het model). Dit wordt steeds belangrijker nu we LLM’s adopteren voor taken waarbij honderden miljoenen tokens moeten worden verwerkt.

Er zijn al verschillende benaderingen voor het uitvoeren van taken met een lange horizon. De meest voorkomende (iets dat je waarschijnlijk bent tegengekomen bij het gebruik van codeagents zoals Cursor) is samenvatting van de context. In deze opzet wordt de conversatie- of werkcontext herhaaldelijk samengevat zodra deze de contextlimiet van het model nadert. In de praktijk schiet dit echter vaak tekort: belangrijke details en subtiele nuances uit eerdere stappen gaan vaak verloren omdat het model oude informatie ‘vergeet’ om ruimte te maken voor nieuwe tokens.

Recursieve taalmodellen hanteren een andere, en verrassend elegante, benadering. In plaats van elke keer de volledige prompt door te geven aan de LLM, wordt de prompt behandeld als een reeks variabelen die beschikbaar zijn in een Python REPL-omgeving (Read–Eval–Print Loop). Het model is uitgerust met tools waarmee het deze variabelen kan inspecteren of zichzelf recursief kan aanroepen op programmatisch geselecteerde fragmenten.

Figuur uit het artikel Zhang et al, 2025 | bron

Met andere woorden, RLM’s moedigen de LLM aan om code te schrijven die zijn eigen subtaken construeert, en deze subtaken vervolgens op te lossen door zichzelf recursief aan te roepen. Dit verschuift de last van het samenpersen van alles in één enkele prompt naar het structureren van het probleem op een manier dat het model betrouwbaarder door grote contexten kan navigeren.

De auteurs evalueerden hun aanpak voor verschillende taken met een lange horizon, waarbij RLM’s sterke resultaten opleverden op zowel GPT-5- als Qwen-3-modellen.

De resultaten zien er veelbelovend uit. RLM’s (met of zonder subcalls) presteren consequent beter dan andere benaderingen bij alle taken. Laten we eens kijken hoe dit in de praktijk werkt.

RLM gebruiken in de praktijk

Een veelvoorkomend scenario waarin de lange context echt tot zijn recht komt, is het beantwoorden van vragen over een grote kennisbank. Dus dat is precies wat we hier gaan proberen. Ik heb alle artikelen die ik op Towards Data Science heb gepubliceerd, in één Markdown-bestand geladen. Enigszins verrassend komt dat neer op bijna 1,5 MB tekst. Hopelijk is dat genoeg voor het model om door te graven.

Gelukkig heeft DSPy onlangs een implementatie van de inferentiebenadering van Recursive Language Models geïntroduceerd. Dit betekent dat we zelf niets hoeven te bouwen; we hebben alleen DSPy versie 3.1.2 of nieuwer nodig.

pip install dspy --upgrade

Laten we beginnen met het laden van de dataset.

with open('articles.md', 'r') as f:
    articles = f.read()

Laten we vervolgens controleren hoeveel tokens het bevat.

import anthropic

client = anthropic.Anthropic(api_key=config('ANTHROPIC_API_KEY'))

# Count tokens for messages
token_count = client.messages.count_tokens(
  model="claude-sonnet-4-5",
  messages=(
    {"role": "user", "content": articles}
  )
)

print(f"Input tokens: {token_count.input_tokens}")
# Input tokens: 386768

Er zijn bijna 400.000 tokens in onze context. Claude Sonnet 4.5 ondersteunt een contextvenster van 200 KB, dus het verwerken van de volledige dataset in één enkele prompt is niet haalbaar. Dit is precies waar RLM nuttig wordt.

Om DSPy te gebruiken, moeten we eerst het taalmodel configureren. Zoals eerder besproken, vertrouwt RLM op code voor interactie met de prompt, dus werkt het het beste met modellen met sterke codeermogelijkheden. Om deze reden zal ik Claude in dit voorbeeld gebruiken, omdat bekend is dat het goed presteert bij codegerelateerde taken.

lm = dspy.LM('anthropic/claude-sonnet-4-5', api_key=config('ANTHROPIC_API_KEY'))
dspy.configure(lm=lm)

Vervolgens initialiseren we de RLM door de handtekening ervan op te geven. In dit gebruiksscenario geef ik de volledige set artikelen door, samen met een vraag, en verwacht ik dat het model de belangrijkste trends eruit haalt en deze als een lijst met tekenreeksen retourneert.

rlm = dspy.RLM('articles, question -> trends: list(str)')

Hier gebruik ik Claude Sonnet 4.5 voor zowel het hoofdmodel als de recursieve suboproepen. Met DSPy kunt u ook een kleiner model gebruiken voor subgesprekken via de sub_lm parameter, wat kan helpen de kosten te verlagen.

Nu kunnen we de RLM uitvoeren en toegang krijgen tot de trends veld in de uitvoer.

output = rlm(
  articles = articles, 
  question = '''What were the main AI trends of 2025 based on provided 
    articles? Pay attention to the content not only the titles.'''
)

print('n'.join(output.trends))

# Agentic AI and Autonomous Systems: Self-reflection patterns, iterative improvement through feedback loops (Self-Refine, Reflexion, CRITIC), and LLMs as reasoning engines that can autonomously plan and execute tasks
# Multi-Agent AI Systems: Evolution from single agents to collaborative teams of specialized agents with distinct roles, using frameworks like LangGraph, CrewAI, and AutoGen for orchestration
# Context Engineering and RAG: Moving beyond static prompting to dynamic retrieval-augmented generation (RAG), adaptive memory systems, and self-improving contexts that learn from experience without retraining
# Standardization Protocols: Emergence of Model Context Protocol (MCP) for standardizing LLM-tool integrations and Agent Communication Protocol (ACP) for inter-agent communication, reducing integration complexity
# Tool-Using LLMs and Function Calling: LLMs equipped with ability to invoke external tools, execute SQL queries, browse web, and interact with APIs through structured function calling mechanisms
# Production-Ready AI Frameworks: Mature ecosystem including LangGraph, DSPy, LangChain, NeMo Agent Toolkit, CrewAI, and AutoGen, focused on moving from prototype to production with built-in observability
# LLM Evaluation and Observability: LLM-as-judge evaluation patterns, comprehensive metrics frameworks (Ragas, DeepEval, Evidently), trajectory evaluation, and continuous monitoring as essential production infrastructure
# Programming Over Prompting: Shift toward declarative, code-based AI development with frameworks like DSPy and configuration-driven approaches (YAML-based) replacing prompt engineering
# Framework Interoperability: Tools designed to integrate across multiple frameworks rather than create silos, enabling composable AI architectures that leverage best features from different ecosystems
# Local and Cost-Effective LLM Deployment: Running smaller efficient models locally (Llama, Ollama) to reduce API costs and enable experimentation, with focus on cost-quality-latency tradeoffs
# SQL Agents and Data Analysis Automation: LLM agents specialized in data analysis tasks, generating and executing SQL queries, with applications in replacing or augmenting traditional data analyst workflows
# Production Quality and Accuracy Enhancement: Techniques for improving LLM accuracy including chain-of-thought reasoning, structured outputs through function calling, and iterative refinement for business-critical applications

De uitvoering duurde ongeveer drie minuten en het resultaat was een verrassend plausibele samenvatting van de hoofdthema’s in mijn artikelen. We zijn hier echter niet om meer te weten te komen over de trends zelf. De interessantere vraag is hoe RLM dit in de eerste plaats heeft kunnen doen. Dus laten we dieper graven.

RLM onder de motorkap

Het meest fascinerende is natuurlijk het begrijpen van wat er feitelijk onder de motorkap gebeurt.

RLM-implementatie

Zoals eerder besproken is het kernidee achter recursieve taalmodellen dat lange contexten worden behandeld als onderdeel van een externe omgeving, in plaats van rechtstreeks als een enkele prompt in het model te worden ingevoerd. In plaats daarvan schrijft de LLM Python-code om sub-LLM’s programmatisch te inspecteren, te ontleden en recursief aan te roepen over kleinere fragmenten van de gegevens.

Op hoog niveau heeft de implementatie een aantal kernkenmerken:

Het maakt gebruik van een Python REPL (Read-Eval-Print Loop) in een sandbox waarmee de LLM grote contexten kan verkennen door middel van code-uitvoering.
De LLM werkt in een bekende agentische lus: het schrijft Python-code, observeert de uitvoer en besluit vervolgens wat er vervolgens moet gebeuren.
Het kan recursieve subaanroepen uitvoeren (in feite zichzelf aanroepen) met behulp van tools zoals llm_query() En llm_query_batched() om kleinere delen semantisch te analyseren.
Zodra het model tevreden is met het resultaat, rondt het het proces af door aan te roepen SUBMIT() met de uitgang.

Prompts

Om echt te begrijpen hoe dit werkt, vind ik het nuttig om de daadwerkelijke berichten te inspecteren die naar en van de LLM worden verzonden. DSPy maakt dit eenvoudig met de volgende opdracht.

# Inspect the raw LLM calls - shows the actual prompts sent to the model
dspy.inspect_history(n=39)

Dit geeft ons volledig inzicht in wat er bij elke stap met het model is gedeeld.

Laten we beginnen met het systeembericht. Het definieert de invoervariabelen die beschikbaar zijn voor het model (zoals gespecificeerd in de RLM-handtekening) en somt de functies op die het model kan aanroepen, zoals printrecursieve LLM-aanroepen en standaard bibliotheekhulpprogramma’s.

Met name kunt u met de RLM-module ook aangepaste functies beschikbaar stellen aan de Python REPL via de tools parameter bij het initialiseren van de RLM. In dit voorbeeld vertrouwde ik alleen op de standaardfuncties, maar in meer geavanceerde instellingen kan dit een krachtig uitbreidingspunt zijn.

Your input fields are:
1. `variables_info` (str): Metadata about the variables available in the REPL
2. `repl_history` (REPLHistory): Previous REPL code executions and their outputs
3. `iteration` (str): Current iteration number (1-indexed) out of max_iterations
Your output fields are:
1. `reasoning` (str): Think step-by-step: what do you know? What remains? Plan your next action.
2. `code` (str): Python code to execute.
All interactions will be structured in the following way, with the appropriate values filled in.

(( ## variables_info ## ))
{variables_info}

(( ## repl_history ## ))
{repl_history}

(( ## iteration ## ))
{iteration}

(( ## reasoning ## ))
{reasoning}

(( ## code ## ))
{code}

(( ## completed ## ))
In adhering to this structure, your objective is: 
Given the fields `articles`, `question`, produce the fields `trends`.
        
You are tasked with producing the following outputs given the inputs `articles`, `question`:
- {trends}        # note: the value you produce must adhere to the JSON schema: {"type": "array", "items": {"type": "string"}}
        
You have access to a Python REPL environment. Write Python code and it will be executed. You will see the output, then write more code based on what you learned. This is an iterative process.

Available:
- Variables: `articles`, `question` (your input data)
- `llm_query(prompt)` - query a sub-LLM (~500K char capacity) for semantic analysis
- `llm_query_batched(prompts)` - query multiple prompts concurrently (much faster for multiple queries)
- `print()` - ALWAYS print to see results
- `SUBMIT(trends)` - submit final output when done
- Standard libraries: re, json, collections, math, etc.
        
IMPORTANT: This is ITERATIVE. Each code block you write will execute, you'll see the output, then you decide what to do next. Do NOT try to solve everything in one step.
        
1. EXPLORE FIRST - Look at your data before processing it. Print samples, check types/lengths, understand the structure.
2. ITERATE - Write small code snippets, observe outputs, then decide next steps. State persists between iterations.
3. VERIFY BEFORE SUBMITTING - If results seem wrong (zeros, empty, unexpected), reconsider your approach.
4. USE llm_query FOR SEMANTICS - String matching finds WHERE things are; llm_query understands WHAT things mean.
5. MINIMIZE RETYPING (INPUTS & OUTPUTS) - When values are long, precise, or error-prone (IDs, numbers, code, quotes), re-access them via variables and parse/compute in code instead of retyping. Use small, targeted prints to sanity-check, but avoid manual copying when variables can carry the exact value.
6. SUBMIT ONLY AFTER SEEING OUTPUTS - SUBMIT ends the current run immediately. If you need to inspect printed output, run it in one step, review the result, then call SUBMIT in a later step.
        
You have max 50 sub-LLM calls. When done, call SUBMIT() with your output.

Laten we ook eens kijken naar het eerste gebruikersbericht dat naar de LLM is verzonden. Dit geeft het model een overzicht van de invoervariabelen: hun typen, lengtes (zodat het model weet hoeveel het in één keer kan lezen) en een voorbeeld van de inhoud om de structuur te begrijpen.

(( ## variables_info ## ))
(1) «««
    Variable: `articles` (access it in your code)
    Type: str
    Total length: 1,392,921 characters
    Preview:
    ```
    ---
    title: "Agentic AI from First Principles: Reflection"
    date: "2025-10-24T12:00:00-05:00"
    category: "Agentic AI"
    read_time: "21 min read"
    url: "https://towardsdatascience.com/agentic-ai-from-first-principles-reflection/"
    ---
    
    Arthur C. Clarke's Third Law says that "*any sufficiently advanced technology is indistinguishable from magic*". That's exactly how a lot of today's AI frameworks feel. Tools like GitHub Copilot, Claude Desktop, OpenAI Operator, and Perplexity Comet are automating everyda...
    ```
»»»
(2) «««
    Variable: `question` (access it in your code)
    Type: str
    Total length: 114 characters
    Preview:
    ```
    What were the main AI trends of 2025 based on provided articles? Pay attention to the content not only the titles.
    ```
»»»
(3) «««
    Variable: `verbose` (access it in your code)
    Type: bool
    Total length: 4 characters
    Preview:
    ```
    True
    ```
»»»

(( ## repl_history ## ))
You have not interacted with the REPL environment yet.

(( ## iteration ## ))
1/20

Respond with the corresponding output fields, starting with the field `(( ## reasoning ## ))`, then `(( ## code ## ))`, and then ending with the marker for `(( ## completed ## ))`.

Hier is de eerste reactie van het model, in overeenstemming met het formaat dat is opgegeven in het gebruikersbericht.

(( ## reasoning ## ))
I need to analyze articles to identify main AI trends of 2025. Let me start by exploring the data:

1. First, I'll examine the structure and content of the `articles` variable to understand what I'm working with
2. I'll need to look at multiple articles, not just titles, to understand the actual content
3. Then I'll use semantic analysis (llm_query) to extract and identify trends
4. Finally, I'll compile the trends into a list

Let me start by exploring the articles data to see its structure and get a sense of what's available.

(( ## code ## ))
# First, let's understand the structure of the articles
print("Type of articles:", type(articles))
print("nLength of articles:", len(articles))
print("n" + "="*80)
print("First 2000 characters:")
print(articles(:2000))
print("n" + "="*80)

# Let's see if we can identify the structure - looks like articles are separated
# Let's try to split and count them
article_separator = "---ntitle:"
article_parts = articles.split(article_separator)
print(f"nNumber of article parts found: {len(article_parts)}")
print(f"First part length: {len(article_parts(0))} (likely empty or header)")
print(f"Second part length: {len(article_parts(1)) if len(article_parts) > 1 else 0}")

(( ## completed ## ))

De onbewerkte logboeken gaven ons een goed inzicht in hoe de LLM interageert met de omgeving onder de motorkap. Laten we vervolgens eens kijken naar het traject dat de LLM volgde om zijn uiteindelijke output te bereiken.

Traject

Voor een gestructureerde weergave van het traject kunnen wij inspecteren output.trajectory. Dit geeft toegang tot elke stap van de redenering van het model, de code die het genereert en de resulterende output. In wezen biedt het een uitgebreid overzicht van alle REPL-interacties.

# Examine the full trajectory - this shows all REPL interactions
for i, step in enumerate(output.trajectory):
  print(f"n{'='*60}")
  print(f"STEP {i+1}")
  print(f"{'='*60}")
  print(f"n📝 REASONING:n{step('reasoning')}")
  print(f"n💻 CODE:n{step('code')}")
  print(f"n📤 OUTPUT:n{step('output')(:1000)}{'...' if len(step('output')) > 1000 else ''}")

Je kunt het volledige logboek ook vinden op GitHub.

Door het traject te beoordelen, kunnen we de belangrijkste patronen begrijpen die de LLM gebruikt om het probleem op te lossen. In dit geval waren er 13 stappen nodig om tot het uiteindelijke antwoord te komen.

De eerste stap was, zoals verwacht, het verkennen van de articles variabele: de structuur ervan onderzoeken en een idee krijgen van de beschikbare gegevens. Deze eerste verkenning vormt de basis voor alle daaropvolgende redeneringen en suboproepen.

# First, let's understand the structure of the articles
print("Type of articles:", type(articles))
print("nLength of articles:", len(articles))
print("n" + "="*80)
print("First 2000 characters:")
print(articles(:2000))
print("n" + "="*80)

# Let's see if we can identify the structure - looks like articles are separated
# Let's try to split and count them
article_separator = "---ntitle:"
article_parts = articles.split(article_separator)
print(f"nNumber of article parts found: {len(article_parts)}")
print(f"First part length: {len(article_parts(0))} (likely empty or header)")
print(f"Second part length: {len(article_parts(1)) if len(article_parts) > 1 else 0}")

Vervolgens realiseerde de LLM zich dat er in totaal 40 artikelen waren, elk gescheiden door ---ntitle:en dat elk artikel metadatavelden bevatte zoals title, date, category, read_timeEn urlgevolgd door de hoofdinhoud. Alleen al het identificeren van deze structuur uit een tekstbestand van 1,5 MB is al een belangrijke prestatie.

Voortbouwend op dit inzicht analyseerde de LLM vervolgens de artikelen en genereerde een samenvatting van de artikeltitels en hoofdthema’s.

# Parse articles into structured format
import re

# Split by the article separator
raw_articles = articles.split("---ntitle:")

# Process each article (skip the first empty element)
parsed_articles = ()
for i, raw_article in enumerate(raw_articles(1:), 1):  # Skip first empty element
  # Add back the "title:" that was removed by split
  raw_article = "title:" + raw_article
  
  # Extract metadata using regex
  title_match = re.search(r'title:s*"((^")*)"', raw_article)
  date_match = re.search(r'date:s*"((^")*)"', raw_article)
  category_match = re.search(r'category:s*"((^")*)"', raw_article)
  
  # Find where content starts (after the second "---")
  content_start = raw_article.find("---", 10)  # Skip the first part
  content = raw_article(content_start+3:).strip() if content_start != -1 else ""
  
  parsed_articles.append({
    'index': i,
    'title': title_match.group(1) if title_match else "Unknown",
    'date': date_match.group(1) if date_match else "Unknown",
    'category': category_match.group(1) if category_match else "Unknown",
    'content': content,
    'content_length': len(content)
  })

print(f"Successfully parsed {len(parsed_articles)} articlesn")
print("="*80)
print("Sample of articles (titles and categories):")
print("="*80)
for i, article in enumerate(parsed_articles(:10), 1):
  print(f"{i}. ({article('category')}) {article('title')}")
  print(f"   Content length: {article('content_length')} chars")
  print()

Na het analyseren van de artikelen besloot de LLM zich uitsluitend te concentreren op AI-gerelateerde inhoud, omdat de vraag specifiek naar AI-trends ging.

# Let's focus on AI-related articles (AI, Agentic AI, LLM Applications, ChatGPT categories)
# and sample strategically to understand content-based trends

ai_related = (a for a in parsed_articles if a('category') in (
  'Artificial Intelligence', 'Agentic AI', 'LLM Applications', 'ChatGPT', 'Deep Learning'
))

Vervolgens identificeerde de LLM de belangrijkste artikelen waarop zij zich wilde concentreren. Het was een subjectieve selectie op basis van relevantie en inhoud. Om deze in detail te analyseren, werd gebruik gemaakt van recursieve suboproepen, waarbij elk artikel programmatisch werd verwerkt.

Om de werklast efficiënt te beheren, splitste de LLM de artikelen in twee batches en verwerkte ze opeenvolgend. Voor elk artikel werd de inhoud ook ingekort tot de eerste 15.000 tekens. Dankzij deze strategie kon het model een grote hoeveelheid tekst verwerken zonder het contextvenster te overschrijden, terwijl toch de belangrijkste inzichten uit elk artikel werden vastgelegd.

# Create prompts to extract AI trends from article content
# I'll include a substantial portion of each article's content for analysis

prompts_batch1 = ()

for i, article in enumerate(selected_for_analysis(:8)):  # First batch of 8
  # Take first ~15K chars of content to stay within LLM limits while getting substance
  content_sample = article('content')(:15000)
  
  prompt = f"""Analyze this article about AI from 2025 and identify the main AI trends, technologies, methodologies, or concepts it discusses.

Article Title: {article('title')}

Article Content (excerpt):
{content_sample}

Based on the content above, list the key AI trends, technologies, or concepts discussed in this article. Focus on:
- New frameworks, tools, or methodologies mentioned
- Emerging patterns in AI development (e.g., agentic AI, multi-agent systems)
- Important techniques or approaches
- Production/deployment considerations

Provide a concise list of 3-5 main trends/concepts from this article."""

  prompts_batch1.append(prompt)

print(f"Created {len(prompts_batch1)} prompts for first batch")
print("nProcessing first batch of articles...")

# Query the first batch
results_batch1 = llm_query_batched(prompts_batch1)

print(f"nReceived {len(results_batch1)} results")
print("nResults from first batch:")
print("="*80)

for i, (article, result) in enumerate(zip(selected_for_analysis(:8), results_batch1), 1):
  print(f"n{i}. {article('title')}")
  print("-"*80)
  print(result)
  print()

Ten slotte heeft de LLM de volledige lijst met artikelen nogmaals doorgenomen, aanvullende relevante onderwerpen geïdentificeerd en deze op dezelfde manier verwerkt met behulp van recursieve suboproepen. Nadat alle informatie was opgehaald, werden de resultaten samengevoegd in de trends_2025 variabele (een lijst met tekenreeksen die de belangrijkste AI-trends vertegenwoordigen) en heeft deze als eindantwoord ingediend.

##### STEP 12 #####

# Now let me synthesize all findings into comprehensive AI trends
# I'll review all the results and create a structured summary

print("SYNTHESIZING ALL FINDINGS FROM 20 ANALYZED ARTICLES")
print("="*80)
print("nBased on deep content analysis of 20 AI-focused articles from 2025,")
print("here are the main AI trends identified:n")

# Compile the comprehensive list of trends
trends_2025 = (
  "Agentic AI and Autonomous Systems: Self-reflection patterns, iterative improvement through feedback loops (Self-Refine, Reflexion, CRITIC), and LLMs as reasoning engines that can autonomously plan and execute tasks",
  "Multi-Agent AI Systems: Evolution from single agents to collaborative teams of specialized agents with distinct roles, using frameworks like LangGraph, CrewAI, and AutoGen for orchestration",
  "Context Engineering and RAG: Moving beyond static prompting to dynamic retrieval-augmented generation (RAG), adaptive memory systems, and self-improving contexts that learn from experience without retraining",
  "Standardization Protocols: Emergence of Model Context Protocol (MCP) for standardizing LLM-tool integrations and Agent Communication Protocol (ACP) for inter-agent communication, reducing integration complexity",
  "Tool-Using LLMs and Function Calling: LLMs equipped with ability to invoke external tools, execute SQL queries, browse web, and interact with APIs through structured function calling mechanisms",
  "Production-Ready AI Frameworks: Mature ecosystem including LangGraph, DSPy, LangChain, NeMo Agent Toolkit, CrewAI, and AutoGen, focused on moving from prototype to production with built-in observability",
  "LLM Evaluation and Observability: LLM-as-judge evaluation patterns, comprehensive metrics frameworks (Ragas, DeepEval, Evidently), trajectory evaluation, and continuous monitoring as essential production infrastructure",
  "Programming Over Prompting: Shift toward declarative, code-based AI development with frameworks like DSPy and configuration-driven approaches (YAML-based) replacing prompt engineering",
  "Framework Interoperability: Tools designed to integrate across multiple frameworks rather than create silos, enabling composable AI architectures that leverage best features from different ecosystems",
  "Local and Cost-Effective LLM Deployment: Running smaller efficient models locally (Llama, Ollama) to reduce API costs and enable experimentation, with focus on cost-quality-latency tradeoffs",
  "SQL Agents and Data Analysis Automation: LLM agents specialized in data analysis tasks, generating and executing SQL queries, with applications in replacing or augmenting traditional data analyst workflows",
  "Production Quality and Accuracy Enhancement: Techniques for improving LLM accuracy including chain-of-thought reasoning, structured outputs through function calling, and iterative refinement for business-critical applications"
)

print("MAIN AI TRENDS OF 2025:")
print("="*80)
for i, trend in enumerate(trends_2025, 1):
  print(f"n{i}. {trend}")

print("n" + "="*80)
print(f"nTotal trends identified: {len(trends_2025)}")
print("nThese trends are based on content analysis of articles covering:")
print("- Agentic AI systems and design patterns")
print("- Multi-agent architectures and frameworks")
print("- Production deployment and evaluation")
print("- Emerging protocols and standardization")
print("- Tool use and external system integration")
print("- Cost-effective and accurate LLM applications")

##### STEP 13 #####

# Submit the final list of AI trends from 2025
SUBMIT(trends_2025)

We kunnen zien dat het model behoorlijk creatief was: het begreep de structuur van onze inhoud, ontleedde de artikelen en gebruikte recursieve aanroepen om ze allemaal afzonderlijk te verwerken. Er was echter één belangrijke leemte in deze analyse. Het filteren op jaar werd volledig over het hoofd gezien, ook al bevatte de metadata van elk artikel duidelijk de publicatiedatum.

Om dit aan te pakken herhaalde ik het verzoek, waarbij ik deze keer het belang van filteren voor 2025 benadrukte, om ervoor te zorgen dat het model zich alleen op artikelen uit dat jaar concentreerde.

output2 = rlm(
  articles = articles, 
  question = '''What were the main AI trends of 2025 based on provided 
    articles? Pay attention to the content not only the titles. 
    Don't forget to filter by year, this information is present in 
    metadata for each article.'''
)

Deze keer hield het model rekening met het jaar en voerde het een geavanceerdere parsering uit, waarbij gebruik werd gemaakt van reguliere expressies om de publicatiedatum voor elk artikel te extraheren.

import re
from datetime import datetime

# Split articles by the separator "---" at the beginning of metadata blocks
# The pattern is: --- followed by metadata, followed by --- followed by content
article_blocks = re.split(r'n---n', articles)

print(f"Total article blocks found: {len(article_blocks)}")
print("n" + "="*80)

# Let's examine the structure - articles seem to start with "---ntitle:"
# Let me try a different approach - find all articles by their metadata pattern
article_pattern = r'---ntitle: "(.*?)"ndate: "(.*?)"ncategory: "(.*?)"nread_time: "(.*?)"nurl: "(.*?)"n---n(.*?)(?=n---ntitle:|$)'

matches = re.findall(article_pattern, articles, re.DOTALL)
print(f"Articles found with regex: {len(matches)}")

if len(matches) > 0:
  print("nFirst article metadata:")
  print(f"  Title: {matches(0)(0)}")
  print(f"  Date: {matches(0)(1)}")
  print(f"  Category: {matches(0)(2)}")
  print(f"  Content preview (first 200 chars): {matches(0)(5)(:200)}")

Vervolgens filterde het de artikelen op jaar en identificeerde het 13 artikelen uit 2025.

# Parse articles and filter for 2025
articles_2025 = ()

for match in matches:
  title, date_str, category, read_time, url, content = match
  # Extract year from date string (format: "2025-10-24T12:00:00-05:00")
  year = date_str.split('-')(0)
  
  if year == "2025":
    articles_2025.append({
      'title': title,
      'date': date_str,
      'category': category,
      'content': content.strip()
    })

print(f"Total articles: {len(matches)}")
print(f"Articles from 2025: {len(articles_2025)}")
print("n" + "="*80)
print("2025 Articles:")
print("="*80)

for i, article in enumerate(articles_2025, 1):
  print(f"n{i}. {article('title')}")
  print(f"   Date: {article('date')}")
  print(f"   Category: {article('category')}")
  print(f"   Content length: {len(article('content'))} chars")

Vervolgens analyseerde het deze 13 artikelen met behulp van recursieve suboproepen en produceerde een bijgewerkte lijst met trends. Deze keer werd de taak correct uitgevoerd. Net als bij veel andere voorbeelden benadrukt het het belang van het stellen van duidelijke, expliciete vragen en het specificeren van de criteria en acties die het model moet volgen.

Agentic AI and Multi-Agent Systems: Building autonomous AI agents capable of multi-step reasoning, tool use, planning, and reflection, often with multiple specialized agents collaborating on complex tasks
Code Agents: AI agents that execute tool calls using actual code (Python) instead of JSON-based tool calling, enabling dynamic function creation and achieving higher success rates with fewer steps
Model Context Protocol (MCP): Anthropic's standardization protocol for connecting AI applications to external tools and data sources, reducing integration complexity from M*N to M+N and enabling reusable, framework-agnostic components
Agent Communication Protocol (ACP): Emerging open protocol under the Linux Foundation for standardizing communication between AI agents via RESTful APIs, enabling interoperability across different frameworks
Reflection and Self-Refinement Patterns: LLMs reviewing and improving their own outputs through iterative feedback loops, including self-feedback, verbal reinforcement learning, and tool-interactive critiquing, achieving 10-30% accuracy improvements
Framework Ecosystem Proliferation: Multiple competing frameworks including LangGraph, smolagents, CrewAI, DSPy, and NeMo Agent Toolkit, with growing emphasis on interoperability and declarative configuration approaches
Production-Ready LLM Infrastructure: Moving beyond prototypes to address 'day 2' problems like API exposure, observability, monitoring, evaluation frameworks, and deployment at scale
Parameter-Efficient Fine-Tuning (PEFT) and LoRA: Techniques for customizing LLMs by updating only small subsets of parameters, enabling task-specific optimization while reducing computational costs and enabling on-premises deployment
Advanced Fine-Tuning with Memory Experts: Lamini's Mixture of Memory Experts (MoME) using ~1 million LoRA adapters for near-perfect factual accuracy (95%) with zero loss on specific information
Shift from Prompting to Programming Paradigm: Frameworks like DSPy treating LLM tasks as modular programming rather than manual prompt crafting, with structured signatures and reusable components
LLM Evaluation and Quality Assurance: Comprehensive evaluation frameworks (Evidently, DeepEval, MLFlow, LangSmith) for testing, monitoring, and ensuring reliability in production, especially for regulated industries
RAG (Retrieval-Augmented Generation): Providing relevant context and knowledge bases to enhance LLM precision and enable specialized capabilities in agents
Tool Use and Orchestration: AI systems dynamically selecting and executing tools from multiple sources, with standardized tool definitions and safety controls
YAML-Based Declarative Configuration: Defining LLM workflows, models, and agent behavior through configuration files rather than purely code-based implementations
Interpretable AI and Rule-Based Systems: Using decision trees, extracted rules, and transparent models as alternatives to black-box approaches for compliance, explainability, and rapid deployment in regulated industries

Samenvatting

Het is tijd om de zaken af te ronden en na te denken over wat we hebben geleerd. In dit artikel hebben we RLM (Recursive Language Models) onderzocht: een nieuwe inferentiestrategie waarmee LLM’s contexten kunnen verwerken die tot twee ordes van grootte groter zijn dan hun standaard contextvenster, terwijl het probleem van contextrot wordt verzacht.

Ik ben oprecht gefascineerd door deze aanpak. Het artikel stelt een eenvoudige maar elegante methode voor om aanwijzingen als variabelen in een Python-omgeving te behandelen, wat me doet denken aan mijn favoriete agentische raamwerk, smolagents van HuggingFace. Ik geloof dat deze methode vooral efficiënt is omdat LLM’s in zoveel code zijn getraind dat programmeren voor hen als een moedertaal aanvoelt. Het gebruik van code als interface voor redenering en recursie is zowel praktisch als krachtig.

Over het geheel genomen bieden recursieve taalmodellen een praktische en elegante manier om de grenzen van de contextlengte te verleggen, waardoor LLM’s beter in staat zijn om complexe, grootschalige taken uit te voeren. Duidelijke instructies en doordachte begeleiding zijn echter nog steeds de sleutel tot het verkrijgen van de beste resultaten.

Bedankt voor het lezen. Ik hoop dat dit artikel inzichtelijk was. Denk aan het advies van Einstein: “Het belangrijkste is om niet te stoppen met vragen stellen. Nieuwsgierigheid heeft zijn eigen reden van bestaan.” Moge uw nieuwsgierigheid u naar uw volgende grote inzicht leiden.

Referentie

Dit artikel is gebaseerd op het artikel van Zhang et al., “Recursive Language Models”, gepubliceerd op 31 december 2025.

Nieuwsbron

Verder gaan dan het contextvenster: recursieve taalmodellen in actie

Recursieve taalmodellen

RLM gebruiken in de praktijk

RLM onder de motorkap

RLM-implementatie

Prompts

Traject

Samenvatting

Referentie

LAAT EEN REACTIE ACHTER Annuleer reactie

EDITOR PICKS

Verontrustende thriller uit de jaren 70 met een R-rating ziet het kwaad voordat het...

Na een jaar van beledigingen, invallen, arrestaties en ballingschap, een viering van de Californische...

Zijn de eindtijden nabij? Frank Turek’s waarschuwing en openbaring over waarom de wederkomst ‘versluierd’...

Dingen waar ik nooit zonder reis, van een vrouw die alle 50 staten heeft...