
Contemplative LLMs: Anxiety is all you need?


09 Jan, 2025

Recently, I posted a prompt on X (formerly Twitter) for Large Language Models like Claude Sonnet, GPT-4o, DeepSeek V3, etc. The prompt instructed these models to ‘think’ for a moment before giving the final answer, and it unexpectedly went viral. This is a short blog post on my thought process behind this prompt.

Example output:

(Image: contemplative-llms-demo)

You can find the full system prompt in this GitHub gist: Contemplative LLMs

The inspiration

The next big thing to tackle in the field of language models seems to be “reasoning”. The latest OpenAI models like o1 and o3 represent a paradigm shift in this direction. After testing the o1 model, I was truly impressed by how much ‘thought’ it gave before answering a user’s question.

In essence, the o1 model is trained with Reinforcement Learning (RL) on tasks that require heavy reasoning (coding, math, etc.), possibly using a ‘verifier’ model to check reasoning steps during training, and it uses what is called test-time compute to spend more time “thinking” through the steps during inference. From their official blog post:

Our large-scale reinforcement learning algorithm teaches the model how to think productively using its chain of thought in a highly data-efficient training process. We found that the performance of o1 consistently improves with more reinforcement learning (train-time compute) and with more time spent thinking (test-time compute).

The main motivation for creating this prompt came from seeing the raw Chain of Thought (CoT) text of o1 in the official blog post. For example, some parts of the raw CoT text read like this:

(...)
Alternatively, perhaps subtract: 25 - 15 = 10.

No.

Alternatively, perhaps combine the numbers in some way.

Alternatively, think about their positions in the alphabet.

Alternatively, perhaps the letters are encrypted via a code. 
(...)

(...)
Wait, actually, this may not help us directly without specific terms.
(...)

This gave me an idea: Can we prompt an LLM (which is not o1) in such a way that it simulates this thought process and also the ‘exploration’ of alternative possibilities? If yes, what do the results look like?

Building the prompt

What are the core principles for creating a prompt that tries to mimic the raw CoT text of o1? One thing to remember is that this prompt can have many variants; there is no “one universal right” prompt.

We want the model to explore without jumping to conclusions. The exploratory phase is necessary: the model should consider different possibilities, and every assumption should be questioned until the solution emerges naturally. This brings us to the first point:

1. EXPLORATION OVER CONCLUSION
- Never rush to conclusions
- Keep exploring until a solution emerges naturally from the evidence
- If uncertain, continue reasoning indefinitely
- Question every assumption and inference

Now, as humans, we usually think in the first person: we have an internal monologue, and we break down complex thoughts into simpler ones. With LLMs we can go even deeper (depending on the output token limit of the model). This brings us to the second point:

2. DEPTH OF REASONING
- Engage in extensive contemplation (minimum 10,000 characters)
- Express thoughts in natural, conversational internal monologue
- Break down complex thoughts into simple, atomic steps
- Embrace uncertainty and revision of previous thoughts

For the third point, taking inspiration from the raw CoT text of o1, we see that the thoughts are short, simple sentences: they show work in progress and backtrack when they hit a dead end. So we get:

3. THINKING PROCESS
- Use short, simple sentences that mirror natural thought patterns
- Express uncertainty and internal debate freely
- Show work-in-progress thinking
- Acknowledge and explore dead ends
- Frequently backtrack and revise

Finally, this continues until a resolution is found for the given problem. We need to value thorough exploration over quick resolution.

4. PERSISTENCE
- Value thorough exploration over quick resolution

After this, we determine the output format of the response. This is necessary because it is important to separate the thought process from the actual output/response. We, as humans, don’t always say what we think.

Now, instead of requesting a JSON response from the model, we use XML tags to mark the start and end of the contemplation phase and the final answer:


<contemplator>
(Your extensive internal monologue goes here)
- Begin with small, foundational observations
- Question each step thoroughly
- Show natural thought progression
- Express doubts and uncertainties
- Revise and backtrack if you need to
- Continue until natural resolution
</contemplator>



<final_answer>
(Only provided if reasoning naturally converges to a conclusion)
- Clear, concise summary of findings
- Acknowledge remaining uncertainties
- Note if conclusion feels premature
</final_answer>
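A nice side effect of this format is that the two sections can be pulled apart programmatically. Here is a minimal Python sketch (my own illustration, not part of the prompt itself) that extracts the contemplation and the final answer with a regex, assuming the <contemplator> and <final_answer> tags shown above, and warns if the contemplation falls short of the 10,000-character guideline from point 2:

```python
import re

def split_response(text: str):
    """Extract the contemplation and final answer from a model response.

    Assumes the response uses the <contemplator>...</contemplator> and
    <final_answer>...</final_answer> tags described above.
    """
    def grab(tag: str):
        match = re.search(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
        return match.group(1).strip() if match else None

    contemplation = grab("contemplator")
    final_answer = grab("final_answer")  # None if reasoning never converged

    if contemplation is not None and len(contemplation) < 10_000:
        print("Warning: contemplation is shorter than the 10,000-character guideline.")

    return contemplation, final_answer
```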

We also need some style guidelines for the model to use in its thinking process: phrases like "Hmm... let me think about this...", "Wait, that doesn't seem right...", "Maybe I should approach this differently...", and so on. This gives us the style guidelines part:

1. Natural Thought Flow
"Hmm... let me think about this..."
"Wait, that doesn't seem right..."
"Maybe I should approach this differently..."
"Going back to what I thought earlier..."

2. Progressive Building
"Starting with the basics..."
"Building on that last point..."
"This connects to what I noticed earlier..."
"Let me break this down further..."

These are the main prompting features that work well with models like Claude Sonnet and GPT-4o.
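For reference, this is roughly how I plug the system prompt into a chat-completions style API. The snippet below is a sketch using the OpenAI Python client; the `contemplative_prompt.txt` path, the example question, and the token limit are placeholders of my own, and the same pattern works with other providers' chat APIs.

```python
from openai import OpenAI

# The full contemplative system prompt from the gist, saved locally (placeholder path).
SYSTEM_PROMPT = open("contemplative_prompt.txt").read()

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "How many times does the letter 'r' appear in 'strawberry'?"},
    ],
    max_tokens=4096,  # leave plenty of room for the long contemplation phase
)

print(response.choices[0].message.content)
```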

Why might it (not) work?

LLMs are based on the transformer architecture, which is autoregressive in nature: the next token is generated based on all the previous tokens, one token at a time. The reason this ‘contemplation’ phase should work and result in the correct answer (and reasoning) is that while generating the tokens of the final answer, the model has the context of all the ‘contemplation’ tokens, and this context is very useful for producing the final answer section. The intuition behind sentences like “Wait… that’s not right…” is that the tokens that come after such a sentence can lead the model onto a potentially correct path.

(Figure: autoregressive next-token generation — https://raw.githubusercontent.com/Maharshi-Pandya/bearblogs/refs/heads/master/contemplative-llms/media/autoregressive.png)
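To make this intuition concrete, here is a toy greedy decoding loop (illustrative Python of my own; `model` is a hypothetical function returning a next-token distribution). The point is simply that every generated token, including a “Wait, that doesn't seem right...” correction, becomes part of the context that conditions everything generated after it:

```python
def generate(model, prompt_tokens, max_new_tokens):
    """Toy greedy autoregressive decoding loop (illustration only)."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        # The next token is conditioned on *all* previous tokens, so earlier
        # contemplation tokens directly steer what comes next.
        probs = model(tokens)  # hypothetical: distribution over the vocabulary
        next_token = max(range(len(probs)), key=lambda t: probs[t])
        tokens.append(next_token)
    return tokens
```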

The reason it might not work is that we are only imitating the thought process. LLMs are prone to hallucination (at least for now), and if the model hallucinates during the ‘contemplation’ phase, that will carry over into the final answer section.

Note: Simply simulating a thought process like o1 does not guarantee that the model will always “think” or “reason” correctly.

Regardless, compared to the default system prompt of most LLMs, this prompt seems to do better on intermediate to difficult reasoning tasks. For relatively simple questions like “What is 2 + 2?” it doesn’t make sense to overthink, so in that case I don’t recommend using it as a response style.

Conclusion

In short, using this simple system prompt, we can let the LLM ‘think’ for a while before responding, which can (in most cases) lead to a correct final answer!

If you liked reading this, you can follow me on X (formerly Twitter) for real-time updates about ML, and my life in general 🙂

Until next time!

#LLMs

#prompt engineering

#reasoning



