
Some Thoughts on Reasoning Models

At the beginning of the year, DeepSeek launched the R1 model, and OpenAI subsequently released o3-mini. After briefly reviewing the DeepSeek-R1 paper, I have some thoughts and questions about reasoning models.

Translation Notice

This content is automatically translated from Chinese by AI. While we strive for accuracy, some nuances may be lost in translation.

First, a disclaimer: I don’t have a deep understanding of the principles behind LLMs, so if there are any misunderstandings, please feel free to correct me.

The DeepSeek-R1 paper shows that the model uses DeepSeek-V3 as its base model, which is then fine-tuned with RL to produce a reasoning model. The distilled variants are built on top of other open-source models, using R1's outputs as training data.
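To make the distillation step concrete, here is a minimal sketch of how (prompt, R1 output) pairs might be written out as supervised fine-tuning data for a smaller model. The file name, field names, and example content are all hypothetical; the actual data format DeepSeek used is not reproduced here.

```python
import json

# Hypothetical (prompt, R1 output) pairs; in practice these would be
# collected by querying R1 on a large set of prompts.
samples = [
    {
        "prompt": "What is 17 * 24?",
        "r1_output": "<think>17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408</think> The answer is 408.",
    },
]

# Write the pairs as JSONL so they can be used to fine-tune a smaller
# open-source base model (the "distilled" models in the paper).
with open("distill_sft.jsonl", "w", encoding="utf-8") as f:
    for s in samples:
        record = {"instruction": s["prompt"], "response": s["r1_output"]}
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```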

This raises some questions for me:

1. Are these so-called reasoning models truly ‘reasoning’?
2. Is it possible to enable reasoning capabilities in the base model through prompt engineering, for example with CoT + few-shot methods? (See the sketch below.)
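On the second question, here is a rough sketch of what a CoT + few-shot prompt for a base model might look like. The worked examples and the final question are invented for illustration; whether such a prompt actually elicits R1-style reasoning is exactly what the question asks.

```python
# Two hand-written worked examples that demonstrate step-by-step reasoning
# (the "few-shot" part), followed by the new question with a CoT cue.
few_shot_examples = [
    (
        "Q: A train travels 60 km in 45 minutes. What is its speed in km/h?",
        "A: 45 minutes is 0.75 hours. Speed = 60 / 0.75 = 80 km/h. The answer is 80.",
    ),
    (
        "Q: If 3 pencils cost 45 cents, how much do 7 pencils cost?",
        "A: One pencil costs 45 / 3 = 15 cents. 7 pencils cost 7 * 15 = 105 cents. The answer is 105.",
    ),
]

question = "Q: A shop sells apples at 4 for $3. How much do 10 apples cost?"

# Assemble the prompt: worked examples first, then the new question with a
# cue that invites step-by-step reasoning before the final answer.
prompt = "\n\n".join(f"{q}\n{a}" for q, a in few_shot_examples)
prompt += f"\n\n{question}\nA: Let's think step by step."

print(prompt)  # this string would then be sent to the base model's completion endpoint
```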