
Hse
Overview
- Founded Date: July 29, 1980
- Posted Jobs: 0
Company Description
DeepSeek’s First-generation Reasoning Models
DeepSeek’s first-generation reasoning models achieve performance comparable to OpenAI-o1 across math, code, and reasoning tasks.
Models
DeepSeek-R1
Distilled models
The DeepSeek team has demonstrated that the reasoning patterns of larger models can be distilled into smaller models, resulting in better performance than the reasoning patterns discovered through RL on small models.
Below are the models created by fine-tuning several dense models widely used in the research community on reasoning data generated by DeepSeek-R1. The evaluation results show that the distilled smaller dense models perform exceptionally well on benchmarks.
DeepSeek-R1-Distill-Qwen-1.5B
DeepSeek-R1-Distill-Qwen-7B
DeepSeek-R1-Distill-Llama-8B
DeepSeek-R1-Distill-Qwen-14B
DeepSeek-R1-Distill-Qwen-32B
DeepSeek-R1-Distill-Llama-70B
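The distillation described above, fine-tuning a smaller dense model on reasoning data generated by a larger one, can be pictured as a data-preparation step followed by ordinary supervised fine-tuning. The sketch below covers only the data-preparation half and is an illustration, not DeepSeek's actual pipeline. The names TeacherTrace, SFTExample, build_distillation_set, and toy_teacher are hypothetical, the toy teacher merely adds two integers, and the target layout is an arbitrary choice.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TeacherTrace:
    """One teacher output: the reasoning text and the final answer."""
    reasoning: str
    answer: str


@dataclass
class SFTExample:
    """One supervised fine-tuning pair for the student model."""
    prompt: str
    target: str


def build_distillation_set(
    prompts: List[str],
    teacher: Callable[[str], TeacherTrace],
) -> List[SFTExample]:
    """Query the teacher on each prompt and keep its reasoning as the target."""
    examples = []
    for prompt in prompts:
        trace = teacher(prompt)
        # The student learns to reproduce the reasoning and the answer,
        # not only the answer. This layout is an assumption for illustration.
        target = f"{trace.reasoning}\nAnswer: {trace.answer}"
        examples.append(SFTExample(prompt=prompt, target=target))
    return examples


def toy_teacher(prompt: str) -> TeacherTrace:
    """Hypothetical stand-in for a large reasoning model (adds two integers)."""
    a, b = (int(tok) for tok in prompt.split("+"))
    return TeacherTrace(
        reasoning=f"Add {a} and {b} to get {a + b}.",
        answer=str(a + b),
    )


dataset = build_distillation_set(["2+3", "10+7"], toy_teacher)
for ex in dataset:
    print(ex.prompt, "->", ex.target.replace("\n", " | "))
```

In a real setting the teacher callable would wrap a large reasoning model, and the resulting prompt and target pairs would be passed to whatever fine-tuning framework trains the smaller dense model.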
License
The model weights are licensed under the MIT License. The DeepSeek-R1 series supports commercial use and allows any modifications and derivative works, including, but not limited to, distillation for training other LLMs.