DeepSeek / Deepseek V3 (FP8)
Released: 12/26/2024

Deepseek V3 is an LLM. It excels in handling very large context windows and advanced reasoning tasks due to its Mixture-of-Experts (MoE) architecture, which activates 37 billion parameters per token from a total of 671 billion. This design allows for more efficient scaling, high training efficiency, and significant reductions in computational and memory cost. Its FP8 quantization further optimizes inference speed and resource usage.
Some other noteworthy features of Deepseek V3 include support for multi-token prediction, which accelerates inference, and an architecture designed to efficiently leverage modern GPU hardware for large-scale workloads.
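The MoE figures above imply that only a small fraction of the model's weights participate in each forward pass. A minimal back-of-envelope sketch, using only the parameter counts stated on this page:

```python
# Fraction of DeepSeek V3's parameters that are active per token
# under its Mixture-of-Experts routing (numbers from the model card).
TOTAL_PARAMS = 671e9   # total parameters
ACTIVE_PARAMS = 37e9   # parameters activated per token

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"Active per token: {active_fraction:.1%}")  # → Active per token: 5.5%
```

Roughly 5.5% of the weights are used per token, which is the source of the scaling and compute-cost advantages described above.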
| Metric | Value |
|---|---|
| Parameter Count | 671 billion |
| Mixture of Experts | Yes |
| Active Parameter Count | 37 billion |
| Context Length | Unknown |
| Multilingual | Unknown |
| Quantized* | Yes |
| Precision* | FP8 |
*Quantization is specific to the inference provider and the model may be offered with different quantization levels by other providers.
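To see why the FP8 precision matters, here is a rough weight-memory estimate at different precisions. This is a hedged sketch that counts weights only, ignoring activations, KV cache, and runtime overhead:

```python
# Approximate weight memory for a 671B-parameter model at
# common precisions (weights only; excludes activations and KV cache).
PARAMS = 671e9
BYTES_PER_PARAM = {"FP8": 1, "FP16/BF16": 2, "FP32": 4}

for name, nbytes in BYTES_PER_PARAM.items():
    gb = PARAMS * nbytes / 1e9
    print(f"{name}: ~{gb:,.0f} GB of weights")
```

At FP8 the weights alone occupy roughly 671 GB, half the footprint of FP16/BF16, which is why quantization level has a direct effect on serving cost.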
DeepSeek models available on Oxen.ai
| Model | Inference provider | Input modality | Output modality | Input price (1M tokens) | Output price (1M tokens) |
|---|---|---|---|---|---|
| Unknown | Unknown | text | text | $3.00 | $8.00 |
| Unknown | Unknown | text | text | $0.55 | $2.19 |
| Unknown | Unknown | text | text | $7.00 | $7.00 |
| Unknown | Unknown | text | text | $0.59 | $0.79 |
| Unknown | Unknown | text | text | $0.27 | $1.10 |
| Unknown | Unknown | text | text | $0.75 | $3.00 |
| Unknown | Unknown | text | text | $1.25 | $1.25 |
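Prices in the table are quoted per million tokens, billed separately for input and output. A small sketch of the arithmetic, using one of the listed price pairs ($0.27 input / $1.10 output) and hypothetical request sizes:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Cost in dollars, given per-1M-token prices as in the table above."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# Hypothetical request: 10k input tokens, 2k output tokens.
cost = request_cost(10_000, 2_000, input_price=0.27, output_price=1.10)
print(f"${cost:.4f}")  # → $0.0049
```

Output tokens typically carry the higher rate, so long generations dominate the bill even when the prompt is large.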