Our Large Language Model as a Service (LLMaaS) offering gives you access to cutting-edge language models served from SecNumCloud-qualified infrastructure that is HDS-certified for healthcare data hosting. Inference runs in France, which keeps the service sovereign. Benefit from high performance and optimal security for your AI applications. Your data remains strictly confidential and is neither exploited nor stored after processing.

Simple, transparent pricing
1.8 € per million input tokens
8 € per million output tokens
8 € per million reasoning tokens
4 € per million reranking tokens
0.9 € per million batch input tokens
4 € per million batch output tokens
0.01 € per minute of transcribed audio *
Computed on infrastructure based in France, SecNumCloud-qualified and HDS-certified.
Note on the "Reasoning" price: This price applies specifically to models classified as "reasoners" or "hybrids" (models with the "Reasoning" capability activated) when reasoning is active and only on tokens linked to this activity.
* any minute started is counted
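
As a rough illustration of how the price grid combines, the sketch below estimates the cost of a workload. The function name `estimate_cost`, its parameters and the dictionary keys are illustrative and not part of the offering. The rates are copied from the grid, audio is rounded up to whole minutes as the footnote describes, and reasoning tokens are assumed to be counted separately from output tokens by the caller.

```python
import math

# Prices in euros per million tokens, copied from the price grid.
PRICE_PER_MILLION = {
    "input": 1.8,
    "output": 8.0,
    "reasoning": 8.0,
    "reranking": 4.0,
    "batch_input": 0.9,
    "batch_output": 4.0,
}
AUDIO_PRICE_PER_MINUTE = 0.01  # euros; any minute started is counted


def estimate_cost(tokens=None, audio_seconds=0.0):
    """Return an estimated cost in euros.

    `tokens` maps a key of PRICE_PER_MILLION to a token count.
    `audio_seconds` is the duration of transcribed audio.
    """
    tokens = tokens or {}
    total = 0.0
    for kind, count in tokens.items():
        if kind not in PRICE_PER_MILLION:
            raise KeyError(f"unknown token kind: {kind}")
        total += count / 1_000_000 * PRICE_PER_MILLION[kind]
    # Round the audio duration up to whole minutes before pricing it.
    billed_minutes = math.ceil(audio_seconds / 60)
    total += billed_minutes * AUDIO_PRICE_PER_MINUTE
    return round(total, 4)


if __name__ == "__main__":
    # 2M input tokens, 500k output tokens and 61 seconds of audio.
    print(estimate_cost({"input": 2_000_000, "output": 500_000}, audio_seconds=61))
```

In this example, the 61 seconds of audio are billed as two minutes.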

Chat & Reasoning

Our large models offer state-of-the-art performance for the most demanding tasks. They are particularly well-suited to applications requiring a deep understanding of language, complex reasoning or the processing of long documents.

80 tokens/second

qwen3.6:27b

Generalist reference model with a native context of 1M tokens. Excels at reasoning, following instructions and multilingualism.
Significant improvements in following instructions, reasoning, reading comprehension, mathematics, coding and tool use. Its context of 1M tokens enables the analysis of entire documents without truncation.
94 tokens/second

gpt-oss:120b

OpenAI's state-of-the-art open-weight model with configurable reasoning and transparent chain of thought.
Mixture-of-Experts model with 120 billion parameters offering configurable reasoning and full access to the chain of thought. Ideal for scenarios requiring a permissive licence (Apache 2.0).
10 tokens/second

llama3.3:70b

Multilingual Meta model, excellent in natural dialogue and nuanced understanding in 8 languages.
Supports English, French, German, Spanish, Italian, Portuguese, Hindi and Thai. Its 132K-token window enables analysis of complex documents and long conversations.
72 tokens/second

nemotron-3-super:120b

NVIDIA model optimised for collaborative agents, long reasoning and high-volume workloads. 1M tokens context.
Ideal for agentic workflows, long-context reasoning, high-volume automation (support tickets, mass analyses), the use of tools and RAG.
56 tokens/second

qwen3-2507:235b

The most powerful model in the catalogue (235B parameters, 22B active). Excels in maths, coding and logical reasoning.
Ultra-sparse Mixture-of-Experts architecture combining the power of a very large model with the efficiency of a smaller model.
100 tokens/second

mistral-small4:119b

High-performance Mistral model (119B) with vision, integrated security and context of 262K tokens. Fast (100 t/s).
Large version of the Mistral Small family. Combines power, speed and reliability with an extended context. Native security filters.
55 tokens/second

qwen3-2507-think:4b

Compact model optimised for deep reasoning (logic, maths, science, code). Context of 250K tokens.
"Thinking" version with enhanced reasoning capability. Combines compactness, speed and advanced reasoning.

Programming & Agents

Our programming and agent models are specially optimised for agentic software engineering, large-scale code generation and development workflow automation.

121 tokens/second

qwen3.6:35b

Leader in agentic software engineering (SWE-bench 73.4%). Context of 1M tokens, integrated vision and tool calling.
Can take in entire code repositories thanks to its 1M-token context. Supports multi-step reasoning and vision (screenshots, diagrams). Optimised for IDEs and CI/CD pipelines.
97 tokens/second

qwen-coder-next:80b

State-of-the-art model for complex code and reasoning. Context of 250K tokens.
Excels at large-scale code generation and analysis. Designed for advanced software engineering tasks.
67 tokens/second

qwen3-next:80b

Versatile 80B model optimised for large contexts, function calling and structured reasoning.
Context of 250K tokens with support for function calling and guided decoding.
33 tokens/second

devstral-small-2:24b

State-of-the-art agentic model for software engineering. Code performance close to that of models with more than 100B parameters. Integrated vision.
Optimised for codebase exploration, multi-file editing and the use of tools. Native vision support. Context of 200K tokens.
40 tokens/second

functiongemma:270m

Micro-model specialising in function call detection. Ideal as a router in an agentic architecture.
Ultra-compact, optimised for identifying and formatting function calls quickly.

Vision & Multimodal

Our Vision & Multimodal models can analyse images, videos and visual documents. They excel in OCR, object detection, structured extraction and spatio-temporal reasoning.

24 tokens/second

qwen3-vl:235b

The most powerful multimodal model in the catalogue. Advanced visual understanding and exceptional reasoning.
Excels in complex document analysis, multilingual OCR, 3D spatial reasoning and video understanding.
39 tokens/second

qwen3-vl:30b

High-performance multimodal model for OCR, object detection, video analysis and spatio-temporal reasoning.
Incorporates innovations in image and video analysis. Excels in complex OCR, graphics and structured extraction (JSON).
57 tokens/second

qwen3-vl:4b

Compact, fast vision model for document analysis and video comprehension.
Excellent compromise between performance and footprint. Supports structured extraction and visual reasoning.
59 tokens/second

gemma4:31b

Google's dense multimodal model, ranked 3rd in the world on Arena AI. Advanced vision, reasoning and coding. Context 250K tokens.
Google's most powerful open-source model. Native function calling, advanced visual understanding (OCR, graphics, documents, UI). Multilingual (35+ languages).
42 tokens/second

gemma4:12b-it-qat

An intermediate multimodal model from Google, with integrated vision, reasoning and native function calling. Very large context of 250K tokens.
The 12B variant of the Gemma 4 family, offering a good balance between multimodal capabilities and footprint. Advanced reasoning, visual understanding (OCR, graphics, documents, UI) and multilingual support (35+ languages).

Embedding

Our embedding models transform text into vector representations for semantic search, clustering and RAG (Retrieval-Augmented Generation) pipelines.

171 tokens/second

bge-m3:567m

State-of-the-art multilingual embedding (100+ languages). Supports dense, sparse and multi-vector searches.
Context of 8192 tokens with three complementary search methods.

qwen3-embedding:4b

High-performance embedding with deep semantic understanding and extended context (40K tokens).
Ideal for processing large documents in RAG pipelines.

qwen3-embedding:8b

High-capacity embedding with the best semantic understanding of the Qwen3 family. Extended context (40K tokens).
The most powerful version of the Qwen3 embedding family. Ideal for tasks requiring contextual understanding.

qwen3-embedding:0.6b

Ultra-light and fast embedding for low-latency semantic search.
Excellent compromise between semantic performance and speed of execution.
196.3 tokens/second

granite-embedding:278m

Ultra-compact IBM embedding for semantic search with minimal latency.
The fastest embedding model in the catalogue. Ideal for clustering and high-frequency searching.
175 tokens/second

embeddinggemma:300m

Multilingual Google embedding (100+ languages), optimised for search and semantic retrieval.
Produces vector representations of text for classification, clustering and similarity search.
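
To make the semantic search use case concrete, here is a minimal sketch of ranking texts by cosine similarity between vectors. The three-dimensional vectors are toy values standing in for real embeddings, and the helper names `cosine_similarity` and `search` are illustrative. Nothing here calls the service.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("zero vector has no direction")
    return dot / (norm_a * norm_b)


def search(query_vector, index, top_k=2):
    """Return the top_k (text, score) pairs, most similar first."""
    scored = [(text, cosine_similarity(query_vector, vec)) for text, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


if __name__ == "__main__":
    # Toy 3-dimensional vectors standing in for real embeddings.
    index = {
        "reset a password": [0.9, 0.1, 0.0],
        "rotate an API key": [0.7, 0.6, 0.1],
        "order a bare-metal server": [0.0, 0.2, 0.9],
    }
    for text, score in search([1.0, 0.2, 0.0], index):
        print(f"{score:.3f}  {text}")
```

In a real pipeline the vectors would come from one of the embedding models above and would usually be stored in a vector database rather than a dictionary.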

Reranking

Our reranking models reorder search results by relevance to refine the quality of RAG pipelines. Compatible with the Cohere API.

nvidia/llama-nemotron-rerank-vl-1b-v2

Cohere API-compatible reranking model (/v1/rerank and /v2/rerank). Orders documents by relevance to a query.
Cohere v1/v2 SDK compatible. The relevance score is a raw logit (relative order is guaranteed). Ideal as a complement to the RAG stack (embedding + retrieval + rerank).

qwen3-reranker:4b

Powerful reranking model with a high level of contextual understanding.
Excellent reranking quality thanks to its 4B parameters. Ideal for demanding RAG pipelines.

qwen3-reranker:0.6b

Compact and efficient reranking model for fast reordering.
Lightweight version for use cases requiring low reranking latency.

bge-reranker-large

High-performance multilingual reranking model from the BGE family.
Complementary to the BGE-M3 embedding model for complete RAG pipelines.
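
The sketch below illustrates the point made above about raw logit scores: only the relative order is meaningful, so results are sorted by score rather than compared to a fixed threshold. It assumes the request and response follow the shape of Cohere's rerank API (`model`, `query`, `documents`, `top_n` in the body; `results` with `index` and `relevance_score` in the response). The sample response is invented for illustration, and the helper names are hypothetical.

```python
import json
import math


def build_rerank_body(model, query, documents, top_n=None):
    """Build a JSON body in the shape of Cohere's rerank request."""
    body = {"model": model, "query": query, "documents": documents}
    if top_n is not None:
        body["top_n"] = top_n
    return json.dumps(body)


def order_results(response, documents):
    """Return (document, logit, squashed score) tuples sorted by relevance.

    The sigmoid is monotonic, so it keeps the order given by the raw logits
    while mapping them into (0, 1) for display.
    """
    ranked = sorted(response["results"], key=lambda r: r["relevance_score"], reverse=True)
    return [
        (documents[r["index"]], r["relevance_score"], 1 / (1 + math.exp(-r["relevance_score"])))
        for r in ranked
    ]


if __name__ == "__main__":
    docs = ["Invoice template", "GPU pricing", "Holiday policy"]
    print(build_rerank_body("qwen3-reranker:0.6b", "How much does a GPU cost?", docs, top_n=2))
    # Invented response, used only to show the parsing step.
    sample = {"results": [{"index": 0, "relevance_score": -3.2},
                          {"index": 1, "relevance_score": 4.1},
                          {"index": 2, "relevance_score": -5.0}]}
    for doc, logit, score in order_results(sample, docs):
        print(f"{doc}: logit={logit:+.1f} sigmoid={score:.3f}")
```

Negative logits are normal here; a document with a score of -3.2 still outranks one at -5.0.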

Security

Our security models specialise in detecting problematic content, preventing jailbreaks and ensuring regulatory compliance (GDPR, HDS). They can be used as pre-filters or post-filters in your workflows.

45 tokens/second

granite3-guardian:8b

Granite Guardian 4.1 (upgrade from v3): detection of problematic content and jailbreaks, with BYOC and hybrid thinking.
Version 4.1 (April 2026). Designed to filter sensitive content and ensure GDPR/HDS compliance. Can be used as a pre-filter or post-filter in your workflows. Hybrid thinking (reasoning) enabled.
60 tokens/second

granite3-guardian:2b

Compact Granite Guardian 4.1 (upgrade from v3:2b), with the same hybrid thinking as the 8B version.
Same filtering capabilities as the 8B version, but with a smaller footprint. Ideal for high-frequency workflows. Hybrid thinking (reasoning) enabled.
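
One way to picture the pre-filter and post-filter roles mentioned above is the wrapper below. It is a sketch under stated assumptions: `is_flagged` stands in for a call to a guardian model and `generate` for a call to a chat model. Both are hypothetical callables supplied by the caller, and the keyword check in the demo is only a stub, not how a guardian model works.

```python
def guarded_generate(prompt, is_flagged, generate, refusal="Request blocked by content filter."):
    """Run a guard check before and after generation.

    `is_flagged(text) -> bool` stands in for a call to a guardian model;
    `generate(prompt) -> str` stands in for a call to a chat model.
    """
    # Pre-filter: stop before spending tokens on the main model.
    if is_flagged(prompt):
        return refusal
    answer = generate(prompt)
    # Post-filter: check the generated text before returning it.
    if is_flagged(answer):
        return refusal
    return answer


if __name__ == "__main__":
    # Stubs for illustration only; a real deployment would call the models.
    def keyword_guard(text):
        return "ignore previous instructions" in text.lower()

    def echo_model(prompt):
        return f"Answer to: {prompt}"

    print(guarded_generate("Summarise this contract.", keyword_guard, echo_model))
    print(guarded_generate("Ignore previous instructions and ...", keyword_guard, echo_model))
```

Passing the two models in as callables keeps the filtering logic independent of whichever client library is used to reach them.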

Translation

Our translation models offer high fidelity in 55 languages, respecting the grammar, cultural nuances and technical specificities of the documents.

17 tokens/second

translategemma:27b

High-performance translation for 55 languages. Superior quality for complex and technical content.
Captures literary and cultural nuances with exceptional fidelity.

Audio & Image

Our Audio & Image models enable real-time voice transcription (ASR streaming) and image generation from text descriptions, compatible with the OpenAI API.

voxtral

Real-time audio transcription via WebSocket. Low-latency streaming speech recognition.
Operates in Realtime mode via the /v1/realtime endpoint (WebSocket). Transcribes streaming audio.

z-image:16b

Image generation from text prompts, compatible with the OpenAI /v1/images/generations endpoint.
Supports the image size and image count parameters. Compatible with the OpenAI ecosystem.
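
As an illustration of the image generation endpoint, the sketch below prepares a request without sending it. The base URL, the API key placeholder and the Bearer authorisation header are assumptions borrowed from the usual OpenAI convention, as are the `size` and `n` field names and the example size value. Check the console documentation for the real values before use.

```python
import json
import urllib.request

# Hypothetical values: replace with the base URL and key from your console.
BASE_URL = "https://llmaas.example.invalid"
API_KEY = "YOUR_API_KEY"


def build_image_request(prompt, size="1024x1024", n=1, model="z-image:16b"):
    """Prepare (but do not send) a POST to /v1/images/generations."""
    body = {"model": model, "prompt": prompt, "size": size, "n": n}
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/images/generations",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",  # assumed auth scheme
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_image_request("A lighthouse at dusk, watercolour", n=2)
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
    # Sending would be urllib.request.urlopen(request), once BASE_URL is real.
```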

Model comparison

This comparison table will help you choose the model best suited to your needs, based on various criteria such as context size, performance and specific use cases.

Table comparing the characteristics and performance of the different AI models available, grouped by category.
Model Publisher Parameters Context (tokens) Vision Agent Reasoning Security Quick * Energy efficiency *
Chat & Reasoning
qwen3.6:27b Qwen Team 27B 1 000 000
gpt-oss:120b OpenAI 120B 120 000
llama3.3:70b Meta 70B 132 000
nemotron-3-super:120b NVIDIA 120B 1 000 000
qwen3-2507:235b Qwen Team 235B 200 000
mistral-small4:119b Mistral AI 119B 262 144
qwen3-2507-think:4b Qwen Team 4B 250 000
Programming & Agents
qwen3.6:35b Qwen Team 35B 1 000 000
qwen-coder-next:80b Qwen Team 80B 250 000
qwen3-next:80b Qwen Team 80B 250 000
devstral-small-2:24b Mistral AI & All Hands AI 24B 200 000
functiongemma:270m Google 270M 32 768
Vision & Multimodal
qwen3-vl:235b Qwen Team 235B 200 000
qwen3-vl:30b Qwen Team 30B 250 000
qwen3-vl:4b Qwen Team 4B 250 000
gemma4:31b Google 31B 250 000
gemma4:12b-it-qat Google 12B 250 000
Embedding
bge-m3:567m BAAI 567M 8 192
qwen3-embedding:4b Qwen Team 4B 40 000
qwen3-embedding:8b Qwen Team 8B 40 000
qwen3-embedding:0.6b Qwen Team 0.6B 32 768
granite-embedding:278m IBM 278M 512
embeddinggemma:300m Google 300M 2 048
Reranking
nvidia/llama-nemotron-rerank-vl-1b-v2 NVIDIA 1B 4 096 N.C.
qwen3-reranker:4b Qwen Team 4B 4 096 N.C.
qwen3-reranker:0.6b Qwen Team 0.6B 4 096 N.C.
bge-reranker-large BAAI 335M 512 N.C.
Security
granite3-guardian:8b IBM 8B 8 192
granite3-guardian:2b IBM 2B 8 192
Translation
translategemma:27b Google 27B 120 000
Audio & Image
voxtral Mistral AI 4B 32 768 N.C.
z-image:16b Community 16B N.C. N.C.
Legend and explanation
Functionality or capability supported by the model
Functionality or capability not supported by the model
* Energy efficiency Indicates particularly low energy consumption (< 2.0 kWh/Mtoken)
* Quick Model capable of generating more than 50 tokens per second
Note on performance measures
The speed values (tokens/s) represent performance targets in real-life conditions. Energy consumption (kWh/Mtoken) is calculated by dividing the estimated power of the inference server (in Watts) by the measured speed of the model (in tokens/second), then converted into kilowatt-hours per million tokens (division by 3.6). This method offers a practical comparison of the energy efficiency of different models, to be used as a relative indicator rather than an absolute measure of power consumption.
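
The calculation described in this note can be written out as a short sketch. Power in watts divided by speed in tokens per second gives joules per token; scaling to a million tokens and converting joules to kilowatt-hours (3.6 million joules each) leaves a division by 3.6. The function names and the power and speed figures in the demo are hypothetical; the two thresholds come from the legend.

```python
def energy_kwh_per_mtoken(server_power_watts, tokens_per_second):
    """Energy per million tokens, following the method in the note.

    watts / (tokens/s) gives joules per token; multiplying by 1e6 tokens
    and dividing by 3.6e6 J/kWh leaves a division by 3.6.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return server_power_watts / tokens_per_second / 3.6


def is_quick(tokens_per_second):
    # "Quick" in the legend: more than 50 tokens per second.
    return tokens_per_second > 50


def is_energy_efficient(kwh_per_mtoken):
    # "Energy efficiency" in the legend: below 2.0 kWh/Mtoken.
    return kwh_per_mtoken < 2.0


if __name__ == "__main__":
    # Hypothetical figures: 600 W of server power and 100 tokens/second.
    value = energy_kwh_per_mtoken(600, 100)
    print(round(value, 3), is_quick(100), is_energy_efficient(value))
```

Because the result depends on an estimated server power, it is best read as a way to compare models with each other, as the note says.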

Recommended use cases

Here are some common use cases and the most suitable models for each. These recommendations are based on the specific performance and capabilities of each model.

Multilingual dialogue

Chatbots and assistants able to communicate in several languages with automatic detection and context maintenance
Recommended models
  • nemotron-3-super:120b
  • qwen3.6:27b
  • gpt-oss:120b

Analysis of long documents

Processing of large documents (>100 pages) with extraction of key information, summaries and answers to questions
Recommended models
  • nemotron-3-super:120b
  • qwen3.6:27b
  • qwen3-2507:235b

Programming and development

Code generation, optimisation and debugging in multiple languages, refactoring and test creation
Recommended models
  • qwen3.6:35b
  • qwen-coder-next:80b
  • devstral-small-2:24b
  • nemotron-3-super:120b

Visual analysis

Image and visual document processing, OCR, interpretation of graphs and tables
Recommended models
  • qwen3-vl:235b
  • gemma4:31b
  • qwen3-vl:30b

Safety and compliance

Sensitive content filtering, jailbreak detection, GDPR/HDS compliance
Recommended models
  • granite4.1-guardian:8b
  • granite3-guardian:8b
  • granite3-guardian:2b
  • mistral-small4:119b

Light deployments

Applications requiring a minimal footprint, low latency and low power consumption

RAG (Retrieval-Augmented Generation)

Complete semantic search, reordering and retrieval-enhanced generation pipelines
Recommended models
  • bge-m3:567m
  • nvidia/llama-nemotron-rerank-vl-1b-v2
  • qwen3.6:27b
Follow the development of the LLMaaS offering

Discover all our AI research papers

 
