Getting Started with the API

Registration takes minutes. Visit platform.moonshot.ai, create an account with email verification, and navigate to the API keys section. The platform provides documentation in both English and Chinese, with code examples covering common integration patterns.
- Register at platform.moonshot.ai and verify your email address.
- Navigate to the API Keys section in the developer dashboard.
- Generate your first API key. Store it securely as it cannot be retrieved after creation.
- Install the OpenAI Python SDK or use cURL directly. The Kimi API accepts standard OpenAI-format requests.
```python
from openai import OpenAI

# Point the standard OpenAI client at Moonshot's endpoint
client = OpenAI(
    api_key="your_moonshot_api_key",
    base_url="https://api.moonshot.cn/v1",
)

response = client.chat.completions.create(
    model="kimi-k2.5",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain the MoE architecture."},
    ],
    temperature=0.7,
)

print(response.choices[0].message.content)
```
This code is identical to a standard OpenAI API call. Migrating from GPT models requires changing only the base_url and api_key parameters; existing error handling, retry logic, streaming implementations, and response parsing carry over without modification.
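Because the error-handling and retry patterns you already use with OpenAI transfer directly, a generic retry wrapper is enough. This is a minimal sketch, not part of either SDK; the wrapper is provider-agnostic and simply retries any callable with exponential backoff:

```python
import time

def with_retries(call, max_attempts=3, base_delay=0.5):
    """Retry a callable with exponential backoff on transient failures."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            time.sleep(base_delay * (2 ** attempt))

# Usage sketch (client as configured above):
# reply = with_retries(lambda: client.chat.completions.create(
#     model="kimi-k2.5",
#     messages=[{"role": "user", "content": "Hello"}],
# ))
```

In production you would typically narrow the `except` clause to rate-limit and timeout errors rather than retrying every exception.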
Alternative Access Methods

Beyond the official API, Kimi K2.5 is accessible through several third-party platforms and self-hosting options.
- OpenRouter (openrouter.ai): Aggregated access with unified billing across multiple AI providers. Useful for applications that need fallback routing between models.
- Together AI (together.ai): Optimized inference infrastructure with competitive pricing and low-latency serving.
- NVIDIA NIM: Enterprise deployment through NVIDIA's inference microservices platform.
- Self-hosted: Download from Hugging Face (moonshotai/Kimi-K2.5) in block-fp8 format. Deploy via vLLM, SGLang, Transformers, or Docker. Requires significant GPU resources for the full 1T parameter model.
Kimi API FAQ
Is the Kimi API compatible with OpenAI SDK?
Fully compatible. Change the base_url to https://api.moonshot.cn/v1 and use your Moonshot API key. Chat completions, function calling, streaming, and structured outputs work identically. No code changes beyond the connection parameters are needed.
What is the maximum context window supported?
Kimi K2.5 supports a context window of up to 262,144 tokens (256K), enough to process entire codebases or long documents in a single request without chunking.
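A sketch of using that window: send a whole document set as one message, after a rough pre-flight size check. The ~4-characters-per-token heuristic is a common English-text approximation, not an exact tokenizer count, and `review_codebase` is a hypothetical helper:

```python
CONTEXT_LIMIT = 262_144  # Kimi K2.5 context window, in tokens

def fits_in_context(text, limit=CONTEXT_LIMIT, chars_per_token=4):
    """Rough pre-flight check: ~4 chars/token for English text."""
    return len(text) / chars_per_token <= limit

def review_codebase(client, source_text):
    # The entire source goes into a single message -- no chunking needed
    return client.chat.completions.create(
        model="kimi-k2.5",
        messages=[
            {"role": "system", "content": "You are a code reviewer."},
            {"role": "user", "content": source_text},
        ],
    )
```

For precise counts, tokenize with the model's own tokenizer before sending.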
How do I save costs with context caching?
Automatic context caching activates when the same prefix (system prompt, documents) appears in consecutive requests. Cached tokens cost $0.15 per million instead of $0.60, a 75% reduction.
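Since caching keys on a repeated prefix, the practical rule is to keep the expensive part of the prompt byte-identical across requests and append only the varying part at the end. A sketch, where SYSTEM_PROMPT and REFERENCE_DOC are hypothetical placeholders for your own reusable prefix:

```python
SYSTEM_PROMPT = "You are a support agent for ExampleCo."  # hypothetical
REFERENCE_DOC = "Full product manual text goes here."     # hypothetical

def build_messages(question):
    """Keep the costly prefix byte-identical so the cache can hit."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": REFERENCE_DOC},
        {"role": "user", "content": question},  # only this part varies
    ]
```

Even a one-character difference in the prefix (a timestamp, a request ID) produces a different prefix and forfeits the cached-token discount.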
Can I process images and video with the API?
Yes, Kimi K2.5 is natively multimodal. You can send images and video URLs directly in the chat messages for analysis, chart reading, and data extraction.
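Assuming image inputs follow the OpenAI vision message convention (content parts with an `image_url` entry), a user message with an image might be built like this; the URL and question are illustrative:

```python
def image_message(image_url, question):
    """Build a multimodal user message in OpenAI vision format."""
    return {
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": image_url}},
            {"type": "text", "text": question},
        ],
    }

# Usage sketch:
# client.chat.completions.create(
#     model="kimi-k2.5",
#     messages=[image_message("https://example.com/chart.png",
#                             "What trend does this chart show?")],
# )
```

Check the official API docs for the exact field names supported for video inputs, which may differ from the image format above.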
Where can I get an API key?
Register at platform.moonshot.ai, verify your email, and generate a key in the API Keys section of the developer dashboard.
Are there third-party providers for Kimi API?
Yes, you can access Kimi K2.5 through OpenRouter, Together AI, and NVIDIA NIM, providing options for unified billing or optimized inference.
Is Kimi K2.5 open-source?
Yes, the model weights are available on Hugging Face under a Modified MIT License. You can deploy them using vLLM, SGLang, or Docker.
How are rate limits determined?
The official API uses a tiered system based on your cumulative account recharge amount. Higher tiers unlock increased concurrency and higher requests per minute.
What SDKs are available?
Since it is OpenAI-compatible, official SDKs for Python and Node.js work out of the box. Any standard library that supports REST requests can be used.
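To illustrate the "any REST client works" point, here is a sketch using only Python's standard library, no SDK at all. The endpoint path follows the standard OpenAI chat-completions route under the Moonshot base URL:

```python
import json
import urllib.request

API_URL = "https://api.moonshot.cn/v1/chat/completions"

def build_request(api_key, prompt, model="kimi-k2.5"):
    """Construct a plain HTTP POST request -- no SDK required."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

# Usage sketch:
# with urllib.request.urlopen(build_request("your_key", "Hello")) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```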
Is there a lighter version of the model for self-hosting?
Yes, Kimi Linear (48B MoE, 3B active) is a lighter alternative for resource-constrained deployments compared to the full 1T parameter model.