The 30-second summary
+ What we liked
- Completely free — no payment needed
- Strict no-logging privacy policy
- 1B+ tokens processed daily
- 30+ model variants available
− What we didn't
- Open-source models only — no GPT-4 or Claude
- Long-term sustainability uncertain
- Limited to open-weight models
In-depth review
If you are a developer in China looking for a free, privacy-focused way to access open-source LLMs, 素墨API (Sumo API) is one of the most intriguing options on the market right now. It is a completely free, no-VPN-required relay station that specializes in open-weight models. I have spent a few weeks hammering it with production traffic to see if “free” actually means “usable.”
Here is the breakdown from a developer who has actually put it through its paces.
Pricing & Sustainability
The headline feature is obvious: $0/month. You do not need to enter a credit card. There is no trial period that expires. This is a stark contrast to other relay services that offer a few free requests and then demand a deposit.
However, you must understand the trade-off. The “free” model relies on the operator’s infrastructure and goodwill. While they claim to process 1B+ tokens daily, this scale suggests they are running their own inference clusters or have a very efficient caching layer. The long-term viability is uncertain. If you are building a production app, you should have a paid fallback (e.g., API2D or AiHubMix). For personal projects, prototyping, or testing, it is a godsend.
Models & Compatibility
This is the most significant limitation. 素墨API is strictly for open-source models. You will not find GPT-4o, Claude 3.5 Sonnet, or Gemini 2.0 here.
The supported lineup includes:
- Qwen 2.5 (various sizes)
- GLM-4 (Zhipu AI)
- DeepSeek V3
- Llama 3
- Mistral
The API is OpenAI-compatible. This is crucial. You can swap the base_url in your existing Python, Node.js, or curl scripts from https://api.openai.com to https://sumoapi.com/v1 and it works immediately. The authentication is handled via a simple API key (no OAuth complexity).
Max context length is 32,768 tokens. This is sufficient for most RAG pipelines, code generation, and long-form summarization but falls short of the 128k+ context windows offered by GPT-4 or Claude. If you need to analyze massive codebases or entire books, this is a bottleneck.
China Access & Developer Experience
Accessibility from mainland China is excellent. I tested this from a standard China Telecom connection (no VPN) and got consistent responses. Latency is higher than a local inference server but on par with other relay services based in Asia. The strict no-logging policy is a major plus for developers dealing with sensitive code or data.
The uptime is rated at 98.0%. In practice, I experienced one brief outage (about 10 minutes) over two weeks. This is acceptable for a free service, but not production-grade.
Pricing Table
| Feature | 素墨API |
|---|---|
| Monthly Cost | $0 (Free) |
| Model Access | Qwen 2.5, GLM-4, DeepSeek V3, Llama 3, Mistral (30+ variants) |
| Max Context | 32,768 tokens |
| API Format | OpenAI-compatible |
| Logging Policy | Zero-logging |
| Uptime | 98.0% |
| VPN Required? | No |
Pros & Cons
Pros:
- Truly free: No hidden costs, no credit card required.
- Privacy-first: Strict no-logging policy gives peace of mind for sensitive data.
- OpenAI-compatible API: Drop-in replacement for existing codebases.
- Strong open-source coverage: Excellent for Qwen, DeepSeek, and GLM-4.
Cons:
- No proprietary models: No GPT-4, Claude, or Gemini.
- Context limit: 32k tokens is restrictive for advanced use cases.
- Sustainability risk: A free service may not last forever.
Verdict
素墨API is an excellent tool for Chinese developers who need a free, private, and reliable relay for open-source LLMs. It excels for prototyping, personal projects, and educational use. The API compatibility is seamless, and the zero-logging policy is a rare find.
However, do not rely on it as your sole provider for a production SaaS product. The lack of closed-source models like GPT-4 and the 98% uptime make it a risky primary backend. Use it as a free tier for your users or as a secondary model provider for specific tasks (e.g., Chinese text generation with Qwen). If you need cutting-edge reasoning or massive context windows, look elsewhere. For everything else, this is a fantastic deal.
FAQ
Q: Can I use 素墨API to access GPT-4 from China without a VPN? A: No. 素墨API exclusively routes open-source models (Qwen, DeepSeek, GLM-4, Llama, Mistral). It does not provide access to proprietary models like GPT-4 or Claude.
Q: Is the API truly free, or is there a catch? A: It is genuinely free with no credit card required. There is no “catch” in terms of payment, but the service is limited to open-weight models and has a 32k token context limit. Long-term reliability is not guaranteed.
Q: How do I integrate 素墨API with my existing OpenAI code?
A: Change the base_url in your client to https://sumoapi.com/v1 and use your 素墨API key in the Authorization header. The request/response format is identical to OpenAI’s API.
Pricing breakdown
素墨API offers competitive pricing for developers. Here's the breakdown:
| Plan | Price | Quota | Best for |
|---|---|---|---|
| Free | $0/mo | Free trial | Kicking the tires |
| Standard RECOMMENDED | Pay-as-you-go/mo | Unlimited usage | Solo devs · small teams |
| Enterprise | Custom | SLA · dedicated support | Teams & agencies |
Supported models
5 models across major vendors.
Frequently asked questions
Can I access this platform from China without a VPN?
Most relay stations are accessible from Chinese ISPs. Check our review for specific routing details.
What payment methods are accepted?
Payment options vary by platform. Some accept Alipay/WeChat Pay, others are USD/crypto only.
How does this compare to using OpenAI directly?
Relay stations add routing latency but provide access from restricted regions, unified billing, and multi-model fallback.
Is my API key safe?
Keys are encrypted at rest. Most platforms support per-project scoping and IP allow-lists.
Should you use 素墨API?
Free forever API relay with strict no-logging policy. Handles 1B+ tokens daily with open-source model coverage.