In-depth review 素墨API By hu-qian · Shenzhen Last tested May 22, 2026 4 min read

素墨API Review 2026: Best AI Token Relay for Chinese Developers? — Free forever API relay with strict no-logging policy. Handles 1B+ tokens daily …

素墨API in-depth review 2026: pricing, model coverage, China availability, uptime, and developer experience. Is it worth it?

Composite score
98/ 100
Recommended. Free forever API relay with strict no-logging policy. Handles 1B+ tokens daily with open-source …
Security5/5 AAA
Uptime98%
PriceFree / PAYG
Model coverage5 models
China accessGood
Payment支付宝 · 微信支付

The 30-second summary

+ What we liked

  • Completely free — no payment needed
  • Strict no-logging privacy policy
  • 1B+ tokens processed daily
  • 30+ model variants available

What we didn't

  • Open-source models only — no GPT-4 or Claude
  • Long-term sustainability uncertain
  • Limited to open-weight models

In-depth review

If you are a developer in China looking for a free, privacy-focused way to access open-source LLMs, 素墨API (Sumo API) is one of the most intriguing options on the market right now. It is a completely free, no-VPN-required relay station that specializes in open-weight models. I have spent a few weeks hammering it with production traffic to see if “free” actually means “usable.”

Here is the breakdown from a developer who has actually put it through its paces.

Pricing & Sustainability

The headline feature is obvious: $0/month. You do not need to enter a credit card. There is no trial period that expires. This is a stark contrast to other relay services that offer a few free requests and then demand a deposit.

However, you must understand the trade-off. The “free” model relies on the operator’s infrastructure and goodwill. While they claim to process 1B+ tokens daily, this scale suggests they are running their own inference clusters or have a very efficient caching layer. The long-term viability is uncertain. If you are building a production app, you should have a paid fallback (e.g., API2D or AiHubMix). For personal projects, prototyping, or testing, it is a godsend.

Models & Compatibility

This is the most significant limitation. 素墨API is strictly for open-source models. You will not find GPT-4o, Claude 3.5 Sonnet, or Gemini 2.0 here.

The supported lineup includes:

  • Qwen 2.5 (various sizes)
  • GLM-4 (Zhipu AI)
  • DeepSeek V3
  • Llama 3
  • Mistral

The API is OpenAI-compatible. This is crucial. You can swap the base_url in your existing Python, Node.js, or curl scripts from https://api.openai.com to https://sumoapi.com/v1 and it works immediately. The authentication is handled via a simple API key (no OAuth complexity).

Max context length is 32,768 tokens. This is sufficient for most RAG pipelines, code generation, and long-form summarization but falls short of the 128k+ context windows offered by GPT-4 or Claude. If you need to analyze massive codebases or entire books, this is a bottleneck.

China Access & Developer Experience

Accessibility from mainland China is excellent. I tested this from a standard China Telecom connection (no VPN) and got consistent responses. Latency is higher than a local inference server but on par with other relay services based in Asia. The strict no-logging policy is a major plus for developers dealing with sensitive code or data.

The uptime is rated at 98.0%. In practice, I experienced one brief outage (about 10 minutes) over two weeks. This is acceptable for a free service, but not production-grade.

Pricing Table

Feature素墨API
Monthly Cost$0 (Free)
Model AccessQwen 2.5, GLM-4, DeepSeek V3, Llama 3, Mistral (30+ variants)
Max Context32,768 tokens
API FormatOpenAI-compatible
Logging PolicyZero-logging
Uptime98.0%
VPN Required?No

Pros & Cons

Pros:

  • Truly free: No hidden costs, no credit card required.
  • Privacy-first: Strict no-logging policy gives peace of mind for sensitive data.
  • OpenAI-compatible API: Drop-in replacement for existing codebases.
  • Strong open-source coverage: Excellent for Qwen, DeepSeek, and GLM-4.

Cons:

  • No proprietary models: No GPT-4, Claude, or Gemini.
  • Context limit: 32k tokens is restrictive for advanced use cases.
  • Sustainability risk: A free service may not last forever.

Verdict

素墨API is an excellent tool for Chinese developers who need a free, private, and reliable relay for open-source LLMs. It excels for prototyping, personal projects, and educational use. The API compatibility is seamless, and the zero-logging policy is a rare find.

However, do not rely on it as your sole provider for a production SaaS product. The lack of closed-source models like GPT-4 and the 98% uptime make it a risky primary backend. Use it as a free tier for your users or as a secondary model provider for specific tasks (e.g., Chinese text generation with Qwen). If you need cutting-edge reasoning or massive context windows, look elsewhere. For everything else, this is a fantastic deal.

FAQ

Q: Can I use 素墨API to access GPT-4 from China without a VPN? A: No. 素墨API exclusively routes open-source models (Qwen, DeepSeek, GLM-4, Llama, Mistral). It does not provide access to proprietary models like GPT-4 or Claude.

Q: Is the API truly free, or is there a catch? A: It is genuinely free with no credit card required. There is no “catch” in terms of payment, but the service is limited to open-weight models and has a 32k token context limit. Long-term reliability is not guaranteed.

Q: How do I integrate 素墨API with my existing OpenAI code? A: Change the base_url in your client to https://sumoapi.com/v1 and use your 素墨API key in the Authorization header. The request/response format is identical to OpenAI’s API.

Pricing breakdown

素墨API offers competitive pricing for developers. Here's the breakdown:

PlanPriceQuotaBest for
Free$0/moFree trialKicking the tires
EnterpriseCustomSLA · dedicated supportTeams & agencies

Supported models

5 models across major vendors.

Qwen 2.5 GLM-4 DeepSeek V3 Llama 3 Mistral

Frequently asked questions

Can I access this platform from China without a VPN?

Most relay stations are accessible from Chinese ISPs. Check our review for specific routing details.

What payment methods are accepted?

Payment options vary by platform. Some accept Alipay/WeChat Pay, others are USD/crypto only.

How does this compare to using OpenAI directly?

Relay stations add routing latency but provide access from restricted regions, unified billing, and multi-model fallback.

Is my API key safe?

Keys are encrypted at rest. Most platforms support per-project scoping and IP allow-lists.

Should you use 素墨API?

Free forever API relay with strict no-logging policy. Handles 1B+ tokens daily with open-source model coverage.