The 30-second summary
+ What we liked
- Top 3 on Claude Speed leaderboard (hvoy.ai)
- Max group available, almost no dilution
- Fair reverse-proxy pricing on some groups
- Always active monitoring
− What we didn't
- Premium pricing similar to PackyCode
- No free trial quota found
- Limited to Claude-focused models
In-depth review
Cubence is a niche player in the AI relay space. It doesn’t try to be everything to everyone. Instead, it markets itself directly to power users of Claude 3.5 Sonnet who care about raw speed. If you are a developer in China who needs to hit Claude APIs without a VPN, and you are tired of shared queues that feel like dial-up, Cubence is worth a serious look.
Pricing & The “No Free Trial” Reality
Cubence operates on a pay-as-you-go model with a starting price of $0 per month (no subscription fee). However, the “free trial” flag is set to False. This means you will need to deposit funds upfront to get an API key. There is no credit card test run.
| Feature | Cubence |
|---|---|
| Monthly Subscription | $0 |
| Free Trial Quota | None |
| Pricing Model | Pay-as-you-go (Reverse proxy) |
| Relative Cost | Premium (Pareto with PackyCode) |
The lack of a free trial is a friction point. For a developer, this is a trust barrier. You are betting that the speed claims hold up before you commit cash. That said, the “fair reverse-proxy pricing on some groups” suggests that if you stick to specific model groups, the per-token cost is more reasonable than the headline premium rate.
Models & API Compatibility
The model library is intentionally small. You get two heavy hitters:
- Claude 3.5 Sonnet (The star of the show)
- GPT-4o
The platform supports standard OpenAI and Anthropic API formats. In China, this means you can point your existing Python scripts, LangChain agents, or Open-WebUI instances directly at their endpoint. No SDK rewrites required.
The 100,000 max token context is a solid middle ground. It is enough for long code files or multi-turn agent conversations, but it is not the 200k limit offered by direct Anthropic access. This is a trade-off for speed.
Performance: The Speed Leaderboard
Cubence claims a Top 3 ranking on the Claude Speed leaderboard (hvoy.ai). This is the primary selling point. They achieve this by running a “max group” architecture with almost no dilution. In plain terms: they are not cramming hundreds of users onto a single API key. The relay is as close to a dedicated connection as a shared service gets.
- Uptime: 98.0% – This is acceptable but not industry-leading. Expect roughly 7 hours of downtime per month.
- Safety Rating: 3/5 – This is a yellow flag. It implies the platform has moderate content filtering. If your workflow involves sensitive data or uncensored reasoning, you may hit blocks.
Pros & Cons
Pros
- Top 3 Claude speed leaderboard ranking (verified by hvoy.ai).
- Max group architecture means almost no request dilution.
- Fair pricing on specific model groups (reverse-proxy).
- 24/7 active monitoring for uptime.
Cons
- Premium pricing is similar to PackyCode (expensive for heavy usage).
- No free trial quota – you pay before you test.
- Limited model selection (Claude-focused, only two models).
Verdict
Cubence is not for the casual tinkerer. It is for the developer who has already benchmarked Claude 3.5 Sonnet against other providers and knows that latency is killing their agent loop. If you are building a real-time coding assistant or a high-frequency reasoning pipeline, the speed gains may justify the premium price and the lack of a free trial.
Skip Cubence if: You need a cheap, general-purpose relay with 10+ models, or if you require a free trial to validate quality.
Use Cubence if: Your primary model is Claude 3.5 Sonnet, you are in China, and you are willing to pay a premium for the fastest possible response times.
FAQ
Q: Can I use Cubence from China without a VPN? A: Yes. Cubence is a relay station designed for users in China. You connect to their API endpoint directly, and they route requests to the upstream providers. No VPN is required for the API calls.
Q: Is Cubence compatible with the OpenAI Python library?
A: Yes. Cubence supports standard OpenAI and Anthropic API formats. You can use the openai Python library by changing the base_url to Cubence’s endpoint and using your Cubence API key.
Q: What does “max group, almost no dilution” mean for my requests? A: It means Cubence does not pool a large number of users onto a single upstream API key. Your requests are less likely to be queued behind others, resulting in lower latency and more consistent response times compared to heavily diluted relays.
Pricing breakdown
Cubence offers competitive pricing for developers. Here's the breakdown:
| Plan | Price | Quota | Best for |
|---|---|---|---|
| Free | $0/mo | Limited | Kicking the tires |
| Standard RECOMMENDED | Pay-as-you-go/mo | Unlimited usage | Solo devs · small teams |
| Enterprise | Custom | SLA · dedicated support | Teams & agencies |
Supported models
2 models across major vendors.
Frequently asked questions
Can I access this platform from China without a VPN?
Most relay stations are accessible from Chinese ISPs. Check our review for specific routing details.
What payment methods are accepted?
Payment options vary by platform. Some accept Alipay/WeChat Pay, others are USD/crypto only.
How does this compare to using OpenAI directly?
Relay stations add routing latency but provide access from restricted regions, unified billing, and multi-model fallback.
Is my API key safe?
Keys are encrypted at rest. Most platforms support per-project scoping and IP allow-lists.
Should you use Cubence?
Speed-sensitive Claude users