Using huggingface as a hosting / CDN for a pretrained model

tools-4all · November 29, 2024, 12:54pm

Hello!

I have a question regarding using the Hugging Face website as a CDN or hosting service for a pretrained model on my website, as well as for others who may want to use my model.

I have already trained my model using my own GPUs, but I would like to integrate it with Transformers.js and share it with the world. After reviewing the Terms of Service and the pricing page, I’m unsure whether I need to subscribe to a paid tier for this purpose. Since all computations will be performed client-side and not on Hugging Face’s servers, I’m also concerned about potential bandwidth limits.

Could you please provide clarification on whether a paid tier is necessary for hosting my model under these conditions and inform me about any bandwidth restrictions?

Thank you in advance for your assistance!

Yatohem · October 2, 2025, 11:17am

I ran into a similar issue and ended up hosting my model files elsewhere since Hugging Face has size limits unless you’re on a paid plan. Depending on your setup, I’d also compare Nextcloud pricing; sometimes it works out better if you’re already using it for private file sharing or want more control over access without worrying about throttling or public links expiring.

John6666 · October 3, 2025, 10:14pm

While upload rate limits may be encountered and recommended file sizes exist, there is generally no strict size cap when making models or datasets public.

Since Transformers.js runs inference in the user’s browser (on their GPU when WebGPU is enabled), it imposes no compute load on Hugging Face; the Hub only stores and serves the files, which makes hosting feasible.

You can host a public model on the Hugging Face Hub and load it in the browser with Transformers.js without any paid tier. Paid plans are only needed if you want higher request rates or more private storage. Docs current on Oct 4, 2025; the rate-limit page states “current rate limits (in September ’25)”. (Hugging Face)

What actually limits you

Request rate, not bandwidth/GB. File downloads use the Hub “Resolver” endpoints. Quotas per 5-minute window: Anonymous 3,000 • Free 5,000 • PRO 12,000 • Team 20,000 • Enterprise 50,000 • Enterprise Plus 100,000. If you exceed this you’ll get HTTP 429. No published per-GB egress caps. (Hugging Face)
Storage and file sizes. Public repos: best-effort “unlimited” for hosting; private storage is 100 GB Free, 1 TB+ on paid plans. Per-repo typical cap ~300 GB unless approved. Hard single-file max 50 GB, 20 GB recommended for better CDN caching. Files are served via CloudFront. (Hugging Face)
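
If the app does hit the request quota, the 429 can be handled client-side. A minimal sketch, assuming the server may send a standard Retry-After header in seconds (not confirmed for the Hub) and falling back to exponential backoff otherwise; `fetchWithBackoff`, `maxRetries`, and `baseDelayMs` are illustrative names, not part of any Hugging Face library:

```javascript
// Default sleep helper; replaceable in tests.
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retry a download when the server answers HTTP 429 (rate limited).
// Assumption: Retry-After, if present, is a number of seconds.
async function fetchWithBackoff(url, options = {}) {
  const {
    fetchImpl = fetch,      // injectable for testing
    sleep = defaultSleep,
    maxRetries = 3,         // hypothetical default
    baseDelayMs = 1000,     // hypothetical default
  } = options;

  for (let attempt = 0; ; attempt++) {
    const res = await fetchImpl(url);
    // Return anything that is not a 429, or give up after maxRetries.
    if (res.status !== 429 || attempt >= maxRetries) return res;

    const retryAfter = Number(res.headers?.get?.('retry-after'));
    const delayMs = Number.isFinite(retryAfter) && retryAfter > 0
      ? retryAfter * 1000
      : baseDelayMs * 2 ** attempt; // 1s, 2s, 4s, ...
    await sleep(delayMs);
  }
}
```

This only helps with short bursts; if the quota is exceeded continuously, retries just add to the request count.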

Implications

For a browser app that fetches N files per load, the maximum number of page loads per 5-minute window is roughly quota ÷ N. Example: Free plan, 5 files per session ⇒ about 1,000 sessions per 5 minutes before 429s. Use an HF token so requests aren’t counted as anonymous. Upgrade only if you hit these ceilings. (Hugging Face)
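
The quota ÷ N estimate can be written out as a small helper. This is only a sketch of the arithmetic, using the quotas quoted above; it assumes every file request counts against the quota and ignores browser caching, which would reduce repeat downloads. `QUOTAS` and `maxLoadsPerWindow` are illustrative names:

```javascript
// Resolver quotas per 5-minute window, as quoted in this thread (September '25).
const QUOTAS = {
  anonymous: 3000,
  free: 5000,
  pro: 12000,
  team: 20000,
  enterprise: 50000,
  enterprisePlus: 100000,
};

// Rough upper bound on page loads per window when each load fetches
// `filesPerLoad` files and nothing is served from cache.
function maxLoadsPerWindow(quota, filesPerLoad) {
  if (!Number.isInteger(filesPerLoad) || filesPerLoad <= 0) {
    throw new RangeError('filesPerLoad must be a positive integer');
  }
  return Math.floor(quota / filesPerLoad);
}

console.log(maxLoadsPerWindow(QUOTAS.free, 5)); // 1000
```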

If you outgrow the Free tier

Upgrade to PRO/Team to raise Resolver limits and private storage. Plans list “higher storage, bandwidth, and API rate limits,” but pricing relates to features, not per-GB egress. (Hugging Face)
Or mirror weights to your own CDN and point the app at it. Transformers.js supports custom model paths or disabling remote Hub fetches:

// https://huggingface.co/docs/transformers.js/en/custom_usage
import { env } from '@huggingface/transformers';
env.localModelPath = '/models/'; // your CDN/static path
env.allowRemoteModels = false; // optional: avoid Hub fetches

(Hugging Face)
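
If the mirror lives on a different origin than the page, another option is to redirect remote fetches rather than use a local path. A hedged sketch, assuming the `env` object exposes `remoteHost` and `remotePathTemplate` settings (check this against the Transformers.js version you install); the CDN URL and the flat `{model}/` layout are hypothetical. The helper takes `env` as a parameter so it can be tried without the library:

```javascript
// Point a Transformers.js-style `env` object at a self-hosted mirror.
// Assumption: files are laid out as <cdnBase>/<model id>/<file>.
function useMirror(env, cdnBase) {
  // Normalise to a trailing slash so path joining is predictable.
  env.remoteHost = cdnBase.endsWith('/') ? cdnBase : cdnBase + '/';
  // Drop the Hub's "resolve/<revision>" segment (hypothetical layout).
  env.remotePathTemplate = '{model}/';
  env.allowRemoteModels = true;
  return env;
}
```

In an app this would be called once before loading any model, for example `useMirror(env, 'https://cdn.example.com/models')` with `env` imported from '@huggingface/transformers'.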

Keep it smooth

Shard large weights to <20 GB per file; keep repos ≤300 GB by default. This improves CloudFront cache hit rates and user downloads. (Hugging Face)
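
A quick pre-upload check against these numbers might look like the sketch below. It assumes decimal gigabytes and the limits quoted in this thread (20 GB recommended per file, 50 GB hard per-file maximum, roughly 300 GB per repo); `checkRepo` and the file names in the usage line are made up for illustration:

```javascript
const GB = 1e9; // assumption: decimal gigabytes
const LIMITS = {
  recommendedFile: 20 * GB, // recommended per-file size
  hardFile: 50 * GB,        // hard single-file maximum
  repo: 300 * GB,           // typical per-repo cap unless approved
};

// files: array of { name, bytes }. Returns the total size and any warnings.
function checkRepo(files) {
  const warnings = [];
  let totalBytes = 0;
  for (const { name, bytes } of files) {
    totalBytes += bytes;
    if (bytes > LIMITS.hardFile) {
      warnings.push(`${name}: exceeds the 50 GB single-file maximum`);
    } else if (bytes > LIMITS.recommendedFile) {
      warnings.push(`${name}: larger than the recommended 20 GB`);
    }
  }
  if (totalBytes > LIMITS.repo) {
    warnings.push('repo total exceeds the typical 300 GB cap');
  }
  return { totalBytes, warnings };
}

// Usage with hypothetical shard names:
console.log(checkRepo([{ name: 'model-00001.onnx', bytes: 25 * GB }]).warnings);
```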

Bottom line

No paid tier required for public hosting + client-side inference. Watch Resolver request limits and file/repo size rules; there are no stated per-GB bandwidth caps. Upgrade only for higher request headroom or more private storage. (Hugging Face)

Quick references

Hub Rate limits (5-minute windows, quotas by plan). (Hugging Face)
Hub Storage limits and file size guidance. (Hugging Face)
CloudFront delivery for /resolve downloads. (Hugging Face)
Transformers.js custom model locations. (Hugging Face)
Example GitHub thread on local/static hosting with Transformers.js. (github.com)
