GLM (General Language Model) — GLM-130B
Open, bilingual 100B-scale language model (English + Chinese) optimized for research and accessible inference
See https://github.com/THUDM/GLM-130B and the ICLR 2023 paper (arXiv:2210.02414).
Features
- 130 billion parameters; dense bidirectional architecture trained with the GLM objective
- Bilingual pre-training (≈400B tokens total: ~200B Chinese + ~200B English)
- Strong zero-shot and few-shot performance on both English and Chinese benchmarks
- INT4 quantization support enabling inference on consumer GPUs (e.g., 4×RTX 3090)
- Optimized inference through SAT / FasterTransformer backends for large-GPU setups
- Released weights, training scripts, and evaluation code for reproducibility (research-only license)
Superpowers
GLM-130B was designed to democratize access to a 100B-scale model through research-friendly releases and pragmatic engineering choices. It pairs strong bilingual understanding with engineering optimizations (quantization, multi-backend inference) so that researchers can run or reproduce large-model experiments without hyperscale infrastructure. Key benefits:
- Competitive performance vs contemporaries (GPT-3/OPT/BLOOM) on language benchmarks, with particularly strong Chinese performance
- Practical deployability: INT4 quantization and careful sharding allow running inference on far cheaper GPU clusters than many other 100B models
- Open tooling: training logs, scripts, and packaging helpers are provided to help others reproduce evaluations and adapt the model for research tasks
Model specs (high level)
- Parameters: 130B (dense)
- Pre-training corpus: ~400B tokens (balance of English and Chinese)
- Objective / architecture: GLM pretraining objective (autoregressive blank infilling); supports both autoregressive continuation and infilling-style usage in the GLM family (see the prompt-format sketch after this list)
- Checkpoint format: sharded weights (multiple tar shards); repo includes scripts to repackage for different GPU counts
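As context for the objective above: GLM-style prompts mark spans to fill with [MASK] and open-ended continuation with [gMASK], the two mask tokens described in the paper and the repo's generation examples. The helpers below are a minimal, illustrative sketch of assembling such prompt strings; they are not part of the GLM-130B codebase, and exact tokenization details are handled by the repo's tokenizer.

```python
# Illustrative only: GLM prompts use [MASK] for short blank infilling and
# [gMASK] for open-ended generation at the end of the context. These helpers
# merely assemble prompt strings in that style.

def infilling_prompt(left: str, right: str) -> str:
    """Blank-filling style: the model predicts the span replaced by [MASK]."""
    return f"{left} [MASK] {right}"

def generation_prompt(context: str) -> str:
    """Open-ended generation: [gMASK] asks the model to continue the context."""
    return f"{context} [gMASK]"

if __name__ == "__main__":
    print(infilling_prompt("Tsinghua University is located in", ", China."))
    print(generation_prompt("Who is the greatest artist? The greatest artist is"))
```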
Training & data
- Large bilingual dataset with explicit balancing between English and Chinese sources
- Training challenges documented in the paper: loss spikes, divergence, and stability concerns — authors share engineering mitigations
- Trained at scale with attention to cross-platform compatibility (NVIDIA, Hygon DCU, Ascend, Sunway)
Quantization & hardware
- INT4 quantization techniques demonstrated with negligible performance loss for many tasks (the sketch after this list illustrates the basic idea)
- Reference hardware for inference:
  - Standard FP16 inference: a single server with 8×A100 (40 GB) or 8×V100 (32 GB)
  - Consumer-friendly with INT4 quantization: 4×RTX 3090 (24 GB) or 8×RTX 2080 Ti (11 GB)
- Supports optimized inference engines (FasterTransformer, SAT) with reported speedups vs baseline
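To make the INT4 point concrete, here is a minimal sketch of symmetric, per-output-channel weight-only quantization, the general technique behind 4-bit weight compression of linear layers. It illustrates the idea only; GLM-130B's actual implementation lives in its quantization kernels and SAT integration, and the function names below are purely illustrative.

```python
# Minimal sketch: symmetric, row-wise (per-output-channel) INT4 weight quantization.
# Not the repo's implementation; for illustration of the memory/accuracy trade-off.
import numpy as np

def quantize_int4(w: np.ndarray):
    """Quantize a weight matrix (out_features, in_features) row-wise to INT4."""
    # One scale per output channel, chosen so the largest |w| in the row maps to 7.
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    scale = np.where(scale == 0, 1.0, scale)                   # avoid divide-by-zero
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)    # 4-bit range [-8, 7]
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Recover an approximate FP32 weight matrix for use at inference time."""
    return q.astype(np.float32) * scale

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(4, 8)).astype(np.float32)
    q, s = quantize_int4(w)
    print("max abs reconstruction error:", np.abs(w - dequantize(q, s)).max())
```

In practice the 4-bit weights stay packed in GPU memory and are dequantized on the fly inside the matmul kernels, which is where the ~4× memory saving over FP16 comes from.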
Benchmarks & performance
- LAMBADA, MMLU and other English benchmarks: competitive with GPT-3 (175B) and comparable large models of the period
- Chinese benchmarks (CLUE, FewCLUE): substantial wins over contemporaneous Chinese models (reported large gains vs ERNIE TITAN 3.0 in zero-shot/few-shot settings)
- Results across 30+ tasks are reproducible using the provided evaluation code in the repo
Usage examples (research-focused)
- Zero-shot prompting for classification, QA, and summarization across English/Chinese inputs
- Few-shot adaptation for new tasks via prompting, without fine-tuning (see the prompt-construction sketch after this list)
- Reproducible evaluation: use the repo’s scripts and checkpoint layout to run published benchmarks locally or on a cluster
- Research experiments: analyze bilingual transfer, quantization effects, and training stability behaviors using provided logs and training configs
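For the few-shot item above, the sketch below shows one common way to assemble in-context demonstrations ahead of a query. The formatting convention and helper name are illustrative assumptions, not a format mandated by GLM-130B; the assembled string is handed to whatever generation entry point you use (the repo's generation script or your own wrapper).

```python
# Hedged sketch of prompting-based few-shot adaptation: concatenate labeled
# demonstrations ahead of the query and let the model complete the final label.

def build_few_shot_prompt(examples, query, instruction=""):
    """examples: list of (input, label) pairs used as in-context demonstrations."""
    parts = [instruction] if instruction else []
    for text, label in examples:
        parts.append(f"Input: {text}\nLabel: {label}")
    parts.append(f"Input: {query}\nLabel:")
    return "\n\n".join(parts)

if __name__ == "__main__":
    demos = [
        ("The movie was a delight from start to finish.", "positive"),
        ("I regret spending money on this.", "negative"),
    ]
    print(build_few_shot_prompt(
        demos,
        "The plot dragged but the acting was solid.",
        instruction="Classify the sentiment of each input."))
```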
Limitations & license
- License: research / non-commercial restrictions — check the repository license and access form before downloading weights
- Not intended for unrestricted commercial deployment without separate agreements
- Hardware and operational complexity: while more accessible than many 100B models, running or fine-tuning still requires multi-GPU setups and careful engineering
- Safety & governance: as with any large LLM, GLM-130B can produce unsafe or biased outputs and should be used with appropriate guardrails in place
Related / lineage
- GLM-130B (THUDM) is an early open 100B-scale academic release that influenced subsequent GLM-family work
- GLM-4 / GLM-4.x (ZhipuAI / Z.ai) are later, commercially-driven GLM-family models with multimodal and agent-oriented variants (e.g., GLM-4.5, GLM-4.6, GLM-4.5V). These are separate projects sharing the GLM name/lineage but are produced by different organizations and may have different licensing and access models
Pricing / access
- Weights: free for approved research use (subject to the repo license and access form)
- Hosted APIs / commercial access: third-party providers and downstream vendors may charge for hosted GLM-based services (pricing varies); commercial license for the original repo requires negotiation
Practical tips
- When downloading weights, follow the repo’s recommended sharding / unpacking steps and set CHECKPOINT_PATH to the outermost extracted directory
- If you change the target GPU count from the repo default, repackage shards using the provided scripts
- Validate quantized models on your target tasks: INT4 works well in many cases, but always test for edge-case degradation (a small validation sketch follows this list)
- Reproduce a published evaluation (one of the repo’s benchmark scripts) as a verification step after setup
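As a starting point for the quantization-validation tip, the sketch below compares an FP16 reference deployment against an INT4 deployment on the same prompts. The two `generate_*` callables are placeholders for however you invoke each model (repo scripts, a local server, etc.); they are not GLM-130B APIs, and exact-match agreement is only a crude proxy that suits short classification-style outputs. For free-form generation, compare task metrics instead.

```python
# Minimal validation harness (assumed helpers, not a GLM-130B API): run the same
# prompts through FP16 and INT4 deployments and report exact-match agreement.
from typing import Callable, Iterable

def agreement_rate(prompts: Iterable[str],
                   generate_fp16: Callable[[str], str],
                   generate_int4: Callable[[str], str]) -> float:
    """Fraction of prompts where both deployments produce identical outputs."""
    prompts = list(prompts)
    matches = sum(generate_fp16(p).strip() == generate_int4(p).strip() for p in prompts)
    return matches / len(prompts)

if __name__ == "__main__":
    # Stand-in callables so the sketch runs; replace with real model calls.
    def fp16_stub(p: str) -> str:
        return "positive" if "delight" in p else "negative"

    def int4_stub(p: str) -> str:
        return "positive" if "delight" in p else "negative"

    print(agreement_rate(["A delight.", "A bore."], fp16_stub, int4_stub))
```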
References & links
- GLM-130B GitHub: https://github.com/THUDM/GLM-130B
- GLM-130B paper (ICLR 2023): arXiv:2210.02414
- GLM-family ecosystem: ZhipuAI / Z.ai GLM-4 releases (GLM-4.5, GLM-4.6, GLM-4.5V) — separate projects with different licenses and hosting options