GLM (General Language Model) — GLM-130B

by Tsinghua University (THUDM) and Zhipu AI

Open, bilingual 100B-scale language model (English + Chinese) optimized for research and accessible inference

See https://github.com/THUDM/GLM-130B and the ICLR 2023 paper (arXiv:2210.02414).

Features

  • 130 billion parameters; dense bidirectional architecture trained with the GLM objective
  • Bilingual pre-training (≈400B tokens total: ~200B Chinese + ~200B English)
  • Strong zero-shot and few-shot performance on both English and Chinese benchmarks
  • INT4 quantization support enabling inference on consumer GPUs (e.g., 4×RTX 3090)
  • Optimized inference through SwissArmyTransformer (SAT) and FasterTransformer backends for large-GPU setups
  • Released weights, training scripts, and evaluation code for reproducibility (research-only license)

Superpowers

GLM-130B was designed to democratize access to a 100B-scale model through research-friendly releases and pragmatic engineering choices. It pairs strong bilingual understanding with engineering optimizations (quantization, multi-backend inference) so that researchers can run or reproduce large-model experiments without hyperscaler-class infrastructure. Key benefits:

  • Competitive performance vs contemporaries (GPT-3/OPT/BLOOM) on language benchmarks, with particularly strong Chinese performance
  • Practical deployability: INT4 quantization and careful sharding allow running inference on far cheaper GPU clusters than many other 100B models
  • Open tooling: training logs, scripts, and packaging helpers are provided to help others reproduce evaluations and adapt the model for research tasks

Model specs (high level)

  • Parameters: 130B (dense)
  • Pre-training corpus: ~400B tokens (balance of English and Chinese)
  • Objective / architecture: GLM pretraining objective (autoregressive blank infilling); within the GLM family this supports both left-to-right generation and infilling-style usage (see the prompt-format sketch after this list)
  • Checkpoint format: sharded weights (multiple tar shards); repo includes scripts to repackage for different GPU counts
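
The infilling-style usage mentioned above is driven by mask tokens in the prompt: in the GLM-130B generation demo, [MASK] marks a short blank for the model to fill in place, while [gMASK] at the end of a context asks for open-ended continuation. A minimal, purely illustrative sketch of the two prompt shapes (the repo's tokenizer and generate scripts handle the tokens themselves):

    # Illustrative GLM-style prompt construction (plain strings only).
    # [MASK]  -> short blank infilling: the model predicts the masked span in place
    # [gMASK] -> open-ended generation appended after the context
    # Actual token handling is done by the repo's tokenizer and generate scripts.

    def infilling_prompt(left: str, right: str = "") -> str:
        """Blank-infilling prompt: the model fills the [MASK] span."""
        return f"{left}[MASK]{right}"

    def generation_prompt(context: str) -> str:
        """Open-ended generation prompt: the model continues after [gMASK]."""
        return f"{context}[gMASK]"

    print(infilling_prompt("The capital of France is ", "."))
    print(generation_prompt("问题：冬天应该怎样御寒？ 答案："))  # "Q: How to keep warm in winter? A:"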

Training & data

  • Large bilingual dataset with explicit balancing between English and Chinese sources
  • Training challenges documented in the paper: loss spikes, divergence, and stability concerns — the authors describe the engineering mitigations they used (one is sketched after this list)
  • Trained at-scale with attention to cross-platform compatibility (NVIDIA, Hygon DCU, Ascend, Sunway)
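
One concrete mitigation from the paper is embedding-layer gradient shrink: the forward pass is unchanged, but the gradient reaching the word-embedding weights is scaled down (the paper uses alpha = 0.1), which damps the loss spikes observed during training. A minimal PyTorch sketch of the idea, not the repo's actual implementation:

    # Minimal PyTorch sketch of embedding-layer gradient shrink (alpha = 0.1 in
    # the GLM-130B paper). Forward values are identical; only the gradient that
    # flows into the embedding weights is scaled by alpha.
    import torch
    import torch.nn as nn

    class ShrunkEmbedding(nn.Module):
        def __init__(self, vocab_size: int, hidden_size: int, alpha: float = 0.1):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, hidden_size)
            self.alpha = alpha

        def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
            x = self.embed(token_ids)
            # x * alpha keeps a scaled gradient path; x.detach() * (1 - alpha)
            # restores the full forward value without contributing gradients.
            return x * self.alpha + x.detach() * (1.0 - self.alpha)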

Quantization & hardware

  • INT4 quantization techniques demonstrated with negligible performance loss for many tasks
  • Reference hardware for inference:
    • Large setup: single A100 (40G × 8) or V100 (32G × 8) server for standard (FP16) inference
    • Consumer-friendly with quantization: 4×RTX 3090 (24G) or 8×RTX 2080 Ti (11G)
  • Supports optimized inference engines (FasterTransformer, SAT) with reported speedups vs baseline
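
The arithmetic behind those hardware configurations is straightforward: 130B parameters at FP16 (2 bytes each) is roughly 260 GB of weights, which fits across 8×40 GB A100s, while INT4 (half a byte each) drops that to roughly 65 GB, which fits across 4×24 GB RTX 3090s with room left for activations. The sketch below shows only the basic idea of symmetric, per-row absmax INT4 weight quantization; the repo's quantized inference is more involved (weight packing and dequantization inside GPU kernels):

    # Illustrative symmetric per-row INT4 weight quantization (absmax scaling).
    # Sketch of the idea only; not the repo's actual quantization kernels.
    import torch

    def quantize_int4(weight: torch.Tensor):
        """Quantize a 2-D weight matrix row-wise to integers in [-7, 7]."""
        scale = weight.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 7.0
        q = torch.clamp(torch.round(weight / scale), -7, 7).to(torch.int8)
        return q, scale

    def dequantize_int4(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
        """Recover an approximate floating-point weight matrix."""
        return q.to(scale.dtype) * scale

    w = torch.randn(8, 16)
    q, s = quantize_int4(w)
    print("max abs error:", (w - dequantize_int4(q, s)).abs().max().item())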

Benchmarks & performance

  • LAMBADA, MMLU and other English benchmarks: competitive with GPT-3 (175B) and comparable large models of the period
  • Chinese benchmarks (CLUE, FewCLUE): substantial wins over contemporaneous Chinese models (reported large gains vs ERNIE TITAN 3.0 in zero-shot/few-shot settings)
  • Results across 30+ tasks are reproducible using the evaluation code provided in the repo

Usage examples (research-focused)

  • Zero-shot prompting for classification, QA, and summarization across English/Chinese inputs
  • Few-shot adaptation to new tasks via prompting, without fine-tuning (a minimal prompt-construction sketch follows this list)
  • Reproducible evaluation: use the repo’s scripts and checkpoint layout to run published benchmarks locally or on a cluster
  • Research experiments: analyze bilingual transfer, quantization effects, and training stability behaviors using provided logs and training configs
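
For the prompting-based adaptation above, a minimal sketch of assembling a few-shot prompt; the format (instruction, in-context examples, [MASK] slot for the verbalized label) is illustrative and not the exact template used by the repo's evaluation configs:

    # Illustrative few-shot prompt construction (not the repo's evaluation template).
    def build_few_shot_prompt(examples, query,
                              instruction="Classify the sentiment as positive or negative."):
        parts = [instruction]
        for text, label in examples:
            parts.append(f"Text: {text}\nLabel: {label}")
        # The model is asked to fill the final label slot.
        parts.append(f"Text: {query}\nLabel: [MASK]")
        return "\n\n".join(parts)

    demos = [
        ("The movie was fantastic.", "positive"),
        ("这部电影很无聊。", "negative"),  # "This movie was boring."
    ]
    print(build_few_shot_prompt(demos, "服务态度非常好。"))  # "The service was excellent."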

Limitations & license

  • License: research / non-commercial restrictions — check the repository license and access form before downloading weights
  • Not intended for unrestricted commercial deployment without separate agreements
  • Hardware and operational complexity: while more accessible than many 100B models, running or fine-tuning still requires multi-GPU setups and careful engineering
  • Safety & governance: as with any large LLM, GLM-130B can produce unsafe or biased outputs and should be used with appropriate guardrails in place

Lineage & related models

  • GLM-130B (THUDM, Tsinghua University) is an early open 100B-scale academic release that influenced subsequent GLM-family work
  • GLM-4 / GLM-4.x (Zhipu AI / Z.ai) are later, commercially driven GLM-family models with multimodal and agent-oriented variants (e.g., GLM-4.5, GLM-4.6, GLM-4.5V). These are separate projects that share the GLM name and lineage but are developed and licensed independently of the GLM-130B research release

Pricing / access

  • Weights: free for approved research use (subject to the repo license and access form)
  • Hosted APIs / commercial access: third-party providers and downstream vendors may charge for hosted GLM-based services (pricing varies); commercial license for the original repo requires negotiation

Practical tips

  • When downloading weights, follow the repo’s recommended sharding / unpacking steps and set CHECKPOINT_PATH to the outermost extracted directory
  • If you change the target GPU count from the repo default, repackage shards using the provided scripts
  • Validate quantized models on your target tasks — INT4 works well in many cases, but always test for edge-case degradation (a quick comparison harness is sketched after this list)
  • Reproduce a published evaluation (one of the repo’s benchmark scripts) as a verification step after setup
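
For the quantization check mentioned above, a small comparison harness is often enough to catch gross regressions before running full benchmarks. The generation functions below are placeholders for whatever inference entry point you use (the repo's scripts, an HTTP server, etc.), not APIs from the GLM-130B repo; exact string matches are a crude signal, so follow up with real task metrics:

    # Hypothetical sanity-check harness: compare FP16 vs INT4 outputs on a small
    # probe set. `generate_fp16` / `generate_int4` are caller-supplied callables,
    # not functions from the GLM-130B repo.
    def compare_quantized(generate_fp16, generate_int4, prompts):
        diverged = []
        for prompt in prompts:
            a, b = generate_fp16(prompt), generate_int4(prompt)
            if a.strip() != b.strip():
                diverged.append((prompt, a, b))
        print(f"{len(diverged)}/{len(prompts)} prompts diverged between FP16 and INT4")
        return diverged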

Links

  • GLM-130B GitHub: https://github.com/THUDM/GLM-130B
  • GLM-130B paper (ICLR 2023): arXiv:2210.02414
  • GLM-family ecosystem: Zhipu AI / Z.ai GLM-4 releases (GLM-4.5, GLM-4.6, GLM-4.5V) — separate projects with different licenses and hosting options