GPT-1

OpenAI’s first generative pre-trained transformer model.

Overview

GPT-1 was the first model in the GPT series. It demonstrated that a transformer pre-trained as a language model on unlabeled text could then be fine-tuned to perform well on a range of language understanding tasks.

Key Information

  • Released: June 2018
  • Model Size: 117 million parameters
  • Architecture: Decoder-only transformer
  • Training: Unsupervised pre-training on a large text corpus (BooksCorpus), followed by supervised fine-tuning on each target task
  • Significance: Proof of concept that a generatively pre-trained transformer could be adapted to downstream language tasks with only minor task-specific changes
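
As a rough sanity check on the model size listed above, the parameter count can be reconstructed from the architecture's hyperparameters. The sketch below assumes the commonly reported GPT-1 configuration (12 layers, 768-dimensional hidden states, a 3072-dimensional feed-forward layer, 512 positions) and a vocabulary of 40,478 tokens as in the publicly released weights. It also assumes bias terms on every linear layer, two layer norms per block, and an output layer that shares weights with the token embedding. The function name gpt1_param_count is hypothetical and used only for illustration.

```python
def gpt1_param_count(n_layer=12, d_model=768, d_ff=3072,
                     vocab_size=40478, n_ctx=512):
    """Estimate the parameter count of a GPT-1-style decoder-only transformer.

    All default values are assumptions based on the commonly reported
    GPT-1 configuration, not figures taken from this document.
    """
    # Token embeddings plus learned position embeddings.
    # The output projection is assumed to reuse the token embedding matrix.
    embeddings = vocab_size * d_model + n_ctx * d_model

    # Self-attention: fused query/key/value projection, then output projection
    # (weights plus biases for each).
    attention = (d_model * 3 * d_model + 3 * d_model) + (d_model * d_model + d_model)

    # Position-wise feed-forward network: expand to d_ff, project back to d_model.
    feed_forward = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)

    # Two layer norms per block, each with a scale and a shift vector.
    layer_norms = 2 * (2 * d_model)

    per_layer = attention + feed_forward + layer_norms
    return embeddings + n_layer * per_layer


total = gpt1_param_count()
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")
```

Under these assumptions the estimate comes to 116,534,784 parameters, which is consistent with the figure of roughly 117 million usually quoted for the model.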

Historical Impact

GPT-1 helped establish the transformer-based pre-train-then-fine-tune paradigm that went on to dominate NLP in the years that followed. It showed that a single large pre-trained model could be fine-tuned effectively for diverse downstream tasks.

See Also