GPT-1
OpenAI’s first generative pre-trained transformer model.
Overview
GPT-1 was the foundational model that launched the GPT series. It demonstrated that a transformer language model pre-trained on unlabeled text could be fine-tuned to perform well on a range of language understanding tasks.
Key Information
- Released: June 2018
- Model Size: 117 million parameters
- Architecture: Decoder-only transformer with masked (causal) self-attention
- Training: Unsupervised generative pre-training on a large text corpus (BooksCorpus), followed by supervised fine-tuning on each downstream task
- Significance: Proof of concept that generative pre-training of a transformer language model, followed by task-specific fine-tuning, improves performance on natural language understanding tasks
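The 117 million figure can be roughly reproduced from the architecture. The sketch below assumes the configuration described in the GPT-1 paper (12 decoder blocks, 768-dimensional hidden states, 3,072-dimensional feed-forward layers, a 512-token context) and a 40,478-entry vocabulary as used by the publicly released checkpoint. None of these values appear in the list above, and the helper names are hypothetical.

```python
# Rough parameter count for a GPT-1-style decoder-only transformer.
# Assumed hyperparameters (not stated in this article): 12 layers,
# d_model=768, d_ff=3072, vocabulary of 40,478 tokens, 512 positions.


def transformer_block_params(d_model: int, d_ff: int) -> int:
    """Count weights and biases in one decoder block."""
    attn = 3 * d_model * d_model + 3 * d_model   # Q, K, V projections
    attn += d_model * d_model + d_model          # attention output projection
    ffn = d_model * d_ff + d_ff                  # feed-forward expansion layer
    ffn += d_ff * d_model + d_model              # feed-forward projection back
    norms = 2 * (2 * d_model)                    # two LayerNorms (gain + bias)
    return attn + ffn + norms


def gpt1_params(n_layers: int = 12, d_model: int = 768, d_ff: int = 3072,
                vocab: int = 40478, n_ctx: int = 512) -> int:
    """Total parameters, assuming the output layer reuses the token embeddings."""
    token_emb = vocab * d_model   # token embedding matrix (tied with output layer)
    pos_emb = n_ctx * d_model     # learned position embeddings
    blocks = n_layers * transformer_block_params(d_model, d_ff)
    return token_emb + pos_emb + blocks


total = gpt1_params()
print(f"{total:,}")  # 116,534,784
```

Under these assumptions the count comes to about 116.5 million, consistent with the commonly quoted 117 million. The output layer adds no parameters here because, as in the paper, it reuses the token-embedding matrix.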
Historical Impact
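GPT-1 established the transformer-based pre-train-then-fine-tune paradigm that would dominate NLP in the years that followed. It showed that a single pre-trained model could be effectively fine-tuned for diverse downstream tasks such as natural language inference, question answering, semantic similarity, and text classification; in the original paper, this approach improved on the previous state of the art in 9 of the 12 tasks studied.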