Day 4 Balancing Cost and Quality in FMware, Kirill Vasilevski
AI Summary
Balancing Cost and Quality in FM Deployments
Introduction
- Speaker: Kirill Vasilevski, Applied AI Researcher
- Focus: Real-world considerations for deploying Foundation Models (FM).
Choosing the Correct Foundation Model (FM)
- Considerations in selecting FMs for software applications.
- Key factors:
- Cost vs. Quality: FMs from hosted providers such as OpenAI and AWS can be expensive.
- Model capabilities and licensing.
- Various models available (700,000+ on Hugging Face).
Deploying FMs with FMware
- Techniques to reduce costs and improve decision-making.
Key Methods for FM Deployment
- Model Enhancement: Improving model performance through:
- Parameter-level methods (fine-tuning, reinforcement learning from human feedback).
- Architectural and prompt-level methods (e.g., mixture of experts, prompt engineering).
- How effective each method is depends on tuning specific to the model and context.
- Synthesis: Combining outputs from multiple models to improve results.
- Example: LLM-Blender ranks candidate outputs from multiple models and fuses the top-ranked ones.
- Computationally intensive, since every candidate model must run, which can add latency.
- Routing: Selecting models based on input:
- Predictive vs. Non-predictive routing.
- Non-predictive: Query models one after another until one returns a satisfactory result.
- Predictive: Choose a model up front based on input characteristics, avoiding the cost of trial inferences.
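The two routing styles above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the speaker's implementation: the `Model` wrapper, the per-call costs, the stub models, the cheapest-first ordering, the `is_satisfactory` check, and the `predict_difficulty` heuristic are all hypothetical stand-ins for real FM calls and a trained classifier.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Model:
    """Hypothetical FM wrapper: a name, a per-call cost, and a callable."""
    name: str
    cost: float
    generate: Callable[[str], str]


def cascade_route(prompt: str, models: List[Model],
                  is_satisfactory: Callable[[str, str], bool]) -> Tuple[str, str, float]:
    """Non-predictive routing: call models cheapest-first until one passes."""
    if not models:
        raise ValueError("at least one model is required")
    total_cost = 0.0
    name, answer = "", ""
    for model in sorted(models, key=lambda m: m.cost):
        name, answer = model.name, model.generate(prompt)
        total_cost += model.cost  # every attempt is paid for
        if is_satisfactory(prompt, answer):
            break
    # If nothing passed, this is the last (most expensive) attempt.
    return name, answer, total_cost


def predictive_route(prompt: str, models: List[Model],
                     predict_difficulty: Callable[[str], float],
                     threshold: float = 0.5) -> Tuple[str, str, float]:
    """Predictive routing: pick one model from the input alone, call it once."""
    if not models:
        raise ValueError("at least one model is required")
    ordered = sorted(models, key=lambda m: m.cost)
    hard = predict_difficulty(prompt) >= threshold
    chosen = ordered[-1] if hard else ordered[0]
    return chosen.name, chosen.generate(prompt), chosen.cost


# Stub models (assumptions): the small one only handles short prompts.
small = Model("small", 1.0, lambda p: "short answer" if len(p.split()) <= 5 else "")
large = Model("large", 10.0, lambda p: f"detailed answer to: {p}")
MODELS = [large, small]


def non_empty(prompt: str, answer: str) -> bool:
    """Toy quality check (assumption): any non-empty answer is acceptable."""
    return len(answer) > 0


def word_count_difficulty(prompt: str) -> float:
    """Toy difficulty predictor (assumption): longer prompts are harder."""
    return min(len(prompt.split()) / 10, 1.0)


EASY = "What is 2+2?"
HARD = "Explain the trade-offs between cascade routing and predictive routing in detail"

if __name__ == "__main__":
    for prompt in (EASY, HARD):
        print(cascade_route(prompt, MODELS, non_empty))
        print(predictive_route(prompt, MODELS, word_count_difficulty))
```

In this toy setup the cascade pays for both models on the hard prompt, while the predictive router pays only for the large one. The catch is that the predictive router is only as good as its difficulty predictor, which in practice would be a classifier trained on labeled examples.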
Challenges in Deployment
- Predictive routers depend on the quality of the datasets used to train their classifiers.
- Updating the available models is complex and can require retraining.
- Achieving good generalization across different tasks.
Conclusion
- Mix and match methods for optimal model deployment.
- No one-size-fits-all solution; considerations vary based on application and environment.
- Treat the FM as a black box: optimize how it is selected and used while maintaining output quality.