"Tiny Models AI" in Transformers: An approach to very small models, under 1 billion parameters.
Abstract: In recent times, the deep learning community has been inundated with large-scale models boasting billions of parameters. While these models demonstrate impressive results, their computational requirements often exceed what everyday applications can support. Our company has shifted its focus towards "Tiny Models": Transformer architectures designed with 10 to 50 million parameters. By training these models on datasets of 5 million to 100 million tokens, we aim to deliver strong performance on narrow, domain-specific tasks. In this study, we demonstrate the effectiveness of these models at predicting HVAC failure types from customer calls.
1. Introduction
Transformer architectures have, since their inception, shown exceptional results across a wide range of Natural Language Processing tasks. Traditionally, their success has been associated with scale: the bigger, the better. However, we argue that there exists a niche where smaller models, when trained effectively, can provide competitive, if not superior, results for specific domains. Our emphasis on "Tiny Models" is a nod not just to their reduced size but also to their focused efficacy.
2. Model Architecture
Our initial architecture is an encoder-only Transformer with the following notable characteristics:
- Embedding: a token embedding table over a 30,522-token vocabulary with 256 dimensions.
- Positional Encoding: Standard encoding to capture the order of tokens.
- Transformer Blocks: four blocks, each with multi-head attention (8 heads) and an MLP with GELU activation. A dropout rate of 0.4 is used to prevent overfitting.
- Classification Head: originally sized to the 30,522-token vocabulary for pre-training; replaced with a 29-class head during fine-tuning (Section 4).
This model totals roughly 18 million parameters, placing it squarely in our defined "Tiny Model" range; a minimal code sketch of the configuration follows.
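To make this configuration concrete, below is a minimal PyTorch sketch of an encoder-only Transformer with the hyperparameters listed above (30,522-token vocabulary, 256-dimensional embeddings, four blocks, 8 attention heads, GELU MLPs, dropout 0.4). It is illustrative rather than our production code; the maximum sequence length, the use of learned positional embeddings, and the 4x MLP expansion are assumptions not stated above.

```python
import torch
import torch.nn as nn

VOCAB_SIZE = 30522   # matches the embedding / pre-training head size stated above
D_MODEL    = 256     # embedding dimension
N_BLOCKS   = 4       # Transformer encoder blocks
N_HEADS    = 8       # attention heads per block
DROPOUT    = 0.4     # dropout rate
MAX_LEN    = 512     # assumed maximum sequence length (not stated above)

class TinyEncoder(nn.Module):
    def __init__(self, num_outputs: int = VOCAB_SIZE):
        super().__init__()
        self.tok_emb = nn.Embedding(VOCAB_SIZE, D_MODEL)
        self.pos_emb = nn.Embedding(MAX_LEN, D_MODEL)  # learned positions; a fixed sinusoidal table also works
        block = nn.TransformerEncoderLayer(
            d_model=D_MODEL,
            nhead=N_HEADS,
            dim_feedforward=4 * D_MODEL,  # assumed 4x MLP expansion
            dropout=DROPOUT,
            activation="gelu",
            batch_first=True,
        )
        self.encoder = nn.TransformerEncoder(block, num_layers=N_BLOCKS)
        # Output head: vocabulary-sized for pre-training, swapped for a 29-class head later.
        self.head = nn.Linear(D_MODEL, num_outputs)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        x = self.tok_emb(token_ids) + self.pos_emb(positions)
        x = self.encoder(x)   # (batch, seq_len, d_model)
        return self.head(x)   # per-token logits

model = TinyEncoder()
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.1f}M parameters")  # ~18.9M with these assumptions, close to the ~18M reported above
```

With these assumptions the parameter count lands near 19 million; the exact figure depends on the MLP width and maximum sequence length.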
3. Pre-Training
The model was pre-trained with a self-supervised masked language modeling objective on a dataset derived from Wikipedia, consisting of 150K articles. After three epochs, we observed the following metrics (a brief check relating the average loss to the reported perplexity follows the list):
- Loss: 3.6454
- Accuracy: 41.38%
- Average Loss: 3.6865
- Perplexity: 39.9068
- Top-5 Accuracy: 58.62%
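As a consistency check, the reported perplexity follows directly from the reported average loss: for a masked language modeling objective, perplexity is the exponential of the mean cross-entropy over masked tokens.

```python
import math

avg_loss = 3.6865                # average masked-LM cross-entropy reported above
perplexity = math.exp(avg_loss)  # perplexity = exp(mean cross-entropy)
print(f"{perplexity:.4f}")       # ~39.905, matching the reported 39.9068 up to rounding of the logged loss
```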
4. Fine-Tuning for HVAC Failure Prediction
After pre-training, a new instance of the model was created, with the output head replaced by a classification head predicting 29 classes representing different HVAC failure types.
The model was then fine-tuned on a synthetic dataset containing 35K samples of customer calls reporting issues with their HVAC systems. For context, here is a sample call:
TEXT: "There seems to be a problem with the air circulation in our facility...it's definitely affecting everyone's comfort and productivity."
FAILURE TYPE: Airflow Issues
After 40 epochs of training, we achieved an accuracy of 69.17%. Notably, when considering the model's top-2 predictions, the accuracy surged to 83.90%.
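The following is a minimal sketch of this fine-tuning setup, reusing the TinyEncoder sketched in Section 2 with its vocabulary-sized head swapped for a 29-way classification head. The mean pooling over token representations, the AdamW optimizer, and the learning rate are illustrative assumptions, not our exact training recipe; the tokenizer referenced in the comments is likewise an assumption.

```python
import torch
import torch.nn as nn

NUM_FAILURE_TYPES = 29  # HVAC failure classes

class TinyClassifier(nn.Module):
    """Wraps the pre-trained TinyEncoder (Section 2 sketch) with a 29-class head."""
    def __init__(self, pretrained: TinyEncoder):
        super().__init__()
        self.backbone = pretrained           # pre-trained embeddings + encoder blocks
        self.backbone.head = nn.Identity()   # drop the vocabulary-sized LM head
        self.classifier = nn.Linear(D_MODEL, NUM_FAILURE_TYPES)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        hidden = self.backbone(token_ids)    # (batch, seq_len, d_model)
        pooled = hidden.mean(dim=1)          # assumed: mean pooling over tokens
        return self.classifier(pooled)       # (batch, 29) failure-type logits

# Illustrative fine-tuning step; optimizer and learning rate are assumptions.
clf = TinyClassifier(model)
optimizer = torch.optim.AdamW(clf.parameters(), lr=3e-5)
loss_fn = nn.CrossEntropyLoss()

def train_step(token_ids: torch.Tensor, labels: torch.Tensor) -> float:
    optimizer.zero_grad()
    loss = loss_fn(clf(token_ids), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

# Inference on a call transcript. The tokenizer is an assumption: the 30,522-token
# vocabulary matches BERT's WordPiece vocabulary (e.g. Hugging Face "bert-base-uncased").
#   ids = tokenizer("There seems to be a problem with the air circulation...",
#                   return_tensors="pt")["input_ids"]
#   predicted_failure_type = clf(ids).argmax(dim=-1)
```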
5. Discussion
Our results indicate the potential of "Tiny Models" to deliver impressive outcomes on specialized tasks. By pre-training on a general corpus and then fine-tuning on domain-specific data, we can leverage the power of Transformers without the computational overhead associated with large-scale models.
Furthermore, the substantial increase in top-2 accuracy suggests that, while the model may not always pinpoint the exact failure type, it often comes close. This is particularly valuable in real-world scenarios where a close second guess can still guide effective problem-solving.
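For completeness, the top-2 figure reported in Section 4 corresponds to a standard top-k accuracy over the class logits; a generic sketch (not our evaluation code) is shown below.

```python
import torch

def top_k_accuracy(logits: torch.Tensor, labels: torch.Tensor, k: int = 2) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    topk = logits.topk(k, dim=-1).indices              # (batch, k) predicted classes
    hits = (topk == labels.unsqueeze(-1)).any(dim=-1)  # is the true label anywhere in the top k?
    return hits.float().mean().item()

# Example with random logits over 29 failure types
logits = torch.randn(8, 29)
labels = torch.randint(0, 29, (8,))
print(top_k_accuracy(logits, labels, k=2))
```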
6. Conclusion
In the ever-evolving landscape of NLP, there remains a significant place for compact models, especially in domain-specific applications. Our "Tiny Models" initiative confirms that, with the right training strategy, smaller Transformer models can still be giants in their domain-specific tasks.
Acknowledgements: We would like to thank our dedicated team of researchers and engineers who have worked tirelessly on the "Tiny Models" project. We also express our gratitude to the open-source community for their continuous contributions which have been invaluable to our research.