DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference
Ji Xin, Raphael Tang, Jaejun Lee, Yaoliang Yu, Jimmy Lin
NLP Applications Short Paper
Session 4A: Jul 6 (17:00-18:00 GMT)
Session 5B: Jul 6 (21:00-22:00 GMT)
Abstract:
Large-scale pre-trained language models such as BERT have brought significant improvements to NLP applications. However, they are also notorious for being slow in inference, which makes them difficult to deploy in real-time applications. We propose a simple but effective method, DeeBERT, to accelerate BERT inference. Our approach allows samples to exit earlier without passing through the entire model. Experiments show that DeeBERT is able to save up to ~40% of inference time with minimal degradation in model quality. Further analyses show different behaviors across BERT's transformer layers and also reveal their redundancy. Our work provides new ideas for efficiently applying deep transformer-based models to downstream tasks. Code is available at https://github.com/castorini/DeeBERT.
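The abstract describes letting a sample exit before it has passed through every layer. The sketch below illustrates that idea under stated assumptions: a small classifier (an "off-ramp") is attached after each layer, and a sample exits at the first off-ramp whose predicted class distribution has entropy below a chosen threshold. The function names, the toy layers and off-ramps, and the threshold values are hypothetical and for illustration only; this is not the code from the linked repository.

```python
import math
from typing import Callable, List, Sequence, Tuple


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def early_exit_predict(
    hidden,
    layers: List[Callable],     # hypothetical: one callable per layer
    off_ramps: List[Callable],  # hypothetical: one classifier per layer, returns class probabilities
    threshold: float,           # assumed exit criterion: entropy below this value
) -> Tuple[int, int]:
    """Run layers in order and query the off-ramp after each one.

    Stops as soon as an off-ramp's prediction is confident enough
    (entropy < threshold); otherwise uses the last layer's prediction.
    Assumes `layers` is non-empty. Returns (predicted_label, layers_used).
    """
    probs: Sequence[float] = []
    depth = 0
    for depth, (layer, ramp) in enumerate(zip(layers, off_ramps), start=1):
        hidden = layer(hidden)
        probs = ramp(hidden)
        if entropy(probs) < threshold:
            break  # confident enough: skip the remaining layers
    label = max(range(len(probs)), key=lambda i: probs[i])
    return label, depth


# Toy usage: the "hidden state" is just a layer counter, and each
# off-ramp returns a fixed distribution that grows more confident with depth.
dists = [[0.5, 0.5], [0.6, 0.4], [0.97, 0.03], [0.99, 0.01]]
toy_layers = [lambda h: h + 1] * 4
toy_ramps = [lambda h: dists[h - 1]] * 4
print(early_exit_predict(0, toy_layers, toy_ramps, threshold=0.2))
```

In this sketch the threshold is the speed/quality knob: a larger threshold lets more samples exit at shallow layers (faster, possibly less accurate), while a threshold of 0 forces every sample through all layers.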
Similar Papers
MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices
Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, Denny Zhou
data:image/s3,"s3://crabby-images/b972a/b972adeb97c6d3a3b444bb259ce76b3e6475d3a6" alt="A representative figure from paper main.195"
DeFormer: Decomposing Pre-trained Transformers for Faster Question Answering
Qingqing Cao, Harsh Trivedi, Aruna Balasubramanian, Niranjan Balasubramanian
data:image/s3,"s3://crabby-images/deec8/deec865e6483e92a8bc987a35359cf52a6a9b020" alt="A representative figure from paper main.411"
Fast and Accurate Deep Bidirectional Language Representations for Unsupervised Learning
Joongbo Shin, Yoonhyung Lee, Seunghyun Yoon, Kyomin Jung
data:image/s3,"s3://crabby-images/5c34e/5c34e8a7db47af6c21cd0b7af6ba5e5c2501cf2e" alt="A representative figure from paper main.76"
FastBERT: a Self-distilling BERT with Adaptive Inference Time
Weijie Liu, Peng Zhou, Zhiruo Wang, Zhe Zhao, Haotang Deng, Qi Ju
data:image/s3,"s3://crabby-images/8e194/8e1949917c4efa3363f80b81ef79fe4fab3e05ab" alt="A representative figure from paper main.537"