Adaptive Local-then-Global Token Merging for Efficient Semantic Segmentation with Plain Vision Trans

Last updated 1 year ago