Adaptive Local-then-Global Token Merging for Efficient Semantic Segmentation with Plain Vision Trans