[Notes] Understanding XCiT - Part 1
Overview

XCiT: Cross-Covariance Image Transformers [1] is a paper from Facebook AI that proposes a “transposed” version of self-attention that operates across feature channels rather than tokens. This cross-covariance attention has linear complexity in the number of tokens (the original self-attention has quadratic complexity). When applied to images as in vision transformers, this linear complexity lets the model process higher-resolution images and split them into smaller patches, both of which are shown to improve performance. ...
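To make the complexity difference concrete, here is a minimal NumPy sketch (not the paper's implementation) contrasting standard self-attention, whose attention map is N × N over tokens, with a cross-covariance-style attention whose map is d × d over feature channels. The ℓ2-normalization of queries and keys and the fixed temperature `tau` are simplified stand-ins for what the paper describes.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(q, k, v):
    # Standard self-attention: the attention map is (N, N),
    # so cost grows quadratically with the number of tokens N.
    n, d = q.shape
    attn = softmax(q @ k.T / np.sqrt(d), axis=-1)   # (N, N)
    return attn @ v                                 # (N, d)

def cross_covariance_attention(q, k, v, tau=1.0):
    # "Transposed" attention: the attention map is (d, d), built from
    # the cross-covariance of keys and queries, so cost is linear in N.
    # L2-normalization along the token dimension and a fixed temperature
    # are simplifications of the paper's description (assumptions here).
    q = q / np.linalg.norm(q, axis=0, keepdims=True)
    k = k / np.linalg.norm(k, axis=0, keepdims=True)
    attn = softmax((k.T @ q) / tau, axis=-1)        # (d, d)
    return v @ attn                                 # (N, d)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_tokens, dim = 196, 64   # e.g. 14x14 patches, 64-dim features
    q, k, v = (rng.standard_normal((n_tokens, dim)) for _ in range(3))
    print(self_attention(q, k, v).shape)              # (196, 64)
    print(cross_covariance_attention(q, k, v).shape)  # (196, 64)
```

The point to notice is the shape of the attention matrix: (N, N) in the first function versus (d, d) in the second, which is why the second scales linearly as the number of tokens grows.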