[Notes] Understanding XCiT - Part 1

Overview XCiT: Cross-Covariance Image Transformers [1] is a paper from Facebook AI that proposes a “transposed” version of self-attention that operates across feature channels rather than tokens. This cross-covariance attention has linear complexity in the number of tokens (the original self-attention is quadratic). When applied to images as in vision transformers, the linear complexity allows the model to process higher-resolution images and split them into smaller patches, both of which are shown to improve performance. ...
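To make the “transposed” attention concrete, here is a minimal single-head PyTorch sketch of the idea (not the paper's exact implementation; it omits the learned temperature, multiple heads, and the other XCiT blocks). The attention map is computed between channels, so its size does not grow with the number of tokens:

```python
import torch
import torch.nn.functional as F

def cross_covariance_attention(q, k, v):
    """Attention over channels instead of tokens (simplified sketch).

    q, k, v: tensors of shape (batch, n_tokens, dim).
    The attention map is (dim x dim), so cost grows linearly with n_tokens.
    """
    # L2-normalize along the token axis so channel-channel products are bounded.
    q = F.normalize(q, dim=1)
    k = F.normalize(k, dim=1)
    attn = (k.transpose(1, 2) @ q).softmax(dim=-1)   # (batch, dim, dim)
    return v @ attn                                  # (batch, n_tokens, dim)

tokens = torch.randn(2, 196, 64)   # e.g. 14x14 patches with 64 channels
out = cross_covariance_attention(tokens, tokens, tokens)
print(out.shape)  # torch.Size([2, 196, 64])
```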

July 24, 2021 · Ceshine Lee

[Tensorflow] Training CV Models on TPU without Using Cloud Storage

Introduction Recently I was asked this question (paraphrasing): I have a small image dataset that I want to train a model on using Google Colab and its free TPU. Is there a way to do that without having to upload the dataset as TFRecord files to Cloud Storage? First of all, if your dataset is small, I’d say training on a GPU wouldn’t be much slower than on a TPU. But they were adamant that they wanted to see how fast training on a TPU can be. That’s fine, and the answer is yes, there is a way to do that. ...
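The general trick is to keep the dataset in host memory and feed the TPU from a tf.data pipeline built from tensors rather than from TFRecord files on Cloud Storage. The sketch below is a hedged illustration of that idea (not the exact code from the post), assuming a recent TF 2.x Colab TPU runtime; `x_train`/`y_train` and the tiny model are placeholders:

```python
import numpy as np
import tensorflow as tf

# Placeholder in-memory dataset (swap in your own arrays).
x_train = np.random.rand(1000, 32, 32, 3).astype("float32")
y_train = np.random.randint(0, 10, size=(1000,))

# Connect to the Colab TPU.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

# Build the pipeline from in-memory tensors -- no TFRecords or Cloud Storage.
dataset = (
    tf.data.Dataset.from_tensor_slices((x_train, y_train))
    .shuffle(1024)
    .batch(128, drop_remainder=True)   # TPUs prefer static batch shapes
    .prefetch(tf.data.AUTOTUNE)
)

with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(32, 32, 3)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )

model.fit(dataset, epochs=2)
```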

October 11, 2020 · Ceshine Lee

Self-Supervised Domain Adaptation

Introduction Self-supervised learning made transfer learning possible in NLP [1] (by using language modeling as the pre-training task) and has started to show some potential in CV as well [2, 3, 4]. These methods make downstream tasks more label efficient, that is, they require fewer labeled examples to achieve good prediction accuracy. In CV, we are already quite familiar with transfer learning from models pre-trained on the labeled ImageNet dataset. However, if the dataset used in the downstream task is significantly different from ImageNet, transfer learning/fine-tuning usually would not be very helpful. ...

July 6, 2020 · Ceshine Lee

[Learning Note] Single Shot MultiBox Detector with Pytorch — Part 3

(Reminder: the SSD paper and the Pytorch implementation used in this post. Also, the first and second parts of the series.) Training Objective / Loss Function Every deep learning / neural network model needs a differentiable objective function to learn from. After pairing ground truths with default boxes and marking the remaining default boxes as background, we’re ready to formulate the objective function of SSD: Overall Objective — Formula (1) from the original paper ...
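For reference, Formula (1) combines a confidence loss and a localization loss, L(x, c, l, g) = (1/N) (L_conf(x, c) + α L_loc(x, l, g)), normalized by the number N of matched default boxes. Below is a minimal PyTorch sketch of that combination; it deliberately ignores hard negative mining and the offset encoding that the post walks through, so it is an illustration rather than the repository's implementation:

```python
import torch
import torch.nn.functional as F

def multibox_loss(cls_logits, loc_preds, cls_targets, loc_targets, alpha=1.0):
    """Simplified SSD objective: (L_conf + alpha * L_loc) / N.

    cls_logits:  (num_boxes, num_classes) class scores per default box
    loc_preds:   (num_boxes, 4) predicted box offsets
    cls_targets: (num_boxes,) target class per box, 0 = background
    loc_targets: (num_boxes, 4) encoded ground-truth offsets
    """
    pos = cls_targets > 0                # matched (non-background) default boxes
    num_pos = pos.sum().clamp(min=1)     # N; avoid division by zero

    # Localization loss: Smooth L1 over the positive boxes only.
    loc_loss = F.smooth_l1_loss(loc_preds[pos], loc_targets[pos], reduction="sum")

    # Confidence loss: softmax cross-entropy over all boxes
    # (the real implementation applies hard negative mining here).
    conf_loss = F.cross_entropy(cls_logits, cls_targets, reduction="sum")

    return (conf_loss + alpha * loc_loss) / num_pos
```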

July 27, 2017 · Ceshine Lee

[Learning Note] Single Shot MultiBox Detector with Pytorch — Part 2

In the previous post we discussed the network structure and the prediction scheme of SSD. Now we move on to combining default boxes with the ground truth, so that the quality of the predictions can be determined (and improved via training). (Reminder: the SSD paper and the Pytorch implementation used in this post.) Map Default Boxes to Coordinates on Input Images Parameters of the default boxes for each feature map are pre-calculated and hard-coded in data/config.py: ...
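To make the mapping concrete, here is a hedged sketch of how per-feature-map default box centers and sizes are typically converted into coordinates on the input image. The scale and aspect ratios below are illustrative placeholders, not the values hard-coded in the repository's data/config.py:

```python
from itertools import product
from math import sqrt

def default_boxes(feature_map_size, scale, aspect_ratios):
    """Generate default boxes as (cx, cy, w, h) in relative [0, 1] coordinates
    for one square feature map; multiply by the input size (e.g. 300) for pixels."""
    boxes = []
    f = feature_map_size
    for i, j in product(range(f), repeat=2):
        cx, cy = (j + 0.5) / f, (i + 0.5) / f   # center of cell (i, j)
        for ar in aspect_ratios:
            boxes.append((cx, cy, scale * sqrt(ar), scale / sqrt(ar)))
    return boxes

# Illustrative values only: a 38x38 feature map with scale 0.1.
boxes = default_boxes(feature_map_size=38, scale=0.1, aspect_ratios=(1.0, 2.0, 0.5))
print(len(boxes), boxes[0])  # 4332 boxes; the first sits near the top-left corner
```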

July 26, 2017 · Ceshine Lee