Speed Up Your Python Scripts with Rust: A Levenshtein Distance Case Study

Cover image generated by Nano Banana. Disclaimer: A 50x speedup is not guaranteed; actual performance depends on the nature of the dataset and the hardware on which the code is run. Please refer to the Benchmarks section below for more information. Introduction Recently, I finally found some time to learn the Rust programming language. I find its memory-safety guarantee quite elegant, although it comes with the trade-off of a steep learning curve, especially when it comes to Rust’s ownership and lifetime system. Rust is very appealing to someone like me, who mainly uses a scripting language and writes low-level code only from time to time; in those circumstances, writing C/C++ can easily lead to unstable runtime behavior or unexpected results. ...
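
For reference, here is a minimal pure-Python dynamic-programming implementation of Levenshtein distance; it is a generic baseline sketch, not the exact code benchmarked in the post:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic two-row dynamic-programming edit distance (insert/delete/substitute)."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string, keep rows sized by the shorter one
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

print(levenshtein("kitten", "sitting"))  # 3
```

Nested Python loops like these are exactly the kind of hot path that tends to benefit from being rewritten in a compiled language.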

December 21, 2025 · Ceshine Lee

Building Gemini CLI Usage Analyzer

Gemini CLI Usage Analyzer Project Banner. Introduction Last week, I developed a lightweight command-line tool for analyzing Gemini CLI token usage and open-sourced it on GitHub. You can find the project at ceshine/gemini-cli-usage-analyzer. This post outlines why I built the tool, the technical challenges encountered during development, and the solutions implemented to resolve them. Note: Currently, the tool focuses on single-project analysis. Unlike Claude Code, which centralizes logs (e.g., in ~/.claude on Linux) to analyze overall cross-project usage by default, Gemini CLI lacks a built-in mechanism for unified log management across different projects. Support for aggregating statistics across multiple projects is on the development roadmap. ...

December 6, 2025 · Ceshine Lee

[Notes] MaxViT: Multi-Axis Vision Transformer

Photo Credit MaxViT: Multi-Axis Vision Transformer [1] is a paper jointly produced by Google Research and the University of Texas at Austin in 2022. The paper proposes a new attention mechanism, named multi-axis attention, which comprises a blocked local attention module and a dilated global attention module. In addition, the paper introduces the MaxViT architecture, which combines multi-axis attention with convolutions and is highly effective on ImageNet benchmarks and downstream tasks. Multi-Axis Attention Source: [2] ...
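
As a rough illustration of the two axes, the sketch below partitions a feature map into non-overlapping local windows (blocked local attention) and into a strided global grid (dilated global attention); the window/grid sizes and tensor layout are my own assumptions for a toy example, not the paper's exact configuration:

```python
import torch

def block_partition(x: torch.Tensor, p: int) -> torch.Tensor:
    """Split (B, H, W, C) into non-overlapping p x p windows -> (num_windows, p*p, C).
    Self-attention inside each window stays local ("blocked local" attention)."""
    B, H, W, C = x.shape
    x = x.view(B, H // p, p, W // p, p, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, p * p, C)

def grid_partition(x: torch.Tensor, g: int) -> torch.Tensor:
    """Split (B, H, W, C) into a strided g x g grid -> (num_groups, g*g, C).
    Each group mixes tokens spread across the whole map ("dilated global" attention)."""
    B, H, W, C = x.shape
    x = x.view(B, g, H // g, g, W // g, C)
    return x.permute(0, 2, 4, 1, 3, 5).reshape(-1, g * g, C)

x = torch.randn(2, 16, 16, 64)       # toy (B, H, W, C) feature map
print(block_partition(x, 4).shape)   # torch.Size([32, 16, 64])
print(grid_partition(x, 4).shape)    # torch.Size([32, 16, 64])
```

In each case, a standard self-attention block would then operate over the second dimension, i.e., the 16 tokens in each group.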

July 16, 2023 · Ceshine Lee

[Notes] PolyLoss: A Polynomial Expansion Perspective of Classification Loss Functions

Photo Credit Introduction Recall that a one-dimensional Taylor series is an expansion of a real function $f(x)$ about a point $x = a$ [2]: $$f(x) = f(a) + f'(a)(x-a) + \frac{f''(a)}{2!}(x-a)^2 + \cdots + \frac{f^{(n)}(a)}{n!}(x-a)^n + \cdots$$ We can approximate the cross-entropy loss using the Taylor series (a.k.a. Taylor expansion) with $a = 1$: $$f(x) = -\log(x) = 0 + (-1)(1)^{-1}(x-1) + (-1)^2(1)^{-2}\frac{(x-1)^2}{2} + \cdots \\ = \sum^{\infty}_{j=1}(-1)^j\frac{(j-1)!}{j!}(x-1)^{j} = \sum^{\infty}_{j=1}\frac{(1-x)^{j}}{j} $$ We can get the expansion for the focal loss simply by multiplying the cross-entropy loss series by $(1-x)^\gamma$: ...
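
As a quick numerical sanity check of that expansion (a throwaway example, not from the paper), truncating the series after a few terms already approximates $-\log(x)$ well for probabilities near 1:

```python
import math

def ce_taylor(x: float, n_terms: int) -> float:
    """Truncated Taylor expansion of -log(x) around x = 1: sum_{j=1..N} (1 - x)^j / j."""
    return sum((1.0 - x) ** j / j for j in range(1, n_terms + 1))

x = 0.8  # predicted probability of the target class
print(-math.log(x))     # exact cross-entropy: 0.2231...
print(ce_taylor(x, 2))  # first two terms:     0.22
print(ce_taylor(x, 8))  # first eight terms:   0.2231...
```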

May 15, 2022 · Ceshine Lee

[Notes] Understanding Visual Attention Network

credit Introduction At the start of 2022, we got a new pure convolution architecture (ConvNext) [1] that challenges the transformer architectures as a generic vision backbone. The new Visual Attention Network (VAN) [2] is yet another pure and simple convolution architecture, whose creators claim it achieves SOTA results with fewer parameters. Source: [2] What ConvNext tries to achieve is modernizing a standard ConvNet (ResNet) without introducing any attention-based modules. VAN still has attention-based modules, but the attention weights are obtained from a large-kernel convolution instead of a self-attention block. To overcome the high computation cost brought by a large-kernel convolution, it is decomposed into three components: a spatial local convolution (depth-wise convolution), a spatial long-range convolution (depth-wise dilation convolution), and a channel convolution (1x1 point-wise convolution). ...
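
A minimal PyTorch sketch of that decomposition is shown below; the kernel sizes, dilation, and the final element-wise weighting follow my reading of the paper, so treat the exact hyper-parameters as assumptions:

```python
import torch
import torch.nn as nn

class LargeKernelAttention(nn.Module):
    """Large-kernel attention decomposed into three cheaper convolutions."""
    def __init__(self, dim: int):
        super().__init__()
        self.dw_conv = nn.Conv2d(dim, dim, kernel_size=5, padding=2, groups=dim)  # spatial local
        self.dw_dilated = nn.Conv2d(dim, dim, kernel_size=7, padding=9,
                                    groups=dim, dilation=3)                        # spatial long-range
        self.pointwise = nn.Conv2d(dim, dim, kernel_size=1)                        # channel mixing

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn = self.pointwise(self.dw_dilated(self.dw_conv(x)))
        return attn * x  # convolution-derived attention weights modulate the input

x = torch.randn(1, 64, 56, 56)
print(LargeKernelAttention(64)(x).shape)  # torch.Size([1, 64, 56, 56])
```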

March 14, 2022 · Ceshine Lee