Technical Notes from Ceshine Lee

Practical deep dives into machine learning and data science. Expect code, critical analysis, and real-world tips. No fluff.

Building Gemini CLI Usage Analyzer

Gemini CLI Usage Analyzer Project Banner Introduction Last week, I developed a lightweight command-line tool for analyzing Gemini CLI token usage and open-sourced it on GitHub. You can find the project at ceshine/gemini-cli-usage-analyzer. This post outlines why I built the tool, the technical challenges encountered during development, and the solutions implemented to resolve them. Note: Currently, the tool focuses on single-project analysis. Unlike Claude Code, which centralizes logs (e.g., in ~/.claude on Linux) to analyze overall cross-project usage by default, Gemini CLI lacks a built-in mechanism for unified log management across different projects. Support for aggregating statistics across multiple projects is on the development roadmap. ...

December 6, 2025 · Ceshine Lee

Reading the State of AI in 2025 Report from McKinsey

taken from the PDF Preamble and a bit of personal story I recently ended a multi-year consulting engagement. I might publish a reflection on this experience sometime in the future, but for now, let’s just say it was quite mentally draining. I didn’t have any energy left to write public blog posts during my tenure there, as evidenced by the lack of new posts here over the past two years. ...

November 23, 2025 · Ceshine Lee

[Notes] MaxViT: Multi-Axis Vision Transformer

Photo Credit MaxViT: Multi-Axis Vision Transformer(1) is a paper jointly produced by Google Research and University of Texas at Austin in 2022. The paper proposes a new attention model, named multi-axis attention, which comprises a blocked local and a dilated global attention module. In addition, the paper introduces MaxViT architecture that combines multi-axis attentions with convolutions, which is highly effective in ImageNet benchmarks and downstream tasks. Multi-Axis Attention Source: [2] ...

July 16, 2023 · Ceshine Lee

[Notes] PolyLoss: A Polynomial Expansion Perspective of Classification Loss Functions

Photo Credit Introduction Recall that an one-dimensional Taylor series is an expansion of a real function $f(x)$ about a point $x = a$ [2]: $$f(x) = f(a) + f'(a)(x-a) + \frac{f''(a)}{2!}(x-a)^2 + .. + \frac{f^{n}(a)}{n!}(x-a)^n + ...$$ We can approximate the cross-entropy loss using the Taylor series (a.k.a. Taylor expansion) using $a = 1$: $$f(x) = -log(x) = 0 + (-1)(1)^{-1}(x-1) + (-1)^2(1)^{-2}\frac{(x-1)^2}{2} + ... \\ = \sum^{\infty}_{j=1}(-1)^j\frac{(j-1)!}{j!}(x-1)^{j} = \sum^{\infty}_{j=1}\frac{(1-x)^{j}}{j} $$ We can get the expansion for the focal loss simply by multiplying the cross-entropy loss series by $(1-x)^\gamma$: ...

May 15, 2022 · Ceshine Lee

[Notes] Understanding Visual Attention Network

credit Introduction At the start of 2022, we have a new pure convolution architecture (ConvNext)[1] that challenges the transformer architectures as a generic vision backbone. The new Visual Attention Network (VAN)[2] is yet another pure and simplistic convolution architecture that its creators claim to have achieved SOTA results with fewer parameters. Source: [2] What ConvNext tries to achieve is modernizing a standard ConvNet (ResNet) without introducing any attention-based modules. VAN still has attention-based modules, but the attention weights are obtained from a large kernel convolution instead of a self-attention block. To overcome the high computation costs brought by a large kernel convolution, it is decomposed into three components: a spatial local convolution (depth-wise convolution), a spatial long-range convolution (depth-wise dilation convolution), and a channel convolution (1x1 point-wise convolution). ...

March 14, 2022 · Ceshine Lee