Building Gemini CLI Usage Analyzer

Last week, I developed a lightweight command-line tool for analyzing Gemini CLI token usage and open-sourced it on GitHub. You can find the project at ceshine/gemini-cli-usage-analyzer. This post outlines why I built the tool, the technical challenges encountered during development, and the solutions implemented to resolve them. Note: Currently, the tool focuses on single-project analysis. Unlike Claude Code, which centralizes logs (e.g., in ~/.claude on Linux) to analyze overall cross-project usage by default, Gemini CLI lacks a built-in mechanism for unified log management across different projects. Support for aggregating statistics across multiple projects is on the development roadmap. ...

December 6, 2025 · Ceshine Lee

[Notes] MaxViT: Multi-Axis Vision Transformer

MaxViT: Multi-Axis Vision Transformer[1] is a 2022 paper jointly produced by Google Research and the University of Texas at Austin. The paper proposes a new attention model, named multi-axis attention, which comprises a blocked local attention module and a dilated global attention module. In addition, the paper introduces the MaxViT architecture, which combines multi-axis attention with convolutions and is highly effective on ImageNet benchmarks and downstream tasks. Multi-Axis Attention (Source: [2]) ...
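The two token groupings behind multi-axis attention can be illustrated with a few tensor reshapes. The sketch below is my own illustration rather than code from the paper; the 8x8 feature map, channel count, and window/grid size of 4 are arbitrary choices for demonstration.

```python
import torch

def block_partition(x, p):
    """Blocked local attention grouping: split the feature map into
    non-overlapping p x p windows; self-attention runs within each window."""
    b, h, w, c = x.shape
    x = x.view(b, h // p, p, w // p, p, c)
    # -> (num_windows, window_size, channels)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, p * p, c)

def grid_partition(x, g):
    """Dilated global grid attention grouping: form a g x g grid whose tokens
    are strided across the whole map, so attention mixes information globally
    at sparse locations."""
    b, h, w, c = x.shape
    x = x.view(b, g, h // g, g, w // g, c)
    # -> (num_groups, g*g grid tokens, channels)
    return x.permute(0, 2, 4, 1, 3, 5).reshape(-1, g * g, c)

# toy example: one 8x8 feature map with 16 channels, window/grid size 4
x = torch.randn(1, 8, 8, 16)
print(block_partition(x, 4).shape)  # (4, 16, 16): 4 local windows of 4*4 tokens
print(grid_partition(x, 4).shape)   # (4, 16, 16): 4 groups of 4*4 strided tokens
```

Running standard self-attention over the second dimension of each output gives local (block) and sparse global (grid) mixing, respectively.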

July 16, 2023 · Ceshine Lee

[Notes] PolyLoss: A Polynomial Expansion Perspective of Classification Loss Functions

Recall that a one-dimensional Taylor series is an expansion of a real function f(x) about a point x=a [2]:

$$f(x) = f(a) + f'(a)(x-a) + \frac{f''(a)}{2!}(x-a)^2 + \cdots + \frac{f^{(n)}(a)}{n!}(x-a)^n + \cdots$$

We can approximate the cross-entropy loss with its Taylor series (a.k.a. Taylor expansion) at a=1:

$$f(x) = -\log(x) = 0 + (-1)\frac{1}{1}(x-1) + (-1)^2\frac{1}{2}(x-1)^2 + \cdots = \sum_{j=1}^{\infty}\frac{(-1)^j(j-1)!}{j!}(x-1)^j = \sum_{j=1}^{\infty}\frac{(1-x)^j}{j}$$

We can get the expansion for the focal loss simply by multiplying the cross-entropy loss series by $(1-x)^\gamma$: ...
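As a quick numerical sanity check (my own illustration, not from the paper), the truncated series approaches the exact cross-entropy value as more polynomial terms are kept:

```python
import math

def ce_taylor(x, n_terms):
    """Truncated Taylor expansion of the cross-entropy loss -log(x)
    around x = 1: sum over j = 1..N of (1 - x)^j / j."""
    return sum((1 - x) ** j / j for j in range(1, n_terms + 1))

x = 0.7  # predicted probability of the target class
print(-math.log(x))     # exact value: ~0.3567
print(ce_taylor(x, 2))  # 2 terms:     ~0.3450
print(ce_taylor(x, 8))  # 8 terms:     ~0.3567
```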

May 15, 2022 · Ceshine Lee

[Notes] Understanding Visual Attention Network

At the start of 2022, a new pure-convolution architecture, ConvNeXt[1], emerged to challenge transformer architectures as a generic vision backbone. The new Visual Attention Network (VAN)[2] is yet another pure, simple convolution architecture whose creators claim SOTA results with fewer parameters. What ConvNeXt tries to achieve is modernizing a standard ConvNet (ResNet) without introducing any attention-based modules. VAN still has attention-based modules, but the attention weights are obtained from a large-kernel convolution instead of a self-attention block. To overcome the high computation cost of a large-kernel convolution, it is decomposed into three components: a spatial local convolution (depth-wise convolution), a spatial long-range convolution (depth-wise dilation convolution), and a channel convolution (1x1 point-wise convolution). ...
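A minimal PyTorch sketch of this decomposition is below. The exact kernel sizes (5x5 depth-wise, 7x7 depth-wise with dilation 3, then 1x1) follow my reading of the paper and should be treated as assumptions rather than a faithful reproduction of the official implementation.

```python
import torch
import torch.nn as nn

class LargeKernelAttention(nn.Module):
    """Sketch of VAN-style decomposed large-kernel attention: a depth-wise conv
    for local context, a dilated depth-wise conv for long-range context, and a
    1x1 conv to mix channels; the result gates the input feature map."""

    def __init__(self, dim: int):
        super().__init__()
        # spatial local convolution (depth-wise)
        self.dw_conv = nn.Conv2d(dim, dim, kernel_size=5, padding=2, groups=dim)
        # spatial long-range convolution (depth-wise, dilated)
        self.dw_dilated = nn.Conv2d(dim, dim, kernel_size=7, padding=9,
                                    groups=dim, dilation=3)
        # channel convolution (1x1 point-wise)
        self.pw_conv = nn.Conv2d(dim, dim, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn = self.pw_conv(self.dw_dilated(self.dw_conv(x)))
        return x * attn  # attention weights modulate the input features

x = torch.randn(1, 64, 32, 32)
print(LargeKernelAttention(64)(x).shape)  # torch.Size([1, 64, 32, 32])
```

Stacking the three cheap convolutions approximates the receptive field of a single very large kernel at a fraction of the parameter and FLOP cost, which is the core trick the excerpt describes.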

March 14, 2022 · Ceshine Lee

[Notes] Understanding ConvNeXt

Hierarchical Transformers (e.g., Swin Transformers[1]) have made Transformers highly competitive as a generic vision backbone and in a wide variety of vision tasks. A new paper from Facebook AI Research — “A ConvNet for the 2020s”[2] — gradually and systematically “modernizes” a standard ResNet[3] toward the design of a vision Transformer. The result is a family of pure ConvNet models, dubbed ConvNeXt, that compete favorably with Transformers in terms of accuracy and scalability. ...

January 28, 2022 · Ceshine Lee