[Notes] PPO, GRPO, and GSPO
Cover image generated by Nano Banana 2
Introduction
This blog post provides an overview of the core concepts of the Proximal Policy Optimization (PPO) algorithm and its two variants, the Group Relative Policy Optimization (GRPO) and Group Sequence Policy Optimization (GSPO) algorithms. This post assumes basic knowledge of reinforcement learning fundamentals, meaning it does not explain terms such as policy, on-policy/off-policy learning, and state value function in detail. ...
A Linux-first Claude Code Status Line Powered by Python
Cover image generated by Nano Banana 2
What This Package Provides
I’ve released a Python package on GitHub (ceshine/claude-statusline-for-linux) that provides a two-line status line layout for the Claude Code CLI. The package is highly extensible and customizable, so feel free to fork the repository and adapt it to your own preferences! The package currently provides the following information:
Line 1: Session Overview
  Model name: the active Claude model (e.g., “Sonnet 4.6”)
  Context window usage: a 16-segment visual bar plus a percentage, color-coded green/yellow/red as usage grows; switches to a prominent warning at ≥90%
  Session cost: cumulative USD cost for the current Claude Code session
  Per-call token breakdown: token counts from the latest API call, broken down into four categories:
    Input tokens (i)
    Output tokens (o)
    Cache creation tokens (cw)
    Cache read tokens (cr)
Line 2: Workspace & Limits
  Working directory: the base name of the current project directory
  Git status: branch name plus counts of staged (+), unstaged (~), and untracked (?) files; hidden when not in a Git repository
  5-hour rate limit usage: percentage of the 5-hour API quota consumed, with a countdown to reset
  7-day rate limit usage: percentage of the 7-day API quota consumed, with a countdown to reset
  Vim mode: current Vim keybinding mode (NORMAL, INSERT, etc.); hidden when vim mode is inactive
(Note: According to Claude Code’s documentation, the rate limit information is only available after the first API call.) ...
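To make the context window indicator concrete, here is a minimal sketch of how a 16-segment, color-coded usage bar could be rendered. It is not taken from the package: the function name `render_context_bar`, the block characters, and the 50%/75% color thresholds are assumptions chosen for illustration. Only the 16 segments, the green/yellow/red coloring, and the warning at ≥90% come from the description above.

```python
# Illustrative sketch only; not the package's actual implementation.
# ANSI escape codes for terminal colors.
GREEN, YELLOW, RED, RESET = "\033[32m", "\033[33m", "\033[31m", "\033[0m"


def render_context_bar(
    used_pct: float,
    segments: int = 16,       # from the post: a 16-segment bar
    yellow_at: float = 50.0,  # assumed threshold (not stated in the post)
    red_at: float = 75.0,     # assumed threshold (not stated in the post)
    warn_at: float = 90.0,    # from the post: prominent warning at >=90%
) -> str:
    """Return a colored usage bar followed by the percentage."""
    # Clamp to [0, 100] so out-of-range inputs cannot break the bar width.
    pct = max(0.0, min(100.0, used_pct))
    # Round half up to the nearest whole segment.
    filled = int(pct / 100 * segments + 0.5)
    bar = "█" * filled + "░" * (segments - filled)
    if pct >= warn_at:
        # Prefix a warning sign once usage reaches the warning threshold.
        return f"{RED}⚠ {bar} {pct:.0f}%{RESET}"
    color = RED if pct >= red_at else YELLOW if pct >= yellow_at else GREEN
    return f"{color}{bar} {pct:.0f}%{RESET}"


if __name__ == "__main__":
    for value in (10, 60, 80, 95):
        print(render_context_bar(value))
```

Keeping the thresholds as keyword arguments makes it easy to adjust the color bands in a fork without touching the rendering logic.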
Ansible-managed systemd Timers
Cover image generated by Nano Banana 2
TL;DR
In this post, I introduce a simple Ansible playbook that manages scheduled tasks with systemd timers and provides an AI-friendly interface for running recurring jobs. It offers much greater flexibility than cron. As a bonus, I also present a Python-based orchestrator for Rclone sync jobs, which is useful for setting up automatic cloud backups and works well with the Ansible playbook. ...
Developing an AI-assisted Hacker News Reader
Cover image generated by Nano Banana Pro
Motivation
Reading trending threads on Hacker News is one of my favorite ways to discover interesting stories and read (mostly) thought-provoking discussions. Since reading all the top stories would be very time-consuming, I use web apps such as Gemini, Google AI Studio, and Claude to have an LLM agent automatically fetch web pages and summarize their content for me. I then quickly browse the summaries and decide which threads I want to read in full. This approach has been quite effective for me. ...
[Notes] Uncovering the Hidden Preprocessing Logic of ColPali
Cover image generated by Nano Banana Pro
Introduction
I recently came across a course called “Multi-Vector Image Retrieval” by DeepLearning.ai. The course mainly introduces ColPali [1], a vision-language model that generalizes the late-interaction retrieval paradigm pioneered by ColBERT [2], extending it from covering only text tokens to covering both text and visual tokens. It also contains a few tutorials on performance optimization techniques using Qdrant’s Python SDK. It is a great introductory resource, and I recommend it to anyone interested in visual document understanding and retrieval. ...