Apache Airflow is great for managing scheduled workflows, but in a lot of cases it is overkill and brings unnecessary complexity to the overall solution. Cron jobs are much easier to set up, have built-in support in most systems, and have a very gentle learning curve. However, the lack of monitoring features and the resulting silent failures can be the bane of system admins' lives.
We want a simple solution that can help admins monitor the health of cron jobs in simple scenarios that do not warrant Airflow. The simple scenarios have the following characteristics:
- Only a handful of jobs to monitor. A good rule of thumb: fewer than 12 cumulative job runs per day.
- No dependencies between jobs.
- Compute resources are limited. We don’t want to host another back-end server just to monitor the cron jobs.
- The job is straightforward or well-tested, so we rarely have to read the log file.
- The job is not mission-critical. A mission-critical job typically requires you to respond to every execution error, and to do so within an hour. Examples of non-critical jobs include periodic maintenance or backup scripts.
We also focus only on running Python scripts as cron jobs. You can always wrap shell scripts inside Python, so this shouldn’t be a big problem.
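As a minimal sketch of that wrapping, assuming the shell script signals failure with a non-zero exit code, a small helper can call `subprocess.run` with `check=True` so that a failing script raises a Python exception. The helper name `run_shell_job` is hypothetical and used only for illustration:

```python
import subprocess


def run_shell_job(command):
    """Run a command and return its stdout; raise if it exits non-zero.

    `run_shell_job` is a hypothetical helper name, not part of any library.
    """
    result = subprocess.run(
        command,
        check=True,           # raise CalledProcessError on a non-zero exit code
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode the captured output as str
    )
    return result.stdout
```

A call such as `run_shell_job(["sh", "/path/to/script.sh"])` (the path is a placeholder) then behaves like any other Python function: a failing script surfaces as an exception that a monitoring decorator can catch and report.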
Honorable Mention: Cronhub
Cronhub is a SaaS that lets you get instant alerts when any of your background jobs fail silently or run longer than expected. You only need to append a Ping API call at the end of your cron command. It’s simple yet powerful.
There are several advantages of the Telegram solution proposed below over Cronhub:
- Python-specific messages. Cronhub is designed for generic jobs, and there is no way to pass additional information about the job execution (e.g., the error messages or the return values).
- No dependence on external services (other than the Telegram API).
- The monitoring scheme is stored in the codebase.
Solution: A Python Decorator
My solution is to create a decorator called telegram_wrapper that is heavily based on the telegram_sender from huggingface/knockknock. The knockknock implementation is designed for training machine learning models. I tweaked the message templates and added a few options to make it more suitable for monitoring background tasks.
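To illustrate the general idea, here is a rough sketch of how a decorator of this kind could work. It is not the actual cronhelpers or knockknock code: the names `notify_wrapper` and `send_telegram_message`, the message texts, the default option values, and the injectable `sender` parameter are all assumptions made for this example. The only external call is the Telegram Bot API `sendMessage` method.

```python
import functools
import json
import traceback
import urllib.request


def send_telegram_message(token, chat_id, text):
    """Post a text message through the Telegram Bot API `sendMessage` method."""
    url = f"https://api.telegram.org/bot{token}/sendMessage"
    # Telegram limits a text message to 4096 characters, so truncate long tracebacks.
    payload = json.dumps({"chat_id": chat_id, "text": text[:4096]}).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        response.read()


def notify_wrapper(token, chat_id, name="job", send_at_start=False,
                   send_on_success=True, sender=send_telegram_message):
    """Illustrative decorator; defaults here are assumptions, not the library's."""
    def decorator(func):
        @functools.wraps(func)
        def wrapped(*args, **kwargs):
            if send_at_start:
                sender(token, chat_id, f"{name} started")
            try:
                result = func(*args, **kwargs)
            except Exception:
                # Report the traceback, then re-raise so cron still sees the failure.
                sender(token, chat_id, f"{name} crashed:\n{traceback.format_exc()}")
                raise
            if send_on_success:
                sender(token, chat_id, f"{name} finished, returned: {result!r}")
            return result
        return wrapped
    return decorator
```

Passing the sender as a parameter is a design choice of this sketch: it lets you swap in a fake sender while testing the job locally, so no real messages go out until the job is deployed.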
You need to have a Telegram client and create a Telegram bot. Send the first message to the bot, and then use the token of your bot to find your chat_id by visiting https://api.telegram.org/bot
First, install the package:
pip install https://github.com/ceshine/cronhelpers/archive/master.zip
Then wrap the function that contains the cron job with the decorator:
from cronhelpers import telegram_wrapper

# This one only sends a message when the cron job fails
@telegram_wrapper(
    "your_token", "your_chat_id", name="jobName",
    send_at_start=False, send_on_success=False)
def simple_func(arg):
    return arg

if __name__ == "__main__":
    simple_func("some_argument")
Then set up the crontab as usual:
10 0,12 * * * /path/to/python some_script.py
Usually, I containerize the Python environment, and the crontab entry looks like this:
10 0,12 * * * docker run --rm somecontainer >> /home/ceshine/somejob.log 2>&1
Set send_at_start=True to get a message when a job starts. This is usually unnecessary, but it can be useful for a long-running job when you want to confirm that it started at, say, 8 pm sharp.
Set send_on_success=False to get a message only when a job crashes. Beware that if the job is killed or the machine is shut down, you get no message at all, so use this option only when your machine is stable and not overloaded.
telegram_wrapper should be able to catch any exception raised by the wrapped function, but it is impossible to handle hard crashes (job killed or machine shut down) from inside Python. The only way to detect a hard crash is to read the Telegram chat log and check whether the expected message showed up. Hopefully, this happens only occasionally. Please also make sure your job is tested on the target machine before writing it into the crontab.