Logging for ML Systems
Intuition
Logging is the process of tracking and recording key events that occur in our applications for the purpose of inspection, debugging, etc. Logging statements are far more powerful than `print` statements because they allow us to send specific pieces of information to specific locations with custom formatting, shared interfaces, etc. This makes logging a key component in surfacing insightful information from the internal processes of our application.
Components
There are a few overarching concepts to be aware of:
- `Logger`: emits the log messages from our application.
- `Handler`: sends log records to a specific location.
- `Formatter`: formats and styles the log records.

There is so much more to logging (filters, exception logging, etc.), but these basics will allow us to do everything we need for our application.
Levels
Before we create our specialized, configured logger, let's look at what logged messages look like by using the basic configuration.
```
DEBUG:root:Used for debugging your code.
INFO:root:Informative messages from your code.
WARNING:root:Everything works but there is something to be aware of.
ERROR:root:There's been a mistake with the process.
CRITICAL:root:There is something terribly wrong and process may terminate.
```
These are the basic levels of logging, where `DEBUG` is the lowest priority and `CRITICAL` is the highest. We defined our logger using `basicConfig` to emit log messages to stdout (i.e. our terminal console), but we also could've written to any other stream or even a file. We also defined our logging to be sensitive to log messages starting from level `DEBUG`. This means that all of our logged messages will be displayed, since `DEBUG` is the lowest level. Had we set the level to `ERROR`, then only `ERROR` and `CRITICAL` log messages would be displayed.
```
ERROR:root:There's been a mistake with the process.
CRITICAL:root:There is something terribly wrong and process may terminate.
```
Configuration
First we'll set the location of our logs in our `config.py` script:
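A sketch of what this could look like (how `BASE_DIR` is derived is an assumption; the point is creating a `logs` directory for our handlers to write to):

```python
# config.py
from pathlib import Path

BASE_DIR = Path(__file__).parent.absolute()  # assumption: project root
LOGS_DIR = Path(BASE_DIR, "logs")
LOGS_DIR.mkdir(parents=True, exist_ok=True)  # create logs/ if it doesn't exist
```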
Next, we'll configure the logger for our application:
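A sketch of such a dictionary config, mirroring the `logging.config` file option later in this lesson: two formatters (`minimal`, `detailed`), three handlers (`console`, `info`, `error`), all attached to the root logger. The exact format strings and paths are assumptions based on the rest of this lesson:

```python
import logging
import sys
from pathlib import Path

LOGS_DIR = Path("logs")  # assumption: defined in config.py
LOGS_DIR.mkdir(parents=True, exist_ok=True)

logging_config = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "minimal": {"format": "%(message)s"},
        "detailed": {
            "format": "%(levelname)s %(asctime)s [%(name)s:%(filename)s:%(funcName)s:%(lineno)d]\n%(message)s\n"
        },
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "stream": sys.stdout,
            "formatter": "minimal",
            "level": logging.DEBUG,
        },
        "info": {
            "class": "logging.handlers.RotatingFileHandler",
            "filename": Path(LOGS_DIR, "info.log"),
            "maxBytes": 10485760,  # 10 MB
            "backupCount": 10,
            "formatter": "detailed",
            "level": logging.INFO,
        },
        "error": {
            "class": "logging.handlers.RotatingFileHandler",
            "filename": Path(LOGS_DIR, "error.log"),
            "maxBytes": 10485760,  # 10 MB
            "backupCount": 10,
            "formatter": "detailed",
            "level": logging.ERROR,
        },
    },
    "root": {
        "handlers": ["console", "info", "error"],
        "level": logging.INFO,
        "propagate": True,
    },
}
```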
- **Formatters**: define two different Formatters (which determine the format and style of log messages), `minimal` and `detailed`, using various LogRecord attributes to create a formatting template for log messages.
- **Handlers**: define the different Handlers (which determine where log messages are sent):
    - `console`: sends log messages (using the `minimal` formatter) to the `stdout` stream for messages at level `DEBUG` and above (i.e. all logged messages).
    - `info`: sends log messages (using the `detailed` formatter) to `logs/info.log` (a rotating file that can grow to 10 MB, with the last 10 versions backed up) for messages at level `INFO` and above.
    - `error`: sends log messages (using the `detailed` formatter) to `logs/error.log` (a rotating file that can grow to 10 MB, with the last 10 versions backed up) for messages at level `ERROR` and above.
- **Root logger**: attaches our different handlers to the root Logger.
We chose to use a dictionary to configure our logger, but there are other options such as a Python script or a configuration file. Both alternatives are shown below.
Python script
Configuration file
- Place this inside a `logging.config` file:

```ini
[formatters]
keys=minimal,detailed

[formatter_minimal]
format=%(message)s

[formatter_detailed]
format=%(levelname)s %(asctime)s [%(name)s:%(filename)s:%(funcName)s:%(lineno)d] %(message)s

[handlers]
keys=console,info,error

[handler_console]
class=StreamHandler
level=DEBUG
formatter=minimal
args=(sys.stdout,)

[handler_info]
class=handlers.RotatingFileHandler
level=INFO
formatter=detailed
backupCount=10
maxBytes=10485760
args=("logs/info.log",)

[handler_error]
class=handlers.RotatingFileHandler
level=ERROR
formatter=detailed
backupCount=10
maxBytes=10485760
args=("logs/error.log",)

[loggers]
keys=root

[logger_root]
level=INFO
handlers=console,info,error
```
- Place this inside your Python script:

```python
import logging
import logging.config
from pathlib import Path
from rich.logging import RichHandler

# Use config file to initialize logger (CONFIG_DIR holds logging.config)
logging.config.fileConfig(Path(CONFIG_DIR, "logging.config"))
logger = logging.getLogger()
logger.handlers[0] = RichHandler(markup=True)  # set rich handler
```
We can load our logger configuration dict like so:
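A sketch of loading the dict config with `dictConfig` and swapping in `RichHandler` for the console. A console-only stand-in for `logging_config` is inlined so the snippet runs on its own; in the project it would be the full dict defined above:

```python
import logging
import logging.config
import sys
from rich.logging import RichHandler  # third-party: pip install rich

# Console-only stand-in for the full logging_config dict
logging_config = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {"minimal": {"format": "%(message)s"}},
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "stream": sys.stdout,
            "formatter": "minimal",
            "level": logging.DEBUG,
        },
    },
    "root": {"handlers": ["console"], "level": logging.DEBUG},
}

# Load the dict config, then swap in RichHandler for pretty console output
logging.config.dictConfig(logging_config)
logger = logging.getLogger()
logger.handlers[0] = RichHandler(markup=True)

# Sample messages at each level
logger.debug("Used for debugging your code.")
logger.info("Informative messages from your code.")
logger.warning("Everything works but there is something to be aware of.")
logger.error("There's been a mistake with the process.")
logger.critical("There is something terribly wrong and process may terminate.")
```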
```
DEBUG    Used for debugging your code.                                 config.py:71
INFO     Informative messages from your code.                          config.py:72
WARNING  Everything works but there is something to be aware of.       config.py:73
ERROR    There's been a mistake with the process.                      config.py:74
CRITICAL There is something terribly wrong and process may terminate.  config.py:75
```
We use `RichHandler` for our `console` handler to get pretty formatting for the log messages. Rich is not a preinstalled library, so we'll need to install it and add it to `requirements.txt`:
```bash
pip install rich==12.4.4
```

```bash
# Add to requirements.txt
rich==12.4.4
```
Our logged messages are stored inside the respective files in our logs directory:
```
logs/
├── info.log
└── error.log
```
And since we defined a detailed formatter, we would see informative log messages like these:
```
INFO 2020-10-21 11:18:42,102 [config.py:module:72] Informative messages from your code.
```
Application
In our project, we can replace all of our print statements with logging statements:
All of our log messages are at the `INFO` level, but while developing we may have used the `DEBUG` level, and we can also add `ERROR` or `CRITICAL` log messages if our system behaves in an unintended manner.
- **what**: log all the necessary details you want to surface from your application that will be useful during development and afterwards for retrospective inspection.
- **where**: a best practice is to not clutter our modular functions with log statements. Instead we should log messages outside of small functions and inside larger workflows. For example, there are no log messages inside any of our scripts except the `main.py` and `train.py` files. This is because these scripts use the smaller functions defined in the other scripts (`data.py`, `evaluate.py`, etc.). If we ever feel the need to log within our other functions, then it usually indicates that the function needs to be broken down further.
The Elastic stack (formerly ELK stack) is a common option for production-level logging. It combines the features of Elasticsearch (distributed search engine), Logstash (ingestion pipeline) and Kibana (customizable visualization). We could also simply upload our logs to cloud blob storage (ex. S3, Google Cloud Storage, etc.).