Organizing a Code Repository for ML Applications
To have organized code is to have readable, reproducible, scalable and efficient code. We'll cover all of these concepts throughout the scripting lessons.
There are several ways to organize the code from our notebooks, but they're all based on utility. For example, we're organizing our code based on pipeline components (data processing, training, evaluation, prediction, etc.).
Don't worry about what all these different scripts do just yet! We'll be creating and going through them in the subsequent lessons.
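To make the idea of organizing by pipeline component concrete, here is a minimal single-file sketch. It assumes a toy keyword-matching "model" in place of a real one. Each commented section stands in for a separate module, and the file names data.py, train.py, predict.py and evaluate.py, along with every function name and all of the logic, are hypothetical illustrations, not the lesson's actual scripts.

```python
"""Hypothetical sketch of code organized by pipeline component.

In a real repository each section below would be its own module; here they
share one file so the example runs on its own. Names and logic are assumptions.
"""
from typing import Dict, List, Tuple

Example = Tuple[str, str]  # (text, label)


# --- data.py (hypothetical): data processing ---
def preprocess(texts: List[str]) -> List[str]:
    """Lowercase and strip surrounding whitespace from each text."""
    return [text.lower().strip() for text in texts]


def split_data(data: List[Example], train_frac: float = 0.75) -> Tuple[List[Example], List[Example]]:
    """Split examples into train/test sets in order (no shuffling, for determinism)."""
    n_train = int(len(data) * train_frac)
    return data[:n_train], data[n_train:]


# --- train.py (hypothetical): training ---
def train(train_data: List[Example]) -> Dict[str, str]:
    """Toy 'model': map each word to the label of the last example containing it."""
    model: Dict[str, str] = {}
    for text, label in train_data:
        for word in text.split():
            model[word] = label
    return model


# --- predict.py (hypothetical): prediction ---
def predict(model: Dict[str, str], text: str, default: str = "other") -> str:
    """Return the label of the first known word in `text`, else `default`."""
    for word in text.split():
        if word in model:
            return model[word]
    return default


# --- evaluate.py (hypothetical): evaluation ---
def evaluate(model: Dict[str, str], test_data: List[Example]) -> float:
    """Accuracy of `predict` on held-out examples."""
    if not test_data:
        return 0.0
    correct = sum(predict(model, text) == label for text, label in test_data)
    return correct / len(test_data)


# --- main.py: an operation that only orchestrates the components ---
def run(raw: List[Example]) -> Tuple[Dict[str, str], float]:
    texts = preprocess([text for text, _ in raw])
    labels = [label for _, label in raw]
    train_data, test_data = split_data(list(zip(texts, labels)))
    model = train(train_data)
    return model, evaluate(model, test_data)


if __name__ == "__main__":
    raw = [("Transformers NLP", "nlp"), ("CNN vision", "cv"),
           ("BERT NLP", "nlp"), ("ResNet vision", "cv")]
    model, accuracy = run(raw)
    print(f"accuracy: {accuracy:.2f}")
```

Because the main.py section only wires the pieces together, replacing the toy train() with a real model would touch one module and leave data processing, prediction and evaluation alone.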
Organizing our code base this way also makes it easier for us to understand (or modify). We could've also taken a more granular approach to organization, such as breaking scripts down further into modules like preprocess.py, etc. This might make more sense if we had multiple ways of splitting, preprocessing, etc., but for our task it's sufficient to organize at a higher level.
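As a rough illustration of when a finer-grained module earns its place, here is a sketch of a hypothetical split.py that offers two splitting strategies behind one entry point. The module name, the `SPLITTERS` registry and both functions are assumptions for illustration only; in practice scikit-learn's `train_test_split` (with its `stratify` option) covers similar ground.

```python
"""Hypothetical split.py: worthwhile once there is more than one way to split."""
import random
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str]  # (text, label)
Split = Tuple[List[Example], List[Example]]


def random_split(data: List[Example], train_frac: float = 0.8, seed: int = 42) -> Split:
    """Shuffle all examples with a seeded RNG, then cut at `train_frac`."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    return shuffled[:n_train], shuffled[n_train:]


def stratified_split(data: List[Example], train_frac: float = 0.8, seed: int = 42) -> Split:
    """Split each label group separately so both sets keep the label proportions."""
    rng = random.Random(seed)
    by_label: Dict[str, List[Example]] = defaultdict(list)
    for example in data:
        by_label[example[1]].append(example)
    train: List[Example] = []
    test: List[Example] = []
    for label in sorted(by_label):  # sorted for reproducible ordering
        group = by_label[label]
        rng.shuffle(group)
        n_train = int(len(group) * train_frac)
        train.extend(group[:n_train])
        test.extend(group[n_train:])
    return train, test


# Registry of available strategies; adding one means adding an entry here.
SPLITTERS: Dict[str, Callable[..., Split]] = {
    "random": random_split,
    "stratified": stratified_split,
}


def split(data: List[Example], method: str = "stratified", **kwargs) -> Split:
    """Dispatch to the requested splitting strategy."""
    if method not in SPLITTERS:
        raise ValueError(f"Unknown split method: {method!r}")
    return SPLITTERS[method](data, **kwargs)
```

The rest of the code base would only ever call split(data, method=...), so a third strategy could be added without touching any caller.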
Another way to supplement organized code is through documentation.
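For instance, docstrings document a function right where it is defined. The sketch below applies the Google docstring style (one common convention, assumed here) to a hypothetical `clean_text` helper; Python's built-in `help()` and `inspect.getdoc()` surface the same text.

```python
import inspect
from typing import Iterable


def clean_text(text: str, stopwords: Iterable[str]) -> str:
    """Clean raw text.

    Lowercases the text and drops every word that appears in ``stopwords``.

    Args:
        text: Raw input text.
        stopwords: Lowercase words to remove (compared against the lowercased text).

    Returns:
        The cleaned text, with the remaining words joined by single spaces.
    """
    stopword_set = set(stopwords)  # build once for O(1) membership checks
    words = text.lower().split()
    return " ".join(word for word in words if word not in stopword_set)


if __name__ == "__main__":
    print(clean_text("The cat sat on the mat", stopwords=["the", "on"]))  # cat sat mat
    print(inspect.getdoc(clean_text).splitlines()[0])  # Clean raw text.
```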
So what's the best way to read a code base like this? We could look at the documentation, but that's usually most useful when you're looking for specific functions or classes within a script. What if you want to understand the overall functionality and how it's all organized? We can start with the operations defined in tagifai/main.py and dive deeper into the specific workflows (training, optimization, etc.).
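One way to get that first overview programmatically is to list the public functions a module defines, along with the first line of each docstring. The sketch below builds a hypothetical stand-in for tagifai/main.py from a string (the operation names are made up) so that it runs on its own.

```python
import inspect
import types
from typing import List, Tuple

# Hypothetical stand-in for tagifai/main.py; the real operations may differ.
MAIN_SOURCE = '''
def train_model(args_fp):
    """Train a model given a path to its arguments."""

def optimize(args_fp, num_trials):
    """Search for good hyperparameters."""

def predict(text):
    """Predict tags for a piece of text."""

def _load_config(path):
    """Private helper, skipped by the listing."""
'''


def list_operations(module: types.ModuleType) -> List[Tuple[str, str]]:
    """Return (name, docstring summary) for public functions defined in `module`."""
    operations = []
    for name, func in inspect.getmembers(module, inspect.isfunction):
        # Skip private helpers and functions merely imported into the module.
        if name.startswith("_") or func.__module__ != module.__name__:
            continue
        doc = inspect.getdoc(func) or ""
        operations.append((name, doc.splitlines()[0] if doc else ""))
    return operations


main_module = types.ModuleType("main")
exec(MAIN_SOURCE, main_module.__dict__)  # build the stand-in module from source

for name, summary in list_operations(main_module):
    print(f"{name}: {summary}")
```

With the package installed, importing the real module (for example `import tagifai.main`) and passing it to list_operations would give the same kind of summary, assuming the import has no heavy side effects.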
For example, if we inspect the run() function that's responsible for training, we can see the various steps involved. We can dive as deep as we'd like, depending on the task at hand (general understanding, modifying or extending the code base, etc.). Similarly, we can zoom out and see which modules use this run() function, such as CLI/API endpoints.
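Zooming out can be partly automated. The sketch below parses source files with Python's `ast` module and reports which functions call a given name, such as run. The module names and their contents are hypothetical stand-ins, and the match is purely syntactic: aliased imports and dynamic calls are missed, and a call inside a nested function is also credited to the enclosing function.

```python
import ast
from typing import Dict, List, Optional

# Hypothetical stand-ins for a few modules in a code base.
SOURCES: Dict[str, str] = {
    "train.py": "def run(args):\n    return {'args': args}\n",
    "main.py": "import train\n\ndef train_model(args):\n    return train.run(args)\n",
    "cli.py": "from main import train_model\n\ndef cli_train():\n    train_model({})\n",
    "api.py": "import train\n\ndef train_endpoint(payload):\n    return train.run(payload)\n",
}


def called_name(call: ast.Call) -> Optional[str]:
    """Name being called: `run` for both run(...) and train.run(...)."""
    if isinstance(call.func, ast.Name):
        return call.func.id
    if isinstance(call.func, ast.Attribute):
        return call.func.attr
    return None


def find_callers(sources: Dict[str, str], target: str) -> List[str]:
    """Return sorted 'module:function' entries for functions that call `target`."""
    callers = set()
    for module_name, source in sources.items():
        for node in ast.walk(ast.parse(source, filename=module_name)):
            if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                continue
            if any(isinstance(inner, ast.Call) and called_name(inner) == target
                   for inner in ast.walk(node)):
                callers.add(f"{module_name}:{node.name}")
    return sorted(callers)


print(find_callers(SOURCES, "run"))          # ['api.py:train_endpoint', 'main.py:train_model']
print(find_callers(SOURCES, "train_model"))  # ['cli.py:cli_train']
```

Against a real checkout, SOURCES could be built by reading each .py file in the package (for example with pathlib's `Path.glob` and `read_text`); a plain text search with grep is a quicker but noisier alternative.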
When looking at a code base for the first time, it's a good idea to create a mental model of the entire application and write it down for yourself so you can easily navigate it in the future.