Organizing our code when moving from notebooks to Python scripts.
To have organized code is to have readable, reproducible, scalable and efficient code. We'll cover all of these concepts throughout the scripting lessons.
Let's look at what organizing a code base looks like for our application.
There are several ways to organize our code from the notebooks but they're all based on utility. For example, we're organizing our code based on the part of the pipeline (data, training, prediction, etc.):
app/ ├── api.py - FastAPI app └── cli.py - CLI app tagifai/ ├── config.py - configuration setup ├── data.py - data processing utilities ├── models.py - model architectures ├── predict.py - inference utilities ├── train.py - training utilities └── utils.py - supplementary utilities
Organizing our code base this way also makes it easier for readers to understand (or modify) the code base. We could've also assumed a more granular stance for organization, such as breaking down
preprocess.py, etc. This might make more sense if we have multiple ways of splitting, preprocessing, etc. but for our task, it's sufficient to be at a higher level.
Another way to supplement organized code is through documentation, which we'll cover in the next lesson.
So what's the best way to read a code base like this? We could look at the documentation but that's usually useful if you're looking for specific functions or classes within a script. What if you want to understand the overall functionality and how it's all organized? Well, we can start with the options in
app/cli.py and dive deeper into the specific utilities. Let's say we wanted to see how a single model is trained, then we'd go to the
train_model function and inspect each line and build a mental model of the process. For example, when you reach the line:
# Train artifacts = train.run(args=args)
runto see it's operations:
Operations for training. 1. Set seed 2. Set device 3. Load data 4. Clean data 5. Preprocess data 6. Encode labels 7. Split data 8. Tokenize inputs 9. Create dataloaders 10. Initialize model 11. Train model 12. Evaluate model
When looking a code base for the first, it's a good item to create a mental model of the entire application and writing it down for yourself so you easily navigate it in the future.