# Organization

Organizing our code when moving from notebooks to Python scripts.

## Intuition

To have organized code is to have readable, reproducible, scalable and efficient code. We'll cover all of these concepts throughout the scripting lessons.

## Application

Let's look at what organizing a code base looks like for our application.

### Organizing

There are several ways to organize our code from the notebooks but they're all based on utility. For example, we're organizing our code based on the part of the pipeline (data, training, prediction, etc.):

    app/
    ├── api.py        - FastAPI app
    └── cli.py        - CLI app
    tagifai/
    ├── config.py     - configuration setup
    ├── data.py       - data processing utilities
    ├── models.py     - model architectures
    ├── predict.py    - inference utilities
    ├── train.py      - training utilities
    └── utils.py      - supplementary utilities


Organizing our code base this way also makes it easier for readers to understand (or modify) the code base. We could've also taken a more granular approach, such as breaking `data.py` down into `split.py`, `preprocess.py`, etc. That might make more sense if we had multiple ways of splitting, preprocessing, etc., but for our task it's sufficient to stay at a higher level.
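
To make the idea of organizing by pipeline stage concrete, here is a minimal single-file sketch. The comments mark which script each piece would belong to in the layout above. The helper names `clean_text` and `split_data`, and the simplified signature of `run`, are hypothetical and chosen only for illustration; they are not taken from the actual code base.

```python
import random
from typing import Dict, List, Tuple


# --- would live in tagifai/data.py (data processing utilities) ---
def clean_text(text: str) -> str:
    """Lowercase the text and collapse repeated whitespace (illustrative only)."""
    return " ".join(text.lower().split())


def split_data(
    items: List[str], train_size: float = 0.8, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Shuffle a copy of the items and split it into train and test portions."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cutoff = int(len(shuffled) * train_size)
    return shuffled[:cutoff], shuffled[cutoff:]


# --- would live in tagifai/train.py (training utilities) ---
def run(texts: List[str], seed: int = 0) -> Dict[str, List[str]]:
    """Compose the data utilities; a real training run would also fit a model."""
    cleaned = [clean_text(t) for t in texts]
    train, test = split_data(cleaned, seed=seed)
    return {"train": train, "test": test}


# --- would live in app/cli.py (CLI app) ---
if __name__ == "__main__":
    artifacts = run(["Hello  World", "Organized   CODE", "Notebooks to scripts", "Data", "Models"])
    print(len(artifacts["train"]), len(artifacts["test"]))
```

If we later needed several splitting strategies, `split_data` and its variants would be natural candidates to move into their own `split.py`, which is the more granular organization described above.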

Note: Another way to supplement organized code is through documentation, which we'll cover in the next lesson.

So what's the best way to read a code base like this? We could look at the documentation, but that's mostly useful if you're looking for specific functions or classes within a script. What if you want to understand the overall functionality and how it's all organized? We can start with the options in `app/cli.py` and dive deeper into the specific utilities. Say we wanted to see how a single model is trained: we'd go to the `train_model` function, inspect each line and build a mental model of the process. For example, when you reach the line:

    # Train
    artifacts = train.run(args=args)

you'll want to go to the `run` function in `train.py` to see its operations:

Operations for training.
1. Set seed
2. Set device
4. Clean data
5. Preprocess data
6. Encode labels
7. Split data
8. Tokenize inputs