This is an implementation of ULMFiT for genomics classification using Pytorch and Fastai. The model architecture used is based on the AWD-LSTM model, consisting of an embedding, three LSTM layers, and a final set of linear layers. The ULMFiT approach uses three training phases to produce a classification model: • Train a language model on a large, unlabeled corpus • Fine tune the language model on the classification corpus • Use the fine tuned language model to initialize a classification model

Don't forget to tag @kheyer in your comment.

Interested in anything related to deep learning, biotech, energy, materials
Share this project
Similar projects
Strategies to Expand Data for Specialized Genomics Problems
How can we train a model that captures the specialized aspects of a data-limited problem and benefits from large amounts of related training data?
Large SVDs - Dask + CuPy + Zarr + Genomics
Using Dask to perform Singular Value Decomposition on large datasets