Full Stack Deep Learning Notes - Lecture 01

Lecture & lab notes explaining DataModules, the Trainer, and the LightningModule.
fsdl
Published

March 21, 2021

Advantages over unstructured PyTorch

  • Models become hardware agnostic
  • Code is clear to read because engineering code is abstracted away
  • Easier to reproduce
  • Make fewer mistakes because Lightning handles the tricky engineering
  • Keeps all the flexibility (LightningModules are still PyTorch modules), but removes a ton of boilerplate
  • Lightning has dozens of integrations with popular machine learning tools.
  • Tested rigorously with every new PR: every supported combination of PyTorch and Python versions, every OS, multiple GPUs, and even TPUs.
  • Minimal running speed overhead (about 300 ms per epoch compared with pure PyTorch).

Basic Trainer

https://pytorch-lightning.readthedocs.io/en/0.7.3/lightning-module.html

Code
from pytorch_lightning import Trainer

import os

import torch
from torch.nn import functional as F
from torch.utils.data import DataLoader
from torchvision.datasets import MNIST
from torchvision import transforms
import pytorch_lightning as pl


class SimpleLightningModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.l1 = torch.nn.Linear(28 * 28, 10)

    def forward(self, x):
        return torch.relu(self.l1(x.view(x.size(0), -1)))

    def training_step(self, batch, batch_nb):
        x, y = batch
        loss = F.cross_entropy(self(x), y)
        tensorboard_logs = {'train_loss': loss}
        return {'loss': loss, 'log': tensorboard_logs}

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=0.02)
    
    
    
train_loader = DataLoader(MNIST(os.getcwd(), train=True, download=True, transform=transforms.ToTensor()), batch_size=32)

mnist_model = SimpleLightningModel()
trainer = pl.Trainer(gpus=None, progress_bar_refresh_rate=20, max_epochs=1)    
trainer.fit(mnist_model, train_loader)  
GPU available: False, used: False
TPU available: None, using: 0 TPU cores

  | Name | Type   | Params
--------------------------------
0 | l1   | Linear | 7.9 K 
--------------------------------
7.9 K     Trainable params
0         Non-trainable params
7.9 K     Total params
0.031     Total estimated model params size (MB)
c:\programdata\miniconda3\lib\site-packages\pytorch_lightning\utilities\distributed.py:51: UserWarning: The {log:dict keyword} was deprecated in 0.9.1 and will be removed in 1.0.0
Please use self.log(...) inside the lightningModule instead.
# log on a step or aggregate epoch metric to the logger and/or progress bar (inside LightningModule)
self.log('train_loss', loss, on_step=True, on_epoch=True, prog_bar=True)
  warnings.warn(*args, **kwargs)
c:\programdata\miniconda3\lib\site-packages\pytorch_lightning\utilities\distributed.py:51: UserWarning: Detected KeyboardInterrupt, attempting graceful shutdown...
  warnings.warn(*args, **kwargs)
1
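
As the warning suggests, returning a {'log': ...} dict from training_step is deprecated in favour of self.log(...). A minimal sketch of the updated training_step, following the call shown in the warning:

def training_step(self, batch, batch_nb):
    x, y = batch
    loss = F.cross_entropy(self(x), y)
    # log to the logger and the progress bar instead of returning a 'log' dict
    self.log('train_loss', loss, on_step=True, on_epoch=True, prog_bar=True)
    return loss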

If you define train_dataloader on the LightningModule, the Trainer will use it automatically.

def train_dataloader(self):
    # REQUIRED
    return DataLoader(MNIST(os.getcwd(), train=True, download=True, transform=transforms.ToTensor()), batch_size=32)
SimpleLightningModel.train_dataloader  = train_dataloader
pl_model = SimpleLightningModel()
trainer = Trainer(max_epochs=1)
trainer.fit(pl_model)
GPU available: False, used: False
TPU available: None, using: 0 TPU cores

  | Name | Type   | Params
--------------------------------
0 | l1   | Linear | 7.9 K 
--------------------------------
7.9 K     Trainable params
0         Non-trainable params
7.9 K     Total params
0.031     Total estimated model params size (MB)
1

training_step(), train_dataloader(), and configure_optimizers() are the essential methods of a LightningModule.

Lifecycle

The methods in the LightningModule are called in this order:

  • __init__
  • prepare_data
  • configure_optimizers
  • train_dataloader

If you define a validation loop, val_dataloader is also called.

And if you define a test loop, test_dataloader is also called.

You will find that Trainer.fit() runs the validation loop for you automatically; the test loop is run separately with Trainer.test().

def validation_step(self, batch, batch_nb):
    # OPTIONAL
    x, y = batch
    y_hat = self(x)
    return {'val_loss': F.cross_entropy(y_hat, y)}

def validation_epoch_end(self, outputs):
    # OPTIONAL
    avg_loss = torch.stack([x['val_loss'] for x in outputs]).mean()
    tensorboard_logs = {'val_loss': avg_loss}
    print("Validation Loss: ", avg_loss)
    return {'val_loss': avg_loss, 'log': tensorboard_logs}

def val_dataloader(self):
    # OPTIONAL
    return DataLoader(MNIST(os.getcwd(), train=True, download=True, transform=transforms.ToTensor()), batch_size=32)
SimpleLightningModel.validation_step = validation_step
SimpleLightningModel.validation_epoch_end = validation_epoch_end
SimpleLightningModel.val_dataloader = val_dataloader
pl_model = SimpleLightningModel()
trainer = Trainer(max_epochs=2)
trainer.fit(pl_model)
GPU available: False, used: False
TPU available: None, using: 0 TPU cores

  | Name | Type   | Params
--------------------------------
0 | l1   | Linear | 7.9 K 
--------------------------------
7.9 K     Trainable params
0         Non-trainable params
7.9 K     Total params
0.031     Total estimated model params size (MB)
c:\programdata\miniconda3\lib\site-packages\pytorch_lightning\utilities\distributed.py:51: UserWarning: Detected KeyboardInterrupt, attempting graceful shutdown...
  warnings.warn(*args, **kwargs)
Validation Loss:  tensor(2.3084)
Validation Loss:  tensor(1.1287)
1
Note

If you run the above cell, you will see the validation progress bar in action.

By using the Trainer you automatically get:

  • TensorBoard logging
  • Model checkpointing
  • Training and validation loops
  • Early stopping
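
TensorBoard logging and a default checkpoint callback are on by default; early stopping and custom checkpointing are configured through callbacks. A minimal sketch, assuming a 1.x callbacks API and a model that logs a val_loss metric:

from pytorch_lightning.callbacks import EarlyStopping, ModelCheckpoint

# stop early when the monitored metric stops improving, and keep the best checkpoint
callbacks = [
    EarlyStopping(monitor='val_loss', mode='min', patience=3),
    ModelCheckpoint(monitor='val_loss', mode='min'),
]
trainer = Trainer(max_epochs=5, callbacks=callbacks)
trainer.fit(pl_model)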

PyTorch nn.Module versus pl.LightningModule

import torch
import pytorch_lightning as pl
from torch import nn
x = torch.rand((10,10))
x
tensor([[0.0745, 0.0237, 0.4719, 0.6037, 0.6015, 0.0921, 0.5982, 0.4860, 0.0959,
         0.5204],
        [0.2481, 0.2893, 0.5760, 0.3834, 0.6479, 0.0508, 0.5352, 0.5702, 0.4732,
         0.3867],
        [0.3467, 0.3321, 0.8570, 0.0983, 0.9210, 0.1848, 0.7397, 0.1350, 0.2646,
         0.7202],
        [0.6952, 0.8071, 0.1428, 0.3600, 0.1514, 0.2246, 0.8887, 0.9971, 0.0257,
         0.5519],
        [0.7547, 0.7165, 0.3677, 0.6642, 0.9991, 0.6585, 0.8673, 0.5005, 0.1843,
         0.1360],
        [0.1809, 0.0794, 0.5101, 0.6751, 0.2822, 0.6695, 0.8085, 0.2127, 0.7562,
         0.9859],
        [0.5914, 0.4481, 0.5107, 0.0032, 0.9766, 0.4627, 0.1520, 0.2915, 0.4323,
         0.3833],
        [0.6371, 0.7782, 0.7762, 0.4197, 0.2566, 0.7240, 0.0759, 0.9976, 0.6020,
         0.9528],
        [0.7674, 0.4044, 0.3497, 0.9784, 0.9318, 0.7313, 0.2962, 0.6555, 0.5570,
         0.9998],
        [0.1155, 0.8013, 0.7982, 0.5713, 0.2252, 0.4513, 0.8395, 0.7791, 0.1929,
         0.7707]])
class SimplePytorchModel(nn.Module):
    ...
torch_model = SimplePytorchModel()
torch_model(x)
NotImplementedError: 

In Python, a NotImplementedError usually appears when you inherit from an abstract class; here it tells you that you need to implement the forward method.

class SimplePytorchModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(10,10)
    def forward(self,x):
        return self.linear(x)
        
torch_model = SimplePytorchModel()
torch_model(x)
tensor([[-0.1243,  0.2997,  0.0861,  0.1849,  0.7241,  0.2632, -0.0680, -0.2111,
         -0.2606,  0.0837],
        [-0.0055,  0.1734,  0.2746,  0.1991,  0.6859,  0.2768,  0.0025, -0.2273,
         -0.1930,  0.2122],
        [-0.1407,  0.2008,  0.3773,  0.0956,  0.9796,  0.1915,  0.2936, -0.0837,
         -0.3146,  0.0808],
        [-0.0511,  0.1153,  0.2846,  0.2106,  0.7390,  0.0737, -0.1066, -0.3968,
         -0.3212,  0.2819],
        [-0.3408,  0.3093,  0.3826,  0.0783,  0.5542,  0.1298, -0.1768, -0.1407,
         -0.4774,  0.1776],
        [-0.1892,  0.2563,  0.1489, -0.0091,  0.4639,  0.1332, -0.0166, -0.3798,
         -0.4021,  0.2960],
        [-0.1463,  0.0375,  0.4741,  0.0881,  0.5674, -0.0446,  0.1802, -0.2256,
         -0.3006,  0.0376],
        [-0.1006, -0.1654,  0.3519,  0.3158,  0.5454, -0.0781,  0.0866, -0.4032,
         -0.5419,  0.2580],
        [-0.4006,  0.3089,  0.3450, -0.1411,  0.4353, -0.0416, -0.1630, -0.4652,
         -0.7266,  0.1949],
        [-0.1350,  0.0554,  0.1492,  0.4462,  0.8991,  0.2545,  0.1237, -0.1321,
         -0.4591,  0.2725]], grad_fn=<AddmmBackward>)

pl.LightningModule is a higher-level wrapper around nn.Module.

class SimpleLightningModel(pl.LightningModule):
    ...
    
pl_model = SimpleLightningModel()
pl_model(x)
NotImplementedError: 

It shouldn’t surprise you that the same error pops up again; after all, pl.LightningModule is a high-level wrapper around nn.Module, so we need to implement the forward method here too. We can confirm the relationship with the following line.

issubclass(pl.LightningModule, nn.Module)
True
class SimpleLightningModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(10,10)
        
    def forward(self,x):
        return self.linear(x)
    
pl_model = SimpleLightningModel()
pl_model(x)
tensor([[-1.9430e-01, -3.2665e-01,  1.5439e-01, -9.5051e-02, -2.6667e-01,
          7.0515e-01,  5.4318e-01,  4.8522e-02,  2.2087e-01,  4.6927e-02],
        [-1.9757e-01, -4.1862e-01,  1.0334e-01, -1.7735e-01, -3.7793e-01,
          7.6570e-01,  5.1128e-01, -5.9839e-04,  2.5192e-01,  9.6547e-02],
        [-2.1917e-01, -3.4533e-01,  1.6259e-01, -3.4603e-02, -5.8233e-01,
          7.6317e-01,  4.2289e-01, -5.8673e-02,  1.8833e-01,  9.4830e-02],
        [ 1.8358e-01, -4.9185e-01,  3.7877e-01, -2.4924e-03,  8.9796e-02,
          8.3502e-01,  6.2751e-01, -8.9419e-02,  5.8510e-01,  4.9892e-01],
        [-4.1500e-01, -5.1444e-01,  3.3273e-01, -1.9838e-01, -2.7256e-01,
          7.2250e-01,  3.3026e-01, -3.0803e-01,  4.8670e-01, -7.5673e-02],
        [-3.1485e-01, -5.7277e-01,  1.1172e-01,  2.0040e-01, -1.3642e-01,
          1.1535e+00,  4.7762e-01,  1.8485e-01, -1.2243e-01, -7.5894e-02],
        [-4.0921e-01, -4.7966e-01,  6.6770e-02, -2.1177e-01, -6.4936e-01,
          6.5091e-01,  1.9740e-01, -2.5598e-01,  6.5671e-02,  1.9597e-01],
        [-9.3814e-02, -6.7715e-01,  1.8347e-01, -2.4216e-01, -2.0083e-01,
          1.1088e+00,  4.1320e-01, -3.5082e-01,  1.6069e-01,  6.4193e-01],
        [-4.7541e-01, -8.7359e-01,  2.3989e-01, -3.2175e-01, -2.7573e-01,
          9.9955e-01,  3.8217e-01, -2.8564e-01,  1.1412e-02,  7.2301e-02],
        [-1.6360e-03, -3.6030e-01,  2.6286e-01,  5.9354e-02,  7.0063e-02,
          1.0381e+00,  5.0484e-01, -8.8854e-02,  3.9800e-01,  3.4168e-01]],
       grad_fn=<AddmmBackward>)

PyTorch DataLoader versus pl.LightningDataModule

A DataModule implements 5 key methods:

  • prepare_data: things to do on only 1 GPU/TPU (not on every GPU/TPU in distributed mode), e.g. download the data.
  • setup: things to do on every GPU/TPU in distributed mode, e.g. split the data.
  • train_dataloader: the training dataloader.
  • val_dataloader: the validation dataloader(s).
  • test_dataloader: the test dataloader(s).
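
A minimal sketch of an MNIST DataModule with these five methods (the class name and split sizes here are illustrative, not the lab's actual implementation):

from torch.utils.data import DataLoader, random_split
from torchvision.datasets import MNIST
from torchvision import transforms
import pytorch_lightning as pl


class MNISTDataModule(pl.LightningDataModule):
    def __init__(self, data_dir=".", batch_size=32):
        super().__init__()
        self.data_dir = data_dir
        self.batch_size = batch_size

    def prepare_data(self):
        # runs on a single process: download only, do not assign state here
        MNIST(self.data_dir, train=True, download=True)
        MNIST(self.data_dir, train=False, download=True)

    def setup(self, stage=None):
        # runs on every process: build the datasets and split them
        full = MNIST(self.data_dir, train=True, transform=transforms.ToTensor())
        self.train_ds, self.val_ds = random_split(full, [55000, 5000])
        self.test_ds = MNIST(self.data_dir, train=False, transform=transforms.ToTensor())

    def train_dataloader(self):
        return DataLoader(self.train_ds, batch_size=self.batch_size)

    def val_dataloader(self):
        return DataLoader(self.val_ds, batch_size=self.batch_size)

    def test_dataloader(self):
        return DataLoader(self.test_ds, batch_size=self.batch_size)

With such a DataModule, trainer.fit(model, datamodule=MNISTDataModule()) pulls all dataloaders from it instead of the LightningModule.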

Note

Why do we need a separate setup? It is mostly a design choice: the benefit is that the framework can handle distributed training in the most efficient way. On the other hand, if you are only training locally on 1 GPU, there is not much benefit to the split.

Trainer.tune()

    def tune(self, model, train_dataloader, val_dataloaders, datamodule):
        # Run auto batch size scaling
        if self.trainer.auto_scale_batch_size:
            if isinstance(self.trainer.auto_scale_batch_size, bool):
                self.trainer.auto_scale_batch_size = 'power'
            self.scale_batch_size(
                model,
                mode=self.trainer.auto_scale_batch_size,
                train_dataloader=train_dataloader,
                val_dataloaders=val_dataloaders,
                datamodule=datamodule,
            )

        # Run learning rate finder:
        if self.trainer.auto_lr_find:
            self.lr_find(model, update_attr=True)

The main use of Trainer.tune() is to automatically find a good initial learning rate and batch size for your model.
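
A sketch of how you might enable both tuners, assuming a 1.x Trainer and a LightningModule instance (model) that exposes the attributes being tuned, e.g. self.lr and self.batch_size, which our SimpleLightningModel above does not define:

# Assumes the LightningModule defines self.lr and self.batch_size
# (or hparams with those names) so tune() has something to overwrite.
trainer = pl.Trainer(max_epochs=1, auto_lr_find=True, auto_scale_batch_size='power')
trainer.tune(model)  # runs the batch size scaler, then the learning rate finder
trainer.fit(model)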

Now back to our Lab 1 (training/run_experiment.py)

I slightly modified the script so it can be run inside a notebook: instead of reading the arguments from the command line, we pass them to the argparse parser as a list.

python3 training/run_experiment.py --model_class=MLP --data_class=MNIST --max_epochs=5 --gpus=1 --fc1=4 --fc2=8

# Add current directory so we can import the library
import os, sys
sys.path.append(os.path.join(os.path.dirname(os.getcwd()), "text_recognizer"))
parser = _setup_parser()
args = parser.parse_args([
    '--model_class',
    'MLP',
    '--data_class',
    'MNIST',
    '--max_epochs',
    '5',
    '--gpus',
    '0',
    '--fc1',
    '4',
    '--fc2',
    '8',
    ])

data_class = _import_class(f"text_recognizer.data.{args.data_class}")
model_class = _import_class(f"text_recognizer.models.{args.model_class}")

data = data_class(args)
model = model_class(data_config=data.config(), args=args)

if args.loss not in ('ctc', 'transformer'):
    lit_model_class = lit_models.BaseLitModel

if args.load_checkpoint is not None:
    lit_model = lit_model_class.load_from_checkpoint(args.load_checkpoint, args=args, model=model)
else:
    lit_model = lit_model_class(args=args, model=model)

logger = pl.loggers.TensorBoardLogger("training/logs")

callbacks = [pl.callbacks.EarlyStopping(monitor="val_loss", mode="min", patience=10)]
args.weights_summary = "full"  # Print full summary of the model

trainer = pl.Trainer.from_argparse_args(args, callbacks=callbacks, logger=logger, default_root_dir="training/logs")
trainer.tune(lit_model, datamodule=data)  # If passing --auto_lr_find, this will set learning rate
trainer.fit(lit_model, datamodule=data)
trainer.test(lit_model, datamodule=data)
  1. The first line, trainer.tune(), tries to find a good batch size and/or learning rate (when --auto_scale_batch_size or --auto_lr_find is passed)
  2. The second line, trainer.fit(), trains for 5 epochs
  3. The third line, trainer.test(), runs the test loop on the test dataloader defined in the DataModule