Example Code
==================

In previous chapters, various features and techniques of Nytorch have been discussed. This chapter provides a practical example to demonstrate their application.

Purpose
-------------

This chapter serves two main purposes:

1. **Introduction of ParticleModule**: Earlier tutorials omitted ParticleModule for brevity, but that simplification is not adequate for practical training. This chapter therefore shows how to integrate it.

2. **Combining Evolutionary Algorithm and Gradient Descent**: Nytorch makes it easy to combine these methods. Here, both are used in training: Gradient Descent optimizes the parameters in most iterations, while periodically (for example, every 5 iterations; the code below uses an interval of 16) the Evolutionary Algorithm adjusts a subset of the swarm to explore better solutions.

For the Evolutionary Algorithm phase, we adopt an approach similar to Accelerated Particle Swarm Optimization, updating models based on:

.. math::

   W_{i,t} = (1 - \alpha) W_{i,t-1} + \alpha W_{g,t-1}

where:

* :math:`W_{i,t}` is particle *i* at time *t*.
* :math:`W_{i,t-1}` is particle *i* at time *t-1*.
* :math:`W_{g,t-1}` is the best-known particle in the swarm at time *t-1*.
* :math:`\alpha` is a scalar between 0 and 1 that controls how far each particle moves toward the best-known particle.

To optimize models that are distributed across nodes with high communication costs, we make two adjustments:

1. Reducing the communication frequency.
2. Optimizing only a subset of the swarm per iteration.

Reducing the communication frequency means running the Evolutionary Algorithm only periodically, such as every 5 iterations, while Gradient Descent optimizes in all other iterations. Optimizing a subset of the swarm means randomly selecting particles for each Evolutionary Algorithm step, which reduces the work done in that step.
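The update rule and the periodic schedule described above can be sketched in plain Python, independent of Nytorch. This is only an illustration under simplifying assumptions: particles are flat lists of floats, and the helper names ``apso_update`` and ``is_swarm_step`` are hypothetical and not part of the Nytorch API.

```python
def apso_update(particle, best, alpha):
    """Move a particle toward the best-known particle.

    Implements W_{i,t} = (1 - alpha) * W_{i,t-1} + alpha * W_{g,t-1}
    element-wise on flat lists of floats (an assumption of this sketch).
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if len(particle) != len(best):
        raise ValueError("particle and best must have the same length")
    return [(1 - alpha) * w + alpha * g for w, g in zip(particle, best)]


def is_swarm_step(batch_idx, interval):
    """Return True on every `interval`-th batch, counting batches from 1."""
    return (batch_idx + 1) % interval == 0


if __name__ == "__main__":
    # Halfway between the particle and the best-known particle.
    print(apso_update([0.0, 2.0], [1.0, 4.0], alpha=0.5))  # [0.5, 3.0]
    # With an interval of 5, the swarm step runs on batches 4 and 9.
    print([i for i in range(12) if is_swarm_step(i, 5)])   # [4, 9]
```

With ``alpha`` close to 0 a particle barely moves, and with ``alpha`` close to 1 it is almost replaced by the best-known particle, so ``alpha`` trades exploration against speed of convergence.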
Example Content
--------------------------

Let's begin by configuring the training parameters::

    from nytorch import NytoModule, ParticleModule
    from nytorch.particle_module import PMProduct
    from random import choices, random
    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    import torch.optim as optim
    from torch.utils.data import Subset, DataLoader, random_split
    from torchvision import datasets, transforms

    BATCH_SIZE = 64
    TRAIN_BATCH_NUM = 256   # number of training batches
    TEST_BATCH_NUM = 16     # number of test batches
    POOL_SIZE = 12          # number of particles in the pool
    SWARM_SIZE = 6          # particles drawn for each swarm step
    LR = 0.01
    ALPHA = 0.5             # weight of the best particle in the update
    SWARM_INTERVAL = 16     # run the swarm step every 16 batches
    PRINT_INTERVAL = 16     # report accuracy every 16 batches
    DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")

We use the MNIST dataset for demonstration, selecting only a subset for the example::

    # Same preprocessing for the training and test sets
    transform = transforms.Compose([transforms.ToTensor(),
                                    transforms.Normalize((0.1307,), (0.3081,))])
    full_train_dataset = datasets.MNIST('mnist', train=True, download=True,
                                        transform=transform)
    full_test_dataset = datasets.MNIST('mnist', train=False,
                                       transform=transform)

    # Keep only as many samples as the configured number of batches requires
    train_size = TRAIN_BATCH_NUM * BATCH_SIZE
    test_size = TEST_BATCH_NUM * BATCH_SIZE
    train_dataset, _ = random_split(full_train_dataset,
                                    [train_size, len(full_train_dataset)-train_size])
    test_dataset, _ = random_split(full_test_dataset,
                                   [test_size, len(full_test_dataset)-test_size])

    train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True)
    test_loader = DataLoader(test_dataset, batch_size=BATCH_SIZE, shuffle=False)

Next, we define the model::

    class ConvNet(NytoModule):
        def __init__(self):
            super().__init__()
            self.conv1 = nn.Conv2d(1, 10, 5)
            self.conv2 = nn.Conv2d(10, 20, 3)
            self.fc = nn.Linear(20*10*10, 10)

        def forward(self, x):
            in_size = x.size(0)
            out = self.conv1(x)
            out = F.relu(out)
            out = F.max_pool2d(out, 2, 2)
            out = self.conv2(out)
            out = F.relu(out)
            out = out.view(in_size, -1)   # flatten to (batch, 20*10*10)
            out = self.fc(out)
            out = F.log_softmax(out, dim=1)
            return out

    class ConvModel:
        @classmethod
        def from_product(cls, product, device):
            # Turn a PMProduct back into a trainable ConvModel
            assert isinstance(product, PMProduct)
            return cls(product.module(), device)

        def __init__(self, particle, device):
            assert isinstance(particle, ParticleModule)
            self.device = device
            self.particle = particle
            self.optimizer = optim.SGD(self.particle.parameters(), lr=LR)
            self.particle.to(self.device)

        def product(self):
            return self.particle.product()

        def train(self, data, target):
            # One gradient-descent step; returns the batch loss
            data, target = data.to(self.device), target.to(self.device)
            self.particle.train()
            self.optimizer.zero_grad()
            loss = F.nll_loss(self.particle(data), target)
            loss.backward()
            self.optimizer.step()
            return loss.item()

        def test(self, data, target):
            # Returns the summed loss and the number of correct predictions
            data, target = data.to(self.device), target.to(self.device)
            self.particle.eval()
            with torch.no_grad():
                output = self.particle(data)
                loss = F.nll_loss(output, target, reduction='sum').item()
                pred = output.max(1, keepdim=True)[1]
                correct = pred.eq(target.view_as(pred)).sum().item()
            return loss, correct

We also create a wrapper class called ``ConvModel``, which holds a ParticleModule that wraps a ``ConvNet`` and bundles it with its optimizer and the training and testing methods. The ``product`` method returns a PMProduct instance for particle operations, and the ``from_product`` method transforms a PMProduct instance back into a ``ConvModel`` after particle operations.
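The input size of ``self.fc`` in ``ConvNet`` (``20*10*10``) follows from the shapes produced by the preceding layers on 28x28 MNIST images. The sketch below rechecks that arithmetic with the standard output-size formula for convolution and pooling without dilation; the helper names ``conv_out`` and ``convnet_flat_features`` are hypothetical and only serve this illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


def convnet_flat_features(image_size=28, channels=20):
    """Number of features entering the fully connected layer of ConvNet."""
    size = conv_out(image_size, kernel=5)      # conv1, 5x5 kernel: 28 -> 24
    size = conv_out(size, kernel=2, stride=2)  # max_pool2d(2, 2): 24 -> 12
    size = conv_out(size, kernel=3)            # conv2, 3x3 kernel: 12 -> 10
    return channels * size * size


if __name__ == "__main__":
    print(convnet_flat_features())  # 2000, i.e. 20*10*10
```

If the kernel sizes or the input resolution change, the first argument of ``nn.Linear`` has to be recomputed in the same way.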
Since we are using a swarm-based algorithm, we need some swarm operations during training, which we wrap into functions::

    def create_pool(size, device):
        assert size >= 2
        pool = [ParticleModule(ConvNet()) for _ in range(size)]
        p0 = pool[0]
        # Derive every other particle from p0 via clone_from
        return [ConvModel(p0.clone_from(p), device) for p in pool[1:]] + [ConvModel(p0, device)]

    def test_model(model, test_loader):
        # Average loss and accuracy over the whole test set
        test_loss = 0
        total_correct = 0
        for data, target in test_loader:
            loss, correct = model.test(data, target)
            test_loss += loss
            total_correct += correct
        test_loss /= len(test_loader.dataset)
        test_acc = total_correct / len(test_loader.dataset)
        return test_loss, test_acc

    def swarm_algorithm(pool, swarm_size, loss_list, alpha):
        assert 0 < swarm_size <= len(pool) == len(loss_list)
        assert 1 > alpha > 0
        # choices() samples with replacement, so an index may appear more than once
        idx_list = choices(list(range(len(pool))), k=swarm_size)
        idx_loss_list = [(idx, loss_list[idx]) for idx in idx_list]
        idx_loss_list = sorted(idx_loss_list, key=lambda idx_loss: idx_loss[1])
        # The drawn particle with the lowest loss acts as the best-known particle
        best_seed_idx, _ = idx_loss_list[0]
        for idx, _ in idx_loss_list:
            if idx == best_seed_idx:
                continue
            seed0 = pool[best_seed_idx].product()
            seed1 = pool[idx].product()
            # W_i = alpha * W_best + (1 - alpha) * W_i
            new_product = alpha*seed0 + (1-alpha)*seed1
            pool[idx] = ConvModel.from_product(new_product, pool[idx].device)

    def train_pool(pool, train_loader, test_loader, swarm_size, swarm_interval=4, alpha=0.5, print_interval=8):
        assert len(pool) >= swarm_size >= 2
        assert swarm_interval > 0
        assert 1 > alpha > 0
        assert print_interval > 0

        for batch_idx, (data, target) in enumerate(train_loader):
            # Gradient descent step for every particle on the same batch
            loss_list = [model.train(data, target) for model in pool]

            # Periodic Evolutionary Algorithm step
            if (batch_idx+1)%swarm_interval == 0:
                swarm_algorithm(pool, swarm_size, loss_list, alpha)

            if batch_idx==0 or (batch_idx+1)%print_interval == 0:
                print(f"batch: {batch_idx:>3} Accuracy: ", end='')
                for idx, model in enumerate(pool):
                    _, acc = test_model(model, test_loader)
                    print(f"[{idx}]{acc:.2f}", end=' ')
                print()

The techniques used in ``create_pool`` and ``swarm_algorithm`` deserve special attention.
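The selection step inside ``swarm_algorithm`` can be looked at in isolation, without Nytorch. The following is a minimal sketch, assuming losses are plain floats; the helper name ``pick_subset`` is hypothetical. It mirrors the original use of ``choices``, which samples with replacement, so the same particle may be drawn more than once.

```python
from random import Random


def pick_subset(pool_size, swarm_size, loss_list, rng):
    """Draw swarm_size indices with replacement and rank them by loss.

    Returns the index with the lowest loss among the drawn particles and
    the remaining drawn indices (the particles that will be moved toward it).
    """
    idx_list = rng.choices(range(pool_size), k=swarm_size)
    ranked = sorted(idx_list, key=lambda idx: loss_list[idx])
    best = ranked[0]
    # Repeated draws of the best index are skipped, as in swarm_algorithm;
    # any other index drawn twice stays in the list twice.
    followers = [idx for idx in ranked if idx != best]
    return best, followers


if __name__ == "__main__":
    rng = Random(0)  # fixed seed so that repeated runs draw the same indices
    losses = [0.4, 0.1, 0.3, 0.2]
    best, followers = pick_subset(len(losses), 3, losses, rng)
    print("best:", best, "followers:", followers)
```

Because of the sampling with replacement, fewer than ``swarm_size`` distinct particles may take part in a given step. Drawing with ``random.sample`` instead would guarantee distinct particles, but that is a different design from the one used in this chapter.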
In ``create_pool``, we use ParticleModule to wrap NytoModule, which eliminates circular references and reduces memory pressure. In ``swarm_algorithm``, we use the ``product`` method to obtain PMProduct instances for particle operations, and then transform the result back into ParticleModule instances in a batch to avoid unnecessary conversions.

Finally, we start training::

    if __name__ == '__main__':
        pool = create_pool(POOL_SIZE, DEVICE)
        train_pool(pool, train_loader, test_loader, SWARM_SIZE, SWARM_INTERVAL, ALPHA, PRINT_INTERVAL)

        print("End")
        print("Accuracy: ", end='')
        for idx, model in enumerate(pool):
            _, acc = test_model(model, test_loader)
            print(f"[{idx}]{acc:.2f}", end=' ')

Below is the output of the program::

    batch:   0 Accuracy: [0]0.11 [1]0.06 [2]0.09 [3]0.07 [4]0.08 [5]0.07 [6]0.11 [7]0.17 [8]0.15 [9]0.13 [10]0.13 [11]0.20
    batch:  15 Accuracy: [0]0.37 [1]0.43 [2]0.51 [3]0.60 [4]0.63 [5]0.36 [6]0.22 [7]0.44 [8]0.62 [9]0.62 [10]0.58 [11]0.63
    batch:  31 Accuracy: [0]0.47 [1]0.47 [2]0.61 [3]0.69 [4]0.73 [5]0.57 [6]0.55 [7]0.56 [8]0.70 [9]0.55 [10]0.69 [11]0.63
    batch:  47 Accuracy: [0]0.70 [1]0.67 [2]0.72 [3]0.83 [4]0.84 [5]0.69 [6]0.77 [7]0.67 [8]0.81 [9]0.77 [10]0.80 [11]0.75
    batch:  63 Accuracy: [0]0.79 [1]0.75 [2]0.80 [3]0.80 [4]0.84 [5]0.80 [6]0.78 [7]0.78 [8]0.81 [9]0.78 [10]0.73 [11]0.82
    batch:  79 Accuracy: [0]0.81 [1]0.84 [2]0.84 [3]0.84 [4]0.87 [5]0.82 [6]0.84 [7]0.82 [8]0.84 [9]0.82 [10]0.87 [11]0.87
    batch:  95 Accuracy: [0]0.85 [1]0.83 [2]0.85 [3]0.85 [4]0.88 [5]0.83 [6]0.84 [7]0.80 [8]0.85 [9]0.85 [10]0.86 [11]0.86
    batch: 111 Accuracy: [0]0.87 [1]0.88 [2]0.87 [3]0.88 [4]0.89 [5]0.84 [6]0.87 [7]0.85 [8]0.82 [9]0.87 [10]0.89 [11]0.89
    batch: 127 Accuracy: [0]0.87 [1]0.86 [2]0.86 [3]0.87 [4]0.87 [5]0.85 [6]0.87 [7]0.87 [8]0.86 [9]0.85 [10]0.87 [11]0.87
    batch: 143 Accuracy: [0]0.87 [1]0.86 [2]0.85 [3]0.84 [4]0.87 [5]0.83 [6]0.87 [7]0.86 [8]0.87 [9]0.87 [10]0.87 [11]0.87
    batch: 159 Accuracy: [0]0.86 [1]0.83 [2]0.82 [3]0.84 [4]0.88 [5]0.87 [6]0.87 [7]0.83 [8]0.87 [9]0.83 [10]0.88 [11]0.88
    batch: 175 Accuracy: [0]0.89 [1]0.89 [2]0.89 [3]0.88 [4]0.90 [5]0.90 [6]0.90 [7]0.90 [8]0.90 [9]0.90 [10]0.90 [11]0.90
    batch: 191 Accuracy: [0]0.89 [1]0.88 [2]0.89 [3]0.88 [4]0.89 [5]0.89 [6]0.89 [7]0.89 [8]0.89 [9]0.88 [10]0.89 [11]0.89
    batch: 207 Accuracy: [0]0.90 [1]0.90 [2]0.90 [3]0.81 [4]0.90 [5]0.90 [6]0.90 [7]0.90 [8]0.90 [9]0.90 [10]0.90 [11]0.90
    batch: 223 Accuracy: [0]0.90 [1]0.90 [2]0.90 [3]0.88 [4]0.90 [5]0.90 [6]0.90 [7]0.90 [8]0.90 [9]0.90 [10]0.90 [11]0.90
    batch: 239 Accuracy: [0]0.90 [1]0.90 [2]0.90 [3]0.88 [4]0.90 [5]0.90 [6]0.90 [7]0.90 [8]0.89 [9]0.89 [10]0.90 [11]0.90
    batch: 255 Accuracy: [0]0.90 [1]0.90 [2]0.90 [3]0.87 [4]0.90 [5]0.90 [6]0.90 [7]0.90 [8]0.90 [9]0.90 [10]0.90 [11]0.90
    End
    Accuracy: [0]0.90 [1]0.90 [2]0.90 [3]0.87 [4]0.90 [5]0.90 [6]0.90 [7]0.90 [8]0.90 [9]0.90 [10]0.90 [11]0.90

As training progresses, the accuracies of the particles converge toward one another, which reflects the Evolutionary Algorithm repeatedly pulling particles toward the best one. Its influence is strongest early in training and diminishes as the parameters converge. Slowing the convergence of the Evolutionary Algorithm would let the particles explore more of the solution space, at the cost of more computation.

Summary
--------

This chapter showed how to use Nytorch for model training by combining Gradient Descent with an Evolutionary Algorithm. The techniques covered were wrapping NytoModule in ParticleModule and using PMProduct for particle operations.