Introduction

In this section, we’ll quickly introduce the basic functionalities of Nytorch along with usage examples.

Particle Operation

Nytorch doesn’t reinvent the wheel but rather provides particle operation functionalities built upon the existing capabilities of PyTorch. When using Nytorch, simply change the inheritance object of the model from torch.nn.Module to nytorch.NytoModule. As nytorch.NytoModule is a subclass of torch.nn.Module, it provides corresponding implementations of all parent class methods.

Below is a simple example:

import nytorch as nyto
import torch

class NytoLinear(nyto.NytoModule):
    def __init__(self) -> None:
        super().__init__()
        self.weight = torch.nn.Parameter(torch.Tensor([2.]))
        self.bias = torch.nn.Parameter(torch.Tensor([1.]))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.weight * x + self.bias

net: NytoLinear = NytoLinear()

Particle operation allows the model to perform operations similar to tensors to obtain a new model. For example, net obtains a new NytoLinear instance through particle operation:

net2: NytoLinear = net + 10

Compare the results of forward propagation between net and net2:

x = torch.arange(4)  # tensor([0, 1, 2, 3])
print(net(x))        # tensor([1., 3., 5., 7.], grad_fn=<MulBackward0>)
print(net2(x))       # tensor([11., 23., 35., 47.], grad_fn=<MulBackward0>)

Compare the parameters of net and net2:

print(net.weight)   # Parameter containing: tensor([2.], requires_grad=True)
print(net2.weight)  # Parameter containing: tensor([12.], requires_grad=True)
print(net.bias)     # Parameter containing: tensor([1.], requires_grad=True)
print(net2.bias)    # Parameter containing: tensor([11.], requires_grad=True)

We can observe that the result of adding a scalar to the model is equivalent to applying the scalar to all parameters of the model. Similar principles apply to other arithmetic operations such as subtraction or multiplication. When the operands are models, the operation acts on corresponding parameter positions:

net3: NytoLinear = net + net2

print(net3.weight)  # Parameter containing: tensor([14.], requires_grad=True)
print(net3.bias)    # Parameter containing: tensor([12.], requires_grad=True)

Besides arithmetic particle operations, there are operations to generate a new model from an existing one. For instance, if a model with the same structure but random parameters is needed, the randn() method can be used. This method returns a model with parameters sampled from a standard normal distribution:

net_randn: NytoLinear = net.randn()

print(net_randn.weight)  # Parameter containing: tensor([0.3187], requires_grad=True)
print(net_randn.bias)    # Parameter containing: tensor([0.1877], requires_grad=True)

If a model needs to be copied and parameters cloned from the original model, the clone() method can be used:

net_clone: NytoLinear = net.clone()

print(net_clone.weight is net.weight)             # False
print(torch.equal(net_clone.weight, net.weight))  # True

Benefits of Particle Operation

Why do we need functionalities like particle operation? What benefits does it bring to us? If you’re curious about these questions, we have a dedicated section to discuss this topic. In simple terms, such functionalities are for researching and using hybrid evolutionary algorithms and gradient descent algorithms.

Even if you’re not familiar with evolutionary algorithms, particle operation is not something new in deep learning. It’s used in momentum encoders in MoCo or target networks in DDQN.

Here’s another example with momentum. In fact, momentum also utilizes particle operation:

\[ \begin{align}\begin{aligned}W_t = W_{t-1} + V_t\\V_t = \beta V_{t-1} + G_t\end{aligned}\end{align} \]

The above is the common expression of momentum, where we need to calculate momentum and gradients to update weights. However, momentum can be expressed in the form of particle operation as:

\[W_t = (W_{t-1} + G_t) + \beta (W_{t-1} - W_{t-2})\]

This means that momentum is equivalent to adding \(\beta (W_{t-1} - W_{t-2})\) to gradient descent.

In code, it would look something like this:

net1, net0 = gradient_descent(net1) + beta*(net1 - net0), net1

So, algorithms that might have been difficult to implement in the past can now be implemented elegantly and simply through Nytorch.

Below is an example:

from nytorch import NytoModule
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as f


# load Iris dataset
iris = load_iris()
x_train, x_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2)
x_train = torch.FloatTensor(x_train)
x_test = torch.FloatTensor(x_test)
y_train = torch.LongTensor(y_train)
y_test = torch.LongTensor(y_test)

# Hyperparameter
BETA = 0.9
LR = 0.005
criterion = nn.CrossEntropyLoss()


def train_module(net: 'MyModel', lr: float) -> 'MyModel':
    new_net = net.clone()
    optimizer = torch.optim.SGD(new_net.parameters(), lr=lr)
    optimizer.zero_grad()
    loss = criterion(new_net(x_train), y_train)
    loss.backward()
    optimizer.step()
    return new_net


@torch.no_grad()
def eval_module(net: nn.Module) -> np.float64:
    net.eval()
    _, predicted = torch.max(net(x_test), 1)
    net.train()
    return accuracy_score(y_test, predicted)


# MyModel class
class MyModel(NytoModule):
    def __init__(self, in_feat: int, h_size: int, out_feat: int):
        super().__init__()
        self.layer1 = nn.Linear(in_feat, h_size)
        self.layer2 = nn.Linear(h_size, out_feat)

    def forward(self, inpts):
        h_out = f.relu(self.layer1(inpts))
        return self.layer2(h_out)


# init network
net0: MyModel = MyModel(4, 12 ,3)
net1: MyModel = train_module(net0, LR)

# trainig
for epoch in range(100):
    momentum: MyModel = BETA * (net1 - net0)
    net1, net0 = train_module(net1, LR) + momentum, net1

    if (epoch+1) % 10 == 0:
        accuracy = eval_module(net1)
        print(f"{epoch=} acc={accuracy:.2f}")