Particle operation

First, let’s clarify what a particle is. The original definition of a particle is the collection of all parameters of a model. However, a more common and extended meaning refers to a Module instance and the ensemble of its sub-modules. Here, when we refer to a particle, we are talking about this extended meaning.

Operations based on particles are termed particle operations. There are two types of particle operations:

Operations between particles and numbers.

Operations between particles.

nytorch.NytoModule inherits from torch.nn.Module. We can enable the functionality of particle operations by inheriting from NytoModule:

import nytorch as nyto
import torch

class NytoLinear(nyto.NytoModule):
    def __init__(self, w: torch.Tensor, b: torch.Tensor) -> None:
        super().__init__()
        self.weight = torch.nn.Parameter(w)
        self.bias = torch.nn.Parameter(b)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.weight * x + self.bias

Operations Between Particles and Numbers

Operations between particles and numbers apply the operation to each parameter of the particle and return the result as a new particle. For example:

net = NytoLinear(torch.Tensor([2.]), torch.Tensor([1.]))
new_net = net + 10

>>> net.weight
Parameter containing:
tensor([2.], requires_grad=True)

>>> new_net.weight
Parameter containing:
tensor([12.], requires_grad=True)

>>> net.bias
Parameter containing:
tensor([1.], requires_grad=True)

>>> new_net.bias
Parameter containing:
tensor([11.], requires_grad=True)

The following operations are supported:

pos_net = +net
neg_net = -net
pow_net = net ** 2
add_net = net + 2
sub_net = net - 2
mul_net = net * 2
truediv_net = net / 2
rpow_net = 2 ** net
radd_net = 2 + net
rsub_net = 2 - net
rmul_net = 2 * net
rtruediv_net = 2 / net

# In-place operators (each left-hand name must already refer to a particle)
ipow_net **= 2
iadd_net += 2
isub_net -= 2
imul_net *= 2
itruediv_net /= 2
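
To make the per-parameter rule concrete, here is a minimal pure-Python sketch. It is not Nytorch's implementation: ToyParticle and its params dictionary are hypothetical stand-ins for a NytoModule and its parameters, with plain floats in place of tensors. It also illustrates why an expression such as 2 - net needs a reflected method, and how an in-place operator might differ from its out-of-place counterpart (assumed here to mutate the particle).

```python
from typing import Callable, Dict


class ToyParticle:
    """Hypothetical stand-in for a particle: named parameters held as floats."""

    def __init__(self, params: Dict[str, float]) -> None:
        self.params = dict(params)

    def _map(self, fn: Callable[[float], float]) -> "ToyParticle":
        # Apply fn to every parameter and wrap the results in a new particle.
        return ToyParticle({name: fn(value) for name, value in self.params.items()})

    def __add__(self, number: float) -> "ToyParticle":
        return self._map(lambda value: value + number)

    def __sub__(self, number: float) -> "ToyParticle":
        return self._map(lambda value: value - number)

    def __rsub__(self, number: float) -> "ToyParticle":
        # Called for `number - particle`; the operand order is swapped.
        return self._map(lambda value: number - value)

    def __iadd__(self, number: float) -> "ToyParticle":
        # In-place variant (assumption): update this particle, create nothing new.
        for name in self.params:
            self.params[name] += number
        return self


net = ToyParticle({"weight": 2.0, "bias": 1.0})
new_net = net + 10      # new particle; net is unchanged
rsub_net = 2 - net      # dispatched to __rsub__
net += 1                # mutates net itself
```

Subtraction is used for the reflected case because it is not commutative, so swapping the operands by mistake would change the result.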

Operations Between Particles

Operations between particles apply the operation to the corresponding parameters of the two particles and return the result as a new particle. For example:

net = NytoLinear(torch.Tensor([2.]), torch.Tensor([1.]))
new_net = net + net

>>> net.weight
Parameter containing:
tensor([2.], requires_grad=True)

>>> new_net.weight
Parameter containing:
tensor([4.], requires_grad=True)

>>> net.bias
Parameter containing:
tensor([1.], requires_grad=True)

>>> new_net.bias
Parameter containing:
tensor([2.], requires_grad=True)

The following operations are supported:

pow_net = net ** net
add_net = net + net
sub_net = net - net
mul_net = net * net
truediv_net = net / net

In-place operators:

ipow_net **= net
iadd_net += net
isub_net -= net
imul_net *= net
itruediv_net /= net
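
The idea of pairing "corresponding parameters" can be sketched without any library. The snippet below is only an illustration under stated assumptions: parameters are plain floats stored in dictionaries, and combine is a hypothetical helper, not part of Nytorch. Parameters are matched by name, and the operation is applied to each matched pair.

```python
import operator
from typing import Callable, Dict

Params = Dict[str, float]


def combine(op: Callable[[float, float], float], left: Params, right: Params) -> Params:
    """Apply op to parameters that share a name and return a new mapping."""
    # This toy version assumes both mappings have identical keys.
    return {name: op(left[name], right[name]) for name in left}


net_a = {"weight": 2.0, "bias": 1.0}
net_b = {"weight": 4.0, "bias": 3.0}

summed = combine(operator.add, net_a, net_b)
# Averaging two particles is a sum followed by a particle-number division.
average = {name: value / 2 for name, value in summed.items()}
```

With two particles of the same species, the operators listed above would express the same averaging as (net_a + net_b) / 2.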

Note

Not every pair of particles can be combined in a particle operation: both must belong to the same species. Species are discussed in the next subsection.

Species

Here, we introduce a new concept. Two particles belong to the same species if one is derived from the other through particle operations, or if both are derived from a common particle. The collection of all particles belonging to the same species is called a swarm.

In other words, every particle created through the NytoModule constructor starts a new species and is that species’ first member. Particle operations can only take place between particles of the same species. Attempting one across species raises an error:

net1 = NytoLinear(torch.Tensor([1.]), torch.Tensor([2.]))
net2 = NytoLinear(torch.Tensor([3.]), torch.Tensor([4.]))
net3 = net1 + 10

>>> net1 + net2
AssertionError

>>> net1 + net3
NytoLinear()

This restriction exists because particles from different species are not guaranteed to share the same structure or parameter shapes. Checking structure and shape consistency on every particle operation would be possible, but cross-species operations are uncommon and the check is computationally expensive. Nytorch therefore favors the efficiency of particle operations and does not perform it.
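
One way to picture the trade-off is a species marker that is compared by identity. The sketch below is an assumption made for illustration only; the source does not describe how Nytorch tracks species internally. Species and ToyParticle are hypothetical names, and parameters are plain floats.

```python
from typing import Dict, Optional


class Species:
    """Hypothetical marker object; one is created per constructed particle."""


class ToyParticle:
    def __init__(self, params: Dict[str, float], species: Optional[Species] = None) -> None:
        self.params = dict(params)
        # Calling the constructor without a species starts a new species.
        self.species = species if species is not None else Species()

    def __add__(self, other):
        if isinstance(other, ToyParticle):
            # One identity comparison replaces a walk over every parameter shape.
            if self.species is not other.species:
                raise AssertionError("particles belong to different species")
            result = {name: value + other.params[name] for name, value in self.params.items()}
        else:
            result = {name: value + other for name, value in self.params.items()}
        # The result inherits the species, so it can be combined with its origin.
        return ToyParticle(result, self.species)


net1 = ToyParticle({"weight": 1.0, "bias": 2.0})
net2 = ToyParticle({"weight": 3.0, "bias": 4.0})
net3 = net1 + 10

same_species = net1 + net3          # allowed: net3 was derived from net1

try:
    net1 + net2                     # separate constructor calls, separate species
    cross_species_failed = False
except AssertionError:
    cross_species_failed = True
```

In this toy model the cost of the check does not depend on the number or size of the parameters, which is the property the paragraph above argues for.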

However, it is sometimes necessary to combine particles from different species. In such cases, first copy the parameters of the foreign particle into a particle that already belongs to the target species, then perform the particle operation. Below, net3 (same species as net1) receives the parameters of net2:

net3.load_state_dict(net2.state_dict())

>>> net1 + net3
NytoLinear()

Clone and Detach

Here, we introduce two related methods: clone() and detach().

clone() returns a new particle with cloned parameters from the original particle, and they belong to the same species:

net = NytoLinear(torch.Tensor([1.]), torch.Tensor([2.]))
net_clone = net.clone()

>>> net.weight is net_clone.weight
False

>>> net.bias is net_clone.bias
False

>>> torch.equal(net.weight, net_clone.weight)
True

>>> torch.equal(net.bias, net_clone.bias)
True

>>> net + net_clone
NytoLinear()

detach() returns a new particle with parameters referencing the original particle, and they belong to different species:

net = NytoLinear(torch.Tensor([1.]), torch.Tensor([2.]))
net_detach = net.detach()

>>> net.weight is net_detach.weight
True

>>> net.bias is net_detach.bias
True

>>> net + net_detach
AssertionError
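
The difference between copying and sharing parameters matters most when a parameter is later modified in place. The following pure-Python sketch illustrates that consequence under stated assumptions: ToyParticle is a hypothetical stand-in, each parameter is a mutable list of floats rather than a tensor, and a species is modeled as a bare marker object.

```python
from typing import Dict, List, Optional


class ToyParticle:
    """Hypothetical stand-in; each parameter is a mutable list of floats."""

    def __init__(self, params: Dict[str, List[float]], species: Optional[object] = None) -> None:
        self.params = params
        self.species = species if species is not None else object()

    def clone(self) -> "ToyParticle":
        # Copy every parameter, keep the species.
        copied = {name: list(values) for name, values in self.params.items()}
        return ToyParticle(copied, self.species)

    def detach(self) -> "ToyParticle":
        # Reuse the very same parameter objects, start a new species.
        return ToyParticle(dict(self.params))


net = ToyParticle({"weight": [1.0], "bias": [2.0]})
net_clone = net.clone()
net_detach = net.detach()

net.params["weight"][0] = 5.0   # edit the original parameter in place

clone_weight = net_clone.params["weight"]     # still holds the old value
detach_weight = net_detach.params["weight"]   # sees the edit
```

In this model the clone is insulated from later edits to the original, while the detached particle observes them because it holds the same parameter objects.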

clone_from

If there’s a need to clone particles from another species to the current species, one can use either of the following approaches:

net1 = NytoLinear(torch.Tensor([1.]),
                  torch.Tensor([2.]))
net2 = NytoLinear(torch.Tensor([3.]),
                  torch.Tensor([4.]))

net3 = net1.clone()
net3.load_state_dict(net2.state_dict())

or:

net1 = NytoLinear(torch.Tensor([1.]),
                  torch.Tensor([2.]))
net2 = NytoLinear(torch.Tensor([3.]),
                  torch.Tensor([4.]))

net3 = net1.clone_from(net2)

The two approaches are logically equivalent.

Randn

Sometimes, it’s necessary to introduce randomness. In such cases, one can use the randn() method, which returns a new particle whose parameters are drawn from a standard normal distribution. The new particle belongs to the same species as the original:

net = NytoLinear(torch.Tensor([1.]), torch.Tensor([2.]))
net_randn = net.randn()

>>> net_randn.weight
Parameter containing:
tensor([1.2006], requires_grad=True)

>>> net_randn.bias
Parameter containing:
tensor([-1.6793], requires_grad=True)

>>> net + net_randn
NytoLinear()
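
A common reason to want such a random particle is to perturb an existing one by scaled noise. The sketch below shows the arithmetic with plain dictionaries of floats. randn_like and perturb are hypothetical helpers written for this illustration, and the seed and the scale sigma=0.1 are arbitrary choices.

```python
import random
from typing import Dict

Params = Dict[str, float]


def randn_like(params: Params, rng: random.Random) -> Params:
    """Return a mapping with the same names, each value drawn from N(0, 1)."""
    return {name: rng.gauss(0.0, 1.0) for name in params}


def perturb(params: Params, noise: Params, sigma: float) -> Params:
    # Same arithmetic as `params + noise * sigma`, applied parameter by parameter.
    return {name: value + sigma * noise[name] for name, value in params.items()}


rng = random.Random(0)          # fixed seed so the run is repeatable
net = {"weight": 1.0, "bias": 2.0}
noise = randn_like(net, rng)
candidate = perturb(net, noise, sigma=0.1)
```

With particles, the same perturbation could be written using only the operators listed earlier, for example net + net.randn() * 0.1, since the random particle shares the species of net.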

Particle Operation on Submodules

Note

We usually treat a module instance together with its submodules as one particle, because they must work together to complete a forward pass. In Nytorch, the root module of a particle is characterized by two properties:

A particle has only one root module.

The module that can traverse all modules in the particle starting from itself.

Usually, we perform particle operations on the root module. But what happens if we perform particle operations on submodules?

Consider the following example:

from torch import nn

class Layer(nyto.NytoModule):
    def __init__(self, in_size, out_size):
        super().__init__()
        self.lin = nn.Linear(in_size, out_size)

    def forward(self, x):
        return self.lin(x)


class ResLayer(nyto.NytoModule):
    def __init__(self, in_size, out_size):
        super().__init__()
        self.sub_module = Layer(in_size, out_size)

    def forward(self, x):
        return self.sub_module(x) + x

root_module = ResLayer(12, 2)
sub_module = root_module.sub_module

In this example, we have a root module and a submodule. If we perform particle operations on both, new particles corresponding to each module are generated:

new_root_module = root_module + 10
new_sub_module = sub_module + 10

>>> new_root_module
ResLayer(
  (sub_module): Layer(
    (lin): Linear(in_features=12, out_features=2, bias=True)
  )
)

>>> new_sub_module
Layer(
  (lin): Linear(in_features=12, out_features=2, bias=True)
)

>>> new_root_module.sub_module is new_sub_module
False

The resulting new submodule is independent of the new root module.