Particle operation
==================

First, let's clarify what a particle is. The original definition of a particle is the collection of all parameters of a model. However, a more common and extended meaning refers to a Module instance and the ensemble of its sub-modules. Here, when we refer to a particle, we are talking about this extended meaning.

Operations based on particles are termed **particle operation**.
We have two types of particle operations:

	1. Operations between particles and numbers.
	2. Operations between particles.

nytorch.NytoModule inherits from torch.nn.Module. We can enable the functionality of particle operations by inheriting from NytoModule::

    import nytorch as nyto
    import torch

    class NytoLinear(nyto.NytoModule):
        def __init__(self, w: torch.Tensor, b: torch.Tensor) -> None:
            super().__init__()
            self.weight = torch.nn.Parameter(w)
            self.bias = torch.nn.Parameter(b)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.weight * x + self.bias


Operations Between Particles and Numbers
------------------------------------------

Operations between particles and numbers apply the operation to each parameter of the particle and return the result as a new particle. For example::

	net = NytoLinear(torch.Tensor([2.]), torch.Tensor([1.]))
	new_net = net + 10

::

	>>> net.weight
	Parameter containing:
	tensor([2.], requires_grad=True)

	>>> new_net.weight
	Parameter containing:
	tensor([12.], requires_grad=True)

	>>> net.bias
	Parameter containing:
	tensor([1.], requires_grad=True)
	
	>>> new_net.bias
	Parameter containing:
	tensor([11.], requires_grad=True)

The following operations are supported::

	pos_net = +net
	neg_net = -net
	pow_net = net ** 2
	add_net = net + 2
	sub_net = net - 2
	mul_net = net * 2
	truediv_net = net / 2
	rpow_net = 2 ** net
	radd_net = 2 + net
	rsub_net = 2 - net
	rmul_net = net * 2
	rtruediv_net = 2 / net

	# Inplace operators
	ipow_net **= 2
	iadd_net += 2
	isub_net -= 2
	imul_net *= 2
	itruediv_net /= 2
	

Operations Between Particles
-------------------------------

Operations between particles apply the operation to the corresponding parameters of the two particles and return the result as a new particle. For example::

	net = NytoLinear(torch.Tensor([2.]), torch.Tensor([1.]))
	new_net = net + net

::

	>>> net.weight
	Parameter containing:
	tensor([2.], requires_grad=True)

	>>> new_net.weight
	Parameter containing:
	tensor([4.], requires_grad=True)

	>>> net.bias
	Parameter containing:
	tensor([1.], requires_grad=True)
	
	>>> new_net.bias
	Parameter containing:
	tensor([2.], requires_grad=True)
	
The following operations are supported::

	pow_net = net ** net
	add_net = net + net
	sub_net = net - net
	mul_net = net * net
	truediv_net = net / net

inplace operators::

	ipow_net **= net
	iadd_net += net
	isub_net -= net
	imul_net *= net
	itruediv_net /= net

.. note::

	It's essential to note that not all combinations of particles can undergo particle operation.
	They must belong to the same **species**.
	Details about species are discussed in the next subsection.
	
	
Species
------------------------

Here, we introduce a new concept. If two particles are derived from the same particle through particle operation or one particle is derived from another particle through particle operation, we say the two particles belong to the same **species**, and we call the collection of particles belonging to the same species a **swarm**.

In other words, whenever we create a new particle through the constructor of NytoModule, we essentially create a new species, and the new particle belongs to this new species. Particle operations can only occur between particles belonging to the same species. If particle operations occur between particles from different species, it will lead to an error::

	net1 = NytoLinear(torch.Tensor([1.]), torch.Tensor([2.]))
	net2 = NytoLinear(torch.Tensor([3.]), torch.Tensor([4.]))
	net3 = net1 + 10

::

	>>> net1 + net2
	AssertionError

	>>> net1 + net3
	NytoLinear()
	
This is because particles from different species cannot guarantee the same structure or shape. While it's possible to check for structure or shape consistency during each particle operation, it's not a common scenario and incurs high computational costs. Hence, we prioritize the efficiency of particle operations, and this approach is not employed.
	
However, sometimes it might be necessary to perform particle operations between particles from different species. In such cases, one can first copy the parameters of a particle to another particle from a different species and then perform particle operations::

	net3.load_state_dict(net2.state_dict())

::

	>>> net1 + net3
	NytoLinear()


Clone and Detach
-----------------

Here, we introduce two related methods: ``clone()`` and ``detach()`` .

``clone()`` returns a new particle with cloned parameters from the original particle, and they belong to the same species::

	net = NytoLinear(torch.Tensor([1.]), torch.Tensor([2.]))
	net_clone = net1.clone()

::

	>>> net.weight is net_clone.weight
	False
	
	>>> net.bias is net_clone.bias
	False

	>>> torch.equal(net.weight, net_clone.weight)
	True
	
	>>> torch.equal(net.bias, net_clone.bias)
	True
	
	>>> net + net_clone
	NytoLinear()

``detach()`` returns a new particle with parameters referencing the original particle, and they belong to different species::

	net = NytoLinear(torch.Tensor([1.]), torch.Tensor([2.]))
	net_detach = net1.clone()

::

	>>> net.weight is net_detach.weight
	True
	
	>>> net.bias is net_detach.bias
	True
	
	>>> net + net_detach
	AssertionError
	

clone_from
-------------

If there's a need to clone particles from another species to the current species, one can use either of the following approaches::

	net1 = NytoLinear(torch.Tensor([1.]),
	                  torch.Tensor([2.]))
	net2 = NytoLinear(torch.Tensor([3.]),
	                  torch.Tensor([4.]))
	                  
	net3 = net1.clone()
	net3.load_state_dict(net2.state_dict())
	
or::

	net1 = NytoLinear(torch.Tensor([1.]),
	                  torch.Tensor([2.]))
	net2 = NytoLinear(torch.Tensor([3.]),
	                  torch.Tensor([4.]))
	
	net3 = net1.clone_from(net2)

Both approaches are equivalent logically.


Randn
----------

Sometimes, it's necessary to introduce randomness. In such cases, one can use the randn() method, which returns a particle with parameters drawn from a standard normal distribution, and they belong to the same species as the original particle::

	net = NytoLinear(torch.Tensor([1.]), torch.Tensor([2.]))
	net_randn = net.randn()

::

	>>> net_randn.weight
	Parameter containing:
	tensor([1.2006], requires_grad=True)
	
	>>> net_randn.bias
	Parameter containing:
	tensor([-1.6793], requires_grad=True)

	>>> net + net_randn
	NytoLinear()


Particle Operation on Submodules
----------------------------------------

.. note::

	We usually consider a module instance and its submodules as a particle,
	because they need to work together to complete a forward pass.
	In Nytorch, there are two definitions for the **root module**:
	
		1. A particle has only one root module.
		2. The module that can traverse all modules in the particle starting from itself.
	
	.. image:: ./image/root_module.png
		:width: 500

Usually, we perform particle operations on the **root module**.
But what happens if we perform particle operations on submodules?

Consider the following example::

    class Layer(nyto.NytoModule):
        def __init__(self, in_size, out_size):
            super().__init__()
            self.lin = nn.Linear(in_size, out_size)
    
        def forward(self, x):
            return self.lin(x)
    
    
    class ResLayer(nyto.NytoModule):
        def __init__(self, in_size, out_size):
            super().__init__()
            self.sub_moudle = Layer(in_size, out_size)
    
        def forward(self, x):
            return self.sub_moudle(x) + x
    
    root_module = ResLayer(12, 2)
    sub_moudle = root_module.sub_moudle

In this example, we have a root module and a submodule. If we perform particle operations on both, new particles corresponding to each module are generated::

    new_root_module = root_module + 10
    new_sub_moudle = sub_moudle + 10

::

    >>> new_root_module
    ResLayer(
      (sub_moudle): Layer(
        (lin): Linear(in_features=12, out_features=2, bias=True)
      )
    )

    >>> new_sub_moudle
    Layer(
      (lin): Linear(in_features=12, out_features=2, bias=True)
    )

    >>> new_root_module.sub_moudle is new_sub_moudle
    False

The resulting new submodule is independent of the new root module.