====================================
This repository covers the elementary knowledge of PyTorch.
- Tensors
from __future__ import print_function
import torch
# Construct a 5x3 matrix, uninitialized
x = torch.Tensor(5, 3)
print(x)
Out:
0.2285 0.2843 0.1978
0.0092 0.8238 0.2703
0.1266 0.9613 0.2472
0.0918 0.2827 0.9803
0.9237 0.1946 0.0104
[torch.FloatTensor of size 5x3]
print(x.size())
Out:
torch.Size([5, 3])
- Operations
y=torch.rand(5,3)
[note] the following are equivalent ways to add:
(1) print(x+y)
(2) print(torch.add(x,y))
(3)
result = torch.Tensor(5,3)
torch.add(x,y,out=result)
print(result)
(4)
# in-place addition, add x to y
y.add_(x)
print(y)
NOTE: in-place operations are post-fixed with '_', e.g. x.copy_(y), x.t_()
- Numpy Bridge
Torch tensors <==> Numpy arrays
They share the same memory locations, and changing one will change the other.
(1) torch tensor => numpy array
a = torch.ones(5)
print(a)
Out:
1
1
1
1
1
[torch.FloatTensor of size 5]
b = a.numpy()
print(b)
Out:
[ 1. 1. 1. 1. 1.]
a.add_(1)
print(a)
print(b)
Out:
2
2
2
2
2
[torch.FloatTensor of size 5]
[ 2. 2. 2. 2. 2.]
(2) numpy array => torch tensor
import numpy as np
a = np.ones(5)
b = torch.from_numpy(a)
np.add(a,1,out=a)
print(a)
print(b)
- CUDA tensors
Tensors can be moved onto the GPU using the .cuda() method.
# only if CUDA is available
if torch.cuda.is_available():
    x = x.cuda()
    y = y.cuda()
    x + y
NOTE: 100+ tensor operations (transposing, indexing, slicing, math operations, linear algebra, random numbers, etc.) are available in torch.
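For illustration, a minimal sketch of a few such operations (matrix multiply, transpose, concatenation); the shapes here are arbitrary:
a = torch.rand(2, 3)
b = torch.rand(3, 2)
print(torch.mm(a, b))        # matrix multiplication: 2x3 mm 3x2 -> 2x2
print(a.t())                 # transpose -> 3x2
print(torch.cat((a, a), 0))  # concatenate along dim 0 -> 4x3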
Central to all neural networks in PyTorch is the autograd package.
- Variable
autograd.Variable is the central class of the package. It wraps a Tensor, and supports nearly all of the operations defined on it. Once you finish your computation you can call .backward() and have all the gradients computed automatically.
.data: you can access the raw tensor through the .data attribute.
.grad: the gradient w.r.t. this variable is accumulated into .grad.
Function
There's one more class which is very important for the autograd implementation - a Function.
Variable and Function encode a complete history of computation. Each Variable has a .grad_fn attribute that references the Function that created the Variable (except for Variables created by the user - their grad_fn is None).
If you want to compute the derivatives, you can call .backward() on a Variable. If the Variable is a scalar, you don't need to specify any arguments to backward(); otherwise you need to specify a grad_output argument that is a tensor of matching shape.
import torch
from torch.autograd import Variable
# Create a variable
x = Variable(torch.ones(2, 2), requires_grad=True)
print(x)
# Do an operation on the variable
y = x + 2
# y was created as a result of an operation, so it has a grad_fn
print(y.grad_fn)
# Do more operations on y
z = y * y * 3
out = z.mean()
print(z, out)
Out:
Variable containing:
27 27
27 27
[torch.FloatTensor of size 2x2]
Variable containing:
27
[torch.FloatTensor of size 1]
- Gradients
backprop: out.backward() is equivalent to out.backward(torch.Tensor([1.0])).
# print gradients d(out)/dx
out.backward()
print(x.grad)
Out:
Variable containing:
4.5000 4.5000
4.5000 4.5000
[torch.FloatTensor of size 2x2]
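A quick derivation of the 4.5 values above: out = (1/4) * sum_i z_i with z_i = 3 * (x_i + 2)^2, so d(out)/dx_i = (3/2) * (x_i + 2), which is 9/2 = 4.5 at x_i = 1.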
x = torch.randn(3)
x = Variable(x, requires_grad=True)
y = x * 2
while y.data.norm() < 1000:
    y = y * 2
print(y)
gradients = torch.FloatTensor([0.1,1.0,0.0001])
y.backward(gradients)
print(x.grad)
Out:
Variable containing: -- y
682.4722
-598.8342
692.9528
[torch.FloatTensor of size 3]
Variable containing: -- dy/dx
102.4000
1024.0000
0.1024
[torch.FloatTensor of size 3]
NOTE: Automatic differentiation package - torch.autograd
Neural networks can be constructed using the torch.nn package.
nn depends on autograd to define models and differentiate them. An nn.Module contains layers, and a method forward(input) that returns the output.
The mnist network classifies digit images. It is a simple feed-forward network. It takes the input, feeds it through several layers one after the other, and then finally gives the output.
A typical training procedure for a neural network is as follows:
- Define the neural network that has some learnable parameters (or weights)
- Iterate over a dataset of inputs
- Process input through the network
- Compute the loss (how far is the output from being correct)
- Propagate gradients back into the network’s parameters
- Update the weights of the network, typically using a simple update rule: weight = weight - learning_rate * gradient
import torch
from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 5x5 square convolution kernel
        self.conv1 = nn.Conv2d(1, 6, 5)  # conv layer
        self.conv2 = nn.Conv2d(6, 16, 5)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # fc layer: input dim, output dim
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the size is a square you can only specify a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))  # view: reshape the feature map into a flat vector
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features
net = Net()
print(net)
NOTE:
view is similar to reshape.
import torch
a = torch.range(1, 16)  # 16 elements, endpoint included
a = a.view(4, 4)
then a will be a 4x4 tensor.
What's the meaning of -1? If you don't know how many rows you want but are sure of the number of columns, you can pass -1 for the rows (this extends to tensors with more dimensions; only one axis value can be -1). It is a way of telling the library: give me a tensor with this many columns, and you compute the number of rows needed to make that happen.
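For example, continuing with the same a:
a = torch.range(1, 16)
print(a.view(-1, 8).size())  # torch.Size([2, 8]): 8 columns requested, 2 rows inferred
print(a.view(2, -1).size())  # torch.Size([2, 8]): 2 rows requested, 8 columns inferred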
You just have to define the forward function, and the backward function (where gradients are computed) is automatically defined for you using autograd.
The learnable parameters of a model are returned by net.parameters()
params = list(net.parameters())
print(len(params))
print(params[0].size()) # conv1's .weight
Out:
10
torch.Size([6, 1, 5, 5])
The input to the forward is an autograd.Variable, and so is the output.
input = Variable(torch.randn(1, 1, 32, 32)) # batch size, image channel, H, W
out = net(input)
print(out)
Variable containing:
-0.0431 0.1465 0.0130 -0.0784 -0.0989 -0.0063 0.1443 -0.0105 0.1308 0.0281
[torch.FloatTensor of size 1x10]
Zero the gradient buffers of all parameters and backprops with random gradients:
net.zero_grad()
out.backward(torch.randn(1, 10))
NOTE:
The entire torch.nn package only supports inputs that are a mini-batch of samples, and not a single sample. For example, nn.Conv2d will take in a 4D Tensor of nSamples x nChannels x Height x Width.
If you have a single sample, just use input.unsqueeze(0)
to add a fake batch dimension.
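A minimal sketch of the fake batch dimension, assuming the Net defined above is in scope:
single = torch.randn(1, 32, 32)   # one 1-channel 32x32 image: C x H x W
batched = single.unsqueeze(0)     # shape becomes 1 x 1 x 32 x 32: nSamples x C x H x W
print(batched.size())
out = net(Variable(batched))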
A loss function takes the (output, target) pair of inputs, and computes a value that estimates how far away the output is from the target.
MSELoss computes the mean-squared error between the input and the target.
output = net(input)
target = Variable(torch.arange(1, 11)) # a dummy target, for example
criterion = nn.MSELoss()
loss = criterion(output, target)
print(loss)
To backpropagate the error, all we have to do is call loss.backward(). You need to clear the existing gradients first though, or the new gradients will be accumulated on top of the existing ones.
net.zero_grad()
print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)
loss.backward()
print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)
Out:
conv1.bias.grad before backward
Variable containing:
0
0
0
0
0
0
[torch.FloatTensor of size 6]
conv1.bias.grad after backward
Variable containing:
-0.0390
0.1407
0.0613
-0.1214
-0.0129
-0.0582
[torch.FloatTensor of size 6]
NOTE: more modules and loss functions are defined in the neural network (torch.nn) package
The simplest update rule used in practice is the Stochastic Gradient Descent (SGD):
weight = weight - learning_rate * gradient
We can implement this using simple python code:
learning_rate = 0.01
for f in net.parameters():
    f.data.sub_(f.grad.data * learning_rate)
torch.optim: various update rules such as SGD, Nesterov-SGD, Adam, RMSProp, etc., are encapsulated in the torch.optim package.
import torch.optim as optim
# create your optimizer
optimizer = optim.SGD(net.parameters(),lr = 0.01)
# in your training loop
optimizer.zero_grad() # zero the gradient buffers
output = net(input)
loss = criterion(output,target)
loss.backward()
optimizer.step() # does the update
When you deal with image, text, audio or video data, you can use standard Python packages that load data into a numpy array. Then you can convert this array into a torch.*Tensor (a minimal sketch follows the list below).
- For images, packages such as Pillow, OpenCV are useful.
- For audio, packages such as scipy and librosa are useful.
- For text, either raw Python or Cython based loading, or NLTK and SpaCy are useful.
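The sketch mentioned above: a minimal example of the image path using Pillow and numpy; 'img.png' is a placeholder file name, not part of this repository:
from PIL import Image
import numpy as np
import torch

img = Image.open('img.png')               # placeholder path
arr = np.asarray(img, dtype=np.float32)   # H x W x C numpy array
arr = arr.transpose(2, 0, 1).copy()       # C x H x W, contiguous copy (the layout torch layers expect)
tensor = torch.from_numpy(arr)            # FloatTensor built from the array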
Specifically for vision, the package torchvision is useful. It has data loaders for common datasets such as Imagenet, CIFAR10, MNIST, etc., and data transformers for images, viz., torchvision.datasets and torch.utils.data.DataLoader.
For the CIFAR10 dataset, it has the classes: ‘airplane’, ‘automobile’, ‘bird’, ‘cat’, ‘deer’, ‘dog’, ‘frog’, ‘horse’, ‘ship’, ‘truck’.
steps of training an image classifier:
- Load and normalize the CIFAR10 training and test datasets using torchvision
- Define a Convolutional Neural Network
- Define a loss function
- Train the network on the training data
- Test the network on the test data
import torch
import torchvision
import torchvision.transforms as transforms
The output of torchvision datasets is PILImage images in the range [0, 1]. We transform them to Tensors of normalized range [-1, 1].
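This works because transforms.Normalize computes output = (input - mean) / std per channel, so with mean = 0.5 and std = 0.5 an input in [0, 1] is mapped to [(0 - 0.5)/0.5, (1 - 0.5)/0.5] = [-1, 1].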
transform = transforms.Compose([transforms.ToTensor(),transforms.Normalize((0.5,0.5,0.5),(0.5,0.5,0.5))])
trainset = torchvision.datasets.CIFAR10(root='./data',train=True,download=True,transform=transform)
trainloader = torch.utils.data.DataLoader(trainset,batch_size=4,shuffle=True,num_workers=2)
testset = torchvision.datasets.CIFAR10(root='./data',train=False,download=True,transform=transform)
testloader = torch.utils.data.DataLoader(testset,batch_size=4,shuffle=False,num_workers=2)
classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
Out:
Downloading http://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz
Files already downloaded and verified
NOTE:
transforms.Compose chains multiple transforms together so that they are applied to the image/data in sequence.
for details: pytorch torchvision transform
Show some of the training images,
import matplotlib.pyplot as plt
import numpy as np
# functions to show an image
def imshow(img):
    img = img / 2 + 0.5  # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))  # C x H x W -> H x W x C
# get some random training images
dataiter = iter(trainloader)
images,labels = dataiter.next()
# show images
imshow(torchvision.utils.make_grid(images))
# print labels
print(' '.join('%5s' % classes[labels[j]] for j in range(4)))
Out:
frog car deer plane
Fig. 3 a batch of training images
NOTE:
iter: an iterator provides a sequence-like interface to sequence-like objects. It can also iterate over objects that are not sequences but exhibit sequence-like behavior, such as the keys of a dictionary or the lines of a file.
# iter and generators
# the first try
# =================================
i = iter('abcd')
print(next(i))
print(next(i))
print(next(i))
s = {'one': 1, 'two': 2, 'three': 3}
print(s)
m = iter(s)
print(next(m))
print(next(m))
print(next(m))
Out:
a
b
c
{'three': 3, 'two': 2, 'one': 1}
three
two
one
The network for the 3-channel (CIFAR10) images:
from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
net = Net()
Use a classification Cross-Entropy loss and SGD with momentum:
import torch.optim as optim
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(),lr=0.001,momentum=0.9)
We simply have to loop over our data iterator, and feed the inputs to the network and optimize.
for epoch in range(2):  # loop over the dataset multiple times
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs
        inputs, labels = data
        # wrap them in Variable
        inputs, labels = Variable(inputs), Variable(labels)
        # zero the parameter gradients
        optimizer.zero_grad()
        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        # print statistics
        running_loss += loss.data[0]
        if i % 2000 == 1999:  # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' % (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0
print('Finished Training')
Out:
[1, 2000] loss: 2.193
[1, 4000] loss: 1.814
[1, 6000] loss: 1.653
[1, 8000] loss: 1.571
[1, 10000] loss: 1.470
[1, 12000] loss: 1.454
[2, 2000] loss: 1.374
[2, 4000] loss: 1.342
[2, 6000] loss: 1.337
[2, 8000] loss: 1.300
[2, 10000] loss: 1.297
[2, 12000] loss: 1.271
Finished Training
We will check this by predicting the class label that the neural network outputs, and checking it against the ground-truth. If the prediction is correct, we add the sample to the list of correct predictions.
- display an image from the test set
dataiter = iter(testloader)
images, labels = dataiter.next()
# print images
imshow(torchvision.utils.make_grid(images))
print('GroundTruth: ', ' '.join('%5s' % classes[labels[j]] for j in range(4)))
GroundTruth: cat ship ship plane
- what the neural network thinks these examples above are:
outputs = net(Variable(images))
_, predicted = torch.max(outputs.data,1)
print('Predicted: ',' '.join('%5s' % classes[predicted[j]] for j in range(4)))
Out:
Predicted: frog car ship plane
- how the network performs on the whole dataset
correct = 0
total = 0
for data in testloader:
    images, labels = data
    outputs = net(Variable(images))
    _, predicted = torch.max(outputs.data, 1)
    total += labels.size(0)
    correct += (predicted == labels).sum()
print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))
Out:
Accuracy of the network on the 10000 test images: 55 %
- class-wise performance
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))
for data in testloader:
    images, labels = data
    outputs = net(Variable(images))
    _, predicted = torch.max(outputs.data, 1)
    c = (predicted == labels).squeeze()
    for i in range(4):
        label = labels[i]
        class_correct[label] += c[i]
        class_total[label] += 1
for i in range(10):
    print('Accuracy of %5s : %2d %%' % (
        classes[i], 100 * class_correct[i] / class_total[i]))
Out:
Accuracy of plane : 69 %
Accuracy of car : 61 %
Accuracy of bird : 31 %
Accuracy of cat : 35 %
Accuracy of deer : 65 %
Accuracy of dog : 26 %
Accuracy of frog : 69 %
Accuracy of horse : 62 %
Accuracy of ship : 67 %
Accuracy of truck : 68 %
Just like how you transfer a Tensor onto the GPU, you transfer the neural net onto the GPU. This will recursively go over all modules and convert their parameters and buffers to CUDA tensors:
net.cuda()
Remember that you will have to send the inputs and targets at every step to the GPU too:
inputs, labels = Variable(inputs.cuda()), Variable(labels.cuda())
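Putting the pieces together, a minimal sketch of a CUDA-aware training step, assuming the net, criterion, optimizer, and trainloader defined above:
use_cuda = torch.cuda.is_available()
if use_cuda:
    net.cuda()
for data in trainloader:
    inputs, labels = data
    if use_cuda:
        inputs, labels = inputs.cuda(), labels.cuda()
    inputs, labels = Variable(inputs), Variable(labels)
    optimizer.zero_grad()
    loss = criterion(net(inputs), labels)
    loss.backward()
    optimizer.step()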
Inspecting the weights and the gradients:
print(net.conv1.weight.grad.size())
print(net.conv1.weight.data.norm()) # norm of the weight
print(net.conv1.weight.grad.data.norm()) # norm of the gradients
hook: for inspecting / modifying the output and grad_output of a layer.
You can register a function on a Module or a Variable. The hook can be a forward hook or a backward hook. The forward hook will be executed when a forward call is executed. The backward hook will be executed in the backward phase.
- forward hook
def printnorm(self, input, output):
    # input is a tuple of packed inputs
    # output is a Variable. output.data is the Tensor we are interested in
    print('Inside ' + self.__class__.__name__ + ' forward')
    print('')
    print('input: ', type(input))
    print('input[0]: ', type(input[0]))
    print('output: ', type(output))
    print('')
    print('input size:', input[0].size())
    print('output size:', output.data.size())
    print('output norm:', output.data.norm())

net.conv2.register_forward_hook(printnorm)
out = net(input)
Out:
Inside Conv2d forward
input: <class 'tuple'>
input[0]: <class 'torch.autograd.variable.Variable'>
output: <class 'torch.autograd.variable.Variable'>
input size: torch.Size([1, 10, 12, 12])
output size: torch.Size([1, 20, 8, 8])
output norm: 16.448492454427257
- backward hook
def printgradnorm(self, grad_input, grad_output):
    print('Inside ' + self.__class__.__name__ + ' backward')
    print('Inside class:' + self.__class__.__name__)
    print('')
    print('grad_input: ', type(grad_input))
    print('grad_input[0]: ', type(grad_input[0]))
    print('grad_output: ', type(grad_output))
    print('grad_output[0]: ', type(grad_output[0]))
    print('')
    print('grad_input size:', grad_input[0].size())
    print('grad_output size:', grad_output[0].size())
    print('grad_input norm:', grad_input[0].data.norm())

net.conv2.register_backward_hook(printgradnorm)
out = net(input)
err = loss_fn(out, target)
err.backward()
Out:
Inside Conv2d forward
input: <class 'tuple'>
input[0]: <class 'torch.autograd.variable.Variable'>
output: <class 'torch.autograd.variable.Variable'>
input size: torch.Size([1, 10, 12, 12])
output size: torch.Size([1, 20, 8, 8])
output norm: 16.448492454427257
Inside Conv2d backward
Inside class:Conv2d
grad_input: <class 'tuple'>
grad_input[0]: <class 'torch.autograd.variable.Variable'>
grad_output: <class 'tuple'>
grad_output[0]: <class 'torch.autograd.variable.Variable'>
grad_input size: torch.Size([1, 10, 12, 12])
grad_output size: torch.Size([1, 20, 8, 8])
grad_input norm: 0.10571633312468412
A full and working MNIST example is located here: minst