The latest release of Pytorch 1.0 by Facebook marks another major milestone for the open source Deep Learning platform. It is increasingly making it easier for developers to build Machine Learning capabilities into their applications while testing their code is real time. In this piece about Pytorch Tutorial, I talk about the new platform in Deep Learning.
The latest version of the platform brings a lot of new capabilities to the table and is clocking vibrant support from the whole industry. It is remarkable how Pytorch is being touted as a serious contender to Google’s Tensorflow just within a couple of years of its release. Its popularity is mainly being driven by a smoother learning curve and a cleaner interface, which is providing developers with a more intuitive approach to build neural networks.
In fact, according to Soumith Chintala, AI Research Engineer at Facebook and creator of Pytorch, the inability to leverage the available Machine Learning libraries in an ideal development scenario is what prompted them to build something specific. “Google’s TensorFlow was released in 2015. We tried using it but were not super happy with it. Before this, we tried Caffe1, Theano, and Torch. At the time, we were using Torch and Caffe1 for research and production. The field has changed a lot, and we felt a new tool was needed. Looked like nobody else was building it, not the way we thought will be needed in the future. So, we felt we should build it.”
So what’s new in Pytorch 1.0? Here are the highlights of the new release:
- Torch Script
PyTorch 1.0 introduces JIT for model graphs that revolve around the concept of Torch Script which is a restricted subset of the Python language. It has its very own compiler and transform passes, optimizations, etc. Class and method annotations are used to indicate the scripts as a part of the Python code. Examples include @torch.jit.script, and @torch.jit.script_method. Annotations help to preserve elements such as loops, print statements, and control flow.
- Although it should be noted that you need to remove the annotations in the case you want to:
- Debug the scripts using standard Python tools
Switch to eager execution modeSubsets that are valid Torch Scripts include:
- Tensors and numeric primitives
- If statements
- Simple Loops
- Code organizations with nn.Module
- Tuples, lists, strings, and print
- Gradient propagation through script functions
- In-place updates to tensors and lists
- Direct use of standard nn.Modules such as nn/ConvCalling functions like grad() or backwards() within @scripts
You can work with Torch Scripts in two ways:
- Tracing Mode
- Scripting Mode
- Tracing Mode
The Pytorch tracer (torch.jit.trace) records native Pytorch operations that are executed in a code region. Along with this, it also records the data dependencies between them. Although it has had a tracer since version 0.3, it can now re-execute the tracer for you by leveraging a high-performance C++ runtime environment. The trace no longer needs to be executed elsewhere as the latest version has made it possible to integrate the optimizations and hardware integrations of Caffe2.You can also create Torch Scripts by using a tracing JIT. Here the computational graph nodes will be visited and the final script will be produced after recording the operations.
- Scripting Mode
The Scripting Mode makes it possible for you to write regular Python functions without the need to use complicated language functions. The @script decorator can be used to compile a function once the desired functionality has been isolated.Such an annotation would directly transform the Python function into a C++ runtime for higher performance.Torch Scripts can be created by providing custom scripts where you provide the description of your model. Though it is necessary to take the limitations of Torch Scripts into account for this purpose.
- Integration of Research and Production
PyTorch 1.0 integrates research with production very intuitively. Although past versions quickly rose to popularity for the flexibility they provided in Artificial Intelligence development and research, performance at production scale remained a challenge. Developers had to translate the research code into a graph model representation in Caffe2 for production purposes. Such a migration used to be manual and time-consuming.But now PyTorch 1.0 integrates immediate and graph execution modes to help developers handle research and production simultaneously. With the help of a hybrid front-end, you can now share code between both the modes for seamless prototyping and production.
- C++ API and Frontend
Python has not been a popular option for deployment in C++ due to factors such as high overheads on small models, multi-threading services bottleneck on GIL, etc. PyTorch 1.0 provides developers with a two-way pathway from Python to C++ and vice versa. This helps in functions such as debugging and refactoring. The C++ API allows you to write custom implementations such as calls to third-party functions.In addition to this, the beta version of C++ Frontend was also announced. Though it is currently marked as ‘API Unstable’. This makes it ready to be used for building research applications but its utilization for production purposes will take some time to stabilize.
- Optimization and Export
It does not matter whether you are using the Tracing mode or the Scripting mode. With PyTorch 1.0, the result is always a Python free representation of your model which can be used in two ways – to optimize the model or export the model – in the production environments.
Whole program optimizations become possible with the ability to extract bigger segments of the model into an intermediate representation. Computations can also be offloaded to specialized AI accelerators. PyTorch 1.0 also includes passes to fuse GPU operations together and improve the performance of smaller RNN models.
Support From the Ecosystem
The tech world has been quick to respond to the added capabilities of PyTorch with major market players announcing extended support to create a thriving ecosystem around the Deep Learning platform. Here is a wrap up of the major announcements that the release of PyTorch 1.0 has attracted:
- Amazon Web Services (AWS):Amazon SageMaker now features PyTorch 1.0 images. Developers can even train and deploy their Deep Learning models of PyTorch in SageMarker. Once the PyTorch script has been written, Amazon SageMaker training can handle subsequent tasks such as setting up the distributed training cluster, hyperparameter tuning and transferring of data. This makes Amazon SageMaker as a managed online endpoint with a coveted ability to scale up automatically whenever the need arises.Using PyTorch in Amazon SageMarker starts with the developer providing the relevant script and then use the PyTorch estimator from Amazon SageMarker Python SDK as follows:
estimator = PyTorch(entry_point=”pytorch_script.py”,
Developers can package their code as a Docker container to host it or deploy it for inference.
Microsoft has also been quick in announcing major support for PyTorch. The highlights include:Setting up extensive Windows support for PyTorch. Actively contributing to the GitHub code.Allocating a dedicated team of Developers to improve PyTorch.Closely working with the community.Integration of PyTorch in all Machine Learning products of Microsoft which includes:
Data Science VM
- Google Cloud Platform (GCP):
Google has not held back in any way by jumping into the mix with a few major announcements of its own in partnership with the PyTorch 1.0 release:Although Kubeflow already supported PyTorch, Google has extended the TensorRT package in Kubeflow to support serving PyTorch models.A collaboration of Tenserboard with PyTorch. This includes Cloud TPU and TPU pods with support for easy scaling.Broadened support for PyTorch throughout the AI platforms and services of Google Cloud Support.Fully hybrid Python and C/C++ front-end support and native distribution execution support for production environments.
Nvidia and Facebook also have a healthy collaboration history with the companies joining hands together in 2017 to create large-scale distributed training scenarios to develop Machine Learning based applications for Edge devices. Collaborative efforts continue today with Nvidia actively working to integrate Pytorch into their current offerings:
- A PyTorch Extension Tools (APEX) for easy Mixed Precision and Distributed Training.
- Support for PyTorch framework across the inference workflow. Developers can:
- Import PyTorch models with the ONNX format
- Apply INT8 and FP16 optimizations
- Calibrate for lower precision with high accuracy
- Generate runtimes for production deployment
- Availability of PyTorch container from the Nvidia GPU Cloud container registry to help developers get started quickly with the platform.
Key Capabilities and Features
Having started out just 2 years ago, it is incredible how quickly it has matured to add new capabilities and functionalities. A host of improved abilities have been introduced in PyTorch 1.0. Here is a quick look at what the open source Deep Learning platform is capable of today:
- Hybrid Front-end: Provides ease of use and better flexibility in eager mode. Provides graph mode for speed, optimization, and functionality in C++ runtime environments.
- Distributed Training: Optimized performance for both research and production. Provides asynchronous execution of collective operations and peer to peer communication.
- Python First: PyTorch has been built to be deeply integrated with Python and can be actively used with popular libraries and packages such as Cython and Numba.
- Tools and Libraries: The community of PyTorch is highly active, which has led to the development of a rich ecosystem of tools and libraries. This has extended the reach and supported development in numerous areas.
- Native ONNX Support: PyTorch also offers export models in the standard Open Neural Network Exchange format. This provides developers with direct access to ONNX-compatible platforms, runtimes, visualizers, etc.
- C++ Front-end: A C++ interface that is intended to enable research in high performance or low latency C++ applications.
- Cloud Partners: As established by the support provided from the ecosystem, all major cloud computing platforms are supporting PyTorch today. This paves way for a smooth development process, easy scaling, large-scale training on GPUs, etc.
Main Elements in PyTorch
If you are planning to fuel your development process by leveraging the phenomenal capabilities, there are some main elements that you should know about before starting out to plan your development process in the most optimum way. Let’s take a look:
- PyTorch Tensors
Tensors are multidimensional arrays. Tensors are similar to numpy’s ndarrays, though they can also be used on GPUs. A simple one-dimensional matrix can be defined as:
# import pytorch
# define a tensor
2[torch.FloatTensor of size 1]
- Mathematical Operations
PyTorch provides you with 200+ mathematical operators to work with. This meets the need of a scientific computing library making efficient implementations of mathematical functions. Here is how addition works out :
a = torch.FloatTensor()
b = torch.FloatTensor()
a + b
3[torch.FloatTensor of size 1]
Various functions on matrices can also be performed on the defined PyTorch Tensors. Here’s an example:
matrix = torch.randn(3, 3)
0.4182 2.1159 8.3576
-0.4563 -0.2357 -2.5800
-0.5081 -2.1937 -0.0291
[torch.FloatTensor of size 3×3]
0.4182 -0.4563 -0.5081
2.1159 -0.2357 -2.1937
8.3576 -2.5800 -0.0291
[torch.FloatTensor of size 3×3]
- Autograd Module
PyTorch makes use of Automatic differentiation. A recorder records all the performed operations and then plays it back to compute the gradients. This technique finds extensive usage when neural networks are built.from torch.autograd import Variable
x = Variable(train_x)
y = Variable(train_y, requires_grad=False)
- Optim Module
The torch.optim module helps you to implement optimization algorithms to build neural networks. The best feature is the support for most of the commonly used methods. This eliminates the need to build them from scratch.
For instance, here is how you can use the Adam.optimizer:
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
- nn Module
It can be difficult to define complex neural networks with raw autograd. The nn Module helps in this regard by allowing you to define a set of modules that can be considered as a neural network layer.
# define model
model = torch.nn.Sequential(
loss_fn = torch.nn.CrossEntropyLoss()
Important Things to Keep in Mind
With the basics covered, you can now kickstart building your very own neural network with PyTorch and make use of the maturing ecosystem to bring your ideas to life.
But before you begin, here are some important details to keep in mind to avoid certain pitfalls that might give you trouble at a later stage.
- Data Types
Data types matter a lot in PyTorch. For example, not all NumPy arrays can be converted to torch Tensor. Only certain NumPy data types can be converted to torch Tensor type such as numpy.uint8 to torch.ByteTensor, numpy.int16 to torch.ShortTensor, and numpy.int32 to torch.IntTensor.
- Numerical Stability
The thumb rule is – if it can overflow or underflow, it probably will. For instance, let’s say that you have to relate samples with tags in both positive and negative ways. You use the classical sigmoid + log loss for this purpose.
sigmoid = torch.nn.functional.sigmoiddot_p = torch.dot(anchor, tag_p)
loss_pos = -torch.log(sigmoid(dot_p)) #(1)
dot_n = torch.dot(anchor, tag_n)
loss_neg = -torch.log(1 – sigmoid(dot_n)) #(2)
Log(0) is the critical point here. Since Log is undefined for this input, there are two ways in which this situation can go down:
sigmoid(x) = 0, which means x is a “large” negative value.
sigmoid(x) = 1, which means x is a “large” positive value.
Regardless of the case, -log(y) evaluates to zero. This leads to a numerical instability which hinders with further optimization steps.
A workaround here can be to bound the values of sigmoid to be slightly below one and slightly above zero.
value = torch.nn.functional.sigmoid(x)
value = torch.clamp(torch.clamp(value, min=eps), max=1-eps)
This makes sigmoid(dot_p) to be always positive and (1 – sigmoid(dot_n)) to never amount to zero. Although this is not rocket science, you need to keep such evaluations in mind to ensure numerical stability while you code.
In PyTorch, Gradients accumulate by default. To understand this, consider a scenario in which you run a computation once, both forward and backward, and everything seems to be working correctly. But when you run it for the second time, new gradients get added to the gradients from the first operation. This is easy to forget, especially for developers who are dealing with a Machine Learning platform/library for the first time.A quick solution in such a scenario would be to manually set the gradients to zero between every two runs. This can be done with:
PyTorch is taking the world of Deep Learning by storm by paving way for better innovation in the whole ecosystem that even includes the likes of education providers such as Udacity and Fast.ai. All this and more makes the future of PyTorch quite promising and provides huge incentives to developers to start depending on the platform confidently. Subscribe to the blog for further tutorials and updates on PyTorch