NVIDIA’s T4 GPU, now available in regions around the world, accelerates a variety of cloud workloads, including high-performance computing (HPC), machine learning training and inference, data analytics, and graphics.
In January of this year, Google announced the availability of the NVIDIA T4 GPU on Google Cloud in beta, to help customers run inference workloads faster and at a lower cost.
Earlier this month at Google Next ’19, Google announced the general availability of the NVIDIA T4 in eight regions, making Google Cloud the first major cloud service provider to offer it globally.
A focus on speed and cost-efficiency
Each T4 GPU has 16 GB of onboard GPU memory, supports a range of precisions (data types), namely FP32, FP16, INT8 and INT4, and includes NVIDIA Tensor Cores for faster training and RTX hardware acceleration for faster ray tracing.
Customers can create custom VM configurations that best meet their needs with up to four T4 GPUs, 96 vCPUs, 624 GB of host memory and optionally up to 3 TB of in-server local SSD.
At the time of publication, prices for T4 instances started at US$0.29 per hour per GPU on preemptible VM instances.
On-demand instances started at US$0.95 per hour per GPU, with sustained use discounts of up to 30%.
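To make the pricing concrete, the following is a minimal sketch of the arithmetic, using only the per-GPU rates quoted above. The function name `gpu_cost` and the 730-hour month are assumptions made for illustration. The result covers the GPU charge only and leaves out vCPU, memory and disk charges.

```python
# Rates quoted in the article (US$ per GPU per hour, at time of publication).
PREEMPTIBLE_RATE = 0.29
ON_DEMAND_RATE = 0.95
MAX_SUSTAINED_USE_DISCOUNT = 0.30  # "up to 30%"

# Assumption for illustration: an average month of 730 hours.
HOURS_PER_MONTH = 730


def gpu_cost(rate_per_hour, gpus, hours, discount=0.0):
    """Return the GPU-only cost in US$ for `gpus` T4s running for `hours`.

    `discount` is a fraction between 0 and 1 applied to the total.
    """
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must be between 0 and 1")
    return rate_per_hour * gpus * hours * (1.0 - discount)


if __name__ == "__main__":
    # Four T4s (the per-VM maximum mentioned above) for one month.
    on_demand = gpu_cost(ON_DEMAND_RATE, 4, HOURS_PER_MONTH)
    discounted = gpu_cost(ON_DEMAND_RATE, 4, HOURS_PER_MONTH,
                          MAX_SUSTAINED_USE_DISCOUNT)
    preemptible = gpu_cost(PREEMPTIBLE_RATE, 4, HOURS_PER_MONTH)
    print(f"on-demand:           US${on_demand:.2f}")
    print(f"with 30% discount:   US${discounted:.2f}")
    print(f"preemptible:         US${preemptible:.2f}")
```

Under these assumptions, four T4 GPUs for a month come to about US$2,774 on demand, about US$1,942 if the full 30% discount applies, and about US$847 on preemptible instances.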
Tensor Cores for both training and inference
NVIDIA’s Turing architecture brings the second generation of Tensor Cores to the T4 GPU. First introduced in the NVIDIA V100 (also available on Google Cloud Platform), Tensor Cores support mixed precision to accelerate the matrix multiplication operations that are prevalent in ML workloads.
If a training workload doesn’t fully utilise the more powerful V100, the T4 offers the acceleration benefits of Tensor Cores, but at a lower price.
This is great for large training workloads, especially as businesses scale up more resources to train faster, or to train larger models.
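The mixed-precision idea mentioned above can be illustrated on the CPU with a small sketch. It is not how Tensor Cores are implemented and it does not use a GPU. It only shows why mixed-precision schemes typically keep the inputs in FP16 but accumulate the sum in FP32. The helper names (`to_fp16`, `to_fp32`, `dot_fp16_accumulate`, `dot_mixed`) are made up for this example, and rounding is emulated with Python's `struct` module.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def dot_fp16_accumulate(a, b):
    """Dot product with FP16 inputs and an FP16 accumulator."""
    acc = 0.0
    for x, y in zip(a, b):
        product = to_fp16(to_fp16(x) * to_fp16(y))
        acc = to_fp16(acc + product)  # running sum is rounded to FP16
    return acc


def dot_mixed(a, b):
    """Dot product with FP16 inputs and an FP32 accumulator (mixed precision)."""
    acc = 0.0
    for x, y in zip(a, b):
        product = to_fp16(x) * to_fp16(y)  # product of two FP16 values
        acc = to_fp32(acc + product)       # running sum is rounded to FP32
    return acc


if __name__ == "__main__":
    ones = [1.0] * 4096
    print(dot_fp16_accumulate(ones, ones))  # 2048.0: the FP16 sum stalls
    print(dot_mixed(ones, ones))            # 4096.0: the FP32 sum is exact
```

With an FP16 accumulator the running sum stops growing at 2048, because 2049 cannot be represented in half precision and rounds back to 2048. Keeping the accumulator in FP32 avoids that loss while the inputs stay in the smaller, faster format.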
Tensor Cores also accelerate inference, or predictions generated by ML models, for low latency or high throughput.
When Tensor Cores are enabled with mixed precision, T4 GPUs on GCP can run ResNet-50 inference with TensorRT more than 10 times faster than when running only in FP32.
Considering its global availability and Google’s network, the NVIDIA T4 on GCP can serve global services that require fast execution at an efficient price point.
For example, Snap uses the NVIDIA T4 to create more effective algorithms for its global user base, while keeping costs low.
Snap monetisation senior director Nima Khajehnouri says, “Snap’s monetisation algorithms have the single biggest impact to our advertisers and shareholders. NVIDIA T4-powered GPUs for inference on GCP will enable us to increase advertising efficacy while at the same time lower costs when compared to a CPU-only implementation.”
Quadro Virtual Workstations on GCP
T4 GPUs are also a great option for running virtual workstations for engineers and creative professionals.
With NVIDIA Quadro Virtual Workstations from the GCP Marketplace, users can run applications built on the NVIDIA RTX platform to experience the next generation of computer graphics, including real-time ray tracing and AI-enhanced graphics, video and image processing, from anywhere.
“Access to NVIDIA Quadro Virtual Workstation on the Google Cloud Platform will empower many of our customers to deploy and start using Autodesk software quickly, from anywhere,” says Autodesk senior software development manager Eric Bourque.
“For certain workflows, customers leveraging NVIDIA T4 and RTX technology will see a big difference when it comes to rendering scenes and creating realistic 3D models and simulations. We’re excited to continue to collaborate with NVIDIA and Google to bring increased efficiency and speed to artist workflows.”