Google Cloud rolls out Cloud Dataproc on Kubernetes

11 Sep 2019

Google Cloud is trialling an alpha release of a new platform for data scientists and engineers, built on Kubernetes.

Cloud Dataproc on Kubernetes combines open source, machine learning and cloud to help modernise big data resource management.

The alpha will start with Apache Spark workloads, with more environments to come.

According to Google Cloud product managers Christopher Crosbie and James Malone, Google Cloud Dataproc can provide open source data analytic processing for those who need to process data and train models at scale, faster.

“However, as enterprise infrastructure becomes increasingly hybrid in nature, machines can sit idle, single workload clusters continue to sprawl, and open source software and libraries continue to become outdated and incompatible with your stack,” they explain.

“It’s critical that Cloud Dataproc continues to empower data professionals to focus more on workloads than infrastructure by combining the best of cloud and open source.”

The platform will include key benefits such as faster workloads, unified resource management, job isolation, collaboration, and expertise sharing.

Unified resource management will allow data scientists to work with a central view that spans both Kubernetes and YARN cluster management systems.

“Kubernetes has flipped the big data and machine learning open source software (OSS) world on its head, since it gives data scientists and data engineers a way to unify resource management, isolate jobs, and build resilient infrastructures across any environment.”

More resilient infrastructure: A self-healing GKE environment can support the smooth operation of mission critical ETL and machine learning jobs on Spark.

“Data scientists and data engineers don’t have to worry about sizing and building clusters, manipulating Docker files, or messing around with Kubernetes networking configurations. It just works. With leading support from the team that built Kubernetes, enterprises have access to the skills they need to close any Kubernetes skills gap on their team.”

Less time and resource on infrastructure, more on workloads – new applications and models can be developed faster and at scale.

Isolate jobs to accelerate analytics life cycles – users can package up entire jobs in standalone containers to allow for testing, upgrading and patching without breaking the underlying cluster.

Collaboration and expertise sharing to close the Kubernetes skills gap – new capabilities, bugs and security issues can be discussed and resolved by the open source community.

“This is the first step in a larger journey to a container-first world. While Apache Spark is the first open source processing engine we will bring to Cloud Dataproc on Kubernetes, it won’t be the last,” comment Crosbie and Malone.

They add that Google Cloud’s data and analytics strategy has always involved open source as a core pillar.

“This alpha announcement of bringing enterprise-grade support, management, and security to Apache Spark jobs on Kubernetes is the first of many as we aim to simplify infrastructure complexities for data scientists and data engineers around the world.”
