Google Cloud rolls out Cloud Dataproc on Kubernetes

11 Sep 2019

Google Cloud is trialling alpha availability of a new platform for data scientists and engineers through Kubernetes.

Cloud Dataproc on Kubernetes combines open source, machine learning and cloud to help modernise big data resource management.

The alpha will start with workloads on Apache Spark, with more environments to come.

According to Google Cloud product managers Christopher Crosbie and James Malone, Google Cloud Dataproc can provide open source data analytics processing for those who need to process data and train models faster and at scale.

“However, as enterprise infrastructure becomes increasingly hybrid in nature, machines can sit idle, single workload clusters continue to sprawl, and open source software and libraries continue to become outdated and incompatible with your stack,” they explain.

“It’s critical that Cloud Dataproc continues to empower data professionals to focus more on workloads than infrastructure by combining the best of cloud and open source.”

The platform will include key benefits such as faster workloads, unified resource management, job isolation, collaboration, and expertise sharing.

Unified resource management will allow data scientists to work with a central view that spans both Kubernetes and YARN cluster management systems.

“Kubernetes has flipped the big data and machine learning open source software (OSS) world on its head, since it gives data scientists and data engineers a way to unify resource management, isolate jobs, and build resilient infrastructures across any environment.”

More resilient infrastructure – a self-healing GKE environment can support the smooth operation of mission-critical ETL and machine learning jobs on Spark.

“Data scientists and data engineers don’t have to worry about sizing and building clusters, manipulating Docker files, or messing around with Kubernetes networking configurations. It just works. With leading support from the team that built Kubernetes, enterprises have access to the skills they need to close any Kubernetes skills gap on their team.”

Less time and resource spent on infrastructure, more on workloads – new applications and models can be developed faster and at scale.

Isolate jobs to accelerate analytics life cycles – users can package up entire jobs in standalone containers to allow for testing, upgrading and patching without breaking the underlying cluster.

Collaboration and expertise sharing to close the Kubernetes skills gap – new capabilities, bugs and security issues can be discussed and resolved by the open source community.

“This is the first step in a larger journey to a container-first world. While Apache Spark is the first open source processing engine we will bring to Cloud Dataproc on Kubernetes, it won’t be the last,” comment Crosbie and Malone.

They add that Google Cloud’s data and analytics strategy has always involved open source as a core pillar.

“This alpha announcement of bringing enterprise-grade support, management, and security to Apache Spark jobs on Kubernetes is the first of many as we aim to simplify infrastructure complexities for data scientists and data engineers around the world.”
