Databricks
Databricks by sennchi
Databricks is based in San Francisco, California, U.S. It offers the Apache Spark-based Databricks Unified Analytics Platform in the cloud. In addition to Spark, it provides proprietary features for security, reliability, operationalization, performance and real-time enablement on Amazon Web Services (AWS). In November 2017, Databricks announced a preview of Azure Databricks, a version of its platform on Microsoft Azure. This offering is not considered in this Magic Quadrant because it was not generally available at the time of evaluation.
Databricks is a new entrant to this Magic Quadrant. As a Visionary, it draws on the open-source community and its own Spark expertise to provide a platform that is easily accessible and familiar to many. In addition to data science and machine learning, Databricks focuses on data engineering. A 2017 Series D funding round of $140 million gives Databricks substantial resources to expand its deployment options and fulfill its vision.
STRENGTHS
- **Center of the Spark ecosystem:** Founded by the creators of Apache Spark, Databricks uses its key position in the Spark ecosystem to grow its customer base. The Spark user community is expanding, and Databricks spearheads numerous Spark meetups, Spark Summits and training courses integrated with the Databricks Community Edition. Many companies introduce machine learning by using Spark as a starting point, and experienced organizations often select the Spark ecosystem to further strengthen their business.
- **Work with large datasets:** Databricks optimizes its infrastructure for performance and scalability, paying special attention to large datasets. As a result, Databricks' platform outperforms open-source Apache Spark. Reference customers are especially pleased with its SQL performance and with the ability to expose deep learning models in SQL, a feature of sparkdl (Deep Learning Pipelines).
- **Innovation:** Databricks' innovation in open-source software, streaming and the Internet of Things (IoT) accounts for its Visionary status. Reference customers like its turnkey notebooks for interactive collaboration and its support for multiple languages (SQL, Python, R and Scala) on data from various sources. Databricks' innovative approaches to cluster management and its serverless capabilities enable execution of machine-learning models at scale.
CAUTIONS
- **Limited market awareness:** Despite the marketing strategy that underpins Databricks' impressive growth, much of the market is not aware of the fully managed Databricks platform built on Spark and is instead buying Spark support from other cloud vendors or Hadoop distributors.
- **Cost tracking:** The overall cost of running Databricks' platform has two parts: the cost of the underlying cloud capacity, which is paid directly to the cloud provider, and the charges for the Databricks platform itself. Although Databricks reduces the Spark total cost of ownership (TCO) for comparable workloads, reference customers identified difficulties with controlling, analyzing and monitoring third-party cloud expenses.
- **Debugging capabilities:** Most customers use Databricks for "do-it-yourself" machine learning. Beyond the debugging capabilities that Databricks already offers, reference customers wish the vendor would provide debugging features better suited to the needs of data scientists. Databricks would also benefit from an integrated development environment (IDE) with comprehensive facilities for enterprise-grade debugging, development and version control, in addition to the IDE currently offered on GitHub, which leaves many reference customers dissatisfied.