Databricks: Data and AI Lakehouse Company

Databricks is a San Francisco-based data and AI company founded in 2013 by the creators of Apache Spark. Its platform unifies data engineering, analytics, and machine learning around the lakehouse architecture, and a December 2025 round valued the company at about $134 billion.

Databricks is one of the most closely watched private companies in enterprise software, and it sits at the center of how large organizations store, govern, and build on their data. Founded in 2013 by a group of researchers who created Apache Spark at the University of California, Berkeley, the company set out to make large-scale data processing and machine learning practical for ordinary engineering teams rather than a handful of specialists. More than a decade later it serves a large share of the Fortune 500 and was valued at roughly $134 billion in a financing round announced in December 2025.

The company is headquartered in San Francisco, California, and operates across cloud providers including Amazon Web Services, Microsoft Azure, and Google Cloud. Its founders argued early that the separation between data warehouses and data lakes forced companies into duplicated systems and brittle pipelines, and much of the Databricks story is an attempt to collapse that divide into a single governed platform.

What Databricks does

Databricks sells a cloud platform that brings data engineering, analytics, business intelligence, and artificial intelligence into one environment. Teams use it to ingest raw data, clean and transform it, run SQL queries and dashboards, and train and serve machine learning models, all against the same underlying storage. The company describes the result as a data intelligence platform built on an open lakehouse foundation.

The practical appeal is consolidation. Instead of moving data between a lake for raw storage, a warehouse for analytics, and separate tooling for machine learning, an organization can keep one copy of its data and apply different workloads to it. That reduces copies, lowers the chance of figures drifting apart between systems, and gives security and governance teams a single place to set policy.

Origins in Apache Spark and Berkeley

The technical roots of Databricks trace to the AMPLab at UC Berkeley, where Matei Zaharia and colleagues built Apache Spark, an open-source engine for distributed data processing. Spark spread quickly because it was faster and more flexible than earlier batch systems such as Hadoop MapReduce, and a community of contributors formed around it. The founding team saw an opportunity to meet the commercial demand growing around the project.

Databricks was incorporated in 2013 by seven people connected to that work, among them Ali Ghodsi, who serves as chief executive, along with Matei Zaharia, Reynold Xin, Ion Stoica, Patrick Wendell, Andy Konwinski, and Arsalan Tavakoli-Shiraji. The company has continued to release major open-source projects rather than keeping all of its technology proprietary, a choice that helped it earn trust among data engineers and shaped how the platform is adopted inside large companies.

The lakehouse and the product family

The idea most associated with Databricks is the lakehouse, an architecture that aims to deliver the low-cost, open storage of a data lake together with the reliability and performance features of a warehouse. The foundation is Delta Lake, an open storage layer that adds transactions, schema enforcement, and versioning to files held in cloud object storage. On top of that sit the tools teams use day to day.
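To make those three properties concrete, here is a minimal in-memory sketch of an append-only table with all-or-nothing commits, schema checks, and reads as of an earlier version. It is a toy under stated assumptions, not Delta Lake's implementation or API: the class `ToyVersionedTable` and its methods are hypothetical names invented for illustration, and real Delta Lake keeps its log alongside data files in object storage.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVersionedTable:
    """Hypothetical in-memory stand-in for a versioned table; not the Delta Lake API."""

    schema: dict  # column name -> expected Python type
    _log: list = field(default_factory=list)  # one entry per committed batch

    def append(self, rows):
        """Commit a batch atomically: validate every row first, then record it."""
        for row in rows:
            if set(row) != set(self.schema):
                raise ValueError(f"columns {sorted(row)} do not match schema")
            for column, expected in self.schema.items():
                if not isinstance(row[column], expected):
                    raise TypeError(f"{column!r} expects {expected.__name__}")
        # Nothing is written unless the whole batch passed validation.
        self._log.append([dict(row) for row in rows])
        return len(self._log)  # version number after this commit

    def read(self, version=None):
        """Return rows as of a version (latest by default) by replaying the log."""
        upto = len(self._log) if version is None else version
        return [row for batch in self._log[:upto] for row in batch]


table = ToyVersionedTable(schema={"order_id": int, "amount": float})
v1 = table.append([{"order_id": 1, "amount": 9.5}])
v2 = table.append([{"order_id": 2, "amount": 20.0}, {"order_id": 3, "amount": 1.25}])

try:
    # The second row has a string amount, so the whole batch is rejected.
    table.append([{"order_id": 4, "amount": 3.0}, {"order_id": 5, "amount": "oops"}])
except TypeError:
    pass

latest_rows = table.read()           # rows from both successful commits
rows_at_v1 = table.read(version=1)   # the table as it looked after the first commit
```

The rejected batch leaves no partial rows behind, which is the behavior the word "transactions" refers to, and reading at version 1 shows how an older state of the table stays recoverable.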

Several products round out the platform. Unity Catalog provides governance and lineage so administrators can manage access and trace how data is used. MLflow, another open-source project originated at Databricks, manages the machine learning lifecycle from experiment tracking to deployment. Mosaic AI, built on the company's acquisition of MosaicML, supports building, tuning, and serving generative AI models and agents.
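Lineage, in the sense used above, amounts to answering "which upstream datasets fed this one?" The sketch below shows that idea as a plain graph traversal. The table names and the `UPSTREAM` mapping are hypothetical, and this does not use or represent the Unity Catalog API; it only illustrates the kind of question a lineage system answers.

```python
# Hypothetical lineage edges: each table maps to the tables it is derived from.
UPSTREAM = {
    "raw_orders": [],
    "raw_customers": [],
    "clean_orders": ["raw_orders"],
    "revenue_by_customer": ["clean_orders", "raw_customers"],
    "churn_features": ["revenue_by_customer"],
}


def upstream_of(table, edges):
    """Return every table that feeds into `table`, directly or indirectly."""
    seen = set()
    stack = list(edges.get(table, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(edges.get(current, []))
    return seen


sources = sorted(upstream_of("churn_features", UPSTREAM))
print(sources)
```

An administrator asking what a model's feature table depends on would get back all four upstream tables, including the raw sources two and three hops away.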

Customers and scale

Databricks reports that more than 10,000 organizations use its platform, ranging from technology firms and banks to retailers, manufacturers, and public-sector agencies. The breadth reflects how general the underlying problem is, since almost every large enterprise now wants to combine analytics with machine learning on the same data.

Growth has been rapid in financial terms as well. In its December 2025 announcement the company said it had surpassed a $4.8 billion revenue run rate, growing more than 55 percent year over year, with substantial contributions from both its data warehousing and AI product lines. Those figures help explain why investors have repeatedly marked the company up across successive private rounds.
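A quick back-of-envelope calculation shows what those two numbers imply together. The inputs come from the figures quoted above; the year-ago run rate is derived here rather than reported by the company, and because both inputs were stated as floors ("surpassed", "more than"), the outputs are rough approximations only.

```python
current_run_rate_bn = 4.8  # from the article: run rate surpassed $4.8 billion
yoy_growth = 0.55          # from the article: growing more than 55 percent

# If the run rate grew 55 percent to reach 4.8, the year-ago figure is 4.8 / 1.55.
implied_prior_run_rate_bn = current_run_rate_bn / (1 + yoy_growth)
implied_added_bn = current_run_rate_bn - implied_prior_run_rate_bn

print(f"Implied year-ago run rate: roughly ${implied_prior_run_rate_bn:.1f}B")
print(f"Implied run rate added over the year: roughly ${implied_added_bn:.1f}B")
```

Taken at face value, the figures suggest the company added on the order of $1.7 billion in annualized revenue in twelve months.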

Funding and why it matters

The December 2025 round, reported at more than $4 billion and led by investors including Insight Partners and Fidelity, set the valuation near $134 billion. That figure placed Databricks among the most valuable venture-backed companies in the world and reflected investor conviction that the platform sits in the path of long-term enterprise spending on data and AI.
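One way to read the valuation is as a multiple of the revenue run rate the company reported in the same announcement. The sketch below uses the article's rounded figures of about $134 billion and $4.8 billion; the multiple is a derived illustration, not a number Databricks or its investors published.

```python
valuation_bn = 134.0  # from the article: valuation of about $134 billion
run_rate_bn = 4.8     # from the article: revenue run rate above $4.8 billion

# Valuation divided by annualized revenue gives a rough run-rate multiple.
run_rate_multiple = valuation_bn / run_rate_bn

print(f"Valuation is roughly {run_rate_multiple:.1f}x the revenue run rate")
```

On these inputs the round priced the company at close to 28 times its annualized revenue.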

For founders and operators studying the company, the lesson is less about any single feature and more about positioning. Databricks attached itself to a widely adopted open-source project, kept contributing to open standards as it commercialized, and used that credibility to expand into adjacent markets such as warehousing and generative AI. Whether it eventually goes public or stays private, it has become a reference point for how an infrastructure company can grow from an academic project into a platform that large organizations depend on.

Frequently asked questions

When was Databricks founded and by whom?

Databricks was founded in 2013 by the creators of Apache Spark from UC Berkeley, including Ali Ghodsi, Matei Zaharia, Reynold Xin, Ion Stoica, Patrick Wendell, Andy Konwinski, and Arsalan Tavakoli-Shiraji.

Where is Databricks headquartered?

Databricks is headquartered in San Francisco, California, and operates across the major public cloud providers.

What is the Databricks lakehouse?

The lakehouse is an architecture that combines the open, low-cost storage of a data lake with the reliability and performance of a data warehouse, built on the open-source Delta Lake storage layer.

How much is Databricks valued at?

A financing round announced in December 2025 valued Databricks at about $134 billion, with the company reporting a revenue run rate above $4.8 billion.

What products does Databricks offer?

Its platform includes Delta Lake, Unity Catalog for governance, MLflow for the machine learning lifecycle, and Mosaic AI for building and serving generative AI models and agents, along with SQL analytics and data engineering tooling.

Sources and further reading: the company's official site at databricks.com, the Databricks entry on Wikipedia, and the company's December 2025 funding announcement.