By Bob Gourley
Below and at this link is a video of Mike Olson at the opening of the 2013 Strata Conference and Hadoop World. The context in this discussion is important for technologists, strategists, analysts and executives alike. It provides insights in easy-to-understand ways and succinctly articulates the impact of some key trends in the environment.
The ecosystem around Apache Hadoop has continued to mature, so updates like this from one of its key leaders are critically important.
Mike opened with a great overview of how far the community has come in the last five years:
- In 2008 the Big Data meme had not happened yet. And you probably hadn’t heard of Hadoop yet.
- In 2009 the first Hadoop World was held and over 700 people showed up. This year, 3000 people attended in a sold-out venue.
- Now consider the many vendors in the ecosystem.
- And consider the big trends.
- When Hadoop was born, it was a complement to traditional data processing. It was off on the side: good for batch and for storage, but it could not handle real time. Much of the market did not pay much attention. But real time was always desired. Just a year ago Cloudera announced a real-time platform, Impala, an open source real-time SQL engine.
For context on where we are today and where we are going, Mike reviewed that:
- Other real-time capabilities have been added, including Cloudera Search. In the single year since they were announced, over 5000 enterprises have added Impala and Search. Real time has always mattered.
- Now that real time and search are both available, more work can be done on the real-time platform. Now other applications and uses can be supported on Hadoop. This platform is attracting work and attracting data. And it is attracting more and more users.
- Enterprise deployments are showing a strong trend: Hadoop is emerging as an enterprise data hub. This meme is big in the industry now.
What is a data hub? A scale-out, affordable, reliable platform. It can hold any data in any format for as long as you want it. It is a storage layer with security built in that can do access control, auditing, logging, and provenance of data. And a secure storage substrate would also require a rich collection of engines for working with data. You want query, search, machine learning and analytics in place without moving the data out. That collection of capabilities is hugely valuable and lets you work with the data where it lives. But still this is not a hub. A hub needs to connect to the infrastructure you already rely on. That makes a hub and makes this concept very virtuous. This is something new. It is an enterprise data hub. This is a very big deal.
Cloudera announced, via Mike, the release of Cloudera 5, the industry’s first Enterprise Data Hub.
Bottom line of this new Enterprise Data Hub capability: scale-out storage, security, good data governance, and a rich collection of engines for working on the data in place and delivering results to your systems and people.
For more see Mike expand on this concept here