Hadoop Warehouse
Enough with the Metaphors, let's get down to Business
Posted on October 2nd, 2015

The Hadoop World conference, with Strata thrown in for broader relevance, was a great place to be. It was a whirlwind of activity: glimpses of a fast-changing landscape mixed with healthy doses of salesmanship, a Sparkling reminder (Spark was everywhere and on everyone's lips) of what's in and what's out, and a look at how some older vendors (Syncsort and, less so, IBM) have been inventive enough to stake a claim to a piece of the pie everywhere, including here.

The conference floor was littered with BI and Data Discovery vendors supporting Hadoop, and it was well nigh impossible to walk a meter or two without bumping into at least one vendor in that space, if not two.

Of course there was talk of Data Lakes, Reservoirs, Irrigation (What? STOP! That's when you know you've gone too far with metaphors) and a horde of animals, wild and domesticated, running rampant across the terrain of the conference. Apache Kafka appeared to offer some relief from this horde (in name only, though) as you took a Kafkaesque moment, if you like, to ponder how all these Lakes, Kudus, HAWQs, RedPoints, Tamrs, Datos and countless others require you to "Splunk" your money in for returns that may or may not be worth your time.

There were some interesting keynote speakers, and two stood out in my mind. The first, Joseph Sirosh, with "What 0-50 million users in 7 days can teach us about big data", was brilliant at getting the message across, without blatantly promoting Microsoft, that Azure can scale your business from 0 to 50 million users in a week if you need it to. More importantly, he showed that in today's business climate, if you put something out into the world, it can scale up faster than you could envision in your wildest dreams. This is especially true for any small player putting an App or Website out there with just a tiny hope that it will become something substantial in due course. Due course can now be a matter of minutes to a few hours, so be prepared for it. Here is the link to his entertaining and informative talk. ( )

The second, and the most outstanding, was Maciej Ceglowski with "Haunted by Data". It was a passionate plea to everybody working in Data Science to reflect, and an early warning not to lose our sense of right and wrong while the power of data beckons us to arm ourselves further to compete. I will say no more except that it was rollickingly hilarious and yet the most meaningful oration of the entire conference. Watch it ( ).

Lastly, the Security layers and infrastructure in and around Hadoop have to improve. There were some promising trends in this area, but it is surprisingly still lagging, given that, as one of the speakers I mentioned above showed, this thing grows like the Hulk under the right conditions.

In conclusion, I'll be there next year to find out how this industry has progressed and what I can bring to my clients from it.

The large majority of vendors, barring Microsoft, Intel, Tableau and a handful of others, had little to offer Enterprises other than those willing to spare money for experimentation. I found most vendors failed to show how they compare with established tools in Enterprise infrastructure: Kafka (vendor: Confluent) against MQSeries or MSMQ, MemSQL and VoltDB against Oracle TimesTen (or whatever Oracle sells in this space now), or some of the database offerings (from Cloudera and other Hadoop ecosystem vendors) against, say, Oracle Real Application Clusters. They appeared to simply assume that since it's from the "cool" Open Source world, it's got to be good. With all these corporations running rampant in this "cool" world, I don't think it's quite so cool any longer, so let's recognize that and get down to business, and fast.

I have no doubt that the architecture of Hadoop, and of the many vendors whose products build on the amazing engineering within Hadoop (directly or in spirit), in most cases beats those slightly older technologies. However, it would be good to see head-to-head comparisons to help move some of these pre-Hadoop systems onto the Hadoop infrastructure. That's just the plea from this lone Enterprise Data and Solutions Architect.


