BI on Big Data

Business Intelligence on Big Data?

“That’s easy!” you think. Just connect to Hadoop with the Hive JDBC/ODBC driver and you’re good to go!

Well… Not the best idea 🙂

For BI you need a SQL engine that is multi-tenant and, at least for moderate amounts of data, very responsive (sub-second if possible).

When you get to know Hive you’ll see that it won’t satisfy you there (at least before release 2, which in itself is a great subject for another post).

So what are your options? What should you consider?

This presentation may shed some light: “BI on Hadoop: What are your options?” by Tomer Shiran (Dremio)

With Tomer’s permission, I’ll include some slides that, in my opinion, give the best summary.

You have three options to integrate BI with Big Data:

BI & Big Data - 3 options

My view is that the viable options are ETL to DWH and SQL-on-Big-Data. Monolithic solutions seem less promising (stronger vendor lock-in, and at the end of the day you still want to run fast SQL on Hadoop :))

So let’s focus on these two solutions:

ETL to RDBMS

Not much to add: Big Data is the engine munging your TBs of data and spitting the output into the DWH.

I think there is one more “Pro” worth mentioning, though. If you have well-designed and established security and user access roles on the DWH, it may be a good idea to benefit from them!
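To make the ETL-to-DWH flow a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `RAW_EVENTS` sample, the `daily_revenue` table, and the function names are hypothetical, the plain-Python aggregation stands in for the real Big Data job, and an in-memory SQLite database stands in for the DWH.

```python
import sqlite3
from collections import defaultdict

# Hypothetical raw events (day, country, amount).
# In practice these would be TBs of data sitting on Hadoop.
RAW_EVENTS = [
    ("2017-01-01", "PL", 10.0),
    ("2017-01-01", "PL", 5.5),
    ("2017-01-01", "US", 7.0),
    ("2017-01-02", "US", 3.0),
]


def aggregate(events):
    """Stand-in for the heavy Big Data job: sum amounts per (day, country)."""
    totals = defaultdict(float)
    for day, country, amount in events:
        totals[(day, country)] += amount
    return [(day, country, total) for (day, country), total in sorted(totals.items())]


def load_to_dwh(conn, rows):
    """Load the small aggregated result into a DWH table (SQLite as stand-in)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_revenue ("
        "day TEXT, country TEXT, revenue REAL, PRIMARY KEY (day, country))"
    )
    # INSERT OR REPLACE keeps the load idempotent if the job is re-run.
    conn.executemany("INSERT OR REPLACE INTO daily_revenue VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(events=RAW_EVENTS):
    """Run the whole sketch: aggregate, then load into a fresh in-memory DWH."""
    conn = sqlite3.connect(":memory:")
    load_to_dwh(conn, aggregate(events))
    return conn


if __name__ == "__main__":
    dwh = run_etl()
    # The BI tool only ever queries the small, pre-aggregated table.
    query = "SELECT day, country, revenue FROM daily_revenue ORDER BY day, country"
    for row in dwh.execute(query):
        print(row)
```

The point of the sketch is the split of responsibilities: the expensive crunching happens once on the Big Data side, and the BI tool talks only to the DWH, where the existing security and access roles already apply.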

SQL-on-BD

SQL-on-BD Solutions

Here you’ve got more to think about. Depending on the use case, a different SQL-on-Big-Data solution may be the better fit. No silver bullet here 🙂

Even Hive may be a good fit if you don’t care about interactivity.

So choosing a solution may seem a bit overwhelming…

Luckily, Tomer comes to the rescue with this super slide:

BI on BD Heuristic

Remember this one slide and you’ve got a basic understanding of BI on Big Data.

To get more on the subject please go to the original presentation.

If you have the chance, I really recommend watching the video of Tomer’s talk at Strata+Hadoop 🙂

 

 
