Big Data vs Important Data vs You

Big Data is everywhere. It is being talked about, collected and processed, and huge efforts are made to make sense of it. Big Data is about money - if you can draw conclusions from analysing it, you can save or earn money - that is what big data is about. Humankind floods storage servers with a waterfall of data, most of it not very important from the point of view of its original source and mostly meaningless from the point of view of a single entity, but meaningful (hopefully!) en masse.

Big data promises the Holy Grail to FinTech institutions in risk assessment, decision making, fraud detection and many other domains, usually hidden from non-regulatory eyes. Why are big data systems so popular? They promise to run on commodity hardware, for a fraction of the price tag attached to proprietary, legacy solutions, big data or not. They promise massive horizontal scalability, handling of burst workloads and much more. Of course, when you look at more complex scenarios, you'll find out that the clusters are built with top-shelf dedicated server hardware, that licensing is not cheap at all, and that scalability has to be precisely designed and baked into the application - otherwise you won't get anywhere.

Big Data vs RDBMS

The market for NoSQL and big data solutions has grown tremendously in recent years, but fragmentation is beginning to take its toll. There are first signs that NoSQL vendors are folding, being taken over by bigger enterprises, or disappearing from the market completely. This is good - the fragmentation of the market has been confusing for customers, who have difficulty choosing the right solution. And often the solutions prove to be rather less sparkling than advertised and require a lot of hand holding, 24x7 support, unplanned releases, patches etc.


Let the strongest survive.

Meanwhile, the RDBMS market is strong, very strong. Companies that have not-so-big data are not going to invest millions and millions of dollars to replace their legacy systems with big data solutions, with all the fancy document-oriented, key-value or hybrid systems which require complete rewrites of their applications. RDBMS systems hold, you know, the boring kind of data. Transactions, inventories, orders, balances, accounts and the like. These systems offer one very important feature for this data - transactional processing.

Transactions are logged to permanent storage first (write-ahead logging), the old way, invented in the seventies or earlier. Owners of these systems know that their databases can be restored to a point in time if required - provided their DBAs did not screw up.
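
To make the transactional guarantee a little more concrete, here is a minimal sketch using Python's built-in sqlite3 module. The accounts table, the transfer function and the amounts are all made up for illustration and are not taken from any particular system; a production RDBMS would be used the same way in principle, but with its own driver and configuration.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from account `src` to account `dst` as one transaction."""
    # Used as a context manager, the connection commits if the block
    # finishes normally and rolls back if an exception is raised.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?",
            (amount, src),
        )
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE id = ?", (src,)
        ).fetchone()
        if balance < 0:
            # Raising here undoes the debit above as well.
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?",
            (amount, dst),
        )


def balances(conn):
    """Return {account id: balance} for every account."""
    return dict(conn.execute("SELECT id, balance FROM accounts"))


# Hypothetical data: two accounts in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

transfer(conn, 1, 2, 30)       # succeeds and is committed

try:
    transfer(conn, 1, 2, 500)  # fails half-way and is rolled back
except ValueError:
    pass

print(balances(conn))
```

The second transfer debits the first account before it fails, yet the stored balances still reflect only the first, successful transfer. That all-or-nothing behaviour is what the boring kind of data relies on.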

This is the Important Data.

Choices, choices

There are a number of decisions to be made when you think about launching a new project and its underlying data store technology. A great article by Deb Haldar, "10 questions to ask yourself before choosing a NoSQL database", outlines just some of them.

How do you choose the right database technology for your project?
If you lean towards a big data, NoSQL solution, in many cases the decision is based on emotions and preconceptions rather than cool-headed analysis. I have seen blind implementations of systems using Hive as an RDBMS engine. Instead, an analysis should be performed and answers to a number of questions have to be found.
You have to ask yourself, among other things:

  • How much data is going to be stored in the system?
  • What is the support model of the technology?
  • Are your developers familiar with the technology?
  • How many concurrent users? 
  • Do you need transactions? 
  • How are you going to implement reporting on your data?
  • Do you have some downstream dependencies on the data? 
  • What technology do they use?
  • How are you going to access the data?
  • Do you need to support multiple schemas?
  • What is the cost of the infrastructure?
  • Do you need geographical distribution of the data store?
  • How do you handle upgrades to your application?

The article written by Deb Haldar begins with a brief story of a project that was almost cancelled because wrong decisions about the data architecture were made. Then it proceeds to discuss a number of the questions from the list above and more - well worth reading!


Since you have read this far, please check out the articles describing how you can use SelectCompare - it's very easy!
