Big Data is everywhere. It is being talked about, collected, and processed, and huge efforts are made to make sense of it. Big Data is about money - if analyzing it leads you to the right conclusions, you can save or earn money. Humankind floods storage servers with a waterfall of data, most of it not very important from the source's point of view and mostly meaningless at the level of a single entity, but meaningful (hopefully!) en masse.
Big data promises the Holy Grail to FinTech institutions in risk assessment, decision making, fraud detection, and many other domains that are usually hidden from non-regulatory eyes. Why are big data systems so popular? They promise to run on commodity hardware, for a fraction of the price tag attached to proprietary, legacy solutions, big data or not. They promise massive horizontal scalability, handling of burst workloads, and much more. Of course, when you look at more complex scenarios, you find that the clusters are built with top-shelf dedicated server hardware, that licensing is not cheap at all, and that scalability has to be precisely designed and baked into the application - otherwise you won't get anywhere.
The market for NoSQL and big data solutions has grown tremendously in recent years, but fragmentation is beginning to take its toll. There are early signs that NoSQL vendors are folding, being taken over by bigger enterprises, or disappearing from the market altogether. This is good - the fragmentation of the market has been confusing for customers, who have difficulty choosing the right solution. And the solutions often prove to be rather less sparkling than advertised, requiring a lot of hand-holding, 24x7 support, unplanned releases, patches, etc.
Let the strongest survive.
Meanwhile, the RDBMS market is strong, very strong. Companies that have not-so-big data are not going to invest millions and millions of dollars to replace their legacy systems with big data solutions, with all the fancy document-oriented, key-value, or hybrid systems that require complete rewrites of their applications. RDBMS systems hold, you know, the boring kind of data. Transactions, inventories, orders, balances, accounts, and the like. These systems offer one very important feature for this data - transactional processing.
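To make "transactional processing" concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `accounts` table, the account names, and the `transfer` helper are made up for illustration; the point is only that a transfer either happens completely or not at all.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically; return True on success."""
    try:
        # `with conn` commits if the block succeeds and rolls back if it raises.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            # If this debit violates the CHECK constraint, the credit
            # above is rolled back together with it.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # fits within alice's balance
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
print(ok, failed, balances)
```

The second transfer credits bob first and only then fails on the debit, so without a transaction bob would keep money that alice never paid. With the transaction, both updates are undone.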
Transactions are logged to permanent storage before the data itself is changed (write-ahead logging), the old way, invented in the seventies or earlier. Owners of these systems know that their databases can be restored to a point in time if required - provided their DBAs did not screw up.
This is the Important Data.
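The write-first logging and point-in-time restore mentioned above can be illustrated with a toy sketch. This is not how any particular RDBMS implements it (real engines combine backups with log replay, checkpoints, and much more); the `TinyLog` class, the file name, and the integer timestamps are assumptions made up for this example.

```python
import json
import os
import tempfile

class TinyLog:
    """Toy write-ahead log: each change is appended and fsynced before it is applied."""

    def __init__(self, path):
        self.path = path
        self.state = {}

    def put(self, ts, key, value):
        record = {"ts": ts, "key": key, "value": value}
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())  # make the log record durable first...
        self.state[key] = value   # ...and only then apply the change

    def restore(self, until_ts):
        """Rebuild state by replaying log records with ts <= until_ts."""
        state = {}
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                record = json.loads(line)
                # Assumes records were written in increasing ts order.
                if record["ts"] > until_ts:
                    break
                state[record["key"]] = record["value"]
        return state

log_path = os.path.join(tempfile.mkdtemp(), "wal.log")
log = TinyLog(log_path)
log.put(1, "balance", 100)
log.put(2, "balance", 70)
log.put(3, "balance", -999)  # the mistake we want to undo

restored = log.restore(until_ts=2)
print(log.state, restored)
```

Because the log is the source of truth, the state as of any earlier timestamp can be rebuilt by replaying it up to that point, which is the idea behind point-in-time restore.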
There are many decisions to be made when you think about launching a new project and choosing its underlying data store technology. A great article by Deb Haldar, "10 questions to ask yourself before choosing a NoSQL database", outlines just some of them.
How do you choose the right database technology for your project?
If you lean towards a big data, NoSQL solution, the decision is in many cases based on emotions and preconceptions rather than cool-headed analysis. I have seen systems implemented blindly with Hive used as an RDBMS engine. Instead, a proper analysis should be performed and answers to some questions have to be found.
You have to ask yourself, among others:
- How much data is going to be stored in the system?
- What is the support model of the technology?
- Are your developers familiar with the technology?
- How many concurrent users will there be?
- Do you need transactions?
- How are you going to implement reporting on your data?
- Do you have any downstream dependencies on the data?
- What technology do those downstream systems use?
- How are you going to access the data?
- Do you need to support multiple schemas?
- What is the cost of the infrastructure?
- Do you need geographical distribution of the data store?
- How do you handle upgrades to your application?
The article written by Deb Haldar begins with a brief story of a project that was almost canceled because wrong decisions about the data architecture were made. Then it proceeds to discuss a number of the questions from the list above and more - well worth reading!
Since you have read this far, please check out the articles describing how you can use SelectCompare - it's very easy!
- Data comparison with SelectCompare - the manual
- Write SELECT statements for Excel
- Create an Excel baseline for your data
- How to compare data with a baseline
- Configuration of an ODBC data source for Oracle
- How to install Cherry City OLEDB provider for MySQL
- Activation of Cherry City OLEDB provider for MySQL