Comparison of data processing architectures


John Ryan wrote an excellent post comparing different data architectures and their pros and cons. As it turns out, there are four major options to choose from, and you should choose carefully.

Symmetric Multiprocessing (SMP) - the architecture we all know and love. These are systems based on a single machine with attached storage. The machine can of course run in an HA configuration, and there are some multi-machine configurations as well, but the principle stays the same. Examples of such engines include Oracle, SQL Server, PostgreSQL, MySQL and many others.

Massively Parallel Processing (MPP) - the data is distributed across multiple nodes using one of a number of algorithms, and each node processes its share of the data independently of the others, eventually contributing to the consolidated result returned to the client. Examples of such engines are Teradata, Netezza, Microsoft PDW and others.
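To make the MPP idea a bit more tangible, here is a minimal Python sketch. It is not how any of the engines above is actually implemented. It assumes hash distribution on a key (only one of several possible distribution schemes), and the names `node_for`, `distribute`, `local_sum` and `consolidate` are made up for this illustration. Each "node" is just a list in memory that aggregates its own rows, and the partial results are merged at the end.

```python
from collections import defaultdict
from zlib import crc32


def node_for(key: str, node_count: int) -> int:
    """Pick a node by hashing the distribution key (assumed scheme)."""
    return crc32(key.encode("utf-8")) % node_count


def distribute(rows, node_count):
    """Spread (customer, amount) rows across the nodes by customer."""
    nodes = [[] for _ in range(node_count)]
    for customer, amount in rows:
        nodes[node_for(customer, node_count)].append((customer, amount))
    return nodes


def local_sum(node_rows):
    """Work done by one node, using only the rows it holds."""
    totals = defaultdict(int)
    for customer, amount in node_rows:
        totals[customer] += amount
    return dict(totals)


def consolidate(partials):
    """Merge the per-node partial results into the final answer."""
    result = defaultdict(int)
    for partial in partials:
        for customer, total in partial.items():
            result[customer] += total
    return dict(result)


rows = [("alice", 10), ("bob", 5), ("alice", 7), ("carol", 3), ("bob", 1)]
nodes = distribute(rows, node_count=3)
partials = [local_sum(node_rows) for node_rows in nodes]
totals = consolidate(partials)
print(totals)
```

Because all rows for a given customer hash to the same node, each node can finish its part without talking to the others. In a real system the choice of distribution key matters a lot, since a skewed key leaves some nodes doing most of the work.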

Hadoop - this term is very loose, as it actually describes a family of open source products that work within one environment: a cluster of servers sharing the distributed file system HDFS. Hadoop-based systems have certain advantages over MPP systems, but they should not be considered a direct upgrade step from them. Hadoop-based systems offer one way or another of accessing the data via SQL - these implementations have their specifics, which can bite you if you do not understand them properly.

Elastic Parallel Processing (EPP) - an evolution of the MPP architecture that separates the data storage and compute layers of the solution. This abstracts data access and lets you balance the system better from the spend point of view - you usually need far fewer machines to compute your data than machines to store it. Examples of such solutions usually live in the cloud - Azure SQL Data Warehouse, Snowflake, Amazon Aurora.
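The separation of storage and compute can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not the design of any of the cloud products above: `SharedStorage` and `run_sum` are hypothetical names, the "storage layer" is an in-memory list of chunks, and the "compute layer" is a thread pool whose size is chosen per query.

```python
from concurrent.futures import ThreadPoolExecutor


class SharedStorage:
    """Hypothetical storage layer: holds data in chunks, knows nothing about compute."""

    def __init__(self, values, chunk_size):
        self._chunks = [
            values[i:i + chunk_size] for i in range(0, len(values), chunk_size)
        ]

    def chunk_count(self):
        return len(self._chunks)

    def read_chunk(self, index):
        return self._chunks[index]


def run_sum(storage, worker_count):
    """Sum all values using a compute pool sized independently of the storage."""
    with ThreadPoolExecutor(max_workers=worker_count) as pool:
        partials = pool.map(
            lambda i: sum(storage.read_chunk(i)), range(storage.chunk_count())
        )
        return sum(partials)


storage = SharedStorage(list(range(1, 101)), chunk_size=10)

# The same stored data, queried with a small and a larger compute pool.
small = run_sum(storage, worker_count=1)
large = run_sum(storage, worker_count=4)
print(small, large)
```

The point of the sketch is that the number of workers changes without touching the stored data, which is what lets you pay for compute only when you need it.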

[Image: comparison of data architectures]

Here is the link to the article, which covers the topic in greater depth and provides an overview of use cases that might help you decide on your solution.

 



