Comparison of data processing architectures

John Ryan wrote an excellent post comparing different data architectures and their pros and cons. As it turns out, there are four major options to choose from, and you should choose carefully.

Symmetric Multiprocessing (SMP) - the architecture we all know and love. These systems are based on a single machine with attached storage. The machine can of course run in a high-availability (HA) configuration, and there are some multi-machine configurations as well, but the principle stays the same. Examples of such engines include Oracle, SQL Server, PostgreSQL, MySQL, and many others.

Massively Parallel Processing (MPP) - the idea is that the data is distributed across multiple nodes using one of several distribution algorithms, and each node processes its share of the data independently of the others, contributing a partial result that is consolidated and returned to the client. Examples of such engines are Teradata, Netezza, Microsoft PDW, and others.
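To make the MPP idea more concrete, here is a minimal Python sketch. It is an illustration only, not how any of the engines above is implemented. It assumes hash distribution on a key (one common scheme), and every name in it (`node_for`, `distribute`, `local_totals`, `consolidate`, the sample rows) is made up for the example.

```python
from collections import defaultdict
from zlib import crc32


def node_for(key: str, node_count: int) -> int:
    """Pick a node by hashing the distribution key (assumed scheme)."""
    return crc32(key.encode("utf-8")) % node_count


def distribute(rows, node_count):
    """Spread (customer, amount) rows across the nodes by customer."""
    nodes = [[] for _ in range(node_count)]
    for customer, amount in rows:
        nodes[node_for(customer, node_count)].append((customer, amount))
    return nodes


def local_totals(node_rows):
    """Work done on one node: aggregate only the rows it holds."""
    totals = defaultdict(int)
    for customer, amount in node_rows:
        totals[customer] += amount
    return dict(totals)


def consolidate(partials):
    """Merge the per-node partial results into the final answer."""
    result = {}
    for partial in partials:
        for customer, total in partial.items():
            result[customer] = result.get(customer, 0) + total
    return result


# Hypothetical sample data: (customer, amount) pairs.
rows = [("alice", 10), ("bob", 5), ("alice", 7), ("carol", 3), ("bob", 1)]

nodes = distribute(rows, node_count=3)
partials = [local_totals(node_rows) for node_rows in nodes]  # independent per node
totals = consolidate(partials)
print(sorted(totals.items()))  # [('alice', 17), ('bob', 6), ('carol', 3)]
```

Because rows are placed by hashing the customer, all rows for one customer land on the same node, so each node can aggregate without talking to the others. A real engine would run the per-node step in parallel on separate machines.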

Hadoop - the term is loose, as it describes a family of open-source products that run on a cluster of servers sharing a distributed file system, HDFS. Hadoop-based systems have certain advantages over MPP systems, but they should not be considered a direct upgrade path from them. Most Hadoop-based systems offer some way of accessing the data via SQL, and each implementation has its own specifics, which can bite you if you do not understand them properly.

Elastic Parallel Processing (EPP) - an evolution of the MPP architecture that separates the data storage and compute layers of the solution. This abstracts data access and lets you balance spending better, because the two layers can be sized separately - you usually need far fewer machines to compute your data than to store it. Examples of such solutions usually live in the cloud - Azure SQL Data Warehouse, Snowflake, and Amazon Aurora.
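The separation of storage and compute can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the design of any product named above: `SharedStorage` and `run_query` are hypothetical names, and the workers are simulated in a loop rather than run on separate machines.

```python
class SharedStorage:
    """Stand-in for a storage layer that any compute worker can read."""

    def __init__(self, partitions):
        self._partitions = partitions

    def partition_count(self):
        return len(self._partitions)

    def read(self, index):
        return self._partitions[index]


def run_query(storage, worker_count):
    """Sum all stored values using `worker_count` stateless workers.

    Workers hold no data of their own. Worker w reads partitions
    w, w + worker_count, w + 2 * worker_count, ... from shared storage.
    """
    partials = []
    for worker in range(worker_count):
        total = 0
        for index in range(worker, storage.partition_count(), worker_count):
            total += sum(storage.read(index))
        partials.append(total)
    return sum(partials)


# Hypothetical data set: four stored partitions.
storage = SharedStorage([[1, 2, 3], [4, 5], [6], [7, 8, 9, 10]])

small = run_query(storage, worker_count=1)  # minimal compute
large = run_query(storage, worker_count=4)  # scaled-out compute, same data
print(small, large)  # 55 55
```

The point of the sketch is that the worker count is a per-query choice. The stored data does not move or get redistributed when compute is scaled up or down, which is what lets the two layers be sized and paid for separately.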

[Image: comparison of data architectures]

Here is the link to the article, which covers the topic in greater depth and provides an overview of use cases that might help you decide on your solution.

 
