Data blog



Excel data comparison - use cases

Excel is a de facto standard for data exchange in offices. It is important to be sure that the data in Excel spreadsheets is correct. Analyzing each line of a report is difficult and time-consuming, especially if reports contain dozens of columns and hundreds of lines. But how do you know that you collected all the data? How do you know that the report contains all the required inputs? SelectCompare supports use cases that bring significant value to an organization by reducing the time spent on data validation.




Version 1.3 of SelectCompare is now available

SelectCompare version 1.3 implements a number of performance improvements. Some users reported excessive memory usage during export of data to Excel spreadsheets and slow comparison of larger data sets. These issues have been resolved. Excel workbooks now also contain charts providing visual statistics of the data comparison.





Table partitioning in relational databases

I would like to write briefly about table partitioning in relational databases. Table partitioning divides the data in a table into horizontal chunks that can (depending on the DB technology you use) be indexed separately and stored on different disks. This helps address certain performance issues when a table is large, receives many inserts, and must also serve reports on its data. Such a table might be, for example, a transaction registry from your retail network. Partitioning allows you to separate 'read only' partitions from the active partitions. For example, if you insert a lot of transactions into your registry table, they usually have a timestamp associated with them. You...
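The idea of splitting a transaction registry by timestamp and closing off older chunks can be illustrated with a small in-memory sketch. This is not how any particular database implements partitioning. The `PartitionedRegistry` class, the monthly partition granularity, and the integer `amount_cents` field are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime


def partition_key(ts: datetime) -> str:
    """Map a transaction timestamp to a monthly partition name, e.g. '2024-03'."""
    return ts.strftime("%Y-%m")


class PartitionedRegistry:
    """Toy in-memory stand-in for a table range-partitioned by month (hypothetical)."""

    def __init__(self) -> None:
        self.partitions = defaultdict(list)  # partition name -> list of rows
        self.read_only = set()               # names of closed partitions

    def insert(self, ts: datetime, amount_cents: int) -> None:
        """Route a row to its partition; closed partitions reject writes."""
        key = partition_key(ts)
        if key in self.read_only:
            raise ValueError(f"partition {key} is read-only")
        self.partitions[key].append((ts, amount_cents))

    def close_before(self, ts: datetime) -> None:
        """Mark every partition older than the month of `ts` as read-only."""
        cutoff = partition_key(ts)
        for key in self.partitions:
            # Zero-padded 'YYYY-MM' names sort chronologically as strings.
            if key < cutoff:
                self.read_only.add(key)

    def monthly_total(self, key: str) -> int:
        """Report on one partition without scanning the others."""
        return sum(amount for _, amount in self.partitions.get(key, []))


registry = PartitionedRegistry()
registry.insert(datetime(2024, 1, 15, 9, 30), 1999)
registry.insert(datetime(2024, 1, 20, 14, 5), 500)
registry.insert(datetime(2024, 2, 3, 11, 0), 4250)
registry.close_before(datetime(2024, 2, 1))

print(sorted(registry.read_only))
print(registry.monthly_total("2024-01"))
```

In this sketch a report on January reads only the January chunk, while new inserts land in the still-active February chunk. A real database would typically declare the partitions in its own DDL rather than in application code.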




Comparison of data processing architectures

John Ryan wrote an excellent post comparing different data architectures and their pros and cons. As it turns out, there are four major options you can choose from, and you should choose carefully.

Symmetric Multiprocessing (SMP) - the architecture we all know and love. These are systems based on a single machine with attached storage. The machine can of course run in an HA configuration, and there are some multi-machine configurations as well, but the principle stays the same. Examples of such engines include Oracle, SQL Server, PostgreSQL, MySQL and many others.

Massively Parallel Processing (MPP) - the idea behind it is that the data is distributed across multiple nodes using a number of algorithms, and each of these...
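As a rough illustration of distributing data across nodes, the sketch below uses hash distribution, which is one commonly used scheme. The excerpt does not say which algorithms the original post covers, so the choice of hashing, the function names (`node_for`, `distribute`, `parallel_sum`), and the sample rows are all assumptions.

```python
import hashlib
from collections import defaultdict


def node_for(key: str, node_count: int) -> int:
    """Pick a node by hashing the distribution key (stable across runs)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def distribute(rows, key_field, node_count):
    """Assign each row to a node based on the hash of its distribution key."""
    nodes = defaultdict(list)
    for row in rows:
        nodes[node_for(str(row[key_field]), node_count)].append(row)
    return nodes


def parallel_sum(nodes, value_field):
    """Each node aggregates its own slice; a coordinator adds the partial results."""
    partials = [sum(r[value_field] for r in rows) for rows in nodes.values()]
    return sum(partials)


# Hypothetical sample data: (customer_id, amount) rows.
rows = [{"customer_id": i % 7, "amount": i} for i in range(100)]
nodes = distribute(rows, "customer_id", node_count=4)

print({node: len(slice_) for node, slice_ in sorted(nodes.items())})
print(parallel_sum(nodes, "amount"))
```

Rows with the same key always land on the same node, so per-key work can run on each node independently, and only the small partial results have to be combined.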
