Scalability Experts Blog

Big Data Physics - The Wave Particle Duality of Data


Although Big Data discussions tend to focus on the explosion of social media and documents, one of the challenges facing data scientists and analysts is the huge influx of machine-born data: data that originates from a sensor, a GPS device, or a wireless event. Storing this amount of data, and then processing it in a timely manner, isn't really possible in traditional business intelligence products. One way to better understand what design patterns are needed to architect these kinds of systems is to steal a concept from the world of physics – the wave-particle duality of light.

Particle Data
Current business intelligence and analytics systems manage particle data; i.e. data has to hit the drive as bits, and every piece is cleaned, transformed, and then aggregated for various reports. Every piece of data is important (a purchase, for instance) and must be stored. This pattern has proved very successful for most business-born data: e-commerce, office visits, and other transactional systems.
The current process of extract, transform, load and analysis has worked well for particle data.

Wave Data
Machine-born data follows a completely different pattern. Sensors (including GPS devices) report about themselves all the time: I am at this location, now this location; I am reporting a temperature of 40 degrees, now 45, and so on. These data stream in real time and tend to be meaningful mainly in the context of the previously reported values. The analysis that needs to be done is an aggregation across time; this is the "wave" characteristic of the data.

Some examples of these kinds of data are financial trading fraud detection, manufacturing plant floors, GPS-based transportation, RFID tracking, and home automation and smart grid systems. The data needs to be evaluated in real time along the wave; i.e. an increase in stock sales over a certain period of time, a rise in temperature in too short a time, etc.

Wave data is compared not only against itself (the temperature was 45 degrees at the last reading and is now 55 for one sensor) but also against previous wave patterns. For example, a stock fraud system may treat one wave pattern as the common pattern for a given period of time on a given day. There may not be any large changes within that wave, and only by comparing the wave to a previous wave can anomalous patterns be discovered.
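As a rough illustration of that wave-to-wave comparison, here is a minimal sketch in Python. All names and the tolerance value are illustrative assumptions, not any particular product's algorithm: it assumes both waves have already been reduced to aligned per-interval aggregates (e.g. hourly averages) and flags intervals where today's wave drifts too far from the baseline.

```python
def anomalous_points(baseline, current, tolerance=0.2):
    """Flag indices where the current wave deviates from the baseline wave
    by more than `tolerance` (fractional difference). Both inputs are
    assumed to be aligned per-interval aggregates, e.g. hourly averages."""
    flags = []
    for i, (b, c) in enumerate(zip(baseline, current)):
        if b != 0 and abs(c - b) / abs(b) > tolerance:
            flags.append(i)
    return flags

# Trading volume that looks unremarkable within the day, but differs
# from the usual daily pattern at hour 2.
usual = [100, 110, 105, 120]
today = [102, 108, 150, 118]
flagged = anomalous_points(usual, today)  # hour 2: |150-105|/105 > 0.2
```

No single reading in `today` is alarming on its own; only the comparison against the previous wave exposes the anomaly.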

What does Wave Data look like?
Wave data tends to look very similar, no matter what the business system or source. It tends to be machine-born, and includes a timestamp, an ID or identifier, and then a value. Almost all wave data systems start with this basic data model:

Timestamp, ID, Value
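In code, that basic model could be sketched as a single small record type. This is an illustrative Python sketch, not a schema from any particular system; the field names are assumptions:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Reading:
    """One wave-data record: when, which device, what value."""
    timestamp: float  # seconds since epoch
    sensor_id: str    # the reporting device's identifier
    value: float      # the measurement itself

# A temperature sensor reporting 40 degrees.
r = Reading(timestamp=1700000000.0, sensor_id="temp-01", value=40.0)
```

Everything else in a wave-data system – sliding windows, thresholds, wave-to-wave comparisons – tends to be built on streams of records shaped like this.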

How do you know you are dealing with Wave Data?
A key indicator that you are dealing with wave data rather than particle data is that you can't process the data quickly enough – save it to disk and then aggregate it – to serve the business need. A base data model that looks like the one above is also a key indicator, as is the presence of a timestamp, since wave data moves through time.

How do you process and store Wave Data?
Wave data needs to be processed in flight – before it hits the disk. It also generally needs to be saved to disk for more traditional data analysis. However, the amount of data is huge and not well served by the traditional relational database system because of the relational engine overhead. Most RDBMSs do not include any time-series compression technologies. Some manufacturing systems that do use such compression can reduce the disk footprint of wave data by up to 50x.

Current NoSQL databases are mostly targeted at Entity-Attribute-Value data patterns, which also do not include time-series functionality. The algorithms needed to process wave data in flight are likewise not generally available in current RDBMSs or NoSQL systems, and are much more complex than the simple aggregation functions most systems provide. For example, wave data usually needs to be aggregated through a sliding-window function – the average over the last half hour.
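A sliding-window average of the kind described above can be maintained in flight with a small amount of state. The following Python sketch is illustrative only (the class name and half-hour window are assumptions); it keeps only the readings still inside the window and updates a running total as data streams in:

```python
from collections import deque

class SlidingWindowAverage:
    """Average of readings within the last `window_seconds`, maintained in flight."""
    def __init__(self, window_seconds):
        self.window = window_seconds
        self.readings = deque()  # (timestamp, value) pairs, oldest first
        self.total = 0.0

    def add(self, timestamp, value):
        self.readings.append((timestamp, value))
        self.total += value
        # Evict readings that have aged out of the window.
        while self.readings and self.readings[0][0] <= timestamp - self.window:
            _, old = self.readings.popleft()
            self.total -= old

    def average(self):
        return self.total / len(self.readings) if self.readings else None

w = SlidingWindowAverage(window_seconds=1800)  # half-hour window
for t, v in [(0, 40.0), (600, 45.0), (1200, 50.0), (2400, 55.0)]:
    w.add(t, v)
# By t=2400, the readings at t=0 and t=600 have aged out of the window.
```

Because the aggregate is updated per event rather than recomputed from disk, this pattern works before the data ever hits storage.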

What do we need now to better manage Wave Data?
A commodity time-series database needs to be developed that can efficiently store wave data. Common functionality libraries can be developed to support many of the needed aggregates: window functions, thresholds and alerting functions, etc. Predictive analysis products need to be created that can layer over a common time-series data model.

As machine-born data grows (Wikipedia reports that a study found humans are surrounded by between 1,000 and 5,000 trackable objects), commodity applications that can hide some of the complexity of these systems need to be developed. Currently, most businesses dealing with this kind of data are building their own proprietary systems. However, with the advent of inexpensive wireless technology (such as that offered by the ZigBee chip), these systems will become commonplace and used by businesses of every size.

To learn more about Scalability Experts' expertise, visit our Big Data Solutions page.
