Years ago, when I was a system administrator in a data center, one of my jobs was to manage performance and predict systems growth. I would collect volumes of data from the VMS clusters and use a statistical tool to report on it, archive it, reduce it for historical purposes and run simple trend line analysis to predict growth. It was a good solution at the time and helped me set the budget for new data center hardware, although the what-if scenarios I could run were limited.
Recently I’ve been exploring the world of IoT data, which is often lumped in with big data. In many ways, it’s a different name for what I did as a system administrator and for what we’ve always done in terms of collecting and analyzing data. However, with IoT, we have a larger variety and a much higher volume of data. I’ve also come to realize that there are many more tools and options today for making sense of the data, but people don’t always know how to use those tools to make the data they collect more valuable.
Are Sensors Adding Even More Data?
One method of collecting data is through sensors. We have sensors everywhere: from jet engines to air conditioners to genomics to retail. All the data we’re collecting has value at the time it’s collected, but over time it loses that value and may even become a liability. How quickly that happens depends on the data. A biological gene study can be conducted for months, and the data collected at the start remains valuable until the end of the study, whereas other data, like the temperature in an office space, may lose its value quickly.
The data we’re collecting is generated by many devices, and those devices may conform to a standard, but there are too many standards to know them all. For example, in facilities there are standards for HVAC, electrical, water, security and so on. It’s further complicated because OEMs can modify a standard to better meet their needs and still claim to conform to it. This means we can see the data but not know what it means. This is one of the reasons we need gateway devices and intelligence at the edge.
I’ve also seen a lot of variance in the types of data being collected. It is both structured and unstructured and comes in a wide variety of formats. We need to be able to store that data in the way most appropriate to its format and in the place where it will provide the greatest value. This is where the cloud can add tremendous value. By using the cloud, you could set up a JSON document database like Cloudant in minutes, whereas on premises it would take much longer.
In the world of IoT, we have sensors, gateways, connectivity, data platforms and analytics, all of which need end-to-end security. A single sensor can generate a tremendous amount of data over time, and many sensors together could swamp a network. Take the jet engine sensors I mentioned earlier. Commercial airliners have multiple engines that generate a huge amount of raw data during a flight. So when multiple planes are on the ground at an airport, the airlines will want to gather the sensor data from those planes and turn it into something usable as quickly as possible. Just imagine what it takes to download and analyze terabytes of data from all those sensors in all those planes in the short time that they are on the ground.
Realize the Potential of Sensors
In this scenario, my goal would be to get all the critical data from the engine and plane, and to do so as quickly as possible. If I select a particular temperature sensor, I would expect it to start at a relatively low temperature (plane parked on the ground), rise relatively fast (starting the engine), spike (taxi and takeoff), stay basically flat (flying) and finally cool down as the plane lands. I may want all the raw data during startup, takeoff and landing, but from level flight I really want only the anomalies. I may also want to average the temperature data during level flight rather than record the same number every few milliseconds. This is where smart sensors and gateways can add tremendous value by reducing the volume of data and capturing the really important stuff.
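To make that reduction concrete, here is a minimal sketch in Python of what a gateway might do with one temperature sensor. It is an illustration under stated assumptions, not how any particular gateway product works: the phase labels ("startup", "takeoff", "landing", "cruise"), the window size and the 25-degree anomaly threshold are all hypothetical values chosen for the example, and it assumes each reading already arrives tagged with its flight phase.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Iterable, List, Optional


@dataclass
class Reading:
    t_ms: int      # timestamp in milliseconds
    phase: str     # hypothetical phase label supplied with the reading
    temp_c: float  # temperature in degrees Celsius


@dataclass
class Record:
    kind: str      # "raw", "average" or "anomaly"
    t_ms: int
    temp_c: float


# Assumption: phases during which every raw reading is kept.
RAW_PHASES = {"startup", "takeoff", "landing"}


def reduce_readings(readings: Iterable[Reading],
                    window: int = 4,
                    threshold_c: float = 25.0) -> List[Record]:
    """Keep raw data in critical phases; in level flight keep only
    per-window averages plus readings that deviate from the recent mean."""
    out: List[Record] = []
    buf: List[Reading] = []           # level-flight readings awaiting averaging
    baseline: Optional[float] = None  # mean of the last completed window

    def flush() -> None:
        nonlocal baseline
        if buf:
            baseline = mean(r.temp_c for r in buf)
            out.append(Record("average", buf[-1].t_ms, baseline))
            buf.clear()

    for r in readings:
        if r.phase in RAW_PHASES:
            flush()  # close any open level-flight window first
            out.append(Record("raw", r.t_ms, r.temp_c))
            continue
        # Level flight: compare against the current window, or the last one.
        reference = mean(x.temp_c for x in buf) if buf else baseline
        if reference is not None and abs(r.temp_c - reference) > threshold_c:
            out.append(Record("anomaly", r.t_ms, r.temp_c))
            continue  # anomalies are reported but not averaged in
        buf.append(r)
        if len(buf) == window:
            flush()
    flush()
    return out
```

Fed a stream that is mostly level flight, this forwards one averaged record per window plus any outliers instead of every reading, while startup, takeoff and landing pass through untouched. A real deployment would also need to decide how phases are detected and how the threshold is tuned for each sensor.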
The data collected can then be sent to the cloud rather than to a central office. The cloud has the storage and computing capacity to handle all the data and to make it visible to everyone who needs to see it. The data can then be processed further and moved to a central office for data warehousing or further examination.
Arrow has some great cloud and on-premises tools like IBM Bluemix, Watson IoT, SPSS, Splunk and others that will help customers collect, manage and get more value out of their data. The market is changing and evolving on a continual basis, as is Arrow’s line card. And we have the solutions needed to meet those changing needs.
Editor’s Note: This article was originally posted in August 2016 and has been updated for accuracy and comprehensiveness.