The next big thing in information and communication technology is the ‘Internet of Things (IoT)’. IoT will change the way devices traditionally considered unintelligent communicate over the internet, and in the process enrich our lives. For the end user, everything appears to happen automatically, but in reality the onus is on IT organizations to coordinate all actions and ensure the smooth operation of these devices through web services and databases. At any given time, the IoT ecosystem will have multiple devices, systems and tools generating significant volumes of data. Storing, processing and retrieving such large volumes of data is a crucial part of IoT. This blog analyzes the existing types of databases, the extent of their usability in the digital world and the emerging need for a Database of Things (DoT), a database for today and beyond.
What are the different kinds of databases available for data storage, processing and retrieval in the industry?
A traditional database is a relational database (RDBMS) with its data stored on hard disk. In recent times, alternative database strategies such as in-memory databases have been gaining traction. A DoT involves huge volumes of data, which a traditional RDBMS cannot easily be scaled to manage. Just as importantly, a DoT needs to be fast, nimble and responsive to real-time requirements, and current in-memory databases are well suited to that part of the problem. However, although the price of memory has come down, keeping such a high volume of data in memory at all times is not a viable proposition. Storing and retrieving data from big data databases is therefore of paramount importance.
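The trade-off between in-memory speed and large-volume storage can be illustrated with a minimal sketch. It assumes a small in-memory "hot" tier backed by a larger "cold" store; the class name `TieredStore`, its methods and the table layout are hypothetical, and SQLite merely stands in for whatever large-volume store is used in practice.

```python
import sqlite3
from collections import OrderedDict


class TieredStore:
    """Hypothetical two-tier store: recent readings in memory, older ones spilled out."""

    def __init__(self, hot_capacity, db_path=":memory:"):
        self.hot_capacity = hot_capacity
        # (device_id, ts) -> value, ordered from oldest to newest insertion
        self.hot = OrderedDict()
        # SQLite is only a stand-in for the large-volume (cold) store
        self.cold = sqlite3.connect(db_path)
        self.cold.execute(
            "CREATE TABLE IF NOT EXISTS readings ("
            " device_id TEXT, ts INTEGER, value REAL,"
            " PRIMARY KEY (device_id, ts))"
        )

    def put(self, device_id, ts, value):
        key = (device_id, ts)
        self.hot[key] = value
        self.hot.move_to_end(key)
        # Spill the oldest-inserted entries once the hot tier is over capacity
        while len(self.hot) > self.hot_capacity:
            (old_dev, old_ts), old_val = self.hot.popitem(last=False)
            with self.cold:
                self.cold.execute(
                    "INSERT OR REPLACE INTO readings VALUES (?, ?, ?)",
                    (old_dev, old_ts, old_val),
                )

    def get(self, device_id, ts):
        key = (device_id, ts)
        if key in self.hot:  # fast path: served from memory
            return self.hot[key]
        row = self.cold.execute(
            "SELECT value FROM readings WHERE device_id = ? AND ts = ?",
            (device_id, ts),
        ).fetchone()
        return row[0] if row else None
```

Reads of recent data never touch the cold store, while older data remains retrievable at a higher latency. A real deployment would also need eviction by time rather than insertion order, batching of writes and durability guarantees, none of which this sketch attempts.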
Big data caters to the 4Vs of data: Volume, Variety, Velocity and Veracity. A big data database supports multiple data formats, facilitates faster storage of data, meets scalability requirements and ensures data consistency. Given the characteristics of the data generated in IoT, big data databases are potentially suitable options to consider. However, while big data platforms like Hadoop and NoSQL databases meet analytical requirements such as trend analysis quite well, they do not measure up to the demands and rigor of online data processing. One of the main criticisms of big data platforms is their lack of essential RDBMS features. This raises a question: can the standard big data platforms handle IoT requirements, or do we have to look for alternative database systems?
Let us further explore online data processing as it relates to IoT and identify the areas where we can improve or blend RDBMS and big data databases to design a good DoT architecture.
Let’s understand what is expected of a DoT in the world of IoT. IoT needs real-time data processing capabilities from databases in order to make timely decisions. Big data stores like Hadoop are not of much help here. Also, unlike the workloads on these analytical platforms, real-time data processing in IoT-related operations involves financial and other transactions that need transactional integrity. Hence, the databases catering to DoT must have robust transactional integrity features equal to those of a traditional RDBMS. Another aspect that goes hand in hand with transactional integrity is high availability, which cannot be ensured with the mere redundancy features that most big data databases rely on. High availability and agile disaster recovery have to be ensured on par with traditional databases. A DoT should also be able to handle time-stamped (temporal) data, which facilitates storing data as a time series. In mature IoT applications it is necessary to store the data of one device in relation to the data of another device, and this can be achieved by placing the data in a geometric space (spatial). Together, these two features (spatial and temporal) enable spatiotemporal scalability in a DoT.
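As a rough illustration of these three expectations (transactional integrity, temporal data and spatial data), here is a minimal sketch using Python's built-in `sqlite3` module. The table layout and the function names `open_store`, `insert_batch` and `query_window` are assumptions made for this example, and a plain latitude/longitude bounding box stands in for real spatial indexing.

```python
import sqlite3


def open_store():
    """Create an in-memory table of time-stamped, geo-located readings (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE readings ("
        " device_id TEXT NOT NULL,"
        " ts INTEGER NOT NULL,"   # temporal: seconds since some epoch
        " lat REAL NOT NULL,"     # spatial: position of the device
        " lon REAL NOT NULL,"
        " value REAL NOT NULL,"
        " PRIMARY KEY (device_id, ts))"
    )
    conn.execute("CREATE INDEX idx_readings_ts ON readings (ts)")
    return conn


def insert_batch(conn, rows):
    """Insert all rows atomically: either every row is stored or none is."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.executemany("INSERT INTO readings VALUES (?, ?, ?, ?, ?)", rows)
        return True
    except sqlite3.IntegrityError:
        return False


def query_window(conn, t_start, t_end, lat_min, lat_max, lon_min, lon_max):
    """Return readings inside a time range and a bounding box, ordered by time."""
    cur = conn.execute(
        "SELECT device_id, ts, value FROM readings"
        " WHERE ts BETWEEN ? AND ?"
        " AND lat BETWEEN ? AND ?"
        " AND lon BETWEEN ? AND ?"
        " ORDER BY ts, device_id",
        (t_start, t_end, lat_min, lat_max, lon_min, lon_max),
    )
    return cur.fetchall()
```

A short usage example:

```python
conn = open_store()
insert_batch(conn, [
    ("d1", 100, 12.90, 77.50, 21.0),
    ("d2", 105, 12.95, 77.60, 22.5),
])
# The duplicate (d1, 100) key makes the whole batch roll back, including the d4 row.
insert_batch(conn, [
    ("d4", 110, 12.90, 77.50, 1.0),
    ("d1", 100, 12.90, 77.50, 2.0),
])
nearby = query_window(conn, 90, 200, 12.0, 13.0, 77.0, 78.0)
```

A single-node embedded database obviously does not address the scale or availability requirements discussed above; the sketch only shows what the query and integrity semantics look like.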
As database professionals, our challenge is to identify the data storage requirements and define the DoT architectures needed to cater to future IoT-related requirements. As explained above, a DoT architecture should combine traditional RDBMS features, current NoSQL/big data features and recent in-memory features. While it is too early to decide which databases are the best DoTs, it is worthwhile to research the available options as part of our IoT implementation plans. Oracle, MS SQL Server and IBM DB2 are pioneers of scalability, spatial, temporal and in-memory technology and also support big data in various ways. However, as single offerings, they require clients to pay a higher price to get enough IoT scalability out of them. Most NoSQL databases, including MongoDB and TempoDB, claim that they can handle time-stamped data, scalability, rich queries and index support, but they have a long way to go in terms of transactional integrity, which would require enhancements to most of these databases. ScaleDB is a product that offers transactional integrity, analytical processing and high availability, but it has not implemented spatiotemporal scalability yet. Spatiotemporal features are still more of an academic concept than an industry solution, but recently SpaceCurve has worked on a full-fledged DoT database with such capabilities. However, few reviews of these products are available, as they are not open source.
Summing up, database professionals have to think outside the box when catering to IoT applications. DoT databases should be scalable, fast and reliable. Current open source big data and NoSQL databases do not cater well to these requirements, and traditional databases have their own issues in meeting them. A new breed of databases, the DoTs, is emerging in the market with scalability, performance and transactional integrity features. Database professionals will therefore have to understand the IoT requirements, then evaluate and use the DoT databases that fit those requirements, while also helping the community improve these databases, as this is an emerging area in the database technology space.