The development of the Internet of Things (IoT) is closely related to a considerable increase in the number and variety of devices connected to the Internet. Sensors, smartphones and other devices that continuously collect data about our lives, even without our intervention, have become a regular component of our environment. With such connected devices, a broad range of applications has been developed and deployed, including those dealing with massive volumes of data.
In this paper, we introduce a Distributed Data Service (DDS) to collect and process data for IoT environments. One central goal of this DDS is to enable multiple, distinct IoT middleware systems to share common data services from a loosely coupled provider. In this context, we propose a new specification of functionalities for a DDS and the design of the corresponding techniques for collecting, filtering and storing data conveniently and efficiently in this environment. Another contribution is a data aggregation component that supports efficient real-time data querying.
To validate its data collection and querying functionalities and performance, the proposed DDS is evaluated in two case studies based on a simulated smart home system. The first case evaluates data collection and aggregation when the DDS interacts with the UIoT middleware. The second compares the DDS data collection with the same functionality implemented within the Kaa middleware.
BIG DATA MANAGEMENT FOR IOT MIDDLEWARE
Figure 1 shows a general view of the different layers that compose an IoT middleware. In general, the upper layer is a component that has direct interaction with the application layer within the IoT architecture. The application layer receives request messages and sends responses related to services provided by the middleware. The lower layer interacts with the physical layer and exchanges binary information and control commands/responses with the physical devices.
Much effort has been made in the area of IoT data storage and processing, indicating that one central objective in this domain is to efficiently gather data generated by heterogeneous devices, then process and store both original and resulting data in a persistent data store. As mentioned before, to achieve this, IoT middleware systems usually implement a data collector. One such data collector, specifically designed for an IoT environment, is available through a REST API. Data are received in a message array format, and the collector splits these into single message packets and authenticates the sensor. After the sensor has been identified, the packets are put into a message queue to be processed.
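The collector steps described above (split the message array, authenticate the sensor, enqueue each packet) can be sketched as follows. This is a minimal illustration and not the implementation of any particular middleware: the JSON payload layout, the field names `sensor_id` and `token`, and the in-memory `KNOWN_SENSORS` registry are all assumptions introduced here.

```python
import json
import queue

# Hypothetical registry of sensors and their tokens (illustrative only).
KNOWN_SENSORS = {"sensor-1": "token-a", "sensor-2": "token-b"}


def authenticate(message):
    """Return True only if the sensor id is known and its token matches."""
    sensor_id = message.get("sensor_id")
    return sensor_id in KNOWN_SENSORS and KNOWN_SENSORS[sensor_id] == message.get("token")


def collect(payload, message_queue):
    """Split a JSON message array into single packets and enqueue the authenticated ones.

    Returns the number of packets accepted.
    """
    accepted = 0
    for message in json.loads(payload):  # the array arrives as one request body
        if authenticate(message):
            message_queue.put(message)   # one packet per queue entry
            accepted += 1
    return accepted
```

In a real deployment the function body would sit behind the REST endpoint and the queue would be a distributed messaging system rather than `queue.Queue`; the sketch only shows the order of the steps.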
DISTRIBUTED DATA SERVICE FOR IOT MIDDLEWARE
In Figure 3, processes related to consumers, filters and metadata are designed to handle large processing volumes, since they can adapt to different parallel computing levels depending on environmental needs. These processes are executed according to a specific data flow, beginning with the consumer and passing through the filter to reach the metadata creator. The modules for data capture, data filtering and metadata creation can be instantiated in multiple different processes, each one with its respective consumers, filters and metadata.
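The consumer, filter and metadata flow can be illustrated with a small sketch in which each pipeline instance runs the three stages in order and several instances run side by side. The record format (`device,value`), the filtering rule and the metadata fields are hypothetical choices made for this example, and a thread pool stands in for whatever parallel runtime the DDS actually uses.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def consume(raw):
    """Consumer stage: parse a raw 'device,value' record (hypothetical format)."""
    device, value = raw.split(",")
    return {"device": device, "value": float(value)}


def keep(record):
    """Filter stage: hypothetical rule that drops negative readings."""
    return record["value"] >= 0


def add_metadata(record):
    """Metadata stage: attach illustrative descriptive fields to the record."""
    return {**record, "schema": "device-reading", "received_at": time.time()}


def run_pipeline(batch):
    """One pipeline instance: consumer -> filter -> metadata creator."""
    output = []
    for raw in batch:
        record = consume(raw)
        if keep(record):
            output.append(add_metadata(record))
    return output


def run_parallel(batches, workers=2):
    """Run several independent pipeline instances, one per batch."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_pipeline, batches))
```

Raising `workers` corresponds to instantiating more pipeline instances; each instance keeps its own consumer, filter and metadata stage, as in the description above.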
Figure 5 gives an example of data compaction during a small time window, when there is a constant intense data flow, and also in a larger time window for less intense traffic. The idea is that, independent of the time window, the set of data will be grouped and then ordered continuously. In Phase 1, the time window t0 contains nine data groups. In Phase 2, the nine groups of data in the time window t0 are sorted and compacted, while in time window t1, nine data groups are to be compacted, and two new data groups are arriving to be processed.
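The group-then-order idea can be sketched as below. The sketch assumes records of the form `(timestamp, key, value)` and fixed-length windows, and it represents "compaction" simply as collapsing each group into one sorted list; the actual compaction format used by the DDS is not specified here.

```python
from collections import defaultdict


def compact(records, window_seconds):
    """Group (timestamp, key, value) records by time window and key, then sort each group.

    Returns {window_index: {key: [(timestamp, value), ...]}} with windows, keys
    and entries in ascending order.
    """
    windows = defaultdict(lambda: defaultdict(list))
    for timestamp, key, value in records:
        window = int(timestamp // window_seconds)  # index of the window the record falls in
        windows[window][key].append((timestamp, value))

    result = {}
    for window in sorted(windows):
        groups = windows[window]
        result[window] = {key: sorted(groups[key]) for key in sorted(groups)}
    return result
```

The same function works for a short window under intense traffic and a long window under light traffic; only `window_seconds` changes.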
IMPLEMENTATION OF THE DDS
To process the large volume of data collected into a topic, the Kafka consumers have to be properly configured. In general, a Kafka consumer reads data partitions (P0, P1, P2, … PN) as the data arrive, as shown in Figure 8A. However, even when partitions are read in parallel, the degree of Kafka parallelism is not enough to manage the large volume of data produced by a large network of devices/sensors under the control of an IoT middleware. In order to improve parallelism, a modification was introduced in how the partitions are read by the Kafka consumer, by defining a consumer for each existing partition, as shown in Figure 8B. This modification increases the degree of parallelism in partition reading, since instead of having only one consumer reading multiple partitions, there is now one consumer dedicated to reading only one partition.
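The one-consumer-per-partition arrangement can be illustrated without a Kafka client. In the sketch below, partitions are plain in-memory lists and each "consumer" is a thread bound to exactly one partition; the function names are hypothetical and the code only shows the structure, not Kafka's API or its throughput.

```python
import threading


def read_partition(partition_id, messages, results, lock):
    """Dedicated consumer: reads exactly one partition, preserving its order."""
    consumed = [f"P{partition_id}:{message}" for message in messages]
    with lock:  # protect the shared results dictionary
        results[partition_id] = consumed


def consume_one_per_partition(partitions):
    """Start one consumer thread per partition and wait for all of them."""
    results = {}
    lock = threading.Lock()
    threads = [
        threading.Thread(target=read_partition, args=(pid, msgs, results, lock))
        for pid, msgs in partitions.items()
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Ordering is preserved within each partition, while no ordering is implied across partitions, which is the trade-off that makes the dedicated-consumer layout parallelizable.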
CASE STUDY: SMART HOME SYSTEM SIMULATION
In order to support the huge data volume, the distributed data service was designed to run on a computational cluster based on a messaging system. Thus, the DDS is cluster based and uses publish-subscribe messaging to handle read and write operations. One of the cluster nodes is selected as a master server node for the data collector. The other cluster nodes are slave servers that receive messages from the master server for storing and processing purposes. It is important to note that a physical cluster, as shown in Figure 9, was implemented. This cluster internally supports three virtual clusters: the Kafka, the Storm and the Cassandra clusters.
It is worth highlighting that the cluster environment was implemented over four virtual machines, as shown in Figure 10. This cluster internally supports two virtual clusters: the Kaa and the MongoDB clusters.
Figure 11 presents the results for the synchronous producer, showing that performance is proportional to the number of producers being executed. In addition, the number of homes that can be supported in the synchronous scenario is presented. For example, for two producers, the number of messages per second is 20,000 (2934 homes).
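The relationship between message rate and supported homes in this example can be made explicit with a short calculation. The per-home rate below is only the ratio implied by the two reported figures (20,000 messages per second and 2934 homes); it assumes every home contributes equally and is not a parameter stated in the text.

```python
def messages_per_home(messages_per_second, homes):
    """Average message rate per home implied by a throughput and a home count."""
    return messages_per_second / homes


def homes_supported(messages_per_second, per_home_rate):
    """Number of homes a given throughput can serve at a fixed per-home rate."""
    return round(messages_per_second / per_home_rate)


# Figures from the two-producer synchronous example.
rate = messages_per_home(20_000, 2934)
print(round(rate, 2))  # 6.82 messages per second per home
```

Under the same equal-contribution assumption, `homes_supported` inverts the calculation and recovers the home count from a measured throughput.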
Unlike the UIoT-DDS simulation, the Kaa simulation involved only synchronous messages. This is because data collection in a Kaa platform can only store the received messages in the database and then reply with a confirmation or acknowledgment back to the endpoint. Figure 13 displays the relationship between the message receiving speed and the number of homes supported by different numbers of Kaa nodes.
In this section, we analyze first the data collection and data aggregation results of the UIoT-DDS study, second the Kaa collector results and, finally, the comparison of data collection between the UIoT-DDS and the Kaa collector in terms of ingesting a huge data volume. This comparison shows the better performance of the DDS when facing a huge volume of data coming from different sources. It is important to highlight that, for the sake of fairness, this comparison is done in terms of data ingestion when UIoT-DDS operates synchronously, since the Kaa collector only operates in synchronous mode.
CONCLUSIONS AND FUTURE WORKS
The next generation of the Internet is moving in an important developmental direction, as the IoT increasingly captures the attention of industry and academia. Selecting and applying database middleware technology in a reasonable way is key to solving the problem of IoT data management for a massive volume of data generated in real time. In this context, our results show that the designed DDS supports data management for IoT middleware and performs well in collecting and retrieving data from an IoT middleware.
The choice of components and their articulation for collaborative data treatment are key factors for high-velocity data collection and analysis. Besides this, the proposed DDS is able to process a variety of data coming from different IoT environments through a specific communications interface and metadata creation modules. Finally, the proposed DDS was shown to have better performance compared with a typical data collector, the Kaa middleware.
Source: University of Brasilia
Authors: Ruben Cruz Huacarpuma | Rafael Timoteo De Sousa Junior | Maristela Terto De Holanda | Robson De Oliveira Albuquerque | Luis Javier Garcia Villalba | Tai-hoon Kim