Distributed processing is a fast-growing area of interest due to the exploding popularity of Internet of Things (IoT) and Unmanned Aerial Vehicle (UAV) technologies. IoT is a distributed processing structure by nature, while UAV applications are evolving from single-UAV deployments towards multiple-UAV teams. The demand for processing capabilities is expanding as well. General-purpose processors (e.g. CPUs) can be used for any type of application; however, this flexibility comes at the cost of operational efficiency. Application Specific Integrated Circuits (ASICs) are designed for specific types of application and have high operational efficiency, but they can rarely be used for other applications.
Reconfigurable chips, i.e. Field Programmable Gate Arrays (FPGAs), provide high operational efficiency along with application flexibility, as they can be reprogrammed with the functionality required at a given time. All the aspects listed above are combined in a distributed processing system that is expected to consume a low amount of electrical energy. This dissertation proposes a comprehensive solution to the problem of distributed processing with reconfigurable units. A complete and detailed architecture is provided for each element. The design includes operational algorithms that, together with the architecture, constitute a complete solution to the stated problem.
The design of the units is flexible and allows any number and combination of CPUs, ASICs or FPGAs. Units in the proposed design are autonomous: decisions are taken by individual units rather than by a central node, whose role is marginalized. The decentralized and autonomous approach provides a more flexible and reliable design, which is especially important for IoT and teamed UAV applications. The efficiency of the proposed solutions is defined in terms of electrical energy consumption and operation time span, and is measured through numerous simulations using a dedicated experimentation system.
Distributed processing systems operate at various scales: from local System-on-Chip structures, through local multiprocessor structures and systems with nodes located in the vicinity of each other, to large geographically dispersed structures with nodes on several continents. The author has presented solutions for multiprocessor distributed processing systems [CZE12], public distributed computation systems (geographically spread nodes communicating over the Internet) [CWT12], [CW12] and others. The author has also addressed the energy consumption aspects of reconfigurable systems in [CSG13] and the approach of using a DHT in a distributed computing system with multiple data sources, showing the efficiency of the proposed decentralization approach.
Distributed systems face multiple design problems, including architectural design, hardware limitations and efficiency. On top of that, energy consumption is a common consideration, and it is especially critical for distributed systems based on mobile nodes or those using limited energy sources (battery power, combustion engines).
RECONFIGURABLE SYSTEM ARCHITECTURE–DPRS
The proposed architecture uses decentralized management. Instead of concentrating control in designated nodes (and dividing the roles in the system into management and non-management), management is moved to the nodes themselves to increase flexibility. In this way each node can submit a computational task (application), which it then manages (Fig. 6).
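The idea that any node may submit a task and then act as its manager can be illustrated with a minimal sketch. The `Node` and `Task` classes, the `submit` method and the round-robin distribution of blocks are all hypothetical and chosen only for illustration; they are not the DPRS data structures or assignment algorithms.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Task:
    """Hypothetical task: a named application split into processing blocks."""
    name: str
    blocks: List[str]
    manager_id: int  # the node that submitted the task also manages it


@dataclass
class Node:
    """Hypothetical node that can both process blocks and manage its own tasks."""
    node_id: int
    managed: Dict[str, Task] = field(default_factory=dict)
    assigned: List[str] = field(default_factory=list)

    def submit(self, name: str, blocks: List[str], peers: List["Node"]) -> Task:
        # No central node is involved: the submitting node becomes the manager.
        task = Task(name, list(blocks), self.node_id)
        self.managed[name] = task
        # Assumption: blocks are spread round-robin over this node and its peers.
        workers = [self] + list(peers)
        for i, block in enumerate(task.blocks):
            workers[i % len(workers)].assigned.append(f"{name}:{block}")
        return task


n0, n1, n2 = Node(0), Node(1), Node(2)
n0.submit("app_a", ["b0", "b1", "b2", "b3"], peers=[n1, n2])
n1.submit("app_b", ["b0"], peers=[n0, n2])  # a different node manages app_b
```

In this sketch every node plays both roles, so there is no fixed division into management and non-management nodes.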
The relations between object interfaces are defined in the same way. IIS provides a complete definition of the inter-object communication relations layer and lets designers separate object-to-object relations from the hardware design (Fig. 8).
EXPERIMENTAL STUDY AND ANALYSIS
The choice of AL_RECONFIGURE_FPGA algorithm also impacts the processing time. AL_RECONFIGURE_FPGA_1 required the longest processing time, and the difference compared to the remaining two algorithms grew as the task size increased. Compared to AL_RECONFIGURE_FPGA_1, AL_RECONFIGURE_FPGA_2 is 31%-38% faster and AL_RECONFIGURE_FPGA_3 is 27%-44% faster (Fig. 48).
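A comparison of this kind can be sketched as follows. The sketch assumes that "X% faster" means the processing time is X% lower than that of the baseline algorithm; the text does not state this definition explicitly. The function name `percent_faster` and the timings are hypothetical and are not measured values from the experiments.

```python
def percent_faster(baseline_time: float, candidate_time: float) -> float:
    """Relative reduction of processing time versus a baseline, in percent.

    Assumption: 'faster' is read as a reduction in processing time.
    """
    if baseline_time <= 0:
        raise ValueError("baseline_time must be positive")
    return 100.0 * (baseline_time - candidate_time) / baseline_time


# Hypothetical timings in arbitrary time units, for illustration only.
baseline = 100.0   # e.g. time obtained with the slowest algorithm
candidate = 65.0   # e.g. time obtained with an alternative algorithm
gain = percent_faster(baseline, candidate)
```

Under this reading, a negative result would indicate that the candidate algorithm is slower than the baseline.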
Lower values for deciding reconfiguration, downloading bitstream and programming bitstream mean that the processing unit/node spends less time on non-processing self-management tasks, which improves efficiency. Adding control nodes and obtaining knowledge about tasks are performed at the level of the entire node and are not considered to be related specifically to processing units. The detailed values of the average utilization per operation are shown in Tab. 9 and Fig. 70.
Regarding node operations, Fig. 72 and Fig. 73 present the distribution of the operations for cases a) and b) respectively. As in Fig. 70 and Fig. 71, the operations are averaged using the same procedure. For a), a large idle time can be observed (17%), while in the optimized case b) idle time stays below 1%. The share of processing blocks online is very similar (the differences result from the different moments at which online blocks started processing, since they process until the end of the task processing).
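A distribution of operations of this kind can be computed from per-operation durations, as in the sketch below. The averaging procedure used for the figures is not described here, so the sketch assumes a plain arithmetic mean of per-run shares; the function names, operation labels and durations are hypothetical and are not experimental data.

```python
from typing import Dict, List


def operation_shares(durations: Dict[str, float]) -> Dict[str, float]:
    """Share of total time spent in each operation, in percent, for one run."""
    total = sum(durations.values())
    if total <= 0:
        raise ValueError("total duration must be positive")
    return {op: 100.0 * t / total for op, t in durations.items()}


def average_shares(runs: List[Dict[str, float]]) -> Dict[str, float]:
    """Arithmetic mean of per-run shares (assumed averaging procedure).

    Assumes every run reports the same set of operations.
    """
    if not runs:
        raise ValueError("at least one run is required")
    shares = [operation_shares(run) for run in runs]
    return {op: sum(s[op] for s in shares) / len(shares) for op in shares[0]}


# Hypothetical durations in arbitrary time units, for illustration only.
runs = [
    {"processing": 6.0, "idle": 2.0, "reconfiguration": 2.0},
    {"processing": 8.0, "idle": 1.0, "reconfiguration": 1.0},
]
avg = average_shares(runs)
```

A lower averaged idle share corresponds to the optimized behaviour described for case b).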
The problem of optimization in distributed processing systems with reconfigurable computing elements has been presented and addressed in this work. The hardware architecture has been proposed in detail, providing a comprehensive description of the solution. Logical formats of the elements have also been designed and presented. The physical and logical structures have been designed in a layered and modular fashion, so that the operational algorithms retain the flexibility to replace and use any transmission layer and/or data format.
In addition, the operational algorithms have been designed, described and evaluated experimentally. The proposed solutions have been tested both separately and together (when possible) to clearly show their impact. Another key aspect of the proposed work is the design of the application definition, which allows various types of applications to be described and executed in the distributed DPRS environment. All those elements together, proposed, designed and tested in this work, constitute a complete distributed processing system with the ability to reconfigure, which is also partly decentralized and provides extensive autonomy to its components.
The investigated system included multiple mechanisms, and most of its parts and mechanisms could be optimized using separate algorithms. Lowering the operational cost in one area could lead to a cost increase in another, or make the optimization of other parts more difficult. Therefore, the design of the operating principles and algorithms, as well as of the system structure, is very challenging. Besides these elements, the system parameters and application properties also greatly influence system operation and energy expenditure.
Source: University of Nevada
Author: Grzegorz Chmaj