Wireless has become one of the most pervasive core technologies in the modern world. Demand for faster data rates, improved spectrum efficiency, higher system access capacity, seamless protocol integration, improved security, and robustness under varying channel environments has led to the resurgence of programmable software defined radio (SDR) as an alternative to traditional ASIC-based radios. Future SDR implementations will need to support multiple standards on platforms with multi-Gb/s connectivity, parallel processing, and spectrum sensing capabilities. This dissertation implements key technologies for addressing these issues, namely the development of a cost-effective, multi-mode reconfigurable SDR and a framework for mapping sequential wireless communication algorithms to the parallel domain.
In the first project, a novel software defined radio platform was successfully developed using commercial off-the-shelf components. This hybrid platform consists of a USRP N210 device serving as the RF front end, an NVIDIA Quadro 600 GPU functioning as the parallel computing node, and a commodity PC with a PCIe backplane as the high-speed interconnect. The architectural concepts were validated through real-world applications built on the GNU Radio software. A performance analysis and the benefits of the proposed architecture over other custom solutions are also presented. The second project demonstrates an important application of GPU technology to SDR systems, namely the polyphase channelizer.
The proposed channelizer architecture exploits block-level and thread-level processing in the GPU and delivers high-throughput, arbitrary re-sampling of multiple channels. These characteristics make it attractive for a variety of communication receiver algorithms. Finally, the third project deals with the critical high-data-rate data flow between radio peripheral devices and parallel processing resources. Software routines for this project were written in C++ and are based on the UHD code from Ettus Research. In addition to enabling the transfer of data, the software is also responsible for configuring the USRP devices. Analysis of performance metrics and data-flow bottlenecks shows that the proposed architecture is capable of meeting the demanding requirements of current wireless standards.
SOFTWARE DEFINED RADIO BACKGROUND
The transistor radio is a prime example of an evolutionary analog device that was revolutionary in providing the mass market with a small, portable, hand-held device for receiving transmitted AM and FM radio waves. With billions produced, transistor radios are the most popular wireless communication device to date.
In a direct conversion receiver architecture, also known as the homodyne or zero-IF receiver, the received RF signal is downconverted directly to baseband using a single mixing stage. The received signal is first bandpass filtered, then amplified by the LNA, and finally mixed using an I/Q demodulator. I/Q demodulators are required when detecting phase-modulated or frequency-modulated signals.
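The single mixing stage described above can be sketched numerically. The following is a minimal illustration, not the receiver used in this work: the function name `iq_demodulate`, the sample rate, the carrier frequency, and the test phase are all assumed values chosen for the example, and a plain average over whole carrier cycles stands in for the low-pass filter that follows the mixers.

```python
import math

def iq_demodulate(rf, fc, fs):
    """Mix a real RF block to baseband and average it (a crude low-pass filter).

    rf: real-valued samples, fc: carrier frequency in Hz, fs: sample rate in Hz.
    Returns one complex baseband value I + jQ for the whole block.
    """
    i_sum = 0.0
    q_sum = 0.0
    for k, x in enumerate(rf):
        theta = 2.0 * math.pi * fc * k / fs
        i_sum += x * math.cos(theta)    # in-phase mixer
        q_sum += -x * math.sin(theta)   # quadrature mixer (LO shifted by 90 degrees)
    n = len(rf)
    # Mixing halves the amplitude of the baseband term, so scale by 2.
    return complex(2.0 * i_sum / n, 2.0 * q_sum / n)

# Assumed example parameters: 1 MHz sample rate, 100 kHz carrier.
fs = 1_000_000.0
fc = 100_000.0
phase = math.pi / 3          # phase carried by the RF signal
n_samples = 1000             # exactly 100 carrier cycles

rf = [math.cos(2.0 * math.pi * fc * k / fs + phase) for k in range(n_samples)]
bb = iq_demodulate(rf, fc, fs)
recovered_phase = math.atan2(bb.imag, bb.real)
```

Because both the I and Q branches are kept, the phase of the carrier can be read back from the baseband value, which a single real mixer could not provide.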
SDR SYSTEM ARCHITECTURE AND PROTOTYPE
The BEE2 has three primary components: processing elements, memory elements, and interconnects. The only processing elements are the Xilinx Virtex-2 Pro 70 FPGAs. Each FPGA embeds a PowerPC 405 core, thereby maximizing throughput and minimizing latency between the reconfigurable logic and the processor.
The massive discrepancy in floating-point capability between the CPU and the GPU arises because the GPU is specialized for smaller-size, compute-intensive, massively parallel computation and therefore devotes more transistors to processing than to data caching and flow control. This is illustrated schematically in Figure 22.
CUDA BASED PARALLEL PROCESSING FOR COMMUNICATIONS
The filtering process can be performed by multiply-add operations between the data samples and the filter coefficients, which are stored in two registers. A conceptual digital filter is shown in the figure above.
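The multiply-add operation between data samples and coefficients can be sketched as a direct-form FIR filter. This is a minimal illustration under assumed names (`fir_filter`, `delay_line`); the two Python lists stand in for the two registers mentioned above and are not taken from the original implementation.

```python
def fir_filter(samples, coeffs):
    """Direct-form FIR filter: y[n] = sum over k of coeffs[k] * samples[n - k]."""
    delay_line = [0.0] * len(coeffs)   # stands in for the data-sample register
    out = []
    for x in samples:
        # Shift the newest sample in and drop the oldest one.
        delay_line = [x] + delay_line[:-1]
        acc = 0.0
        for h, d in zip(coeffs, delay_line):
            acc += h * d               # one multiply-add per filter tap
        out.append(acc)
    return out

# Two-tap moving average applied to a short ramp.
y = fir_filter([1.0, 2.0, 3.0, 4.0], [0.5, 0.5])
# y == [0.5, 1.5, 2.5, 3.5]
```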
A block diagram of the filter bank implementation for the two special cases, with an output buffer to change the final sample rate, is shown in Figure 44 and Figure 45.
Once the data has been re-arranged to be compatible with the row-wise operation of the PFB, the filtering kernel can be called. All the intermediate data structures between the shifting kernel and the filtering kernel remain resident in device memory and are not mapped to host memory. This eliminates the unnecessary overhead associated with host-to-device data transfers. Because the filtering kernel performs many reads from and writes to global memory, the data is first moved to shared memory for more efficient access.
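The idea of re-arranging the input into rows and then filtering row-wise can be illustrated on the CPU with a single-channel polyphase decimator. This is a sketch under stated assumptions, not the CUDA kernels of this work: the function names (`direct_decimate`, `rearrange`, `polyphase_decimate`), the test signal, and the prototype filter are hypothetical, and the FFT stage of a full channelizer is omitted. It shows only that row-wise filtering of the re-arranged data gives the same output as filtering at the full rate and then discarding samples.

```python
def direct_decimate(x, h, m):
    """Reference: filter at the full input rate, then keep every m-th output."""
    out = []
    for n in range(0, len(x), m):
        acc = 0.0
        for k, hk in enumerate(h):
            if n - k >= 0:
                acc += hk * x[n - k]
        out.append(acc)
    return out

def rearrange(x, m):
    """Shifting step: row p holds x[n*m - p], zero where the index is out of range."""
    n_out = (len(x) + m - 1) // m
    rows = []
    for p in range(m):
        row = []
        for n in range(n_out):
            idx = n * m - p
            row.append(x[idx] if 0 <= idx < len(x) else 0.0)
        rows.append(row)
    return rows

def polyphase_decimate(x, h, m):
    """Filtering step: each row is filtered by its own sub-filter h[p::m]."""
    rows = rearrange(x, m)
    branches = [h[p::m] for p in range(m)]   # polyphase components of h
    n_out = len(rows[0])
    out = [0.0] * n_out
    for p in range(m):                       # rows are independent of each other
        for n in range(n_out):
            for j, c in enumerate(branches[p]):
                if n - j >= 0:
                    out[n] += c * rows[p][n - j]
    return out

# Assumed test data: a short periodic signal and an 8-tap prototype filter.
x = [float(n % 7) for n in range(24)]
h = [1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0, 0.0]
m = 4

reference = direct_decimate(x, h, m)
polyphase = polyphase_decimate(x, h, m)
```

Since the rows do not depend on one another, the outer loop over `p` is the part that a parallel implementation could plausibly distribute across GPU threads or blocks.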
FRAMEWORK FOR DATA TRANSFER BETWEEN USRP AND GPU
UHD functions can be split into two parts: (1) functions related to control and configuration and (2) APIs related to streaming data samples to and from the devices. A top-level architecture of the UHD driver is given below (see Figure 48).
The UHD “transmit” and “receive” commands are blocking calls, meaning that it is not possible to set up the USRP2 and transmit at the same time. When a transmission begins, the USRP2's setting register that controls the transmission is set, and the Ethernet black box reroutes the data to the VITA deframer, which ensures that the data is correctly transmitted.
CONCLUSION AND FUTURE WORK
As a first step, this work introduced a novel reference architecture for a software defined radio platform. The goal was a re-programmable architecture based on commodity commercial components that can be tailored to the broadest range of wireless communication applications, including single narrowband channels, multiple-narrowband-channel repeaters or base stations, single cellular telephone signaling or cellular telephone base stations, multi-format, multi-band home wireless access points, and emergency wireless access points for restoring wireless communication infrastructure.
The architecture employs one or more USRP devices to provide RF signal transmission and reception, a high-speed interconnect sufficient to handle the required data distribution and connection, GPU parallel processing accelerators, and multi-threaded, multi-core PC back ends. A prototype version of this SDR was then built and tested so that its critical aspects and functionality could be demonstrated, assessed, and verified. This work also demonstrated the use of GPU acceleration for communication signal processing algorithms. The algorithm selected for demonstration was a wide-bandwidth polyphase filter bank channelizer.
Source: Western Michigan University
Author: Lalith Narasimhan