Gabriel  Falcao
    • In the multicore era the potential to increase the processing speed of compute-intensive applications is high. This i...
    The recent Digital Video Satellite Broadcast Standard (DVB-S2) has adopted a powerful FEC scheme based on the serial concatenation of Bose-Chaudhuri-Hocquenghem (BCH) and low-density parity-check (LDPC) codes. The high-speed requirements, long block lengths and adaptive encoding defined in the DVB-S2 standard present complex challenges in the design of an efficient codec hardware architecture. In this paper, synthesizable, high throughput, scalable ...
    The design cycle for complex special-purpose computing systems is extremely costly and time-consuming. It involves a multiparametric design space exploration for optimization, followed by design verification. Designers of special purpose VLSI implementations often need to explore parameters, such as optimal bitwidth and data representation, through time-consuming Monte Carlo simulations. A prominent example of this simulation-based exploration process is the design of decoders for error correcting systems, such as the Low-Density Parity-Check (LDPC) codes adopted by modern communication standards, which involves thousands of Monte Carlo runs for each design point. Currently, high-performance computing offers a wide set of acceleration options that range from multicore CPUs to Graphics Processing Units (GPUs) and Field Programmable Gate Arrays (FPGAs). The exploitation of diverse target architectures is typically associated with developing multiple code versions, often using distinct programming paradigms. In this context, we evaluate the concept of retargeting a single OpenCL program to multiple platforms, thereby significantly reducing design time. A single OpenCL-based parallel kernel is used without modifications or code tuning on multicore CPUs, GPUs, and FPGAs. We use SOpenCL (Silicon to OpenCL), a tool that automatically converts OpenCL kernels to RTL in order to introduce FPGAs as a potential platform to efficiently execute simulations coded in OpenCL. We use LDPC decoding simulations as a case study. Experimental results were obtained by testing a variety of regular and irregular LDPC codes that range from short/medium (e.g., 8,000 bit) to long length (e.g., 64,800 bit) DVB-S2 codes. We observe that, depending on the design parameters to be simulated, on the dimension and phase of the design, the GPU or FPGA may suit different purposes more conveniently, thus providing different acceleration factors over conventional multicore CPUs.
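    As a rough illustration of the simulation-based exploration described above, the following Python sketch sweeps the LLR bitwidth in a Monte Carlo bit error rate estimate. It is not the authors' OpenCL kernel: a 3-fold repetition code over a BPSK/AWGN channel stands in for the LDPC decoder, and the names `quantize` and `estimate_ber`, the quantization step of 0.5 and the SNR definition (1/sigma^2) are all assumptions made for the example.

```python
import random


def quantize(llr, bitwidth, step=0.5):
    """Round an LLR to a signed fixed-point level and saturate it.

    `step` is an assumed quantization step; bitwidth must be >= 2.
    """
    max_level = 2 ** (bitwidth - 1) - 1
    level = round(llr / step)
    return max(-max_level, min(max_level, level))


def estimate_ber(bitwidth, snr_db, n_trials, seed=0, repeats=3):
    """Monte Carlo BER of a repetition code decoded from quantized LLRs.

    The repetition code is only a stand-in for an LDPC decoder.
    """
    rng = random.Random(seed)
    sigma = 10 ** (-snr_db / 20)  # assumed SNR definition: 1 / sigma^2
    errors = 0
    for _ in range(n_trials):
        bit = rng.getrandbits(1)
        symbol = 1.0 - 2.0 * bit  # BPSK mapping: 0 -> +1, 1 -> -1
        total = 0
        for _ in range(repeats):
            received = symbol + rng.gauss(0.0, sigma)
            total += quantize(2.0 * received / sigma ** 2, bitwidth)
        decided = 1 if total < 0 else 0
        if decided != bit:
            errors += 1
    return errors / n_trials


if __name__ == "__main__":
    # One "design point" per bitwidth, all at the same SNR.
    for bits in (2, 3, 4, 6, 8):
        print(bits, estimate_ber(bits, snr_db=0.0, n_trials=20000, seed=1))
```

    In an actual exploration each design point would run a full LDPC decoder over thousands of Monte Carlo runs, which is the workload the abstract proposes to retarget to multicore CPUs, GPUs and FPGAs.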
    A new strategy is proposed for implementing computationally intensive high-throughput decoders based on the long length irregular LDPC codes adopted in the DVB-S2 standard. It targets manycore graphics processing unit (GPU) architectures, performing parallel multi-threaded decoding of multiple codewords with reduced accesses to global memory. This novel approach is flexible and scalable, and achieves throughputs above the 90 Mbit/s required by the DVB-S2 standard, while also improving error-correcting performance, such as BER and error floors, relative to conventional VLSI-based decoders.
    A novel wide-pipeline LDPC decoder approach for the WiMAX standard (802.16e) is proposed for execution on FPGA, using a high-level synthesis tool that generates a wide-pipeline architecture and reduces the development effort and design validation time. We develop optimized OpenCL-based kernels and analyze the integration of distinct configurations of SIMD and compute units to increase the level of parallelism. The decoding throughput surpasses the minimum requirement of 75 Mbit/s, a key figure of merit that places our design alongside VLSI-based approaches. Furthermore, extra precision is deployed with 8-bit fixed-point arithmetic, delivering superior bit error rate performance and lower error floor regions.
    ... Gabriel Falcão Instituto de Telecomunicações Electrical & Comp. Eng. Dep. University of Coimbra, Portugal gff@co.it.pt Vitor Silva Instituto de Telecomunicações Electrical & Comp. Eng. Dep. University of Coimbra, Portugal vitor@co.it.pt ...
    Recently, Low-Density Parity-Check (LDPC) decoding solutions have been proposed for a wide range of architectures, from dedicated hardware to fully programmable ones (e.g. Cell/B.E. and graphics processing units). In this paper we propose efficient embedded programmable multicore architectures for achieving real-time LDPC decoding. The proposed multicore architectures exploit data parallelism by decoding multiple codewords in parallel on the provided cores, with enough local memory capacity to store all data corresponding to the Tanner graph. Therefore, with this distributed memory and local computing approach, only a single shared bus is required to communicate the codewords. The proposed class of architectures can be prototyped on field programmable gate arrays or implemented on application specific integrated circuits, and it is validated using the popular Cell processor, which closely resembles the proposed architecture. Finally, we review the related art of dedicated and programmable LDPC decoders and discuss their advantages and disadvantages relative to the proposed solution.
    The recent Digital Video Satellite Broadcast Standard (DVB-S2) [1] [2] has adopted a powerful FEC scheme based on the serial concatenation of BCH and Low Density Parity Check (LDPC) codes. This new FEC structure, combined with the adoption of high order ...
    Abstract. Low-Density Parity-Check (LDPC) codes are among the best error correcting codes known and have been recently adopted by data transmission standards, such as the second generation for Satellite Digital Video Broadcasting (DVB-S2) and WiMAX. LDPC codes ...
    Abstract In this paper we propose to show how signal processing algorithm designers can understand the nuances of multicore computing engines in order to conveniently exploit these powerful devices. This is illustrated by presenting source and channel coding, two ...
    Low-Density Parity-Check (LDPC) codes are among the best error correcting codes known and have been adopted by data transmission standards, such as DVB-S2 or WiMax. They are based on binary sparse parity check matrices and are usually represented by Tanner graphs. LDPC decoders require very intensive message-passing algorithms, also known as belief propagation. This paper proposes a very compact stream-based data structure to represent such a bipartite Tanner graph, which supports both regular and irregular codes. This compact data structure not only reduces the memory required to represent the graph but also puts it in an appropriate format to gather data into streams. This representation also allows the irregular processing behavior of the Sum Product Algorithm (SPA) used in LDPC decoding to be mapped into the stream-based computing model. Stream programs were developed for LDPC decoding and the results show significant speedups obtained using either general purpose processors or graphics processing units. The simultaneous decoding of several codewords was performed using the SIMD capabilities of modern stream-based architectures available on recent processing units.
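    One way to picture a compact, stream-friendly representation of a Tanner graph is a pair of flat index arrays, in the spirit of compressed sparse row storage. The Python sketch below is an assumption made for illustration, not the data structure defined in the paper; `build_tanner_graph` and its field names are hypothetical. Because each node stores only an offset into a shared array, rows and columns of different weight (irregular codes) need no padding.

```python
def build_tanner_graph(H):
    """Flatten a binary parity-check matrix into offset/index arrays.

    H is a list of rows (lists of 0/1). Edges are numbered in
    check-node order; the bit-node view lists edge ids per bit.
    """
    n_bits = len(H[0])

    # Check-node view: cn_bits[cn_ptr[i]:cn_ptr[i + 1]] are the bits
    # that take part in parity check i.
    cn_ptr, cn_bits = [0], []
    for row in H:
        for bit, entry in enumerate(row):
            if entry:
                cn_bits.append(bit)
        cn_ptr.append(len(cn_bits))

    # Bit-node view: bn_edges[bn_ptr[j]:bn_ptr[j + 1]] are the edge
    # ids (positions in cn_bits) attached to bit j.
    buckets = [[] for _ in range(n_bits)]
    for edge, bit in enumerate(cn_bits):
        buckets[bit].append(edge)
    bn_ptr, bn_edges = [0], []
    for edges in buckets:
        bn_edges.extend(edges)
        bn_ptr.append(len(bn_edges))

    return {
        "cn_ptr": cn_ptr,
        "cn_bits": cn_bits,
        "bn_ptr": bn_ptr,
        "bn_edges": bn_edges,
    }


if __name__ == "__main__":
    # Small irregular example: row weights 3 and 2.
    H = [[1, 1, 0, 1],
         [0, 1, 1, 0]]
    for name, array in build_tanner_graph(H).items():
        print(name, array)
```

    Messages can then live in a single array indexed by edge id, so both the check-node and bit-node passes read contiguous slices or gather through one index array.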
    Low-Density Parity-Check (LDPC) codes are powerful error correcting codes adopted by recent communication standards. LDPC decoders are based on belief propagation algorithms, which make use of a Tanner graph and very intensive message-passing computation, and usually require hardware-based dedicated solutions. With the exponential increase of the computational power of commodity graphics processing units (GPUs), new opportunities have arisen to develop general purpose processing on GPUs. This paper proposes the use of GPUs for implementing flexible and programmable LDPC decoders. A new stream-based approach is proposed, based on compact data structures to represent the Tanner graph. It is shown that such a challenging application for stream-based computing, because of irregular memory access patterns, memory bandwidth and recursive flow control constraints, can be efficiently implemented on GPUs. The proposal was experimentally evaluated by programming LDPC decoders on GPUs using the Caravela platform, a generic interface tool for managing the kernels' execution regardless of the GPU manufacturer and operating system. Moreover, to put the obtained results in perspective, we have also implemented LDPC decoders on general purpose processors with Streaming Single Instruction Multiple Data (SIMD) Extensions. Experimental results show that the solution proposed here efficiently decodes several codewords simultaneously, reducing the processing time by one order of magnitude.
    Low-Density Parity-Check (LDPC) codes are among the best error correcting codes known and have been recently adopted by data transmission standards, such as the second generation for Satellite Digital Video Broadcasting (DVB-S2) and WiMAX. LDPC codes are based on sparse parity-check matrices and use message-passing algorithms, also known as belief propagation, which demand very intensive computation. For that reason, dedicated VLSI architectures have been proposed in the past few years to achieve real-time processing. This paper proposes a new flexible and programmable approach for LDPC decoding on a heterogeneous multicore Cell Broadband Engine (Cell/B.E.) architecture. Very compact data structures were developed to represent the bipartite graph for both regular and irregular LDPC codes. They are used to map the irregular behavior of the Sum-Product Algorithm (SPA) used in LDPC decoding into a computing model that expresses parallelism and locality of data by decoupling computation and memory accesses. This model can be used in general for exploiting capabilities of modern multicore architectures. For the Cell/B.E., in particular, stream-based programs were developed for simultaneous multicodeword LDPC decoding by using SIMD features and a low-latency DMA-based data communication mechanism between processors. Experimental results show significant throughputs that compare well with state-of-the-art VLSI-based solutions.
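    The message-passing step itself can be sketched with a textbook log-domain Sum-Product decoder. This is a plain, sequential Python illustration under stated assumptions, not the stream-based Cell/B.E. implementation: `spa_decode` and `syndrome_ok` are hypothetical names, messages are log-likelihood ratios with positive values favoring bit 0, and the check-node update uses the standard tanh rule.

```python
import math


def syndrome_ok(cn_nbrs, hard):
    """True when every parity check sums to zero modulo 2."""
    return all(sum(hard[j] for j in nbrs) % 2 == 0 for nbrs in cn_nbrs)


def spa_decode(H, channel_llr, max_iter=20):
    """Log-domain sum-product decoding; returns (hard_bits, converged)."""
    n_bits = len(H[0])
    cn_nbrs = [[j for j in range(n_bits) if row[j]] for row in H]
    # Check-to-bit messages, one per edge (i, j) of the Tanner graph.
    c2b = {(i, j): 0.0 for i, nbrs in enumerate(cn_nbrs) for j in nbrs}
    limit = 1.0 - 1e-12  # keeps atanh away from +/-1

    for _ in range(max_iter):
        # Bit-node step: posterior LLR = channel LLR + incoming messages.
        total = list(channel_llr)
        for (i, j), msg in c2b.items():
            total[j] += msg
        hard = [1 if t < 0 else 0 for t in total]
        if syndrome_ok(cn_nbrs, hard):
            return hard, True
        # Check-node step (tanh rule) on extrinsic bit-to-check messages.
        new_c2b = {}
        for i, nbrs in enumerate(cn_nbrs):
            for j in nbrs:
                prod = 1.0
                for k in nbrs:
                    if k != j:
                        prod *= math.tanh((total[k] - c2b[(i, k)]) / 2.0)
                prod = max(-limit, min(limit, prod))
                new_c2b[(i, j)] = 2.0 * math.atanh(prod)
        c2b = new_c2b

    # Final decision after the last check-node update.
    total = list(channel_llr)
    for (i, j), msg in c2b.items():
        total[j] += msg
    hard = [1 if t < 0 else 0 for t in total]
    return hard, syndrome_ok(cn_nbrs, hard)


if __name__ == "__main__":
    # (7,4) Hamming parity-check matrix, used only because it is small;
    # it is not an LDPC code from any of the standards mentioned.
    H = [[1, 0, 1, 0, 1, 0, 1],
         [0, 1, 1, 0, 0, 1, 1],
         [0, 0, 0, 1, 1, 1, 1]]
    print(spa_decode(H, [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, -1.0]))
```

    The two inner loops touch every edge of the graph once per iteration, which is why the abstract stresses data locality and the decoupling of computation from memory accesses.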
    Abstract Low-Density Parity-Check (LDPC) codes are powerful error correcting codes (ECC). They have recently been adopted by several data communication standards such as DVB-S2 and WiMax. LDPCs are represented by bipartite graphs, also called Tanner ...
    The scientific session DocGraf '16 aims to be a forum for debate and discussion on the role of digital technologies in research in Architecture and Urbanism. In this first session, experts from various fields of knowledge share experiences where the use of these new tools, specifically from the areas of geomatics and computer graphics, proved to be a critical success factor in the projects in which they have been integrated.
    The session will also feature the presentation of the project Architectural Democracy, which focuses on the relationship between architecture, technology and policy and its implications in the context of citizenship, architectural practice and urban policies. The line of research focuses on ways to use technology to transform buildings into "open-source" interfaces to improve citizens' understanding of the everyday built environment and, therefore, the quality of architecture and citizenship.