ENGINEERING RESEARCH PAPERS

CUDA-Compute Unified Device Architecture 2016 IEEE PAPER




Accelerating Genetic Algorithm Using General Purpose GPU and CUDA

Abstract Genetic Algorithm (GA) is one of the most popular swarm-based evolutionary search algorithms; it simulates the natural phenomenon of genetic evolution to search for solutions to arbitrary engineering problems. Although GAs are very effective in solving many practical

Review of LLVM Compiler Architecture Enhancements for CUDA

Abstract: Heterogeneous platforms are becoming increasingly omnipresent due to the availability of multicores at commodity prices. In order to benefit from the immense parallel capability of these multicores, more and more applications are now being

Implementation of Sorting Algorithms with CUDA: An Empirical Study

Abstract: Sorting algorithms have been studied for more than 3 decades now. The aim of this paper is to implement some of the sorting algorithms using the CUDA language in a GPU environment provided by the Nvidia graphics cards. This empirical study is done for

of the Base Technology for the Smart Grid Security: Focusing on a Performance Improvement of the Basic Algorithm for the DDoS Attacks Detection Using CUDA

ABSTRACT Since the development of the Graphics Processing Unit (GPU) in 1999, the development speed of GPUs has become much faster than that of CPUs, and currently the computational power of GPUs exceeds that of CPUs by dozens to hundreds of times in terms of

Novel Method to Improve ACO Performance on the GPU Using CUDA for Nurse Roster Scheduling Problem

Abstract: This paper presents a parallel Ant Colony Optimization algorithm on the Graphics Processing Unit (GPU) to solve the nurse roster scheduling problem (NRSP). We put on the Schedule formation and pheromone update phases of Ant colony

Geometrical Modeling of Facial Regions and CUDA based Parallel Face Segmentation for Emotion Recognition

Abstract Human emotions are expressed through body gestures, voice variations and facial expressions. Research in the area of facial expression recognition has been active for the last 20 years for improving the system performance. This work proposes a novel geometrical

simCUDA: A C++ based CUDA Simulation Framework

The primary objective of this report is to develop a CUDA simulation framework (simCUDA) that effectively maps the existing application written in CUDA to be executed on top of standard multi-core CPU architectures. This is done by specifically annotating the

Performances of a Parallel CUDA Program for a Biorthogonal Wavelet Filter

Abstract: Parallel high-performance computing technologies have encountered tremendous growth, especially in the last decade, and they have made a strong impact in a variety of areas concerning mathematical and engineering fields. In this work we analyze

Efficient Parallel Implementation of Single Source Shortest Path Algorithm on GPU Using CUDA

Abstract In today's world there are a number of applications like routing in telephone networks, traveller information systems, robotic path selection etc., where data can be represented as a graph and different graph algorithms are executed on it to fulfil the

CUDA Based Speed Optimization of the PCA Algorithm

Abstract: Principal Component Analysis (PCA) is an algorithm involving heavy mathematical operations with matrices. The data extracted from the face images are usually very large, and processing this data is time consuming. To reduce the execution time of these operations,

CUDA Accelerated Real-time Digital Image Stabilization in a Video Stream

Abstract The most important step for successful video processing in computer vision is its stabilization. Often, it is required to process high-resolution video in real time. In this paper, a new method for real-time digital image stabilization in a video stream is presented.

Enabling predictable parallelism in single-GPU systems with persistent CUDA threads

Abstract: Graphics Processing Units, or GPUs, have been successfully adopted both for graphic computation in 3D applications and for general-purpose applications (GP-GPUs), thanks to their tremendous performance-per-watt. Recently, there is a big interest in

Modified Levels of Parallel Odd-Even Transposition Sorting Network (OETSN) with GPU Computing using CUDA

ABSTRACT Sorting huge data requires an enormous amount of time. The time needed for this task can be minimised using parallel processing devices like GPU. The odd-even transposition sorting network algorithm is based on the idea that each level uses an equal

6D Tracking with Compute Unified Device Architecture (CUDA) Technology

Abstract A program code TrackKing for a 6D fully-coupled particle tracking in circular accelerators has been developed with the usage of parallel computations on Graphics Processing Units (GPU) with Compute Unified Device Architecture (CUDA). We can track

NTRU Modular Lattice Signature Scheme on CUDA GPUs

Abstract. In this work we show how to use Graphics Processing Units (GPUs) with Compute Unified Device Architecture (CUDA) to accelerate a lattice based signature scheme, namely, the NTRU modular lattice signature (NTRU-MLS) scheme. Lattice based schemes require

Programming GPUs with CUDA

Rice University, johnmc@rice.edu. Programming GPUs with CUDA. COMP 422, Lecture 23, 12 April 2016. Why GPUs (Govindaraju, Manocha; 2005). CUDA = Compute Unified Device Architecture: software platform for parallel computing on Nvidia GPUs

Further topics on SWE and CUDA

The shallow water equations describe the large scale evolution of water (or other liquid) waves, affected by gravity (and bathymetry) in a vertically integrated domain. Vertical flows are neglected. The unknowns in the equations are water height and momentum. What

Iris Recognition for Secured Internet Banking Using CUDA on GPU

Abstract: Iris continues to become one of the emerging methods of biometric-based identification systems as the need for security system keeps on increasing day-by-day. With a few modifications, this project explains the iris recognition systems developed by John

GPU-BASED IMAGE PROCESSING AND COMPUTER VISION USING CUDA

Abstract: Graphics and vision are approximate inverses of each other. Ordinarily, graphics processing units (GPUs) are used to convert 'numbers into pictures' (computer graphics). In this paper we study the use of the GPU in nearly the reverse way, to assist in converting

Accelerating Rabin Karp algorithm on a multicore GPU using CUDA

Abstract String Matching algorithms are responsible for finding occurrences of a pattern within a large text. Many areas of Computer Science require demanding string-matching procedures. Increasing the efficiency of string matching algorithm will automatically

Application of CUDA technology for calculation of ground states of few-body nuclei by Feynman's continual integrals method

The possibility of application of modern parallel computing solutions to speed up the calculations of ground states of few-body nuclei by Feynman's continual integrals method has been investigated. These calculations may sometimes require large computational

A Parallel Version of Tree-Seed Algorithm (TSA) within CUDA Platform

ABSTRACT: In recent years, general-purpose computing on graphical processing units (GPGPU) has gained huge popularity. The usage of GPGPU has become widespread due to the fact that the production technology of the central processing unit (CPU) reaches the

Accelerating High Arithmetic Intensity Storm Surge Model using CUDA

Abstract GPUs (Graphic Processing Units) have opened the floodgates for high-performance computing, especially for programs that involve high-level arithmetic computations. Though parallel computing capability has been around for many years, it requires large and

Satellite image processing using CUDA and Hadoop architecture

Abstract: With the advancement of digitalization, a vast amount of image data is uploaded and used via the Internet in today's world. With this revolution in the use of multimedia data, a key problem in the area of image processing, computer vision and big data analytics is how to

Accelerated Transport System Simulation using CUDA

Provides a high-level interface for describing agents, abstracting the CUDA programming model [4] to efficiently render large populations of individuals. Accelerated Transport System Simulation using CUDA. Peter Heywood, Paul Richmond, Steve Maddock and Matthew Jung

Real Time Processing of Microphone Array Information Applying GPU Unit and CUDA Platform

Abstract: Microphone arrays are the basic audio sensors delivering the information from which it is possible to determine the direction of sound source arrival. There are a lot of methods and algorithms proposed for effective microphone array information

EFFICIENT IMAGE PROCESSING USING REACTION-DIFFUSION CNN IMPLEMENTED IN CUDA TECHNOLOGY

This paper proposes an implementation model for reaction-diffusion Cellular nonlinear networks (RD-CNN) on CPU and GPU platforms. Efficient implementations are proposed in order to speed-up the computational model of the RD-CNN using nVidia's CUDA platform,

CUDA and GPU Acceleration of Image Processing

CUDA stands for "Compute Unified Device Architecture", which is a free software platform provided by NVidia. It enables users to control GPUs by writing programs akin to C++. All CUDA software can be downloaded from CUDA Zone. CUDA is very similar in

AVOIDING DUPLICATED COMPUTATION TO IMPROVE THE PERFORMANCE OF PFSP ON CUDA GPUS

ABSTRACT Graphics Processing Units (GPUs) have emerged as powerful parallel compute platforms for various application domains. A GPU consists of hundreds or even thousands of processor cores and adopts Single Instruction, Multiple Threads (SIMT)

Optimizing parallel reduction in CUDA

Optimizing Parallel Reduction in CUDA. Mark Harris, NVIDIA Developer Technology. http://developer.download.nvidia.com/assets/cuda/files/reduction.pdf Parallel Reduction

Efficient sparse matrix-vector multiplication on CUDA

Abstract The massive parallelism of graphics processing units (GPUs) offers tremendous performance in many high-performance computing applications. While dense linear algebra readily maps to such platforms, harnessing this potential for sparse matrix computations

NVIDIA CUDA software and GPU parallel computing architecture

NVIDIA CUDA Software and GPU Parallel Computing Architecture. David B. Kirk, Chief Scientist. NVIDIA Corporation 2006-2008. Outline. Millions of CUDA-enabled GPUs (chart: total GPUs, in millions, by year)

Image convolution with CUDA

Abstract Convolution filtering is a technique that can be used for a wide array of image processing tasks, some of which may include smoothing and edge detection. In this document we show how a separable convolution filter can be implemented in NVIDIA

GPU computing with NVIDIA CUDA.

GPU Computing with NVIDIA CUDA. Ian Buck, NVIDIA. Lush, Rich Worlds; Stunning Graphics Realism. Full Spectrum Warrior: Ten Hammers 2006 Pandemic Studios, LLC. All rights

Optimizing matrix transpose in CUDA

The reader should be familiar with basic CUDA programming concepts such as kernels, threads, and blocks, as well as a basic understanding of the different memory spaces accessible by CUDA threads. A good introduction to CUDA programming is given in the

Particle simulation using cuda

Abstract Particle systems [1] are a commonly used technique for simulating physical systems. In this document we will describe how to efficiently implement a particle system in

Automated dynamic analysis of CUDA programs

ABSTRACT Recent increases in the programmability and performance of GPUs have led to a surge of interest in utilizing them for general-purpose computations. Tools such as NVIDIA's Cuda allow programmers to use a C-like language to code algorithms for

Enabling task parallelism in the cuda scheduler

Abstract General purpose computing on graphics processing units (GPUs) introduces the challenge of scheduling independent tasks on devices designed for data parallel or SPMD applications. This paper proposes an issue queue that merges workloads that would

Cuda particles

Abstract Particle systems [1] are a commonly used technique for simulating physical systems. In this document we will describe how to efficiently implement a particle system in

Introducing CURRENNT–the Munich open-source CUDA RecurREnt neural network toolkit

Abstract In this article, we introduce CURRENNT, an open-source parallel implementation of deep recurrent neural networks (RNNs) supporting graphics processing units (GPUs) through NVIDIA's Compute Unified Device Architecture (CUDA). CURRENNT supports

On implementing graph cuts on cuda

Abstract: The Compute Unified Device Architecture (CUDA) has enabled graphics processors to be explicitly programmed as general-purpose shared-memory multi-core processors with a high level of parallelism. In this paper, we present our preliminary

Accelerating matlab with cuda

MATLAB is a powerful tool for prototyping and analysis. MATLAB could be easily extended via MEX files to take advantage of the computational power offered by the latest NVIDIA graphics processor unit (GPU). The graphic processor can be considered as a

Distributed genetic programming on GPUs using CUDA

Abstract Using a cluster of Graphics Processing Unit (GPU) equipped computers, it is possible to accelerate the evaluation of individuals in Genetic Programming. Program compilation, fitness case data and fitness execution are spread over the cluster of

Discrete cosine transform for 8x8 blocks with CUDA

Abstract In this whitepaper the Discrete Cosine Transform (DCT) is discussed. The two-dimensional variation of the transform that operates on 8x8 blocks (DCT8x8) is widely used in image and video coding because it exhibits high signal decorrelation rates and can be

Compute unified device architecture (CUDA) based finite-difference time-domain (FDTD) implementation

Abstract: Recent developments in the design of graphics processing units (GPUs) have made it possible to use these devices as alternatives to central processor units (CPUs) and perform high performance scientific computing on them. Though several implementations

Data Parallel Three-Dimensional Cahn-Hilliard Field Equation Simulation on GPUs with CUDA.

Computational scientific simulations have long used parallel computers to increase their performance. Recently graphics cards have been utilised to provide this functionality. GPGPU APIs such as NVIDIA's CUDA can be used to harness the power of GPUs for

Imaging earth's subsurface using CUDA

The main goal of earth exploration is to provide the oil and gas industry with knowledge of the earth's subsurface structure to detect where oil can be found and recovered. To do so, large-scale seismic surveys of the earth are performed, and the data recorded undergoes

Development of a CUDA Implementation of the 3 D FDTD Method

Abstract The use of general-purpose computing on a GPU is an effective way to accelerate the FDTD method. This paper introduces flexibility to the theoretically best available approach. It examines the performance on both Tesla- and Fermi-architecture GPUs, and

Optimizing cuda

S05: High Performance Computing with CUDA. Optimizing CUDA. Mark Harris, NVIDIA Developer Technology. CUDA is fast and efficient: CUDA enables efficient use of the massive parallelism of NVIDIA GPUs

High performance computing with CUDA

High Performance Computing with CUDA. Massimiliano Fatica, NVIDIA Corporation. GPU Performance History. Oil & Gas, Finance, Medical, Biophysics, Numerics, Audio

General-purpose sparse matrix building blocks using the NVIDIA CUDA technology platform

Abstract: We report on our experience with integrating and using graphics processing units (GPUs) as fast parallel floating-point co-processors to accelerate two fundamental computational scientific kernels on the GPU: sparse direct factorization and nonlinear

CUSVM: A CUDA implementation of support vector classification and regression

Abstract. This paper presents cuSVM, a software package for high-speed Support Vector Machine (SVM) training and prediction that exploits the massively parallel processing power of Graphics Processors (GPUs). cuSVM is written in NVIDIA's CUDA C-language GPU

CUDA/OpenGL fluid simulation

Abstract This document describes an NVIDIA CUDA implementation of a simple fluids solver for the Navier-Stokes equations for incompressible flow. The CUDA algorithms are based on Jos Stam's FFT-based Stable Fluids system [1], and we refer the reader to this paper for

GPU acceleration of the long-wave rapid radiative transfer model in WRF using CUDA Fortran

Abstract. This paper presents the approach and results of porting the Long-Wave Rapid Radiative Transfer Model (RRTM) component of the Weather Research and Forecast (WRF) code to the GPU using CUDA Fortran. After a brief description of the RRTM code,

Numerical simulation of the complex Ginzburg-Landau equation on GPUs with CUDA

ABSTRACT The Time Dependent Ginzburg Landau (TDGL) equation models a complex scalar field and is used to study a variety of different physical systems and exhibits phase transitional behaviours that necessitate study using numerical simulation methods. We

Implementation of a simple genetic algorithm within the cuda architecture

The increasing interest of researchers in using low cost GPUs for applications requiring intensive parallel computing is due to the ability of these devices to solve parallelizable problems much faster than traditional sequential processors. The first applications of

cuHMM: a CUDA implementation of hidden Markov model training and classification

Hidden Markov model (HMM) as a sequential classifier has important applications in speech and language processing [Rab89][JM08] and biological sequence analysis [Kro98]. In this project, we analyze the parallelism in the three algorithms for HMM training and

Interactive ray tracing with CUDA

Interactive Ray Tracing with CUDA. David Luebke and Steven Parker, NVIDIA Research. Ray Tracing; Rasterization; Build. Key Parallel Abstractions in CUDA: 0. Zillions of lightweight threads: simple decomposition model

cudaBayesreg: Bayesian computation in CUDA

Abstract Graphical processing units are rapidly gaining maturity as powerful general parallel computing devices. The package cudaBayesreg uses GPU-oriented procedures to improve the performance of Bayesian computations. The paper motivates the need for devising

Multi-view range image registration using CUDA

Abstract: In this paper, we propose a real-time and on-line 3D registration system which acquires and registers multiview range images simultaneously. The proposed system implements a 3D registration technique using GPU programming techniques. To register

Accelerating braided b+ tree searches on a gpu with cuda

Abstract. Previous work has shown that using the GPU as a brute force method for SELECT statements on a SQLite database table yields significant speedups. However, this requires that the entire table be selected and transformed from the B-Tree to row-column format.

Realtime dense stereo matching with dynamic programming in CUDA

Abstract Real-time depth extraction from stereo images is an important process in computer vision. This paper proposes a new implementation of the dynamic programming algorithm to calculate dense depth maps using the CUDA architecture achieving real-time

Particle swarm optimization within the CUDA architecture

The increasing interest of researchers in using low cost GPUs for applications requiring intensive parallel computing is due to the ability of these devices to solve parallelizable problems much faster than traditional sequential processors. The first applications of

Performance tuning for CUDA-accelerated neighborhood denoising filters

Abstract: Neighborhood denoising filters are powerful techniques in image processing and can effectively enhance the image quality in CT reconstructions. In this study, by taking the bilateral filter and the non-local mean filter as two examples, we discuss their

Accelerating kernel density estimation on the GPU using the CUDA framework

Abstract The main problem of kernel density estimation methods is their huge computational requirements, especially for large data sets. One way of accelerating these methods is to use parallel processing. Recent advances in parallel processing have

Benchmarking the NVIDIA 8800GTX with the CUDA Development Platform

Signal length performance analysis. Long signals (tests 1, 3, 4): GPU calculation 1.5-16x faster than CPU. Short signal (test 2): CPU is faster by a factor of 2. Filter size performance

Gpu parallel computing architecture and cuda programming model

CTA threads run concurrently: SM assigns thread id #s; SM manages thread execution. CTA threads share data and results in memory and shared memory; synchronize at barrier instruction. Per-CTA shared memory keeps data close to processor; minimize trips to

CUDA-level Performance with Python-level Productivity for Gaussian Mixture Model Applications.

Abstract Typically, scientists with computational needs prefer to use high-level languages such as Python or MATLAB; however, large computationally-intensive problems must eventually be recoded in a low level language such as C or Fortran by expert