ENGINEERING RESEARCH PAPERS

GPU-Graphics Processing Unit 2016 IEEE PAPER




A Study of Flow inside a Centrifugal Pump: High Performance Numerical Simulations Using GPU cards
free download

Abstract The present work reviews calculations of a steady three-dimensional (3D) flow past a centrifugal pump with its diffuser channels using different hardware. The open source CFD software OpenFOAM has been ported to the GPU platform with double precision. The

3.23 Dynamic Data Structures for the GPU
free download

Today's GPU programming environments feature few general-purpose data structures. Only a handful of those can be constructed on the GPU, and to first order, none of them can be updated on the GPU. We aim to develop a family of GPU data structures that permit

Transparent Offloading and Mapping (TOM): Enabling Programmer-Transparent Near-Data Processing in GPU Systems
free download

ABSTRACT Main memory bandwidth is a critical bottleneck for modern GPU systems due to limited off-chip pin bandwidth. 3D-stacked memory architectures provide a promising opportunity to significantly alleviate this bottleneck by directly connecting a logic layer to

GSI: A GPU stall inspector to characterize the sources of memory stalls for tightly coupled GPUs
free download

ABSTRACT In recent years the power wall has prevented the continued scaling of single core performance. This has led to the rise of dark silicon and motivated a move toward parallelism and specialization. As a result, energy-efficient high-throughput GPU cores are

Exploiting Core-Criticality for Enhanced GPU Performance
free download

ABSTRACT Modern memory access schedulers employed in GPUs typically optimize for memory throughput. They implicitly assume that all requests from different cores are equally important. However, we show that during the execution of a subset of CUDA applications,

A hybrid GPU technique for real-time terrain visualization
free download

Abstract Real-time terrain visualization plays an important role in multiple popular applications like geographical information systems, computer games, or civil or military simulators, where hardware tessellation has become a de-facto standard nowadays in the

µC-States: Fine-grained GPU Datapath Power Management
free download

ABSTRACT To improve the performance of Graphics Processing Units (GPUs) beyond simply increasing core count, architects are recently adopting a scale-up approach: the peak throughput and individual capabilities of the GPU cores are increasing rapidly. This big-

Abstract: Spectral unmixing pursues the identification of spectrally pure constituents, called endmembers, and their corresponding abundances in each pixel of a hyperspectral image. Most unmixing techniques have focused on the exploitation of spectral information alone.

GPU Acceleration and Interactive Visualization for Spatio-Temporal Networks
free download

Author: Andrea Purgato. Committee: Angus Forbes, Tanya Berger-Wolf. GPU Programming Model (1/4): GPU computational power increased

Implementing a GPU-based Machine Learning Library on Apache Spark
free download

As data storage becomes increasingly commoditized, companies are collecting transactional records on the order of several petabytes that are beyond the ability of typical database software tools to store and analyze. Analysis of this big data can yield business

Reducing Remote GPU Execution's Overhead with mrCUDA
free download

Background Our previous work [1] addressed the scattered idle-GPU problem in multi-GPU batch-queue systems, which causes the systems to have idle GPUs despite having jobs waiting. Our solution was to virtually consolidate unoccupied GPUs into some nodes using

PRISM-PSY: Precise GPU-Accelerated Parameter Synthesis for Stochastic Systems
free download

Abstract. In this paper we present PRISM-PSY, a novel tool that performs precise GPU-accelerated parameter synthesis for continuous-time Markov chains and time-bounded temporal logic specifications. We redesign, in terms of matrix-vector operations, the

Parallel Approaches to the String Matching Problem on the GPU
free download

Abstract: We design a family of parallel algorithms and GPU implementations for the exact string matching problem, based on Rabin-Karp (RK) randomized string matching. We describe and analyze three primary parallel approaches to binary string matching:
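
To make the Rabin-Karp idea named in this abstract concrete, here is a minimal sequential sketch in Python. It is not the authors' parallel GPU algorithm: the function name `rabin_karp`, the fixed `base` and `mod` values, and the character-level (rather than binary) alphabet are all assumptions chosen for illustration. A rolling hash is updated in constant time per shift, and every hash hit is verified by direct comparison so the reported matches are exact.

```python
def rabin_karp(text: str, pattern: str, base: int = 256,
               mod: int = 1_000_000_007) -> list[int]:
    """Return all start indices where pattern occurs in text.

    Illustrative sequential version; base and mod are assumed constants,
    not parameters taken from the paper.
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    # Weight of the leading character in a window of length m.
    high = pow(base, m - 1, mod)

    # Hash of the pattern and of the first window of the text.
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod

    matches = []
    for s in range(n - m + 1):
        # Verify on a hash hit to rule out collisions.
        if p_hash == t_hash and text[s:s + m] == pattern:
            matches.append(s)
        # Roll the window: drop text[s], append text[s + m].
        if s < n - m:
            t_hash = ((t_hash - ord(text[s]) * high) * base
                      + ord(text[s + m])) % mod
    return matches
```

Because each window hash depends only on the previous one, a parallel variant would typically have to split the text into independent chunks or recompute hashes per thread; the sketch above does neither.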

GPU implementation of the RRB-solver
free download

$Ax = b$ (1.1) is considered, where $A \in \mathbb{R}^{n \times n}$ is a large symmetric positive definite (SPD) pentadiagonal matrix, $x \in \mathbb{R}^{n}$ the solution vector, and $b \in \mathbb{R}^{n}$ the right-hand side (RHS) vector. Among other solution methods, see Figure 1, the Conjugate Gradient (CG)

GPU Computing and Its Applications
free download

Abstract: The graphics processing unit has become an important part of today's mainstream computing system. GPU-accelerated computing is defined as the use of a graphics processing unit (GPU) together with a CPU (central processing unit). It developed in 2007

Improving mobile gaming performance through cooperative CPU-GPU thermal management
free download

ABSTRACT State-of-the-art thermal management techniques independently throttle the frequencies of high-performance multi-core CPU and powerful graphics processing units (GPU) on heterogeneous multiprocessor system-on-chips deployed in latest mobile

Building a Distributed, GPU-based Machine Learning Library
free download

As data storage becomes increasingly commoditized, companies are collecting transactional records on the order of several petabytes that are beyond the ability of typical database software tools to store and analyze. Analysis of this big data can yield business

Real-Time GPU-based Timing Channel Detection using Entropy
free download

Abstract: As line rates continue to grow, network security applications such as covert timing channel (CTC) detection must utilize new techniques for processing network flows in order to protect critical enterprise networks. GPU-based packet processing provides one means

A performance model and efficiency-based assignment of buffering strategies for automatic GPU stencil code generation
free download

Abstract: Stencil computations form the basis for computer simulations across almost every field of science, such as computational fluid dynamics, data mining, and image processing. Their mostly regular data access patterns potentially enable them to take advantage of the

Improved hybrid/GPU algorithm for solving cardiac electro-physiology problems on Purkinje networks
free download

SUMMARY The cardiac Purkinje fibres provide an important stimulus to the coordinated contraction of the heart. We present a numerical algorithm for the solution of electrophysiology problems on the Purkinje network that is efficient enough to be used on

POSTER: Collective Dynamic Parallelism for Directive Based GPU Programming Languages and Compilers
free download

ABSTRACT Early programs for GPU (Graphics Processing Units) acceleration were based on a flat, bulk parallel programming model, in which programs had to perform a sequence of kernel launches from the host CPU. In the latest releases of these devices, dynamic (or

A model-driven approach to warp/thread-block level GPU cache bypassing
free download

Abstract The high amount of memory requests from massive threads may easily cause cache contention and cache-miss-related resource congestion on GPUs. This paper proposes a simple yet effective performance model to estimate the impact of cache

Enabling GPU Virtualization in Cloud Environments
free download

Abstract: The use of accelerators, such as graphics processing units (GPUs), to reduce the execution time of compute-intensive applications has become popular during the past few years. These devices increase the computational power of a node thanks to their parallel

Effective connectivity analysis in brain networks: a GPU-Accelerated Implementation of the Cox Method
free download

Abstract: The observation of interactions between neurons of a network can reveal important information about how information is processed within that network. Such observation can be established with the analysis of causality between the activities of the

A Fast Speckle Reduction Algorithm Based on GPU for Synthetic Aperture Sonar
free download

Abstract Synthetic aperture sonar (SAS) is a kind of high-resolution imaging sonar, but speckle exists in SAS images because SAS is a coherent imaging system, which makes them very difficult to interpret visually and automatically. In this paper, a fast speckle reduction

GPU Sharing for Image Processing in Embedded Real-Time Systems
free download

Abstract To more efficiently utilize graphics processing units (GPUs) when supporting real-time workloads, it may be beneficial to allow multiple tasks to issue GPU computations without blocking one another. For such an option to be viable, it is necessary to know the

Accelerating Genetic Algorithm Using General Purpose GPU and CUDA
free download

Abstract Genetic Algorithm (GA) is one of the most popular swarm-based evolutionary search algorithms; it simulates the natural phenomenon of genetic evolution to search for solutions to arbitrary engineering problems. Although GAs are very effective in solving many practical

The Alea Reactive Dataflow System for GPU Parallelization
free download

Abstract The Alea reactive dataflow system represents a general, efficient, and memory-safe model for homogeneous programming of heterogeneous platforms. Programmers can describe computations as asynchronous dataflow graphs built from generic prefabricated

LVCSR System on a Hybrid GPU-CPU Embedded Platform for Real-Time Dialog Applications
free download

Abstract We present the implementation of a large-vocabulary continuous speech recognition (LVCSR) system on NVIDIA's Tegra K1 hybrid GPU-CPU embedded platform. The system is trained on a standard 1000-hour corpus, LibriSpeech, features a trigram WFST-based

Modern Methods of Bundle Adjustment on the GPU
free download

ABSTRACT: The task to compute 3D reconstructions from large amounts of data has become an active field of research within the last years. Based on an initial estimate provided by structure from motion, bundle adjustment seeks to find a solution that is

GPU-Based High Performance Password Recovery Technique for Hash Functions
free download

Due to the development of GPGPU (General Purpose Graphic Processing Unit) technology, GPU has been applied in many computation tasks as accelerators. In this paper, a new password recovery technique for the standardized hash functions, MD5 and SHA1, are

Towards Efficient Nonlinear Option Pricing with GPU Computing
free download

Nonlinear option pricing is a new approach for traders, hedge funds or banks to obtain more accurate option prices, and it allows them to do fast model calibration using huge market data. The idea is to take into account nontrivial transaction costs, liquidity, market feedback and

DEFORMATION FINITE ELEMENT ANALYSIS ALGORITHM OF TURBINE BLADE BASED ON CPU+ GPU HETEROGENEOUS PARALLEL COMPUTATION
free download

Blade is one of the core components of turbine machinery. The reliability of blade is directly related to the normal operation of plant unit. However, with the increase of blade length and flow rate, non-linear effects such as finite deformation must be considered in strength

GPU-Based Multiple Back Propagation for Big Data Problems
free download

Abstract The big data era has become known for its abundance in rapidly generated data of varying formats and sizes. With this awareness, interest in data analytics and more specifically predictive analytics has received increased attention lately. However, the

Towards Agent-Based Simulation of Kernel P Systems using FLAME and FLAME GPU
free download

Abstract. This position paper discusses the challenges and opportunities of simulating kernel P systems (kP systems) using two powerful agent-based modelling frameworks on parallel architectures: FLAME (Flexible Large Scale Agent Modelling Environment) and

Erratum to: Fast Fingerprint Orientation Field Estimation Incorporating General Purpose GPU
free download

The second affiliation for author Ali Ismail Awad was published incorrectly. The correct affiliation is given below: Faculty of Engineering, Al Azhar University, Qena, Egypt. The online version

Opportunities for container environments on Cray XC30 with GPU devices
free download

Abstract Thanks to the significant popularity gained lately by Docker, the HPC community has recently started exploring container technology and potential benefits its use would bring to the users of supercomputing systems like the Cray XC series. In this paper, we

GPU-based solution of nonlinear Maxwell's equations for inhomogeneous dispersive media
free download

A new method is based on Maxwell's equations coupled with a time-varying electron density equation and includes the variety of nonlinear processes which take place under high-intensity ultrashort laser irradiation, like the Kerr effect, photoionization, avalanche and

Regular paper Implementation and Comparison of the Lifting 5/3 and 9/7 Algorithms in MatLab on GPU
free download

In order to accelerate the Discrete Wavelet Transform (DWT), we have implemented and compared the lifting "Le Gall 5/3" and "Cohen-Daubechies-Feauveau 9/7" (CDF 9/7) algorithms on a low-cost NVIDIA GPU. The suggested implementation is realized in

Granular Media Simulations on the GPU
free download

Abstract Numerical simulation of particulate materials using the discrete element method (DEM) is extremely important to many industrial processes, with a wide range of applications from hopper flows in agriculture to tumbling mills in the mining industry. The DEM is

Novel Method to Improve ACO Performance on the GPU Using CUDA for Nurse Roster Scheduling Problem
free download

Abstract: This paper presents the implementation of a parallel Ant Colony Optimization algorithm on the Graphics Processing Unit (GPU) to solve the nurse roster scheduling problem (NRSP). We put on the Schedule formation and pheromone update phases of Ant colony

An Efficient GPU-Accelerated Implementation of Genomic Short Read Mapping with BWA-MEM
free download

ABSTRACT Next Generation Sequencing techniques have resulted in an exponential growth in the generation of genetics data, the amount of which will soon rival, if not overtake, other Big Data fields, such as astronomy and streaming video services. To become useful,

Design of a Dual-Warp Scheduler for Streaming Multi-Processors Based GP-GPU
free download

Abstract In this paper, a warp scheduler is proposed for the improvement of multi-core stream processor based GP-GPU performance. The proposed warp schedulers are divided into odd and even warps, which are issued separately by applying the dual-warp issue.

Simulating Peer to Peer Networks Using GPU High Performance Support
free download

Abstract: Peer-to-Peer networks are used by many applications to share resources between nodes. We have proposed a parallel version of a simulator for some aspects of a peer-to-peer network performing file sharing. Being this analysis computationally

A Comparative Study on Exact Triangle Counting Algorithms on the GPU
free download

A Comparative Study on Exact Triangle Counting Algorithms on the GPU, Proceedings of the 1st High

vFireLib: A Forest Fire Simulation Library Implemented on the GPU
free download

Abstract Forest fire simulation is a complex problem that requires an enormous amount of data processing. In order to operate a spread calculation in real-time, it becomes necessary to use parallel processing. Processing the spread calculations on the Graphics

Comparative Study of Computationally Intensive Algorithms on CPU and GPU
free download

Abstract This paper presents a comparative study of one of the popular cryptographic algorithms, the AES algorithm, implemented using CUDA on GPU and on CPU. In the present-day scenario the AES algorithm suffers from very high CPU resource consumption, latency

Region-based memory management for expressive GPU programming
free download

Region-Based Memory Management for Expressive GPU Programming. Eric Holk. Ph.D. thesis.

Runtime Translation of the Java Bytecode to OpenCL and GPU Execution of the Resulted Code
free download

Abstract: Modern GPUs provide considerable computation power, often in the range of teraflops. By using open standards such as OpenCL, which provide an abstraction layer over the GPU's physical characteristics, these can be employed to solve general

Big Data GPU-Driven Parallel Processing Spatial and Spatio-Temporal Clustering Algorithms
free download

Advances in graphics processing units' technology towards encompassing parallel architectures [1], comprised of thousands of cores and multiples of parallel threads, provide the foundation in terms of hardware for the rapid processing of various parallel

TIME PREDICTABILITY OF GPU KERNEL ON AN HSA COMPLIANT PLATFORM
free download

Abstract During recent years, the importance of utilizing more computational power in smaller computer systems has increased. The utilization of more computational power in smaller packages, the ability to combine more than one type of processor unit has become

Fast enclosure for matrix multiplication on a GPU
free download

Yusuke Morikura (1), Yusuke Nozawa (1), Kouta Sekine (1), Masahide Kashiwagi (1) and Shin'ichi Oishi (1, 2). (1) Department of Applied Mathematics, Waseda University; (2) JST/CREST. (1) 419 room Building 63, 3–4–1

Efficient Parallel Implementation of Single Source Shortest Path Algorithm on GPU Using CUDA
free download

Abstract In today's world there are a number of applications, like routing in telephone networks, traveller information systems, robotic path selection etc., where data can be represented as a graph and different graph algorithms are executed on it to fulfil the

GPU performance modeling and optimization
free download

High computing capability is always in high demand, especially for modern emerging applications, such as physical, chemical and biological simulations, data mining, computational financing, high-quality video processing, machine learning, big-data

Accelerating Fourier Descriptor for Image Recognition Using GPU
free download

Abstract: In the next few years, the rate of enhancement in GPUs (Graphics Processing Units) performance is expected to outshine that of CPUs (Central Processing Units), increasing the demand of the GPU as the processor chosen for image processing. In light

Harnessing GPU Computing Power to Improve Performance of SDN Controller
free download

Summary Software Defined Network (SDN) has shown substantial benefits over the legacy network and fueled the implementation of a variety of innovative and intelligent applications on SDN Controller. However, these applications put the performance of SDN controller

Real-Time Road Signs Recognition using Mobile GPU
free download

Abstract. This article shows an effective implementation of the algorithm for detection of road signs using video obtained by a camera installed in a vehicle. Road signs detection and recognition are implemented using CUDA and operate in real-time on a mobile GPU.

Topology-Aware GPU Selection on Multi-GPU Nodes
free download

Overlapping GPU communication with computation: highly overlapping GPU communication and computation is not always feasible; leverage GPU hardware features (such as IPC). Improving GPU-to-GPU communication performance: only possible for

Fast Screen Space Curvature Estimation on GPU
free download

Abstract: Curvature is an important geometric property in computer graphics that provides information about the behavior of object surfaces. The exact curvature can only be calculated for a limited set of surface descriptions. Most of the time, we deal with triangles,

GPU-based parallel method of temperature field analysis in a floor heater with a controller
free download

Abstract: A parallel method enabling acceleration of the numerical analysis of the transient temperature field in an air floor heating system is presented in this paper. An initial-boundary value problem of the heater regulated by an on/off controller is formulated. The analogue

SOP-GPU Documentation
free download

1.1 Method In the structure-based Self-Organized Polymer (SOP) model, each amino acid residue is usually represented by a single interaction center described by the corresponding Cα atom, or two interaction centers described by the corresponding Cα and Cβ atoms. In the

Investigating GPU-Accelerated Kernel Density Estimators for Join Selectivity Estimation
free download

Abstract Kernel Density Estimators are a well-known tool from statistics to estimate probability density functions based on samples drawn from an unknown distribution. The estimate is provided by centering local probability density functions (so-called kernels) on

Adaptive GPU Array Layout Auto-Tuning
free download

ABSTRACT Optimal performance is an important goal in compute intensive applications. For GPU applications, this requires a lot of experience and knowledge about the algorithms and the underlying hardware,target for autotuning approaches. We

Enabling predictable parallelism in single-GPU systems with persistent CUDA threads
free download

Abstract: Graphics Processing Units, or GPUs, have been successfully adopted both for graphic computation in 3D applications and for general-purpose applications (GP-GPUs), thanks to their tremendous performance-per-watt. Recently, there is a big interest in

Modified Levels of Parallel Odd-Even Transposition Sorting Network (OETSN) with GPU Computing using CUDA
free download

ABSTRACT Sorting huge data requires an enormous amount of time. The time needed for this task can be minimised using parallel processing devices like GPU. The odd-even transposition sorting network algorithm is based on the idea that each level uses an equal
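
For readers unfamiliar with the sorting network named in this abstract, the following is a minimal sequential sketch of classic odd-even transposition sort in Python. It is the textbook algorithm, not the modified multi-level variant the paper proposes, and the function name `odd_even_transposition_sort` is an assumption. Within each phase, the compare-exchange operations touch disjoint pairs, which is the property that lets a GPU run a whole phase in parallel.

```python
def odd_even_transposition_sort(values):
    """Sort a sequence with the classic odd-even transposition network.

    Sequential illustration only; on a GPU each inner-loop iteration of a
    phase could be handled by a separate thread.
    """
    a = list(values)
    n = len(a)
    # n phases are sufficient to sort n elements.
    for phase in range(n):
        # Even phases compare pairs (0,1), (2,3), ...;
        # odd phases compare pairs (1,2), (3,4), ...
        start = phase % 2
        for i in range(start, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a
```

The network performs a fixed sequence of comparisons regardless of the input, so its work is O(n^2) overall; the appeal for parallel hardware is the O(n) depth.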

A GPU IMPLEMENTATION OF THE FINITE-DIFFERENCE TIME-DOMAIN METHOD
free download

Advisor: Professor Marc Christensen Master of Science degree conferred August 1, 2016 Thesis completed August 1, 2016 Traditionally, optical circuit design is tested and validated using software which implement numerical modeling techniques such as Beam

GPU Parallel Implementation of Numerical Distribution Functions for Seasonal Unit Root Tests
free download

Abstract This paper describes a parallel implementation of multiple linear regression models that are run on a general-purpose Graphics Processing Unit. Seasonal unit root test statistics are obtained from each fitted regression model. The code has two main applications:

GPU ACCELERATING SUPER-RESOLUTION FOR CONVERTING HD TO 4K
free download

Background Audiences expect high-resolution videos/images to enjoy a high-quality visual experience. Currently, video content providers do not have much true 4K video content. In contrast, there is a large amount of videos/films at 1080p resolution on the market. Thus

User-Defined Drag Models on the GPU
free download

Conclusions Adding the capability for user-defined functions to be evaluated at runtime on the GPU is feasible. Although performance degrades with function complexity, overall, the performance is useful for providing flexibility in a commercial software product. Effort has

A 3D reconstruction from real-time stereoscopic images using GPU
free download

Abstract: In this article we propose a new technique to obtain a three-dimensional (3D) reconstruction from stereoscopic images taken by a stereoscopic system in real-time. To parallelize the 3D reconstruction we propose a method that uses a Graphics Processors

GPU-Based Fast Signal Processing for Large Amounts of Snore Sound Data
free download

Abstract: Snore sound (SnS) data has been demonstrated to carry very important information for diagnosis and evaluation of sleep related breathing disorders with high prevalence, such as Primary Snoring and Obstructive Sleep Apnea (OSA), a serious

GPU Acceleration of non-iterative and iterative algorithms in Fluorescence Lifetime Imaging Microscopy
free download

1. Summary Graphics Processing Unit (GPU) enhanced Fluorescence Lifetime Imaging Microscopy (FLIM) algorithms are presented, and their results are compared with the latest research results. The GPU based approaches are suitable for highly parallelized sensor

Finite Element Modelling of a Novel Cutting Deformation Mode of AA6061-T6 Tubes Employing Higher Order Element Formulations and GPU Computing
free download

Abstract Axial cutting of lightweight metal extrusions is a promising mechanism for impact energy absorption due to the combination of high force efficiency, specific energy absorption, and rate insensitivity. A wide range of force-deflection responses are possible

GVDB: Raytracing Sparse Voxel Database Structures on the GPU
free download

Abstract Simulation and rendering of sparse volumetric data have different constraints and solutions depending on the application area. Generating precise simulations and understanding very large data are problems in scientific visualization, whereas convincing

Iris Recognition for Secured Internet Banking Using CUDA on GPU
free download

Abstract: Iris recognition continues to become one of the emerging methods for biometric-based identification systems as the need for security systems keeps increasing day by day. With a few modifications, this project explains the iris recognition systems developed by John

Parallel Monte-Carlo Simulations on GPU and Xeon Phi for Stratospheric Balloon Envelope Drift Descent Analysis
free download

Abstract: A performance evaluation of parallel Monte-Carlo simulations on GPU and MIC is presented and the application to stratospheric balloon envelope drift descent is considered. The experiments show that GPU and MIC permit one to decrease computing time by a

GpSense: A GPU-friendly method for common-sense subgraph matching in massively parallel architectures
free download

Abstract. In the context of common-sense reasoning, spreading activation is used to select relevant concepts in a graph of common-sense knowledge. When such a graph starts growing, however, the number of relevant concepts selected during spreading activation

Literature Survey on GPU Accelerated Circuit Simulation
free download

Abstract: Analysis, testing and validation of electronic circuits are crucial in the electronics and embedded systems industries. Instead of actual hardware testing, simulation software is used for this purpose. Very large circuit design like VLSI testing, affects speed and

Efficient CSR-Based Sparse Matrix-Vector Multiplication on GPU
free download

Abstract Sparse matrix-vector multiplication (SpMV) is an important operation in computational science, and needs to be accelerated because it often represents the dominant cost in many widely-used iterative methods and eigenvalue problems. We achieve this
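
As background for the CSR-based SpMV named in this entry, here is a minimal sequential reference sketch in Python. It shows only the standard compressed sparse row layout and the row-wise product, not the paper's GPU optimization; the function name `csr_spmv` and the argument names `values`, `col_idx`, and `row_ptr` are assumptions following common convention.

```python
def csr_spmv(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    values  : nonzero entries of A, row by row
    col_idx : column index of each entry in values
    row_ptr : row_ptr[r] .. row_ptr[r + 1] delimits row r in values
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        # Accumulate the dot product of row `row` with x.
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# Example: A = [[1, 0, 2],
#               [0, 3, 0],
#               [4, 0, 5]]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
result = csr_spmv(values, col_idx, row_ptr, [1.0, 2.0, 3.0])
```

Each row is independent, so a simple GPU mapping assigns one thread (or one warp) per row; rows with very different lengths then cause load imbalance, which is one reason CSR SpMV on GPUs is an active optimization target.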

Computer Vision on the GPU–Tools, Algorithms and Frameworks
free download

Abstract: In recent years, graphics processing units (GPUs) have emerged as an attractive alternative to CPUs for implementing algorithms in a wide range of applications. The focus of this work is to give an overview of the current state of using GPUs for computer vision.

Force Directed Placement: GPU Implementation
free download

Abstract: Graph layout has had important applications in many areas of computer science. When dealing with machine-generated data, we often want to view the data to better understand its structural form. Many such data can be represented in the form of

LARGE-SCALE FREE-SURFACE FLOW SIMULATION USING LATTICE BOLTZMANN METHOD ON MULTI-GPU CLUSTERS
free download

Abstract. Turbulent free-surface flows around ships strongly affect maneuverability and safety. In order to understand the details of the turbulent flow and surface deformation, it is necessary to carry out high-order accurate and large-scale CFD simulations. We have

A Survey on Parallel Processing in a CPU-GPU Collaborative Environment Using Ant Colony Optimization, Artificial Neural Networks Genetic Algorithm
free download

Abstract: The purpose of this paper is to present a survey of various papers that show various aspects of ACO, ANN, GA and their respective strategies that can be applied in a CPU-GPU collaborative environment making use of the concept of parallel processing. The

GPU-Accelerated Real-Time Mesh Simplification Using Parallel Half Edge Collapses
free download

Abstract. Mesh simplification is often used to create an approximation of a model that requires less processing time. We present the results of our approach to simplification, the parallel half edge collapse. Based on the half edge collapse that replaces an edge with

Synchronization Trade-offs in GPU implementations of Graph Algorithms
free download

Abstract: Although there is an extensive literature on GPU implementations of graph algorithms, we do not yet have a clear understanding of how implementation choices impact performance. As a step towards this goal, we studied how the choice of synchronization

Parametric Multi-Step Scheme for GPU-Accelerated Graph Decomposition into Strongly Connected Components
free download

Abstract. The problem of decomposing a directed graph into strongly connected components (SCCs) is a fundamental graph problem that is inherently present in many scientific and commercial applications. Clearly, there is a strong need for good high-performance, e.g.,

Parallel Processing of Ray Tracing on GPU with Dynamic Pipelining
free download

Abstract This article describes the technologies of parallel processing of ray tracing using a central processing unit (CPU) and a graphics processing unit (GPU). One problem of parallel processing of ray tracing is the imbalance among the pixel computations, which leads

Implementation of the Particle Mesh Ewald method on a GPU
free download

Abstract The Particle Mesh Ewald (PME) method is used for efficient long-range electrostatic calculations in molecular dynamics (MD). In this project, PME is implemented for a single GPU alongside the existing CPU implementation, using the code base of an open source

Automated Verification of Functional Correctness of Race-Free GPU Programs
free download

Abstract. We study an automated verification method for functional correctness of parallel programs running on GPUs. Our method is based on Kojima and Igarashi's Hoare logic for GPU programs. Our algorithm generates verification conditions (VCs) from a program

A Two-layered Parallel Static Security Assessment for Large-Scale Grids Based on GPU
free download

Abstract: The development of the smart grid and the increasing scale of power systems bring more and more pressure to conventional power system simulators. The graphic processing unit which features the massive concurrent threads and excellent floating point

Transform-Based Channel-Data Compression to Improve the Performance of a Real-Time GPU-Based Software Beamformer
free download

Abstract: Graphics processing unit (GPU)-based software beamforming has advantages over hardware-based beamforming of easier programmability and a faster design cycle since complicated imaging algorithms can be efficiently programmed and modified.

Accelerating Rabin Karp algorithm on a multicore GPU using CUDA
free download

Abstract String Matching algorithms are responsible for finding occurrences of a pattern within a large text. Many areas of Computer Science require demanding string-matching procedures. Increasing the efficiency of string matching algorithm will automatically

Using GPU for Interactive Ray Casting 3D Models Based on Perturbation Functions
free download

Abstract: This paper deals with the interactive ray casting of high-quality images, a method of defining free forms without approximating them with polygons or patches, issues of using perturbation functions for animation of the surfaces of 3D objects. A method for visualizing

GPU-Based Monte-Carlo Simulation for a Sea Ice Load Application
free download

ABSTRACT High Performance Computing (HPC) has recently been considerably improved; for instance, General Purpose computation on Graphics Processing Units (GPGPU) has been developed to accelerate parallel computing by using hundreds of cores

Characterizing Performance and Power towards Efficient Synchronization of GPU Kernels
free download

Abstract: The lack of support for explicit synchronization in GPUs between the streaming multiprocessors (SMs) adversely impacts the ability of GPUs to efficiently perform inter-block communication. In this paper, we present several

Executing Database Query on Modern CPU-GPU Heterogeneous Architecture
free download

Abstract Graphics processors (GPUs) have emerged as powerful co-processors for general-purpose computation. Compared with commodity CPUs, GPUs have an order of magnitude higher computation power as well as memory bandwidth. The execution time of database

GPU accelerated spectral finite elements on all-hex meshes
free download

Recent research efforts [1] have led to the development of 3D hex-dominant mesh generation systems that are fast and reliable. It is now possible (e.g. with Gmsh [2]) to create meshes of general 3D domains that contain over 80% of hexahedra in volume in a fully