GPU-Graphics Processing Unit IEEE PAPER 2017
APUNet: Revitalizing GPU as Packet Processing Accelerator.
free download
Abstract Many research works have recently experimented with GPU to accelerate packet processing in network applications. Most works have shown that GPU brings a significant performance boost when it is compared to the CPU-only approach, thanks to its highly-
Compiler techniques to reduce the synchronization overhead of gpu redundant multithreading
free download
ABSTRACT Redundant Multi-Threading (RMT) provides a potentially low cost mechanism to increase GPU reliability by replicating computation at the thread level. Prior work has shown that RMT's high performance overhead stems not only from executing redundant threads,
GPU Taint Tracking
free download
Without address space layout randomization, an attacker can predict where GPU data is stored. [Patterson, ISU thesis 2013] Without process isolation, an attacker can peek into another GPU process, steal encryption keys. [Pietro+, TECS 2016] Without page protection
Gravel: Fine-Grain GPU-Initiated Network Messages
free download
ABSTRACT Distributed systems incorporate GPUs because they provide massive parallelism in an energy-efficient manner. Unfortunately, existing programming models make it difficult to route a GPU-initiated network message. The traditional coprocessor model
Analyzing memory management methods on integrated CPU-GPU systems
free download
Abstract Heterogeneous systems that integrate a multicore CPU and a GPU on the same die are ubiquitous. On these systems, both the CPU and GPU share the same physical memory as opposed to using separate memory dies. Although integration eliminates the need to
GPU Multisplit: an extended study of a parallel algorithm
free download
This paper is an extended version of initial results published at PPoPP 2016 [3]. The source code is available at https://github.com/owensgroup/GpuMultisplit.
Moving Object Detection by Connected Component Labeling of Point Cloud Registration Outliers on the GPU.
free download
Abstract: Using a depth camera, the KinectFusion with Moving Objects Tracking (KinFu MOT) algorithm permits tracking the camera poses and building a dense 3D reconstruction of the environment which can also contain moving objects. The GPU processing pipeline
A GPU deep learning metaheuristic based model for time series forecasting
free download
Abstract As the new generation of smart sensors is evolving towards high sampling acquisitions systems, the amount of information to be handled by learning algorithms has been increasing. The Graphics Processing Unit (GPU) architecture provides a greener
Computational Power Optimization of 3D Virtual Audio Techniques on DSP and GPU Platforms
free download
3D virtual audio techniques are used in many spatial audio applications such as home theater entertainment, gaming, teleconference and remote control. With binaural loudspeakers, these techniques are able to offer virtual surround sound effects to the listener
GPU Based Face Recognition System for Authentication
free download
ABSTRACT-Face has significant role in identifying a person for authentication purpose in public places such as airport security. Face recognition has many real world applications including surveillance and authentication. Due to complex and multidimensional structure of
Computing delaunay refinement using the GPU.
free download
Abstract We propose the first working GPU algorithm for the 2D Delaunay refinement problem. Our algorithm adds Steiner points to an input planar straight line graph (PSLG) to generate a constrained Delaunay mesh with triangles having no angle smaller than an input
Statistical Pattern Based Modeling of GPU Memory Access Streams
free download
ABSTRACT Recent research studies have shown that modern GPU performance is often limited by the memory system performance. Optimizing memory hierarchy performance requires GPU designers to draw design insights based on the cache and memory behavior of
Energy Efficient Real-time Task Scheduling on CPU-GPU Hybrid Clusters
free download
Abstract Conserving the energy consumption of large data centers is of critical significance, where a few percent in consumption reduction translates into millions-dollar savings. This work studies energy conservation on emerging CPU-GPU hybrid clusters
Achieving Portable Performance for GTC-P with OpenACC on GPU, multi-core CPU, and Sunway Many-core Processor
free download
GPU Parallel Program for the Bin Packing Problem
free download
The purpose of this paper is to explore the use of GPU computing for solving the famous bin packing problem. Specifically, a massively parallel seesaw search program was constructed using Nvidia's CUDA API and the Parallel Java 2 Library. The speed and quality of the
High-Throughput Subset Matching on Commodity GPU-Based Systems
free download
Abstract Large-scale information processing often relies on subset matching for data classification and routing. Examples are publish/subscribe and stream processing systems, database systems, social media, and information-centric networking. For instance, an
Strategies for Regular Segmented Reductions on GPU
free download
Abstract We present and evaluate an implementation technique for regular segmented reductions on GPUs. Existing techniques tend to be either consistent in performance but relatively inefficient in absolute terms, or optimised for specific workloads and thereby
Abstract In this contribution, an advanced numerical regression approach based on graphics processing unit (GPU) is introduced. The approach has been applied for real-time terahertz thickness measurements of individual layers within multi-layered structures for a
Parallel continuous collision detection for high-performance GPU cluster.
free download
Abstract Continuous collision detection (CCD) is a process to interpolate the trajectory of polygons and detect collisions between successive time steps. However, primitive-level CCD is a very time-consuming process especially for a large number of moving polygons.
Response-Time Bounds for Concurrent GPU Scheduling
free download
Abstract Graphics processing units (GPUs) have been receiving increasing attention in the real-time systems community as a potential solution for hosting workloads like those found in autonomous-driving use cases that require significant computational capacity. Allowing
Mini-Gunrock: A Lightweight Graph Analytics Framework on the GPU
free download
Abstract: Existing GPU graph analytics frameworks are typically built from specialized, bottom-up implementations of graph operators that are customized to graph computation. In this work we describe Mini-Gunrock, a lightweight graph analytics framework on the GPU.
CPU and GPU Behaviour Modelling Versus Sequential and Parallel Bias Field Correction Fuzzy C-means Algorithm Implementations
free download
Abstract The correction of images corrupted by bias field artefact is still a challenging task, both in accuracy and in computational cost. The work in this paper focuses on the second constraint by giving mathematical models of experimental execution time per iteration ETPI
Corolla: GPU-Accelerated FPGA Routing Based on Subgraph Dynamic Expansion.
free download
Slides by Minghua Shen and Guojie Luo, Peking University (FPGA, February 23, 2017). Sections include Motivation, Background, Search Space Reduction for Routing, and Dynamic Parallelism.
MCM-GPU: Multi-Chip-Module GPUs for Continued Performance Scalability
free download
ABSTRACT Historically, improvements in GPU-based high performance computing have been tightly coupled to transistor scaling. As Moore's law slows down, and the number of transistors per die no longer grows at historical rates, the performance curve of single
gNUFFTW: Auto-Tuning for High-Performance GPU-Accelerated Non-Uniform Fast Fourier Transforms
free download
Abstract Non-uniform sampling of the Fourier transform appears in many important applications such as magnetic resonance imaging (MRI), optics, tomography and radio interferometry. Computing the inverse often requires fast application of the non-uniform
Cooperative kernels: GPU multitasking for blocking algorithms
free download
ABSTRACT There is growing interest in accelerating irregular data-parallel algorithms on GPUs. These algorithms are typically blocking, so they require fair scheduling. But GPU programming models (e.g., OpenCL) do not mandate fair scheduling, and GPU schedulers
Simulations of Coherent Synchrotron Radiation on Parallel Hybrid GPU/CPU Platform
free download
Abstract Coherent synchrotron radiation (CSR) is an effect of self-interaction of an electron bunch as it traverses a curved path. It can cause a significant emittance degradation, as well as fragmentation and microbunching. Numerical simulations of the 2D/3D CSR effects have
Continuous and discrete models of melanoma progression simulated in multi-GPU environment
free download
Abstract. Existing computational models of cancer evolution mostly represent very general approaches for studying tumor dynamics in a homogeneous tissue. Here we present two very different models, continuous and discrete ones, of a specific cancer type, melanoma
Towards Composable GPU Programming: Programming GPUs with Eager Actions and Lazy Views.
free download
Abstract In this paper, we advocate a composable approach to programming systems with Graphics Processing Units (GPU): programs are developed as compositions of generic, reusable patterns. Current GPU programming approaches either rely on low-level,
Software Puzzle for GPU Inflated DoS Attack
free download
Abstract: Denial-of-service (DoS) and distributed DoS (DDoS) are among the major threats to cyber-security, and client puzzle, which demands a client to perform computationally expensive operations before being granted services from a server, is a well-known
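The client-puzzle idea mentioned in this abstract can be illustrated with a minimal sketch. It assumes a generic hash-based puzzle (find a nonce whose SHA-256 digest has a given number of leading zero bits); this is not the software puzzle scheme of the paper itself, and the names `leading_zero_bits`, `solve`, and `verify` are hypothetical.

```python
import hashlib


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the front of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def puzzle_digest(challenge: bytes, nonce: int) -> bytes:
    """Hash the server challenge together with the client's nonce."""
    return hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()


def solve(challenge: bytes, difficulty: int) -> int:
    """Client side: brute-force the smallest nonce that meets the difficulty.

    Expected work grows as 2**difficulty hash evaluations.
    """
    nonce = 0
    while leading_zero_bits(puzzle_digest(challenge, nonce)) < difficulty:
        nonce += 1
    return nonce


def verify(challenge: bytes, nonce: int, difficulty: int) -> bool:
    """Server side: a single hash evaluation checks the client's answer."""
    return leading_zero_bits(puzzle_digest(challenge, nonce)) >= difficulty
```

The asymmetry is the point of the design: the client pays many hash evaluations while the server pays one. A plain hash puzzle like this one parallelizes well, which is the kind of weakness a GPU-equipped attacker could exploit.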
Modular array-based GPU computing in a dynamically-typed language
free download
Abstract Nowadays, GPU accelerators are widely used in areas with large data-parallel computations such as scientific computations or neural networks. Programmers can either write code in low-level CUDA/OpenCL code or use a GPU extension for a high-level
Just-In-Time GPU Compilation for Interpreted Languages with Partial Evaluation.
free download
Abstract Computer systems are increasingly featuring powerful parallel devices with the advent of many-core CPUs and GPUs. This offers the opportunity to solve computationally- intensive problems at a fraction of the time traditional CPUs need. However, exploiting
Towards Efficient Graph Traversal using a Multi-GPU Cluster
free download
Abstract Graph processing has always been a challenge, as there are inherent complexities in it. These include scalability to larger data sets and clusters, dependencies between vertices in the graph, irregular memory accesses during processing and traversals,
Detecting Bank Conflict of GPU Programs Using Symbolic Execution: Case Study
free download
Abstract GPU (Graphics Processing Unit) is used in various areas. Therefore, the demand for the verification of GPU programs is increasing. In this paper, we suggest the method to detect bank conflict by using symbolic execution. Bank conflict is one of the bugs happening
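As a concrete illustration of what a shared-memory bank conflict is, the following minimal sketch counts how many distinct words a group of threads requests from the same bank. It works on concrete addresses rather than symbolic execution, so it is not the paper's method. The 32 banks and 4-byte bank width are assumptions matching common NVIDIA configurations, and `conflict_degree` is a hypothetical helper.

```python
from collections import defaultdict


def conflict_degree(byte_addresses, num_banks=32, word_bytes=4):
    """Return the worst-case number of serialized accesses to one bank.

    A result of 1 means conflict-free. Threads that read the same word
    are treated as a broadcast and do not count as a conflict.
    """
    words_per_bank = defaultdict(set)
    for addr in byte_addresses:
        word = addr // word_bytes                 # which 4-byte word is touched
        words_per_bank[word % num_banks].add(word)  # bank that holds this word
    return max((len(words) for words in words_per_bank.values()), default=0)
```

For example, 32 threads reading consecutive 4-byte words give a degree of 1, while a stride of 32 words maps every thread to bank 0 and gives a degree of 32.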
GPU-GIST: a case of generalized database indexing on modern hardware
free download
Abstract: A lot of different indexes have been developed for accelerating search operations on large data sets. Search trees, representing the most prominent class, are ubiquitous in database management systems but are also widely used in non-DBMS applications. An
A GPU Variant of Mbtrack and Its Application in SLS-2
free download
Abstract Mbtrack is a widely used multi-bunch tracking code for modeling collective instabilities in electron storage rings. It has been applied to the Swiss Light Source upgrade proposal (SLS-2) for the study of single bunch instabilities. However, an n-bunch simulation
Visual Analytics of Millions of GPU Threads
free download
Abstract Although the GPGPU has been widely used in various fields for algorithms acceleration, it is notorious for its programming difficulties because of many different concepts from general CPU programming and issues from the huge number of concurrent
Study of Parallel Image Processing with the Implementation of vHGW Algorithm using CUDA on NVIDIA's GPU Framework
free download
Abstract-This paper provides an effective study of the implementation of parallel image processing techniques using CUDA on NVIDIA GPU framework. It also discusses about the major requirements of parallelism in medical image processing techniques. Additional
Evaluation Of The Performance Of GPU Global Memory Coalescing
free download
Abstract Nowadays, GPU is widely used for graphics and general-purpose parallel computations. In the GPU software development, memory coalescing is one of the most important optimization techniques, which reduces the number of memory transactions. In this
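The effect of coalescing on the number of memory transactions can be sketched with a simplified model: every aligned segment touched by a group of threads costs one transaction. The 128-byte segment size is an assumption (real transaction sizes depend on the GPU architecture), each access is assumed to fit within one segment, and `transactions` is a hypothetical helper.

```python
def transactions(byte_addresses, segment_bytes=128):
    """Count the distinct aligned segments touched by a set of accesses."""
    return len({addr // segment_bytes for addr in byte_addresses})


# 32 threads, each loading one 4-byte element.
coalesced = [4 * tid for tid in range(32)]         # consecutive elements
strided = [128 * tid for tid in range(32)]         # one element per segment
misaligned = [64 + 4 * tid for tid in range(32)]   # consecutive, offset start
```

Under this model the consecutive pattern needs one transaction, the strided pattern needs 32, and the misaligned pattern straddles a segment boundary and needs two.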
GPU-accelerated Video Transcoding Unit for Multi-access Edge Computing Scenarios
free download
Abstract The exponential growth of video traffic and the outburst of novel video-based services is revealing the inadequacy of the traditional mobile network infrastructure. To respond to this and to many other demands coming from today's society, the 5G and the
Large Integer Arithmetic in GPU for Cryptography
free download
ABSTRACT Most computers nowadays support 32-bit or 64-bit data types in various programming languages, and these are sufficient for most use cases. However, in cryptography, the required range and precision are more than 64 bits which are
The Raspberry Pi was created to meet a need to help younger people become involved in the IT field. As a low-cost computer, it can be used, experimented with, broken, and replaced. Initially expected to sell perhaps a few thousand, it has now sold more than 10
GPU-Centered Font Rendering Directly from Glyph Outlines
free download
Abstract This paper describes a method for rendering antialiased text directly from glyph outline data on the GPU without the use of any precomputed texture images or distance fields. This capability is valuable for text displayed inside a 3D scene because, in addition to
ANALYSIS OF RAY BATCHING ON THE GPU
free download
Abstract Due to the large amount of scene data in production renderers that use Monte Carlo techniques, efficient and fast ray tracing means batching up rays in some arbitrary amount. Hyperion, Disney's renderer, pools 33 million rays for any scene to be rendered, and this
Real-time 3D integral imaging system using a faster elemental image generation method using GPU parallel processing
free download
A novel method of faster computation of Elemental Image generation for real time integral imaging 3D display system, with the implementation of GPU parallel processing is proposed. Previous experiments were conducted to generate Real Time Integral Image and resulting
GPU Parallelization of Back-Propagation Neural Network
free download
Abstract: Graphics Processing Unit (GPU) can provide remarkable performance gains when compared to Central Processing Unit (CPU) for computational intensive application. GPU has acquired programmability to perform general purpose computation fast by running ten
INTELLIGENT SCHEDULING FOR SIMULTANEOUS CPU-GPU APPLICATIONS
free download
ABSTRACT Heterogeneous computing systems with both general purpose multicore central processing units (CPU) and specialized accelerators has emerged recently. Graphics processing unit (GPU) is the most widely used accelerator. To fully utilize such a
Accelerating GPU Hardware Transactional Memory with Snapshot Isolation
free download
ABSTRACT Snapshot Isolation (SI) is an established model in the database community, which permits write-read conflicts to pass and aborts transactions only on write-write conflicts. With the Write Skew anomaly correctly eliminated, SI can reduce the occurrence of
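The commit rule described in this abstract (reads never block or abort, and a transaction aborts only on a write-write conflict) can be sketched in software as follows. This is a minimal single-threaded model under a first-committer-wins assumption, not the paper's hardware design, and it does not address the Write Skew anomaly. The class names are hypothetical.

```python
class SnapshotStore:
    """Versioned key-value store: key -> (value, commit timestamp)."""

    def __init__(self):
        self.data = {}
        self.clock = 0

    def begin(self):
        return Transaction(self)


class Transaction:
    def __init__(self, store):
        self.store = store
        self.start_ts = store.clock
        # Private snapshot of the state committed when the transaction began.
        self.snapshot = {k: v for k, (v, _) in store.data.items()}
        self.writes = {}

    def read(self, key):
        # Reads see the transaction's own writes, then its snapshot.
        if key in self.writes:
            return self.writes[key]
        return self.snapshot.get(key)

    def write(self, key, value):
        self.writes[key] = value  # buffered until commit

    def commit(self):
        # Abort only if another transaction committed a write to one of
        # our written keys after our snapshot was taken.
        for key in self.writes:
            entry = self.store.data.get(key)
            if entry is not None and entry[1] > self.start_ts:
                return False
        self.store.clock += 1
        for key, value in self.writes.items():
            self.store.data[key] = (value, self.store.clock)
        return True
```

In this model, a transaction that read `x` still commits after a concurrent transaction overwrites `x`, because only overlapping write sets are checked at commit time.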
Inferring Scheduling Policies of an Embedded CUDA GPU
free download
Abstract Embedded systems augmented with graphics processing units (GPUs) are seeing increased use in safety-critical real-time systems such as autonomous vehicles. Due to monetary cost requirements along with size, weight, and power (SWaP) constraints,
Evaluating a CPU/GPU Implementation for Real-Time Ray Tracing
free download
Abstract Animated movies, CGI, and video games are commonplace in every day life. Virtual- and Augmented Reality are becoming more pervasive in society, and with them, the role of Computer Graphics becomes even more important. Part of creating these experiences is
Parallel Execution Optimization of GPU-aware Components in Embedded Systems
free download
Abstract Many embedded systems process huge amount of data that comes from the interaction with the environment. The Graphics Processing Unit (GPU) is a modern embedded solution that tackles the efficiency challenge when processing a lot of data. GPU
A unified GPU-CPU aeroelastic compressible URANS solver for aeronautical, turbomachinery and open rotors applications
free download
English abstract: For the aerodynamic design of aeronautical components Computational Fluid Dynamics (CFD) plays a fundamental role. Pure CFD analyses are usually sufficiently accurate for a wide range of problems. However, when the deformability of the structure
A Survey of Power Consumption Modeling for GPU Architecture
free download
Abstract: GPUs are of increasing interests in the multi-core era due to their high computing power. However, the power consumption caused by the rising performance of GPUs has been a general concern. As a consequence, it is becoming an imperative demand to
GPU-Based Acceleration for 3D OCT Imaging
free download
ABSTRACT We designed a graphics processing unit (GPU)-based acceleration to reconstruct the optical coherence tomography (OCT) images as sub-micrometer resolution with the spectral domain OCT (SD-OCT) system. GPU-based acceleration is the use of
RLAGPU: High-performance Out-of-Core Randomized Singular Value Decomposition on GPU
free download
Randomized Singular Value Decomposition (SVD)[1] is gaining attention in finding structure in scientific data. However, processing large-scale data is not easy due to the limited capacity of GPU memory. To deal with this issue, we propose RLAGPU, an out-of-core
Development of GPU-based fast reconstruction algorithm for Gamma ray imaging with insufficient conditions
free download
The purpose of this study is to develop a graphic processing unit (GPU)-based fast reconstruction algorithm for nuclear medicine image under insufficient conditions, and verification of the developed algorithm is carried out to achieve the purpose. Simple-pattern
GPU Scripting using PyCUDA
free download
Slides by Kushagra Trivedi. Contents: Content and the Learning Process; Introduction of PyCUDA. Introduction: PyCUDA is a package available for Python that makes use of the power of CUDA-compatible GPU processors.
Exposing Hidden Performance Opportunities in High Performance GPU Applications
free download
Abstract The emergence of leadership class systems with nodes containing many-core accelerators, such as GPUs, has the potential to vastly increase the performance of distributed applications. Exploiting the additional parallelism that manycore accelerators
Efficient Semantic Search over Structured Web Data: A GPU Approach
free download
Abstract. Semantic search is an advanced topic in information retrieval which has attracted increasing attention in recent years. The growing availability of structured semantic data offers opportunities for semantic search engines, which can support more expressive
GPU-Accelerated SVM Training Algorithm Based on PC and Mobile Device
free download
(Support Vector Machine) which is suitable for Android operating system. SVM is widely used in the health-related applications. The SVM provides a potential classification technology based on the pattern recognition method and statistical learning theory. This
GPU accelerated atmospheric chemical kinetics in the ECHAM/MESSy (EMAC) Earth system model (version 2.52)
free download
Abstract. This paper presents an application of GPU accelerators in Earth system modelling. We focus on atmospheric chemical kinetics, one of the most computationally intensive tasks in climate-chemistry model simulations. We developed a software package that
Patch-Based Recursive Catmull-Clark Subdivision on the GPU
free download
Abstract Catmull-Clark subdivision is an algorithm that takes a coarse mesh of a 3D model as input and outputs a smooth mesh. It has many different applications from level of detail rendering to feature film production. Starting from the coarse control mesh a series of
DeepSpotCloud: Leveraging Cross-Region GPU Spot Instances for Deep Learning
free download
Abstract Cloud computing resources that are equipped with GPU devices are widely used for applications that require extensive parallelism, such as deep learning. When the demand of cloud computing instance is low, the surplus of resources is provided at a lower price in
Enabling Asynchronous Coupled Data Intensive Analysis Workflows on GPU-accelerated Platforms via Data Staging
free download
ABSTRACT Enabled by the advanced network techniques as InfiniBand and RDMA, data staging and in-situ/in-transit techniques are emerging as an attractive approach for large scale data intensive workflows. At the same time, accelerator based heterogeneous platforms
NUFFT: Fast Auto-Tuned GPU-Based Library
free download
Synopsis We present a fast auto-tuned library for computing non-uniform fast Fourier Transform (NUFFT) on GPU. The library includes forward and adjoint NUFFT using precomputation-free and fully-precomputed methods, as well as Toeplitz-based operation
Technical report: Crane-Fast and Migratable GPU Passthrough for OpenCL applications
free download
ABSTRACT General purpose GPU (GPGPU) computing in virtualized environments leverages PCI passthrough to achieve GPU performance comparable to bare-metal execution. However, GPU passthrough prevents service administrators from performing
Abstract The increasing need for computing power today justifies the continuous search for techniques that decrease the time to answer usual computational problems. To take advantage of new hybrid parallel architectures composed by multithreading and
Dynamic performance prediction for chunk-wise parallelization on heterogeneous CPU/GPU systems
free download
Abstract-Many aspects of heterogeneity in multicores such as performance variation may affect the overall execution time and cores efficiency. An effective mapping should support this variation. A complex challenge is cores load balancing to minimize the program
Accelerate Local Tone Mapping for High Dynamic Range Images Using OpenCL with GPU
free download
Abstract--Tone mapping has been used to transfer HDR (high dynamic range) images to low dynamic range. This paper describes an algorithm to display high dynamic range images. Although local tone-mapping operator is better than global operator in reproducing images
A modular GPU raytracer using OpenCL for non-interactive graphics
free download
ABSTRACT We describe the development of a modular plugin based raytracer renderer called RenderGirl suitable for running inside the OpenCL framework. We aim to take advantage of heterogeneous computing devices such as GPUs and many-core CPUs,
CUDA Optimized dynamic programming search for automatic speech recognition on a GPU platform
free download
Abstract-In a typical recognition process, there are substantial parallelization challenges in concurrently assessing thousands of alternative interpretations of a speech utterance to find the most probable interpretation. During this process, input signals are converted into
GPU accelerated investigation of a dual-frequency driven nonlinear oscillator
free download
Summary. The bifurcation structure of a dual-frequency driven, second order nonlinear oscillator (Keller-Miksis equation) is investigated by exploiting the high computational resources of professional GPUs. The numerical scheme of the applied initial value problem
GPU Based Text Analytics
free download
This is the documentation for the project "GPU Based Text Analytics" of the Webis group. In this project we installed, configured and tested a new deep learning cluster for the group. The second part was to use the new cluster with deep learning software to get familiar with
GPU Scheduling on the NVIDIA TX2: Hidden Details Revealed
free download
Abstract The push towards fielding autonomous-driving capabilities in vehicles is happening at breakneck speed. Semi-autonomous features are becoming increasingly common, and fully autonomous vehicles are optimistically forecast to be widely available in just a few
CUDA compatible GPU as an efficient hardware accelerator for Automatic Subtitle Generation
free download
Abstract: As avid audiences, we always face the need to find the right subtitle file for a particular video or audio file and these subtitles can be very helpful for deaf and hearing impaired persons as it allows them to perceive acoustic information in an alternative way.
GPU computations and memory access model based on Petri nets
free download
Abstract. In modern systems CPUs as well as GPUs are equipped with multi-level memory architectures, where different levels of the hierarchy vary in latency and capacity. Therefore, various memory access models were studied. Such a model can be seen as an interface
Reducing GPU Address Translation Overhead with Virtual Caching
free download
ABSTRACT Heterogeneous computing on tightly-integrated CPU-GPU systems is ubiquitous, and to increase programmability, many of these systems support virtual address accesses from GPU hardware. However, there is no free lunch. Supporting virtual memory
Computation of Synchrotron Radiation on Arbitrary Geometries in 3D with Modern GPU, Multi-Core, and Grid Computing
free download
Abstract Open Source Code for Advanced Radiation Simulation (OSCARS) is an open source project developed at Brookhaven National Laboratory for the computation of synchrotron radiation from arbitrary particle beams in arbitrary magnetic (and electric) fields
REGION GROWING IMAGE SEGMENTATION ON LARGE DATASETS USING GPU
free download
ABSTRACT Image segmentation is an important image processing, and it seems everywhere if we want to analyze what inside the image. There are varieties of applications of image segmentation such as the field of filtering noise from image, medical imaging, and
GPU Simulations of Violent Flows with Smooth Particle Hydrodynamics (SPH) Method
free download
Abstract Graphics processing unit (GPU) accelerated supercomputers have proved to be very powerful and energy effective for accelerating compute-intensive applications and have become the new standard for high performance computing (HPC) and a critical ingredient in