Swizzle Inventor: Data Movement Synthesis for GPU Kernels
Utilizing memory and register bandwidth in modern architectures may require swizzles (non-trivial mappings of data and computations onto hardware resources) such as shuffles. We develop Swizzle Inventor to help programmers implement swizzle programs, by writing

GPU Accelerated Computation of Isotropic Chemical Shifts Offers New Dimension of Structure Refinement in Largescale Molecular Dynamics Simulation
Remaining components of the predictive model saw similar individual speedups through profiling. Implementing the newly optimized functional core into NAMD is NMRForces, a

Visbrain: A multi-purpose GPU-accelerated open-source suite for multimodal brain data visualization
We present Visbrain, a Python open-source package that offers a comprehensive visualization suite for neuroimaging and electrophysiological brain data. Visbrain consists of two levels of abstraction: (1) objects which represent highly configurable neuro-oriented

Scatter Correction for Industrial Cone-Beam Computed Tomography (CBCT) Using 3D VSHARP, a fast GPU-Based Linear Boltzmann Transport Equation
David Nisius (1). (1) Varex Imaging Corporation, (2) Varian Medical Systems. February 13, 2019. Keywords: Cone-Beam Computed Tomography (CBCT), Scatter Correction, Finite-Element Boltzmann Transport Equation Solver.

Architecting Waferscale Processors - A GPU Case Study
Increasing communication overheads are already threatening computer system scaling. One approach to dramatically reduce communication overheads is waferscale processing. However, waferscale processors have been historically deemed impractical due to

Performant Anomaly Based Network Intrusion Detection Systems Using GPU
With increasing connectivity and exposure to the internet, networks have been exposed to attack classes such as probe, denial of service, root to local and user to root. Network Intrusion Detection Systems (NIDSs) alongside firewalls are being widely used to identify and minimize

Cutter Engagement Feature Extraction Using Triple-Dexel Representation Workpiece Model and GPU Parallel Processing Function
For an accurate analysis of the cutting force, the cutter engagement feature (CEF) representing the contact area between the cutter and workpiece must be extracted for each small feed motion of the cutter. We previously proposed a method for accelerating the CEF

EraseMe: A Defense Mechanism against Information Leakage exploiting GPU Memory
Graphics Processing Units (GPUs) play a major role in speeding up computational tasks of the users, especially in applications such as high volume text and image processing. Recent works have demonstrated the security problems associated with

GPU-accelerated fixpoint algorithms for faster compiler analyses
Inter-procedural data-flow analyses are slow. We parallelize these predicate propagation fixpoint algorithms efficiently on a GPU. Our approach is (mostly) synchronization free even though the processed graphs in general are cyclic and have nodes with fan-in and fan-out
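To make the term "predicate propagation fixpoint" concrete, here is a minimal sequential sketch of a worklist fixpoint on a cyclic graph. It is not the GPU algorithm of the paper above; the function name `propagate`, the graph encoding, and the toy facts are assumptions chosen only for illustration.

```python
from collections import deque


def propagate(succ, seeds):
    """Sequential worklist fixpoint (illustrative, not the paper's method).

    succ  maps each node to the list of its successor nodes.
    seeds maps some nodes to the set of facts that initially hold there.
    Facts flow along edges until no node's fact set changes any more.
    """
    facts = {node: set() for node in succ}
    work = deque()
    for node, initial in seeds.items():
        facts[node] |= initial
        work.append(node)

    while work:
        node = work.popleft()
        for nxt in succ[node]:
            new = facts[node] - facts[nxt]
            if new:
                # Something changed at nxt, so its successors must be revisited.
                facts[nxt] |= new
                work.append(nxt)
    return facts


# A cyclic graph: a -> b -> c -> a, plus c -> d.
graph = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": []}
result = propagate(graph, {"a": {"p"}, "d": {"q"}})
```

Termination follows because fact sets only grow and are bounded by the finite set of seeded facts; the cycle is handled by re-queuing a node only when it gains a new fact.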

Matrix-factorization-based (MF-based) collaborative filtering (CF) is known to be an effective approach to recommendation, which has been widely used in many recommender systems. Stochastic gradient descent (SGD) is one of the most popular algorithms for solving MF
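As a rough illustration of SGD applied to matrix factorization, the following is a minimal single-threaded sketch in plain Python. It is not the method of the paper above; the function names, the hyperparameters (`k`, `lr`, `reg`, `epochs`) and the toy ratings are all assumptions made for the example.

```python
import math
import random


def sgd_mf(ratings, n_users, n_items, k=2, lr=0.05, reg=0.02, epochs=200, seed=0):
    """Fit user factors P and item factors Q to (user, item, rating) triples."""
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    samples = list(ratings)
    for _ in range(epochs):
        rng.shuffle(samples)  # visit the observed ratings in random order
        for u, i, r in samples:
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on the squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def rmse(ratings, P, Q):
    """Root-mean-square error of the factor model on the given ratings."""
    total = 0.0
    for u, i, r in ratings:
        pred = sum(pf * qf for pf, qf in zip(P[u], Q[i]))
        total += (r - pred) ** 2
    return math.sqrt(total / len(ratings))


# Toy data: 3 users, 3 items, 6 observed ratings (hypothetical).
data = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]
P0, Q0 = sgd_mf(data, 3, 3, epochs=0)   # untrained factors
P1, Q1 = sgd_mf(data, 3, 3)             # trained factors
```

Comparing `rmse(data, P0, Q0)` with `rmse(data, P1, Q1)` shows how far training reduces the error on the observed ratings.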

Parallel Batch Dynamic Single Source Shortest Path Algorithm and Its Implementation on GPU based Machine
In this fast-changing and uncertain world, computer applications based on real-world data try to respond in the minimum possible time to meet users' requirements. Single Source Shortest Path (SSSP) calculation is a basic requirement of applications

Architectural Support for Efficient GPU Multiprogramming
Lin, Zhen. Graphics processing units (GPUs) have become the most prevalent accelerator in high-performance computing. Since more and more applications

Optimization of Building Patterns for Better Air Quality Using GPU-Based Large Eddy Simulation
Several earlier studies (e.g. [1, 2]) pointed out the advantages of Large Eddy Simulation (LES) in modelling microscale dispersion; however, due to its high computational cost, most LES-based studies focus on a single geometrical configuration without investigating the

KNN-Joins Using a Hybrid Approach: Exploiting CPU/GPU Workload Characteristics
K Nearest Neighbor (KNN) joins are used in many scientific domains for data analysis, and are building blocks of several well-known algorithms. KNN-joins find the KNN of all points in a dataset. However, KNN searches are computationally expensive, and many
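The definition above (a KNN-join finds the KNN of all points in a dataset) can be sketched as a brute-force baseline. This is only a reference implementation for small inputs, not the hybrid CPU/GPU approach of the paper; the function name `knn_join` and the sample points are assumptions.

```python
import heapq
import math


def knn_join(points, k):
    """Brute-force KNN self-join.

    For every point, return the indices of its k nearest other points,
    ordered from nearest to farthest (Euclidean distance).
    """
    result = []
    for i, p in enumerate(points):
        # Distance from point i to every other point, paired with its index.
        candidates = ((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        nearest = heapq.nsmallest(k, candidates)
        result.append([j for _, j in nearest])
    return result


# Two well-separated pairs of points on a line (hypothetical data).
pts = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (6.0, 0.0)]
neighbors = knn_join(pts, 1)
# neighbors == [[1], [0], [3], [2]]
```

The quadratic number of distance computations in this baseline is what makes KNN searches expensive on large datasets.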

GPU Accelerated Sparse Representation of Light Fields
We present a method for GPU-accelerated compression of light fields. The approach uses a dictionary learning framework to compress light field images. The large amount of data produced by capturing light fields is challenging to compress, and we seek to

GPU CFD Applications
The purpose of this report is to review the potential application of advanced computing in the oil and gas industry, such as the simulation of flows (laminar, turbulent, two-phase, etc.). Fluid mechanics is a vital part of oil and gas applications. A division of fluid mechanics that

GPU parallel Grad-Shafranov solver for real-time equilibrium re-construction
To achieve real-time control of tokamak plasmas, the equilibrium reconstruction has to be completed rapidly enough. For the EAST experiment, real-time equilibrium reconstruction is generally required to provide results within 1 ms. A GPU parallel Grad-Shafranov solver is

Core software challenges of the GPU High Level Trigger 1 of LHCb
Our translation can target any vectorization-capable CPU, with a configurable vector-width at compile time. Our algorithm design is not specific to GPUs, but benefits any SIMD processor. Compatibility with x-processors can be achieved with a low-effort translation.

GPU Accelerated Maximum Likelihood Analysis for Phylogenetic Inference
With the advancement of biology and computer science, the amount of DNA sequences has grown at a rapid rate giving rise to the analysis of phylogenetic trees with many taxa. The maximum likelihood analysis is commonly considered as the best approach in phylogenetic

