Scalable Language Modeling: WikiText-103 on a Single GPU in 12 hours
ABSTRACT Word-level language modeling (WLM) is one of the foundational tasks of unsupervised natural language processing. Most modern architectures for WLM use several LSTM layers, followed by a softmax layer. Even with larger batch sizes and a multi-GPU

Arioc: GPU-accelerated alignment of short bisulfite-treated reads
ABSTRACT Motivation: The alignment of bisulfite-treated DNA sequences (BS-seq reads) to a large genome involves a significant computational burden beyond that required to align non-bisulfite-treated reads. In the analysis of BS-seq data, this can present an important

MUGAN: Multi-GPU accelerated AmpliconNoise server for rapid microbial diversity assessment
ABSTRACT Motivation: Metagenomic sequencing has become a crucial tool for obtaining a gene catalogue of operational taxonomic units (OTUs) in a microbial community. A typical metagenomic sequencing run produces a large amount of data (often on the order of terabytes or

Transparent Avoidance of Redundant Data Transfer on GPU-enabled Apache Spark
ABSTRACT This paper presents an extension to IBMSparkGPU, an Apache Spark framework capable of running compute- or memory-intensive tasks on a graphics processing unit (GPU). The key contribution of this extension is an automated runtime that implicitly avoids

An optimization of search for neighbour-particle in MPS method for Xeon, Xeon Phi and GPU by using directives
ABSTRACT The Moving Particle Semi-implicit (MPS) method is a particle-based simulation method used in fields such as computational fluid dynamics. Target fluids and objects are divided into particles, and each particle interacts with its neighbouring particles. This process is called
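The neighbour-particle search described above is often accelerated with a cell (bucket) list: the domain is split into square cells whose side equals the interaction radius, so each particle only checks its own cell and the eight adjacent ones instead of every other particle. The minimal Python sketch below assumes 2-D particles and a single fixed radius; the function names are hypothetical, and this is not the paper's directive-based Xeon/Xeon Phi/GPU implementation.

```python
from collections import defaultdict
from itertools import product
import math


def brute_force_neighbours(points, radius):
    """O(N^2) reference: for each particle, sorted indices of all others within radius."""
    r2 = radius * radius
    result = []
    for i, (xi, yi) in enumerate(points):
        result.append(sorted(
            j for j, (xj, yj) in enumerate(points)
            if j != i and (xi - xj) ** 2 + (yi - yj) ** 2 <= r2
        ))
    return result


def cell_list_neighbours(points, radius):
    """Bucket particles into square cells of side `radius`, then test only
    the 3x3 block of cells around each particle."""
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cells[(math.floor(x / radius), math.floor(y / radius))].append(idx)

    r2 = radius * radius
    result = []
    for i, (xi, yi) in enumerate(points):
        cx, cy = math.floor(xi / radius), math.floor(yi / radius)
        found = []
        # A neighbour within `radius` is at most one cell away along each axis.
        for dx, dy in product((-1, 0, 1), repeat=2):
            for j in cells.get((cx + dx, cy + dy), ()):
                xj, yj = points[j]
                if j != i and (xi - xj) ** 2 + (yi - yj) ** 2 <= r2:
                    found.append(j)
        result.append(sorted(found))
    return result


print(cell_list_neighbours([(0.0, 0.0), (0.5, 0.0), (2.0, 0.0)], 1.0))
```

Because both functions apply the same distance test, the cell list returns the same neighbour lists as the O(N^2) reference while examining far fewer candidate pairs when particles are spread over many cells.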

Three-Dimensional Numerical Simulation of Droplet Evaporation Using the Lattice Boltzmann Method Based on GPU-CUDA Accelerated Algorithm
ABSTRACT The three-dimensional (3D) single component multiphase Shan-Chen lattice Boltzmann (LB) model is implemented with the GPU-accelerated algorithm based on the CUDA platform for the simulation of droplet evaporation. It is found that the speed-up of the

G-NET: Effective GPU Sharing in NFV Systems
ABSTRACT Network Function Virtualization (NFV) virtualizes software network functions to offer flexibility in their design, management and deployment. Although GPUs have demonstrated their power in significantly accelerating network functions, they have not been effectively

Power Modeling Approach for GPU Source Program
ABSTRACT The rapid development of information technology is making our environment smarter, and massive high-performance computers provide the computing power behind it. The graphics processing unit (GPU), a typical high-performance component, is being widely

Implementation and experimental benchmark of a two-layer CPU+ GPU hydrodynamics model
Large-scale geophysical flows often exhibit clear separation of fluid masses due to density imbalances, posing specific conceptual and numerical challenges in mathematical modelling. Density-driven flows range from global ocean circulation to estuarine plumes and

Removing Video Background Using a Particle Filter Based on a GPU for Graphic Artists
ABSTRACT In video mixing, chroma key is the best-known method for background removal. However, it can only be used by graphic artists or in specific studios. If users want to modify the background of a video without using chroma key, they must remove the background

Synchronous Multi-GPU Deep Learning with Low-Precision Communication: An Experimental Study
ABSTRACT Training deep learning models has received tremendous research interest recently. In particular, there has been intensive research on reducing the communication cost of training when using multiple computational devices, through reducing the precision
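The abstract refers to reducing the precision of the data exchanged between devices. One common scheme, assumed here for illustration and not necessarily one the study evaluates, is uniform stochastic quantization: each gradient value is mapped to one of 2**bits levels between the vector's minimum and maximum, and only the small integer codes plus two floats (offset and step) are sent. The function names below are hypothetical.

```python
import random


def quantize(grad, bits, rng):
    """Map each float to one of 2**bits evenly spaced levels in [min, max].
    Rounding up happens with probability equal to the fractional position,
    so each decoded value is unbiased (up to clamping at the top level)."""
    lo, hi = min(grad), max(grad)
    levels = (1 << bits) - 1              # highest code; codes run 0..levels
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = []
    for g in grad:
        pos = (g - lo) / scale            # position in units of one step, >= 0
        base = int(pos)                   # lower level (floor, since pos >= 0)
        frac = pos - base
        code = base + (1 if rng.random() < frac else 0)
        codes.append(min(code, levels))   # guard against float overshoot at hi
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Receiver side: rebuild approximate floats from codes, offset and step."""
    return [lo + c * scale for c in codes]


sender_rng = random.Random(42)
grad = [sender_rng.gauss(0.0, 1.0) for _ in range(8)]
codes, lo, step = quantize(grad, 4, random.Random(1))
print(codes)
print(dequantize(codes, lo, step))
```

With `bits=4`, each value travels as 4 bits instead of the 32 bits of a float32, at the cost of an error of at most one quantization step per entry.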

Performance of Medical Image Processing Algorithms Implemented in CUDA running on GPU based Machine
ABSTRACT This paper describes the design and performance evaluation of a few algorithms used for analysing medical image volumes on the massively parallel graphics processing unit (GPU) with Compute Unified Device Architecture (CUDA). These algorithms are selected

Project CrayOn: Back to the future for a more General-Purpose GPU
ABSTRACT General-purpose use of graphics processing units (GPGPU) recapitulates many of the lessons of the early generations of supercomputers. To what extent have we learnt those lessons, rather than repeating the mistakes? To answer that question, I review why the Cray

Graduation Internship at Deltares: Performance comparison of implicit and explicit schemes for the shallow water equations on a GPU with FORTRAN90 code
The goal is to implement a shallow water solver on a GPU and to compare the GPU performance of several numerical methods. A second goal is to carry out an inundation simulation for a polder in The Netherlands with a GPU code on a high resolution mesh. For

Green Computing using GPU in Image Processing
ABSTRACT Green computing is the process of reducing the power consumed by a computer and thereby reducing carbon emissions. The total power consumed by the computer, excluding the monitor, at full computational load is equal to the sum of the power

ABSTRACT This work investigates how to speed up channel simulation using various approaches. A novel multidimensional matrix-based convolution approach using FFTs is proposed as an alternative to iterative for loops or single-dimension serial convolutions. Experiments
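The channel-simulation abstract contrasts FFT-based convolution with iterative loops. The underlying identity is the convolution theorem: zero-padding both inputs to at least len(x) + len(h) - 1 samples and multiplying their FFTs yields the linear convolution in O(n log n) time instead of O(len(x) * len(h)). The sketch below is one-dimensional and pure Python for clarity, whereas the abstract describes a multidimensional matrix-based approach; the function names are hypothetical.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(a) must be a power of two.
    With invert=True the inverse transform is returned unscaled (caller divides by n)."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]   # twiddle factor
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def direct_convolve(x, h):
    """Reference full linear convolution with nested loops: O(len(x) * len(h))."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def fft_convolve(x, h):
    """Linear convolution via zero-padded FFTs: O(n log n)."""
    m = len(x) + len(h) - 1
    n = 1
    while n < m:                      # pad to a power of two >= output length
        n *= 2
    X = fft([complex(v) for v in x] + [0j] * (n - len(x)))
    H = fft([complex(v) for v in h] + [0j] * (n - len(h)))
    y = fft([a * b for a, b in zip(X, H)], invert=True)
    return [v.real / n for v in y[:m]]


print(direct_convolve([1, 2, 3], [0, 1, 0.5]))
print(fft_convolve([1, 2, 3], [0, 1, 0.5]))
```

Padding to at least the full output length is what turns the FFT's circular convolution into the linear convolution that the loop version computes; the two results agree up to floating-point rounding.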

CrossBow: Scaling Deep Learning on Multi-GPU Servers
ABSTRACT With the widespread availability of servers with 4 or more GPUs, scalability in terms of the number of GPUs in a server when training deep learning models becomes a paramount concern. Systems such as TensorFlow and MXNet train using synchronous

Computing ridge lines on the GPU
ABSTRACT Extracting a high-level description of a three-dimensional shape is key to most computer graphics applications, helping human vision by highlighting meaningful parts of the object. Among the many approaches to defining a notion of high-level features, differential

ooc cuDNN: A Deep Learning Library Supporting CNNs over GPU Memory capacity
Convolutional Neural Networks (CNNs) have achieved notable success, particularly in image recognition and image processing. Although many libraries accelerate CNN computation with a GPU, few works can handle cases that exceed GPU

Salus: Fine-Grained GPU Sharing Among CNN Applications
P. Yu, M. Chowdhury
The minimum granularity of GPU allocation in modern cluster managers is the entire GPU [2, 9, 13]: a deep learning (DL) job can have multiple GPUs, but each GPU belongs to exactly one job regardless of its utilization level [3-5]. While this enables multi-tenancy at the cluster