Implementing a GPU-based Machine Learning Library on Apache Spark
As data storage becomes increasingly commoditized, companies are collecting transactional records on the order of several petabytes that are beyond the ability of typical database software tools to store and analyze. Analysis of this big data can yield business

Machine Learning and Data Mining with Apache Spark
Abstract. This paper deals with the concepts of machine learning and data mining social networks, which are increasingly useful for businesses to know the consumers' sentiment towards their brand. This project, intended for use by engineers at Orange France, focuses

Performance Evaluation of Apache Spark on Cray XC Systems
Abstract: We report our experiences in porting and tuning the Apache Spark data analytics framework on the Cray XC30 (Edison) and XC40 (Cori) systems, installed at NERSC. Spark has been designed for cloud environments where local disk I/O is cheap and performance

When Apache Spark Meets FPGAs: A Case Study for Next-Generation DNA Sequencing Acceleration
FPGA-enabled datacenters have shown great potential for providing performance and energy efficiency improvement. In this paper we aim to answer one key question: how can we efficiently integrate FPGAs into state-of-the-art big-data computing frameworks like

SparkScore: Leveraging Apache Spark for Distributed Genomic Inference
Abstract: The method of the efficient score statistic is used extensively to conduct inference for high throughput genomic data due to its computational efficiency and ability to accommodate simple and complex phenotypes. Inference based on these statistics can