Apache Spark IEEE PAPER 2017

Closest-Pairs Query Processing in Apache Spark

Abstract Processing of spatial queries when the datasets involved are big can be accomplished efficiently in a parallel and distributed environment. The (K) Closest-Pair(s) Query, KCPQ, is a common query in many real-life applications involving geographical, or,

Increase the Performance of K-Means Clustering Algorithm Using Apache Spark

Abstract Big data deals with large or complex traditional data. The term often refers to size and data. Big data presents a great challenge for database and data analytics research. It is used to get the predictive analysis from large data. It helps in decision making, and to take

SPARQL Graph Pattern Processing with Apache Spark

Abstract A common way to achieve scalability for processing SPARQL queries is to choose MapReduce frameworks like Hadoop or Spark. Processing basic graph pattern (BGP) expressions generating large join plans over distributed data partitions is a major

Matrix Multiplications on Apache Spark through GPUs

Abstract In this report, we consider the distribution of large-scale matrix multiplications across a group of systems through Apache Spark, where each individual system utilizes Graphics Processing Units (GPUs) in order to perform the matrix multiplication. The purpose

Apache Spark Streaming

Abstract This paper is the result of the Advanced Database Systems seminar at the University of Applied Sciences in Rapperswil. The key point is to explain and understand the function of data stream management systems. The paper is split into two parts to cover a

An Investigation on Extensive Graphs of Distributed Prim's Minimum Spanning Tree Construction Using Apache Spark

Abstract Minimum spanning trees are among the most essential primitives used in graph algorithms. They find applications in various fields ranging from scientific categorization of network design, approximation algorithms for NP-hard problems,

An Information Theory-Based Feature Selection Framework for Big Data under Apache Spark

Abstract With the advent of extremely high-dimensional datasets, dimensionality reduction techniques are becoming mandatory. Of the many techniques available, feature selection is of growing interest for its ability to identify both relevant features and frequently repeated

Sampling Selection Strategy for Large Scale Deduplication in a Distributed System Using Apache Spark

Abstract The generation of information from a wide range of sources has opened opportunities for the emergence of several new applications such as digital libraries, media streaming etc. that presuppose high quality data to provide reliable services. Data quality is

Data Analysis with Apache Spark and Zeppelin

Abstract Over the last few years, Data Mining has become more and more important. In this paper we give an overview of Data Analysis with Apache Spark as proposed by Zaharia et al. [1] and visualization of the results with Apache Zeppelin. We mainly present this