Apache Spark IEEE PAPER, IEEE PROJECT



Processing Large Raster and Vector Data in Apache Spark
free download

Spatial data processing frameworks are in many cases limited to vector data only. However, an important type of spatial data is raster data, which is produced by sensors on satellites but also by high-resolution cameras taking pictures of nano structures, such as chips on wafers

Geospatial Data Management in Apache Spark : A Tutorial
free download

The volume of spatial data increases at a staggering rate. This tutorial comprehensively studies how existing works extend Apache Spark to uphold massive-scale spatial data. During this 1.5-hour tutorial, we first provide a background introduction of the characteristics

ARFF data source library for distributed single/multiple instance, single/multiple output learning on Apache Spark
free download

Apache Spark has become a popular framework for distributed machine learning and data mining. However, it lacks support for operating with Attribute-Relation File Format (ARFF) files in a native, convenient, transparent, efficient, and distributed way. Moreover, Spark




Ibis Data Serialization in Apache Spark
free download

With the demand for real-time big data analytics, the efficiency and performance of big data tools have become increasingly important. One of these tools is Apache Spark, and like most other distributed applications, serialization plays an important role in its

MaRe: Processing Big Data with application containers on Apache Spark
free download

Background: Life science is increasingly driven by Big Data analytics, and the MapReduce programming model has been proven successful for data-intensive analyses. However, current MapReduce frameworks offer poor support for reusing existing processing tools in

Stroke Prediction using Distributed Machine Learning Based on Apache Spark
free download

Stroke is one of the leading causes of death and one of the primary causes of severe long-term weakness in the world. In this paper, we compare different distributed machine learning algorithms for stroke prediction on the Healthcare Dataset Stroke. This work is implemented by a big data

Efficient Distributed Range Query Processing in Apache Spark
free download

Range queries are important in many diverse applications. In its simplest one-dimensional form, a range query is expressed by an interval [a, b] on the real line, whereas the answer consists of all elements e∈[a, b]. In this work, we focus on efficient range query processing
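The one-dimensional definition in this snippet can be illustrated with a minimal sketch. It assumes the data fits in memory as a sorted Python list, and the function name `range_query` is hypothetical. It uses plain binary search on a single machine and is not the distributed Spark method the paper proposes.

```python
from bisect import bisect_left, bisect_right


def range_query(sorted_values, a, b):
    """Return every element e of an ascending list with a <= e <= b."""
    if a > b:
        # An empty interval matches nothing.
        return []
    lo = bisect_left(sorted_values, a)   # index of first element >= a
    hi = bisect_right(sorted_values, b)  # index just past last element <= b
    return sorted_values[lo:hi]


# Illustrative data only; sorting once lets each query run in O(log n + k).
data = sorted([7.5, 1.0, 3.2, 9.9, 4.4, 3.2])
print(range_query(data, 3.0, 7.5))
```

Both endpoints are inclusive, matching the closed interval [a, b] in the snippet.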

ARTIFICIAL INTELLIGENCE WITH BIG DATA AND UTILIZATION OF APACHE SPARK APPLICATION
free download

Among the various types of applications in Artificial Intelligence, Big Data has emerged as a source of new opportunities. Various design considerations exist in this relatively new field where parallel processing frameworks can be engaged in a more economical fashion

Using Apache Spark for Distributed Computation on a Network of Workstations
free download

As data and computational demands continue to grow, the demand for high performance computing clusters grows, especially in the scientific community. Access to large computing resources, however, is expensive and precious. To solve this problem, we evaluate Apache

SparkGA2: Production-quality memory-efficient Apache Spark based genome analysis framework
free download

Due to the rapid decrease in the cost of NGS (Next Generation Sequencing), interest has increased in using data generated from NGS to diagnose genetic diseases. However, the data generated by NGS technology is usually in the order of hundreds of gigabytes per

DYNAMIC APACHE SPARK CLUSTER FOR ECONOMIC MODELING
free download

Modern econometric modeling of macroeconomic processes usually meets certain challenges due to the incompleteness and heterogeneity of the initial information, as well as huge data volumes involved. In the work, on the example of modeling the level of

Apache Spark jobs are often characterized by processing huge data sets and, therefore, require runtimes in the range of minutes to hours. Thus, being able to predict the runtime of such jobs would be useful not only to know when the job will finish, but also for scheduling

The original version of this article unfortunately contained a graphical mistake in Table 4. Table 4 shows a list of performance metrics associated to the DSM DSM DSM and DSM4 methods proposed in the paper. Due to a production problem, this table does not

Weather Prediction Model using Random Forest Algorithm and Apache Spark
free download

One of the greatest challenges that the meteorological department faces is to predict weather accurately. These predictions are important because they influence daily life and also affect the economy of a state or even a nation. Weather predictions are also necessary since they

An adaptive clustering and classification algorithm for Twitter data streaming in Apache Spark
free download

Ongoing big data from social network sites like Twitter or Facebook has been an attractive hotspot for investigation by researchers in recent decades as a result of various aspects including up-to-dateness, accessibility and popularity; however, there may

Detection of Social Network Based Cyber Crime Forensics Using Apache Spark
free download

Social networking has provided a platform for cyber criminals to mask their criminal activities online, which poses different challenges for the law in tracking and uncovering fake accounts, as most hints are hidden within the posts. Simultaneously, this makes it difficult for the forensic

Learning Apache Spark with Python
free download

This is a shared repository for Learning Apache Spark Notes. The version can be downloaded from HERE. The first version was posted on GitHub in ChenFeng ([Feng2017]). This shared repository mainly contains the self-learning and self-teaching notes from

Automatic, cloud-independent, scalable Spark cluster deployment in cloud
free download

time, respectively. Recently, a popular choice to convey big data analytics or machine learning is to use Apache Spark, which is an open source, distributed cluster computing system. For best ... 2 Apache Spark: Apache Spark is an

Profiling Compiled SQL Query Pipelines in Apache Spark
free download

Abstract: Users of Apache Spark™ regularly encounter the problem of suboptimal execution of SQL queries. A query optimizer, such as Spark's Catalyst optimizer, can often resolve potential problems in queries defined by the user, but these optimizations are bound by

Debugging Spark Applications
free download

Apache Spark is a distributed framework which is used to run analyses on large-scale data. Debugging Apache Spark applications is difficult as no tool, apart from log files, is available on the market

Using Hidden Markov Models and Spark to Mine ECG Data
free download

Manual ECG analysis can take hours. We propose combining accurate Hidden Markov Model (HMM) techniques with Apache Spark to improve the speed of ECG analysis ... Apache Spark avoids this issue by using the concept of resilient distributed datasets (RDDs)

A COMPARISON OF MACHINE LEARNING TECHNIQUES FOR ANDROID MALWARE DETECTION USING APACHE SPARK
free download

The wide-scale popularity of Android devices has created a need for effective means of detecting malicious applications. Machine learning based classification of Android applications requires training and testing on a large dataset. Motivated by these

Improving Astronomical Online Services With Apache Spark and Docker
free download

Apache Spark is a cluster computing platform designed to be fast and general purpose. It extends the MapReduce model to support more types of computations (interactive queries, stream processing, etc.) and it offers APIs for Scala, Java, Python, R, Important feature
