Apache Spark IEEE PAPER, IEEE PROJECT
Processing Large Raster and Vector Data in Apache Spark
free download
Spatial data processing frameworks are in many cases limited to vector data only. However, an important type of spatial data is raster data, which is produced by sensors on satellites but also by high-resolution cameras taking pictures of nanostructures, such as chips on wafers
Geospatial Data Management in Apache Spark : A Tutorial
free download
The volume of spatial data increases at a staggering rate. This tutorial comprehensively studies how existing works extend Apache Spark to uphold massive-scale spatial data. During this 1.5 hour tutorial, we first provide a background introduction of the characteristics
ARFF data source library for distributed single/multiple instance, single/multiple output learning on Apache Spark
free download
Apache Spark has become a popular framework for distributed machine learning and data mining. However, it lacks support for operating with Attribute-Relation File Format (ARFF) files in a native, convenient, transparent, efficient, and distributed way. Moreover, Spark
Ibis Data Serialization in Apache Spark
free download
With the demand for real-time big data analytics, the efficiency and performance of big data tools have become increasingly important. One of these tools is Apache Spark, and like most other distributed applications, serialization plays an important role in its
MaRe: Processing Big Data with application containers on Apache Spark
free download
Background: Life science is increasingly driven by Big Data analytics, and the MapReduce programming model has been proven successful for data-intensive analyses. However, current MapReduce frameworks offer poor support for reusing existing processing tools in
Stroke Prediction using Distributed Machine Learning Based on Apache Spark
free download
Stroke is one of the causes of death and one of the primary causes of severe long-term weakness in the world. In this paper, we compare different distributed machine learning algorithms for stroke prediction on the Healthcare Dataset Stroke. This work is implemented by a big data
Efficient Distributed Range Query Processing in Apache Spark
free download
Range queries are important in many diverse applications. In its simplest one-dimensional form, a range query is expressed by an interval [a, b] on the real line, whereas the answer consists of all elements e∈[a, b]. In this work, we focus on efficient range query processing
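The one-dimensional definition above can be illustrated with a minimal single-machine sketch in Python. It is not the paper's distributed method; the function name `range_query` and the sample values are hypothetical, chosen only to show what "all elements e∈[a, b]" means on a sorted list.

```python
from bisect import bisect_left, bisect_right


def range_query(sorted_values, a, b):
    """Return every element e of a sorted list with a <= e <= b."""
    if a > b:
        # An empty interval has no answers.
        return []
    lo = bisect_left(sorted_values, a)   # first index with value >= a
    hi = bisect_right(sorted_values, b)  # first index with value > b
    return sorted_values[lo:hi]


# Hypothetical sample data; sorting once lets each query run in O(log n + k).
values = sorted([7.5, 1.0, 3.2, 9.9, 4.4, 3.2])
result = range_query(values, 3.2, 7.5)
print(result)
```

Both endpoints are inclusive, matching the closed interval [a, b] in the abstract.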
ARTIFICIAL INTELLIGENCE WITH BIG DATA AND UTILIZATION OF APACHE SPARK APPLICATION
free download
Among the various types of applications in Artificial Intelligence, Big Data has emerged as a source of new opportunities. Various design considerations exist in this relatively new field where parallel processing frameworks can be engaged in a more economical fashion
Using Apache Spark for Distributed Computation on a Network of Workstations
free download
As data and computational demands continue to grow, the demand for high performance computing clusters grows, especially in the scientific community. Access to large computing resources, however, is expensive and precious. To solve this problem, we evaluate Apache
SparkGA2: Production-quality memory-efficient Apache Spark based genome analysis framework
free download
Due to the rapid decrease in the cost of NGS (Next Generation Sequencing), interest has increased in using data generated from NGS to diagnose genetic diseases. However, the data generated by NGS technology is usually in the order of hundreds of gigabytes per
DYNAMIC APACHE SPARK CLUSTER FOR ECONOMIC MODELING
free download
Modern econometric modeling of macroeconomic processes usually meets certain challenges due to the incompleteness and heterogeneity of the initial information, as well as huge data volumes involved. In the work, on the example of modeling the level of
Apache Spark jobs are often characterized by processing huge data sets and, therefore, require runtimes in the range of minutes to hours. Thus, being able to predict the runtime of such jobs would be useful not only to know when the job will finish, but also for scheduling
The original version of this article unfortunately contained a graphical mistake in Table 4. Table 4 shows a list of performance metrics associated to the DSM DSM DSM and DSM4 methods proposed in the paper. Due to a production problem, this table does not
Weather Prediction Model using Random Forest Algorithm and Apache Spark
free download
One of the greatest challenges that the meteorological department faces is to predict weather accurately. These predictions are important because they influence daily life and also affect the economy of a state or even a nation. Weather predictions are also necessary since they
An adaptive clustering and classification algorithm for Twitter data streaming in Apache Spark
free download
Ongoing big data from social network sites like Twitter or Facebook has been an attractive hotspot for investigation by researchers in recent decades as a result of various aspects including up-to-dateness, accessibility and popularity; however, there may
Detection of Social Network Based Cyber Crime Forensics Using Apache Spark
free download
Social networking has provided a platform for cyber criminals to mask their criminal activities online, which poses different challenges for law enforcement in tracking and uncovering the fake accounts, as most hints are hidden within the posts. Simultaneously, this makes it difficult for the forensic
Learning Apache Spark with Python
free download
This is a shared repository for Learning Apache Spark Notes. The version can be downloaded from HERE. The first version was posted on GitHub in ChenFeng ([Feng2017]). This shared repository mainly contains the self-learning and self-teaching notes from
Automatic, cloud-independent, scalable Spark cluster deployment in cloud
free download
time, respectively. Recently, a popular choice to convey big data analytics or machine learning is to use Apache Spark, which is an open source distributed, cluster computing system. For best 2 Apache Spark Apache Spark is an
Profiling Compiled SQL Query Pipelines in Apache Spark
free download
Users of Apache Spark™ regularly encounter the problem of suboptimal execution of SQL queries. A query optimizer, such as Spark's Catalyst optimizer, can often resolve potential problems in queries defined by the user, but these optimizations are bound by
Debugging Spark Applications
free download
Apache Spark is a distributed framework which is used to run analyses on large-scale data. Debugging Apache Spark applications is difficult as no tool, apart from log files, is available on the market
Using Hidden Markov Models and Spark to Mine ECG Data
free download
Manual ECG analysis can take hours. We propose combining accurate Hidden Markov Model (HMM) techniques with Apache Spark to improve the speed of ECG analysis. Apache Spark avoids this issue by using the concept of resilient distributed datasets (RDDs)
A COMPARISON OF MACHINE LEARNING TECHNIQUES FOR ANDROID MALWARE DETECTION USING APACHE SPARK
free download
The wide-scale popularity of Android devices has created a need for effective means of detecting malicious applications. Machine learning based classification of Android applications requires training and testing on a large dataset. Motivated by these
Improving Astronomical Online Services With Apache Spark and Docker
free download
Apache Spark is a cluster computing platform designed to be fast and general purpose. It extends the MapReduce model to support more types of computations (interactive queries, stream processing, etc.) and it offers APIs for Scala, Java, Python, R, Important feature