DATA STRUCTURES FOR MACHINE LEARNING

K. SAI BHARGAV V. RAJENDRA
DEPARTMENT OF COMPUTER SCIENCE (M. TECH)
VALLURUPALLY NAGESWARA RAO VIGNANA JYOTHI INSTITUTE OF ENGINEERING AND TECHNOLOGY, HYDERABAD, INDIA

1. ABSTRACT
Data structure is the organization of data into our memory and efficient retrieval of data from memory. Machine Learning is the process of making machine to learn and perform on its own with the given inputs and outputs. So both are different streams of computer field. So this paper brings a relation between data structures and machine learning algorithms by taking an algorithm from data structures and machine learning. In this paper we are going to discuss about two algorithms which are used in information retrieval and they are K-Nearest Neighbor(KNN) and K Dimensional – Tree (KD-Tree).KD-Tree (K Dimensional – Tree) is a data structure for partitioning data points into a k-dimensional space. KD-Trees are widely used in several applications like range searches and machine learning algorithm like nearest neighbor searches and creating point clouds. KD-Trees are special case of binary search trees invented by Jon Louis Bentley in the year 1975. It is an improvement over KNN. It is useful for representing data in an efficient manner. In KD-Tree the data points are organized and partitioned on the basis of some specific rules and conditions. KNN (K-Nearest Neighbor) is a supervised machine learning classification algorithm which gives information regarding what group something belongs to, for example, type of tumor, the favorite sport of a person etc. This paper focusses on the parallel implementation of both the KD-Tree and KNN algorithms for information retrieval & inverted index. However this implementation is inefficient for large datasets. Construction of a nearest neighbor graph is an important task in machine learning applications. However, it is computationally expensive, especially in constructing such graphs for huge amount of datasets and when the data is of high dimensional. Python’s open source machine learning library Scikit-learn uses k-d trees for implementation of knn algorithm. Here we proposed an Inverted index which is a counter for curse of dimensionality in k-d trees from data structures.

2. INTRODUCTION
KD-Tree is a data structure useful when organizing data by several criteria all at once.
KD-Tree data structure is much like the regular binary search tree that you are familiar with. Except each node would typically hold not just 1 value, but a list of values. When traversing a KD-Tree, we cycle through the index to the values list, based on the depth of a particular node.
The “K”in the name “KD-Tree”refers to the number of properties that each data point has. In our example, since it’s a 2 dimensional plane, we have exactly 2 properties – the X coordinate and the Y coordinate. So the root node starts off arbitrarily picking one of those. Let’s suppose it’s the X coordinate. Then its children, the second level, alternate to using the Y coordinate. The grandchildren nodes cycle back to using the X coordinate, and so on as shown in the below two figure.

KD-Tree Structure
When inserting a new node, just like in the typical binary search tree, we compare the node’s value and go left if it’s smaller, right if it’s bigger. Consider the example where we need to insert set of values and the KD-Tree would be as shown in the below figure.
Visually you can think of the root node dividing the XY plane into left and right sides, since we decided to start off using the X coordinate. This is shown in the below two figures where the nodes are visually plotted on the graph consisting of X coordinate and Y coordinate and are plotted accordingly as shown in the below two figures.

we walk down the tree and keep calculating the shortest distance to the target point from each of the point and we group the target point into any one of data set points.
KD-Trees fails to handle high dimensional data, i.e. with n number of data points and n dimensional space it can’t give accurate results and may miss nearest neighbors. So to overcome such situations we are discussing a new algorithm inverted index which do not miss any nearest neighbor and to increase KNN efficiency.



LIMITATIONS AND DISCUSSIONS
Construction of a nearest neighbor graph is often a necessary step in many machine learning applications. However, constructing such a graph is computationally expensive, especially when the data is high dimensional. Python’s opensource machine learning library Scikit-learn uses k-d trees and ball trees to implement nearest neighbor graph construction. However, this implementation is inefficient for large datasets.
k-d trees can be used only for the 2-dimensional and 3-dimensional data and so on up to some limited dimensions. As we go on increasing the datasets and the dimensions of the plane then it will be more difficult to analyze the data and retrieve the information from the available datasets. So, in such cases k-d trees fails to cope up with the high dimensional datasets as we discussed earlier. So here in such cases comes the aid of inverted indexing, where they can be able to handle high dimensional data in an efficient manner resulting in accurate information retrieval from the given datasets which are of high dimensional.

data-structures-for-machine-learning
FREE DOWNLOAD


@ engpaper.com published paper

PUBLICATION PROCEDURE WITH US ENGPAPER.COM

ENGPAPER.COM PUBLISHED PAPERS



[31] Akbarzhon Madaminov, “Recommendation Systems”, Engpaper Journal, https://www.engpaper.com/recommendation-systems.htm
[32] Aathi oli.S , “REVIEW PAPER ON PHISHING ATTACKS”, Engpaper Journal, https://www.engpaper.com/review-paper-on-phishing-attacks.htm
[33] Rania Fernando, “IoT based – Street Light Controlling System”, Engpaper Journal, https://www.engpaper.com/iot-based-street-light-controlling-system.htm
[34] K. SAI BHARGAV, V. RAJENDRA, “Study on Data Structures for Machine Learning”, Engpaper Journal, https://www.engpaper.com/data-structures-for-machine-learning.htm
[35]Brundha P, Guruprasad K N, Amith V Hiremath,Sirisha R, Chandrakanth G Pujari , “Face Detection Based Smart Attendance System Using Haar Cascade Algorithm”, Engpaper Journal, https://www.engpaper.com/face-detection-based-smart-attendance-system.htm
[36] Afsana Nadaf , “RFID BASED LIBRARY MANAGEMENT SYSTEM”, Engpaper Journal, https://www.engpaper.com/rfid-based-library-management-system.htm
[37] Mr. Vedant Thube, Neha Thakur, Mr. Siddhesh Balsaraf,Ms. Priyanka Hanchate, Dr. S. D. Sawarkar , “Accident Prevention using Eye Drowsiness & Yawning Detection”, Engpaper Journal, https://www.engpaper.com/accident-prevention-paper.htm
[38] Abhishek A Hishobkar, Rutuja Gaonkar, Jagdish Chintamani , “DIGITAL DIARY”, Engpaper Journal, https://www.engpaper.com/digital-diary.htm
[39] Pooman Suryavanshi, Aryan Ghadge, Manali Kharat , “TAXI SERVICE for VISUALLY IMPAIRED”, Engpaper Journal, https://www.engpaper.com/taxi-service-for-visually-impaired.htm
[40] Mr. Pankaj yadav, Shila Jawale, Mr. Ashutosh Mahadik, Ms. Neha Nivalkar, Dr. S. D. Sawarkar , “NEWS ARTICLES CLASSIFICATION”, Engpaper Journal, https://www.engpaper.com/news-articles-classification.htm
[41] Rahul Chavan, Manvee Bhoir, Gaurav Sapkale, Anita Mhatre, “Smart Tourist Guide System”, Engpaper Journal, https://www.engpaper.com/smart-tourist-guide-system.htm
[42] Rutik Desai, Akash Jadhav,Suraj Sawant ,Neha Thakur , “Accident Detection Using ML and AI Techniques”, Engpaper Journal, https://www.engpaper.com/accident-detection-using-ml-and-ai-techniques.htm
[43] Anagha Vishe,Akash Shirsath, Sayali Gujar, Neha Thakur , “Student Attendance System using Face Recognition”, Engpaper Journal, https://www.engpaper.com/student-attendance-system-using-face-recognition.htm
[44] Ms.Sayali Patekar, Shila jawale, Ms.Pranali Kurhade, Mr.Shubham Khamkar , “Smart Classroom Application”, Engpaper Journal, https://www.engpaper.com/smart-classroom-application.htm
[45] DOSHI SAKSHI, DEVYANI CHAUDHARI, POOJA GAIKWAD, RUTUJA CHABUKSWAR,MRS. SUJATA KOLHE, “TOURISM SIMPLIFIED THROUGH VOICE”, Engpaper Journal, https://www.engpaper.com/tourism-simplified-through-voice.htm
[46] Afreen Fathima,Samreen Jameel, Pathan Ahmed khan , “ACCIDENT DETECTION AND ALERTING SYSTEM”, Engpaper Journal, https://www.engpaper.com/accident-detection-and-alerting-system.htm
[47] Suman Zareen, Tuba Masood, Pathan Ahmed khan, “E-Commerce Web Application with Augmented Reality”, Engpaper Journal, https://www.engpaper.com/e-commerce-web-application-with-augmented-reality.htm
[48] Lok Shan CHAN, “Selection of Waterfall and Agile Methodologies in Software Testing”, Engpaper Journal, https://www.engpaper.com/selection-of-waterfall-and-agile-methodologies-in-software-testing.htm
[49] Barve Rutu, “CLOUD COMPUTING SYSTEM FOR GAMING”, Engpaper Journal, https://www.engpaper.com/cloud-computing-system-for-gaming.htm
[50] Harshvardhan Singh, “Machine Learning: Fake News Blocking”, Engpaper Journal, https://www.engpaper.com/machine-learning-fake-news-blocking.htm
[51] M.Al Batahari, “SERVERS ROOM MONITORING SYSTEM USING IOT”, Engpaper Journal, https://www.engpaper.com/servers-room-monitoring-system-using-iot.htm
[52] AYUSHI ANKITA RAKSHIT, “VIRTUAL MASTER USING PYTHON”, Engpaper Journal, https://www.engpaper.com/virtual-master-using-python.htm
[53] Baldeep Kaur, “REAL TIME SLEEP DROWSINESS DETECTION USING FACE RECOGNITION”, Engpaper Journal, https://www.engpaper.com/real-time-sleep-drowsiness-detection-using-face-recognition.htm
[54] Suchitav Khadanga, “Two Stage CMOS Operational Amplifier From Specification to Design”, Engpaper Journal, https://www.engpaper.com/opamp-operational-amplifier-design.htm
[55] nidhi sharma, “Introduction to Remote Sensing”, Engpaper Journal, https://www.engpaper.com/introduction-to-remote-sensing.htm
[56] Rohith N Reddy, “COVID-19 Detection using SVM Classifier”, Engpaper Journal, https://www.engpaper.com/covid-19-detection-using-svm-classifier.htm
[57] Swapnil Kole, “COVID-19 Database on Consortium Blockchain”, Engpaper Journal, https://www.engpaper.com/covid-19-database-on-consortium-blockchain.htm
[58] TejalLengare, PallaviSonawane, PrachiGunjal, ShubhamDhire, Prof.Shaikh.J.N , “Accident Detection & Avoidance System in Vehicles”, Engpaper Journal, https://www.engpaper.com/accident-detection-and-avoidance-system-in-vehicles.htm
[59] Abhishek Pawshekar, Deepti More, Akash Khade, Pratiksha Wagh, Ganesh Ubale, “Augmented Reality: to converting and placing object into 3D model”, Engpaper Journal, https://www.engpaper.com/augmented-reality-survey.htm
[60] ABDUL KHADER.A.S, A.R.PRADEEP, Dr.T.V.Mallesh, S.R.Ramesh, “PARAMETRIC BEHAVIOUR OF BOX GIRDER BRIDGES UNDER DIFFERENT RADIUS OF CURVATURE & VARYING SPANS”, Engpaper Journal, https://www.engpaper.com/parametric-behaviour-of-box-girder-bridges.htm
[61] Prof.Ubale.G.S, Pranjal Adhav,Pooja Gaikwad, Sushama Nadavade ,Pooja Kale , “Iot based Bridge Monitoring System”, Engpaper Journal, https://www.engpaper.com/iot-based-bridge-monitoring-system.htm
[62] Divya Deewan, Priyanka Maheshwari, Sanjay Jain, “A REVIEW OF BATTERY-SUPERCAPACITOR HYBRID ENERGY STORAGE SYSTEM SCHEMES FOR POWER SYSTEM APPLICATION”, Engpaper Journal, https://www.engpaper.com/hybrid-energy-storage-system-schemes-for-power-system-application.htm
[63] Prof.Ansari.M.B, Pranjal Adhav,Pooja Gaikwad,Sushama Nadavade,Pooja Kale, “Survey on MyHelper IOT based Bridge Monitoring System”, Engpaper Journal, https://www.engpaper.com/survey-on-myhelper-iot-based-bridge-monitoring-system.htm
[64] Shreyas.S.J, Saddam hussain, Chaithra E, “COMPARATIVE STUDY ON SEISMIC RESPONSE OF MASONRY INFILLED RC FRAME BUILDINGS AND MIVAN BUILDINGS WITH DIFFERENT PERCENTAGE OF WALL OPENINGS”, Engpaper Journal, https://www.engpaper.com/seismic-response-of-masonry-infilled-rc-frame-buildings-and-mivan-buildings.htm
[65] Yusuf Ali Hassan, “Somali Power-Grid Significant Challenges”, Engpaper Journal, https://www.engpaper.com/somali-power-grid-significant-challenges.htm
[66] Ahmed N. Elhefnawy, “Refractive IR Objective Optical Design Operating in LWIR band For Military Observation Applications”, Engpaper Journal, https://www.engpaper.com/refractive-ir-objective-optical-design.htm
[67] S MANJULA, D SELVATHI and SUCHITAV KHADANGA, “Design of low-power CMOS transceiver front end for 2.4-GHz WPAN applications”, Engpaper Journal, https://www.engpaper.com/low-power-cmos-transceiver-front-end.htm
[68] Suchitav Khadanga, “Fabrication of MEMS Pressure Sensor on thin film membrane”, Engpaper Journal, https://www.engpaper.com/fabrication-of-mems-pressure-sensor-on-thin-film-membrane.htm
[69] Suchitav Khadanga and Dr. K.R.Suresh Nair, “An Introduction to Bluetooth”, Engpaper Journal, https://www.engpaper.com/an-introduction-to-bluetooth.htm
[70] Suchitav Khadanga and S. Ahmad, “DESIGN AND FABRICATION OF LOW COST MICROWAVE OSCILLATOR”, Engpaper Journal, https://www.engpaper.com/design-and-fabrication-of-low-cost-microwave-oscillator.htm
[71] Ameen Ahmed, Noushad S, Suchitav Khadanga, K.R.Suresh Nair, P.K.Radhakrishnan, “DEVELOPMENT OF LOW PHASE NOISE SMALL FOOT PRINT SURFACE MOUNT VOLTAGE CONTROLLED OSCILLATOR”, Engpaper Journal, https://www.engpaper.com/development-of-low-phase-noise-small-foot-print-surface-mount-voltage-controlled-oscillator.htm
[72] Suchitav Khadanga , “Synchronous programmable divider design for PLL Using 0.18 um cmos technology”, Engpaper Journal, https://www.engpaper.com/synchronous-programmable-divider-design-for-pll.htm
[73] Kavya.G.R, Shivaraju.G.D, Dr. T V Mallesh, S R Ramesh, “PROGRESSIVE COLLAPSE RESISTANCE OF FLAT SLAB BUILDING”, Engpaper Journal, https://www.engpaper.com/progressive-collapse-resistance-of-flat-slab-building.htm Copyright protected @ ENGPAPER.COM and AUTHORS https://www.engpaper.com