DATA STRUCTURES FOR MACHINE LEARNING
K. SAI BHARGAV V. RAJENDRA
DEPARTMENT OF COMPUTER SCIENCE (M. TECH)
VALLURUPALLY NAGESWARA RAO VIGNANA JYOTHI INSTITUTE OF ENGINEERING AND TECHNOLOGY, HYDERABAD, INDIA
Data structure is the organization of data into our memory and efficient retrieval of data from memory. Machine Learning is the process of making machine to learn and perform on its own with the given inputs and outputs. So both are different streams of computer field. So this paper brings a relation between data structures and machine learning algorithms by taking an algorithm from data structures and machine learning. In this paper we are going to discuss about two algorithms which are used in information retrieval and they are K-Nearest Neighbor(KNN) and K Dimensional – Tree (KD-Tree).KD-Tree (K Dimensional – Tree) is a data structure for partitioning data points into a k-dimensional space. KD-Trees are widely used in several applications like range searches and machine learning algorithm like nearest neighbor searches and creating point clouds. KD-Trees are special case of binary search trees invented by Jon Louis Bentley in the year 1975. It is an improvement over KNN. It is useful for representing data in an efficient manner. In KD-Tree the data points are organized and partitioned on the basis of some specific rules and conditions. KNN (K-Nearest Neighbor) is a supervised machine learning classification algorithm which gives information regarding what group something belongs to, for example, type of tumor, the favorite sport of a person etc. This paper focusses on the parallel implementation of both the KD-Tree and KNN algorithms for information retrieval & inverted index. However this implementation is inefficient for large datasets. Construction of a nearest neighbor graph is an important task in machine learning applications. However, it is computationally expensive, especially in constructing such graphs for huge amount of datasets and when the data is of high dimensional. Python’s open source machine learning library Scikit-learn uses k-d trees for implementation of knn algorithm. Here we proposed an Inverted index which is a counter for curse of dimensionality in k-d trees from data structures.
KD-Tree is a data structure useful when organizing data by several criteria all at once.
KD-Tree data structure is much like the regular binary search tree that you are familiar with. Except each node would typically hold not just 1 value, but a list of values. When traversing a KD-Tree, we cycle through the index to the values list, based on the depth of a particular node.
The “K”in the name “KD-Tree”refers to the number of properties that each data point has. In our example, since it’s a 2 dimensional plane, we have exactly 2 properties – the X coordinate and the Y coordinate. So the root node starts off arbitrarily picking one of those. Let’s suppose it’s the X coordinate. Then its children, the second level, alternate to using the Y coordinate. The grandchildren nodes cycle back to using the X coordinate, and so on as shown in the below two figure.
When inserting a new node, just like in the typical binary search tree, we compare the node’s value and go left if it’s smaller, right if it’s bigger. Consider the example where we need to insert set of values and the KD-Tree would be as shown in the below figure.
Visually you can think of the root node dividing the XY plane into left and right sides, since we decided to start off using the X coordinate. This is shown in the below two figures where the nodes are visually plotted on the graph consisting of X coordinate and Y coordinate and are plotted accordingly as shown in the below two figures.
we walk down the tree and keep calculating the shortest distance to the target point from each of the point and we group the target point into any one of data set points.
KD-Trees fails to handle high dimensional data, i.e. with n number of data points and n dimensional space it can’t give accurate results and may miss nearest neighbors. So to overcome such situations we are discussing a new algorithm inverted index which do not miss any nearest neighbor and to increase KNN efficiency.
LIMITATIONS AND DISCUSSIONS
Construction of a nearest neighbor graph is often a necessary step in many machine learning applications. However, constructing such a graph is computationally expensive, especially when the data is high dimensional. Python’s opensource machine learning library Scikit-learn uses k-d trees and ball trees to implement nearest neighbor graph construction. However, this implementation is inefficient for large datasets.
k-d trees can be used only for the 2-dimensional and 3-dimensional data and so on up to some limited dimensions. As we go on increasing the datasets and the dimensions of the plane then it will be more difficult to analyze the data and retrieve the information from the available datasets. So, in such cases k-d trees fails to cope up with the high dimensional datasets as we discussed earlier. So here in such cases comes the aid of inverted indexing, where they can be able to handle high dimensional data in an efficient manner resulting in accurate information retrieval from the given datasets which are of high dimensional.
@ engpaper.com published paper
PUBLICATION PROCEDURE WITH US ENGPAPER.COM
 Akbarzhon Madaminov, “Recommendation Systems”, Engpaper Journal, https://www.engpaper.com/recommendation-systems.htm
 Aathi oli.S , “REVIEW PAPER ON PHISHING ATTACKS”, Engpaper Journal, https://www.engpaper.com/review-paper-on-phishing-attacks.htm
 Rania Fernando, “IoT based – Street Light Controlling System”, Engpaper Journal, https://www.engpaper.com/iot-based-street-light-controlling-system.htm
 K. SAI BHARGAV, V. RAJENDRA, “Study on Data Structures for Machine Learning”, Engpaper Journal, https://www.engpaper.com/data-structures-for-machine-learning.htm
Brundha P, Guruprasad K N, Amith V Hiremath,Sirisha R, Chandrakanth G Pujari , “Face Detection Based Smart Attendance System Using Haar Cascade Algorithm”, Engpaper Journal, https://www.engpaper.com/face-detection-based-smart-attendance-system.htm
 Afsana Nadaf , “RFID BASED LIBRARY MANAGEMENT SYSTEM”, Engpaper Journal, https://www.engpaper.com/rfid-based-library-management-system.htm
 Mr. Vedant Thube, Neha Thakur, Mr. Siddhesh Balsaraf,Ms. Priyanka Hanchate, Dr. S. D. Sawarkar , “Accident Prevention using Eye Drowsiness & Yawning Detection”, Engpaper Journal, https://www.engpaper.com/accident-prevention-paper.htm
 Abhishek A Hishobkar, Rutuja Gaonkar, Jagdish Chintamani , “DIGITAL DIARY”, Engpaper Journal, https://www.engpaper.com/digital-diary.htm
 Pooman Suryavanshi, Aryan Ghadge, Manali Kharat , “TAXI SERVICE for VISUALLY IMPAIRED”, Engpaper Journal, https://www.engpaper.com/taxi-service-for-visually-impaired.htm
 Mr. Pankaj yadav, Shila Jawale, Mr. Ashutosh Mahadik, Ms. Neha Nivalkar, Dr. S. D. Sawarkar , “NEWS ARTICLES CLASSIFICATION”, Engpaper Journal, https://www.engpaper.com/news-articles-classification.htm
 Rahul Chavan, Manvee Bhoir, Gaurav Sapkale, Anita Mhatre, “Smart Tourist Guide System”, Engpaper Journal, https://www.engpaper.com/smart-tourist-guide-system.htm
 Rutik Desai, Akash Jadhav,Suraj Sawant ,Neha Thakur , “Accident Detection Using ML and AI Techniques”, Engpaper Journal, https://www.engpaper.com/accident-detection-using-ml-and-ai-techniques.htm
 Anagha Vishe,Akash Shirsath, Sayali Gujar, Neha Thakur , “Student Attendance System using Face Recognition”, Engpaper Journal, https://www.engpaper.com/student-attendance-system-using-face-recognition.htm
 Ms.Sayali Patekar, Shila jawale, Ms.Pranali Kurhade, Mr.Shubham Khamkar , “Smart Classroom Application”, Engpaper Journal, https://www.engpaper.com/smart-classroom-application.htm
 DOSHI SAKSHI, DEVYANI CHAUDHARI, POOJA GAIKWAD, RUTUJA CHABUKSWAR,MRS. SUJATA KOLHE, “TOURISM SIMPLIFIED THROUGH VOICE”, Engpaper Journal, https://www.engpaper.com/tourism-simplified-through-voice.htm
 Afreen Fathima,Samreen Jameel, Pathan Ahmed khan , “ACCIDENT DETECTION AND ALERTING SYSTEM”, Engpaper Journal, https://www.engpaper.com/accident-detection-and-alerting-system.htm
 Suman Zareen, Tuba Masood, Pathan Ahmed khan, “E-Commerce Web Application with Augmented Reality”, Engpaper Journal, https://www.engpaper.com/e-commerce-web-application-with-augmented-reality.htm
 Lok Shan CHAN, “Selection of Waterfall and Agile Methodologies in Software Testing”, Engpaper Journal, https://www.engpaper.com/selection-of-waterfall-and-agile-methodologies-in-software-testing.htm
 Barve Rutu, “CLOUD COMPUTING SYSTEM FOR GAMING”, Engpaper Journal, https://www.engpaper.com/cloud-computing-system-for-gaming.htm
 Harshvardhan Singh, “Machine Learning: Fake News Blocking”, Engpaper Journal, https://www.engpaper.com/machine-learning-fake-news-blocking.htm
 M.Al Batahari, “SERVERS ROOM MONITORING SYSTEM USING IOT”, Engpaper Journal, https://www.engpaper.com/servers-room-monitoring-system-using-iot.htm
 AYUSHI ANKITA RAKSHIT, “VIRTUAL MASTER USING PYTHON”, Engpaper Journal, https://www.engpaper.com/virtual-master-using-python.htm
 Baldeep Kaur, “REAL TIME SLEEP DROWSINESS DETECTION USING FACE RECOGNITION”, Engpaper Journal, https://www.engpaper.com/real-time-sleep-drowsiness-detection-using-face-recognition.htm
 Suchitav Khadanga, “Two Stage CMOS Operational Amplifier From Specification to Design”, Engpaper Journal, https://www.engpaper.com/opamp-operational-amplifier-design.htm
 nidhi sharma, “Introduction to Remote Sensing”, Engpaper Journal, https://www.engpaper.com/introduction-to-remote-sensing.htm
 Rohith N Reddy, “COVID-19 Detection using SVM Classifier”, Engpaper Journal, https://www.engpaper.com/covid-19-detection-using-svm-classifier.htm
 Swapnil Kole, “COVID-19 Database on Consortium Blockchain”, Engpaper Journal, https://www.engpaper.com/covid-19-database-on-consortium-blockchain.htm
 TejalLengare, PallaviSonawane, PrachiGunjal, ShubhamDhire, Prof.Shaikh.J.N , “Accident Detection & Avoidance System in Vehicles”, Engpaper Journal, https://www.engpaper.com/accident-detection-and-avoidance-system-in-vehicles.htm
 Abhishek Pawshekar, Deepti More, Akash Khade, Pratiksha Wagh, Ganesh Ubale, “Augmented Reality: to converting and placing object into 3D model”, Engpaper Journal, https://www.engpaper.com/augmented-reality-survey.htm
 ABDUL KHADER.A.S, A.R.PRADEEP, Dr.T.V.Mallesh, S.R.Ramesh, “PARAMETRIC BEHAVIOUR OF BOX GIRDER BRIDGES UNDER DIFFERENT RADIUS OF CURVATURE & VARYING SPANS”, Engpaper Journal, https://www.engpaper.com/parametric-behaviour-of-box-girder-bridges.htm
 Prof.Ubale.G.S, Pranjal Adhav,Pooja Gaikwad, Sushama Nadavade ,Pooja Kale , “Iot based Bridge Monitoring System”, Engpaper Journal, https://www.engpaper.com/iot-based-bridge-monitoring-system.htm
 Divya Deewan, Priyanka Maheshwari, Sanjay Jain, “A REVIEW OF BATTERY-SUPERCAPACITOR HYBRID ENERGY STORAGE SYSTEM SCHEMES FOR POWER SYSTEM APPLICATION”, Engpaper Journal, https://www.engpaper.com/hybrid-energy-storage-system-schemes-for-power-system-application.htm
 Prof.Ansari.M.B, Pranjal Adhav,Pooja Gaikwad,Sushama Nadavade,Pooja Kale, “Survey on MyHelper IOT based Bridge Monitoring System”, Engpaper Journal, https://www.engpaper.com/survey-on-myhelper-iot-based-bridge-monitoring-system.htm
 Shreyas.S.J, Saddam hussain, Chaithra E, “COMPARATIVE STUDY ON SEISMIC RESPONSE OF MASONRY INFILLED RC FRAME BUILDINGS AND MIVAN BUILDINGS WITH DIFFERENT PERCENTAGE OF WALL OPENINGS”, Engpaper Journal, https://www.engpaper.com/seismic-response-of-masonry-infilled-rc-frame-buildings-and-mivan-buildings.htm
 Yusuf Ali Hassan, “Somali Power-Grid Significant Challenges”, Engpaper Journal, https://www.engpaper.com/somali-power-grid-significant-challenges.htm
 Ahmed N. Elhefnawy, “Refractive IR Objective Optical Design Operating in LWIR band For Military Observation Applications”, Engpaper Journal, https://www.engpaper.com/refractive-ir-objective-optical-design.htm
 S MANJULA, D SELVATHI and SUCHITAV KHADANGA, “Design of low-power CMOS transceiver front end for 2.4-GHz WPAN applications”, Engpaper Journal, https://www.engpaper.com/low-power-cmos-transceiver-front-end.htm
 Suchitav Khadanga, “Fabrication of MEMS Pressure Sensor on thin film membrane”, Engpaper Journal, https://www.engpaper.com/fabrication-of-mems-pressure-sensor-on-thin-film-membrane.htm
 Suchitav Khadanga and Dr. K.R.Suresh Nair, “An Introduction to Bluetooth”, Engpaper Journal, https://www.engpaper.com/an-introduction-to-bluetooth.htm
 Suchitav Khadanga and S. Ahmad, “DESIGN AND FABRICATION OF LOW COST MICROWAVE OSCILLATOR”, Engpaper Journal, https://www.engpaper.com/design-and-fabrication-of-low-cost-microwave-oscillator.htm
 Ameen Ahmed, Noushad S, Suchitav Khadanga, K.R.Suresh Nair, P.K.Radhakrishnan, “DEVELOPMENT OF LOW PHASE NOISE SMALL FOOT PRINT SURFACE MOUNT VOLTAGE CONTROLLED OSCILLATOR”, Engpaper Journal, https://www.engpaper.com/development-of-low-phase-noise-small-foot-print-surface-mount-voltage-controlled-oscillator.htm
 Suchitav Khadanga , “Synchronous programmable divider design for PLL Using 0.18 um cmos technology”, Engpaper Journal, https://www.engpaper.com/synchronous-programmable-divider-design-for-pll.htm
 Kavya.G.R, Shivaraju.G.D, Dr. T V Mallesh, S R Ramesh, “PROGRESSIVE COLLAPSE RESISTANCE OF FLAT SLAB BUILDING”, Engpaper Journal, https://www.engpaper.com/progressive-collapse-resistance-of-flat-slab-building.htm Copyright protected @ ENGPAPER.COM and AUTHORS https://www.engpaper.com