Efficient Pattern Detection in DNA Sequences Using Frequent Item set Mining and Random Forest


Shaikh Sajida Hussain, KJ College of Engineering and Management Research, Pune; Hole Vishakha Govind, KJ College of Engineering and Management Research, Pune; Nimbalkar Deepika Suresh, KJ College of Engineering and Management Research, Pune; Londhe Komal Ramesh, KJ College of Engineering and Management Research, Pune


DNA Sequence, Pattern Mining, Frequent Itemset Mining, Random Forest Classification


The information contained in the human genome is akin to a blueprint of the human body: genes encode the processes the body requires, including the replication and replacement of damaged cells. Genetic information can also support the effective treatment of diseases, which can be vital for human survival. To make use of it, a vast amount of information must be extracted from the genes through the DNA. Performing this extraction manually is prohibitively laborious, so this work applies the Frequent Itemset Mining paradigm, in which frequent itemsets help identify defective or frequently occurring genes. DNA patterns are extracted using Frequent Itemset Mining, with Linear Clustering and Entropy Estimation used to obtain the candidate sets. The resulting candidate sets are then classified using Random Forest classification. Extensive experiments indicate that the methodology achieves significant improvements over conventional approaches.
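
As a rough illustration of the mining and entropy steps named in the abstract, the sketch below treats fixed-length substrings (k-mers) as items, keeps those that appear in at least a minimum number of sequences, and computes the Shannon entropy of a pattern's base composition. The abstract does not specify how items, support, or entropy are defined, so the k-mer encoding, the function names, and the sample sequences are all assumptions made for illustration; the Linear Clustering and Random Forest stages are not shown.

```python
from collections import Counter
from math import log2


def kmers(sequence, k):
    """Return the set of distinct length-k substrings of a DNA sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def frequent_kmers(sequences, k, min_support):
    """Return k-mers that occur in at least min_support of the sequences.

    Assumption: each k-mer is one 'item' and each sequence is one
    'transaction'; the paper may define these differently.
    """
    counts = Counter()
    for seq in sequences:
        counts.update(kmers(seq, k))  # count each k-mer once per sequence
    return {kmer: n for kmer, n in counts.items() if n >= min_support}


def shannon_entropy(pattern):
    """Shannon entropy (bits per symbol) of the base composition of a pattern."""
    total = len(pattern)
    freqs = Counter(pattern)
    return -sum((n / total) * log2(n / total) for n in freqs.values())


# Hypothetical toy data, not taken from the paper.
sequences = ["ATGCGATA", "GGATGCTA", "TTATGCAA"]
candidates = frequent_kmers(sequences, k=3, min_support=3)
```

With these toy sequences, only the 3-mers shared by all three sequences survive the support threshold. In a fuller pipeline, features derived from such candidates (for example their support and entropy) could be passed to a classifier.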

Other Details

Paper ID: IJSRDV8I70148
Published in: Volume: 8, Issue: 7
Publication Date: 01/10/2020
Page(s): 227-232
