Identification of Protein Coding Regions in Genomic DNA Using Unsupervised FMACA Based Pattern Classifier

Abstract

Genes carry the instructions for making proteins that are found in a cell as a specific sequence of nucleotides that are found in DNA molecules. But, the regions of these genes that code for proteins may occupy only a small region of the sequence. Identifying the coding regions play a vital role in understanding these genes. In this paper we propose a unsupervised Fuzzy Multiple Attractor Cellular Automata (FMCA) based pattern classifier to identify the coding region of a DNA sequence. We propose a distinct K-Means algorithm for designing FMACA classifier which is simple, efficient and produces more accurate classifier than that has previously been obtained for a range of different sequence lengths. Experimental results confirm the scalability of the proposed Unsupervised FCA based classifier to handle large volume of datasets irrespective of the number of classes, tuples and attributes. Good classification accuracy has been established.

0

Turn this paper into a lesson

ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…