A High-Performance DNA Multiple Pattern Matching Algorithm Based on Index Binding and ASCII Hashing

Document Type : Original Article

Authors

1 Institute of Management Information Systems, Suez 14028, Egypt

2 Department of Mathematics and Computer Science, Faculty of Science, Port Said University, Port Said 42521, Egypt

3 Faculty of Computers and Artificial Intelligence, Benha University, Benha 13511, Egypt

4 Faculty of Computers and Artificial Intelligence, Damietta University, 34511 Damietta, Egypt

Abstract

Beyond health, the exponential trend in the volume of genomic data, combined with broad access to personal genome sequencing, has generated a pressing demand for fast and  scalable algorithms to explore patterns in DNA sequences. Rapid and specific identification of DNA sub-sequences is of critical importance in various applications,  such as personalized medicine, evolutionary biology, and forensic sciences. A new algorithm for exact multiple pattern matching in DNA sequences is presented in this paper. The approach is based on combining an index binding technique with a fast and lightweight hashing method that utilizes ASCII encoding. The method builds an  index table for each test pattern nucleotide and computes an integer hash value using the ASCII values of the test pattern characters. This system of dual filter s is three to four orders of magnitude more efficient than comparing them all. Experimental evaluations using real genomic data demonstrate that the proposed algorithm achieves improved performance compared to established algorithms such as KMP, Boyer-Moore, and Rabin-Karp, particularly in terms of execution time and comparison complexity. The proposed algorithm maintains a time complexity of O(N⋅M) in the worst case, with reduced overhead in typical usage scenarios, making it a practical solution for large-scale DNA sequence analysis.

Keywords

Main Subjects