Bioinformatics
Bioinformatics involves the manipulation, searching, and data mining of DNA sequence data. The development of techniques to store and search DNA sequences has led to widely applied advances in computer science, especially in string searching algorithms, machine learning and database theory.[120] String searching or matching algorithms, which find an occurrence of a sequence of letters inside a larger sequence of letters, were developed to search for specific sequences of nucleotides. In other applications, such as text editors, even simple algorithms for this problem usually suffice, but DNA sequences cause these algorithms to exhibit near-worst-case behaviour because of their small number of distinct characters. The related problem of sequence alignment aims to identify homologous sequences and locate the specific mutations that make them distinct. These techniques, especially multiple sequence alignment, are used in studying phylogenetic relationships and protein function. Data sets representing entire genomes' worth of DNA sequences, such as those produced by the Human Genome Project, are difficult to use without annotations, which label the locations of genes and regulatory elements on each chromosome. Regions of DNA sequence that have the characteristic patterns associated with protein- or RNA-coding genes can be identified by gene finding algorithms, which allow researchers to predict the presence of particular gene products in an organism even before they have been isolated experimentally.
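The three tasks described above (string searching, sequence alignment and gene finding) can each be illustrated with a minimal Python sketch. The sketch is not taken from the cited sources: the function names naive_search, global_alignment_score and find_orfs, the scoring values (match 1, mismatch -1, gap -1) and the min_codons parameter are assumptions chosen for illustration. The alignment function follows the Needleman–Wunsch dynamic-programming scheme for global alignment, and the gene-finding function only scans one strand for open reading frames, which is a much simpler heuristic than the methods used by practical gene finders.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def naive_search(text, pattern):
    """Return (positions, comparisons) for every occurrence of pattern in text.

    The comparison count shows how much work the simple method performs.
    """
    positions = []
    comparisons = 0
    n, m = len(text), len(pattern)
    for i in range(n - m + 1):
        j = 0
        while j < m:
            comparisons += 1
            if text[i + j] != pattern[j]:
                break
            j += 1
        if j == m:  # every character of the pattern matched at offset i
            positions.append(i)
    return positions, comparisons


def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best global alignment score of a and b (Needleman-Wunsch recurrence).

    Only the previous row of the score table is kept, so no alignment
    is reconstructed, just its score.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def find_orfs(dna, min_codons=2):
    """Return (start, end) spans of forward-strand open reading frames.

    An ORF here is an ATG followed, in the same reading frame, by a stop
    codon; min_codons counts the codons before the stop (hypothetical
    threshold). The end index is exclusive and includes the stop codon.
    """
    orfs = []
    for start in range(len(dna) - 2):
        if dna[start:start + 3] != "ATG":
            continue
        for pos in range(start + 3, len(dna) - 2, 3):
            if dna[pos:pos + 3] in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3))
                break
    return orfs
```

For instance, searching a run of twenty A characters for "AAAC" finds no match yet costs 68 character comparisons, four at each of the 17 possible offsets, which illustrates how repetitive text over a small alphabet pushes the simple method toward its worst case.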