Restriction enzymes, like transcription factors, show a graded, though steep, range of affinities for different sites [4] and are thus also best represented by a position-specific frequency matrix (PSFM).
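As an illustration of what a PSFM records, the following minimal Python sketch builds one from a small set of aligned sites. The function name `build_psfm`, the `pseudocount` parameter and the example sequences are assumptions made for illustration; the sites are invented variants of a six-base recognition sequence, not experimental data.

```python
from collections import Counter


def build_psfm(sites, alphabet="ACGT", pseudocount=0.0):
    """Return one dict per alignment column mapping base -> relative frequency.

    `sites` must be pre-aligned sequences of equal length. `pseudocount`
    is an optional smoothing term added to every base count (assumption:
    uniform smoothing; other schemes exist).
    """
    if not sites:
        raise ValueError("at least one site is required")
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")

    total = len(sites) + pseudocount * len(alphabet)
    psfm = []
    for pos in range(length):
        # Count how often each base occurs in this column of the alignment.
        counts = Counter(site[pos] for site in sites)
        psfm.append({base: (counts[base] + pseudocount) / total
                     for base in alphabet})
    return psfm


# Hypothetical aligned sites, invented for illustration only.
example_sites = ["GAATTC", "GAATTC", "GAATTA", "CAATTC"]
example_psfm = build_psfm(example_sites)
```

Each column of the result sums to one, so a site with graded affinity appears as a column in which the frequency is spread over more than one base rather than concentrated on a single base.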
[5][6] The existence of something akin to DNA binding sites was suspected from experiments on the biology of bacteriophage lambda[7] and on the regulation of the Escherichia coli lac operon.
Since then, DNA binding sites for many transcription factors, restriction enzymes and site-specific recombinases have been discovered using a wide variety of experimental methods.
However, the development of DNA microarrays and fast sequencing techniques has led to new, massively parallel methods for in vivo identification of binding sites, such as ChIP-chip and ChIP-Seq.
Even though the NCBI provides for DNA binding site annotation in its reference sequences (RefSeq), most submissions omit this information.
There are, however, several private and public databases devoted to the compilation of experimentally reported, and sometimes computationally predicted, binding sites for different transcription factors in different organisms.
Most of them rely on the principles of information theory and are available as web servers (Yellaboina)(Munch), while other authors have resorted to machine learning methods, such as artificial neural networks.
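One common way to apply information theory to binding sites is to measure, for each position of a frequency matrix, how far the observed base frequencies depart from a background distribution. The Python sketch below computes that quantity in bits. It is a simplified illustration, not the procedure of any specific tool cited here; the function names, the uniform default background and the example matrix are assumptions.

```python
import math


def column_information(column, background=None):
    """Relative entropy (in bits) of one frequency column against a background.

    `column` maps base -> frequency. If no background is given, a uniform
    distribution over the bases in `column` is assumed.
    """
    if background is None:
        background = {base: 1.0 / len(column) for base in column}
    # Terms with zero frequency contribute nothing (limit of p*log p as p -> 0).
    return sum(p * math.log2(p / background[base])
               for base, p in column.items() if p > 0)


def motif_information(psfm, background=None):
    """Sum of per-column information, assuming independent positions."""
    return sum(column_information(col, background) for col in psfm)


# Hypothetical three-position matrix, invented for illustration only.
toy_psfm = [
    {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},      # fully conserved
    {"A": 0.0, "C": 0.25, "G": 0.75, "T": 0.0},    # partially conserved
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},  # uninformative
]
toy_bits = [column_information(col) for col in toy_psfm]
```

With a uniform four-base background, a fully conserved position carries 2 bits and a position with no preference carries 0 bits, so the per-position values indicate which parts of a site contribute most to recognition.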
[21] MEME[22] and Consensus [23] are classical examples of deterministic optimization, while the Gibbs sampler[24] is the conventional implementation of a purely stochastic method for DNA binding motif discovery.
Recent advances in sequencing have led to the introduction of comparative genomics approaches to DNA binding motif discovery, as exemplified by PhyloGibbs.