FASTA format

[4] The simplicity of FASTA format makes it easy to manipulate and parse sequences using text-processing tools and scripting languages.
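To illustrate how little code is needed, here is a minimal reader sketch in Python. It assumes that every record begins with a ">" header line and that the sequence lines following a header should be concatenated. Blank lines are skipped. The function name `read_fasta` is hypothetical and is not taken from any particular library.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA lines.

    Assumes each record starts with a '>' line; the sequence lines that
    follow are joined into one string and blank lines are ignored.
    """
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)  # finish the previous record
            header = line[1:].strip()  # drop the '>' marker
            chunks = []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)  # finish the last record


example = """>seq1 first example
ACGTACGT
ACGT
>seq2 second example
TTGA
"""
for name, seq in read_fasta(example.splitlines()):
    print(name, seq)
```

Because iterating over an open file also yields lines, a file object (for example one returned by `open("reads.fa")`, where the filename is hypothetical) can be passed in place of `example.splitlines()`.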

[5][6] Most users preferred the larger font of 80-character display modes, so it became recommended practice to keep FASTA lines to 80 characters or fewer (often 70).

[7] The first line in a FASTA file started either with a ">" (greater-than) symbol or, less frequently, a ";"[8] (semicolon) and was taken as a comment.

Modern bioinformatics programs that rely on the FASTA format expect the sequence headers to be preceded by ">".

Running different bioinformatics programs may require conversions between "sequential" and "interleaved" FASTA formats.
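A conversion of this kind can be sketched as follows. The sketch assumes that "sequential" means each sequence sits on a single line after its header, and that "interleaved" means the sequence is wrapped across lines of fixed width. The default width of 70 follows the line-length convention described earlier. All function names here are hypothetical.

```python
from typing import List, Tuple


def parse_records(text: str) -> List[Tuple[str, str]]:
    """Return (header_line, sequence) pairs; header_line keeps its '>'."""
    records: List[Tuple[str, List[str]]] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append((line, []))
        elif records:
            records[-1][1].append(line)  # continuation of the current sequence
        else:
            raise ValueError("sequence data found before the first '>' header")
    return [(header, "".join(parts)) for header, parts in records]


def to_sequential(text: str) -> str:
    """Put each sequence on a single line after its header."""
    return "".join(f"{header}\n{seq}\n" for header, seq in parse_records(text))


def to_interleaved(text: str, width: int = 70) -> str:
    """Wrap each sequence onto lines of at most `width` characters."""
    if width < 1:
        raise ValueError("width must be positive")
    out: List[str] = []
    for header, seq in parse_records(text):
        out.append(header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n" if out else ""


wrapped = ">a\nACGTAC\nGT\n>b\nTT\n"
print(to_sequential(wrapped), end="")
print(to_interleaved(to_sequential(wrapped), width=3), end="")
```

Both directions go through the same parser, so converting back and forth changes only the line breaks inside sequences. In phylogenetics the term "interleaved" can also describe a block layout of aligned sequences (as in PHYLIP), which this sketch does not produce.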

In the original Pearson FASTA format, one or more comments, each marked by a semicolon at the beginning of the line, may follow the header.

Some databases and bioinformatics applications do not recognize these comments and instead follow the NCBI FASTA specification.
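For tools that do not accept these comments, a minimal cleanup step is to drop every line that starts with a semicolon. This sketch assumes the file already uses ">" headers. A legacy file that uses ";" in place of ">" would lose its headers here and would need a different fix. The function name is hypothetical.

```python
def strip_pearson_comments(text: str) -> str:
    """Remove Pearson-style comment lines (lines that begin with ';').

    Assumption: the file already uses '>' headers; a legacy file that uses
    ';' in place of '>' would lose its headers here and needs a different fix.
    """
    kept = [line for line in text.splitlines() if not line.startswith(";")]
    return "\n".join(kept) + "\n" if kept else ""


pearson = """>seq1 demo record
;first comment line
;second comment line
ACGTTGCA
"""
print(strip_pearson_comments(pearson), end="")
```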

The following list describes the format that the NCBI FASTA specification defines for sequence identifiers.

Efficient compression of FASTA files calls for a specialized compressor that handles both channels of information: identifiers and sequence.
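The idea of separate channels can be sketched with Python's standard `zlib` module. Identifiers and sequences are split into two streams and each stream is compressed independently, so neither stream's statistics dilute the other's. This sketch is only an illustration. It uses a general-purpose compressor rather than a FASTA-specific one. It also assumes at least one record and discards the original line wrapping, so the round trip is lossless only up to line breaks inside sequences. The function names are hypothetical.

```python
import zlib
from typing import List, Tuple


def split_streams(text: str) -> Tuple[bytes, bytes]:
    """Separate FASTA text into an identifier stream and a sequence stream.

    Each stream holds one newline-separated entry per record; the original
    line wrapping inside sequences is not kept. Assumes at least one record.
    """
    headers: List[str] = []
    sequences: List[List[str]] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            headers.append(line[1:])
            sequences.append([])
        elif sequences:
            sequences[-1].append(line)
    id_stream = "\n".join(headers).encode("utf-8")
    seq_stream = "\n".join("".join(parts) for parts in sequences).encode("utf-8")
    return id_stream, seq_stream


def compress_two_streams(text: str, level: int = 9) -> Tuple[bytes, bytes]:
    """Compress each stream on its own so each keeps its own statistics."""
    id_stream, seq_stream = split_streams(text)
    return zlib.compress(id_stream, level), zlib.compress(seq_stream, level)


def decompress_two_streams(id_blob: bytes, seq_blob: bytes) -> str:
    """Rebuild sequential FASTA (one sequence line per record)."""
    headers = zlib.decompress(id_blob).decode("utf-8").split("\n")
    sequences = zlib.decompress(seq_blob).decode("utf-8").split("\n")
    return "".join(f">{h}\n{s}\n" for h, s in zip(headers, sequences))


sample = ">s1 first\nACGTAC\nGT\n>s2 second\nGGGG\n"
ids, seqs = compress_two_streams(sample)
print(len(ids), len(seqs))
print(decompress_two_streams(ids, seqs), end="")
```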

For example, the algorithm MFCompress[14] performs lossless compression of these files using context modelling and arithmetic encoding.
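The following toy sketch illustrates the general principle of context modelling. It does not reproduce MFCompress. An adaptive order-k model predicts each base from counts of what previously followed the same k preceding bases, using add-one smoothing. An arithmetic coder driven by these probabilities would produce output close to the sum of -log2 p over all symbols, so the function returns that ideal code length in bits instead of an actual encoded bit stream. The function name, the default k = 3, and the restriction to the uppercase alphabet ACGT are all assumptions.

```python
import math
from collections import defaultdict
from typing import DefaultDict, List


def context_model_bits(seq: str, k: int = 3, alphabet: str = "ACGT") -> float:
    """Ideal code length (bits) of `seq` under an adaptive order-k model.

    Each symbol is predicted from counts of the symbols that followed the
    same preceding k-symbol context so far (add-one smoothing). Symbols
    outside `alphabet` raise KeyError.
    """
    index = {c: i for i, c in enumerate(alphabet)}
    counts: DefaultDict[str, List[int]] = defaultdict(lambda: [1] * len(alphabet))
    total_bits = 0.0
    for i, symbol in enumerate(seq):
        context = seq[max(0, i - k):i]  # shorter context at the very start
        table = counts[context]
        j = index[symbol]
        total_bits += -math.log2(table[j] / sum(table))  # cost of this symbol
        table[j] += 1  # adapt the model after coding the symbol
    return total_bits


repetitive = "ACGT" * 250
print(f"{context_model_bits(repetitive):.1f} bits for {len(repetitive)} bases")
```

A highly repetitive sequence costs far less than the 2 bits per base a plain two-bit encoding would need, because each context quickly learns which base follows it. Sequences without such regularities gain little from the model.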

A tree-based approach to sorting multi-FASTA files, TREE2FASTA,[24] also exists; it sorts sequences according to how sequences of interest have been colored and/or annotated in the FigTree viewer.

Additionally, the Bioconductor Biostrings package can be used to read and manipulate FASTA files in R.[25] Several online format converters, such as the one available on phylogeny.fr, can rapidly reformat multi-FASTA files into other formats (e.g. NEXUS, PHYLIP) for use with different phylogenetic programs.
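As a rough illustration of one such conversion, the sketch below turns aligned (header, sequence) pairs into sequential PHYLIP text. This is not the phylogeny.fr converter. It assumes the sequences are already aligned to equal length, and it takes the first whitespace-delimited word of each header as the taxon name, padded or truncated to 10 characters in the style of strict PHYLIP. Programs that accept "relaxed" PHYLIP allow longer names, and individual tools may differ in what they accept. The function name is hypothetical.

```python
from typing import List, Tuple


def fasta_to_phylip(records: List[Tuple[str, str]], name_width: int = 10) -> str:
    """Render aligned (header, sequence) pairs as sequential PHYLIP text.

    Assumptions: sequences are already aligned (equal length); the first
    word of each header becomes the taxon name, cut or padded to `name_width`.
    """
    if not records:
        raise ValueError("no sequences given")
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError("PHYLIP needs an alignment: sequences differ in length")
    names = []
    for header, _ in records:
        words = header.split()
        names.append((words[0] if words else "")[:name_width])
    if len(set(names)) != len(names):
        raise ValueError("taxon names collide after truncation")
    lines = [f"{len(records)} {lengths.pop()}"]  # header: taxa count, sites
    for name, (_, seq) in zip(names, records):
        lines.append(name.ljust(name_width) + seq)
    return "\n".join(lines) + "\n"


aligned = [("human mitochondrial", "ACGT-A"), ("chimp", "ACGTTA")]
print(fasta_to_phylip(aligned), end="")
```

Checking for equal lengths and for name collisions up front avoids writing a file that a phylogenetic program would reject or misread.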