Parameter word

In the mathematical study of combinatorics on words, a parameter word is a string over a given alphabet having some number of wildcard characters.

[1] The set of strings matching a given parameter word is called a parameter set or combinatorial cube.

Parameter words can be composed, to produce smaller subcubes of a given combinatorial cube.

They have applications in Ramsey theory and in computer science in the detection of duplicate code.

Each wildcard character is required to appear at least once, but may appear multiple times, and the wildcard characters must appear in the order given by their indexes: the first wildcard character in the word must be

For 1-parameter words, the subscripts may be omitted, as there is no ambiguity between different wildcard characters.

strings (0-parameter words), obtained by substituting a symbol of

[1] In a combinatorial cube, each copy of a particular wildcard character must have the same replacement.

A generalization of parameter words allows different copies of the same wildcard character to be replaced by different characters from the alphabet, in a controlled way.

The first occurrence of each wildcard character must be assigned the identity element of the group.

Then, the strings represented by a labeled parameter word are obtained by choosing a character of

The corresponding combinatorial lines form seven of the eight lines of three cells in a row of the tic-tac-toe board; for instance, the one-parameter word

It is possible to obtain this line as a combinatorial line (without including any other combinations of cells that would be invalid for tic-tac-toe) by using a group with two elements, and an action in which the non-identity element swaps the alphabet letters

There are eight labeled one-parameter words of length two for this action, seven of which are obtained from the unlabeled one-parameter words by using the identity label for all wildcards.

This will necessarily produce a word of length

at least once, in ascending order, so it produces a valid

This notion of composition can also be extended to composition of labeled parameter words (both using the same alphabet and group action), by applying the group action to the non-wildcard substituted characters and composing the group labels for the wildcard substituted characters.

integers belong to distinct subsets.

Partitions of this type can be placed into a bijective equivalence with the parameter words, by creating a word with a character for each of the

, setting this character value to be either an integer in

-Stirling numbers obey a simple recurrence relation by which they may easily be calculated.

[3][4] In Ramsey theory, parameter words and combinatorial cubes may be used to formulate the Graham–Rothschild theorem, according to which, for every finite alphabet and group action, and every combination of integer values

, there exists a sufficiently large number

-dimensional combinatorial cube over strings of length

This result is a key foundation for structural Ramsey theory, and is used to define Graham's number, an enormous number used to estimate the value of

[1] In computer science, in the problem of searching for duplicate code, the source code for a given routine or module may be transformed into a parameter word by converting it into a sequence of tokens, and for each variable or subroutine name, replacing each copy of the same name with the same wildcard character.

If code is duplicated, the resulting parameter words will remain equal even if some of the variables or subroutines have been renamed.

More sophisticated searching algorithms can find long duplicate code sections that form substrings of larger source code repositories, by allowing the wildcard characters to be substituted for each other.

These are strings with wildcard characters that may be substituted independently of each other, without requiring that some of the substituted characters be equal or controlled by a group action.

However, because there is no repetition of wildcard symbols, partial words may be written more simply by omitting the subscripts on the wildcard symbols.