[1] Keywords with the same or similar meanings in a natural language sense tend to be "close" in units of normalized Google distance, while words with dissimilar meanings tend to be farther apart.
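As a minimal sketch of how such a distance can be computed from page counts, the function below applies the usual NGD formula, NGD(x, y) = (max{log f(x), log f(y)} − log f(x, y)) / (log N − min{log f(x), log f(y)}). The function name `ngd`, its parameter names, and the handling of zero co-occurrence counts are illustrative assumptions, not part of any search engine API.

```python
import math


def ngd(fx: float, fy: float, fxy: float, n: float) -> float:
    """Normalized Google distance computed from page counts.

    fx, fy -- number of pages containing term x and term y, respectively
    fxy    -- number of pages containing both terms
    n      -- normalizing constant (for example, the number of pages indexed)
    """
    if fx <= 0 or fy <= 0:
        raise ValueError("single-term page counts must be positive")
    if n <= min(fx, fy):
        raise ValueError("n must exceed the smaller single-term page count")
    if fxy <= 0:
        # Assumption: terms that never co-occur are treated as infinitely far apart.
        return math.inf
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))
```

With made-up counts, two terms that always appear together (`ngd(1000, 1000, 1000, 1e6)`) are at distance 0, and the distance grows as the co-occurrence count `fxy` shrinks.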
Objects can also be given by name, like 'the four-letter genome of a mouse,' or 'the text of Macbeth by Shakespeare.'
There are also objects that cannot be given literally, but only by name, and that acquire their meaning from their contexts in the background common knowledge of humankind, like "home" or "red".
The probabilities of Google search terms, conceived as the page counts returned by Google divided by the number of pages indexed by Google (multiplied by the average number of search terms in those pages), approximate the relative frequencies of those search terms as actually used in society.
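The estimate described above can be sketched as a single division. The function name, parameter names, and the numbers in the usage note are hypothetical and chosen only for illustration.

```python
def search_term_probability(page_count: float,
                            pages_indexed: float,
                            avg_terms_per_page: float) -> float:
    """Estimate a search term's probability from its page count.

    The normalizer is the number of indexed pages multiplied by the
    average number of search terms per page, as described in the text.
    """
    normalizer = pages_indexed * avg_terms_per_page
    if normalizer <= 0:
        raise ValueError("normalizer must be positive")
    return page_count / normalizer
```

For example, a term returned on 500 pages of a 1,000-page index averaging 5 search terms per page gets an estimated probability of 500 / 5,000 = 0.1.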
Other text corpora include Wikipedia, the King James Version of the Bible, and the Oxford English Dictionary, together with appropriate search engines.
In the primes versus non-primes case and the WordNet experiment, the NGD method is augmented with a support vector machine classifier.
These rates measure agreement with the WordNet categories, which represent the knowledge of the researchers with PhDs who entered them.