[2] Imagine the following scenario: An application program requests a 1 MiB memory block.
When this 1 MiB of memory is accessed, each of the 256 page-table entries that map it (1 MiB divided into standard 4 KiB pages) would be cached in the translation lookaside buffer (TLB), a cache that remembers virtual-to-physical address translations so that subsequent accesses to the same pages can skip the page-table walk.
Cluttering the TLB with these entries is one of the main disadvantages of mapping with many small pages a region that could have been covered by a single large page.
The resulting TLB pressure is a severe performance penalty and was possibly the largest motivation for augmenting the x86 architecture with larger page sizes.
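To make the arithmetic concrete, the short C sketch below (illustrative only; the constants are the standard x86 page sizes) counts how many translations the TLB would need to hold for the 1 MiB block when it is mapped with 4 KiB pages versus a single 4 MiB large page.

    #include <stdio.h>

    int main(void)
    {
        const unsigned long block      = 1UL << 20;  /* the 1 MiB request      */
        const unsigned long small_page = 4UL << 10;  /* standard 4 KiB page    */
        const unsigned long large_page = 4UL << 20;  /* 4 MiB large (PSE) page */

        /* Round up to whole pages in each case. */
        unsigned long small_entries = (block + small_page - 1) / small_page;
        unsigned long large_entries = (block + large_page - 1) / large_page;

        printf("4 KiB pages: %lu translations\n", small_entries);  /* prints 256 */
        printf("4 MiB pages: %lu translation\n",  large_entries);  /* prints 1   */
        return 0;
    }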
If the newer PSE-36 capability is available on the CPU, as reported by the CPUID instruction, then four more bits, in addition to the normal ten, are used inside a page-directory entry that points to a large page.
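The following minimal C sketch illustrates both parts of this: querying CPUID leaf 1 for the PSE and PSE-36 feature flags (EDX bits 3 and 17), and assembling a 4 MiB page-directory entry in which bits 31:22 hold the normal ten physical-address bits and bits 16:13 hold the four extra PSE-36 bits. It assumes a GCC/Clang toolchain on x86 that provides <cpuid.h>; the helper name pse36_pde and the example address are hypothetical and only illustrate the bit layout.

    #include <stdio.h>
    #include <cpuid.h>                     /* GCC/Clang wrapper for the CPUID instruction */

    /* Assemble a 4 MiB page-directory entry under PSE-36 (illustrative only;
     * pse36_pde is not a real API).  PDE bits 31:22 carry physical-address
     * bits 31:22 (the normal ten bits); PDE bits 16:13 carry physical-address
     * bits 35:32 (the four extra bits); bit 7 (PS) marks a large page. */
    static unsigned int pse36_pde(unsigned long long phys, unsigned int flags)
    {
        unsigned int lo = (unsigned int)(phys & 0xFFC00000ULL);         /* bits 31:22 */
        unsigned int hi = (unsigned int)((phys >> 32) & 0xF) << 13;     /* bits 16:13 */
        return lo | hi | (1u << 7) | flags;
    }

    int main(void)
    {
        unsigned int eax, ebx, ecx, edx;

        /* CPUID leaf 1 returns feature flags in EDX:
         * bit 3 = PSE (4 MiB pages), bit 17 = PSE-36. */
        if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
            fprintf(stderr, "CPUID leaf 1 not supported\n");
            return 1;
        }
        printf("PSE   : %s\n", (edx & (1u << 3))  ? "yes" : "no");
        printf("PSE-36: %s\n", (edx & (1u << 17)) ? "yes" : "no");

        /* Hypothetical example: a 4 MiB page at physical address 0x340000000 (above 4 GiB). */
        printf("PDE = 0x%08x\n", pse36_pde(0x340000000ULL, /* present + writable */ 0x3));
        return 0;
    }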