Beautiful Soup (HTML parser)

Beautiful Soup is a Python package for parsing HTML and XML documents, including those with malformed markup.

It creates a parse tree for documents that can be used to extract data from HTML,[3] which is useful for web scraping.
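
As a minimal sketch of what this parse tree looks like in practice, the snippet below parses a small, deliberately malformed HTML string and reads data out of the resulting tree. The sample markup, the variable names, and the choice of the standard library's "html.parser" backend are assumptions made for illustration; other backends may repair broken markup differently.

```python
from bs4 import BeautifulSoup

# Illustrative input: the <b> tag is never closed, so the markup is malformed.
html = (
    "<html><body>"
    "<p class='intro'>Hello <b>world</p>"
    "<p>Second paragraph</p>"
    "</body></html>"
)

# Build the parse tree using Python's built-in HTML parser as the backend.
soup = BeautifulSoup(html, "html.parser")

# Look up the first <p> element carrying the class "intro" and read its text.
intro = soup.find("p", class_="intro")
intro_text = intro.get_text()

# Count every <p> element found anywhere in the tree.
paragraph_count = len(soup.find_all("p"))

print(intro_text)
print(paragraph_count)
```

Despite the missing closing tag, the document still parses into a tree that can be queried, which is the property that makes the library practical for scraping real-world pages.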

Its original author, Leonard Richardson, continues to contribute to the project,[7] which is additionally supported by paid open-source maintainers from the company Tidelift.

Beautiful Soup represents parsed data as a tree that can be searched and iterated over with ordinary Python loops.[10]

The example below uses the Python standard library's urllib[11] to load Wikipedia's main page, then uses Beautiful Soup to parse the document and search for all links within.
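
A sketch of such an example follows. It is not necessarily the article's original listing: the helper names fetch_page, extract_links and print_links, the User-Agent string, the timeout, and the use of the "html.parser" backend are all assumptions chosen for illustration.

```python
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup

# Assumed URL of the English Wikipedia main page.
MAIN_PAGE_URL = "https://en.wikipedia.org/wiki/Main_Page"


def fetch_page(url):
    """Download a page with urllib and return its raw bytes."""
    # Hypothetical User-Agent; some sites reject requests that lack one.
    request = Request(url, headers={"User-Agent": "bs4-example/0.1"})
    with urlopen(request, timeout=10) as response:
        return response.read()


def extract_links(html):
    """Parse HTML and return the href of every <a> element that has one."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    # find_all returns the matching elements, which an ordinary
    # Python loop can iterate over.
    for anchor in soup.find_all("a", href=True):
        links.append(anchor["href"])
    return links


def print_links(url=MAIN_PAGE_URL):
    """Fetch a page and print each link found in it."""
    for href in extract_links(fetch_page(url)):
        print(href)
```

Calling print_links() performs the network request and prints one link per line. Keeping extract_links separate from the download step means the parsing logic can also be run on HTML that is already in memory.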