>BeautifulSoup > Features: Excellent HTML/XML parser, easy web scraping interfac...

antisthenes · on Feb 20, 2024

Parsing HTML super-fast is very low on the list of priorities when web-scraping things. Yes, in practice.

Most of the time it won't even register on the scale, compared to the time spent sending/receiving requests and data.
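
For anyone who wants to check this on their own machine, here is a rough sketch that times parsing alone, using only the standard library's `html.parser`. The page is synthetic and the `LinkCounter` class and `time_parse` helper are made-up names for illustration. Compare the printed figure against whatever request latency you actually observe.

```python
import time
from html.parser import HTMLParser


class LinkCounter(HTMLParser):
    """Counts <a> start tags; a stand-in for real extraction logic."""

    def __init__(self):
        super().__init__()
        self.links = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links += 1


def time_parse(html):
    """Return (link count, seconds spent parsing) for one document."""
    parser = LinkCounter()
    start = time.perf_counter()
    parser.feed(html)
    parser.close()
    elapsed = time.perf_counter() - start
    return parser.links, elapsed


# Synthetic page with 2,000 links (made-up content, not a real site).
page = "<html><body>" + "".join(
    f'<p><a href="/item/{i}">item {i}</a></p>' for i in range(2000)
) + "</body></html>"

links, seconds = time_parse(page)
print(f"{links} links parsed in {seconds * 1000:.1f} ms")
```

The exact number depends on the machine and the page, so treat it as a ballpark rather than a benchmark.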

thrdbndndn · on Feb 20, 2024

Beautiful Soup comes with "html.parser" support, and by default it doesn't use or even install lxml.
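
Since lxml is an optional extra, one way to handle it is to pick the parser name at runtime. A minimal sketch, assuming you want lxml when it happens to be installed and the built-in parser otherwise; `pick_parser` is a hypothetical helper, not part of Beautiful Soup.

```python
from importlib.util import find_spec


def pick_parser():
    """Return "lxml" if it is importable, else the stdlib-backed "html.parser"."""
    return "lxml" if find_spec("lxml") is not None else "html.parser"


parser_name = pick_parser()
print(parser_name)
```

The result can then be passed as the second argument, e.g. `BeautifulSoup(markup, pick_parser())`. Naming the parser explicitly also means you get the same one on every machine instead of whatever Beautiful Soup finds installed.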

labaron · on Feb 21, 2024

lxml is written in Cython and is very efficient in my tests. Much faster than BeautifulSoup, which is pure Python.

What alternatives are 5x faster?

cmdlineluser · on Feb 20, 2024

I'm sorry but BeautifulSoup is not just a wrapper over lxml.

lxml even has a module for using BeautifulSoup's parser.

> lxml can make use of BeautifulSoup as a parser backend

https://lxml.de/elementsoup.html

> A very nice feature of BeautifulSoup is its excellent support for encoding detection which can provide better results for real-world HTML pages that do not (correctly) declare their encoding.