Web Scraping of Haodoo Backup Using Python and BeautifulSoup Take 1

SUMMARY: This project aims to practice web scraping by extracting specific pieces of information from a website. The web scraping Python code leverages the BeautifulSoup module.

INTRODUCTION: Haodoo is a website that houses classic Chinese literature for its readers’ enjoyment. The name Haodoo can be translated from Chinese as “Good Reads” in English. The site collects hard-to-find Chinese texts and books and makes them available for online reading. The Haodoo collection includes over 3,500 text and audiobook titles.

In this Take 1 iteration, we will scrape the website and obtain all the book titles and their assigned categories.

Starting URL: https://haodoo.org
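
The extraction of titles and categories might look roughly like the sketch below. It is only an illustration, not the project's actual code: the sample markup, the "category" and "book" class names, and the extract_titles helper are all assumptions made for this example, and the real Haodoo pages are likely structured differently. The sketch parses an inline HTML string so that it runs without network access.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only. The tag layout and the
# "category" / "book" class names are assumptions, not Haodoo's real HTML.
SAMPLE_HTML = """
<div class="category">
  <h2>Fiction</h2>
  <a class="book" href="book1.html">Book One</a>
  <a class="book" href="book2.html">Book Two</a>
</div>
<div class="category">
  <h2>History</h2>
  <a class="book" href="book3.html">Book Three</a>
</div>
"""


def extract_titles(html):
    """Return a list of (category, title) pairs found in the given HTML."""
    # "html.parser" is Python's built-in parser, so no extra parser package is needed.
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for section in soup.find_all("div", class_="category"):
        heading = section.find("h2")
        # Fall back to a placeholder when a section has no heading.
        category = heading.get_text(strip=True) if heading else "Unknown"
        for link in section.find_all("a", class_="book"):
            records.append((category, link.get_text(strip=True)))
    return records


if __name__ == "__main__":
    for category, title in extract_titles(SAMPLE_HTML):
        print(f"{category}\t{title}")
```

To run the same function against a live page, the HTML string could be fetched first (for example with the requests library) and passed to extract_titles, after adjusting the selectors to match the markup actually served at the starting URL.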

The source code and HTML output are available on GitHub.