Python: how to extract all the URLs from a web page

In this article we will see how to extract all the URLs of the links present in a web page with Python using the BeautifulSoup module.

It's a matter of finding all the <a> elements in the page and reading the value of their href attribute.

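from bs4 import BeautifulSoup

def get_page_urls(html):
    # Parse the HTML with Python's built-in parser
    soup = BeautifulSoup(html, 'html.parser')
    # Collect every <a> element in the document
    links = soup.find_all('a')
    urls = []
    for link in links:
        # Fall back to an empty string when the element has no href
        href = link.get('href', '')
        # Keep only absolute HTTP(S) URLs
        if href.startswith(('http://', 'https://')):
            urls.append(href)
    return urls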

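Our function returns a list of the absolute HTTP and HTTPS URLs found in the page. Relative links (such as /contact), other schemes (such as mailto:) and <a> elements without an href attribute are skipped.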
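
Because the function keeps only hrefs that already start with http, relative links are left out. If those are needed too, one possible approach is to resolve each href against the address of the page with urljoin from the standard library. The sketch below is only an illustration under that assumption: get_absolute_urls, its base_url parameter and the sample HTML are hypothetical names and data, not part of the original function.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def get_absolute_urls(html, base_url):
    """Return the links of a page as absolute URLs (hypothetical helper)."""
    soup = BeautifulSoup(html, 'html.parser')
    urls = []
    # href=True skips <a> elements that have no href attribute
    for link in soup.find_all('a', href=True):
        # urljoin leaves absolute URLs untouched and resolves relative ones
        url = urljoin(base_url, link['href'])
        # Discard non-web schemes such as mailto:
        if url.startswith(('http://', 'https://')):
            urls.append(url)
    return urls


# Made-up sample page, assumed to live at https://example.com/blog/
sample_html = """
<a href="https://example.com/about">About</a>
<a href="/contact">Contact</a>
<a href="mailto:info@example.com">Email</a>
<a name="top">No href</a>
"""

print(get_absolute_urls(sample_html, 'https://example.com/blog/'))
```

With this sample, the relative /contact link is resolved to https://example.com/contact, while the mailto: link and the anchor without an href are dropped.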
