Web Scraping Fake Job Listings from a fake Job Posting Website
This Python script is designed to scrape job listings from the Real Python fake job listings website. It retrieves the job titles, companies, locations, and "Apply" links for Python-related jobs.
-
Sending HTTP Request and Parsing HTML
The script sends a GET request to the Real Python fake job listings URL using the
requests.get()
function from therequests
module. It then parses the HTML content of the page using BeautifulSoup from thebs4
library.import requests from bs4 import BeautifulSoup url = "https://realpython.github.io/fake-jobs/" page = requests.get(url) soup = BeautifulSoup(page.content, "html.parser")
-
Finding Job Listings
It searches for the container of job listings by finding an element with the ID "ResultsContainer".
results = soup.find(id="ResultsContainer")
-
Filtering Python Jobs
The script filters job listings to only those containing the word "python" in their titles. It finds all
<h2>
elements with the specified condition.python_jobs = results.find_all("h2", string=lambda text: "python" in text.lower())
-
Extracting Job Details
Initially, the script attempted to extract job details directly from the filtered
<h2>
elements. However, it found that these elements did not include complete job information, resulting in errors.''' # Extracting job details directly from <h2> elements for jobP in python_jobs: job_title = jobP.find("h2", class_="title is-5") company = jobP.find("h3", class_="subtitle is-6 company") location = jobP.find("p", class_="location") print(job_title.text.strip()) print(company.text.strip()) print(location.text.strip()) print() '''
Another attempt was made to extract job details from all job listings. This time, it looped over all job elements and extracted details such as title, company, and location.
''' # Extracting job details from all job listings for job_element in job_elements: job_title = job_element.find("h2", class_="title is-5") company = job_element.find("h3", class_="subtitle is-6 company") location = job_element.find("p", class_="location") print(job_title.text.strip()) print(company.text.strip()) print(location.text.strip()) print() '''
-
Extracting "Apply" Links
The final approach was to extract "Apply" links for each Python job listing. It iterates over the parent elements of the filtered
<h2>
elements and finds all<a>
elements within them.for job_element in python_job_elements: links = job_element.find_all("a") for link in links: link_url = link["href"] print(f"Apply here: {link_url}\n")
This successfully extracts the URLs for applying to Python job listings.
This README file provides an overview of the script's functionality and explains each part of the code, including the commented-out sections.