Skip to content

RasmusKard/imdb-watchlist-scraper

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

20 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

IMDb Watchlist Scraper

Overview

This library provides a WatchlistScraper class for scraping IMDb watchlist data, including rating IDs and usernames, using the IMDb user ID.


Class Initialization

Parameters

  • userId (string, required):
    IMDb user ID string, formatted like 'ur125655832'.

  • headless (boolean, optional):
    Default: true.
    Determines whether the browser runs in headless mode. Set to false for testing purposes.

  • timeoutInMs (number, optional):
    Default: 3 minutes (180000 ms).
    The script timeout duration. If a timeout occurs, the watchlistGrabIds() function will console.error() and return null.

  • userAgent (string, optional):
    Default:
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36".
    Refer to Playwright documentation for user agent formatting details. This is used in Playwright's browser.newContext()


Usage

Main Function: watchlistGrabIds()

Description

watchlistGrabIds() opens a playwright browser, navigates to the watchlist and scrapes all of the IDs in that watchlist.

Return Value

On success, the function returns an object:

{
  idArr: string[], // Array of rating IDs
  username: string | null // IMDb username, or null if unavailable
}

Error cases

The function throws an error explicitly in the following cases:

  1. The target watchlist is privated
  2. Parsing time exceeds the time set in timeoutInMs
  3. Parsing completes but the returned idArr is empty

Example

Scraping All Rating IDs and Username by userId

const scraper = new WatchlistScraper({ userId: "ur125655832" });

try {
	const scrapingResults = await scraper.watchlistGrabIds();

	// username can be null
	const username = scrapingResults.username;
	if (!!username) {
		console.log(username);
	}

	// this is never null and always has at least 1 element
	const idArr = scrapingResults.idArr;
	console.log(idArr);
} catch (error) {
	// handle error
}

About

Scrape IMDb watchlist using Playwright

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors