
Introduction

OpenTitles is a browser addon that tracks headline changes on over forty news sites, such as nos.nl, nytimes.com and theguardian.com. The addon adds a button to the headlines on these sites; clicking it shows all recent changes to that article's title. OpenTitles is also available as an API and as a daily database dump that may be used for research purposes.

Download on Chrome Webstore
Download for Firefox

OpenTitles is made by Floris de Bijl

Technical Overview

OpenTitles relies on RSS feeds and persistent IDs to keep track of articles. Every few minutes, a scraper pulls the RSS feeds for every site and compares the titles in each feed to the titles in the database.

While titles and URLs may change, the ID generally persists between changes to an article. We can therefore use the ID in the RSS feed to match an article to its entry in the database.
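The matching step described above can be sketched roughly as follows. This is an illustration only, not the actual scraper code: the `FeedItem` and `StoredArticle` shapes, the in-memory `Map` standing in for the database, and the `applyFeed` name are all assumptions made for the example.

```typescript
// Hypothetical shapes: the real scraper's types and field names may differ.
interface FeedItem {
  id: string;    // persistent ID taken from the feed entry
  title: string; // headline as it currently appears in the feed
}

interface StoredArticle {
  id: string;
  titles: string[]; // title history, oldest first
}

// Compare one feed snapshot against the stored articles, matching on ID.
// Returns the IDs of articles that are new or whose title changed.
function applyFeed(items: FeedItem[], db: Map<string, StoredArticle>): string[] {
  const touched: string[] = [];
  for (const item of items) {
    const known = db.get(item.id);
    if (known === undefined) {
      // First time this ID is seen: index it with its initial title.
      db.set(item.id, { id: item.id, titles: [item.title] });
      touched.push(item.id);
    } else if (known.titles[known.titles.length - 1] !== item.title) {
      // Same ID, different headline: record the change.
      known.titles.push(item.title);
      touched.push(item.id);
    }
  }
  return touched;
}
```

Because the lookup is keyed on the ID rather than the title or URL, a renamed article is recorded as a change to an existing entry instead of being indexed as a new one.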

This approach has a few limitations, most notably the refresh rate and retention of the RSS feeds. Some RSS feeds are generated on demand, which is the best-case scenario for OpenTitles. In the worst cases the feed is only refreshed every hour, so changes made to titles in that time may not be picked up by the scraper.

Furthermore, RSS feeds usually only contain a few dozen articles, so sites with a high throughput might have an article on the RSS feeds for a few hours at most. Any changes made to the title after that can't be tracked, as the scraper relies on the RSS feed for indexing new titles.
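As a rough back-of-the-envelope illustration of this retention limit, the tracking window for an article is about the feed size divided by the site's publishing rate. The function name and the numbers in the usage note are hypothetical and not measurements of any real site.

```typescript
// Rough estimate of how long an article stays visible in a feed.
// feedSize: number of entries the feed holds.
// articlesPerHour: how many new articles the site publishes per hour.
function retentionHours(feedSize: number, articlesPerHour: number): number {
  if (feedSize <= 0 || articlesPerHour <= 0) {
    throw new RangeError("feedSize and articlesPerHour must be positive");
  }
  return feedSize / articlesPerHour;
}
```

For instance, a feed holding 30 entries on a site that publishes 10 articles per hour gives `retentionHours(30, 10)`, i.e. about three hours during which title changes can still be observed.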

A more robust version of OpenTitles would still use the RSS feed for indexing articles, but manually visit those articles for a set period of time to check for new titles. This is vastly more complex than the current approach and my time is very limited, so a rewrite using this technique is not on the roadmap at this point.

The source code for OpenTitles is available on GitHub. The project is split into five repositories: this website, the scraper, the API server, the definition and the client (i.e. the browser addon).

All components are made with TypeScript, with the exception of this website.

Database Dump

Every 24 hours (Central European Time) a new database dump is generated using mongoexport and made available through https://dump.opentitles.info/. This data is free to use for any purpose.
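A minimal sketch for reading such a dump once it has been downloaded. By default mongoexport writes one JSON document per line, and with `--jsonArray` it writes a single JSON array; which form the OpenTitles dump uses is not stated here, so the hypothetical `parseDump` helper below accepts both and makes no assumptions about the fields inside each document.

```typescript
// Parse the text of an export into plain objects.
// Accepts either one JSON document per line or a single JSON array.
function parseDump(text: string): unknown[] {
  const trimmed = text.trim();
  if (trimmed === "") return [];
  if (trimmed.startsWith("[")) {
    // Array form: the whole file is one JSON value.
    const parsed: unknown = JSON.parse(trimmed);
    if (!Array.isArray(parsed)) throw new Error("expected a JSON array");
    return parsed;
  }
  // Line-delimited form: parse each non-empty line on its own.
  return trimmed
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line !== "")
    .map((line): unknown => JSON.parse(line));
}
```

For very large dumps, reading the file line by line as a stream would be gentler on memory than loading it as one string; the version above favours brevity.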

API

To be expanded

The entry point for the API is https://api.opentitles.info/v2/country

The path to an article is as follows: https://api.opentitles.info/v2/country/%country%/org/%org%/article/%articleId%

For example: https://api.opentitles.info/v2/country/nl/org/NOS/article/2353584
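The path pattern above can be assembled with a small helper. This is a sketch: the `articleUrl` function and the `API_ROOT` constant are names invented for this example, and only the URL structure comes from the documentation.

```typescript
// Entry point of the API, as given in the documentation.
const API_ROOT = "https://api.opentitles.info/v2/country";

// Build the URL of a single article from its country, organisation and ID.
function articleUrl(country: string, org: string, articleId: string): string {
  const parts = [country, "org", org, "article", articleId]
    // Escape each segment so unusual characters cannot break the path.
    .map((segment) => encodeURIComponent(segment));
  return `${API_ROOT}/${parts.join("/")}`;
}
```

Calling `articleUrl("nl", "NOS", "2353584")` yields the example URL shown above.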

Errors

The OpenTitles API uses the following error codes:

Error Code    Meaning
404           Not Found -- The specified resource could not be found. Formatted as JSON with an error property describing the error and a lookat property with the path to a list of this resource.
500           Internal Error -- An error occurred on the server side that could not be recovered from.
Example 404 response body:

{
  "error": "No such country",
  "lookat": "/v2/country"
}
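A client might turn these responses into a uniform error value along the following lines. This is a sketch under stated assumptions: the `ApiError` shape and the `describeError` function are hypothetical, and only the `error` and `lookat` properties and the two status codes come from the documentation above.

```typescript
// Hypothetical client-side representation of an API error.
interface ApiError {
  status: number;
  message: string;
  lookat?: string; // path to a list of the missing resource, when provided
}

// Interpret an HTTP status and raw response body as an ApiError.
function describeError(status: number, bodyText: string): ApiError {
  if (status === 404) {
    try {
      const body: unknown = JSON.parse(bodyText);
      if (typeof body === "object" && body !== null) {
        const { error, lookat } = body as { error?: unknown; lookat?: unknown };
        return {
          status,
          message: typeof error === "string" ? error : "Not Found",
          // Only include lookat when the server actually sent one.
          ...(typeof lookat === "string" ? { lookat } : {}),
        };
      }
    } catch {
      // Body was not valid JSON; fall through to the generic message.
    }
    return { status, message: "Not Found" };
  }
  if (status === 500) return { status, message: "Internal Error" };
  return { status, message: `Unexpected status ${status}` };
}
```

The `lookat` path can then be requested to list the valid values, for example the available countries when the country in the request did not exist.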