Text Hoarder

Chrome Extension: save articles & see reading stats

Text Hoarder is a browser extension for Google Chrome that provides a reader view, lets you save articles for later, and generates stats based on your reading habits.

Try it out

Main features:

Screenshots

Reader view removes noise from any article
Customize reader view to your preferences
Save articles to a private GitHub repository
Stats based on your reading habits
Find out your most common websites and topics

Motivation

The extension's main feature is a reader mode. There are many "reader mode" extensions in the Chrome Web Store.

This one has two core differentiators:

The tricky parts

To keep the extension user-friendly, I initially wanted to generate the stats in the browser extension itself, without the need for a CLI utility. This proved tricky. GitHub provides an API for downloading the entire repository as a .tar.gz or .zip file, and such a file can be extracted in-browser. However, that is not sufficient.

Stats generation needs each file's creation date and the Git tag the file belongs to, neither of which that endpoint provides. One could request each file in the repository one by one to get the necessary metadata, but the number of API requests involved is impractical (I have almost 11,000 articles saved in my private text-hoarder-store repository). Thus, a CLI was required so that the repository can be checked out locally and Git commands run against it.
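As a rough illustration of what a local checkout makes possible, the sketch below parses the output of a single `git log` call into a map of file path to creation date. It is not the extension's actual CLI code: the function name `parseCreationDates` and the `date:` marker in the format string are assumptions made for this example. Tags could be resolved in a similar way with `git tag --contains <commit>`.

```typescript
// Assumed input: the output of
//   git log --diff-filter=A --name-only --format=date:%aI
// which prints, for every commit that added files, a "date:<ISO 8601 date>"
// line followed by the paths added in that commit.
function parseCreationDates(gitLogOutput: string): Map<string, string> {
  const created = new Map<string, string>();
  let currentDate = "";
  for (const line of gitLogOutput.split("\n")) {
    if (line.startsWith("date:")) {
      // Start of a new commit: remember its author date.
      currentDate = line.slice("date:".length);
    } else if (line.trim().length > 0 && currentDate !== "") {
      // git log lists the newest commits first, so overwriting keeps the
      // date of the oldest commit that added this path.
      created.set(line, currentDate);
    }
  }
  return created;
}
```

One `git log` invocation covers the whole repository, which avoids the per-file API requests described above.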

Keeping the extension open and unopinionated was a challenge too. The extension needs to save files to a user's GitHub repository in a way that is easy to understand and manipulate (I settled on the "<year>/<domain>/<rest of the url>" naming convention). Converting a URL to a file path was also tricky: many characters are not allowed in file names on macOS and Windows, and Windows additionally limits the maximum file path length. The extension also has to be careful not to overwrite any user files, and to cooperate with any edits the user made locally.
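The URL-to-path conversion could look roughly like the sketch below. It follows the "<year>/<domain>/<rest of the url>" convention, but everything else is an assumption made for illustration: the function name `urlToFilePath`, the `_` replacement character, the 100-character cap per segment, the `index` fallback, and the `.md` extension are not taken from the extension's source.

```typescript
// Characters Windows rejects in file names (":" is also problematic on macOS),
// plus ASCII control characters.
const forbiddenCharacters = /[<>:"\\|?*\u0000-\u001f]/g;
// Assumed cap per path segment, to stay well below Windows' 260-character
// default path limit.
const maxSegmentLength = 100;

function urlToFilePath(rawUrl: string, year: number): string {
  const url = new URL(rawUrl);
  const rest = `${url.pathname}${url.search}${url.hash}`;
  const segments = rest
    .split("/")
    .map((segment) =>
      segment
        .replace(forbiddenCharacters, "_")
        .slice(0, maxSegmentLength)
        // Windows does not allow names that end with a dot or a space.
        .replace(/[. ]+$/, ""),
    )
    .filter((segment) => segment.length > 0);
  // A bare domain has no path segments; fall back to a placeholder name.
  if (segments.length === 0) segments.push("index");
  return [String(year), url.hostname, ...segments].join("/") + ".md";
}

// urlToFilePath("https://example.com/blog/what-is-new?page=2", 2024)
// returns "2024/example.com/blog/what-is-new_page=2.md"
```

This sketch does not handle collisions between two URLs that map to the same path, nor existing user files, both of which a real implementation would need to check before writing.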

Unexpectedly, many articles have query string parameters or even hash parts that are significant for loading the correct content - the extension needs to be smart about cleaning up UTM trackers and other noise from the URL, without breaking the reference to the original content.
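A minimal sketch of this kind of URL cleanup is shown below, using the standard `URL` and `URLSearchParams` APIs. The function name `cleanUrl` and the list of extra tracking parameters are assumptions for this example, not the extension's actual rules. Parameters that are not recognized as trackers, and the hash, are left in place because they may be needed to load the right content.

```typescript
// Hypothetical list of tracking-only parameters, in addition to "utm_*".
const trackingParameters = new Set(["fbclid", "gclid", "mc_cid", "mc_eid"]);

function cleanUrl(rawUrl: string): string {
  const url = new URL(rawUrl);
  // Copy the keys first: deleting while iterating over searchParams
  // directly can skip entries.
  const keys = Array.from(url.searchParams.keys());
  for (const key of keys) {
    const lowerKey = key.toLowerCase();
    if (lowerKey.startsWith("utm_") || trackingParameters.has(lowerKey)) {
      url.searchParams.delete(key);
    }
  }
  // The hash and all remaining parameters are preserved.
  return url.toString();
}

// cleanUrl("https://example.com/post?id=42&utm_source=newsletter#section-2")
// returns "https://example.com/post?id=42#section-2"
```

A denylist like this is a conservative choice: an unknown parameter is kept, so the worst case is a slightly noisy URL rather than a broken link to the original article.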

Related projects

See also my Calendar Plus and Goodreads Stats browser extensions.