Checksum calculation in Node.js

In this post, I share an easy way to generate a checksum of arbitrary text or of a file's content in Node.js.

The checksum (aka hash sum) calculation is a one-way process of mapping a data set of variable length (e.g., a message or a file) to a smaller data set of fixed length (the hash). The length depends on the hashing algorithm.
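To make the fixed-length property concrete, here is a minimal sketch that hashes a one-character string and a 100,000-character string with three algorithms and compares the digest lengths. The helper name hexDigest and the choice of md5, sha1 and sha256 are my own for illustration; I am assuming all three are available in your Node.js build.

```javascript
const crypto = require('crypto')

// Hypothetical helper: hex-encoded digest of a string for the given algorithm.
function hexDigest(input, algorithm) {
  return crypto.createHash(algorithm).update(input, 'utf8').digest('hex')
}

const algorithms = ['md5', 'sha1', 'sha256']
const shortInput = 'a'
const longInput = 'a'.repeat(100000)

// For each algorithm, record the digest length for both inputs.
const digestLengths = {}
for (const algorithm of algorithms) {
  const shortDigest = hexDigest(shortInput, algorithm)
  const longDigest = hexDigest(longInput, algorithm)
  digestLengths[algorithm] = [shortDigest.length, longDigest.length]
}

console.log(digestLengths)
// { md5: [ 32, 32 ], sha1: [ 40, 40 ], sha256: [ 64, 64 ] }
```

The input size has no effect on the output size: MD5 always yields 16 bytes (32 hex characters), SHA-1 20 bytes and SHA-256 32 bytes.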

Note that a one-way process means it is not possible to perform a reverse calculation, i.e., to calculate the input data (message, file, etc.) from the checksum value. Even though it is possible to find a text that produces the same checksum (for example, using rainbow tables), you can never know whether that text is identical to the original one.

For checksum generation, we can use Node's built-in crypto module. Its createHash(algorithm) function creates a checksum (hash) generator. The available algorithms depend on the version of OpenSSL on the platform. Some examples:

  • md5 for the MD5 message-digest algorithm
  • sha1 for the SHA-1 cryptographic hash function

To get a list of all available hash algorithms, you can use crypto.getHashes().

var crypto = require('crypto')

crypto.getHashes() // [ 'dsa', 'dsa-sha', ..., 'md5', ... ]
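Since the list depends on the platform, it can be worth checking for an algorithm before using it, because createHash throws an error when given a name it does not support. Below is a minimal sketch of such a check. The helper names isHashSupported and pickAlgorithm are hypothetical, not part of the crypto module.

```javascript
const crypto = require('crypto')

// Hypothetical helper: true if this Node.js build supports the named hash.
function isHashSupported(algorithm) {
  return crypto.getHashes().includes(algorithm)
}

// Hypothetical helper: use the preferred algorithm, or the fallback if missing.
function pickAlgorithm(preferred, fallback) {
  return isHashSupported(preferred) ? preferred : fallback
}

console.log(isHashSupported('sha256')) // true on typical builds
console.log(pickAlgorithm('not-a-real-hash', 'sha256')) // sha256
```

Note that the comparison is an exact string match against the names returned by getHashes(), so this sketch assumes you pass the name in the same form as it appears in that list.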

A simple function generating a checksum value from static input:

var crypto = require('crypto')

// Returns the digest of str; defaults to MD5 and hex output
function checksum(str, algorithm, encoding) {
  return crypto
    .createHash(algorithm || 'md5')
    .update(str, 'utf8')
    .digest(encoding || 'hex')
}

checksum('This is my test text') // e53815e8c095e270c6560be1bb76a65d
checksum('This is my test text', 'sha1') // cd5855be428295a3cc1793d6e80ce47562d23def
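The encoding argument only changes how the digest is rendered, not the digest itself. Here is a small sketch comparing the 'hex' and 'base64' encodings with the raw Buffer you get when digest() is called without an encoding. The variable names are mine, and the example calls createHash directly so that it runs on its own.

```javascript
const crypto = require('crypto')

const text = 'This is my test text'

// A hash object can be digested only once, so create a new one per encoding.
const hex = crypto.createHash('md5').update(text, 'utf8').digest('hex')
const base64 = crypto.createHash('md5').update(text, 'utf8').digest('base64')
const raw = crypto.createHash('md5').update(text, 'utf8').digest() // Buffer

console.log(hex.length) // 32 (hex characters)
console.log(base64.length) // 24 (base64 characters, including padding)
console.log(raw.length) // 16 (bytes)

// All three are the same 16 bytes, just rendered differently.
console.log(raw.toString('hex') === hex) // true
console.log(raw.toString('base64') === base64) // true
```

Hex is the usual choice when a checksum is shown to people or compared against published values, while base64 is a bit more compact.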

You can also calculate a checksum of a file's content using the following approach. Note that this approach should be used for small files only, because fs.readFile loads the entire file into memory. Large files should be handled differently, which I will describe shortly.

var crypto = require('crypto'),
  fs = require('fs')

// checksum function definition as above
// Note that content of the test.dat file is "This is my test text"

fs.readFile('test.dat', function (err, data) {
  if (err) throw err // e.g. the file does not exist

  checksum(data) // e53815e8c095e270c6560be1bb76a65d
  checksum(data, 'sha1') // cd5855be428295a3cc1793d6e80ce47562d23def
})

Let’s now check how to handle big files. And how big is big? That depends on the context. Sometimes it might be a few MB, and sometimes it might be one GB.

The code snippet is as follows:

var crypto = require('crypto'),
  fs = require('fs')

var hash = crypto.createHash('md5'),
  stream = fs.createReadStream('mybigfile.dat')

stream.on('data', function (data) {
  hash.update(data) // data is a Buffer, so no input encoding is needed
})

stream.on('end', function () {
  hash.digest('hex') // 34f7a3113803f8ed3b8fd7ce5656ebec
})

Note that the hasher (checksum generator) is updated with every chunk of data coming from the file stream (data event), and the digest is generated once all the stream data has been consumed (end event).
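If you want to reuse the streaming approach, one option is to wrap it in a function that returns a Promise. This is only a sketch: the name fileChecksum and its parameters are my own, and the handling of the stream's error event (for example, when the file does not exist) is an addition that the snippet above does not have.

```javascript
const crypto = require('crypto')
const fs = require('fs')

// Hypothetical helper: resolves with the digest of the file at filePath.
// Defaults mirror the checksum function: MD5 algorithm, hex output.
function fileChecksum(filePath, algorithm, encoding) {
  return new Promise(function (resolve, reject) {
    const hash = crypto.createHash(algorithm || 'md5')
    const stream = fs.createReadStream(filePath)

    // Reject on read errors instead of leaving the promise pending.
    stream.on('error', reject)

    // Feed each chunk (a Buffer) to the hasher as it arrives.
    stream.on('data', function (chunk) {
      hash.update(chunk)
    })

    // All data consumed: produce the final digest.
    stream.on('end', function () {
      resolve(hash.digest(encoding || 'hex'))
    })
  })
}
```

A call could then look like fileChecksum('mybigfile.dat', 'sha1').then(console.log), with a catch handler for the error case. Memory usage stays roughly constant regardless of the file size, since only one chunk is held at a time.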