String split into n-character long tokens


This post shows how to split a string into tokens of N characters each, grouping from the end of the string so that only the first token may be shorter. That is very handy for number formatting, e.g.:

  • Separate a number in decimal format with a thousands delimiter, from 1233232 to 1,233,232
  • Separate a number in binary format every fourth digit (a nibble, i.e. half a byte), from 1011010010101001 to 1011 0100 1010 1001
  • Separate a number in hexadecimal format every second character (a single byte), from 0xfab22883b0ada0 to 0xfa b2 28 83 b0 ad a0

For that purpose we can use the regular expression .{1,X}(?=(.{X})+(?!.))|.{1,X}$ with the global flag and the String.prototype.match() function, where X is the number of characters in a single token.
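To see why the pattern groups from the right, it may help to read it piece by piece. The sketch below assumes X = 3 and single-line input (the dot does not match line breaks); the variable names are illustrative only and are not part of the original pattern.

```javascript
// Sketch with X = 3; head, aligned and last are hypothetical names used for illustration.
const head = '.{1,3}'               // a token of 1 to 3 characters...
const aligned = '(?=(.{3})+(?!.))'  // ...that must be followed by one or more full 3-character groups and then nothing
const last = '.{1,3}$'              // otherwise: the final token, which runs to the end of the string
const pattern = new RegExp(head + aligned + '|' + last, 'g')

console.log('12345'.match(pattern)) // ['12', '345']
```

The lookahead only succeeds when the remaining text is a whole number of groups, so the greedy first token shrinks until the rest lines up.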

Some examples

Let’s split a hexadecimal number into 2-character tokens. The first token can have 1 or 2 characters.

console.log('fab22883b0ada0'.match(/.{1,2}(?=(.{2})+(?!.))|.{1,2}$/g))
console.log('ab22883b0ada0'.match(/.{1,2}(?=(.{2})+(?!.))|.{1,2}$/g))

This prints:

['fa', 'b2', '28', '83', 'b0', 'ad', 'a0']
['a', 'b2', '28', '83', 'b0', 'ad', 'a0']

Let’s split a binary number into 4-character tokens. The first token can have 1 to 4 characters.

console.log('1110101010101101'.match(/.{1,4}(?=(.{4})+(?!.))|.{1,4}$/g))
console.log('10101010101101'.match(/.{1,4}(?=(.{4})+(?!.))|.{1,4}$/g))

This prints:

['1110', '1010', '1010', '1101']
['10', '1010', '1010', '1101']

Alternatively, you can build the regular expression dynamically with the JavaScript RegExp class, so that the token length becomes a parameter.

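// Splits input into tokens of len characters, grouping from the end of the string.
// Note: match() returns null when nothing matches, e.g. for an empty string.
function split(input, len) {
  return input.match(new RegExp('.{1,' + len + '}(?=(.{' + len + '})+(?!.))|.{1,' + len + '}$', 'g'))
}
console.log(split('11010101101', 4))
console.log(split('ab22883b0ada0', 2))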

This prints:

['110', '1010', '1101']
['a', 'b2', '28', '83', 'b0', 'ad', 'a0']
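To get the formatted numbers from the introduction, the tokens still need to be joined with a delimiter. The following is a minimal sketch of that step; groupFromRight is a hypothetical helper name, and the 0x prefix is assumed to be stripped before splitting and added back afterwards.

```javascript
// Hypothetical helper: splits input into len-character tokens from the right
// and joins them with the given separator.
function groupFromRight(input, len, separator) {
  const pattern = new RegExp('.{1,' + len + '}(?=(.{' + len + '})+(?!.))|.{1,' + len + '}$', 'g')
  const tokens = input.match(pattern) || [] // match() returns null for an empty string
  return tokens.join(separator)
}

console.log(groupFromRight('1233232', 3, ','))               // 1,233,232
console.log(groupFromRight('1011010010101001', 4, ' '))      // 1011 0100 1010 1001
console.log('0x' + groupFromRight('fab22883b0ada0', 2, ' ')) // 0xfa b2 28 83 b0 ad a0
```

Falling back to an empty array keeps the helper from throwing on an empty string, which then simply yields an empty result.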


