Unicode BOM problem

Problem

I processed a JSON file with a tool, and other tools then refused to accept the resulting JSON text file. They complained that it was a "UTF-8 Unicode (with BOM) text" file. I had to remove this BOM, whatever it was, from my UTF-8 file.

Solution

BOM stands for byte order mark. Some tools add it to the very start of UTF-8 files. In UTF-8, the BOM is the character U+FEFF encoded as this 3-byte sequence: 0xEF, 0xBB, 0xBF.
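Before removing anything, it can help to confirm that the file really starts with those three bytes. The following is a minimal sketch in Python; the function name has_utf8_bom is just an illustrative choice, while codecs.BOM_UTF8 is the standard-library constant for the bytes EF BB BF.

```python
import codecs


def has_utf8_bom(path):
    """Return True if the file at `path` starts with the UTF-8 BOM."""
    # Open in binary mode so the raw bytes are compared, not decoded text.
    with open(path, "rb") as f:
        return f.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8
```

Calling has_utf8_bom("in.txt") should return True for a file like the one described above and False once the BOM has been removed.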

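The BOM sits only at the very start of the file, so any tool or process that can drop those first three bytes will work. If you are on Linux, the awesome sed tool (GNU sed, which understands the \xHH escapes) can do the job in place. The leading 1 limits the substitution to the first line: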

$ sed -i '1s/^\xEF\xBB\xBF//' in.txt
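If GNU sed is not available, the same removal can be sketched in Python. This is only one possible approach, not the method from the reference: the function name strip_utf8_bom is hypothetical, and the sketch reads the whole file into memory, so it assumes the file is reasonably small.

```python
import codecs


def strip_utf8_bom(path):
    """Remove a leading UTF-8 BOM from the file in place.

    Returns True if a BOM was found and removed, False otherwise.
    """
    # Binary mode keeps every other byte of the file untouched.
    with open(path, "rb") as f:
        data = f.read()
    if not data.startswith(codecs.BOM_UTF8):
        return False  # nothing to do, leave the file as it is
    with open(path, "wb") as f:
        f.write(data[len(codecs.BOM_UTF8):])
    return True
```

Running strip_utf8_bom("in.txt") a second time should return False, because only a BOM at the start of the file is removed and the first call already dropped it.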

Reference: How can I remove the BOM from a UTF-8 file?
