Code Yarns ‍👨‍💻

Unicode BOM problem

📅 2019-Apr-22 ⬩ ✍️ Ashwin Nanjappa ⬩ 🏷️ bom, sed, unicode

Problem

I ran a JSON file through a tool, and other tools then refused to accept the resulting JSON text file. They complained that it was a "UTF-8 Unicode (with BOM) text" file. I had to remove this BOM, whatever it was, from my UTF-8 file.

Solution

BOM stands for byte order mark. It is the Unicode character U+FEFF, which some tools write at the very beginning of a UTF-8 file. In UTF-8 it is encoded as this 3-byte sequence: 0xEF,0xBB,0xBF.
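Before changing anything, it can help to confirm that a file really starts with those three bytes. Here is a minimal sketch of such a check. The helper name has_bom and the two demo file names are made up for illustration, and it assumes the usual head, od and tr tools are available.

```shell
# has_bom is a hypothetical helper: it succeeds if the file in $1
# starts with the bytes EF BB BF.
has_bom() {
    [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]
}

# Demo files (names are assumptions). The octal escapes 357 273 277
# are the bytes EF BB BF.
printf '\357\273\277{"a": 1}\n' > with_bom.json
printf '{"a": 1}\n' > without_bom.json

has_bom with_bom.json && echo "with_bom.json: BOM found"
has_bom without_bom.json || echo "without_bom.json: no BOM"
```

The check looks only at the first three bytes, since that is the only place a BOM is expected to appear.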

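You could use any tool or process to remove these 3 bytes from the start of the file. If you are on Linux, the awesome sed tool can do the job. With GNU sed, this command edits the file in place and deletes the sequence only if it appears at the beginning of the first line: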

$ sed -i '1s/^\xEF\xBB\xBF//' in.txt
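To see the effect of the command, here is a small round trip on a throwaway file. It is only a sketch: the file name bom_demo.json is an assumption, and it relies on GNU sed, since the -i option without an argument and the \x escapes are GNU extensions.

```shell
# Hypothetical demo file; the name is an assumption.
f=bom_demo.json

# Create a small JSON file that starts with the UTF-8 BOM
# (octal 357 273 277 are the bytes EF BB BF).
printf '\357\273\277{"a": 1}\n' > "$f"

# Dump the first three bytes as hex before the fix.
before=$(head -c 3 "$f" | od -An -tx1 | tr -d ' \n')
echo "before: $before"   # prints: before: efbbbf

# Strip the BOM with the sed command from this post (GNU sed assumed).
sed -i '1s/^\xEF\xBB\xBF//' "$f"

# Dump the first three bytes again; they are now the start of the JSON text.
after=$(head -c 3 "$f" | od -An -tx1 | tr -d ' \n')
echo "after:  $after"    # prints: after:  7b2261
```

After the fix, the first three bytes are 7b 22 61, which is the text {"a at the start of the JSON.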

Reference: How can I remove the BOM from a UTF-8 file?


© 2022 Ashwin Nanjappa • All writing under CC BY-SA license • 🐘 @codeyarns@hachyderm.io📧