📅 2019-Apr-22 ⬩ ✍️ Ashwin Nanjappa ⬩ 🏷️ bom, sed, unicode ⬩ 📚 Archive
I processed a JSON file with a tool, and other tools refused to accept the resulting JSON text file. They complained that it was a UTF-8 Unicode (with BOM) text file. I had to remove this BOM, whatever it was, from my UTF-8 file.
BOM is a byte order mark that some tools add to the start of UTF-8 files. In UTF-8, the BOM is this 3-byte sequence: 0xEF, 0xBB, 0xBF.
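To check whether a file actually starts with these 3 bytes, something like the following sketch might help. The function name `has_bom` is my own invention for illustration, and it assumes your `head` supports the `-c` option (GNU and BSD versions do).

```shell
# has_bom FILE (hypothetical helper): succeeds if FILE begins with
# the UTF-8 BOM bytes EF BB BF, fails otherwise.
has_bom() {
    # Dump the first 3 bytes as hex, strip spaces and newlines, then compare.
    [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]
}
```

You could then run `has_bom in.txt && echo "BOM found"` before deciding whether the file needs fixing.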
You can use any tool or process to remove this 3-byte sequence from the start of the file. If you are on Linux, the awesome sed tool can do the job:
$ sed -i '1s/^\xEF\xBB\xBF//' in.txt
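Since `-i` overwrites the file in place, a slightly more cautious variant of the command above could keep a backup. This is only a sketch: the name `strip_bom` is made up for illustration, it assumes GNU sed (for `-i.bak` and the `\xHH` escapes), and `LC_ALL=C` is my addition so that sed treats the input as plain bytes.

```shell
# strip_bom FILE (hypothetical helper): remove a leading UTF-8 BOM from
# FILE in place, keeping the original as FILE.bak. Assumes GNU sed.
strip_bom() {
    # LC_ALL=C makes sed match raw bytes rather than multibyte characters.
    # The substitution only touches the start of line 1, so a file
    # without a BOM is left with the same content.
    LC_ALL=C sed -i.bak '1s/^\xEF\xBB\xBF//' "$1"
}
```

After `strip_bom in.txt`, the untouched original should be available as `in.txt.bak` in case something goes wrong.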
Reference: How can I remove the BOM from a UTF-8 file?