Rants, Raves, and Rhetoric v4

Convert Little-endian UTF-16 to ASCII

February 27, 2019


I generated some text files with Get-Acl in PowerShell, but I did not know how to get PowerShell to do some more advanced filtering. (Basically, I wanted Select-String to include the next 2 lines and see whether a specific group was in that list. And maybe some exclusions.) So, I copied the files over to my Linux home to check there.

The most basic grep? Nothing.

I used ls -l and confirmed the files had data. I used less to confirm I could see it.

I copied a string and did a grep for it. Nothing.

I did a dos2unix. That didn’t fix it. Finally, I did:

file filename.txt

That revealed the files had types of:

Original: Little-endian UTF-16 Unicode text, with CRLF line terminators
dos2unix converted: Little-endian UTF-16 Unicode text

Basically, this told me that dos2unix fixed one problem but not both. The “with CRLF line terminators” part refers to line endings: Windows ends each line with a carriage return plus a line feed (CRLF), while Unix uses a line feed alone.
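To make the line-ending difference concrete, here is a small sketch. It uses a made-up one-line string rather than my real files, and it strips the carriage return with tr, which for plain single-byte text is roughly the job dos2unix does. (It is not a safe thing to run on a UTF-16 file, where each character is two bytes.)

```shell
# A Windows-style line ends with carriage return + line feed (\r\n).
# od -c prints the bytes so the \r and \n are visible.
crlf_dump=$(printf 'hello\r\n' | od -c | head -n 1)
echo "$crlf_dump"

# Deleting the carriage return leaves a Unix-style ending (\n only).
unix_bytes=$(printf 'hello\r\n' | tr -d '\r' | wc -c | tr -d ' ')
echo "bytes after stripping CR: $unix_bytes"
```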

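Little-endian is a geeky homage to Gulliver’s Travels. It describes the order in which the two bytes of each 16-bit unit are stored: least significant byte first. But, it isn’t really the big problem here. UTF-16 is the problem: grep compares bytes, and in UTF-16 every ASCII character carries an extra zero byte, so the string I typed never matched. Apparently, I need the file to be UTF-8 (which, for plain ASCII text, is byte-for-byte the same as ASCII) for grep to read it. So, the fix is to use an encoding converter: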

iconv -f utf-16 -t utf-8 filename.txt > filename_new.txt
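For anyone who wants to try the whole round trip, here is a rough sketch under some assumptions. The sample file, its ACL-like lines, and the group name Everyone are invented for illustration; they stand in for my Get-Acl output. The sample is written with a byte-order mark so that iconv -f UTF-16 can detect the byte order, and -A 2 is the grep option (GNU and BSD grep both have it) that prints two lines of context after each match.

```shell
# Hypothetical sample standing in for Get-Acl output: UTF-16LE with a
# byte-order mark (FF FE) and CRLF line endings.
tmpdir=$(mktemp -d)
src="$tmpdir/acl_utf16.txt"
dst="$tmpdir/acl_utf8.txt"
{
  printf '\377\376'
  printf '%s\r\n' \
    'Path   : C:\share' \
    'Access : BUILTIN\Administrators Allow  FullControl' \
    '         BUILTIN\Users Allow  ReadAndExecute' \
    '         Everyone Allow  Read' \
    'Audit  :' | iconv -f UTF-8 -t UTF-16LE
} > "$src"

# grep on the raw UTF-16 file: every ASCII character is followed by a
# zero byte, so the plain string is not found.
if grep -q 'Everyone' "$src"; then raw_match=yes; else raw_match=no; fi
echo "match in UTF-16 file: $raw_match"

# Convert to UTF-8 and drop the carriage returns in one pass.
iconv -f UTF-16 -t UTF-8 "$src" | tr -d '\r' > "$dst"

# Print each Access line plus the two lines after it, then check
# whether the group shows up in that context.
context=$(grep -A 2 '^Access' "$dst")
printf '%s\n' "$context"
hits=$(printf '%s\n' "$context" | grep -c 'Everyone')
echo "lines mentioning Everyone: $hits"
```

Piping through tr here does the dos2unix step after the conversion, when the text is single-byte and stripping carriage returns is safe.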

Ezra S F

ascii, dos2unix, get-acl, grep, iconv, Linux, little-endian, powershell, utf-16

3 responses to “Convert Little-endian UTF-16 to ASCII”

psychocod3r

February 27, 2019

What exactly is the difference between UTF-8 and UTF-16? I’ve never understood this.

1. Ezra S F
  
  February 27, 2019
  
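  UTF-8 encodes characters in 8-bit units and uses one to four bytes per character, where UTF-16 works in 16-bit units and uses two or four bytes. 8 bits is a byte. So, for plain ASCII text, UTF-16 files are about twice as large. Both cover the whole Unicode character set. In UTF-8, if one uses characters outside the ASCII range, then one can just use extra bytes to encode them rather than waste disk on two bytes for every character.
  
  Windows uses UTF-16 internally, which is likely why PowerShell wrote these files that way. Linux and macOS tools generally expect UTF-8.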
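
The size difference is easy to check with iconv and wc. This is only a sketch: count_bytes is a helper name made up for this example, and the characters are written as UTF-8 octal escapes so the script itself stays plain ASCII.

```shell
# Count the bytes one character occupies in UTF-8 and in UTF-16LE.
# $1 is the character as a printf escape string, $2 is a label.
count_bytes() {
  utf8=$(printf "$1" | wc -c | tr -d ' ')
  utf16=$(printf "$1" | iconv -f UTF-8 -t UTF-16LE | wc -c | tr -d ' ')
  echo "$2: UTF-8=$utf8 UTF-16=$utf16"
}

count_bytes 'A' 'letter A'
count_bytes '\303\251' 'e with acute accent'
count_bytes '\342\202\254' 'euro sign'
count_bytes '\360\237\230\200' 'emoji U+1F600'
```

The letter A takes 1 byte in UTF-8 and 2 in UTF-16, the accented e takes 2 in both, the euro sign takes 3 versus 2, and the emoji takes 4 in both.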
  
