Unicode Normalization

Files you’re reading may be in any character encoding, and this can cause strange behavior when you modify the data or pass it on, as an example on Stack Exchange shows. This isn’t a problem with Windows event logs, but Windows applications use several different character encodings.

Best practice is to convert everything to UTF-8. This is especially true when invoking modules such as the JSON extension (xm_json), which don’t handle other encodings well.
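
As a rough sketch of that pattern, a conversion step can be placed ahead of JSON parsing in the same Exec directive (the file path and source encoding here are assumptions for illustration):

<Extension charconv>
    Module  xm_charconv
</Extension>

<Extension json>
    Module  xm_json
</Extension>

<Input json_in>
    Module  im_file
    File    "E:/Imports/app.json"
    Exec    $raw_event = convert($raw_event, "UTF-16LE", "UTF-8"); parse_json();
</Input>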

NXLog has the ability to convert encodings and can even do this automatically. However, there is some room for error. If you can, identify the encoding by examining the file in a hex editor and comparing the leading bytes against Microsoft’s identification chart.
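
For instance, dumping the first few bytes of a (hypothetical) log file with xxd makes the byte order mark visible; FF FE at the start indicates UTF-16LE. The output looks something like this:

xxd -l 8 log.txt
00000000: fffe 4c00 6f00 6700    ..L.o.g.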

Here’s a snippet showing a manual conversion of a PowerShell-generated log, after inspecting the first part of the file and identifying it as UTF-16LE:

...
<Extension charconv>
    Module              xm_charconv
    AutodetectCharsets  utf-8, utf-16, utf-32, iso8859-2, ucs-2le
</Extension>

<Input in1>
    Module  im_file
    File    "E:/Imports/log.txt"
    Exec    $raw_event = convert($raw_event, "UTF-16LE", "UTF-8");
</Input>
...

Notice, however, that the charconv module has an automatic detection directive. You can use it as long as the encoding you’re dealing with is included in the AutodetectCharsets list, as utf-16le is here:

<Extension charconv>
    Module              xm_charconv
    AutodetectCharsets  utf-8, utf-16, utf-16le, utf-32, iso8859-2
</Extension>

<Input sql-ERlogs>
    Module        im_file
    File          'C:\Program Files\Microsoft SQL Server\MSSQL11.SQL\MSSQL\Log\ER*'
    ReadFromLast  TRUE
    Exec          convert_fields("AUTO", "utf-8");
</Input>
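
To round the example out, the converted input still needs to be routed somewhere. A minimal sketch, assuming a hypothetical output path, might look like this:

<Output out>
    Module  om_file
    File    'C:\Logs\sql-er-utf8.log'
</Output>

<Route sql>
    Path    sql-ERlogs => out
</Route>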

If you’re curious which charsets are supported, you can run the following command on any Unix-like system to list their names:

iconv -l
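
The list is long, so piping it through grep helps; for example, to check whether UTF-16LE is available (the exact output format varies between glibc and libiconv builds):

iconv -l | grep -i utf-16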
