Unicode Normalization
Files you’re reading may be in any character set, and this can cause strange behavior when you modify the data or pass it on, as an example on Stack Exchange shows. This isn’t a problem with Windows event logs, but Windows applications use several different charsets.
Best practice is to convert everything to UTF-8. This is especially true when invoking modules such as the JSON extension (xm_json), which don’t handle other encodings well.
NXLog has the ability to convert between character sets and can even do this automatically. However, there is some room for error. If you can, identify the encoding by examining the file in a hex editor and comparing the first bytes against Microsoft’s identification chart.
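If a hex editor isn’t handy, the same byte-order-mark check is easy to script. Here is a minimal Python sketch (the file name is a placeholder); keep in mind that many files carry no BOM at all, in which case you’re back to inspecting bytes or trial-decoding.

# bom_sniff.py -- guess a file's encoding from its byte-order mark (BOM).
# Minimal sketch only: files without a BOM will not be identified.
BOMS = [
    (b"\xff\xfe\x00\x00", "UTF-32LE"),  # test 4-byte marks first, since
    (b"\x00\x00\xfe\xff", "UTF-32BE"),  # UTF-32LE begins like UTF-16LE
    (b"\xef\xbb\xbf", "UTF-8"),
    (b"\xff\xfe", "UTF-16LE"),
    (b"\xfe\xff", "UTF-16BE"),
]

def sniff(path):
    with open(path, "rb") as f:
        head = f.read(4)
    for mark, name in BOMS:
        if head.startswith(mark):
            return name
    return "no BOM found"

print(sniff("log.txt"))  # hypothetical file name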
Here’s a snippet of a manual conversion of a PowerShell-generated log, after looking at the first bytes and identifying the file as UTF-16LE:
...
<Extension charconv>
    # Load the character set conversion extension
    Module              xm_charconv
    AutodetectCharsets  utf-8, utf-16, utf-32, iso8859-2, ucs-2le
</Extension>

<Input in1>
    Module  im_file
    File    "E:/Imports/log.txt"
    # Explicit conversion: the source is known to be UTF-16LE
    Exec    $raw_event = convert($raw_event, "UTF-16LE", "UTF-8");
</Input>
...
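If you want to sanity-check a conversion outside NXLog first, the same transformation is short in Python. A sketch, reusing the path from the example above (the output path is hypothetical):

# Re-encode a UTF-16LE file as UTF-8 -- the same transformation the
# convert() call above applies to each event.
with open("E:/Imports/log.txt", "r", encoding="utf-16-le") as src:
    text = src.read()
with open("E:/Imports/log-utf8.txt", "w", encoding="utf-8") as dst:
    dst.write(text)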
Note, however, that the charconv module has an automatic-detection directive. You can rely on it as long as the actual encoding appears in the AutodetectCharsets list, as utf-16le does here:
<Extension charconv>
    Module              xm_charconv
    # The source encoding must appear in this list for AUTO to work;
    # utf-16le covers the PowerShell log from the previous example
    AutodetectCharsets  utf-8, utf-16, utf-16le, utf-32, iso8859-2
</Extension>

<Input sql-ERlogs>
    Module        im_file
    File          'C:\Program Files\Microsoft SQL Server\MSSQL11.SQL\MSSQL\Log\ER*'
    ReadFromLast  TRUE
    # Detect the source encoding automatically, then emit UTF-8
    Exec          convert_fields("AUTO", "utf-8");
</Input>
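Autodetection is a guess, not a guarantee, which is why the contents of that list matter. As a rough illustration (this uses the third-party chardet package, not NXLog’s internal detector), you can check what a detector concludes about a sample file, and with what confidence:

# Requires: pip install chardet. The path reuses the earlier example.
import chardet

with open("E:/Imports/log.txt", "rb") as f:
    sample = f.read(65536)  # a few KB of raw bytes is enough to guess

print(chardet.detect(sample))
# e.g. {'encoding': 'UTF-16', 'confidence': 1.0, 'language': ''}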
If you’re curious which charsets are supported, you can run this command on any Unix-like system to list their names:
iconv -l