2023-07-07
research

Breaking Down the Syslog Protocol

The Syslog protocol transmits event notification messages across networks. It was initially described in RFC 3164, which has since been obsoleted by the newer version defined in RFC 5424.

While working with various systems, I’ve seen several of them describe their event format as Syslog-compatible, yet the events fail to parse with tools like Vector’s parse_syslog.

This led me to this unnecessary exploration of the Syslog protocol through its RFCs.

The networking side of Syslog

Syslog is generally sent directly over UDP or TCP. This means we’re operating at a much simpler layer compared to, say, a WebSocket that transmits over HTTP/S, so we have much less to work with. The commonly used ports are:

  • 514 UDP
  • 6514 TCP - TLS-enabled
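To show how little there is on the transport side, here is a minimal sketch that sends one message as a single UDP datagram using Python's standard socket module. The function name send_syslog_udp and the default host are my own assumptions for illustration and do not come from the RFCs.

```python
import socket


def send_syslog_udp(message: str, host: str = "127.0.0.1", port: int = 514) -> int:
    """Send one syslog message as a single UDP datagram.

    Returns the number of bytes handed to the socket. UDP gives no
    delivery guarantee, so a successful return does not mean the
    collector received anything.
    """
    payload = message.encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(payload, (host, port))
```

There is no handshake and no framing here: the datagram is the message. The TLS variant on port 6514 needs a TCP connection wrapped in TLS and is not covered by this sketch.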

RFC 5424

This post focuses primarily on this RFC: the previous one has been obsoleted, so in theory this should be the better-supported version.

The most important part of the RFC is section “6. Syslog Message Format”, as it thoroughly explains the required format for compatibility. I googled for what a Syslog-formatted message should look like, but the answer varies from article to article.

The ABNF (Augmented Backus-Naur Form) definition

The RFC describes the format using Augmented Backus-Naur Form (ABNF, itself specified in RFC 5234), a notation for formally defining the syntax of protocols and message formats.

Going through the entire ABNF would be a waste of everyone’s time, so here’s a short snippet showing the highest-level definition.

SYSLOG-MSG      = HEADER SP STRUCTURED-DATA [SP MSG]

HEADER          = PRI VERSION SP TIMESTAMP SP HOSTNAME
				SP APP-NAME SP PROCID SP MSGID
...
SP              = %d32

SYSLOG-MSG = HEADER SP STRUCTURED-DATA [SP MSG]

The syslog message is defined as the HEADER element, then the SP element, then the STRUCTURED-DATA element and an optional SP and MSG after that. Note that the SP element is defined to be ASCII %d32 at the end of the notation and translates to SPACE.

That means that we can have a valid syslog message with just the HEADER and STRUCTURED-DATA elements as shown below.

<123>1 2023-01-01T12:02:01Z - - - - -
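The top-level rule can be sketched as a small builder that joins the parts with single spaces and appends MSG only when one is given. The function name build_syslog_message and its parameter names are hypothetical, and the sketch does no validation of the individual fields.

```python
NILVALUE = "-"


def build_syslog_message(prival, timestamp=NILVALUE, hostname=NILVALUE,
                         app_name=NILVALUE, procid=NILVALUE, msgid=NILVALUE,
                         structured_data=NILVALUE, msg=None, version=1):
    """Assemble HEADER SP STRUCTURED-DATA [SP MSG] without validating fields."""
    # HEADER = PRI VERSION SP TIMESTAMP SP HOSTNAME SP APP-NAME SP PROCID SP MSGID
    header = f"<{prival}>{version} {timestamp} {hostname} {app_name} {procid} {msgid}"
    message = f"{header} {structured_data}"
    # MSG is optional; when present it is preceded by exactly one SP
    if msg is not None:
        message += f" {msg}"
    return message


print(build_syslog_message(123, "2023-01-01T12:02:01Z"))
```

Calling it with only a PRIVAL and a timestamp reproduces the minimal message shown above.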

The HEADER element

HEADER          = PRI VERSION SP TIMESTAMP SP HOSTNAME
				SP APP-NAME SP PROCID SP MSGID
PRI             = "<" PRIVAL ">"
PRIVAL          = 1*3DIGIT ; range 0 .. 191
VERSION         = NONZERO-DIGIT 0*2DIGIT
HOSTNAME        = NILVALUE / 1*255PRINTUSASCII

APP-NAME        = NILVALUE / 1*48PRINTUSASCII
PROCID          = NILVALUE / 1*128PRINTUSASCII
MSGID           = NILVALUE / 1*32PRINTUSASCII

TIMESTAMP       = NILVALUE / FULL-DATE "T" FULL-TIME
FULL-DATE       = DATE-FULLYEAR "-" DATE-MONTH "-" DATE-MDAY
DATE-FULLYEAR   = 4DIGIT
DATE-MONTH      = 2DIGIT  ; 01-12
DATE-MDAY       = 2DIGIT  ; 01-28, 01-29, 01-30, 01-31 based on
						; month/year
FULL-TIME       = PARTIAL-TIME TIME-OFFSET
PARTIAL-TIME    = TIME-HOUR ":" TIME-MINUTE ":" TIME-SECOND
				[TIME-SECFRAC]
TIME-HOUR       = 2DIGIT  ; 00-23
TIME-MINUTE     = 2DIGIT  ; 00-59
TIME-SECOND     = 2DIGIT  ; 00-59
TIME-SECFRAC    = "." 1*6DIGIT
TIME-OFFSET     = "Z" / TIME-NUMOFFSET
TIME-NUMOFFSET  = ("+" / "-") TIME-HOUR ":" TIME-MINUTE

...

NILVALUE        = "-"
NONZERO-DIGIT   = %d49-57

The header element has the priority, version, timestamp, hostname, app-name, process ID, and message ID in it.

PRI = "<" PRIVAL ">"
PRIVAL = 1*3DIGIT ; range 0 .. 191
VERSION = NONZERO-DIGIT 0*2DIGIT

The PRI element is defined to be an opening angle bracket “<”, a number from 0 to 191 written with one to three digits, then a closing angle bracket “>”, and it is a required element of the message. The VERSION element, on the other hand, is a NONZERO-DIGIT element followed by “zero to two” DIGIT elements, where NONZERO-DIGIT is ASCII %d49-57 or “1-9” and a DIGIT is a NONZERO-DIGIT or ASCII %d48 or “0”. Note that there is no SP between PRI and VERSION.
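As a rough sketch, the PRI and VERSION rules translate into a short regular expression plus a range check. The split of PRIVAL into facility and severity (PRIVAL = facility * 8 + severity) is not part of the ABNF quoted here; it comes from section 6.2.1 of RFC 5424. The name parse_pri_version is my own.

```python
import re

# PRI immediately followed by VERSION, e.g. "<123>1"
PRI_VERSION_RE = re.compile(r"<([0-9]{1,3})>([1-9][0-9]{0,2})")


def parse_pri_version(prefix: str):
    """Return (facility, severity, version) for a prefix such as "<123>1"."""
    match = PRI_VERSION_RE.fullmatch(prefix)
    if match is None:
        raise ValueError(f"not a valid PRI VERSION prefix: {prefix!r}")
    prival, version = int(match.group(1)), int(match.group(2))
    # The grammar allows up to 999, but the comment in the ABNF caps it at 191
    if prival > 191:
        raise ValueError(f"PRIVAL out of range 0..191: {prival}")
    # RFC 5424 section 6.2.1: PRIVAL = facility * 8 + severity
    facility, severity = divmod(prival, 8)
    return facility, severity, version


print(parse_pri_version("<123>1"))  # (15, 3, 1)
```

The upper bound of 191 falls out of the same formula: the highest facility is 23 and the highest severity is 7, and 23 * 8 + 7 = 191.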

HOSTNAME = NILVALUE / 1*255PRINTUSASCII

Since the succeeding elements are somewhat similar, let’s just use the HOSTNAME element to explain them. The element is either a NILVALUE (the dash symbol “-”) or “1 to 255” instances of the PRINTUSASCII element, which is ASCII %d33-126. The same pattern applies to APP-NAME, PROCID, and MSGID, only with different maximum lengths (48, 128, and 32 respectively).
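A minimal sketch of that pattern, assuming a hypothetical helper called is_valid_field and a lookup table of the maximum lengths taken from the ABNF above:

```python
import re

# Maximum lengths taken from the ABNF rules for each header field
FIELD_LIMITS = {"HOSTNAME": 255, "APP-NAME": 48, "PROCID": 128, "MSGID": 32}


def is_valid_field(name: str, value: str) -> bool:
    """Check a header field against NILVALUE / 1*<limit>PRINTUSASCII."""
    if value == "-":  # NILVALUE
        return True
    limit = FIELD_LIMITS[name]
    # PRINTUSASCII is %d33-126, i.e. "!" (0x21) through "~" (0x7e)
    return re.fullmatch(rf"[\x21-\x7e]{{1,{limit}}}", value) is not None


print(is_valid_field("HOSTNAME", "web-01.example.com"))  # True
print(is_valid_field("APP-NAME", "my app"))              # False, contains a space
```

Since SP is %d32 and sits just outside the PRINTUSASCII range, none of these fields can contain a space, which is what makes splitting the header on spaces safe.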

The timestamp section is huge. Simply put, unless it is the NILVALUE, it follows the syntax below, with optional fractional seconds (up to six digits) and a mandatory time zone given either as Z (UTC) or as a numeric offset such as +08:00.

YYYY-MM-DDTHH:MM:SS.000000Z
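Here is a sketch of the TIMESTAMP rule as a regular expression, together with a formatter built on Python's datetime. The names is_valid_timestamp and format_timestamp are my own, and the regex only checks the shape of the value; it does not verify ranges such as months 01-12.

```python
import re
from datetime import datetime, timezone

TIMESTAMP_RE = re.compile(
    r"(?:-"                                  # NILVALUE
    r"|[0-9]{4}-[0-9]{2}-[0-9]{2}"           # FULL-DATE
    r"T[0-9]{2}:[0-9]{2}:[0-9]{2}"           # "T" plus hour, minute, second
    r"(?:\.[0-9]{1,6})?"                     # optional TIME-SECFRAC
    r"(?:Z|[+-][0-9]{2}:[0-9]{2}))"          # mandatory TIME-OFFSET
)


def is_valid_timestamp(value: str) -> bool:
    """Shape check only; digit ranges are not validated."""
    return TIMESTAMP_RE.fullmatch(value) is not None


def format_timestamp(dt: datetime) -> str:
    """Format a timezone-aware datetime as a UTC timestamp ending in Z."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")


print(format_timestamp(datetime(2023, 1, 1, 12, 2, 1, tzinfo=timezone.utc)))
# 2023-01-01T12:02:01.000000Z
```

Note that the separators inside the time part are colons, and that a timestamp without any offset is rejected.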

The STRUCTURED-DATA element

STRUCTURED-DATA = NILVALUE / 1*SD-ELEMENT
SD-ELEMENT      = "[" SD-ID *(SP SD-PARAM) "]"
SD-PARAM        = PARAM-NAME "=" %d34 PARAM-VALUE %d34
SD-ID           = SD-NAME
PARAM-NAME      = SD-NAME
PARAM-VALUE     = UTF-8-STRING ; characters '"', '\' and
							 ; ']' MUST be escaped.
SD-NAME         = 1*32PRINTUSASCII
				; except '=', SP, ']', %d34 (")

Firstly, the STRUCTURED-DATA element is required but can be replaced with a NILVALUE or “-”. Otherwise it can be “one or more” SD-ELEMENT elements.

SD-ELEMENT = "[" SD-ID *(SP SD-PARAM) "]"
SD-PARAM = PARAM-NAME "=" %d34 PARAM-VALUE %d34

The SD-ELEMENT element is an opening square bracket, an SD-ID element, any number of SD-PARAM elements each preceded by an SP element, and a closing square bracket. This would look something like the snippet shown below: a single name followed by any number of key="value" pairs, with some limitations on the characters that can be used for the different elements.

[structured_data_name key="value"]
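The escaping requirement on PARAM-VALUE is easy to miss, so here is a minimal sketch of formatting one SD-ELEMENT. The helper names escape_param_value and format_sd_element are hypothetical, and the sketch does not validate SD-NAME characters or lengths.

```python
def escape_param_value(value: str) -> str:
    """Escape the three characters that MUST be escaped in PARAM-VALUE."""
    # Backslash goes first so the escapes added below are not doubled
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("]", "\\]"))


def format_sd_element(sd_id: str, params: dict) -> str:
    """Render "[" SD-ID *(SP SD-PARAM) "]" from an ID and a dict of params."""
    parts = [sd_id]
    for name, value in params.items():
        parts.append(f'{name}="{escape_param_value(value)}"')
    return "[" + " ".join(parts) + "]"


print(format_sd_element("structured_data_name", {"key": "value"}))
# [structured_data_name key="value"]
```

An element with no parameters comes out as just the ID in brackets, which the ABNF also allows.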

The MSG element

MSG             = MSG-ANY / MSG-UTF8
MSG-ANY         = *OCTET ; not starting with BOM
MSG-UTF8        = BOM UTF-8-STRING
BOM             = %xEF.BB.BF
UTF-8-STRING    = *OCTET ; UTF-8 string as specified
				; in RFC 3629

The MSG element can be any sequence of octets or, when it starts with the byte-order mark (BOM) %xEF.BB.BF, a UTF-8 string.
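In code, the distinction boils down to checking the first three bytes. This is a minimal sketch with a hypothetical decode_msg helper; returning raw bytes for the MSG-ANY case is my own design choice, since the ABNF makes no promise about the encoding there.

```python
import codecs


def decode_msg(raw: bytes):
    """Return str for MSG-UTF8 (BOM present), raw bytes for MSG-ANY."""
    if raw.startswith(codecs.BOM_UTF8):  # b"\xef\xbb\xbf"
        return raw[len(codecs.BOM_UTF8):].decode("utf-8")
    # No BOM: the encoding is unknown, so leave the bytes untouched
    return raw


print(decode_msg(b"\xef\xbb\xbfhello"))  # hello
print(decode_msg(b"hello"))              # b'hello'
```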

Conclusion

From all this reading, we can confidently specify the minimum Syslog-compliant message content. The MSG part (DATA below) is technically optional; a syslog message without it would be less useful, though that depends on the use case.

<123>1 2023-01-01T12:02:01Z - - - - - DATA
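Putting the pieces together, the whole format can be approximated with one regular expression. This is only a sketch under my own assumptions: the name parse_syslog is hypothetical, the header fields are matched loosely as runs of non-space characters, and it makes no attempt to behave like Vector's parser or any other real tool.

```python
import re

SYSLOG_RE = re.compile(
    r"<(?P<prival>[0-9]{1,3})>(?P<version>[1-9][0-9]{0,2})"
    r" (?P<timestamp>\S+)"
    r" (?P<hostname>\S+)"
    r" (?P<app_name>\S+)"
    r" (?P<procid>\S+)"
    r" (?P<msgid>\S+)"
    # NILVALUE, or one or more [...] elements where "\" escapes the next char
    r" (?P<structured_data>-|(?:\[(?:[^\]\\]|\\.)*\])+)"
    # MSG is optional and may contain anything, including spaces and newlines
    r"(?: (?P<msg>.*))?",
    re.DOTALL,
)


def parse_syslog(line: str) -> dict:
    """Split an RFC 5424 style message into its top-level fields."""
    match = SYSLOG_RE.fullmatch(line)
    if match is None:
        raise ValueError("not an RFC 5424 style message")
    return match.groupdict()


fields = parse_syslog("<123>1 2023-01-01T12:02:01Z - - - - - DATA")
print(fields["prival"], fields["structured_data"], fields["msg"])  # 123 - DATA
```

When the message has no MSG part, the msg key is None. A stricter parser would also enforce the field lengths and the timestamp syntax described earlier.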