Line Delimited JSON


Line Delimited JSON is a standard for delimiting JSON in stream protocols (such as TCP).

Introduction

This is a minimal specification for sending and receiving JSON over a stream protocol, such as TCP.

The Line Delimited JSON framing is so simple that no specification had previously been written for this ‘obvious’ way to do it.

Example output

(With \r\n line separators)

{"some": "thing"}
{"foo": 17, "bar": false, "quux": true}
{"may": {"include": "nested", "objects": ["and", "arrays"]}}

Motivation

There is currently no standard for transporting JSON within a stream protocol (primarily plain TCP), apart from WebSockets, which is unnecessarily complex for non-browser applications.

An important use case is processing a large number of JSON objects where the receiver of the data should not have to receive every single byte before it can begin decoding it. The processing time and memory usage of a JSON parser trying to parse a multi-gigabyte (or larger) string is often prohibitive. Thus, a "properly" encoded JSON list of millions of lines is not a practical way to pass and parse data.[1]

There were numerous possibilities for JSON framing, including counted strings and ASCII control characters or non-ASCII characters as delimiters (DLE STX, ETX or WebSocket's 0xFFs).

Scope

The primary use case for LDJSON is an unending stream of JSON objects, delivered at variable times over TCP, where each object needs to be processed as it arrives, for example a stream of stock quotes or chat messages.

Philosophy / requirements

The specification must:

  • be trivial to implement in multiple popular programming languages
  • be flexible enough to handle arbitrary whitespace (pretty-printed JSON)
  • not contain non-printable characters
  • be netcat/telnet friendly

Functional specification

Software that supports Line Delimited JSON

PostgreSQL

As of version 9.2, PostgreSQL has a function called row_to_json.[2] In addition, PostgreSQL supports JSON as a field type, so the function may output nested components in much the same way as MongoDB and other NoSQL databases.

 vine@ubuntu:~$ echo 'SELECT row_to_json(article) FROM article;' | sudo -u postgres psql --tuples-only
  {"article_id":1,"article_name":"ding","article_desc":"bellsound","date_added":null}
  {"article_id":2,"article_name":"dong","article_desc":"bellcountersound","date_added":null}
 vine@ubuntu:~$

Apache

Apache logs can be formatted as JSON lines by setting the LogFormat directive. For example, here is how to write logs for consumption by Logstash and Kibana: "Getting Apache to output JSON (for logstash 1.2.x)".

NGINX

NGINX logs can likewise be formatted as JSON lines by setting the log_format directive, such as in this example: "Logging to Logstash JSON Format in Nginx".

jline

An example of a set of command-line tools for manipulating JSON lines in much the same way that grep, sort and other Unix tools manipulate CSV.

jq

A sed-like tool for JSON, implemented in C and compiled to a standalone binary.

pigshell

A shell-in-a-browser whose pipelines are made up of objects.

Sending

Each JSON object must be written to the stream followed by a carriage return and a newline character (0x0D 0x0A). The JSON objects may themselves contain newlines, carriage returns and any other permitted whitespace. See http://www.json.org/ for the full JSON specification.

All serialized data must use the UTF-8 encoding.
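
The sending rules above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names encode_ldjson and send_all are hypothetical, and the stream is assumed to be any binary file-like object with write() and flush() (for example, the result of socket.makefile("wb")).

```python
import json


def encode_ldjson(obj):
    """Serialize one object as a single Line Delimited JSON frame (bytes)."""
    # ensure_ascii=False keeps non-ASCII characters as literal text,
    # which is then encoded as UTF-8, the encoding the text requires.
    text = json.dumps(obj, ensure_ascii=False)
    # Every object is followed by carriage return + newline (0x0D 0x0A).
    return text.encode("utf-8") + b"\r\n"


def send_all(stream, objects):
    """Write each object to a binary stream, one frame per object."""
    for obj in objects:
        stream.write(encode_ldjson(obj))
    stream.flush()
```

For instance, send_all(io.BytesIO(), [{"some": "thing"}]) writes the bytes of the first line of the example output, followed by \r\n.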

Receiving

The receiver should handle pretty-printed (multi-line) JSON.

The receiver must accept all common line endings: 0x0A (Unix), 0x0D (classic Mac OS) and 0x0D 0x0A (Windows).

Trivial implementation

A simple implementation is to accumulate received lines. Every time a line ending is encountered, an attempt must be made to parse the accumulated lines into a JSON object.

If the parsing of the accumulated lines is successful, the accumulated lines must be discarded and the parsed object given to the application code.

If the amount of unparsed, accumulated characters exceeds 16 MiB, the receiver may close the stream. Resource-constrained devices may close the stream at a lower threshold, though they must accept at least 1 KiB.
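
The accumulate-and-retry approach described above might look like the following Python sketch. The class name LDJSONReceiver, the exception BufferLimitExceeded and the feed() method are assumptions made for illustration; none of them comes from the specification. The limit is counted in bytes rather than characters, and the byte-by-byte loop favours clarity over speed.

```python
import json

MAX_BUFFER = 16 * 1024 * 1024   # 16 MiB threshold from the text (bytes here)
LINE_ENDINGS = (0x0A, 0x0D)     # LF and CR; CRLF is simply both in a row


class BufferLimitExceeded(Exception):
    """Too much unparsed data accumulated; the caller may close the stream."""


class LDJSONReceiver:
    def __init__(self, max_buffer=MAX_BUFFER):
        self.max_buffer = max_buffer
        self._buf = bytearray()  # accumulated bytes not yet parsed

    def feed(self, data):
        """Consume a chunk of received bytes; return the objects it completed."""
        objects = []
        for byte in data:
            self._buf.append(byte)
            if byte in LINE_ENDINGS:
                self._try_parse(objects)
            if len(self._buf) > self.max_buffer:
                raise BufferLimitExceeded("unparsed data exceeds limit")
        return objects

    def _try_parse(self, objects):
        # A line-ending byte never falls inside a UTF-8 multi-byte sequence,
        # so valid input always decodes cleanly at this point.
        text = self._buf.decode("utf-8")
        if not text.strip():
            self._buf.clear()    # blank line, e.g. the LF half of a CRLF
            return
        try:
            obj = json.loads(text)
        except json.JSONDecodeError:
            return               # incomplete (pretty-printed) object: keep going
        self._buf.clear()        # success: discard the accumulated lines
        objects.append(obj)
```

A caller would typically pass each chunk returned by recv() to feed() and hand the returned objects to application code, closing the connection if BufferLimitExceeded is raised.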

Implementations

MIME type and file extensions

When sent over HTTP or email, the MIME type for Line Delimited JSON should be application/x-ldjson.

When saved in a file, the file extension should be .ldjson or .ldj.

Many parsers handle Line Delimited JSON,[3] and the content type suggested for "streaming JSON" is application/json; boundary=NL.

See also

Notes and references

  1. "JSON.parse() on a large array of objects is using way more memory than it should". Retrieved 31 May 2015.
  2. "row_to_json". Retrieved 6 October 2014.
  3. "Newline Delimited JSON". trephine.org. Retrieved 2 July 2013.

External links


This article "Line Delimited JSON" is from Wikipedia. The list of its authors can be seen in its history. Articles copied from the Draft namespace on Wikipedia can be found in Wikipedia's Draft namespace rather than the main one.