ansi.moe container specification

1. metadata

every file starts with an unsigned 8 bit integer in big-endian format, describing the length in bytes of the metadata for the file. the metadata which follows is a single messagepack object containing three top level keys: video_tracks, subtitle_tracks and attachments.
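as a rough sketch, the metadata block can be separated from the packet data as shown here. the function name split_metadata and its prefix_size parameter are hypothetical and not part of the format; the prefix width defaults to one byte, following the text above. decoding the messagepack object itself is left to a messagepack library and is not shown.

```python
def split_metadata(data: bytes, prefix_size: int = 1) -> tuple[bytes, bytes]:
    """Return (metadata_bytes, remaining_bytes) for the bytes of a container file.

    prefix_size is a hypothetical knob: the width in bytes of the length
    prefix, one byte as the text above describes it.
    """
    if len(data) < prefix_size:
        raise ValueError("file too short to hold the length prefix")
    # unsigned, big-endian length of the metadata block
    meta_len = int.from_bytes(data[:prefix_size], "big", signed=False)
    end = prefix_size + meta_len
    if len(data) < end:
        raise ValueError("file too short to hold the declared metadata")
    # the metadata is still messagepack-encoded at this point
    return data[prefix_size:end], data[end:]
```

the second element of the returned tuple is where the packets described in section 2 would begin.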

1.1 video tracks

video tracks are described by objects with the following keys.

key            description
name           (optional) name for the track
color_mode     color mode used for encoding the track, either 'True' or 'EightBit'
compression    compression used on this track. may be 'None' or 'Zstd'
width          width of the video track
height         height of the video track
codec_private  (optional) codec specific data, as raw bytes. currently unused
encode_time    time track was encoded, in seconds since the unix epoch
index          the stream index, which is encoded into every packet. does not necessarily correspond to the track's index in video_tracks.
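to make the shape of a video track object concrete, here is a minimal sketch that validates one after messagepack decoding. it assumes the decoded object arrives as a dict with string keys and that width, height, encode_time and index are integers; the VideoTrack class and video_track_from_dict helper are hypothetical names, not part of the format.

```python
from dataclasses import dataclass
from typing import Optional

# allowed values, taken from the key descriptions above
COLOR_MODES = {"True", "EightBit"}
COMPRESSIONS = {"None", "Zstd"}


@dataclass
class VideoTrack:
    color_mode: str
    compression: str
    width: int
    height: int
    encode_time: int  # seconds since the unix epoch
    index: int        # stream index carried by every packet of this track
    name: Optional[str] = None
    codec_private: Optional[bytes] = None


def video_track_from_dict(obj: dict) -> VideoTrack:
    """Build a VideoTrack from a decoded metadata object (assumed to be a dict)."""
    if obj["color_mode"] not in COLOR_MODES:
        raise ValueError(f"unknown color_mode: {obj['color_mode']!r}")
    if obj["compression"] not in COMPRESSIONS:
        raise ValueError(f"unknown compression: {obj['compression']!r}")
    return VideoTrack(
        color_mode=obj["color_mode"],
        compression=obj["compression"],
        width=obj["width"],
        height=obj["height"],
        encode_time=obj["encode_time"],
        index=obj["index"],
        name=obj.get("name"),                    # optional key
        codec_private=obj.get("codec_private"),  # optional key
    )
```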

1.2 subtitle tracks

subtitle tracks are described by objects with the following keys.

key            description
name           (optional) name for the track
encode_time    time track was encoded, in seconds since the unix epoch
format         format of the subtitle track. may be 'SubRip', 'SubStationAlpha', or 'Unknown'.
codec_private  (optional) codec specific data, as raw bytes. if format is 'SubStationAlpha', this will contain the style and info headers.
index          the stream index, which is encoded into every packet. does not necessarily correspond to the track's index in subtitle_tracks.

1.3 attachments

attachments are miscellaneous files and other data attached to the file that are not encoded into the packet format. the field is an array of Attachment enum values.

enum variant  data        comment
Binary        byte array  catch-all attachment type for misc. binary data.
Midi          byte array  a MIDI track, as the bytes of a standard MIDI file.

2. the packet format

the rest of the file is composed of packets, each of which carries the data of e.g. a video frame. packets are written to the container file in order. each packet is encoded as follows:

field                                                   data type
packet length                                           64-bit unsigned int
compression marker                                      8-bit unsigned int
uncompressed size (only present if data is compressed)  64-bit unsigned int
stream index                                            32-bit unsigned int
adler32 checksum                                        32-bit unsigned int
presentation timestamp (ns)                             64-bit unsigned int
duration (ns)                                           64-bit unsigned int
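the field layout above can be read with a sketch along these lines. several details are not stated in this document and are assumptions here: that all fields are big-endian, that a compression marker of 0 means uncompressed and any other value means compressed, that the packet length counts only the payload bytes following the header, and that the adler32 checksum covers the payload as stored. the Packet class, read_packet and checksum_ok are hypothetical names.

```python
import struct
import zlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    stream_index: int
    checksum: int
    pts_ns: int
    duration_ns: int
    uncompressed_size: Optional[int]  # None when the payload is not compressed
    data: bytes                       # payload exactly as stored in the file


def read_packet(buf: bytes, offset: int = 0) -> tuple[Packet, int]:
    """Parse one packet at offset; return it and the offset just past it."""
    # ASSUMPTION: big-endian fields; length counts payload bytes only
    length, marker = struct.unpack_from(">QB", buf, offset)
    offset += 9
    uncompressed_size = None
    if marker != 0:  # ASSUMPTION: 0 = uncompressed, anything else = compressed
        (uncompressed_size,) = struct.unpack_from(">Q", buf, offset)
        offset += 8
    stream_index, checksum, pts_ns, duration_ns = struct.unpack_from(">IIQQ", buf, offset)
    offset += 24
    data = buf[offset:offset + length]
    if len(data) != length:
        raise ValueError("truncated packet")
    packet = Packet(stream_index, checksum, pts_ns, duration_ns, uncompressed_size, data)
    return packet, offset + length


def checksum_ok(packet: Packet) -> bool:
    # ASSUMPTION: the checksum is computed over the stored payload bytes
    return zlib.adler32(packet.data) == packet.checksum
```

calling read_packet repeatedly with the returned offset would walk the packets in file order; decompressing 'Zstd' payloads needs a zstd library and is not shown.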

2.1 video packets

the duration of the packet is undefined; the presentation timestamp defines when the frame should be displayed to the user relative to the start of the video.

video frames are encoded as utf-8 strings containing a series of half-block unicode characters. the upper half of each character is colored using a foreground-setting ANSI escape code and the lower half using a background-setting one, so that one character displays two 'pixels'.
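the half-block scheme can be illustrated with a short sketch. the exact character and escape sequences are not given in this document, so the following are assumptions: the character is U+2580 (upper half block), colors use the 24-bit SGR codes 38;2;r;g;b (foreground) and 48;2;r;g;b (background) as one possible reading of the 'True' color mode, rows are separated by newlines, and each row ends with a reset. encode_pixel_pair and encode_frame are hypothetical names.

```python
UPPER_HALF_BLOCK = "\u2580"  # ASSUMPTION: the half-block character used
RESET = "\x1b[0m"            # ASSUMPTION: attributes are reset at the end of each row


def encode_pixel_pair(top, bottom):
    """Encode two vertically stacked (r, g, b) pixels as one colored character."""
    tr, tg, tb = top
    br, bg, bb = bottom
    # foreground colors the upper half, background colors the lower half
    return f"\x1b[38;2;{tr};{tg};{tb}m\x1b[48;2;{br};{bg};{bb}m{UPPER_HALF_BLOCK}"


def encode_frame(pixels):
    """pixels is a list of rows of (r, g, b) tuples; the row count must be even."""
    if len(pixels) % 2 != 0:
        raise ValueError("frame height must be even to pair rows")
    lines = []
    for y in range(0, len(pixels), 2):
        # pair each pixel in row y with the pixel directly below it
        cells = [encode_pixel_pair(t, b) for t, b in zip(pixels[y], pixels[y + 1])]
        lines.append("".join(cells) + RESET)
    return "\n".join(lines)
```

under these assumptions a frame of height h pixels occupies h / 2 terminal rows, and the returned string would be utf-8 encoded before being stored as packet data.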

2.2 subtitle packets

2.2.1 SRT packets

if the packet belongs to a SubRip subtitle track, the data will be the utf-8 encoded text of the subtitle. the start of the subtitle is used to set the presentation timestamp, and the end of the subtitle is used to set the packet duration.
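as an illustration of the timing rule, the sketch below turns a SubRip timing line into a presentation timestamp and duration in nanoseconds. it assumes the duration is the end time minus the start time, and that timing lines follow the usual 'HH:MM:SS,mmm --> HH:MM:SS,mmm' form; srt_timing_to_packet_times is a hypothetical helper name.

```python
import re

# matches e.g. "00:00:01,500 --> 00:00:04,000"
_TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def _to_ns(hours, minutes, seconds, millis):
    total_ms = (int(hours) * 3600 + int(minutes) * 60 + int(seconds)) * 1000 + int(millis)
    return total_ms * 1_000_000  # milliseconds to nanoseconds


def srt_timing_to_packet_times(line: str) -> tuple[int, int]:
    """Return (presentation_timestamp_ns, duration_ns) for an SRT timing line."""
    match = _TIMING.match(line.strip())
    if match is None:
        raise ValueError(f"not an SRT timing line: {line!r}")
    groups = match.groups()
    start_ns = _to_ns(*groups[:4])
    end_ns = _to_ns(*groups[4:])
    if end_ns < start_ns:
        raise ValueError("subtitle ends before it starts")
    # ASSUMPTION: duration = end - start
    return start_ns, end_ns - start_ns
```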

2.2.2 SSA/ASS packets

if the packet belongs to a SubStationAlpha subtitle track, the data will be a utf-8 encoded SSA entry, stored as in the Matroska container: in the format 'ReadOrder, Layer, Style, Name, MarginL, MarginR, MarginV, Effect, Text'.

the start of the subtitle is used to set the presentation timestamp, and the end of the subtitle is used to set the packet duration.
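a packet in this form could be split into its fields roughly as follows. since the Text field may itself contain commas, the sketch splits on the first eight commas only. it assumes ReadOrder, Layer and the three margins are decimal integers; SsaEntry and parse_ssa_entry are hypothetical names.

```python
from typing import NamedTuple


class SsaEntry(NamedTuple):
    read_order: int
    layer: int
    style: str
    name: str
    margin_l: int
    margin_r: int
    margin_v: int
    effect: str
    text: str


def parse_ssa_entry(data: bytes) -> SsaEntry:
    """Parse 'ReadOrder, Layer, Style, Name, MarginL, MarginR, MarginV, Effect, Text'."""
    # split at most 8 times so commas inside Text are preserved
    fields = data.decode("utf-8").split(",", 8)
    if len(fields) != 9:
        raise ValueError("expected 9 comma-separated fields")
    read_order, layer, style, name, margin_l, margin_r, margin_v, effect, text = fields
    return SsaEntry(
        int(read_order), int(layer), style, name,
        int(margin_l), int(margin_r), int(margin_v), effect, text,
    )
```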