Compiled UIX v4
Compiled UIX (often shortened to "UIB" for "UI Binary") is one of two formats used by Iris 4 for defining user interfaces. UIB is generated by passing one or more UIX XML files to the UIX compiler, which pre-processes the UI definition into custom bytecode alongside a set of data tables. Although UIB bytecode instructions are much higher level than machine instructions, the interpreter is conceptually similar to a CPU emulator: each instruction is relatively simple, and instructions are processed in a fetch, decode, execute loop.
Note that the format used by Iris 4 is significantly different from Iris 3's UIB, which can be identified by the file magic "UIX2008".
Nearly all of the markup-related code is in the Microsoft.Iris.Markup namespace, including compiled UIX. Most of the functions responsible for loading compiled UIX files are in Microsoft.Iris.Markup.CompiledMarkupLoader, with some key functionality being delegated to other classes. Most notably, the bytecode interpreter is implemented in Microsoft.Iris.Markup.Interpreter.
Data format
Being a binary data format, UIB is capable of storing several primitive data types.
All integers are stored in little endian, as is standard for Windows.
Offsets
All offsets are stored as unsigned 32-bit integers (UInt32), relative to the start of the file unless otherwise specified.
Offset ranges are typically specified with an inclusive start offset and exclusive end offset: \([\mathrm{Start}, \mathrm{End})\).
Strings
Strings are stored length-prefixed with a UInt16 preamble, followed by encoded characters. If the preamble is 0xFFFF, the string is null. Otherwise, the most significant bit indicates whether the characters are UTF-8 encoded, and the remaining 15 bits are the number of characters (not bytes) in the string.
Length value (binary) | Meaning |
---|---|
1111 1111 1111 1111 | Null string |
1xxx xxxx xxxx xxxx | UTF-8 character encoding |
0xxx xxxx xxxx xxxx | UTF-16 character encoding |
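
The preamble rules above can be sketched in Python as follows. This is a minimal illustration rather than a port of the Iris reader: the function name `read_uib_string` is hypothetical, UTF-16 data is assumed to be little-endian with the count measured in 16-bit code units, and the UTF-8 count is assumed to be measured in code points.

```python
import struct

def read_uib_string(buf: bytes, pos: int):
    """Decode one length-prefixed string starting at buf[pos].

    Returns (value, next_pos); value is None for a null string.
    """
    (preamble,) = struct.unpack_from("<H", buf, pos)
    pos += 2
    if preamble == 0xFFFF:
        return None, pos
    count = preamble & 0x7FFF  # low 15 bits: number of characters
    if preamble & 0x8000:
        # UTF-8 (assumption: the count is in code points). A code point takes
        # at most 4 bytes, so decode a generous window and keep `count` chars.
        text = buf[pos:pos + 4 * count].decode("utf-8", errors="ignore")[:count]
        return text, pos + len(text.encode("utf-8"))
    # UTF-16 (assumption: little-endian, count is in 16-bit code units).
    end = pos + 2 * count
    return buf[pos:end].decode("utf-16-le"), end
```

Returning the position of the next unread byte alongside the value lets a caller read consecutive fields without tracking string lengths separately.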
String references
Most strings are not stored inline, but instead as Int32 indexes into the Strings section of the Binary Data Table. This allows common strings to be deduplicated, which can reduce file size.
String arrays
String arrays are lists of string references. This means that they are effectively Int32 arrays, where each item is an index into the Strings list in the Binary Data Table.
Integer arrays
Arrays of 32-bit integers are also stored prefixed with their length, where, similarly to strings, a 'negative length' is interpreted as a null array. Otherwise, each integer is stored one after the other. Both signed and unsigned integers (UInt32 and Int32) can be stored in this format, but the Iris library always reads the values as unsigned, requiring callers to cast the value to Int32 for signed integers.
Booleans
Boolean values are stored as a single byte, where 0 represents false and 1 represents true.
Enums
Enums are usually stored as 32-bit integers. Naturally, the meaning of a particular integer value depends on which enum it belongs to.
MarkupType
Name | Value |
---|---|
None | 0x00000000 |
UI | 0x01000000 |
Class | 0x02000000 |
Effect | 0x03000000 |
DataType | 0x04000000 |
DataQuery | 0x05000000 |
File structure
A custom binary format is used to store all compiler output. This format is divided into several sections, and can even be split across multiple files using shared Data Tables.
Header
The first four bytes are always 0x5549421A, which spells out "UIB␚" in ASCII. The next four bytes contain some representation of the UIB version, although the exact format is unknown. All known Iris 4 releases, including 4.0 and 4.8 Beta, use 1012 (0x3F4).
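
A header check along these lines can be sketched in Python. The function name `read_uib_header` is hypothetical, and because the exact format of the version field is unknown, reading it as a single little-endian UInt32 is an assumption that happens to yield 1012 for the known releases.

```python
import struct

# "UIB" followed by the ASCII SUB control character (0x1A).
UIB_MAGIC = bytes([0x55, 0x49, 0x42, 0x1A])

def read_uib_header(buf: bytes) -> int:
    """Validate the file magic and return the raw version field."""
    if len(buf) < 8:
        raise ValueError("too short to contain a UIB header")
    if buf[:4] != UIB_MAGIC:
        raise ValueError("file magic does not match Iris 4 UIB")
    # Assumption: the version is one little-endian UInt32.
    (version,) = struct.unpack_from("<I", buf, 4)
    return version
```

An Iris 3 file, which starts with "UIX2008", fails the magic comparison and is rejected before the version is read.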
Table of Contents
The Table of Contents begins at offset 0x0008, with two offsets specifying the start and end of the Object Section. Locations 0x0010 and 0x0014 contain the start and end of the Line Number Table, respectively.
The last item stored in the Table of Contents is a reference to the Binary Data Table. This is composed of two fields, of which only one can be set at a time. The value at offset 0x0018 is the start of a string. If the string is not null, then it is used as the resource URI to load a shared Data Table from. If it is null, then the UInt32 at location 0x001A, immediately after the two-byte null preamble, is the offset to the Data Table embedded within the current file.
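
The layout described above can be read with a few fixed-offset unpacks. The following Python sketch uses hypothetical names (`TableOfContents`, `read_table_of_contents`) and does not decode the shared Data Table URI; it only reports where that string starts.

```python
import struct
from typing import NamedTuple, Optional, Tuple

class TableOfContents(NamedTuple):
    object_section: Tuple[int, int]        # [start, end)
    line_number_table: Tuple[int, int]     # [start, end)
    shared_table_uri_offset: Optional[int] # where the URI string starts, if present
    data_table_offset: Optional[int]       # embedded Data Table offset otherwise

def read_table_of_contents(buf: bytes) -> TableOfContents:
    # Four consecutive UInt32 offsets at 0x08, 0x0C, 0x10 and 0x14.
    obj_start, obj_end, line_start, line_end = struct.unpack_from("<4I", buf, 0x08)
    # The string preamble at 0x18 decides which Data Table field is in use.
    (preamble,) = struct.unpack_from("<H", buf, 0x18)
    if preamble != 0xFFFF:
        return TableOfContents((obj_start, obj_end), (line_start, line_end), 0x18, None)
    # Null string: the embedded Data Table offset follows the 2-byte preamble.
    (data_table,) = struct.unpack_from("<I", buf, 0x1A)
    return TableOfContents((obj_start, obj_end), (line_start, line_end), None, data_table)
```

Exactly one of the last two fields is populated, mirroring the rule that only one of the two Data Table references can be set.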
Dependencies
The Dependencies section is a list of UIX files to include, encoded as an unsigned 16-bit count followed by a series of entries. Each include is composed of a one-byte flag that stores whether the referenced file is UIX XML, followed by a string reference to the dependency's compiler name. This name is almost always the URI the file was loaded from.
As an example, a Dependencies section with two includes might be stored as shown below. Note that all offsets are relative to the first byte of the dependency count.
Start offset | Value | Meaning |
---|---|---|
0x00 | 0x0200 | The list contains 2 includes |
0x02 | 0x00 | dependencies[0] is compiled UIB |
0x03 | 0x05000000 | The URI of the 1st dependency is the 6th string in the Data Table |
0x07 | 0x01 | dependencies[1] is UIX XML |
0x08 | 0x02000000 | The URI of the 2nd dependency is the 3rd string in the Data Table |
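
A reader for this section might look like the Python sketch below. It follows the prose description and assumes a little-endian UInt16 count, a one-byte boolean flag, and an Int32 string reference per entry; the function name `read_dependencies` is hypothetical.

```python
import struct

def read_dependencies(buf: bytes, pos: int):
    """Read the Dependencies list starting at buf[pos].

    Returns (entries, next_pos), where each entry is
    (is_uix_xml, name_string_index).
    """
    # Assumption: the count is a little-endian UInt16.
    (count,) = struct.unpack_from("<H", buf, pos)
    pos += 2
    entries = []
    for _ in range(count):
        is_uix_xml = buf[pos] != 0                           # one-byte boolean
        (name_index,) = struct.unpack_from("<i", buf, pos + 1)  # Int32 string reference
        entries.append((is_uix_xml, name_index))
        pos += 5
    return entries, pos
```

The name indexes still have to be resolved against the Strings table of the Binary Data Table before the dependency URIs are available.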
Type Export Declarations
The Type Export Declarations are composed of two tables: the Export Table and Alias Table.
The Export Table is a length-prefixed (UInt16) list of exports, where each export is a type defined with a reference to the local name and the markup type.
The Alias Table allows a UIB file to export imported types under a different name. Each entry is exactly 10 (0x0A) bytes long and is composed of the desired alias, the dependency to load it from, and the name of the target type. Both the alias and target type name are stored as string references. The dependency is always referred to using an index, either into the Type Import Table of the Shared Binary Data Table if one is specified, or the file's dependencies.
Offset into entry | Meaning |
---|---|
0x00 | String reference to the desired Alias |
0x04 | UInt16 index into the Dependencies list |
0x06 | String reference to the target type name |
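
Since every Alias Table entry has the same 10-byte layout, it maps directly onto a fixed struct format. The Python sketch below uses hypothetical names (`AliasEntry`, `read_alias_entry`) and decodes a single entry only, because how the table's length is stored is not described here.

```python
import struct
from typing import NamedTuple

class AliasEntry(NamedTuple):
    alias_index: int        # string reference to the alias
    dependency_index: int   # UInt16 index of the dependency to load from
    target_name_index: int  # string reference to the target type name

# "<" disables padding, so the size is 4 + 2 + 4 = 10 bytes.
ALIAS_ENTRY = struct.Struct("<iHi")

def read_alias_entry(buf: bytes, pos: int = 0) -> AliasEntry:
    return AliasEntry(*ALIAS_ENTRY.unpack_from(buf, pos))
```

Using an explicit little-endian format matters here: with native alignment the UInt16 would be padded and the entry would no longer be 10 bytes long.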
Binary Data Table
The Binary Data Table consists of several subtables, each containing a different type of constant data. These subtables are stored in the following order:
1. Strings
1. ???
Strings table
The Strings table is effectively a list of strings, though unlike the primitive string[], it is actually stored as char[][].
The first four bytes of the Strings table contain the length of the list as a 32-bit integer. Although this value cannot be negative, UIX.dll ultimately uses this as an Int32 to allocate memory, so theoretically a maximum of 0x7FFFFFFF or 2,147,483,647 strings can be stored in a single UIB file.
Following the string count is a series of offsets relative to the first entry in the offset table (the byte immediately after the string count bytes). This is used similarly to a jump table, where the first chunk of the table is an array of fixed-size offsets into the second chunk. When a UIB file is being read from fixed memory, this allows Iris to jump directly to the requested string using its index without having to read the entire table or every string before it.
As an example, a string table with three entries might be stored as shown below. Note that all offsets are relative to the first entry in the jump table.
Start offset | Value | Meaning |
---|---|---|
-0x04 | 0x03000000 | The table contains 3 strings |
0x00 | 0x0C000000 | strings[0] is located at offset 0x0C |
0x04 | 0x1E000000 | strings[1] is located at offset 0x1E |
0x08 | 0x25000000 | strings[2] is located at offset 0x25 |
0x0C | 0x0800 | strings[0] is 8 UTF-16 characters |
0x0E | "Γεια σας" | strings[0] character data |
0x1E | 0x0580 | strings[1] is 5 UTF-8 characters |
0x20 | "Howdy" | strings[1] character data |
0x25 | 0x0880 | strings[2] is 8 UTF-8 characters |
0x27 | "MOREtext" | strings[2] character data |
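
The jump-table lookup can be sketched in Python as follows. The names `lookup_string` and `_decode_string` are hypothetical, and the sketch assumes that each entry uses the UInt16 preamble from the Strings data format, that jump-table offsets are UInt32 values, that UTF-16 data is little-endian, and that UTF-8 counts are measured in code points.

```python
import struct

def _decode_string(buf: bytes, pos: int):
    """Decode one length-prefixed string (UInt16 preamble) at buf[pos]."""
    (preamble,) = struct.unpack_from("<H", buf, pos)
    if preamble == 0xFFFF:
        return None
    count, start = preamble & 0x7FFF, pos + 2
    if preamble & 0x8000:
        # Assumption: the UTF-8 count is in code points (at most 4 bytes each).
        return buf[start:start + 4 * count].decode("utf-8", errors="ignore")[:count]
    return buf[start:start + 2 * count].decode("utf-16-le")

def lookup_string(buf: bytes, table_start: int, index: int):
    """Fetch strings[index] without reading any other string."""
    (count,) = struct.unpack_from("<i", buf, table_start)
    if not 0 <= index < count:
        raise IndexError(index)
    jump_table = table_start + 4  # first byte after the string count
    (offset,) = struct.unpack_from("<I", buf, jump_table + 4 * index)
    # Offsets are relative to the first jump-table entry.
    return _decode_string(buf, jump_table + offset)
```

Only the count, one jump-table entry, and the requested string are touched, which is the property that makes random access cheap when the file is read from fixed memory.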
Constants Table
Type Import Table
Source Markup Import Tables
Line Number Table
Object Section
Export Table
Load passes
[Work in progress]
Compiled UIX is loaded in three main passes, listed in order of execution below. "Depersist" usually refers to reading and processing encoded information, such as type exports.
- Declare types
  - Depersist Table of Contents
  - Depersist Binary Data Table
  - Depersist Dependencies
  - Depersist Type Export Declarations
- Populate public model
  - Depersist Type Import Table
  - Depersist Type Export Definition
- Full
  - Depersist Data Mappings Table
  - Depersist Constants Table
  - Depersist Line Number Table
  - Depersist Object Section
TODO
Iris has two separate type systems: the runtime types, which are standard .NET types, and the markup type schemas, which are themselves .NET types but are used to wrap runtime types for use by the UIX compiler and interpreter. Mapping from schema to runtime type and constructing instances from strings is easy enough, because those are both tasks the original UIX tooling has to do. Doing the reverse (finding the schema for a given runtime type, or encoding a runtime object into a string that can be parsed later) is much more difficult.