
Head-mounted devices for virtual and augmented reality come in different shapes and sizes, from the minimal Google Glass to the fully immersive HTC Vive. At their core, head-mounted displays (HMDs) consist of two primary structural elements: optics and image displays.
Optics
Before looking at (pun intended) the fundamentals of optics, it is important to understand the basic properties of the human eye.
Basic properties of the human eye
Field of View (FOV)
Field of view is the total angular size of the image visible to both eyes. On average, the horizontal binocular FOV is about 200 deg, of which 120 deg is binocular overlap. The binocular overlap is especially important for stereopsis and other depth cues. The vertical FOV is approximately 130 deg.
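The figures above also imply how much each eye sees on its own. The following is a minimal sketch, assuming both eyes cover the same horizontal angle; the function name `monocular_fov` is made up for illustration.

```python
def monocular_fov(total_deg: float, overlap_deg: float) -> float:
    """Horizontal FOV of one eye, assuming both eyes are symmetric.

    total = left + right - overlap, and left == right,
    so each eye covers (total + overlap) / 2.
    """
    return (total_deg + overlap_deg) / 2.0


# 200 deg total with 120 deg of overlap, as quoted in the text.
print(monocular_fov(200.0, 120.0))  # 160.0
```

Under this assumption each eye covers roughly 160 deg horizontally, and only the central 120 deg is seen by both eyes.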

Inter-pupillary distance (IPD)
As the name suggests, this is the distance between the pupils of the eyes, and it is an extremely important consideration for binocular viewing systems. This distance varies from person to person and with gender and ethnicity. An incorrectly set IPD can result in poor eye-lens alignment, image distortion, eye strain and headaches. Mean IPD for adults is around 63mm, with the majority in the 50–75mm range. The minimum IPD for children is around 40mm.
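To make the role of IPD concrete, here is a small sketch of how a binocular system might place its two optical centers (or virtual cameras) from a measured IPD. The function names and the default 50–75mm adjustment range are assumptions for illustration, taken from the adult range quoted above rather than from any specific headset.

```python
from typing import Tuple


def eye_offsets_mm(ipd_mm: float) -> Tuple[float, float]:
    """Horizontal offsets (left, right) of the eyes from the nose line."""
    half = ipd_mm / 2.0
    return (-half, half)


def ipd_supported(ipd_mm: float, min_mm: float = 50.0, max_mm: float = 75.0) -> bool:
    """True if a hypothetical headset with this adjustment range fits the user."""
    return min_mm <= ipd_mm <= max_mm


print(eye_offsets_mm(63.0))  # (-31.5, 31.5)
print(ipd_supported(40.0))   # False: below the assumed adult range
```

A device built only for the adult range would not align with a 40mm child IPD, which is one reason adjustable or software-configurable IPD matters.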

Eye relief
This is the distance from the cornea of the eye to the surface of the first optical element. It defines the distance at which the user can obtain the full viewing angle. This is an important consideration, especially for people who wear corrective lenses or spectacles. Eye relief for spectacles is approximately 12mm. Enabling users to adjust the eye relief is extremely important for head-mounted displays.
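A rough way to see why eye relief affects the viewing angle is to treat the first lens as a simple circular window in front of the eye. This is a deliberately simplified geometric sketch, not an optical design formula; the 40mm lens diameter is an assumed value.

```python
import math


def apparent_fov_deg(lens_diameter_mm: float, eye_relief_mm: float) -> float:
    """Angle subtended by a circular lens as seen from the eye position.

    Simplification: the lens is treated as a plain aperture, ignoring refraction.
    """
    half_angle = math.atan((lens_diameter_mm / 2.0) / eye_relief_mm)
    return math.degrees(2.0 * half_angle)


# Same (assumed) 40 mm lens at spectacle-like relief and further away.
for relief_mm in (12.0, 20.0):
    print(relief_mm, round(apparent_fov_deg(40.0, relief_mm), 1))
```

In this toy model, moving the eye back from 12mm to 20mm shrinks the visible angle from about 118 deg to about 90 deg, which illustrates why adjustable eye relief matters for spectacle wearers.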
Exit pupil
This is the diameter of the beam of light transmitted to the eye by an optical system.
Eye box
This is the volume within which users can place their pupils to experience the visuals wholly.
Ocularity
Ocularity is a measure of the number of eyes needed to see something. Since most of us have at most two eyes, ocularity in smart displays is likewise limited to two.
Monocular display
This display provides a single viewing channel, often through a small display element and lens. The channel is positioned in front of one eye, leaving the user free to view the real world completely through the other eye. Monocular displays are often used as information displays due to their small form factor. However, these types of displays provide no stereo depth cues and often have very low contrast. Google Glass is a monocular display.
Biocular
This type provides a single viewing channel to both eyes by means of internal reflections. Biocular displays lack stereopsis and are suitable for close-proximity tasks.
Binocular
Each eye gets a separate view in these types of displays, creating a stereoscopic view. These displays provide the most depth cues and the greatest sense of immersion; however, they are also the heaviest, most complex and most computationally intensive displays.

Optical architectures
Optics in smart glasses serve three main purposes:
- Collimation of light such that the image appears at a greater distance than its physical distance.
- Magnification of the display image to make it appear larger than its actual size.
- Relaying of light patterns to the viewer's eyes.
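Collimation and magnification can both be illustrated with the thin-lens equation, 1/f = 1/d_o + 1/d_i. The sketch below assumes an idealized thin lens and made-up values (a 40mm focal length with the display 38mm away); real HMD optics are more complex.

```python
import math
from typing import Tuple


def magnifier_image(focal_mm: float, display_mm: float) -> Tuple[float, float]:
    """Return (virtual image distance in mm, magnification) for a simple magnifier.

    The display must sit at or inside the focal length of the lens.
    """
    if display_mm > focal_mm:
        raise ValueError("display must sit at or inside the focal length")
    if display_mm == focal_mm:
        # Display exactly at the focal plane: fully collimated, image at infinity.
        return math.inf, math.inf
    # Thin-lens equation; a negative image distance means a virtual image.
    image_mm = 1.0 / (1.0 / focal_mm - 1.0 / display_mm)
    magnification = -image_mm / display_mm
    return abs(image_mm), magnification


distance_mm, magnification = magnifier_image(40.0, 38.0)
print(round(distance_mm), round(magnification))
```

With these assumed numbers, a panel 38mm from the lens appears roughly 760mm away and about 20 times larger, which is the effect the first two bullets describe.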
Distortion
There are two primary optical design systems, or architectures for AR and VR displays: pupil forming and non-pupil forming.

Non-pupil forming architecture
These consist of a single lens and are commonly seen in popular immersive displays such as the HTC Vive, Oculus Rift and Sony PSVR. This type of architecture uses a single magnifier to directly collimate light from the display panel.
Pupil forming architectures
Non-pupil forming architectures result in lighter and more compact designs with a large eye box; however, they create significant distortion when bending the light field. This effect is known as pincushion distortion. In pupil forming architectures, another lens that produces barrel distortion is used to nullify the effect. These are often used in non-immersive displays such as Microsoft's Hololens and Google Glass.
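The cancellation of pincushion distortion by barrel distortion can be sketched with a one-term radial model, r' = r(1 + k r^2), where k > 0 is pincushion and k < 0 is barrel. This model and the coefficient 0.2 are assumptions for illustration, not values from any particular headset.

```python
def radial_distort(r: float, k: float) -> float:
    """Map an undistorted radius r to r * (1 + k * r^2).

    k > 0 pushes points outward (pincushion); k < 0 pulls them inward (barrel).
    """
    return r * (1.0 + k * r * r)


K_LENS = 0.2  # hypothetical pincushion coefficient of the magnifier

r = 0.5  # normalized distance of a point from the optical axis
lens_only = radial_distort(r, K_LENS)
# Apply an equal and opposite barrel distortion first, then the pincushion lens.
corrected = radial_distort(radial_distort(r, -K_LENS), K_LENS)

print(lens_only, corrected)
```

The pre-distorted point lands much closer to its intended radius than the uncorrected one. The cancellation is only approximate in this simple model, since an exact inverse would need higher-order terms.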

Waveguides
A waveguide, as the name suggests, is a physical structure in optics that guides a light wave to the user's eye. It does so by means of internal reflection, controlling the movement of light between entry and exit. There are four types of waveguides used in industry:
Holographic waveguide
This is a fairly simple type of waveguide in which optical elements such as lenses are used for in-coupling (entry) and out-coupling (exit), with light travelling between them through a series of internal reflections. This type of waveguide is used in Sony's Smart Eyeglass displays.

Diffractive waveguide
Precise surface relief gratings are used to achieve internal reflections for a seamless overlay of 3D graphics through the display. These waveguides are used in a number of Vuzix displays and Microsoft's Hololens.

Polarized waveguide
Light enters the waveguide and undergoes a series of internal reflections on a partially reflective polarized surface. Selected light waves cancel out (polarization), while the remaining light exits into the viewer's eye. This method is used by the Lumus DK-50 AR glasses.

Reflective waveguide
This is similar to the holographic waveguide, in that a single planar light guide is used with one or more semi-reflective mirrors. This type of waveguide can be seen in Epson's Moverio as well as Google Glass.
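All four waveguide types rely on light staying trapped inside the guide by internal reflection. A minimal sketch of the condition for total internal reflection follows, using Snell's law; the refractive index of 1.5 is an assumed, glass-like value.

```python
import math


def critical_angle_deg(n_guide: float, n_outside: float = 1.0) -> float:
    """Smallest angle of incidence (from the surface normal) that is totally reflected.

    Derived from Snell's law: sin(theta_c) = n_outside / n_guide.
    """
    if n_guide <= n_outside:
        raise ValueError("total internal reflection needs n_guide > n_outside")
    return math.degrees(math.asin(n_outside / n_guide))


# Assumed glass-like guide (n = 1.5) surrounded by air (n = 1.0).
print(round(critical_angle_deg(1.5), 2))  # 41.81
```

Light striking the guide wall at more than about 41.8 deg from the normal stays inside until an out-coupling element redirects it towards the eye.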

Display technologies
Display types
Fully immersive
These are standard fully immersive virtual reality displays. These stereoscopic displays are combined with sensors to track position and orientation. They completely block the user's view of the outside world, as in the book "Ready Player One".
Optical see through
With optical see through glasses, the user views reality directly through optical elements such as holographic waveguides and other systems that enable graphical overlay on the real world. Microsoft's Hololens, Magic Leap One and Google Glass are recent examples of optical see through smart glasses.
Video see through
With these types of smart glasses, the user views reality that is first captured by one or two cameras mounted on the display. These camera views are then combined with computer generated imagery for the user to see. The HTC Vive VR headset has an inbuilt camera which is often used for creating AR experiences on the device.
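The step of combining camera views with computer generated imagery can be sketched as per-pixel alpha blending. This is one common compositing approach, shown here under the assumption of 8-bit RGB tuples; it is not a description of any specific headset's pipeline.

```python
from typing import Tuple

Pixel = Tuple[int, int, int]


def composite(camera_px: Pixel, cg_px: Pixel, alpha: float) -> Pixel:
    """Blend one computer generated pixel over one camera pixel.

    alpha = 0.0 shows only the camera view, alpha = 1.0 only the CG imagery.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return tuple(
        round(alpha * cg + (1.0 - alpha) * cam)
        for cam, cg in zip(camera_px, cg_px)
    )


# A grey camera pixel with a half-transparent overlay pixel on top.
print(composite((100, 100, 100), (200, 0, 50), 0.5))  # (150, 50, 75)
```

Running this over every pixel of each camera frame gives the mixed image that is finally shown to the user.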

Imaging technologies
Imaging and display technologies have improved greatly in the past few decades. High end CRTs have been mostly replaced by four key imaging technologies:
Liquid Crystal Displays (LCD)
LCDs are common in high definition televisions and have been used in AR/VR displays since the 1980s. This display type consists of an array of cells containing liquid crystal molecules sandwiched between two polarizing sheets. This assembly rests between thin glass substrates printed with millions of transistors. For colored LCDs, an additional substrate containing red, green and blue filters is positioned over each cell of the substrate. A single filtered liquid crystal cell is called a subpixel. Three subpixels (red, green and blue) form one pixel.
An electric current is passed through the glass substrates. Varying the current allows the LCD to modulate the passage of light to create a precise color. If all subpixels are fully open, the result is white light.
Liquid crystal cells do not emit their own light and require backlighting. They can only vary the passage of light to create the desired color and, subsequently, an image.
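The way three subpixels gate a shared backlight to produce a color can be sketched as follows. This is a toy model that assumes an ideal white backlight and a linear response; the name `lcd_pixel` and the 0 to 1 "openness" scale are made up for illustration.

```python
from typing import Tuple


def lcd_pixel(r_open: float, g_open: float, b_open: float,
              backlight: int = 255) -> Tuple[int, int, int]:
    """Color of one pixel given how far each subpixel is open (0.0 to 1.0).

    Each subpixel only attenuates the backlight behind its color filter.
    """
    levels = (r_open, g_open, b_open)
    if any(not 0.0 <= level <= 1.0 for level in levels):
        raise ValueError("subpixel openness must be in [0, 1]")
    return tuple(round(backlight * level) for level in levels)


print(lcd_pixel(1.0, 1.0, 1.0))  # (255, 255, 255): all open gives white
print(lcd_pixel(1.0, 0.0, 0.0))  # (255, 0, 0): only the red subpixel open
```

Note that with every subpixel closed the model returns black, but a real panel still has its backlight on, which is part of why LCD blacks are less deep than those of emissive displays.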

Organic Light Emitting Diode (OLED)
This display technology is based on organic (carbon and hydrogen bonded) materials that emit light when an electric current is applied. It is a solid-state display technology in which energy passed through the organic sheet is released in the form of light, a phenomenon known as electroluminescence. Colors can be controlled by carefully crafting the organic emission layers; however, most manufacturers add red, green and blue films to the OLED stack. There are two types of OLED panels:
- Passive Matrix OLED (PMOLED): Like CRTs, this display type uses a complex electronic grid to sequentially control the individual pixels in each row. It does not contain storage capacitors, which makes update rates slow and requires high power consumption to maintain a pixel's state. These are mainly used for simple character and iconic displays.
- Active Matrix OLED (AMOLED): Unlike PMOLEDs, AMOLEDs include a thin-film transistor layer that contains a storage capacitor to maintain each subpixel's state, providing greater control over individual pixels. In AMOLEDs, individual pixels can be completely switched off, enabling deeper blacks and higher contrast. These are the most suitable display types for near-eye virtual and augmented reality devices.

OLEDs, and AMOLEDs in particular, are far superior to LCDs. Their construction is relatively simple and they can be extremely thin since there is no need for external backlighting. In addition, they consume significantly less power and have faster refresh rates, high contrast, great color reproduction and higher resolutions. Most fully immersive HMDs utilize this technology.
Digital Light Projector (DLP) Microdisplay
Originally developed by Texas Instruments, the DLP chip is also referred to as a Digital Micromirror Device (DMD). The display consists of about 2 million individually controlled micromirrors, each of which can be used to represent a single pixel. Each of these micromirrors measures approximately 5.4 microns. What is interesting about these displays is that the retina of the eye itself serves as the display surface. RGB light is reflected off these micromirrors, which tilt towards and away from the light source. As each micromirror can be reoriented in either direction thousands of times a second, varying the reflected color can produce different shades of light on the retina.
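Because a micromirror is either tilted towards or away from the light, intermediate shades come from how long it spends in the "on" position. The sketch below assumes a binary-weighted pulse-width scheme with 8 bits per color, a commonly described approach; actual DMD timing sequences are device specific and not covered by the source.

```python
from typing import List, Tuple


def mirror_schedule(level: int, bits: int = 8) -> List[Tuple[int, bool]]:
    """Split an intensity level into bit planes, most significant first.

    Each entry is (relative time slot length, mirror tilted 'on' during it).
    """
    if not 0 <= level < (1 << bits):
        raise ValueError("level out of range for the given bit depth")
    return [(1 << b, bool((level >> b) & 1)) for b in reversed(range(bits))]


def duty_cycle(level: int, bits: int = 8) -> float:
    """Fraction of the frame time the mirror reflects light towards the eye."""
    on_time = sum(slot for slot, is_on in mirror_schedule(level, bits) if is_on)
    return on_time / ((1 << bits) - 1)


print(mirror_schedule(128)[:2])   # [(128, True), (64, False)]
print(duty_cycle(255), duty_cycle(0))  # 1.0 0.0
```

The eye integrates these rapid on and off periods, so a mirror that is "on" for half of the frame is perceived as a mid-level shade of whichever color is illuminating it at that moment.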

DLP microdisplays are one of the fastest display technologies in existence. Their speed of color refresh, low latency, low power consumption and extremely high resolution (a 0.3 inch array diagonal enables a 1280 x 720 image) make them ideal candidates for building head-mounted displays.
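As a quick consistency check on these figures, the pixel pitch implied by a 0.3 inch diagonal at 1280 x 720 can be estimated, assuming square pixels that fill the whole array. The helper name `pixel_pitch_um` is made up for illustration.

```python
import math

MICRONS_PER_INCH = 25400.0


def pixel_pitch_um(diagonal_in: float, width_px: int, height_px: int) -> float:
    """Center-to-center pixel spacing in microns, assuming square pixels."""
    diagonal_px = math.hypot(width_px, height_px)
    return diagonal_in * MICRONS_PER_INCH / diagonal_px


print(round(pixel_pitch_um(0.3, 1280, 720), 1))
```

The result is a little over 5 microns, which is in the same range as the roughly 5.4 micron micromirror size mentioned earlier. The small gap may simply reflect rounding of the quoted diagonal.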

Liquid Crystal on Silicon (LCoS) Microdisplay
LCoS displays lie somewhere in between LCD and DLP displays. LCD is a transmissive technology, where the image is generated and transmitted to the user, while DLP is a reflective technology, where individual subpixels are reflected by micromirrors. In LCoS, light from a source is directed onto a reflective surface. As the light reflects, it passes through a series of sub-filters that modulate its intensity and color. Similar to DLP displays, their small size enables considerable flexibility when integrating with small form factor devices. Microsoft's Hololens, Google Glass and even the Magic Leap One use an implementation of LCoS microdisplays.

Given the extreme resolutions of the display technologies in development, it is likely that flat panel based HMDs will become a thing of the past for AR devices.