In Windows 8, we set out to modernize our input platform. We wanted to make sure that developing for it became more straightforward, but also to build a foundation that can grow and support new input modalities as Windows and apps evolve.
To do this, we went back to basics and focused on core principles that guide our design. In this blog post I share with you why these principles matter, and how you can take advantage of the platform to build great apps in Windows 8.
Platform principles
Windows has always been a reflection of the apps built on top of it. It is through the world of apps that many users have experienced Windows, and in Windows 8, that is even more so. In Windows 8, a large majority of the OS functionality is delivered through the app experience. As such, apps need to be able to interact with the user much more predictably and inspire user confidence. Similarly, the development process and the platform need to be consistent and predictable. We blogged about the hardware efforts in Experiencing Windows 8 Touch on Windows 7 Hardware and Touch Hardware and Windows 8, and we discussed some aspects of your feedback. We know that a successful platform has to be easy to develop for, be consistent and inspire confidence, and have broad reach – allowing you to truly unleash your creativity. We started with these principles:
Broad reach
You want to be able to easily target as many devices as possible. From an input platform perspective, that means we need to support a broad range of input types (mouse, touch, pen, touchpads) and form factors (slate, all-in-one, desktop, laptops, convertibles). Windows is such a widespread and successful platform in part because it supports so many form factors and devices. In turn, the platform must make it easy to target them.
Consistency and confidence
We want you to have a consistent and confident experience. As an app developer, you shouldn’t have to teach your users new interactions or input paradigms. You should be able to leverage a consistent Windows-wide experience, confident that your users already know how to interact with your app. This increases user confidence in the apps and ecosystem, and it makes your life easier.
Ease of development
No platform will be successful if it is overly complicated, inconsistent, or otherwise difficult to develop for. We made ease of development one of our guiding principles.
Introduction to the input platform
The input platform is built in layers. At the bottom are the Windows Runtime input APIs, which provide the most power and flexibility. Built on top of those are the gesture and pointer events in the HTML and XAML frameworks, which provide common gesture and raw data events for apps. Finally, the app templates and controls provide basic functionality that you can use in a variety of situations.
Figure 1: The input platform. The layers at the top focus on mainline scenarios, and lower layers progressively add flexibility and power.
Most of the time, you can simply use the app templates and our HTML and XAML controls, such as ListView or SemanticZoom. These give you basic functionality, plus bonus goodness like support for the common interaction patterns of the touch language, touch targeting, accessibility, tooling integration, and more.
In particular, one of the most common scenarios for touch is panning and zooming, and the scroll view controls for HTML and XAML in Windows 8 enable that, along with behaviors like inertia, bounce at content boundaries, and snap points that you can control. These behaviors also give you “stick to the finger” performance built on an underlying component called DirectManipulation. For example, the Start screen in Windows 8 was built on this support.
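To give you a feel for how little configuration this takes, here is a minimal, illustrative C# sketch of a XAML ScrollViewer set up for panning and zooming with snap points (the contentPanel element is a hypothetical placeholder, and this is not how the Start screen itself is implemented):
// Illustrative sketch: a ScrollViewer configured for touch panning and zooming
var viewer = new Windows.UI.Xaml.Controls.ScrollViewer();
viewer.HorizontalScrollMode = Windows.UI.Xaml.Controls.ScrollMode.Enabled;
viewer.HorizontalScrollBarVisibility = Windows.UI.Xaml.Controls.ScrollBarVisibility.Auto;
viewer.ZoomMode = Windows.UI.Xaml.Controls.ZoomMode.Enabled;
viewer.MinZoomFactor = 0.5f;
viewer.MaxZoomFactor = 4.0f;
// Mandatory snap points make panning settle on defined offsets after inertia
viewer.HorizontalSnapPointsType = Windows.UI.Xaml.Controls.SnapPointsType.Mandatory;
viewer.Content = contentPanel; // contentPanel is a hypothetical child element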
If you’re building a game or a custom control, doing custom visualizations, creating gestures for 3D manipulation, or doing anything else that requires raw data or builds on top of the Windows 8 gestures, you’ll want to start with the pointer and gesture framework events (HTML and XAML). Gestures include simple interactions like tapping and more complicated ones like zooming, panning, and rotating at the same time. The pointer APIs, which we’ll get to in more detail shortly, provide a streamlined way to get mouse, touch, and pen data in Windows 8. Access to these events makes it easy to use the Windows 8 interaction language in your app.
Finally, the Windows Runtime input APIs are at the bottom layer of the stack. These APIs (GestureRecognizer, PointerPoint, PointerDevice) provide complete flexibility and control, letting you have full access to the raw input data and its associated properties, the full set of gestures and gesture configuration, pointer device APIs, and more. The WinRT API surface for input is a superset of what’s provided at all other layers, adding richness and power. Unleash your creativity!
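As a small taste of that flexibility, here is a hedged sketch that uses the WinRT device APIs to enumerate the pointer devices on a machine (the logging is purely illustrative):
// Illustrative sketch: enumerate the pointer devices attached to the system
var devices = Windows.Devices.Input.PointerDevice.GetPointerDevices();
foreach (var device in devices)
{
    // PointerDeviceType indicates touch, pen, or mouse;
    // MaxContacts reports how many simultaneous contacts the device supports
    System.Diagnostics.Debug.WriteLine("{0}: up to {1} contacts, integrated: {2}",
        device.PointerDeviceType, device.MaxContacts, device.IsIntegrated);
}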
The concepts behind the input platform are shared across all Metro app frameworks, and the API surfaces are very similar. That’s because we wanted to make it easy to transfer knowledge between frameworks. After writing a Metro style HTML app, you will just as easily be able to handle input in a Metro style XAML app or build directly on top of CoreWindow.
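For example, a hedged sketch of handling pointer input directly on CoreWindow might look like this (illustrative only; it assumes you are inside a Metro style XAML app where Window.Current is available):
// Illustrative sketch: raw pointer input straight from CoreWindow
var coreWindow = Windows.UI.Xaml.Window.Current.CoreWindow;
coreWindow.PointerPressed += (sender, args) =>
{
    // args.CurrentPoint is the same Windows.UI.Input.PointerPoint the frameworks use
    var point = args.CurrentPoint;
    System.Diagnostics.Debug.WriteLine("Pointer {0} down at {1},{2}",
        point.PointerId, point.Position.X, point.Position.Y);
};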
In the next sections, we focus on the bottom two layers of the input platform and introduce some of the concepts underlying the platform and why it was important to make a change.
Pointer – unifying input
Mouse is a simple input device, especially when compared to touch. It has a position and some button state. Handling touch, on the other hand, adds complexity that doesn’t exist for mouse. Users can use multiple fingers at the same time, so you need some way to differentiate their streams of input. Mouse has a hover state, but most touch devices on the market today don’t support hover. Touch also has interesting properties for apps, like the contact geometry of a touch: a finger is bigger and less precise than the mouse. As we looked at mouse, touch, and pen, it was clear the existing APIs and approaches were insufficient.
We quickly realized that it made sense to think of input in buckets – there was “pointing input” like mouse, touch, and pen, “text input”, like keyboards and handwriting or speech recognition, and so forth. We considered other interesting input devices like Surface and Kinect, and we started seeing similarities between these devices and modalities. We took the approach of unifying the pointing inputs into a coherent API surface we call pointer. That became a paradigm shift for us, creating a coherent, simpler story for mouse, touch, and pen, and aligning with our principles for the platform.
Take for example a very basic painting app – it wants to handle down, move, and up events from various inputs. If you want it to respond to touch, pen, and mouse, a naïve platform might force you to write 9 separate and redundant event handlers. Here’s what you might start with in this naïve platform:
// BAD! Don’t do this in your code.
class NaivePlatform
{
Dictionary<uint, Polyline> strokes = new Dictionary<uint, Polyline>();
Canvas PaintArea = new Canvas();
void OnMouseDown(MouseEventArgs e) { CommonDownHandler(e.Id, e.Position); }
void OnMouseMove(MouseEventArgs e) { CommonMoveHandler(e.Id, e.Position); }
void OnMouseUp(MouseEventArgs e) { CommonUpHandler(e.Id, e.Position); }
void OnPenDown(PenEventArgs e) { CommonDownHandler(e.Id, e.Position); }
void OnPenMove(PenEventArgs e) { CommonMoveHandler(e.Id, e.Position); }
void OnPenUp(PenEventArgs e) { CommonUpHandler(e.Id, e.Position); }
void OnTouchDown(TouchEventArgs e) { CommonDownHandler(e.Id, e.Position); }
void OnTouchMove(TouchEventArgs e) { CommonMoveHandler(e.Id, e.Position); }
void OnTouchUp(TouchEventArgs e) { CommonUpHandler(e.Id, e.Position); }
void CommonDownHandler(uint pointerId, Point position)
{
// Create a new stroke for this pointer and set its basic properties
var stroke = new Polyline();
stroke.Points.Add(position);
stroke.Stroke = new SolidColorBrush(Colors.Red);
stroke.StrokeThickness = 3;
// Add the stroke to dictionary so we can update it in CommonMoveHandler
strokes[pointerId] = stroke;
// Add the stroke to the canvas so that it is rendered
PaintArea.Children.Add(stroke);
}
}
void CommonMoveHandler(uint pointerId, Point position)
{
try
{
// Update the stroke associated to this pointer with the new point
strokes[pointerId].Points.Add(position);
}
catch (KeyNotFoundException)
{
// this pointer is not painting - ignore it
}
}
void CommonUpHandler(uint pointerId, Point position)
{
// This stroke is completed, so remove it from the dictionary.
// It will still be rendered because we are not removing it from the canvas.
strokes.Remove(pointerId);
}
}
Obviously this kind of code is inelegant and prone to copy-and-paste errors. Note too that this is an extremely simplified example. A real painting app would likely want to handle cancel events and roll back the stroke. And if you want to use pen pressure or touch contact geometry to affect the ink, it becomes much harder to use common event handlers. That might lead you to create some abstraction above the raw input data, and you might find yourself building something like pointer.
In the pointer version of the painting app, you instead have a simple set of down, move, and up handlers:
// GOOD! Do this instead of the previous code.
void OnPointerPressed(object sender, Windows.UI.Xaml.Input.PointerRoutedEventArgs e)
{
// Retrieve current point, in the coordinate system of the painting area
var currentPoint = e.GetCurrentPoint(PaintArea);
// Create new stroke for this pointer and set its basic properties
var stroke = new Windows.UI.Xaml.Shapes.Polyline();
stroke.Points.Add(currentPoint.Position);
stroke.Stroke = new Windows.UI.Xaml.Media.SolidColorBrush(Windows.UI.Colors.Red);
stroke.StrokeThickness = 3;
// Add the stroke to dictionary so we can update it in the PointerMoved event handler
strokes[currentPoint.PointerId] = stroke;
// Add the stroke to the painting area so that it is rendered
PaintArea.Children.Add(stroke);
}
void OnPointerMoved(object sender, Windows.UI.Xaml.Input.PointerRoutedEventArgs e)
{
// Retrieve current point, in the coordinate system of the painting area
var currentPoint = e.GetCurrentPoint(PaintArea);
try
{
// Update the stroke associated to this pointer with the new point
strokes[currentPoint.PointerId].Points.Add(currentPoint.Position);
}
catch (System.Collections.Generic.KeyNotFoundException)
{
// this pointer is not painting - ignore it
}
}
void OnPointerReleased(object sender, Windows.UI.Xaml.Input.PointerRoutedEventArgs e)
{
// Retrieve current point, in the coordinate system of the painting area
var currentPoint = e.GetCurrentPoint(PaintArea);
// This stroke is completed, remove it from the dictionary.
// It will still be rendered because we are not removing it from PaintArea.
strokes.Remove(currentPoint.PointerId);
}
Pointer events representing touch, pen, and mouse input have basic common properties like the position of the input, the type, and an associated id. Each modality may also have unique data that needs to be tied to its pointer events. For example, pen might carry pressure or tilt info, mouse typically carries more button state, and a device like Surface can embed tag or vision data. We added the ability for this unique data to be exposed along with pointer. If you have pointer data from touch input, it contains basic properties like position and a pointer id, plus touch-specific data like the contact rectangle for the touch.
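As a hedged illustration (assuming you are inside one of the XAML pointer event handlers shown above), the common data and the device-specific data sit side by side on the PointerPoint:
// Illustrative sketch: common and device-specific data on a PointerPoint
var point = e.GetCurrentPoint(PaintArea);
uint id = point.PointerId;                           // common to every pointer type
Windows.Foundation.Point position = point.Position;  // common to every pointer type
if (point.PointerDevice.PointerDeviceType == Windows.Devices.Input.PointerDeviceType.Touch)
{
    // Touch adds contact geometry
    Windows.Foundation.Rect contact = point.Properties.ContactRect;
}
else if (point.PointerDevice.PointerDeviceType == Windows.Devices.Input.PointerDeviceType.Pen)
{
    // Pen adds pressure (0.0 - 1.0) and barrel button state
    float pressure = point.Properties.Pressure;
    bool barrelPressed = point.Properties.IsBarrelButtonPressed;
}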
This combination opens up a more consistent set of APIs that lets the majority of pointing input code remain generalized and simple, while still allowing input-differentiated experiences. One key example here is a note-taking app. Pen might ink, touch might pan and zoom, and mouse might have a traditional selection model. Each modality has its uniqueness represented, but the code is ultimately simple and straightforward.
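A minimal sketch of that note-taking pattern might look like this; BeginInkStroke, BeginSelection, and notePage are hypothetical placeholders for app-specific logic:
// Illustrative sketch: route pointer input per device type in a note-taking app
void OnPointerPressed(object sender, Windows.UI.Xaml.Input.PointerRoutedEventArgs e)
{
    var point = e.GetCurrentPoint(notePage);
    switch (point.PointerDevice.PointerDeviceType)
    {
        case Windows.Devices.Input.PointerDeviceType.Pen:
            BeginInkStroke(point);                       // pen inks
            break;
        case Windows.Devices.Input.PointerDeviceType.Mouse:
            BeginSelection(point);                       // mouse selects
            break;
        default:
            gestureRecognizer.ProcessDownEvent(point);   // touch pans and zooms
            break;
    }
}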
Considering the web for a moment, pointer is exposed both for Metro style HTML apps and for the browser (Metro style and desktop). These environments continue to support W3C standards like mouse events of course, but those offer limited functionality compared to pointer. Pointer is cleanly extendable to future input types, and it sidesteps the bloat of adding waves of new objects for each new input (e.g. TouchEvent, TouchList, and Touch). We’re very excited about what pointer has to offer here.
One of the principles driving work across Windows 8 was fast and fluid. If the input stack isn’t fast and fluid, nothing else can be. We worked hard to remove any buffering or delays from touch input processing, and we made large investments in the performance of the touch-related stacks across the board. These efforts yielded fast, lag-free input in Windows 8. On Windows 8 hardware, the end-to-end latency (from a finger making contact with the digitizer to the display changing in response) is between 60 and 70 ms. The input stack takes only 1-2 ms of that time!
Figure 2: A breakdown of our touch performance investment areas in Windows 8
Pointer, and its interaction with the gesture system, was significant in helping make these performance improvements. It naturally aligns with our platform principles: it makes it easier to handle a variety of inputs (yielding broad reach), it reduces the lines of code you write, and it contributes to ease of development.
Gestures – making the touch language part of your app
We’re all familiar with gestures: most users don’t think twice about panning in their web browser or tapping on an app to launch it. For us, gestures are the expression of the Windows 8 interaction language. They take user input and map it to natural operations in apps and the system.
Gesture input, not surprisingly, builds on top of pointer. Most apps are gesture consumers: they handle taps, pans, and zooms, and do little with the raw pointer data flowing by except to pass it to gesture detection.
Thinking about the layers of the input platform again, every level supports a consistent set of gestures and that set mirrors the Windows 8 interaction language. In the majority of cases, you won’t need to teach users any new concepts to use your app.
Figure 3: Windows 8 interaction language. Note how it maps to the supported set of gestures.
In Windows 8, our model is to give all apps pointer input by default and let them choose what set of data to feed to gesture detection, how to configure that gesture detection, and what to do with the output. That flexibility gives you greater power to build exactly the experiences you envision.
Our interaction language focuses on a principle of direct manipulation – content should “stick to your fingers”. Manipulations make this happen. From the platform’s perspective, a gesture is any interaction we recognize and for which we provide a notification. Manipulations are one of those types of gestures, alongside other gestures like press and hold, or tap. Manipulations are combinations of translation, scale, and rotation changes (2D affine transforms for the linear algebra minded). For example in the new Start experience, if you pan, that’s a manipulation under the covers. If you put a second finger down and start zooming, that’s a manipulation too. Not only that, but we can easily express the transition from one finger interaction to two and the transition between panning and zooming (or the combination of the two).
The HTML and XAML frameworks provide gesture events on your behalf, and those will satisfy most cases. If you need more control, say for additional configuration options, use the Windows Runtime GestureRecognizer. To get started, you might configure it like this:
C#
// C#
public GestureManager(Windows.UI.Xaml.Shapes.Rectangle target, Windows.UI.Xaml.UIElement parent)
{
// Configure gesture recognizer
gestureRecognizer = new Windows.UI.Input.GestureRecognizer();
gestureRecognizer.GestureSettings =
Windows.UI.Input.GestureSettings.Hold |
Windows.UI.Input.GestureSettings.ManipulationRotate |
Windows.UI.Input.GestureSettings.ManipulationRotateInertia |
Windows.UI.Input.GestureSettings.ManipulationScale |
Windows.UI.Input.GestureSettings.ManipulationScaleInertia |
Windows.UI.Input.GestureSettings.ManipulationTranslateInertia |
Windows.UI.Input.GestureSettings.ManipulationTranslateX |
Windows.UI.Input.GestureSettings.ManipulationTranslateY |
Windows.UI.Input.GestureSettings.RightTap |
Windows.UI.Input.GestureSettings.Tap;
// Register event handlers for gestures
gestureRecognizer.ManipulationStarted += OnManipulationStarted;
gestureRecognizer.ManipulationUpdated += OnManipulationUpdated;
gestureRecognizer.ManipulationInertiaStarting += OnManipulationInertiaStarting;
gestureRecognizer.ManipulationCompleted += OnManipulationCompleted;
gestureRecognizer.Holding += OnHolding;
gestureRecognizer.RightTapped += OnRightTapped;
gestureRecognizer.Tapped += OnTapped;
}
JavaScript
// JS
function GestureManager(target, parent) {
var gestureRecognizer = new Windows.UI.Input.GestureRecognizer();
// Configure GestureRecognizer
gestureRecognizer.gestureSettings =
Windows.UI.Input.GestureSettings.hold |
Windows.UI.Input.GestureSettings.manipulationRotate |
Windows.UI.Input.GestureSettings.manipulationRotateInertia |
Windows.UI.Input.GestureSettings.manipulationScale |
Windows.UI.Input.GestureSettings.manipulationScaleInertia |
Windows.UI.Input.GestureSettings.manipulationTranslateInertia |
Windows.UI.Input.GestureSettings.manipulationTranslateX |
Windows.UI.Input.GestureSettings.manipulationTranslateY |
Windows.UI.Input.GestureSettings.rightTap |
Windows.UI.Input.GestureSettings.tap;
// Register event handlers for gestures
gestureRecognizer.addEventListener('manipulationstarted', onManipulationStarted);
gestureRecognizer.addEventListener('manipulationupdated', onManipulationUpdated);
gestureRecognizer.addEventListener('manipulationinertiastarting',
onManipulationInertiaStarting);
gestureRecognizer.addEventListener('manipulationcompleted', onManipulationCompleted);
gestureRecognizer.addEventListener('holding', onHolding);
gestureRecognizer.addEventListener('tapped', onTapped);
gestureRecognizer.addEventListener('righttapped', onRightTapped);
}
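The manipulation handlers registered above then apply the reported deltas to the target. Here is a minimal C# sketch, assuming the target’s RenderTransform is a CompositeTransform and ignoring the manipulation’s center point for brevity:
// C# - illustrative sketch of a manipulation handler
void OnManipulationUpdated(Windows.UI.Input.GestureRecognizer sender,
    Windows.UI.Input.ManipulationUpdatedEventArgs args)
{
    // Delta carries the change since the last update: translation, scale, and rotation
    var transform = (Windows.UI.Xaml.Media.CompositeTransform)target.RenderTransform;
    transform.TranslateX += args.Delta.Translation.X;
    transform.TranslateY += args.Delta.Translation.Y;
    transform.ScaleX *= args.Delta.Scale;
    transform.ScaleY *= args.Delta.Scale;
    transform.Rotation += args.Delta.Rotation;   // degrees
}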
At their core, the interactions most apps are interested in are gestures. For most apps, the flow is to declare which gestures your app looks for, take the pointer data your app sees, run it through gesture detection, and handle the resulting gestures.
In this code snippet, see how all pointer data is fed to gesture detection without needing to check input type or do any type-specific processing:
C#
// C#
void OnPointerPressed(object sender, Windows.UI.Xaml.Input.PointerRoutedEventArgs e)
{
var currentPoint = e.GetCurrentPoint(parent);
// Make target capture the pointer associated to this event
target.CapturePointer(e.Pointer);
// Route the event to the gesture recognizer
gestureRecognizer.ProcessDownEvent(currentPoint);
}
void OnPointerMoved(object sender, Windows.UI.Xaml.Input.PointerRoutedEventArgs e)
{
// Route the event to the gesture recognizer
// We pass all intermediate points that might have been coalesced
// in a single PointerMove event.
gestureRecognizer.ProcessMoveEvents(e.GetIntermediatePoints(parent));
}
void OnPointerReleased(object sender, Windows.UI.Xaml.Input.PointerRoutedEventArgs e)
{
var currentPoint = e.GetCurrentPoint(parent);
// Route the event to the gesture recognizer
gestureRecognizer.ProcessUpEvent(currentPoint);
// Release pointer capture on the pointer associated to this event
target.ReleasePointerCapture(e.Pointer);
}
void OnPointerWheelChanged(object sender, Windows.UI.Xaml.Input.PointerRoutedEventArgs e)
{
var currentPoint = e.GetCurrentPoint(parent);
bool shift = (e.KeyModifiers & Windows.System.VirtualKeyModifiers.Shift) ==
Windows.System.VirtualKeyModifiers.Shift;
bool ctrl = (e.KeyModifiers & Windows.System.VirtualKeyModifiers.Control) ==
Windows.System.VirtualKeyModifiers.Control;
// Route the event to the gesture recognizer
gestureRecognizer.ProcessMouseWheelEvent(currentPoint, shift, ctrl);
}
JavaScript
// JS
function onPointerDown(evt) {
// Make target capture the pointer associated to this event
target.msSetPointerCapture(evt.pointerId);
// Route the event to the gesture recognizer
gestureRecognizer.processDownEvent(evt.getCurrentPoint(parent));
}
function onPointerMove(evt) {
// Route the event to the gesture recognizer
// We pass all intermediate points that might have been coalesced
// in a single PointerMove event.
gestureRecognizer.processMoveEvents(evt.getIntermediatePoints(parent));
}
function onPointerUp(evt) {
// Route the event to the gesture recognizer
gestureRecognizer.processUpEvent(evt.getCurrentPoint(parent));
}
function onWheel(evt) {
// Route the event to the gesture recognizer
gestureRecognizer.processMouseWheelEvent(evt.getCurrentPoint(parent), evt.shiftKey,
evt.ctrlKey);
}
For a document viewing app, like Word, you’re primarily interested in pans and zooms. Gesture recognition can be configured for just those two components of manipulations (ignoring rotation). As pointer data on the main view comes in, the app hands it off to gesture recognition, and gesture recognition returns events indicating that a manipulation has started, is continuing (possibly transitioning to an inertia state), or has ended.
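A hedged sketch of that configuration, pans and zooms with inertia and no rotation (the handler names here are hypothetical):
// Illustrative sketch: gesture recognition configured for a pan-and-zoom-only view
var recognizer = new Windows.UI.Input.GestureRecognizer();
recognizer.GestureSettings =
    Windows.UI.Input.GestureSettings.ManipulationTranslateX |
    Windows.UI.Input.GestureSettings.ManipulationTranslateY |
    Windows.UI.Input.GestureSettings.ManipulationTranslateInertia |
    Windows.UI.Input.GestureSettings.ManipulationScale |
    Windows.UI.Input.GestureSettings.ManipulationScaleInertia;
recognizer.ManipulationStarted += OnViewManipulationStarted;
recognizer.ManipulationUpdated += OnViewManipulationUpdated;
recognizer.ManipulationInertiaStarting += OnViewManipulationInertiaStarting;
recognizer.ManipulationCompleted += OnViewManipulationCompleted;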
Coding for broad reach across modalities and form factors
The concept of pointer led us to think about input programming differently. Pointer contributes to ease of development by requiring fewer lines of code, it gets your app into the Store faster, and it enables broader reach by making it easy to target multiple input devices. It also encourages you to take a more simplified view of handling input. One manifestation of that is what we affectionately call “Code for touch, let the system do the rest”.
Code for touch, or CFT, is a result of our rethinking input and became one of our primary guiding philosophies. Perhaps more accurately phrased as “code for pointer, get touch and default mouse and pen behaviors for free”, CFT lets you write simple code for handling the unified pointer events, and not have to worry about writing redundant handlers for all three major pointing inputs. If you have input-specific behaviors or want to go beyond what the OS provides by default, you can do that too.
Each of the pointer event handlers we showed earlier demonstrates CFT in the real world. All of the gesture sample’s pointer handlers are agnostic to the type of input. They typically take pointer data, set capture, do hit testing or other state management, and then hand the pointer data off to gesture recognition.
C#
// C#
void OnPointerPressed(object sender, Windows.UI.Xaml.Input.PointerRoutedEventArgs e)
{
var currentPoint = e.GetCurrentPoint(parent);
// Make target capture the pointer associated to this event
target.CapturePointer(e.Pointer);
// Route the event to the gesture recognizer
gestureRecognizer.ProcessDownEvent(currentPoint);
}
JavaScript
// JS
function onPointerDown(evt) {
// Make target capture the pointer associated to this event
target.msSetPointerCapture(evt.pointerId);
// Route the event to the gesture recognizer
gestureRecognizer.processDownEvent(evt.getCurrentPoint(parent));
}
CFT isn’t just about reach and ease of development. It also ties directly to consistency and confidence. For example, if you configure gesture detection for tap, that enables taps from all three input modalities. Configuring for RightTap maps the “right-click-like” gestures from mouse (right click), touch (press and hold), and pen (press and hold, or tapping with the barrel button pressed) to a unified event.
Here’s an excerpt from the gesture sample showing the Tap and RightTap gesture handlers:
C#
// C#
void OnTapped(Windows.UI.Input.GestureRecognizer sender, Windows.UI.Input.TappedEventArgs args)
{
byte[] rgb = new byte[3];
// Randomly change the color of target
randomGenerator.NextBytes(rgb);
target.Fill = new Windows.UI.Xaml.Media.SolidColorBrush(
Windows.UI.ColorHelper.FromArgb(255, rgb[0], rgb[1], rgb[2]));
}
void OnRightTapped(Windows.UI.Input.GestureRecognizer sender, Windows.UI.Input.RightTappedEventArgs args)
{
// Restore original values
target.Fill = initialBrush;
InitManipulationTransforms();
}
JavaScript
// JS
function onTapped(evt) {
target.style.backgroundColor = getNextColorFromColor(target.style.backgroundColor);
}
function onRightTapped(evt) {
// Reset target to its initial state (transform, color)
target.style.backgroundColor = target.initialColor;
target.style.msTransform = target.initialTransform;
}
There is no inspection of pointer type in any of these event handlers. They fire for the appropriate inputs from the various pointer types without additional work on your end.
Wrapping up
When you are exploring the platform, we want you to easily discover what you need. The input platform was designed so that the easiest path is also the one that leverages a consistent Windows experience in apps.
The input platform was designed with a principled approach in mind, with the goals of ease of development, consistency and confidence, and enabling extensive reach. Concepts like pointer and CFT save you time and lines of code. Gesture detection makes it easy to use the Windows 8 interaction language in your app and provide consistency and confidence for users. Similar concepts across platforms and frameworks provide consistency for developers. CFT and its expressions in the pointer and gesture APIs help provide effortless reach across devices.
Source: MSDN