Dr. Dobb's Digest
The Art and Business of Software Development
April 2009

Editor's Note (page 2)
by Jonathan Erickson

Techno-News

Fuzzy Logic Reveals Cells' Inner Workings (page 3)
A new computational model reveals novel information about the inner workings of cells.

Features

A Moving Target (page 4)
by Avo Reid
A CTO looks at the development issues in choosing a mobile platform.

A First Look at Larrabee New Instructions (page 6)
by Michael Abrash
LRBni is a very different — and fascinating — extension to the x86 instruction set.

Smartphone Operating Systems: A Developer's Perspective (page 17)
by Tom Thompson
For developers, the battle lines are forming in the smartphone wars.

Three Reasons for Moving to Multicore (page 27)
by Christopher Diggins
Performance is at the head of the class.

Bug Labs BUG Update (page 29)
by Mike Riley
With the 1.4 release, the Bug Labs folks are still in the modular system programming game in a serious way.

Columns

Conversations (page 32)
by Jonathan Erickson
Dr. Dobb's talks with the Symbian Foundation's Lee Williams.

Book Review (page 35)
by Mike Riley
Examining Hello Android: Introducing Google's Mobile Development Platform.

Effective Concurrency (page 36)
by Herb Sutter
What are thread pools for, and how can you use them effectively?

Entire contents Copyright © 2009, Techweb/United Business Media LLC, except where otherwise noted. No portion of this publication may be reproduced, stored, or transmitted in any form, including computer retrieval, without written permission from the publisher. All Rights Reserved. Articles express the opinion of the author and are not necessarily the opinion of the publisher. Published by Techweb, United Business Media Limited, 600 Harrison St., San Francisco, CA 94107 USA. 415-947-6000.


Editor’s Note

Open Source Meets Mobility

By Jonathan Erickson, Editor-in-Chief

t’s no surprise that smartphones are going like gangbusters, or that open source keeps climbing the corporate ladder. What’s surprising, however, is how they’re doing so hand in hand. Consider a recent study by Black Duck Software, which sells application development management software. Upon reviewing 185,000 projects, Black Duck identified 2,304 as open-source targeting mobile platforms, comprising 6,588 releases. Of course, 2,304 isn’t overwhelming compared with the total number of open-source projects. What’s noteworthy, however, is that the amount of mobile source code released grew at 55 percent a year between 2005 and 2008. And the activity goes well beyond mobile Linux. In 2008, open-source projects for the iPhone led the way with 266 releases, even though the iPhone is a closed platform. Android came in second with 191 releases, Windows Mobile with 174, BlackBerry with 96, and Symbian with 64. The real question, though, isn’t “How much?” but rather “So what?” Will the combination of mobility and open source change what companies and their developers can do with mobile devices? The fact is that open source is one force driving cracks into the closed world of network carriers, which are used to controlling everything from the devices we use to the software we run. Today’s consumers — and, increasingly, business users as well — demand devices and carriers that provide more options for third-party apps and personalization. Conventional telecom vendors and device makers are reacting to that, and are beginning to treat smartphones on 3G networks as a mobile computing platform, not a voice service with some data services. This translates to more opportunities for developers, who can create and sell mobile apps via online app stores from Apple, Google, Microsoft, Nokia, and others. Count on that to spur continued growth in opensource mobile development, and for companies that embrace mobile computing to reap the benefits of those innovations

Return to Table of Contents



Techno-News


Fuzzy Logic Reveals Cells' Inner Workings

A new computational model reveals novel information about the inner workings of cells

Living cells are bombarded with messages from the outside world — hormones and other chemicals tell them to grow, migrate, die, or do nothing. Inside the cell, complex signaling networks interpret these cues and make life-and-death decisions. Unraveling these networks is critical to understanding human diseases, especially cancer, and to predicting how cells will react to potential treatments.

Using a "fuzzy logic" approach, a team of MIT biological engineers has created a new model that reveals different and novel information about these inner cell workings than traditional computational models do. According to Doug Lauffenburger, head of MIT's Department of Biological Engineering, this is the first time that scientists have used fuzzy logic to model cell biochemistry, and the approach should be applicable to any kind of cell signaling pathway.

Developed by Lotfi Zadeh in the 1960s, fuzzy logic can take inexact inputs and produce accurate predictions, based on sets of rules rather than mathematical equations. It has been applied in auto-focusing cameras, automobile cruise control, and home appliances. Fuzzy logic mimics the way humans make everyday decisions — for example, deciding when to eat lunch. The decision depends on what time it is, what is in the refrigerator, how hungry you are, and so on. All of this information is integrated to come up with a decision, with no math required.

The new MIT model works the same way. Each component of the cell-signaling network (which could be a receptor, enzyme, or transcription factor) has its own set of rules that determine how it responds to a particular stimulus. Adding up all of these stimuli and responses leads to an outcome, such as death, cell division, or migration.

In contrast, traditional computational models use physics-based equations to calculate precise values for each interaction. Creating such models requires more specific biochemical knowledge, and they do not offer the same insights as the fuzzy logic models. While both types of model accurately predict outcomes of a pathway, fuzzy logic models also generate a graphical representation of each step along the way, allowing scientists to visualize what is happening inside the cell. With fuzzy logic models, "you can actually see the drawing and say, 'Aha, I see what this enzyme is doing,'" said Lauffenburger.

The researchers' model allowed them to discover some previously unknown interactions in a pathway regulating programmed cell death. The pathway, called MK2 (described in their paper "Fuzzy Logic Analysis of Kinase Pathway Crosstalk in TNF/EGF/Insulin-Induced Signaling"), is generally believed to promote cell death and produces cell-to-cell communication factors involved in inflammation-based tissue destruction. However, the model showed that inhibiting MK2 can actually favor cell death, because it indicated that the pathway may also control another signal that is pro-survival.

This finding demonstrates that molecular components in the cellular network governing survival-versus-death decisions can promote diverse outcomes, so simple intuition cannot readily predict the effects of possible drug treatments. Without the fuzzy logic model, "you wouldn't have found that connection and would not be able to properly understand what an anti-MK2 drug might do," said Lauffenburger. This general modeling approach should be useful in identifying potential new targets for drugs against cancer, inflammatory diseases, and infectious diseases, he said.

EDITORIAL
Editor-in-Chief: Jonathan Erickson
Managing Editor: Deirdre Blake
Copy Editor: Amy Stephens
Contributing Editors: Mike Riley, Herb Sutter
Webmaster: Sean Coady

AUDIENCE DEVELOPMENT
Senior Group Director: Scott Popowitz
Circulation Director: Karen McAleer
Manager: John Skesinski

SALES
National Sales Director: Eric Christopher
Client Services Manager: Gordon Peery
Services Marketing Coordinator: Laura Robison

DR. DOBB'S
600 Harrison Street, 6th Floor, San Francisco, CA, 94107. 415-947-6000. www.ddj.com

UBM LIMITED
Kevin Prinz, Chief Information Officer
Pat Nohilly, Senior Vice President, Strategic Development and Business Administration
Anne Marie Miller, Corporate Senior Vice President, Sales
Marie Myers, Senior Vice President, Manufacturing
Alexandra Raine, Senior Vice President, Communications

TECHWEB
Tony L. Uphoff, Chief Executive Officer
Bob Evans, Senior Vice President and Content Director
Eric Faurot, Senior Vice President, Live Events Network
Joseph Braue, Senior Vice President, Light Reading Communications Network
John Siefert, Vice President and Publisher, InformationWeek Business Technology Network
Scott Vaughan, Vice President, Marketing Services
Greg Kerwin, Vice President, Global Development
John Ecke, Vice President, Financial Technology Network
Jill Thiry, Publishing Director
John Dennehy, Vice President, Finance
Fritz Nelson, Executive Producer, TechWeb TV
Scott Popowitz, Senior Group Director, Audience Development
Beth Rivera, Vice President, Human Resources

Return to Table of Contents



A Moving Target

A CTO looks at the development issues in choosing a mobile platform

by Avo Reid

As mobility moves from the cool phase to being a strategic part of business infrastructure, one of the key questions for business technology leaders becomes what platform to choose.

With more than 70 percent of organizations recently surveyed by IDC deploying multiple mobile applications, operating system features and software development tools become deciding factors. One-fourth of companies are deploying CRM to handhelds, a recent InformationWeek Analytics survey finds, showing it's a growing business priority. What started as a tool for managing personal information is now seen by more companies as a key computing platform in their overall IT strategy. These organizations believe in the promise of mobile computing to streamline business processes, enable round-the-clock access to enterprise applications, and even increase customer satisfaction.

Business technology leaders embracing mobility face a lot of difficult decisions as they drive mobile platform choices. For a company that uses Lotus Notes, for example, it may make sense to standardize on the BlackBerry, because of its Lotus support. However, if that shuts down POP support in the process, employees won't be able to retrieve e-mail on their iPhones, because Notes doesn't support that platform. More complicated are the questions around whether the application development tools are sufficient for companies to develop the internal applications we want, and which tools fit our needs.

Many of the benefits companies expect from mobile devices are derived from a blend of the platform and operating system they use as well as the hardware running it. Is there one best mobile device platform? The answer is no, as each platform has its strengths and weaknesses.

There are two major steps in charting a mobile development strategy. First, consider the ways your company plans to use the platform. Second, assess which platforms best fit those needs. Gartner offers a useful grouping into the three usage patterns seen most often:

• The "appliance profile" is the simplest and lowest cost, and it fits a company that limits device and application choices, does little application development, and focuses on moderate security capabilities such as passwords and remote device wipe.
• The "platform profile" allows development of feature-rich applications and the broad use of third-party apps with robust security features, but limits the device choice.
• The "concierge profile" allows for custom application development and permits broad device choices. It's the most costly — Gartner estimates it's twice the total cost of the appliance approach.

For companies developing in-house mobile applications, the platform choice is the most challenging. Each major platform has evolved its development tools into broad application development toolkits focused on single-platform development, not multiplatform work. Since each vendor's development tools require a different set of skills, deploying the same team that developed a RIM BlackBerry application to work on iPhone applications will require some retooling.

Many companies' first venture into mobile technology is providing BlackBerry e-mail access. Next, they often look to applications for sales and other field employees, CRM, and business intelligence, such as performance dashboards. In short, mobile applications are integrating with corporate systems and other programs on a much more complex level.

One challenge involves the goal of limiting the amount of coding our developers have to do. The solution is to architect mobile applications to take advantage of cloud-based computing and Web services for the heavy lifting, writing code only to provide the user interface and feedback. This approach limits the amount of device-specific code that needs to be ported to other devices. Companies also can consider portable technologies such as Adobe's Flash Lite, which makes it possible for the same application to run on BlackBerry, Windows Mobile, or any other mobile OS that has the Flash Lite environment running on it.

Platform Trade-Offs

After settling these strategic development questions, business technology leaders must decide which platform best meets their needs. Here's a perspective on several leading choices. We don't cover Palm in depth, since it has slipped to 2 percent of the smartphone market, says Gartner, while Symbian has half the world market, and the BlackBerry and iPhone make up about 70 percent of North America.

• Nokia's Symbian OS-based S60 platform has something for everyone — C, C++, Java, Python, WRT widgets, and Flash — but the APIs require some getting used to. Symbian C++ and Open C/C++ (a C programming interface with runtime POSIX libraries) programs are packaged as metadata files that must be digitally signed for security checks or the application won't execute. IT can therefore use security certificates to monitor and control in-house mobile applications.

• iPhone uses Objective-C — challenging even for experienced C, C++, and C# programmers. Developers coming from other languages face an even steeper learning curve. The Cocoa Touch programming interface and proprietary Xcode integrated development environment (IDE) provide a powerful environment that includes a WYSIWYG interface builder. For Web-based apps, the SDK includes the HTML/JavaScript-based Dashcode framework. Everything in the iPhone runs at root level — and every process executing with root privileges can be a security threat. Additionally, the iPhone permits only one third-party app to run at a time. iPhone apps also must be digitally signed before they can execute.

• Android applications are written in Java, but not Java ME. Instead, the Android SDK is a combination of standard Java SE and Java ME methods and classes, as well as nonstandard ones. This means that there's a learning curve, even for seasoned Java developers. The Android Development Tools plug-in lets developers use Eclipse to write and debug applications. Again, Android apps must be signed or they won't run. The SDK does provide a developer key, but a private key is required for public distribution.

• BlackBerry applications can be developed several ways: a Java-based IDE that provides access to RIM APIs and an Eclipse plug-in; a rapid application development approach that focuses on Web services using Visual Studio or Eclipse plug-ins and supports any .NET or Java language choice; or a Web-based app approach referred to as Browser Development, which lets developers create apps using existing BlackBerry browser software. The downside to writing apps using BlackBerry API extensions is that it ties the application to a particular device. Still, that's no different than using Android's unique Java classes.

• Windows Mobile uses the .NET Compact Framework, which makes development relatively straightforward for developers familiar with .NET languages such as C#, Visual Basic .NET, and (for native code) Visual C++. Because the .NET Compact Framework is a subset of the .NET Framework, components from .NET-based desktop clients, application servers, and Web servers are available. The upside is that companies that have standardized on Microsoft platforms and developer tools can jump into mobile development. The downside is that the apps run on a single platform — the Windows Mobile OS.

—Avo Reid is CTO at the Bureau of National Affairs and Mobility Blogger on Dr. Dobb's CodeTalk.

Return to Table of Contents



A First Look at the Larrabee New Instructions (LRBni)

LRBni is a very different — and fascinating — extension to the x86 instruction set

by Michael Abrash


One more grain of sand dropped on top of a pile of sand will usually do nothing more than make the pile a tiny bit larger. Occasionally, though, it will set off an avalanche that radically reshapes the landscape. Observations such as this form the basis of complexity theory, which holds that small events can have unpredictable, and sometimes disproportionately large, effects — the relevance of which will become apparent momentarily.

Nearly five years ago, Mike Sartain and I had just put the wraps on our x86 software renderer, Pixomatic. We had done everything we could think of to speed it up, and while it had certainly gotten a lot faster, it was still so much slower than hardware that we knew we could never close the gap. As we were setting up in the RAD Game Tools booth at Game Developers Conference one morning, I said to Mike: "Man, if only Intel had a lerp [linear interpolation] instruction!" Mike pointed across the aisle at the Intel booth. "Maybe you should ask for one."

The odds seemed long, to say the least, but I didn't have any better ideas, so I went over and talked with Dean Macri, our developer rep. That resulted in a couple of maverick Intel architects, Doug Carmean and Eric Sprangle, coming over to chat with us later; and somehow, over the course of five years, that simple question led to a team at RAD — which grew to include Tom Forsyth and Atman Binstock — working with Intel to help design an instruction set extension and write a software graphics pipeline for it.

Which brings us to the present day, when at long last I get to tell you about a fascinating, and very different, extension to the x86 instruction set called Larrabee New Instructions (LRBni) — and if that's not a perfect example of complexity theory in action, I don't know what is. The funny thing is, I never did get that lerp instruction!

Why Larrabee?

To understand what Larrabee is, it helps to understand why Larrabee is. Intel has been making single cores faster for decades by increasing the clock speed, increasing cache size, and using the extra transistors each new process generation provides to boost the work done during each clock. That process certainly hasn't stopped, and will continue to be an essential feature of main system processors for the foreseeable future, but it's getting harder. This is partly because most of the low-hanging fruit has already been picked, and partly because processors are starting to run up against power budgets, and both out-of-order instruction execution and higher clock frequency are power-intensive.

More recently, Intel has also been applying additional transistors in a different way — by adding more cores. This approach has the great advantage that, given software that can parallelize across many such cores, performance can scale nearly linearly as more and more cores get packed onto chips in the future.

Larrabee takes this approach to its logical conclusion, with lots of power-efficient in-order cores clocked at the power/performance sweet spot. Furthermore, these cores are optimized for running not single-threaded scalar code, but rather multiple threads of streaming vector code, with both the threads and the vector units further extending the benefits of parallelization. All this enables Larrabee to get as much work out of each watt and each square millimeter as possible, and to scale well far into the future.

What Is Larrabee? A Quick Overview

Larrabee is an architecture, rather than a product, with three distinct aspects — many cores, many threads, and a new vector instruction set — that boost performance. This architecture will first be used in GPUs, and could be used in CPUs as well.



Figure 1: A conceptual model of the Larrabee architecture. The actual numbers of cores, texture units, memory controllers, and so on will vary a lot. Also, the structure of the bus and the placement of devices on the ring are more complex than shown.

At the highest level, the architecture consists of many in-order cores, each with its own L1 and L2 cache, all sitting on a coherent interconnect bus — which you can think of as a ring, although in fact the topology is considerably more complicated than that — as in Figure 1. The cores are x86 cores enhanced with vector capability, and the memory system is fully coherent. In short, Larrabee is an enhanced x86 architecture; it supports all the familiar general-purpose programming techniques and tools used on CPUs for decades, and is much like programming a lot of Core i7 cores at once. Because initial configurations are designed for use as GPUs, they lack chipset features needed to serve as a main CPU running, say, Windows; nonetheless, they are fully capable of running operating systems and general applications. For example, Larrabee, running as a GPU device under Windows, can bring up a BSD OS, with the Larrabee graphics pipeline running as just another BSD application.

Furthermore, each of those Larrabee cores supports multiple hardware threads per core (currently four, although that may change in the future). This is an important part of getting good performance out of the in-order cores; if one thread misses the cache, the other threads can keep the core busy. Threading also helps work around pipeline latency. In effect, threaded in-order cores shift the burden of extracting parallelization and working around pipeline bubbles from instruction reordering hardware to the programmer and the compiler. Without a doubt, that makes life more challenging for programmers, but the rewards are potentially large, thanks to the out-of-order hardware, and the power it would draw, that can be saved. Besides, if a program can be successfully parallelized across all those Larrabee cores, it shouldn't in principle be any more difficult to parallelize it across the threads as well.

However, while this is true to a considerable extent, in actual practice issues arise because there is only one set of most core resources — most notably caches and TLBs — so the more threads there are performing independent tasks on a core, the more performance can suffer due to cache and TLB pressure. The graphics pipeline code on Larrabee works around this by having all the threads on each core work on the same task, using mostly shared data and code; in general, this is a fertile area for future software architecture research.

So Larrabee has lots of cores, each with multiple threads, allowing software to readily take advantage of thread-level parallelism. That's obviously critical to getting a big performance boost — lots of cores running at high utilization are going to be much faster than even the fastest single core — and multithreaded programming is an essential, fascinating, and challenging aspect of Larrabee. However, it's also a relatively familiar challenge from existing multicore systems, albeit taken to a new level with Larrabee, so I'm going to leave further discussion of multicore/multithreaded Larrabee programming for another day.

What I'm going to delve into for the rest of this article is the third aspect of Larrabee performance — the 16-wide vector unit, and LRBni, the instruction set extension that supports it. Together, these are designed to let software extract maximum performance from data-level parallelism — that is, vector processing.

This is all somewhat abstract, so to make things a little more concrete, let me mention something I know for sure LRBni can do, because I've done it: software rendering with GPU-class efficiency, without any fixed-function hardware other than a texture sampler. It should be clear upon a little reflection that with a 16-wide vector unit, you can run a pixel shader on 16 pixels at a time, with the nth element of each vector instruction operating on the nth pixel of a 16-pixel block; Kayvon Fatahalian's presentation "From Shader Code to a Teraflop: How Shader Cores Work" discusses how this works in some detail. Somewhat less obvious is that it is possible to use LRBni to implement an efficient software rasterizer, using vector instructions to determine a triangle's pixel coverage for 16 pixels at a time. Unfortunately, those topics are far too complex to discuss here. However, LRBni-based implementations of both rasterization and shaders will be discussed in detail in future articles.

Figure 2: The vector data types supported by LRBni.


Larrabee's Vector Architecture

LRBni adds two sorts of registers to the x86 architectural state. There are 32 new 512-bit vector registers, v0-v31, and 8 new 16-bit vector mask registers, k0-k7. While some core resources such as caches are shared by the core threads, that is not the case for registers; each thread has a full complement of vector and vector mask registers.

LRBni vector instructions are either 16-wide or 8-wide, so a vector register can be operated on by a single LRBni instruction as 16 float32s, 16 int32s, 8 float64s, or 8 int64s, as in Figure 2, with all elements operated on in parallel. LRBni vector instructions are also ternary; that is, they involve three vector registers, of which typically two are inputs and the third the output. This eliminates the need for most move instructions; such instructions are not a significant burden on out-of-order cores, which can schedule them in parallel with other work, but they would slow Larrabee's in-order pipeline considerably.

For the purposes of discussion, I divide LRBni into several broad groups:

• vector arithmetic, logical, and shift;
• vector mask generation;
• vector load/store;
• other instructions, including those that help keep the vector pipeline well fed.

I discuss each of these in turn, referring to Table 1, which lists a broad sample of LRBni instructions. (Table 1 is not a complete listing; some instructions are still evolving, and others would require too much explanation.) Vector instructions start with v, and vector mask instructions start with k. The mnemonic suffixes follow the SSE convention of "px," where p means "packed" (that is, a vector of 8 or 16 elements), and x refers to the element type:

• s for float32 (single-precision, henceforth referred to as simply float);
• i for int32;
• u for unsigned int32 (used only in conversions and a few specific instructions);
• q for int64, and d for float64 (double-precision, henceforth referred to as double).


Load and store instructions, which don't use p, use one of the following:

• d for 32-bit quantities (dwords);
• q for 64-bit quantities (qwords).

To keep things simple, for the most part I'm going to talk only about float and int32 operations in this article, but LRBni provides support (albeit somewhat less extensive) for double and int64 operations as well. You can find additional information about LRBni, including instruction descriptions and prototyping libraries, at www.intel.com.
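Putting the convention together, mnemonics compose predictably. The lines below are illustrative compositions only: vaddpi and vloadd appear elsewhere in this article, while vaddps and vaddpd (and the exact operand forms) are assumed from the naming rules rather than taken from a reference.

        vaddps v0, v1, v2    ; packed float32 add: 16 elements in parallel (assumed mnemonic)
        vaddpi v0, v1, v2    ; packed int32 add (used in the checksum listings later)
        vaddpd v0, v1, v2    ; packed float64 add: 8 elements in parallel (assumed mnemonic)
        vloadd v1, [rbx]     ; vector load of 16 dwords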

Vector Arithmetic, Logical, and Shift Instructions

The arithmetic, logical, and shift vector instructions include everything you'd expect: add, subtract, add with carry, subtract with borrow, multiply, round, clamp, max, min, absolute max, logical-or, logical-and, logical-xor, logical-shift and arithmetic-shift by a per-element variable number of bits, and conversions among floats, doubles, and signed and unsigned int32s. There are also multiply-add and multiply-sub instructions, which run at the same speed as other vector instructions, thereby doubling Larrabee's peak flops. Finally, there is hardware support for transcendentals and higher math functions.

The arithmetic vector instructions operate in parallel on 16 floats or int32s, or 8 doubles, although this is not fully orthogonal; most float multiply-add instructions have no int32 equivalent, for example. The logical vector instructions operate on 16 int32s or 8 int64s, and the shift vector instructions operate on 16 int32s only. The non-orthogonality of the vector instructions may seem a bit inconvenient, but it makes for lower-power hardware, which in turn makes it possible to have more cores — and therefore more processing power.

Both the destination and the first source operand for a vector instruction must typically be vector registers (for certain instructions, one of the first two operands must be a vector mask register, as I discuss shortly), but the last source may optionally be a memory operand.

Table 1: Broad sampling of LRBni instructions.

April 2009

www.ddj.com

D r. D o b b ’s D i g e s t

Figure 3: 1-to-16 {1to16} and 4-to-16 {4to16} broadcasts.

This feature comes at no performance cost and saves a great many load instructions, reducing code size and freeing up the in-order pipeline to do other work. This is the reason for the existence of the reverse-subtract instructions, and also for the many variants of multiply-add and multiply-subtract, which allow you to choose which of the three operands is added to or subtracted from, although the destination must always be a vector register.

Multiply-add and multiply-sub have three vector operands like other vector instructions, but are special in that they have three sources, so the first operand must serve as both a source and the destination; hence, unlike the other instructions, most multiply-add and multiply-sub instructions have no non-destructive form. (The exception is vmadd233, a special form of multiply-add designed specifically for interpolation, which gets both offset and scale from a single operand and consequently uses only two source operands.) It's worth noting that multiply-add and multiply-sub instructions are fused; that is, no bits of floating-point precision are lost between the multiply and the add or subtract, so they are more accurate than, and not exactly equivalent to, a multiply instruction followed by a separate add or subtract instruction.

But wait, there's a lot more to vector instructions, which are really more like little clusters of processing functions than traditional scalar or SSE instructions — and all at no extra cost! If there's a memory operand to a vector instruction, that operand may optionally be broadcast from one or four elements in memory up to 16 vector elements (or 8 for doubles or int64s) prior to the instruction's operation, as in Figure 3. This is useful for keeping memory and cache footprint down when applying a scalar or a four-element vector across a vector operation. Alternatively, the source memory operand may be converted from one of several compact types (including float16) to float, or from a smaller integer to int32, as listed in Table 2.

Table 2: (a) Load-op broadcasts and up-conversions for int32 operations. (b) Load-op broadcasts and up-conversions for float operations.


This is not only useful for keeping footprint down but also removes the need for a separate instruction to perform the conversion. However, a single instance of a load-op instruction can either convert or broadcast, but can't do both.

If there is no memory operand, the last vector register operand may be swizzled in one of seven ways, as in Table 3, including one that supports efficient calculation of four cross-products at once. All load-op broadcasts, conversions, and swizzles are free, occurring during the normal course of vector instruction execution.

We're still not quite done, because every vector instruction can also perform predication. Each vector mask register contains 16 bits, neatly matching the 16 elements in a vector register. Every vector instruction can take a vector mask register as the writemask operand, and if any bit in that vector mask register is zero, the corresponding element of the destination register is left unchanged. Once again, there is no cost for this. Vector instructions can also specify no writemask, for the common case in which all 16 elements should be updated. Predication makes it possible to handle the partial vector iteration at the end of vectorized loops. More importantly, it makes it possible to handle conditionals and loops in vector code.
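As a sketch of how a conditional rides on predication, suppose we want to scale only the elements of v0 that fall below a per-lane threshold in v7, leaving the other lanes untouched. The vcmpltps and writemask forms appear later in this article; using the compare without a source mask is assumed to be legal here.

        vcmpltps k1, v0, v7       ; k1 gets a 1-bit for each lane where v0 < v7
        vmulps   v0 {k1}, v0, v6  ; scale only those lanes; 0-bit lanes keep their old values

No branch is required: the writemask simply blocks the update of the unselected elements.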

Table 3: Vector operand swizzles. Notation: dcba denotes the 32-bit elements that form one 128-bit block in the source (with ‘a’ least-significant and ‘d’ most-significant), so aaaa means that the least-significant element of the 128-bit block in the source is replicated to all four elements of the same 128-bit block in the destination; the depicted pattern is then repeated for all four 128-bit blocks in the source and destination. These can be used only if there is no memory operand, and on only one operand per vector instruction. 64-bit elements are handled identically, except that in that case the above table describes 256-bit blocks, and the action is repeated for the two 256-bit blocks in the vector.


Let's take a look at some of these features in action. First, here's a simple floating-point vector multiply:

vmulps v0, v5, v6 ; v0 = v5 * v6

Figure 4 shows how this performs 16 multiplies in parallel. Next, let's make it a multiply-add (Figure 5):

vmadd231ps v0, v5, v6 ; v0 = v5 * v6 + v0

Figure 4: vmulps v0, v5, v6.

Here, the destination is also the third source. In the instruction mnemonic, "231" refers to the placement of the three operands in the multiply-add equation. Thus, "madd231" means "multiply operand_2 with operand_3, add operand_1"; "madd132" would mean "multiply operand_1 with operand_3, add operand_2," which translates to "v0 = v0 * v6 + v5" for the three operands used before.

Now we'll add predication; k1 writemasks the updating of the elements (Figure 6):

vmadd231ps v0 {k1}, v5, v6

Figure 5: vmadd231ps v0, v5, v6.

We can make one source a load-op memory operand using the standard assortment of x86 addressing modes (Figure 7):

vmadd231ps v0 {k1}, v5, [rbx+rcx*4]

We can broadcast from 4 elements in memory to 16 elements to operate on (Figure 8):

vmadd231ps v0 {k1}, v5, [rbx+rcx*4] {4to16}

Or we can upconvert from float16 format (Figure 9):

vmadd231ps v0 {k1}, v5, [rbx+rcx*4] {float16}

Figure 6: vmadd231ps v0 {k1}, v5, v6.

Figure 7: vmadd231ps v0 {k1}, v5, [rbx+rcx*4].

One note here: Memory operands to vector instructions must be aligned to the size of the block of data loaded; for this purpose, it is the size before writemasking is applied that matters. Thus, the example in Figure 7 must be 64-byte aligned, but the example in Figure 8 only has to be 16-byte aligned, and the example in Figure 9 only has to be 32-byte aligned. (The alignment requirement is implementation-dependent, and could change in the future, but it will be true of the initial versions of Larrabee, at least.)

No, it's not like any x86 assembly syntax you've ever seen, but it's actually pretty straightforward, and, as you can see, for once things are spelled out pretty clearly —


D r. D o b b ’s D i g e s t “{float16}” is a lot easier to parse than most assembly-language mnemonics I’ve encountered. All of the above instructions run at the same throughput (although again that’s implementation dependent), and all of the capabilities illustrated above work with any vector instruction.

More About Vector Masks

Now that we've seen how predication works, let's look at how vector masks get set. They are primarily either generated by vector compares or copied from general-purpose registers (general-purpose registers are the familiar x86 scalar registers — rax, ecx, and so on), although they can also come from add-and-generate-carry and subtract-and-generate-borrow instructions, or from a couple of special add-and-set-vector-mask-to-sign instructions designed for rasterization. Vector mask registers can also be operated on by a set of vector mask instructions. I discuss each of the primary ways of modifying vector masks next.

Figure 8: vmadd231ps v0 {k1}, v5, [rbx+rcx*4] {4to16}.

Vector compares have the base mnemonic vcmp, and operate as you'd imagine: the elements of one vector are compared pairwise with the elements of another vector, and the bit in the destination vector mask register that corresponds to each pair is set to the result of the comparison. The standard float, double, and signed and unsigned int32 comparisons are supported. There is also a vector test instruction, vtest, which operates similarly to vector comparison.

One interesting point is that although the vector compare instructions take a mask input, it does not operate as a normal writemask, although the operation is similar enough that the usual writemask notation is used. With normal writemasks, 0-bits block updating of destination elements; for vector compare instructions (and vtest as well), 0-bits in the source mask result in corresponding 0-bits in the destination mask — that is, the comparison result is logical-anded with the source mask. This variant form of masking is desirable because the result will typically be used as a writemask, rather than the normal case where the result is used with a separate writemask that keeps the masked elements inactive. This is illustrated in Figure 10 for the vector-compare-less-than-packed-single instruction:

vcmpltps k3 {k1}, v0, v2

Data may also be copied between two vector mask registers, or between a vector mask register and a general-purpose register, as, for example, with:

kmov k2, eax ; k2 = ax

There are also binary instructions to perform a variety of logical operations on vector mask registers, such as:

Figure 9: vmadd231ps v0 {k1}, v5, [rbx+rcx*4] {float16}.

kand k1, k0 ; k1 = k1 & k0

Figure 10: vcmpltps k3 {k1}, v0, v2. The initial state of the destination vector mask register is ignored; 0-bits in the source mask result in 0-bits in the destination mask. DR. DOBB’S DIGEST


Finally, there is exactly one way to use the vector mask registers to set the general processor flags: with the kortest instruction. In fact, this is the only vector-related instruction of any sort that can affect the flags. Kortest logical-ors two vector mask registers together and sets the zero and carry flags based on the result; if the result is all-zeroes, ZF is set, and if the result is all-ones, CF is set, as in Figure 11.
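The idiom this enables is one scalar branch standing in for sixteen per-lane tests, such as an early-out when no lane still needs work. A minimal sketch, with the label and register choices purely illustrative:

        vcmpltps k1, v0, v7    ; per-lane test: which elements still fall below the limit
        kortest  k1, k1        ; or k1 with itself; ZF is set only if no lane passed
        jz       AllLanesDone  ; one scalar branch decides for all 16 lanes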



Figure 11: kortest k1, k3

Vector Loads, Stores, and Conversions

Table 4: Load conversions supported by vloadd, vexpandd, and gatherd.

Larrabee provides both aligned and unaligned loads and stores. Like all vector instructions, loads can do 1-to-16 or 4-to-16 broadcasts. Unlike other vector instructions, however, they can also do simultaneous type conversions from smaller types to float or int32; in fact, they can do far more type conversions than can load-op instructions, supporting all common DirectX/OpenGL types, as in Table 4. Vector stores can write all 16 elements, the low four elements, or only the low element of a vector. At the same time, stores can also down-convert to the types that loads can up-convert from, with a few graphics-specific exceptions, such as sRGB, that require a separate conversion instruction (Table 5). A writemask can provide predication for vector loads and stores just as it does for other vector instructions. Once again, writemasking, broadcasting, conversion, and selection are free.

Getting Data Into and Out of Vector Format

The instructions covered so far are the heart of Larrabee's data-crunching capabilities, but by themselves they'd require all their input and output to be arranged in structure-of-arrays (SOA) form, which would be unfortunate because most data is in array-of-structures (AOS) form — not least a lot of graphics data, such as vertex arrays. Since Larrabee's initial use will be as the processor for a graphics card, it's obviously essential to be able to get data into and out of SOA format efficiently, and LRBni adds three sorts of instructions for this purpose.

Of these, first and most important are the gather/scatter instructions. The key to gather is that it lets you load each element of the destination vector from any memory address, independent of where the other elements are being loaded from, as in Figure 12, which I'll discuss shortly. If you think of this as performing a separate scalar load for each element, it's obvious why it's so useful for vectorization — it's the vector load instruction for cases where each of the 16 streams has a different data source.

Consider the case of checksumming an int32 array. If it's just one array, you can process it 16 values at a time, using the normal vector load instruction, vload, followed by vaddpi, to sum 16 values at a pop; or you could just do a load-op vadd, as in Listing One. Then, at the end, you can sum together the 16 values you've accumulated, and you're done. (If the array wasn't a multiple of 16 in length, you'd use the writemask to do a partial sum at the end.)


Table 5: Store conversions supported by vstored, vcompressd, and scatterd.


LISTING ONE

; Partial code to calculate an array checksum, summing 16 elements at
; a time; code after the loop to do a final sum of the 16 partial sums
; would also be required.
; On entry:
;   rbx points to the base of the array to sum.
;   rcx is how many elements to sum.
; On exit, v0 contains the 16 partial sums.
        vxorpi  v0, v0, v0
        shr     rcx, 4              ; do 16 at a time
ChecksumLoop:
        vaddpi  v0, v0, [rbx]
        add     rbx, 64
        dec     rcx
        jnz     ChecksumLoop


LISTING TWO

; Partial code to calculate the checksum of a specific field in an
; array of structures, summing 16 elements at a time; code after the
; loop to do a final sum of the 16 partial sums would also be required.
; On entry:
;   v2 contains the offsets of the first 16 checksum fields in the
;   array relative to rbx.
;   rcx is how many elements to sum.
; On exit, v0 contains the 16 partial sums.
        vxorpi   v0, v0, v0
        shr      rcx, 4             ; do 16 at a time
ChecksumLoop:
        vgatherd v1 {k0}, [rbx + v2]
        vaddpi   v0, v0, v1
        ; step to the next 16 values to checksum
        vaddpi   v2, v2, [Mem_Structure_Size_Times_16] {1to16}
        dec      rcx
        jnz      ChecksumLoop

Figure 12: vgatherd v1 {k1}, [rbx + v2*4]. This is a simplified representation of what is currently a hardware-assisted multi-instruction sequence, but will become a single instruction in the future.


If, however, the value you were checksumming was a field in a structure, so that a skip was required between each addition, the vgatherd instruction would allow you to parallelize in either of two different ways. You could gather 16 fields at a time from the array, as in Listing Two. Or, more generally, you could process 16 different streams and do 16 sums at once, one from each of 16 different arrays; you'd gather 16 values, one from each array, and then vaddpi them, as in Listing Three. When ChecksumLoop in Listing Three finishes, you will have accumulated the 16 sums for the 16 arrays. The structure size can even be different for each array. (Note that Listings Two and Three are almost identical; gather is so flexible that the same gather-based code can do many different things, depending on the initial conditions.)

Okay, those last two code listings require a bit of explanation, because the gather/scatter instructions do not follow normal addressing rules. The address for a gather or scatter is formed from the sum of a base register and the elements of a scaled index vector register, as in Figure 12. This is the only case in which a vector register can be used to address memory. More precisely, for each element to be loaded, the address is the sum of the base register and the sign-extension to 64 bits of the corresponding element of the index vector register, optionally scaled by 2, 4, or 8.

Note that the 32-bit size of the elements used for the index results in a 4 GB limit on the range for gather/scatter (or larger if scaling by 2, 4, or 8). What if your gather targets aren't all contained within a 4 GB range? Then you need to wrap another loop around the basic gather loop, in order to step through the 4 GB ranges touched by the gather addresses, which is somewhat more complicated, but not unduly so. All of the above applies for scatters, but in reverse.

Finally, gather and scatter support all the data conversions that vload and vstore, respectively, support, as well as writemasking. They don't support broadcast or store selection, since those would be useless for these instructions — to broadcast in a gather, just set all the index fields to the same value (a partial broadcast is performed in Figure 12), and scatter can similarly easily perform store selection.


LISTING THREE

; Calculates checksums of a specific field in 16 arrays of structures
; in parallel.
; On entry:
;   v2 contains the 16 offsets of the checksum field in each of the
;   16 arrays relative to rbx.
;   rcx is how many elements to sum.
; On exit, v0 contains the 16 checksums.
        vxorpi   v0, v0, v0
ChecksumLoop:
        vgatherd v1 {k0}, [rbx + v2]
        vaddpi   v0, v0, v1
        ; step to the next value in each array
        vaddpi   v2, v2, [Mem_Structure_Sizes]
        dec      rcx
        jnz      ChecksumLoop

Figure 13: vcompressd [rbx] {k1}, v0. This is a simplified representation of what is currently a two-instruction sequence.

Another important feature is the ability to queue data efficiently with the vcompress and vexpand instructions. For vcompress, the writemask-enabled elements of the source vector are stored sequentially in memory, as in Figure 13; for vexpand, the writemask-enabled elements of the destination are loaded from a sequential stretch of memory, reversing the action of vcompress, as in Figure 14. A new scalar instruction, countbits, has been added so that the number of enabled bits in a vector mask register — and thus, the number of elements stored by vcompress or loaded by vexpand — can easily be counted. As with all vector instructions, vcompress and vexpand can be used without specifying a writemask, in which case all elements are loaded or stored, with no compression or expansion needed. In this mode, vcompress and vexpand function as unaligned store and load.

Finally, the bsf and bsr bit-scan instructions have been enhanced. Where the existing bsf instruction finds the first 1-bit starting from bit 0 and scanning up, the new bsfi instruction finds the first 1-bit starting from the bit above the bit specified by the destination operand. This allows bsfi to continue a search started with bsf, without any bit-clearing overhead. The bsri instruction similarly provides a starting point for reverse bit scans. These instructions are useful for parallel-to-serial conversion when the results of a vector operation must be processed serially, as we will see when we look at rasterization.
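As a sketch of the stream-compaction pattern this supports, the fragment below keeps only the lanes that pass a test, packed tightly in memory. The vcompressd and vcmpltps forms are taken from this article; the kmov direction and the two-operand countbits form are assumptions, since the article shows kmov only in the mask-from-register direction and does not spell out countbits's operands.

        vcmpltps   k1, v1, v2        ; select the lanes worth keeping (v1 < v2)
        vcompressd [rbx] {k1}, v1    ; pack just those lanes into memory, gap-free
        kmov       eax, k1           ; copy the mask to a scalar register (assumed direction)
        countbits  eax, eax          ; assumed form: count of the lanes just stored
        lea        rbx, [rbx+rax*4]  ; advance the output pointer by that many dwords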

The Rest of LRBni

Figure 14: vexpandd v0 {k1}, [rbx]. This is a simplified representation of what is currently a two-instruction sequence.


Several vector instructions have been added for moving bits around within each element. Vinsertfield rotates each source element according to a per-instruction immediate value, then masks off a portion of the result according to two more immediate values, and leaves the destination element untouched where the mask is zero, effectively inserting the rotated source element into the destination element, as in Figure 15. (In this case "mask" just means a normal bitmask, of the sort you might logical-and with a register, not the writemask.) Used with no bitmask, vinsertfield can also serve as a rotate-by-immediate vector instruction.



Figure 15: vinsertfield v0, v1, 8, 4, 23 for element 0 of v0 and v1. The same rotation and masking is repeated for all 16 elements.

vbitinterleave11pi and vbitinterleave21pi allow the interleaving of bits from two registers; vbitinterleave11pi alternates bits from the two sources, starting with bit 0 of the last source, and vbitinterleave21pi alternates one bit from the last source with two bits from the first source. Bit-interleaving is useful for generating swizzled addresses, particularly in conjunction with vinsertfield, for example in preparation for texture sample fetches (volume textures in the case of vbitinterleave21pi). The following sequence generates 16 offsets into a fully-swizzled 256x256 four-component float texture from 16 8-bit x coordinates stored in v1 and 16 8-bit y coordinates stored in v2, as in Figure 16:

vxorpi v3, v3, v3
vbitinterleave11pi v0, v2, v1
vinsertfield v3, v0, 4, 4, 19

(Note that if it was a gather instruction that was going to use these indices, and if the texel size was 8 or less, it wouldn't be necessary to use vinsertfield to shift up the address in order to address the texels, since the gather instruction can scale by 2, 4, or 8.)

There are also shuffle instructions for permuting elements from a source vector to a destination vector.

Although LRBni is primarily a vector instruction extension, it adds a few scalar instructions as well. In addition to bsfi and bsri, it adds insertfield, bitinterleave11, and bitinterleave21, the scalar versions of the vector bit-manipulation instructions described above. Prefetching and other cache-control instructions have been added as well; these are particularly important on Larrabee, where data must be fetched far enough ahead and at a high enough rate to keep the voracious vector units well fed and fully loaded in streaming applications, without the help of out-of-order hardware.

Finally, note that in the initial version of the hardware, a few aspects of the Larrabee architecture — in particular vcompress, vexpand, vgather, vscatter, and transcendentals and other higher math functions — are implemented as pseudo-instructions, using hardware-assisted instruction sequences, although this will change in the future.

What Does It All Add Up To?

I'd sum up my experience in writing a software graphics pipeline for Larrabee by saying that Larrabee's vector unit supports extremely high theoretical processing rates, and LRBni makes it possible to extract a large fraction of that potential in real-world code. For example, real pixel-shader code running on simulated Larrabee hardware is getting 80% of theoretical maximum performance, even after accounting for work wasted by pixels that are off the triangle but still get processed due to the use of 16-wide vector blocks.

Tim Sweeney, of Epic Games — who provided a great deal of input into the design of LRBni — sums up the big picture a little more eloquently:


Larrabee enables GPU-class performance on a fully general x86 CPU; most importantly, it does so in a way that is useful for a broad spectrum of applications and that is easy for developers to use. The key is that Larrabee instructions are "vector-complete." More precisely: Any loop written in a traditional programming language can be vectorized, to execute 16 iterations of the loop in parallel on Larrabee vector units, provided the loop body meets the following criteria:

• Its call graph is statically known.
• There are no data dependencies between iterations.

Shading languages like HLSL are constrained so developers can only write code meeting those criteria, guaranteeing a GPU can always shade multiple pixels in parallel. But vectorization is a much more general technology, applicable to any such loops written in any language.

Figure 16: The operation of the instruction sequence vxorpi v3, v3, v3; vbitinterleave11pi v0, v2, v1; vinsertfield v3, v0, 4, 4, 19 for element 0 of v0, v1, v2, and v3. The same rotation and masking is repeated for all 16 elements. The x and y coordinates are 8-bit; the upper 8 bits of each are ignored, and in the example above are set to non-zero values in order to illustrate the masking operation of vinsertfield.


This works on Larrabee because every traditional programming element — arithmetic, loops, function calls, memory reads, memory writes — has a corresponding translation to Larrabee vector instructions running it on 16 data elements simultaneously. You have: integer and floating-point vector arithmetic; scatter/gather for vectorized memory operations; and comparison, masking, and merging instructions for conditionals.

This wasn't the case with MMX, SSE, and AltiVec. They supported vector arithmetic, but could only read and write data from contiguous locations in memory, rather than random access as Larrabee does. So SSE was only useful for operations on data that was naturally vector-like: RGBA colors, XYZW coordinates in 3D graphics, and so on. The Larrabee instructions are suitable for vectorizing any code meeting the conditions above, even when the code was not written to operate on vector-like quantities. It can benefit every type of application!

A vital component of this is Intel's vectorizing C++ compiler. Developers hate having to write assembly language code, and even dislike writing C++ code using SSE intrinsics, because the programming style is awkward and time-consuming. Few developers can dedicate resources to doing that, whereas Larrabee is easy; the vectorization process can be made automatic and compatible with existing code.

In short, it will be possible to get major speedups from LRBni without heroic programming, and that surely is A Good Thing. Of course, nothing's ever that easy; as with any new technology, only time will tell exactly how well automatic vectorization will work, and at the least it will take time for the tools to come fully up to speed. Regardless, it will equally surely be possible to get even greater speedups by getting your hands dirty with intrinsics and assembly language; besides, I happen to like heroic coding. So in the next article we'll look under the hood, examining how rasterization, a process that is most definitely not inherently parallel, can be efficiently implemented with LRBni.

—Michael Abrash is a programmer at RAD Game Tools and the author of numerous books and articles on graphics programming and performance optimization.

Return to Table of Contents


Smartphone Operating Systems: A Developer's Perspective

For developers, the battle lines are forming in the smartphone wars

by Tom Thompson


A

pple’s iPhone is either loved or loathed, for as many reasons as there are users. However, there’s one fact that everyone can agree on about the device: It shook up the smartphone industry. Despite Apple’s coming late to the party in 2007, by the end of 2008 the iPhone had rocketed to second or third place in the U.S. smartphone market, upstaging other vendors who have been selling smartphones for over a decade. Depending upon whose market report you consult, along with the surges and slides of smartphone sales, these positions are subject to change over the coming months, but there’s no disputing that the iPhone has altered the landscape forever. What is the secret sauce to the iPhone’s success? Feature-wise, the iPhone’s hardware hardly makes a compelling case, as it lacks certain features found on other smartphones. However, what the iPhone does, it does effectively and easily. In a single word, the features that the iPhone offer are *useable*. To prove this, all you have to do is take a look at the picture submission statistics of Flickr, a popular photo-sharing site. The graphs on the site’s Camera Finder page, which track the volume of photo submissions by device, reveal that the iPhone postings easily outpaced other smartphones with much better cameras. Furthermore, at the end of 2008, the iPhone temporarily matched the submission volume from several high-end digital cameras. What drove this volume was not the camera capabilities of the iPhone, but that its owner can snap and send photos easily and quickly to any web site. Another factor in the iPhone’s success is its means of application distribution. This is done through Apple’s App Store, which leverages the familiarity and infrastructure of Apple’s iTune software, which many people already use to search for 17

While network operators have had their own ways to distribute software, it's obvious that the App Store's quick access to the goods has made the difference, as the store was making $1 million per day a month after its launch in 2008. What Apple has shown is that an easy-to-use platform, with ready access to applications, can carve out a section of the smartphone market filled with established players. Apple has fired the opening shot in the battle for the next stage in personal computing — the era of the smartphone.

"Apple's iPhone radically redefined the concept of the smartphone, and RIM's Storm is obviously the first response," said Tom R. Halfhill, senior analyst for In-Stat's Microprocessor Report. "The huge software community that Apple is building around the iPhone is as revolutionary as the iPhone's hardware design. Google's Android could expand the concept even further by encouraging a more open approach to both hardware and software development."

The existing stewards of the smartphone market, such as Nokia, Microsoft, and Research In Motion, are not going to stand by and let upstarts like Apple and Google grab marketshare. Already they are beginning to counter with smartphones that offer a touch screen, accelerometers, and location services. However, the situation is complicated by the demands of the network operators, who want to be more than just dumb pipes carrying data and want to make money on services.

As the battle lines form, this is a time of opportunity for developers. Apple's App Store has shown that developers can write software that adds value to the platform — and just as important, distribute it in a way that makes money. The industry stewards have countered Apple's move with their own application stores, so there's a huge opportunity to write the "killer app" for one of several smartphone platforms.

The Nokia S60 Platform

Without a doubt, Nokia is the steward of the mobile phone market. It dominates the smartphone industry, both in marketshare and sheer volume of devices sold. The Finnish company sold its first mobile phone in 1992, and at the end of 2008 it owned 38.6 percent of the world marketshare, according to ABI Research. Admob, which tracks online requests to web sites by platform, pegged Nokia's share of mobile network traffic at 41 percent. In terms of devices shipped, the numbers are impressive: In 2005, the billionth (yes, that's with a 'b') Nokia phone was sold. Nokia's smartphone platform, the S60, was introduced in 2001, and the first handset that incorporated the platform was the Nokia 7650. While smartphones are a subset of Nokia's product line, the numbers are formidable: Nokia shipped 180 million S60-based devices in 2008.

The S60 Platform is a smartphone reference design that provides a consistent interface and execution environment across the various devices made by Nokia and its licensees, which include Samsung, Lenovo, and LG. Some of the S60 Platform's key features are:

• A multitasking operating system that executes multiple third-party apps simultaneously
• Memory protection that implements a robust and secure execution environment
• Power management software that conserves energy on battery-powered devices
• Bluetooth and TCP/IP connectivity
• Asian language support
• A multitouch interface with haptic feedback (available in the fifth edition)
• The ability to execute Java ME applications
• Enablers for enterprise support, such as push-based e-mail and protocols for synchronizing Exchange e-mails and contacts
• Numerous avenues for developing S60 applications, using different programming languages and several sets of tools

To see how the S60 Platform implements these features, a tour of its software stack is in order.

Figure 1: S60 software stack.

The platform's layered structure (Figure 1) allows the platform to address the different needs of Nokia, the network operator, and the developer. The topmost layer is the Applications layer, which is where native S60, licensee, operator, and third-party applications execute.

Beneath this layer is the S60 Scalable UI layer. This layer was developed and is licensed by Nokia, and consists of UI frameworks. The visible components of these frameworks adjust for different screen sizes, or when the device switches between portrait and landscape screen orientations. They use Scalable Vector Graphics - Tiny (SVG-T) and relative positioning to achieve this capability. These UI components also adopt the device's localization settings and rules, such as which direction to draw text, and how to display the calendar and time. The UI framework provides a toolkit of ready-to-use components that implement lists, editable forms, grids, notification screens, content viewers, and other visual UI elements.

The next layer in the S60 Platform stack is the Runtimes layer.

As its name implies, this layer contains the runtime libraries that support various programming languages, interactive content display, and web rendering. For example, the runtime libraries for the Symbian C++ and Open C/C++ programming languages reside here. It also houses the execution environment for Java ME. Web support for the S60's web browser and for Web Runtime (WRT) widgets is provided via a WebKit rendering engine. The Runtimes layer also supports bindings for other execution modules such as Flash Lite and Python.

The Platform Libraries and Middleware layer contains the frameworks used to implement specific services for layers higher in the stack, or to expose lower-level OS APIs to the developer. It contains various application engines that manage Personal Information Management (PIM) data, messaging, and data synchronization. Multimedia content rendering — such as video decoding — is performed here. Camera and audio recording interfaces are provided for multimedia applications. A security framework manages the security certificates and keys used for secure data sessions.


The Symbian OS and Security Platform layer resides at the lowest part of the stack. The Symbian OS provides fundamental system services for the S60 Platform. The OS is microkernel-based, with the microkernel residing in kernel space, while the rest of the OS and software stack resides in user space. This provides security and memory protection for vital low-level services. In addition, the OS places each application into its own separate address space, isolating applications from the OS and from each other for security and reliability. The Security layer helps an IT department manage its smartphones, allowing an enterprise to lock or wipe a lost smartphone.

The Symbian OS has undergone extensive field-testing, having evolved from the 32-bit EPOC OS in 1994. The Nokia S60 Platform uses the latest Symbian OS v9, which has an EKA2 real-time kernel. The real-time capability is required to properly manage a device's telephony and multimedia functions. The Symbian OS has a plug-in framework that enables the easy addition of new peripherals, because their drivers and supporting frameworks can be "snapped" into the system without having to modify the OS or other frameworks. A client-server mechanism provides access to low-level system resources, and in fact the kernel itself is a server that parcels out resources to those applications that need them. This transaction scheme allows applications to exchange data without requiring direct access to the OS space.

Tools for S60 Platform Development

There is something for just about everyone writing S60 Platform applications. The platform supports several programming languages, notably C/C++ for porting existing UNIX applications, and Java for porting Java ME MIDlets. As mentioned previously, the software stack offers several runtimes that allow application development using WRT widgets, Flash, and Python. The primary programming language for the platform is Symbian C++, which meshes with Symbian's object-oriented interfaces and frameworks. You use Carbide.c++, an Eclipse-based IDE and toolset, to write and debug Symbian C++ code.

Symbian C++ has special language idioms for embedded system support, such as a clean-up stack. On the negative side, these idioms and the language's behavior make for a steep learning curve, although this is no different than learning Objective-C for the iPhone. One other complication is that over the years the platform has accumulated hundreds of API classes and a huge number of methods. It takes time to sift through these when learning to write an S60 Platform application.

The platform also supports Open C/C++, which implements a vendor-neutral C programming interface. Open C's runtime consists of an implementation of selected POSIX libraries, while Open C++ uses STL/Boost libraries. These libraries allow ports of Linux applications to the platform. Using the Open C SDK, the Carbide.c++ IDE can be used to write Open C/C++ programs. The SDK can also be used with the GCCE and RVCT compilers.

Python language support is available as an installable runtime. Python is valuable as a rapid prototyping environment, and it can be extended to function as a code wrapper that invokes S60 Platform APIs.

WRT widgets can be written using any web authoring tools. A WRT widget is essentially a web page containing HTML and ECMAScript, stored in a .zip file. The widget's markup language and scripts execute locally on the phone. Nokia has developed a plug-in that works with Aptana Studio, a development environment used to debug HTML, JavaScript, CSS, and Ajax code. In the latest release of the S60 Platform, WRT widgets can, through JavaScript extensions, access the phone's contacts list or location API directly, making for simple yet useful lightweight applications.

Java ME development can be done with the usual suspects: either the Eclipse or NetBeans IDEs. The Runtimes layer implements the Java ME profile/configuration MIDP 2.0/CLDC 1.1.
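Because the Runtimes layer provides a standard MIDP 2.0/CLDC 1.1 environment, a plain Java ME MIDlet runs on S60 without any Symbian-specific code. The following is a minimal sketch of the kind of MIDlet you would build in Eclipse or NetBeans; the class name and display text are invented for illustration:

    import javax.microedition.lcdui.Command;
    import javax.microedition.lcdui.CommandListener;
    import javax.microedition.lcdui.Display;
    import javax.microedition.lcdui.Displayable;
    import javax.microedition.lcdui.Form;
    import javax.microedition.midlet.MIDlet;

    // Minimal MIDP 2.0 MIDlet; nothing here is S60-specific, which is the point.
    public class HelloS60 extends MIDlet implements CommandListener {
        private final Form form = new Form("Hello S60");
        private final Command exit = new Command("Exit", Command.EXIT, 1);

        protected void startApp() {
            form.append("A plain Java ME application running on the S60 runtime");
            form.addCommand(exit);
            form.setCommandListener(this);
            Display.getDisplay(this).setCurrent(form);
        }

        protected void pauseApp() { }

        protected void destroyApp(boolean unconditional) { }

        public void commandAction(Command c, Displayable d) {
            if (c == exit) {
                notifyDestroyed();   // tell the application manager we are done
            }
        }
    }

The trade-off is reach versus depth: the same MIDlet deploys to any MIDP 2.0 handset, but it cannot touch the platform the way Symbian C++ code can.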

Once you've completed that killer mobile app, programs written in Symbian C++ and Open C/C++ are packaged as SIS files. Symbian SIS files contain installation metadata, program code, and resources. The smartphone's software installer reads the metadata to determine if the application requires specific hardware that might be absent on the device, such as an accelerometer or Bluetooth. The installer can also perform a security check at this time. For the S60 Platform, the SIS file must be digitally signed. This is required by Symbian OS v9 for the security check and trusted code mechanism to work. Security certificates can be used to permit or block application execution. Such certificates can be used by the enterprise to control in-house mobile applications, or by the network operator to manage smartphone features or access to network services.

S60 applications can be distributed over-the-air (OTA), or via Bluetooth or PC connectivity software. Handango has managed the wide-scale distribution of Nokia applications. In February, Nokia announced plans to launch its Ovi Store, which sells applications, videos, games, podcasts, and other content, similar to Apple's App Store. The store will be accessible by Nokia S60 smartphones in May. Nokia will also offer developers 70 percent of the revenue from sales.

Finally, the Symbian Foundation will take on the governance of Symbian and Symbian-based platforms such as the S60 Platform, and move them to open source over the next two years. The source code for these platforms will be available to developers for free.

S60 Platform Summary

Pros:
• Nokia has the largest smartphone marketshare.
• The S60 Platform has a diverse set of languages and tools for developing applications, ranging from industrial-strength C and C++, to Python scripts for prototyping, to web-based WRT widgets.
• Good enterprise support.

Cons:
• Nonstandard Symbian C++ has a steep learning curve, with special idioms to master.
• There is a large number of Symbian APIs to learn, since the platform contains hundreds of classes and thousands of member functions.

The RIM BlackBerry Smartphone Platform

If there is one smartphone widely recognized for its enterprise support and communication capabilities, it's Research In Motion's (RIM) BlackBerry. BlackBerry smartphones feature a usable — if cramped — QWERTY-style keypad on which owners can easily write e-mails, rather than going through all of the contortions of entering leetspeak on a numeric keypad. Combined with wireless support and a server infrastructure that pushes Microsoft Exchange e-mails to the device, BlackBerry smartphones enable on-the-go workers to keep in touch from anywhere. The Canadian company introduced its first device, a wireless two-way pager, in 1999. In 2002, it launched its first BlackBerry smartphone. RIM, as a smartphone steward, has mounted a rapid response to the iPhone threat: It launched the Storm, a full touch-screen version of its platform, in Q4 of 2008. In addition, the company offers a range of BlackBerry smartphones with different feature sets, knowing that one size does not fit all.

Some of the BlackBerry platform's features are:

• Multitasking BlackBerry Device Software that executes multiple applications simultaneously
• Java ME underpinnings that provide memory protection and trusted code execution
• The ability to execute Java ME applications
• Touch interface support
• Management of multiple Exchange e-mail accounts, along with support for POP3 and SMTP; e-mails can have file attachments
• A browser that supports various media types and can download files
• Enterprise features such as push e-mail and synchronization with Microsoft Exchange
• Security: the platform is FIPS 140-2 compliant, and supports AES or Triple DES encryption sessions via BlackBerry Enterprise Servers
• Development tools for writing Java ME applications for BlackBerry smartphones

Like Nokia and Apple, RIM owns and builds the entire hardware and software stack for the BlackBerry smartphone, which gives it the ability to tightly integrate software features with hardware functions. Interestingly, at its core the BlackBerry smartphone is a Java ME device, and can execute Java ME MIDlets.

Figure 2: BlackBerry software stack.

However, RIM's BlackBerry Device Software has enhanced the capabilities of the platform with its own Java Virtual Machine (JVM), along with new Java classes that offer multitasking capabilities and UI enhancements that go beyond the capabilities of Java ME. Take a look at the software stack (Figure 2) to see how RIM accomplished this.

The topmost Application layer is where Java ME applications (MIDlets) and BlackBerry UI applications execute. You can also take existing Java ME code and add specific BlackBerry classes to make a hybrid Java ME application. For example, you might invoke a BlackBerry API call to select an audio output device (speakers or headphones), then use a standard multimedia player class to play the audio content. Hybrid applications are possible as long as you don't intermix MIDP 2.0 and BlackBerry API calls that perform either screen drawing or application management.

Drilling down, the next stack layer is the Java Classes and Frameworks layer.

The programmer who has written many a Java ME MIDlet will be on familiar ground here, as this layer resembles the Java ME platform. There are the usual MIDP MIDlet classes that manage the UI and application lifecycle, and beneath those reside the CLDC classes that provide access to low-level resources. For BlackBerry smartphones running BlackBerry Device Software 4.0 or later, the platform's Java profile/configuration is MIDP 2.0/CLDC 1.1 compliant; otherwise it uses MIDP 1.0/CLDC 1.0. This layer also implements useful Java Specification Request (JSR) API packages, such as JSR-75 (PIM and FileConnection services), JSR-135 (multimedia capture and playback), JSR-82 (Bluetooth support), JSR-120 (wireless messaging), and JSR-179 (location services), to name a few. All of these classes and those of the application are loaded and executed by the BlackBerry JVM.

The BlackBerry API extensions in this layer enhance the platform in several ways. First, they provide UI APIs for custom menus, widgets, and screens.


Next, an application class, Application, enables an application to persist and continue to run in the background, unlike the MIDP MIDlet class, which requires that the application terminate when it is closed. Other APIs handle the set-up and tear-down of network sessions, and manage I/O to servers. Finally, these APIs provide hooks into the device's camera, media player, and web browser.

An application can be written using only CLDC and BlackBerry APIs. Such an application has access to all of the device's features including, for example, Bluetooth, the accelerometer, and the touch screen on the Storm. It can also run concurrently with other applications. Finally, the application can be launched when the BlackBerry starts up and then run in the background. The catch to writing an application that uses BlackBerry API extensions is that doing so ties the application to this platform. However, this is no worse than using the unique Java classes found in Google's Android.

The BlackBerry Device Software layer implements a low-level multitasking, multithreaded operating system. It uses listener threads to monitor for specific device events. For example, these listener threads manage push support for e-mail and messages. The BlackBerry Device Software allows an IT manager to configure applications and disable certain smartphone features, and to blank a lost device remotely.
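To make the difference from a MIDlet concrete, here is a minimal sketch of a CLDC-style BlackBerry application built on RIM's UI classes. It is an illustration, not RIM sample code, and the class and label names are invented:

    import net.rim.device.api.ui.UiApplication;
    import net.rim.device.api.ui.component.LabelField;
    import net.rim.device.api.ui.container.MainScreen;

    // A CLDC-style BlackBerry application: a plain main() replaces the
    // MIDlet lifecycle, and the app lives in RIM's UI event dispatcher.
    public class HelloBlackBerry extends UiApplication {
        public static void main(String[] args) {
            HelloBlackBerry app = new HelloBlackBerry();
            app.enterEventDispatcher();   // runs the event loop; does not return
        }

        private HelloBlackBerry() {
            MainScreen screen = new MainScreen();
            screen.setTitle(new LabelField("Hello BlackBerry"));
            screen.add(new LabelField("Built on RIM's UI classes, not MIDP."));
            pushScreen(screen);
        }
    }

The main() plus enterEventDispatcher() pattern is what lets such an application start when the device boots and keep running in the background, which a MIDlet cannot do.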

Software Tools for the BlackBerry Platform

With the BlackBerry platform being a Java ME platform, you can use a number of Java development tools, such as Eclipse and NetBeans, to write applications for BlackBerry smartphones. RIM also provides the BlackBerry Java Development Environment (JDE), which is a set of standalone tools. The JDE consists of an IDE that allows you to write, build, and test Java Mobile and BlackBerry smartphone applications at the source-code level. The JDE has phone simulators that allow you to view and interact with the program. It also has a Signing Authority Tool, which you use to sign the application. Note that for your application to access the BlackBerry APIs, the code must be signed.

Simulator software for e-mail and communications with BlackBerry Enterprise Servers is available for thoroughly checking out a networked application.

For specialized client/server applications, RIM provides RAD tools that can quickly build BlackBerry smartphone client applications. These tools can prototype services for a UI framework, data management, and client-server I/O. The RAD tools are available as a plug-in for Microsoft Visual Studio, or as a standalone Mobile Development System (MDS) used inside of Eclipse.

For web development, you can use standards-based HTML, JavaScript, and CSS. The BlackBerry web browser is standards compliant, and on BlackBerry Device Software 4.6 or later, the browser adheres to the Ajax, JavaScript 1.5, CSS 2.1, DOM Level 2, and HTML 4.01 standards. The platform also supports a messaging arrangement termed "Web Signals". When an update is made to a web site, its server can use RIM's push infrastructure to send a notification to a BlackBerry device. The pushed message includes a hyperlink back to the updated content.

Applications can be distributed over the air (OTA), or through a web loader application on a PC, with the BlackBerry smartphone tethered to the computer by a USB cable. RIM has been distributing applications through Handango. However, in March the company plans to launch an app store to distribute applications. Developers will retain 80 percent of the revenue made from sales of their applications.

BlackBerry Summary

Pros:
• Best-of-breed Java ME platform for enterprise communications.
• Infrastructure for push e-mail and messaging is in place and proven.
• The platform supports multiple third-party applications running concurrently, if they use the BlackBerry API classes.

Con:
• To use the platform to its best advantage, you must use RIM's own APIs and classes.

The Apple iPhone Platform

Apple introduced its first iPhone in June 2007, and followed it about a year later, in July 2008, with the iPhone 3G. By the end of 2008, the company had sold over 10 million iPhones, capturing 1 percent of the mobile phone market.

Not a bad beginning for a newcomer to the industry. This success occurred despite the fact that the device has a low-resolution 2-megapixel camera, and lacked some common smartphone features such as video recording, voice dialing, and a to-do list.

On the other hand, what makes the iPhone a success is that it dispensed with the phone keypad and function buttons. Instead, the iPhone uses a touch screen to display whatever the user needs at the moment: a web page, the latest e-mail, a map displaying your current location, an iPod music selection screen, and more. Getting to a feature usually takes a couple of taps on the screen. iPhone applications display large, uncluttered interfaces, partially because the touch screen does not require a stylus, and Apple promotes the design goal that applications should accomplish one purpose. Apple is able to integrate its software well with the device because the company builds both the iPhone's hardware and its software stack, similar to the smartphone stewards.

Some of the capabilities of the iPhone are:

• It's a web browser, with a best-of-breed experience
• It's an iPod, able to download, select, and play music
• It's an application platform, able to run useful utilities, access online content, and play games
• It has enterprise support, such as synchronization with Exchange, and the ability to establish a VPN connection
• Location-based APIs using the device's GPS feature provide position information, valuable for the creation of new forms of local presence applications and social networking
• APIs permit access to the accelerometer's data, allowing for the development of novel games

The iPhone has one of the best browsing experiences on a smartphone. Its WebKit-based Mobile Safari browser conforms to the Ajax, ECMAScript 3, CSS 2.1 and partial CSS 3, DOM 2, HTML 4.01, and XHTML 1.0 standards, and renders most web sites accurately. The multi-touch gesture interface lets you zoom in and interact with the content. On the other hand, there's no Flash support, and you can't download files.


A built-in e-mail program uses a virtual keyboard to type in messages, eliminating the need for leetspeak, and you can view certain types of mail attachments. Unfortunately, the mailer doesn't let you store these attachments as files in a directory, nor does it support a landscape mode (unlike the browser). The smartphone is also an iPod, and lets you choose music using Cover Flow (a unique graphic selection mechanism) or from a typical list. On the negative side, in the iPhone 2.2 OS the Bluetooth stack lacks an A2DP profile, so your music can't be played through wireless stereo headphones. Everywhere you look, the iPhone exhibits this Jekyll-and-Hyde personality of doing certain things very well, while leaving odd omissions.

The iPhone 3G can work in tandem with Microsoft Exchange Server 2003 and 2007 to support enterprise operations. For example, Exchange's ActiveSync technology can perform remote wipes on lost iPhones, and manages users logging in to the corporate intranet. The smartphone supports VPN connections that can use Cisco IPSec, L2TP/IPSec, PPTP, and certificate-based authentication (PKCS1, PKCS12). ActiveSync's direct push feature can send updates regarding e-mail, calendar, and contacts to the device. Note that the iPhone only allows one Exchange account. For non-Exchange users, Apple's MobileMe online service, after some fits and starts in 2008, now supports the push of e-mails and changes to the calendar and contacts.

To the end user, there's no concept of files on this platform (although they do lurk behind the UI curtain), which explains why you can't download content. Copy-and-paste is not supported as of version 2.2. Only one third-party app executes at a time, which leads some competitors to allege that the iPhone can't multitask. However, e-mail can be checked while music plays in the background, so obviously it can. What is going on here? To find an answer, we have to delve into the iPhone OS's software stack (Figure 3).

Like the other mobile platforms, the topmost layer of the iPhone OS stack is the Applications layer. Apple applications can run concurrently, but only a single third-party application can run alongside them.

Figure 3: iPhone software stack.

The platform doesn't have any Java implementation, so Java ME applications aren't allowed.

Beneath this layer is the Cocoa Touch layer. It consists of frameworks that manage the UI, such as capturing events, managing windows, and displaying graphics within those windows. Cocoa Touch is a subset of Apple's Cocoa, a set of object-oriented frameworks written in Objective-C. Cocoa provides many classes, or components, from which you can build a fully featured application. However, the Cocoa Touch frameworks are tailored for the constraints of the smartphone platform. These frameworks strike the right balance between abstracting much of the iPhone's low-level hardware and still enabling you to use device-specific features. For example, Cocoa Touch components manage most of the writing to the screen and playing of media, yet there are APIs exposed that let you access the accelerometer and camera.

Further down the software stack is the Media layer.

This layer manages all graphics rendering, audio generation, and playback of audio or video files. While Cocoa Touch provides the high-level means to generate animations and graphics, you can use the frameworks in this layer to exert fine-grained control over the display of your content. Three-dimensional objects are displayed with an OpenGL ES framework that conforms to the OpenGL ES 1.1 specification. This framework uses the device's hardware accelerators to provide full-screen animations at high frame rates, a valuable capability for games. This layer also uses Quartz, a vector-based graphics engine, to handle 2D drawing and graphics effects. The Quartz engine is identical to the one found in Mac OS X. The Core Animation frameworks support sophisticated animation and visual effects, with the image compositing required to accomplish them performed in hardware. Playback and recording for audio files and streams is handled here, and a media player framework provides full-screen playback of video files of several format types.


The Core Services layer provides system services for the higher layers. It consists of frameworks and engines that support an address book, a SQL database (SQLite), location services (using GPS coordinates), and networking services. A security framework manages the digital certificates, keys, and access policies that can protect an application's data.

The Core layer implements basic OS services, and consists of the kernel, drivers, and OS interfaces. The kernel is Mach-based, and manages low-level functions such as virtual memory, POSIX threads, network connectivity (BSD sockets), math computations, filesystem access, and more. Only a select few higher-level frameworks have access to the kernel and drivers. If necessary, an application can indirectly access some of these services through C-based interfaces provided in a LibSystem library.

The iPhone OS stack has a rich legacy. Cocoa, the Mach kernel, and certain Core layer components had their start with the original 68030-based NeXTSTEP computer in 1989. Over the years, this software was ported to PowerPC- and Intel-based platforms, and finally to the iPhone's ARM processor. As a consequence, the code has been subjected to rigorous examination during these ports. It has also undergone extensive field-testing, since the Cocoa frameworks, Quartz, and the Mach kernel are part of Mac OS X on millions of Macintosh computers. The iPhone OS can be considered a highly stripped-down version of Mac OS X, thus bringing some of that operating system's reliability and multitasking capabilities to the platform.

However, there are fundamental differences between the iPhone OS and Mac OS X, given that one runs on an embedded smartphone and the other on a desktop computer. One difference is that there is very little RAM to spare in the iPhone. This is not surprising given that memory — not the processor — is usually the most expensive part in the device. The iPhone comes with 128 MB of RAM, and after the OS and the Apple applications take their share, there can be anywhere from 40 MB to less than 4 MB of free RAM available for a third-party application.

Simply put, there's not much memory to run more than one additional application. The small amount of free memory and the one-app-at-a-time requirement complicate any implementation of a copy-and-paste mechanism. Furthermore, developers have determined that everything runs at root level in the iPhone. There had to be a design compromise behind this decision, but obviously every process executing with root privileges poses a potential security threat. As a security sandbox, the iPhone OS permits only one third-party application to run at a time, and not in the background.

iPhone Development Tools

You write iPhone applications using Xcode, an IDE developed and maintained by Apple. Native applications are written in Objective-C, which is an object-oriented extension to C. Unlike C++, Objective-C is both a language and a runtime. Also, Objective-C's syntax for object-oriented operations is significantly different from that used in C++. In short, it takes some effort to get familiar with the language's syntax and idiosyncrasies. Xcode's development tools allow you to debug iPhone applications in an iPhone simulator, and you can trace and place breakpoints at the source-code level. The SDK that provides the Cocoa Touch frameworks and simulator is a free download. However, to run your program on an actual iPhone, the application must be signed, which requires that you purchase a signing certificate. With an enterprise signing certificate, you can distribute provisioning profiles that allow internally developed applications to execute only on those iPhones that have installed the profile.

For web-based applications, the SDK provides Dashcode, a framework based on a web page composed of HTML and JavaScript. You can use Dashcode's simulator to write and test your web application. You can also use several other third-party frameworks to write web applications, and debug these with Aptana Studio's tools.

All iPhone applications are distributed via Apple's App Store, and they must be digitally signed.

The App Store uses the familiar iTunes front end to both display and handle purchases. Applications can be obtained through iTunes on the desktop computer with the iPhone tethered to it by a USB cable, or OTA via the App Store program on the iPhone. A developer's path to the money is clear-cut: Seventy percent of the revenue goes to the developer, and 30 percent to Apple. There have been several instances where developers made small fortunes with a highly popular app. Other smartphone vendors have quickly adopted similar revenue arrangements.

Apple reviews the application you submit before publishing it. This review can screen applications for questionable content or malicious code, but it's also been used to quash applications that "duplicate the functionality" of the iPhone. This is unfortunate, as developers could fill the gaps or improve the features of the device. For example, adding some useful Bluetooth profiles that supported stereo headsets, data synchronization, or the ability to implement multiplayer games would be useful, and make money.

Despite some missing features, the iPhone's ease-of-use advantage has made it a popular device. However, as this article points out, the smartphone industry's stewards are working on leveling the playing field in terms of the UI, and Apple must counter with additional features or perhaps different iPhone models (one with more memory) to continue its growth. To this end, Apple announced an update in March, iPhone OS 3.0, that provides some of the missing features mentioned here, such as the A2DP profile for Bluetooth, voice recording, and copy-and-paste. Landscape mode is now supported by more Apple applications (the feature has always been available to developers), as is push notification on the device side. You're still limited to running one third-party application at a time, and no background tasks are allowed. A beta became available at the time of the announcement, and the release is expected by mid-year. The update can be installed on all existing iPhones and iPhone 3Gs, although the older models might have fewer features. Apple declined to comment on this article.


New Release of iPhone OS Adds New and Missing Features

On March 17th, Apple announced iPhone OS 3.0, which promises to fill in some of the missing features on the device, while adding new capabilities.

Missing features that have been added:

• Copy-and-paste. System-wide pasteboards now allow you to cut, copy, and paste strings, URLs, colors, and images between applications.
• Audio recording. New classes enable audio recording and management. Various audio formats and sample rates are supported.
• Push notification. According to Apple, push notification was significantly reworked on the server side to signal the presence of new or updated information. As now implemented, iPhone OS 3.0 generates an audible or text alert that prompts the user to open the target application. This scheme eliminates the need for a running background task that saps battery life and poses a security risk.
• Stereo headsets supported. The Bluetooth A2DP profile has been implemented.
• Mapping support. Developers can now access the device's mapping services to embed navigation and GPS plotting into applications. However, for turn-by-turn support, developers will have to provide their own map content because of licensing issues.
• More of the Apple native applications — notably the Mail app — now support landscape mode, and the virtual keyboard now works in this mode as well.

What's been added:

• Support for accessories. The OS now supports the use of accessories connected to the iPhone either through its 30-pin docking connector or wirelessly via Bluetooth. Now that the device has been "opened", you can expect an entire ecosystem to build up around it, much like the iPod's.
• Peer-to-peer support. A new game framework allows you to connect to other iPhones for multi-player games or collaboration over Bluetooth without pairing devices. This framework implements peer-to-peer connectivity using Bonjour, Apple's service discovery protocol. Bonjour conceals the gritty details required to locate and establish network sessions with other networked devices.
• Application purchase support. Formerly, to add new content — such as new game levels — to an iPhone app, you had to submit a new application to Apple for approval and posting. With iPhone OS 3.0, developers can now allow users, from within the application, to purchase and obtain new content. New Store Kit classes manage the financial transaction details, and all content is transferred through iTunes. This capability is only available to paid (not free) applications, and Apple takes 30 percent of the revenue.
• System-wide search. You can use Spotlight (a search service already present in Mac OS X) to search for content in various applications.

What's still missing:

• No voice dial.
• No video recording. Considering the quality of the iPhone camera, this isn't a surprise.
• To-dos are still MIA.
• No background tasking.

The update will be available by mid-year, and can be installed on all existing iPhone models through an iTunes download. It is also available for iPod Touches, for a small fee.

—TT

iPhone Platform Summary

Pros:
• Best handheld mobile platform that happens to be a phone.
• Ease of use makes it a popular platform, overcoming its other shortcomings.
• You can make some serious money with a successful app.

Cons:
• You have to learn Objective-C; it's the only smartphone platform that uses it.
• The phone lacks some basic features other phones have.
• Competitors will soon catch up on the UI.

The Google Android Platform

Like Apple, Google is one of the upstart companies that jumped into the smartphone fray recently. In late 2007, the company launched its Android platform, a smartphone stack developed for the Open Handset Alliance, an association of 48 hardware, software, and telecommunications companies.

In the fall of 2008, T-Mobile released the G1. Made by HTC, the G1 is the first smartphone using the Android platform. According to Admob, in January 2009 Android generated 3 percent of smartphone network traffic — a significant jump considering the platform had only been on the market a few months. It will be interesting to see if this trend continues, especially as more Android-powered smartphones appear.

The Android platform is built upon Linux 2.6, whose pedigree can be traced back to its first source-code release in 1991. Since then, Linux has evolved, and its code has been heavily reviewed and extensively field-tested on just about everything, from embedded systems to mainframes. Linux therefore provides a solid foundation on which the Android platform is built.

In addition, Android is distributed under the Apache license, which means all of its software components are free and the source code is available. Some of the features of the Android platform are:

• Built on Linux, a proven OS
• Multiple third-party applications can execute at the same time
• Applications can intercommunicate and share resources
• Open source, which holds the potential that developers will contribute applications and features to the platform

Android comes with a number of useful applications, such as an e-mail program (which makes use of Google's Gmail), a mapping program (using the company's Google Maps), and a browser that uses WebKit (not Google's Chrome web browser, although Chrome happens to be based on WebKit as well).


Applications are written in Java, but Android is not Java ME, nor does it support such applications, for reasons that we'll see when we tour its software stack. Like most smartphones nowadays, Android can handle changes in screen orientation, and provides access to accelerometer data. Android applications are designed to operate concurrently, and on the G1, the platform is noted for its ability to both browse and manage multiple IM conversations. On the other hand, such heavy use of the smartphone's CPU shortens battery life significantly. Maybe Apple is on to something in limiting the number of applications that the platform can run.

Android as implemented on the G1 has some strange omissions, just like Apple's iPhone OS on the iPhone. For starters, Android doesn't offer a virtual keyboard. This problem is mitigated on the G1 by its built-in keyboard, but it does create a problem for smartphone manufacturers who wish to release a touch-screen-only device; you can expect future Android releases to include an input management framework that supports soft keyboards and other input options. The Android APIs support a touch interface (and the G1 has a capacitive touch screen), but not multitouch gestures, which complicates web surfing and editing mail replies. The platform does currently support copy-and-paste (a feature the iPhone OS won't get until version 3.0), but only in editable text fields; copying text from web pages in the browser isn't allowed. Like the iPhone's version 2.2 OS, Android's Bluetooth stack lacks many useful profiles. Flash support is lacking as well.

Let's now examine the Android software stack (Figure 4) to see how it implements the platform.

Figure 4: Android software stack.

The Applications layer hosts all Android and third-party applications, and several third-party applications can execute simultaneously. However, as noted, doing so can shorten battery life.

Beneath the Applications layer is the Android Frameworks layer. This layer consists of Java classes that provide application functions such as window management, window content display, application messaging, and handling phone calls. Obviously the interface Android presents at this level uses the Java programming language, and because the source code is available, you can modify these classes to extend their capabilities or override their behavior. Note that some of the lower levels in the stack present C++ interfaces.

Next on the software stack comes the Libraries and Runtime layer. The libraries implement 2D and 3D graphics support, along with multimedia content decoding. This layer has engines to support application features, such as a SQLite database engine and WebKit, which handles web content rendering. Like Java ME, Android strives for hardware independence by using a bytecode interpreter to execute Android applications.

However, it doesn't use Sun's JVM, but instead its own Dalvik Virtual Machine (DVM). The advantage of Android's use of a different bytecode interpreter is that the DVM was designed so that multiple instances of it can run, each in its own protected memory space, and each executing an application. While this approach offers stability and a robust environment for running multiple applications, it does so at the expense of compatibility with Java ME applications.

The lowest layer of the stack is the Linux kernel. It provides preemptive multitasking and system services such as threads, networking services, and process management. It handles all low-level drivers and manages the device's power. For a detailed description of the Android system stack, see my earlier DDJ article, "The Android Mobile Phone Platform," September 2008.

Android Developer Tools

The programming language of choice for the Android platform is — no surprise here — Java, although you can use C++ if you're writing code that might reside in the other layers.


With the Android Development Tools (ADT) plug-in, you can use Eclipse to write and debug your applications. Other Java IDEs, such as IntelliJ, can be used. The Android SDK (now at version 1.1) is a free download. Android's core Java libraries are derived from the Apache Harmony project, and are as compliant with J2SE as Google could make them. Seasoned Java programmers will find the Android SDK an amalgam of Java SE and Java ME methods and classes, along with unique new ones that address the needs of mobile platforms. In short, while Java programming experience is valuable, you still have to master the SDK's new classes and methods.

A manifest file serves a function similar to the Java ME one, in that it describes the contents of the archive file and its entry point. However, the Android manifest also specifies the application's permissions, which are required for sharing resources, and the events that the application can handle. Additional XML files describe the application's UI elements, and how they are laid out on the screen. All of these enhancements mean that the Android programs you write aren't portable to other Java ME platforms, but this is also true of applications that you'd write using RIM's UI APIs.

The developer tools compile the Java code to generate Dalvik bytecode files, with an extension of .dex. These files, along with the manifest, graphics files, and XML files, are packaged into an .apk file that is similar to a Java JAR file. Fortunately, you can use the existing Java signing tools, Keytool and Jarsigner, to create keys and sign your Android application. All Android applications must be signed or the platform won't execute them. While the SDK provides a developer key for use during code testing, you must generate a private key to sign the application when it is ready for public distribution. The private key is required to identify the author of the application, and to establish relationships between applications so that they can share code or information. The certificate that you use to generate the private key does not require a signing authority, and you can use self-signed certificates for this purpose.
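What you write with those tools is ordinary-looking Java that targets Android's own classes. Here is a minimal sketch; the class name and text are invented, and a real project would also declare this Activity in its manifest:

    import android.app.Activity;
    import android.os.Bundle;
    import android.widget.TextView;

    // The smallest useful Android component: an Activity that builds its UI
    // in code. Real projects usually declare the UI in layout XML instead.
    public class HelloActivity extends Activity {
        @Override
        public void onCreate(Bundle savedInstanceState) {
            super.onCreate(savedInstanceState);
            TextView view = new TextView(this);
            view.setText("Hello from the Dalvik VM");
            setContentView(view);
        }
    }

Nothing here is a MIDlet: the Activity lifecycle, the Bundle parameter, and the widget classes are all Android-specific, which is exactly why such code doesn't port to Java ME handsets.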

At the end of 2008, Google announced the availability of the Android Developer Phone 1, which is an HTC phone with the Android platform installed. This eliminates the need to purchase a G1 phone and a service contract from T-Mobile: The Developer Phone provides access to a shipping Android device without the cash outlay or contract contortions required when developing for the other platforms.

For application distribution, there's Google's Android Market, which was launched in October 2008. The only requirements are that the application must be signed, and you must pay a $25 USD application fee. Initially, all applications distributed by the site had to be free, but in February the site began supporting priced applications. Google allows developers to take 70 percent of the proceeds. Unlike Apple, Google doesn't review the applications submitted to Android Market, although the user community can rate them. This arrangement gives you a shot at writing an application that can truly enhance the platform. On the other hand, it's possible that you might pick up a malicious application before it is detected by the user community.

Like most things in life, sometimes there isn't an ideal way to handle things, just practical compromises.

Android Summary

Pros:
• Open source, open platform: If you hate the mail program, some third party is writing a better one.
• If you're used to Java, you can start writing for the platform right away.
• Linux underpinnings permit reliable execution of multiple applications.

Cons:
• Many of the mobile Java classes are specific to the platform, and tie the applications that you write to it. However, this is also true of RIM's BlackBerry classes, or Apple's Objective-C classes.
• Running multiple applications has its advantages, but can also shorten battery life significantly.

—Tom Thompson is the head of Proactive Support for embedded products at Freescale Semiconductor. He can be reached at [email protected]. The views stated in this article are the author's, and don't necessarily represent Freescale's positions, strategies, or opinions.

Return to Table of Contents


Three Reasons for Moving to Multicore

Performance is at the top of the list

by Christopher Diggins


Parallel computing used to be the specialty domain of supercomputers. These days, however, even computers aimed at the home market have at least two processor cores. Four-core machines are already widely available, with affordable six-core processors on the horizon. As if that's not enough, some hardcore gamers have dual quad-core processors installed, while companies like Intel are prototyping 80-core processors. But for decision-makers in the software industry, the most pressing question is how this shift toward parallelism in hardware affects them.

To address this question, we turned to Cilk Arts — a company cofounded by legendary computer scientist Charles E. Leiserson — which recently interviewed more than 150 companies regarding their challenges and priorities in supporting multicore platforms. When asked to identify the top three reasons motivating companies to move to multicore, Cilk Arts' Ilya Mirman reports that they repeatedly heard three key themes:

Application Performance. Achieving good performance in a concurrent application on multicore hardware is not as simple as adding a bunch of threads to an application. In the performance-critical sections of the software, all of the cores have to be kept busy. This is especially challenging because the number of cores cannot be known ahead of time. Performance has to be as good as possible whether there are 1, 2, or even 16 or more cores available. Other factors affecting the performance of concurrent software are efficient management of synchronization mechanisms (e.g., locks) and efficient distribution of work across the cores. Locks are a widely used mechanism for assuring that resources and memory can be shared between threads without corruption. Inadequate usage of locks can lead to race conditions, while overuse leads to poor performance. To maximize usage of the cores, work has to be distributed among the cores dynamically as each core completes its various tasks.
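These concerns (sizing to an unknown core count, distributing work dynamically, and minimizing the shared state that needs locks) can be illustrated with nothing more than the standard Java executor classes. The following is a hand-rolled sketch of the do-it-yourself style discussed later in this article, not code from Cilk Arts; the names and sizes are arbitrary:

    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorCompletionService;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // Sum an array in parallel. The pool is sized to the core count found
    // at run time, and the work is split into more chunks than cores so a
    // core that finishes early can immediately take another chunk.
    public class ParallelSum {
        public static void main(String[] args) throws Exception {
            final long[] data = new long[1 << 24];
            java.util.Arrays.fill(data, 1L);

            int cores = Runtime.getRuntime().availableProcessors();
            ExecutorService pool = Executors.newFixedThreadPool(cores);
            ExecutorCompletionService<Long> results =
                    new ExecutorCompletionService<Long>(pool);

            int chunks = cores * 4;            // finer grain than one-per-core
            int step = data.length / chunks;
            for (int i = 0; i < chunks; i++) {
                final int lo = i * step;
                final int hi = (i == chunks - 1) ? data.length : lo + step;
                results.submit(new Callable<Long>() {
                    public Long call() {       // no shared mutable state, no locks
                        long sum = 0;
                        for (int j = lo; j < hi; j++) sum += data[j];
                        return sum;
                    }
                });
            }

            long total = 0;
            for (int i = 0; i < chunks; i++) total += results.take().get();
            pool.shutdown();
            System.out.println(total);
        }
    }

Splitting the array into more chunks than cores lets an idle worker pull the next chunk from the queue, a crude, centralized cousin of the work-stealing schedulers described below.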

According to a Rogue Wave survey, performance requirements are the primary reason for shifting to multicore hardware:

• 58% of respondents said that an increase in performance has been the reason behind their organizations shifting existing applications to multicore hardware.
• 92% said that their business applications have high-performance requirements; of those with high-performance apps, 69% said that their business applications have requirements to support high throughput.
• 82% said that performance requirements for their organizations' apps are on the rise.

Development time is a direct consequence of the fundamental complexity of concurrent software development. Managing this increased complexity takes an additional amount of time for everyone involved in software development: designers, developers, and testers. Closely related to development time is cost. Not only is concurrent software more expensive to write, because it takes longer, but it is more expensive per programmer because it requires specialized knowledge and training.

Software reliability in a concurrent system is an especially thorny problem. When concurrent software runs on a serial (non-parallel) computer, the parallelism is typically emulated using a timesharing technique. When running the same software on parallel computers, new possibilities for race conditions arise because assembly instructions can be executed simultaneously. According to Mirman, "these pernicious bugs are notoriously hard to find. You can run regression tests in the lab for days without a failure only to discover that your software crashes in the field with regularity. If you're going to multicore-enable your application, you need a reliable way to find and eliminate race conditions. To avoid race conditions, access to shared resources and memory must be synchronized."
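Mirman's point is easy to reproduce. In the hypothetical snippet below, two threads increment a shared counter without synchronization; on a multicore machine the final count is almost never 2,000,000, because count++ is a read-modify-write sequence whose interleavings lose updates:

    // Two threads hammer an unsynchronized counter. The lost updates that
    // result are exactly the kind of bug that passes lab tests and fails
    // in the field, since the outcome is timing-dependent.
    public class RaceDemo {
        static int count = 0;   // shared mutable state, no synchronization

        public static void main(String[] args) throws InterruptedException {
            Runnable work = new Runnable() {
                public void run() {
                    for (int i = 0; i < 1000000; i++) {
                        count++;   // three steps: read, add, write
                    }
                }
            };
            Thread a = new Thread(work);
            Thread b = new Thread(work);
            a.start(); b.start();
            a.join();  b.join();
            // Prints less than 2000000 on nearly every multicore run.
            System.out.println(count);
        }
    }

Performing the increment inside a synchronized block (or using java.util.concurrent.atomic.AtomicInteger) restores correctness, at the cost of exactly the serialization the performance discussion above warns about.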


Most companies, when confronted with the prospect of migrating their software base to address the needs of multicore hardware, first try an approach of manually managing native threads using thread pools and other techniques, what Leiserson calls Do It Yourself (DIY) multithreading. According to Mirman, this is the least fruitful approach for writing concurrent software because it takes too long, is too expensive, and generally produces less reliable software.

There are solutions to the problem of writing scalable and reliable concurrent software for multicore platforms that don't require a lot of retraining. Many of these are aimed specifically at the C++ developer arena, where performance tends to be of more immediate concern, and the challenges of concurrent software development are more acute. Referred to as "concurrency platforms" by Charles Leiserson in his article "The Case for a Concurrency Platform" and elsewhere, these solutions are either library-based solutions or minor language extensions. They all provide new abstractions for expressing the inherent parallelism in software, which have to be added by the programmer, but they solve the problem of dividing up the work to be done among the cores efficiently. In effect, this load-balances the work among the cores. All the programmer has to do is point out where the opportunities for parallelism exist.

Many of these solutions are based on the principle of "work stealing." Cilk Arts cofounders Matteo Frigo and Leiserson introduced work-stealing techniques in an award-winning paper, "The Implementation of the Cilk-5 Multithreaded Language." This research on work stealing formed the basis of the Cilk++ product, and influenced other projects such as Intel's Threading Building Blocks. We asked Mirman to tell us a bit about work stealing in Cilk++: The Cilk++ Runtime System (RTS) enables a Cilk++ program to dynamically and automatically exploit an arbitrary number of available processor cores. With sufficient parallelism and memory bandwidth, the RTS delivers near-perfect linear speed-up as the number of cores increases. Mirman went on to describe how Cilk++ uses a decentralized scheduling algorithm to efficiently distribute work.

Some of the other concurrency platforms aimed at the C++ audience migrating to multicore hardware are:

• Intel's Threading Building Blocks (TBB) offers a complete approach to expressing parallelism in a C++ program. It is a library that helps you take advantage of multicore processor performance without having to be a threading expert. Threading Building Blocks is not just a threads-replacement library. It represents a higher-level, task-based parallelism that abstracts platform details and threading mechanisms for performance and scalability.
• The OpenMP API supports multiplatform shared-memory parallel programming in C/C++ and Fortran. OpenMP is a portable, scalable model with a simple and flexible interface for developing parallel applications on platforms from the desktop to the supercomputer.
• The RapidMind Development Platform is a framework for expressing data-parallel computations from within C++ and executing them efficiently on multicore processors. (See "RapidMind: C++ Meets Multicore" by Stefanus Du Toit and Michael McCool.)
• The Task Parallel Library is a managed code library for conveniently expressing potential parallelism in existing sequential code, where the exposed parallel tasks will be run concurrently on all available processors.
• Cilk++ from Cilk Arts simplifies the task of parallelizing code. Specialized keywords are introduced into what would otherwise be a compilable C program. The keywords indicate the functions that can be parallelized and the work units that comprise those functions. The runtime system schedules the work units among the available processing elements, using a "work stealing" paradigm.

Achieving scalable performance in the face of hardware parallelism without sacrificing reliability or significantly increasing development costs is an issue of managing complexity. Concurrency platforms can alleviate this complexity by providing a level of abstraction for expressing parallelism that removes the burden of manual thread management. There are still going to be some changes needed to the software development pipeline for developing concurrent software for parallel hardware.

Here it is best to take the lead from the high-performance computing industry, which has been refining its processes for developing concurrent software over several decades. Here is how the different phases of software development are affected by concurrent software development:

• Design. Because the complexity of concurrent software is increased, careful design is more important than ever. Interactions between modules need to be reduced. Synchronization bottlenecks need to be identified and avoided as early as possible, to avoid spending developer time on them later.
• Development. Concurrency platforms are needed for programmers to express algorithmic parallelism, without worrying about the details of distributing work across the number of cores.
• Quality assurance is where the biggest investments and changes may need to be made for concurrent software. The QA team needs more time and resources to test a far greater number of hardware configurations. In addition to testing, new tools and processes need to be implemented for static and dynamic analysis, as well as for profiling and debugging.
• Support. Despite a company's best efforts, there are going to be more problems after deployment. Field support engineers can help QA by working closely with beta users.

So while there are steps we can take to manage the complexity of concurrent software in a parallel world, it is still going to take some investment in training and new tools. Careful investment in the correct tools and a reexamination of the software pipeline will go a long way toward mitigating these costs.

Acknowledgments

Thanks to Ilya Mirman and Cilk Arts for sharing data from their interviews, and to Kris Unger for providing feedback and suggestions on this article.

— Christopher Diggins is a freelance programmer and consultant. He can be contacted at [email protected].

Return to Table of Contents



D r. D o b b ’s D i g e s t

Bug Labs BUG Update

The BUG sounds off!

by Mike Riley

Regular readers may recall my article on the Bug Labs BUGbase and the selected modules that were available at that time. The BUG continues to evolve. Earlier this month, the latest R1.4 build was released, allowing BUG builders to interact with the new BUGsound module, among other features. With the release of the 1.4 build based on the Poky Linux distribution, along with version 1.2.8.1 of the BUG SDK and the Eclipse-based Dragonfly development environment, the Bug Labs folks are still in the modular system programming game in a serious way. Even with critics like me questioning the longevity and viability of the platform in the face of other, more functional Java-based mobile platforms like Google’s Android, there’s just something about the tinkering aspect of the BUG’s a la carte modular selection, coupled with the dedicated, enthusiastic hobbyist community surrounding the BUG, that makes it thrive. The BUG’s kernel and root filesystem have been completely revamped since my previous article.

Figure 1: BUGvonHippel.



The new system based on Poky delivers not only a much more comprehensive set of Linux command-line utilities and broader Linux module support, but even more sophisticated Linux-compiled applications, including full versions of Perl (v5.8), Python (v2.5.2), and Ruby (v1.8.6). Additional programs such as MySQL and even Gnome-based window applications can be installed via the apt-like ipk package installer. And for those using the BUGdisplay module, the new Gnome-based desktop shell with its simple application selector makes that module much more useful.

In addition to this significant operating system upgrade, Bug Labs has released two more modules: BUGvonHippel and BUGsound. The BUGvonHippel module, named after MIT professor Dr. Eric von Hippel (author of the book Democratizing Innovation), exemplifies the open platform nature of the BUGbase by providing an open interface for students, hobbyists, and embedded systems engineers to extend the BUGbase with their own electronic designs. This breakout board module includes a female USB 2.0 port that can be used to interface the BUG with keyboards, mice, storage devices, and anything else that is USB-based. Perhaps the most exciting aspect of the BUGvonHippel is the ability to plug in a USB-to-Ethernet adapter and finally connect the BUG to a direct network connection, rather than having to rely on the clunky USB-Ethernet bridge interface that required the BUG to be tethered to a network-connected computer to do anything really interesting.



Figure 2: Ethernet.

Figure 3: Ethernet panel.

Following the instructions that Bug Labs’ John Connolly shared with me, I was able to load a Pegasus Ethernet module into the Poky kernel, connect a TRENDnet USB-to-Ethernet adapter to the attached BUGvonHippel module, set a static IP in the /etc/network/interfaces configuration file, bring up the Ethernet interface, and untether for a standalone BUG experience.
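For reference, a static address of this sort goes into a short ifupdown-style stanza. The interface name and addresses below are illustrative placeholders, not the values from my network:

# /etc/network/interfaces (placeholder values)
auto eth0
iface eth0 inet static
    address 192.168.1.50
    netmask 255.255.255.0
    gateway 192.168.1.1

With a stanza like this in place, ifup eth0 brings the interface up on images that include the ifupdown scripts.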

This not only made the BUG instantly more useful as a free-standing computing platform, but also made BUG development via the updated Dragonfly SDK vastly easier. In fact, I could connect to, interrogate, and develop BUG applications from several of my networked computers with Eclipse configured with Dragonfly — very cool!

Taking it to the next level, I then successfully installed the latest stable release of the Django framework (recall that Poky is loaded with a complete install of Python 2.5.2), copied over a couple of simple Django web applications (with SQLite-based data stores) that I wrote, and they executed flawlessly. While I had hoped that I could control various aspects of the BUG’s hardware and attached modules, that unfortunately is not possible at this time. These suspicions were confirmed by my e-mail conversations with John, who wrote:

As it stands now, there is no Dragonfly SDK support for anything but Java. Ambitious users may be able to use our existing libraries (with some work and rewrites, which I’d be happy to support) to make it possible to control the module/Base hardware from outside of Java, but it is not supported out of the box. If you’re curious, we’re looking at porting JNA (Java Native Access) to Sun’s PhoneME vm/classpath, or using existing support with another Java VM/classpath implementation. In essence, this would make it much easier for users to use our native libraries to control their BUG from other languages.

Figure 4: Dragonfly.



Python, Ruby, and I’m sure Perl all have tools for using prebuilt C libraries natively. As of 1.4, the C/C++ shared libraries are crammed up with Java-centric glue, because we used JNI.

In the meantime, I have several ideas on how to circumvent this boundary, though not in an elegant manner. Essentially, write a small Java application that polls the intended BUGmodule for whatever values/data it collected, dump this at set intervals to a file, then read that file at set intervals from my Python script(s)/Django app for further manipulation and/or browser display. Not ideal, but certainly possible until the Bug Labs development team or a contributor provides some form of direct access.

Of course, while the BUGvonHippel module supplies the ever-versatile USB port, it also opens the BUG to custom electronic interfaces via its two rows of direct female wire connections to the module’s circuit board, consisting of power, DAC, ADC, GP I/O, Ground, I2C, I2S, and Serial connections. This open configuration could expose the BUG to countless custom electronics projects and device interface configurations for learning labs, hobbyists, and prototype projects. Bug Labs also has a video interview with Dr. von Hippel talking about the BUG board named after him, discussing why the BUGvonHippel board is such an insightful, open design.

The BUGsound audio module contains a 20-mm speaker and an omnidirectional microphone with built-in hardware stereo codecs, plus standard 3.5mm input, output, headphone, and microphone jacks along the side of the module. Additional technical details about this module can be found on the Bug Labs website. I had big hopes for this module, with the thought of being able to transform the BUG into an MP3 player and, courtesy of the new free-standing network capability provided by the BUGvonHippel module, a networked media player. Unfortunately, at this time this module’s (and quite honestly, most of the BUG modules’) documentation is extremely sparse.



Figure 5: BUGvonHippel USB.

Figure 6: BUGsound.

Bug Labs currently relies on the concept/prototype applications on its Community Applications site (directly accessible within the Dragonfly perspective in the Eclipse IDE), such as the AudioEventTester application, to demonstrate this and other modules, as well as to document the modules’ function calls via the raw source code provided in these projects. While uber-techies will be able to tease out the essentials from these minimal efforts, formal, comprehensive documentation for the entire BUG line is a requirement before the BUG can be comfortably adopted by a broader developer community. Certainly the online support and community forums are helpful, but nothing beats solid, up-to-date, and exhaustive documentation for deep hardware and software exploration. Like the BUGvonHippel, Bug Labs has a brief video about the BUGsound module that does little more than reiterate the module’s features but, like the BUGvonHippel video, does not actually show the module executing these features.

Where’s the WiFi?

As of this writing, there has yet to be a WiFi-based BUG module released. However, according to Bug Labs’ Mehrshad Mansouri in this BUG Community forum thread, “We’re working on a wi-fi enabled BUGbase and we plan to release it within the next few months, and we will be shipping the BUGwifi module within the next few weeks.”

More On the BUG

I enjoyed this piece on the BUG on Dr. Dobb’s. I’m a BUG enthusiast, but (as you can tell from my e-mail address) also a part-time consultant for Bug Labs. I go by finsprings on the forums and bugnet.

With regard to the Java-only APIs on the BUG, I wanted to see if you were aware of the RESTful APIs that are currently available for the BUG modules. These are still a bit limited, and I think they’re all read-only for now, but there’s no reason they can’t become more fully fledged (I think John was looking at supporting writes for the von Hippel, for example). Depending on what data you need access to, and how frequently, they could be a workable solution for incorporating data into your Django web apps. They’re also pretty quickly extendable in the BUG modules’ Java code, so if there’s something blatantly missing it could hopefully be added in relatively short order.

If you point your browser to http://bugip/service, it will give you a list of the services you can access, which will depend on the modules you have installed; each service has a Service XML tag with a name attribute. Then you can query the particular service using http://bugip/service/servicename and get the data back, typically in XML form. For example, BUGlocate will give you lat/lon position data back, BUGmotion will give you back accelerometer readings or the last time motion was detected, BUGcam will take a picture and return you the jpeg, and so on. A few examples from my BUG are at http://pastebin.com/m5aef9b1c so you can see what I mean.

I’m a Python fan, and I know some of the folks in the BL office are too, so if you get any cool stuff going with Django or plain ol’ command-line Python on the BUG, it would be great if you would share, in #buglabs on freenode or some other way :-)

Also, regarding BUGsound, http://www.youtube.com/watch?v=x9ON2kT5ct4 shows a video of a BUG app that I wrote that uses the accelerometer in the BUGmotion to select which audio samples to mix together to output via BUGsound. It gets pretty annoying pretty quickly (the guys who were at CES were cursing me :-)) but it does at least show BUGsound doing something; and hey, there’s cowbell! If you try Phunky, be aware that it takes a long time to start up; that’s because it has to cache all 9 samples up-front, and because they’re all 44.1kHz stereo, they’re a little on the large size even for as short a duration as they have.

Dave (finsprings) Findlay
[email protected]

Conclusion

BUG Release 1.4 and SDK 1.2.8.1, released in the middle of March, make huge strides of improvement and bring more functionality to the BUGlocate and BUGvonHippel modules compared to the version I wrote about last year. The most important addition is formal support for the BUGsound module via its new audio API. While I struggled to get anything more than a simple demo working with the BUGsound module, it has great potential, just like the rest of the BUG modules.

I hope to follow up with the Bug Labs folks in another six months or so to see how this rapidly evolving platform continues to innovate, especially with the expectation that built-in WiFi and Bluetooth hardware will be integrated into future models of the BUGbase, making the BUG platform an innovation engine for students, educators, hobbyists, tinkerers, and embedded software systems engineers.

— Mike Riley is a Dr. Dobb’s contributing editor. He can be contacted at [email protected].

Return to Table of Contents



D r. D o b b ’s D i g e s t

[ Conversations ]

Q&A: When Mobility and Open Source Collide

Dr. Dobb’s talks with the Symbian Foundation’s Lee Williams

by Jonathan Erickson

Symbian OS powers more than 200 million mobile devices, and as Executive Director of the Symbian Foundation, it’s one of Lee Williams’s jobs to take the operating system to open source.

Dr. Dobb’s: When it comes to mobile devices, open source has gained real traction.

Williams: Yes, open source and community-style software development is probably the single best way to create and leverage value in the mobile marketplace. The characteristics of this marketplace, one which is rapidly colliding with so many other adjacent technology and consumer markets, demand that companies find a way to leverage open-source concepts and models.

Dr. Dobb’s: Near Field Communication is coming on strong.

Williams: NFC chips let devices be used as a mobile wallet or boarding pass for planes. These types of products already exist in Japan, so it’s only a matter of time before we see them appear in other parts of the world.

Dr. Dobb’s: What emerging mobile technologies should we be keeping our eye on?

Williams: 3G bandwidth and coverage continues to grow and become affordable and ubiquitous. LTE (Long-Term Evolution) is one technology to watch as it establishes itself as a key standard for delivering even higher bandwidths of data. Smartphones are getting advanced features more quickly than battery technology is advancing; we need to find new ways to maximize battery life. SMP (Symmetric Multi-Processing) does this by powering up individual processors only when they are required and powering them down when the device is idle or executing less performance-critical tasks. Advanced graphics and transaction technologies are also worth watching.


Dr. Dobb’s: “Converged mobile devices.” What are they and what unique challenges do they pose for software developers?

Williams: A converged mobile device is a product that integrates different consumer electronic product functions, such as a digital camera, digital music player, video media player, digital video recorder, messaging device, Internet appliance, and handheld GPS, into one multifaceted unit. With so many different types of capabilities, or even a few of them, in one product, the potential for creating unique features that a consumer will value is extreme. However, with this comes a number of challenges.

One key challenge is to focus on good design. It is better to make sure that product designers, software developers, and content and service providers focus on keeping things simple, and doing a few things well, versus simply exposing a myriad of functions and requiring a consumer to constantly learn and relearn the capabilities of a converged product.

Another challenge has to do with remembering interoperability demands. Converged products often interact with the world, and with other products, in a way that only a person or another converged product can. It’s not enough to provide a new software application for this type of product and force a user to go find it every day; a developer needs to focus on the whole experience. The web browser is a good example: on a PC it may make sense to ask a user to find, click, type, and browse the Web or look for a service. In a mobile, converged product, you need to help the user be present with the service even, or especially, when they are driving or have the product in a pocket or handbag, and requiring them to constantly select ‘yes’ or to type in forms are real headaches for a consumer.


Perhaps one of the biggest challenges faced by software developers is their route to consumers and ultimately revenue. Many developers have created good applications, but getting them out to a global marketplace is not easy. It can take months of negotiations with major companies to even get a chance to be exposed to a consumer. The Symbian Foundation platform already provides developers with access to a global market, with about a quarter of a billion devices shipped worldwide and support for 50 different languages and many different product form factors. We are focused on helping developers get their applications to these users by giving them a one-stop drop-in to an applications inventory or electronic warehouse. We will not provide a store front, but will help the community create multiple online stores from which they can generate revenue for themselves and the developer. These stores will end up being highly differentiated, so as to appeal to the broadest range of consumer tastes and interests.

Dr. Dobb’s: Power, security, user interfaces. Can you rank — and discuss — these in terms of the challenges for software developers?

Williams: UI, security, power. Consumers first of all want a great experience with their mobile product, and an easy-to-use and attractive UI can help deliver this. This does not necessarily mean the UI has to be touch-enabled or that a UI sits in software alone. Also, display size is becoming increasingly important. The trend is toward both large and small displays and form factors. For example, business executives who send lots of e-mails are rarely content using the virtual keyboards offered by large touch-screen display devices, opting instead to use a product with a physical keyboard. At the other end of the spectrum, you see hordes of teenagers in the U.S., Europe, and Asia happily texting one-handed, using predictive text. UIs must therefore be discussed in the context of a product’s form factor. A mass-market mobile OS needs to be able to support several different form factors, display sizes, and input methods to cater to individual tastes. Symbian OS enables this type of differentiation, which you can see throughout our community’s product portfolios.

Security is the next most important concern, as consumers carry a lot of personal information on these products and interact with personal content and services. Also, to be always connected, you will seamlessly roam between many different types of operator networks. To ensure the highest level of support and quality of service, these operators require some form of control over their network environment. Good and flexible security becomes an imperative for using this type of product on the world’s networks, and with the Internet’s plethora of services.

Power comes last but not least. You cannot diminish the importance of efficient power utilization to most consumers. I would argue one of the fastest ways that a mobile product, no matter how capable, falls out of favor with a consumer is if it cannot last days between charges. We also need to consider the fact that mobile product features and needs are becoming computationally intensive at a faster rate than battery technology is advancing and becoming more capable.

So what challenges do each of these areas present to developers? With UIs, developers need to remind themselves to ‘keep things simple and useful’. Focus on the whole experience, meaning you need to be inclusive of display sizes, input methods, and form factors when you design and develop your applications and services. This will ensure users have a positive experience interacting with their application, while making it easy to use the application again and again. When it comes to security, developers need to pay special attention to how they design their applications. Leverage the certificate signing and authorization models. They do benefit consumers, even if in indirect ways. Different mobile platforms offer different security models, and not knowing how best to design applications with these will lead to reduced confidence in their products from handset manufacturers, operators, and ultimately consumers. It’s primarily up to the mobile platform to manage power effectively, not the developer. For example, Symbian OS was built up from its Psion heritage, where two AA batteries would power a personal organizer for over two months. Bringing this technology to mobile phones means Symbian OS is arguably the most power-efficient mobile OS on the market. That said, developers should make safe calls when developing applications, not look to avoid a public API, and focus on not having processes running when not being accessed or used.

Dr. Dobb’s: Location-based applications and games. Is this the future of mobile devices?

Williams: Location-based applications are certainly a key part of the future of mobile devices. Games and different forms of entertainment and content access will also be more prevalent. I envision more or less an explosion of social networking concepts and services running on mobile products. Gone is the day of going back to a PC for an online social experience. What you can do with location-aware applications and content, when combined with connectivity and the rich amount of data and people present on the Internet, is truly incredible.


This has the potential to forever change the way people interact with the world and others in their lives.

Another area where we will see growth is in mobile payment concepts. NFC (Near Field Communication) chips in devices allow them to be used as a mobile wallet or boarding pass for planes. These types of products already exist in Japan, and our member companies have already had years of experience creating these types of solutions, so it’s only a matter of time before we see them appear in other parts of the world. The great thing about the foundation membership base and our increasingly open platform offering is that the future of mobile products and services is in the hands of developers, so anything is possible.

Dr. Dobb’s: Cross-platform applications in the mobile space. Will we ever see this?

Williams: We already have them on the Symbian OS offering. Open C, Java, .NET CF, Flash, and WebKit or ECMA script-based applications can cross over and are portable across several mobile platforms. These technologies provide investment protection and reach for developers and service providers. We are enhancing these offerings by continuing to integrate the latest versions of these runtimes, and are expanding this functionality with Qt libraries, Adobe AIR technology, and others. There are also a number of companies that have created emulators or specific runtimes that allow more applications to cross platforms. A company called StyleTap has a Palm emulator that allows you to run thousands of Palm applications on Symbian products, and a company called Red Five Labs has a runtime for Symbian OS that ensures Microsoft .NET applications can be fully supported.

Dr. Dobb’s: There are currently more mobile phone users than Internet users. Where is the opportunity?

Williams: With over 4 billion mobile phone users worldwide, it makes for a pretty attractive commercial opportunity for companies that want to increase their market potential. The footprint is growing and now driving the number of Internet users, as many people around the world are not buying and cannot afford a PC. Instead, they are opting for a mobile product to put them online. There is a challenge in that the mobile marketplace requires that you be able to participate with a number of different types of companies and suppliers. This includes silicon suppliers, operators, device manufacturers, and application, content, and service providers. The Symbian Foundation is helping by ensuring we lower the barrier to entry for software developers. We are a coordination point and hub of access to all of these different companies: a single source for information, dialog, and business development with a large collection of these companies. That presents a unique opportunity for developers to get to this marketplace and to sustain investments in the development of their applications.

Return to Table of Contents




D r. D o b b ’s D i g e s t

[ Book Review ]

Hello Android: Introducing Google’s Mobile Development Platform

Reviewed by Mike Riley

I recently obtained a developer’s version of the G1 mobile device running the new Android OS for an article entitled “The Android Developer Experience” that I wrote for Dr. Dobb’s. The Pragmatic Bookshelf happened to publish its first book on Google’s new mobile platform around that same time, complementing the Android SDK documentation and the various websites and discussion groups on the subject.

Hello, Android: Introducing Google’s Mobile Development Platform, by Ed Burnette, checks in at just over 200 pages, but every page is packed with practical instruction covering most of the features in the Android OS. Burnette walks you through the construction of a Sudoku game complete with properly oriented screens, sound effects, persistent preferences, and 2D graphics. By Chapter 6, readers following along will have a game better than most of the Sudoku attempts available for download from the Android Market. The added bonus, of course, is that a full understanding of the code behind the screens brings readers that much closer to the comfort level with Android needed to competently write their own applications.

The final part of the book consists of four chapters covering more advanced Android functionality, including using web services; acquiring GPS, accelerometer, light, and magnetic sensor data; leveraging SQLite data binding; and programming 3D graphics using Android’s OpenGL libraries. The appendices discuss the Java language subset and, more importantly, which Java libraries are not supported on the current Android OS platform.

Hello Android: Introducing Google’s Mobile Development Platform
Ed Burnette
Pragmatic Bookshelf, 2009
200 pp., $32.95
ISBN: 978-1-934356-17-3

There’s also a single-page bibliography listing four recommended titles (two Java and two SQL books) for further reading.

Before reading this book, I had taught myself some Android programming for my Dr. Dobb’s article sharing my development experience. I relied predominantly on the Android SDK documentation, several Android-specific developer websites (especially anddev.org), and a lot of trial and error. My job would have been much easier had I read this book first to teach me about coding for this operating system. Even so, the book helped answer some questions I had and showed me how to use features that I hadn’t yet employed in my own programs. After discovering how easy calling upon the digital compass or SQLite functionality was, mash-up ideas quickly flowed, with possibilities of new, useful data capture and visualization applications that would be impossible to implement on other mobile platforms lacking such hardware and software integration.

Overall, I strongly recommend that anyone interested in developing for or taking a closer look at the Android platform obtain this book. Hello, Android can be digested in a dedicated weekend of reading and building the code along with the author, and will continue to serve as a useful quick reference until an exhaustive reference on the Android OS makes its way to a printed book format.

Return to Table of Contents




D r. D o b b ’s D i g e s t

[ Effective Concurrency ]

Use Thread Pools Correctly: Keep Tasks Short and Nonblocking

How many threads must a thread pool pool? For a thread pool must pool threads.

by Herb Sutter

What are thread pools for, and how can you use them effectively? As shown in Figure 1, thread pools are about letting the programmer express lots of pieces of independent work as a “sea” of tasks to be executed, and then running that work on a set of threads whose number is automatically chosen to spread the work across the hardware parallelism available on the machine (typically, the number of hardware cores [1]). Conceptually, this lets us execute the tasks correctly one at a time on a single-core machine, execute them faster by running four at a time on a four-core machine, and so on.

Besides scalable tasks, one other good candidate for work to run on a thread pool is the small “one-shot” thread. This is work that we might ordinarily express as a separate thread, but that is so short that the overhead of creating a thread is comparable to the work itself. Instead of creating a brand-new thread and quickly throwing it away again, we can avoid the thread creation overhead by running the work on a thread pool, in effect playing “rent-a-thread” to reuse an existing pool thread instead. (See [2] for more about using threads correctly, including running small threads as pool work items.)

But the thread pool is a leaky abstraction. That is, the pool hides a lot of details from us, but to use it effectively we do need to be aware of some things a pool does under the covers so that we can avoid inadvertently hitting performance and correctness pitfalls. Here’s the summary up front:

• Tasks should be small, but not too small; otherwise, performance overheads will dominate.
• Tasks should avoid blocking (waiting idly for other events, including inbound messages or contested locks); otherwise, the pool won’t consistently utilize the hardware well—and, in the extreme worst case, the pool could even deadlock.



Let’s see why.

Tasks Should Be Small, but Not Too Small

Thread pool tasks should be as small as possible, but no smaller. One reason to prefer making tasks short is that short tasks can spread more evenly and thus use hardware resources well. In Figure 1, notice that we keep the full machine busy until we start to run out of work, and then we have a ragged ending as some threads complete their work sooner and sit idle while others continue working for a time. The larger the tasks, the more unwieldy the pool’s workload is, and the harder it will be to spread the work evenly across the machine all the time.

On the other hand, tasks shouldn’t be too short, because there is a real cost to executing work as a thread pool task. Consider this code:

// Example 1: Running work on a thread pool.
pool.run( [=] { SomeWork(); } );

By definition, SomeWork must be queued up in the pool and then run on a different thread than the original thread. This means we necessarily incur queuing overhead plus a context switch just to move the work to the pool. If we need to communicate an answer back to the original thread, such as through a message or Future or similar, we will incur another context switch for that.

Clearly, we aren’t going to want to ship int result = int1 + int2; over to a thread pool as a distinct task, even if it could run independently of other work. It’s just like the sign you see in a theme park at the entrance to the roller coaster: “You must be at least this big to go on this ride.”


So although we like to keep thread pool tasks small, a task should still be big enough to be worth the overhead of executing it on the pool. Measure the overhead of shipping an empty task on your particular thread pool implementation, and as a rule of thumb, aim to make the work you actually ship an order of magnitude larger.
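To get a rough number, you can time round trips of empty tasks. The sketch below uses C++11’s std::async as a stand-in for the pool’s submit call, since pool.run() here is left abstract; note that std::launch::async typically creates a fresh thread per task, so this measures the expensive do-it-yourself case, and substituting a real pool’s submit call should show a much lower (but still nonzero) cost.

#include <chrono>
#include <cstdio>
#include <future>

int main()
{
    using namespace std::chrono;
    const int kTasks = 10000;

    auto start = steady_clock::now();
    for (int i = 0; i < kTasks; ++i) {
        // Ship an empty task and wait for completion; replace std::async
        // with your pool's submit call to measure the pool itself.
        std::async(std::launch::async, [] {}).wait();
    }
    auto us = duration_cast<microseconds>(steady_clock::now() - start);

    std::printf("average round trip per empty task: %.2f microseconds\n",
                (double)us.count() / kTasks);
    return 0;
}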

Tasks Should Avoid Waiting Next, let’s consider a key implementation question: How many threads should a thread pool contain? The general idea is that we want to “rightsize” the thread pool to perfectly fit the hardware, so that we provide exactly as much work as the hardware can actually physically execute simultaneously at any

given time. If we provide too little work, the machine is “undersubcribed” because we’re not fully utilizing the hardware; cores are going to waste. If we provide too much work, however, the machine becomes “oversubscribed” and we end up incurring contention for CPU and memory resources, for example in the form of needless extra context switching and cache eviction. [3] So our first answer might be: Answer 1 (flawed): “A thread pool should have one pool thread for each hardware core.” That’s a good first cut, and it’s on the right track. Unfortunately, it doesn’t take into account that not all threads are always ready to run. At any given time, some

Figure 1: Thread pools are about taking a sea of work items and spreading them across available parallel hardware

Figure 2: Blocking inside a task idles a pool thread

DR. DOBB’S DIGEST

37

threads may be temporarily blocked because they are waiting for I/O, locks, messages, events, or other things to happen, and so they don’t contribute to the computational workload. Figure 2 illustrates why tasks that block don’t play nice with thread pools, because they idle their pool thread. While the task is blocked, the pool thread has nothing to do and probably won’t even be scheduled by the operating system. The result is that, during the time the task is blocked, we are actually providing less parallel work than the hardware could run; the machine is undersubscribed. Once the task resumes, the full load is restored, but we’ve wasted the opportunity to get more work done on otherwise-idle hardware. At this point, someone is bound to ask the natural question: “But couldn’t the pool just reuse the idled thread that’s just sitting blocked anyway, and use it to run another task in the meantime?” The answer is “no,” that’s unsafe in general. A pool thread must run only one task at a time, and must run it to completion. Consider: The thread pool cannot reclaim a blocked thread and use it to run another task, thus interleaving tasks, because it cannot know whether the two tasks will interact badly. For example, the pool cannot know whether the blocked task was using any thread-specific state, such as using thread-local storage or currently holding a lock. It could be disastrous if the interleaved task suddenly found itself running under a lock held by the original task; that would be a fancy way to inject potential deadlocks by creating new lock acquisition orders the programmer never knew about. Similarly, the original task could easily break if it returned from a blocking call only to discover unexpected changes were made to its thread-local state made by an interloping interleaver. Some languages and platforms do provide special coroutine-like facilities (e.g., POSIX swapcontext, Windows fibers) that can let the programmer write tasks that intentionally interleave on the same thread, but these should be thought of as “manual scheduling” and “having multiple stacks per thread” rather than just one stack per thread, and the programmer has to agree up front to participate and has to opt into a restrictive April 2009


Some languages and platforms do provide special coroutine-like facilities (e.g., POSIX swapcontext, Windows fibers) that can let the programmer write tasks that intentionally interleave on the same thread, but these should be thought of as “manual scheduling” and “having multiple stacks per thread” rather than just one stack per thread, and the programmer has to agree up front to participate and has to opt into a restrictive programming model to do it safely. [4,5] A general-purpose pool can’t safely just interleave arbitrary tasks on the same thread without the programmer’s active participation and consent.

So we actually want to match, not just any threads, but specifically threads that are ready to run, with the hardware on this machine that can run them. Therefore a better answer is:

Answer 2 (better): “A thread pool should have one ready pool thread for each hardware core.”

Figure 3 illustrates how a thread pool can deal with tasks that block and idle their pool thread. First, a thread pool has to be able to detect when one of its threads goes idle even though it still has a task assigned; this can require integration with the operating system.

Second, once idling has been detected, the pool has to create or activate an additional pool thread to take the place of the blocked one, and start assigning work to it. From the time the original task blocks and idles its thread to the time the new thread is active and begins executing work, the system has more cores than work that is ready to run, and is undersubscribed.

But what happens when the original task unblocks and resumes? Then we enter a period where the system is oversubscribed, with more work than there are cores to run the work. This is undesirable because the operating system will schedule some tasks on the same core, and those tasks will contend against each other for CPU and cache. Now we do the dance in reverse: The thread pool has to be able to detect when it has more ready threads than cores, and retire one of the threads as soon as there’s an opportunity. Once that has been done, the pool is “rightsized” again to match the hardware. The more often tasks block, the more they interfere with the thread pool’s ability to match ready work with available hardware that can execute it.

Figure 3: Thread pools can adapt by adding threads


Tasks Should Avoid Waiting For Each Other

The worst kind of blocking is when tasks block to wait on other tasks in the pool. To see why, consider the following code:

// Example 2: Launching pathologically interdependent tasks.
//
// First, launch N tasks into the thread pool. Each task
// performs some work while blocking twice in the middle.
for( i = 0; i < N; ++i ) {
  pool.run( [=] {
    // … do some work …
    phase1[i].wait();       // A: wait point #1
    // … and more work …
    phase1[i+1].signal();   // B
    // … and more here …
    phase2[i].wait();       // C: wait point #2
    // … and still more …
    phase2[i+1].signal();   // D
  } );
}

// back on the parent thread…
phase1[0].signal();  // release phase 1
phase1[N].wait();    // wait for phase 1 to complete

// E: what is the state of the system at this point?

phase2[0].signal();  // release phase 2
phase2[N].wait();    // wait for phase 2 to complete

This code launches N tasks into the pool. Each task performs work, much of which can execute in parallel. For example, there is no reason why the first “do some work” section of all N tasks couldn’t run at the same time, because those pieces of work are independent and not subject to any mutual synchronization.

There is some mutual synchronization, though. Each task has two points where it waits for the previous task—in this toy example it’s just a simple wait, but in real code this can happen when the task needs to wait for an intermediate result, for a synchronization barrier between processing stages, or for some other purpose. Here, there are two phases of processing. In each phase, each ith task waits to be signaled, does more work, then signals the following i+1th task. Execution proceeds as follows: After launching the tasks, the parent thread kicks them past the first wait simply by signaling the first task. The first task can then proceed to do more work, and signal the next task, and so on until the last task signals the parent that phase 1 is now complete. The parent then signals the first task to wake it up from its second wait, and the phase 2 wakeups proceed similarly. Finally, all tasks are complete and the parent is notified of the “join” and continues onward.

Now stop for a moment and consider these questions:

• What is the state of the system when we reach line E?
• How many threads must the thread pool have to execute this program correctly? What happens if it doesn’t have enough threads?

Let’s consider the answers. First, the state of the system at line E is that we have N tasks, each of which is partway through its execution. All N tasks must have started, because phase1[N] has been signaled by the last task, which can only happen if the previous task signaled phase1[N-1], and so on. Therefore, each of the N tasks must have performed its line B, and is either still performing the processing between B and C, or else is waiting at line C. (No task can be past C because the parent thread hasn’t kicked off the phase 2 signal cascade yet.)




So, then, how many threads must the thread pool have to execute this program? The answer is that it must have at least N threads, because we know there is a point (line E) at which every one of the N tasks must have started running, and therefore each must be assigned to its own pool thread.

And there’s the rub: If the thread pool has fewer than N threads and cannot create any more, typically because a maximum size has been reached, the program will deadlock because it cannot make progress unless all tasks are running. Just make N sufficiently large, and you can deadlock on most production thread pools. For example, .NET’s ThreadPool.GetMaxThreads returns the maximum number of threads available; the current default is 250 threads per core, so to inject deadlock on a default-configured pool, let N be 250 x #cores + 1. Similarly, Java’s ThreadPoolExecutor lets you set the maximum number of pool threads via a constructor parameter or setMaximumPoolSize; the pool may also fail to grow if the active ThreadFactory fails to create a new thread by returning null from newThread.

At this point you may legitimately wonder: “But isn’t Example 2 pathological in the first place? It even says so in the comment!” Actually, Example 2 is fairly typical of many kinds of concurrent algorithms that can proceed in parallel much of the time but occasionally need synchronization points as barriers to exchange intermediate results or enter a new stage of processing. The only thing that’s pathological about Example 2 is that the program is trying to run each work item as a thread pool task; instead, each work item should run on its own thread, and everything would be fine.
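Here is a minimal sketch of that alternative, written with C++11’s std::thread for concreteness. The work bodies are elided; the point is only that each of the N interdependent items owns a real thread, so all of them can block without starving one another:

#include <thread>
#include <vector>

void run_interdependent_work(int n)
{
    std::vector<std::thread> workers;
    workers.reserve(n);
    for (int i = 0; i < n; ++i) {
        workers.emplace_back([i] {
            (void)i; // index of this work item
            // … do work, including blocking waits on neighboring items …
        });
    }
    for (auto& t : workers)
        t.join(); // safe: every item has its own thread, so all can make progress
}

int main()
{
    run_interdependent_work(4);
    return 0;
}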

Summary

Thread pool tasks should be as small as possible, but no smaller. The shorter the tasks, the more evenly they will spread across the pool and the machine, but the more the per-task overhead will start to dominate.

Thread pool tasks should still be big enough to be worth the round-trip overhead of shipping them off to a pool thread to execute and shipping the result back, without having the overhead dominate the work itself.

Thread pool tasks should also avoid blocking, or waiting for other things to happen, because blocking interferes with the ability of the pool to keep the right number of ready threads available to match the number of cores. Tasks should especially avoid blocking to wait for other tasks in the pool, because when that happens it can lead to worse performance penalties—and, in the extreme worst case, having a large number of tasks that block waiting for each other can deadlock the entire pool, and with it your application.

Notes

[1] I’ll use the term “cores” for convenience to denote the hardware parallelism. Some CPUs have more than one hardware thread per core, and so the actual amount of hardware parallelism available = #processors x #cores/processor x #hardware threads/core. But “total number of hardware threads” is a mouthful, not to mention potentially confusing with software threads, so for now I’ll stick with talking about the “total number of cores.”

[2] H. Sutter. “Use Threads Correctly = Isolation + Asynchronous Messages” (Dr. Dobb’s Digest, April 2009). Available online at http://www.ddj.com/go-parallel/article/showArticle.jhtml?articleID=215900465.

[3] H. Sutter. “Sharing Is the Root of All Contention” (Dr. Dobb’s Digest, March 2009). Available online at http://www.ddj.com/go-parallel/article/showArticle.jhtml?articleID=214100002.

[4] Single UNIX Specification (IEEE Standard 1003.1, 2004). Available online at http://www.opengroup.org/onlinepubs/009695399/functions/makecontext.html.

[5] Windows Fibers (MSDN). Available online at http://msdn.microsoft.com/en-us/library/ms682661(VS.85).aspx.

—Herb Sutter is a bestselling author and consultant on software development topics, and a software architect at Microsoft. He can be contacted at www.gotw.ca.

Return to Table of Contents



