Demand Paging on Symbian by Jane Sales
Reviewed by John Beattie, Dan Handley, Jenny Keates, Jason Parker and Jo Stichbury
Editor and Typesetter: Satu McNabb
Symbian Foundation, 1 Boundary Row, London SE1 8HP, England
Visit our Home Page at developer.symbian.org
Our email address:
[email protected] This work is licensed under the Creative Commons Attribution-Share Alike 2.0 UK: England & Wales License. To view a copy of this license, visit creativecommons.org/licenses/by-sa/2.0/uk or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.
Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. ISBN: 978-1-907253-00-3
Typeset in 11 pt Arial by Symbian Foundation
Table of Contents

About the Author
Acknowledgements
Chapter 1: Introduction
1.1 Introduction to Demand Paging
Chapter 2: The Advantages and Disadvantages of Demand Paging
2.1 Benefits of Demand Paging
2.2 Costs of Demand Paging
Chapter 3: Understanding Demand Paging on Symbian
3.1 ROMs
3.2 NAND Flash
3.3 The Composite File System
3.4 XIP ROM Paging
3.5 Code Paging
3.6 File System Caching
3.7 Writeable Data Paging
3.8 The Paging Algorithm
3.9 The Live Page List
3.10 XIP ROM Paging: Paging In
3.11 XIP ROM Paging: Paging Out
3.12 The Paging Configuration
3.13 Unpaged Files
3.14 Paging Cache Sizes
3.15 Effective RAM Saving
3.16 Byte-Pair Compression
Chapter 4: Under the Hood: The Implementation of Demand Paging
4.1 Kernel Implementation
4.2 Media Driver Support
4.3 File Server Changes
Chapter 5: Enabling Demand Paging on a New Platform
5.1 Choosing Which Type of Demand Paging to Implement
5.2 Migrating Device Drivers to a Demand-Paged System
5.3 Guidelines for Migrating Device Drivers
5.4 Media Driver Migration
5.5 Implementing File Clamping
Chapter 6: Component Evaluation for Demand Paging
6.2 Dynamic Analysis
6.3 Identifying Demand-Paging Problems and Mitigation Techniques
6.4 Symbian’s Pageability Categories
Chapter 7: Configuring Demand Paging on a Device
7.1 Building a Basic Demand-Paged XIP ROM
7.2 Building a Basic Code-Paged ROM
7.3 Fine-Grained Configuration
7.4 Optimizing the Configuration
7.5 Other Demand-Paged ROM Building Features
7.6 Using the Symbian Reference Configurations
Chapter 8: Testing and Debugging in a Demand-Paged Environment
8.1 Tracing and Debugging with Demand Paging
8.2 Testing
Chapter 9: In Conclusion
About the Author

Jane Sales joined Psion in 1995 to lead the team developing a new operating system, EPOC32, for Psion’s as-yet-unreleased Series 5. A few months later, EPOC32 first booted in Jane’s spare bedroom in Cookham, Berkshire. Jane notes with pleasure that devices based on its descendant are rather more widespread today.
Jane left Symbian in 2003 to move to the south of France with her husband. She wasn’t allowed to escape completely though – under the olive trees in her garden she wrote her first book, Symbian OS Internals, which was published by Wiley in 2005. Soon afterwards Jane moved to Ukraine, where she set up a mobile strategy consultancy, working with Symbian’s Research and Strategy Groups, among others.
In 2008, Jane co-founded a company named Ambient Industries, which is developing an original way for people to discover the world around them on their iPhones. (Jane very much hopes that she has still not escaped from Symbian completely, and that her company will port its product to the Symbian platform in the near future.) Two perspicacious British investors funded Ambient Industries in 2009, and Jane now divides her time between Cambridge (the first one) and San Francisco (the famous one).
Acknowledgements

Jane would like to thank Jo Stichbury and Satu McNabb for their calm, organized management of this project – an approach that makes making books fun. Jane would also like to thank John Beattie for his painstaking review, which exposed many of her silly mistakes. Nevertheless, any errors that remain in the book you are reading are Jane’s alone.
Symbian would like to thank Jane for her professional and yet relaxed approach to this project – this book was created, apparently, effortlessly. We would also like to thank everyone involved in this project: the reviewers and technical experts, and copyeditor Jenny Keates for her skilled edit.
1 Introduction

This text supplements my earlier book, Symbian OS Internals,1 and provides a comprehensive and highly detailed insight into the workings of demand paging on Symbian. This text will be invaluable for people who are:
• Creating a new, demand-paged device. The text gives clear instructions on how to implement demand paging for the first time and on the trade-offs that affect performance and ROM size.
• Writing device drivers for a demand-paged environment (or porting them).
• Wanting to understand demand paging on Symbian. This detailed commentary on the internals of demand paging will serve a range of readers: students studying real-time operating systems; middleware programmers understanding the behavior of underlying systems; and systems engineers comparing Symbian with other similar operating systems.
Readers of this book should have a basic understanding of the Symbian kernel architecture, including the process model, threads and memory management. If you are not familiar with this material, I suggest reading Chapters 1 (Introducing EKA2), 3 (Threads, Processes and Libraries) and 7 (Memory Models) of my book.1 To understand code-paging in depth, you should also consider reading Chapters 9 (The File Server) and 10 (The Loader). If your interest lies in device drivers, then Chapter 12 (Device Drivers and Extensions) provides a good grounding in Symbian’s device driver architecture.
1 Symbian OS Internals: Real Time Kernel Programming, Jane Sales et al., 2005, John Wiley & Sons. The entire text of the book can also be found online at developer.symbian.org/wiki/index.php/Symbian_OS_Internals.
1.1 Introduction to Demand Paging

As Symbian-based devices become increasingly feature-rich, the demand on system resources increases. One important commodity is physical RAM, which contributes significantly to the bill of materials (BOM) of a device. Increasing RAM from 64 MB to 128 MB is costly not only in financial terms (around $3.50 at the time of writing) but also in terms of additional power consumption and reduced battery life.
Demand paging is a feature of virtual memory systems that makes it possible for pages of RAM to be loaded from permanent storage when they are needed – that is, on demand. When the contents are no longer required, the RAM used to store them may be reused for other content. In this way, the total physical RAM required to store content is less than if all the content were permanently available, and hence the BOM of the device can be reduced.
The most important mechanism used in demand paging is the page fault. (Memory is managed in units known as pages, which are blocks of 4 KB on all current architectures.) When an application attempts to access a location that is not present in memory, a page fault occurs. The kernel handles this page fault, reading the missing page from disk into RAM and then restarting the faulting application. All memory that is managed by the kernel in this way is said to be pageable memory, and the process is controlled by an entity known as the paging system. A good source of more information on basic operating system concepts is Andrew Tanenbaum’s Operating Systems: Design and Implementation.2 With an unbecoming lack of modesty, I’ll also suggest my own book, Symbian OS Internals,1 if you’re more interested in the specifics of the Symbian kernel.
The Symbian implementation of demand paging has been a huge success. Not only is there considerably more free RAM in the device, but also the ROM boots more quickly, applications start up faster and the stability of the device has increased.
So successful was demand paging that it has been back-ported two OS generations, to devices that have already been released into the market.
2 Operating Systems: Design and Implementation (Second Edition). Andrew Tanenbaum, 1997, Prentice-Hall.
2 The Advantages and Disadvantages of Demand Paging

2.1 Benefits of Demand Paging

Demand paging can provide a considerable reduction in the RAM usage of the operating system. The actual size of the RAM saving depends considerably on the way demand paging is configured, as you will see. However, it is safe to say that on a feature-rich user-interface (UI) platform it is possible to make multi-megabyte RAM savings without a significant reduction in performance. Internal tests show an increase in free RAM at boot from 18.5 MB to 32 MB – a 73% increase. Figures after launching a couple of applications (Web and GPS) are even more impressive. Free RAM increases from 9 MB to 26.5 MB in the demand-paged ROM – a 194% increase.
At least two other benefits may be observed. Note that these are highly dependent on the paging configuration. See Chapter 7 for more details.
2.1.1 Improved Application Start-Up Times Due to Lazy Loading
Usually the cost of servicing a page fault means that paging has a negative impact on performance. But sometimes, on composite file system ROMs (described in Section 3.3), demand paging improves performance, especially when the use case normally involves loading a large amount of code into RAM (for example, when booting or starting large applications). In this case, the performance overhead of paging can be outweighed by the performance gain of loading less code into RAM. This is sometimes known as ‘lazy loading’ of code. For example, the loading time of the Web application decreased from 2.35 seconds to 1.33 seconds – a performance boost of 44%.
2.1.2 Improved System Boot Time
When the non-demand-paged case consists of a large core image, most or all of the code involved in a use case will already be permanently loaded, and so there will not be a reduction in application start-up times due to lazy loading. But there is always a big win when booting, where the loading of the whole core image from NAND flash into RAM in the non-demand-paged ROM is a major contributor to the overall boot time. During Symbian tests, typical boot time of a production ROM reduced from 35 seconds to 22 seconds – a performance boost of 37%.
2.1.3 Improved Stability When Out of Memory
A device is often at its least stable when it is out of memory (OOM). Poorly written code may not cope well with exceptions caused by failed memory allocations. As a minimum, an OOM situation will degrade the user experience. If demand paging is enabled on a device, the increased RAM available to applications makes it less likely that the device will run out of memory, thus avoiding many potential stability issues. Furthermore, at any particular time, the RAM saving achieved by demand paging is proportional to the amount of code loaded in the non-demand-paged case. For instance, the RAM saving when five applications are running is greater than the saving immediately after boot. This makes it even harder to induce an OOM situation. This increased stability only applies when the entire device is OOM. Individual threads may still have OOM problems due to reaching their own heap limits, and demand paging will not help in these situations.
2.2 Costs of Demand Paging

2.2.1 Performance
One of the expected downsides to demand paging is a reduction in performance. Demand paging can add an execution overhead to any process that takes a paging fault. Also, paging faults block a thread for a significant and unpredictable time while the fault is serviced, so any thread that can take a paging fault is not suitable for truly real-time tasks. The delay will be in the order of 1 ms if the backing store (the media used to store pages that are not in main memory) is not being used for other file system activity. However, in busy systems, the delay for a single fault could be hundreds of milliseconds or more, and should be treated as unbounded for the purpose of any real-time analysis. Note that a paging fault can occur for each separate 4 KB page of memory accessed, so an eight-byte object straddling two pages could cause two paging faults.
Threads of lower priority than a pageable thread cannot be guaranteed to be real-time either, because they will never run while the higher-priority thread is ready to run. In addition, the servicing of a paging fault runs at the thread priority of the NAND media driver (KNandThreadPriority = 24) – which means that a faulting thread of higher priority than this will effectively have its priority reduced.
However, I would be giving an overly pessimistic impression if I did not mention the fact that the paging system as a whole acts as a most recently used cache – pages that have been accessed recently will be present in memory. Since recently accessed pages are more likely to be accessed again, paging faults are relatively rare in practice.
To support real-time code, Symbian provides a means to lock the memory in a process (‘wire’ it) so that it is always present and not demand paged. Device drivers must be modified to either access user memory in the context of the client thread (preferable) or use a dedicated DFC thread (so that a paging fault in one driver doesn’t affect another). Servers that provide real-time services must have their process memory wired and be multi-threaded, so that real-time clients are serviced in a separate thread, or they must be changed to reject clients that aren’t wired. Of course, truly real-time applications must also have their process memory wired.
2.2.2 Complexity
Demand paging makes more demands on system software, especially the kernel. The resulting increase in complexity brings a concomitant risk of defects. For example, if the kernel is servicing a paging fault and needs to execute a thread that can itself take a paging fault, then deadlock results. (This problem also occurs if the servicing thread needs to block on a service provided by another thread that takes a paging fault, or to block on a mutex held by such a thread – or even if the servicing thread needs to hold a mutex of a higher order than the one held by the faulting thread.) Because this is complex, Symbian applies the simple, safe rule that demand-paged memory must not be accessed when holding any kernel-side mutex. Symbian had to examine and re-write the kernel and the media drivers to avoid this issue – but, fortunately, such code is rare. (And for code paging, of which I shall say more in the next section, Symbian also had to look at file systems, file-system plug-ins and all the other areas of the OS that these use.)
That said, we should congratulate the Symbian kernel engineers. They have delivered an extraordinarily robust implementation of demand paging – such is the quality of their delivery that it has needed very few modifications since it was first released.
2.2.3 Increase in ROM Size on Flash Media
A demand-paged ROM tends to be around 10% larger than the same ROM unpaged. This is because the unpaged ROM can be compressed as a single entity. On a demand-paged ROM, each 4 KB page must be able to be decompressed and copied into RAM in isolation, so pages are compressed individually, which is not as efficient.
2.2.4 Increase in Kernel RAM Usage
A demand-paged kernel is actually slightly less RAM-efficient than its non-demand-paged predecessor. Symbian has modified the structures used by the kernel to manage RAM to make them more efficient and robust, as well as simpler to use and understand. They have been extended to support the implementation of demand paging, file-system caching and future requirements: the result is a RAM increase amounting to 0.58% of the total physical RAM available on a device. (Note: the additional RAM usage will not show up in tools that monitor the size of chunks in the system, such as MEMTRACE.) There are also other modest increases in RAM usage due to other supporting features, such as additional DFC threads used by device drivers (amounting to 5 KB per thread) and byte-pair compression support in the loader (amounting to around 20 KB). However, this does not take into account the considerable savings from demand paging itself. You should expect a considerable net decrease in RAM usage when using demand paging!
3 Understanding Demand Paging on Symbian

3.1 ROMs

The term ‘ROM’ traditionally refers to memory devices storing data that cannot be modified. These devices also allow direct random access to their contents, so that code can execute from them directly; such code is said to execute in place (XIP). This has the advantage that programs and data in ROM are always available and don’t require any action to load them into memory. On Symbian, the term ROM has developed the looser meaning of ‘data stored in such a way that it behaves like it is stored in read-only memory.’ The underlying media may be physically writeable (RAM or flash memory) but the file system presents a ROM-like interface to the rest of the OS, usually as drive Z.
3.2 NAND Flash

The ROM situation is further complicated when the underlying media is not XIP. This is the case for NAND flash, which is used in almost all devices on the market today. Here, it is necessary to copy (or shadow) any code in NAND to RAM for execution. The simplest way to achieve this is to copy the entire ROM contents into RAM during system boot and use the MMU to mark this area of RAM with read-only permissions. The data stored by this method is called the core ROM image (or just core image) to distinguish it from other data stored in NAND. The core image is an XIP ROM and is usually the only one. It is permanently resident in RAM.
Figure 1, layout A shows how the NAND flash is structured in this simple case. All the ROM contents are permanently resident in RAM and any executables in the user data area (usually the C: or D: drive) are copied into RAM as they are needed. This approach is costly in terms of RAM usage, so Symbian introduced the composite file system, a more efficient scheme.
3.3 The Composite File System

This scheme (broadly speaking) splits the ROM contents into those parts required to boot the OS, and everything else. The former is placed in the core image as before and the latter is placed into another area known as the read-only file system (ROFS). The loader (part of the file server) copies code in the ROFS into RAM as it is needed at run-time, at the granularity of an executable, in the same way as for executables in the user data area.
There can be several ROFS images, for example, localization and/or operator-specific images. Usually, the first one (called the primary ROFS) is combined with the core image into a single ROM-like interface by what is known as the composite file system.
In the rest of this document, I will use the term ROM to mean the combined core and primary ROFS images – that is, everything on drive Z. Where it is important to refer to just the core ROM image (or an XIP ROM generally), I will specify this. References to ROFS will mean the primary ROFS unless otherwise stated. For clarity, I shall only consider systems with a single (primary) ROFS.
Figure 1 shows snapshots of virtual memory for two demand-paging scenarios. Layout B in Figure 1 shows an ordinary composite file system structure. When comparing this to layout A, you can see that layout B is more RAM-efficient because some of the contents of ROFS are not copied into RAM at any given time. The more unused files there are in ROFS, the greater the RAM saving.
Figure 1: The data copied into RAM for two different NAND flash layouts. The same use case and ROM contents are assumed for each layout.
3.4 XIP ROM Paging

Since an XIP ROM image on NAND has to be loaded into RAM in order to run, an opportunity arises to demand page the contents of the XIP ROM. This means that when executing the ROM, we read its data from NAND into RAM on demand.
An XIP ROM image is split into two parts, one containing unpaged data and one containing data that is paged on demand. Unpaged data consists of:
1. Kernel-side code
2. All code that should not be paged for other reasons (for example, performance, robustness or power management)
3. The static dependencies of (1) and (2).
The terms ‘locked down’ or ‘wired’ are also used to mean unpaged.
At boot time, the unpaged area at the start of the XIP ROM image is loaded into RAM as normal, but the virtual address region normally occupied by the paged area is left unmapped. No RAM is allocated for it. When a thread accesses virtual memory in the paged area, it takes a page fault. The page-fault handler in the kernel then allocates a page of physical RAM and reads the contents for this from the XIP ROM image on NAND flash. The thread then continues execution from the point where it took the page fault. This process is called ‘paging in’ and I will describe it in more detail in Sections 3.10 and 4.1.7.
When the free RAM on the system reaches zero, the kernel can satisfy memory-allocation requests by taking RAM from the paged-in XIP ROM region. As RAM pages in the XIP ROM region are unloaded, they are said to be ‘paged out’ (I discuss this in Section 3.11). Figure 2 shows the operations described.
All content in the paged data area of an XIP ROM is subject to paging, not just executable code, so accessing any file in this area may give rise to a page fault. Remember that a page may contain data from one or more files, and page boundaries do not necessarily coincide with file boundaries. All non-executable files in an XIP ROM, even those in the unpaged section, will be paged unless they are explicitly configured to be unpaged (see Section 7.3).
When XIP ROM paging is used in conjunction with the composite file system, the non-demand-paged NAND flash layout strategy may no longer make sense.
If a small core image and a large ROFS image are used, then only a fraction of the files in the overall ROM can benefit from XIP ROM paging. Instead, it is usual to place most (or even all) files in the core ROM image. The ROFS is mainly used for files that would be in the unpaged area of the core ROM anyway or have too many unpaged dependencies.

Figure 2: The operations involved in XIP ROM paging

Figure 3, layout C shows a typical XIP ROM paging structure. Although the unpaged area of the core image may be larger than the total core image in Figure 1, layout B, only a fraction of the contents of the paged area needs to be copied into RAM, compared to the amount of loaded ROFS code in Figure 1, layout B.
Figure 3: The data copied into RAM for two additional NAND flash layouts (compare with Figure 1). The same use case and ROM contents are assumed for each layout.
XIP ROM paging was officially first introduced in Symbian OS v9.3, but was so successful that it was back-ported by some device manufacturers to their devices based on Symbian OS v9.2.
3.5 Code Paging

Code paging extends XIP ROM paging to include other file systems, such as the C drive, ROFS and media drives that cannot be removed (you wouldn’t want to page to a memory card that the user might take out of the phone!). Executables that aren’t in XIP ROM are stored as files in those other file systems. Before their code can be executed, it must be copied to RAM. The particular location in RAM cannot usually be determined ahead of time, so the loader, part of the file server, must modify the file’s contents after copying to correct the memory pointers contained within them: this modification is known as ‘relocation’ or ‘fix-up’. This process makes code paging considerably more complex than XIP ROM paging. The additional overhead of code paging means that device manufacturers should usually choose to use XIP ROM paging on their devices where possible.
Layout D in Figure 3 shows a typical NAND flash structure when both code paging and XIP ROM paging are used. Only those parts of an executable currently in use are copied into RAM. XIP ROM paging is still used for most data in ROM, and code paging is used for any remaining paged executables in ROFS. It is expected that the primary use for code paging will be for executables in the user data area, such as third-party applications.
Pages that are used for code-paged executables never contain the end of one executable and the start of another, as XIP ROM pages might. This means that the last page of an executable is very likely to contain less than 4 KB of data.
It is also worth noting that most other operating systems don’t implement code paging in this way. Instead, when they need to page out the contents of executables, they write the modified contents to the backing store so that they can later be recovered. Symbian does not do this because of concerns about power consumption and the wearing of the storage media used for the backing store.
Code paging support was originally planned for Symbian OS v9.4, but was such a success that it was actually made available in Symbian OS v9.3.
3.6 File System Caching

In Symbian OS v9.4, the file server uses file system caching to speed up file operations. The file-system cache is built upon the disconnected chunk, which allows physical RAM to be committed and de-committed anywhere within its virtual address space, with no requirement that RAM in the chunk is contiguous.
In previous versions of Symbian, physical RAM could be in one of two states – committed and owned by a thread or process, or uncommitted and owned by the kernel. To support file caching, Symbian has introduced a new, intermediate state – ‘unlocked.’ The kernel may de-commit unlocked memory whenever it needs to, but before this happens a thread may lock the memory again, which returns it to the committed state with its contents preserved.
The file-system cache is closely linked to demand paging. When the file server caches a new file, it commits and locks memory, reads the data, and then unlocks the memory, placing the pages at the start of the live list to become the youngest pages in the system. Then, if the file server reads the file again and there is a file-cache hit, it locks the cache buffer, removing the pages from the live list. The file server reads the data, and then unlocks the pages again, returning them to the start of the live list.
You can see that the cached data for the file will stay in memory as long as it is accessed often enough to remain in the live list. It will only be lost from the live list if there is no free memory in the RAM allocator and all the other memory used for paging and caching has been accessed more recently than it has – that is, if these cache pages have become the oldest pages in the system.
3.7 Writeable Data Paging

Writeable data paging extends demand paging from just paging code and constant data to the paging of writeable data too – data such as user stacks, heaps and other user data stored in chunks. The big difference here is that data is mutable, whereas code is not, so if it is modified while it is paged in, it must be written to a backing store when it is paged out. The backing store is analogous to the ‘swap file’ used to implement the virtual memory scheme on PC systems.
Writeable data paging would offer the greatest RAM saving but would also have the greatest impact on performance. Also, the additional write activity would increase the power consumption of the device and wear the backing media. A smart memory-caching scheme might be required to mitigate this. Writeable data paging could also be used to implement a more traditional code-paging scheme than the one described in Section 3.5. At the time of writing, Symbian does not support writeable data paging.
3.8 The Paging Algorithm

All memory content that can be demand paged is said to be ‘paged memory’ and the process is controlled by the ‘paging subsystem.’ A page is a 4 KB block of RAM. Here are some other terms that are used:
• Live page – a page of paged memory whose contents are currently available
• Dead page – a page of paged memory whose contents are not currently available
• Page in – the act of making a dead page into a live page
• Page out – the act of making a live page into a dead page. The RAM used to store the contents of this page may then be reused for other purposes.
In the rest of this section, I’ll give an overview of how the paging subsystem works and introduce some more vocabulary that is useful for understanding later sections.
3.9 The Live Page List
Efficient performance of the paging subsystem depends on the algorithm that selects which pages are live at any given time, or conversely, which live pages should be made dead. The paging subsystem approximates a least recently used (LRU) algorithm for determining which pages to page out. All live pages are stored on the ‘live page list,’ which is an integral part of the
paging cache. The live page list is split into two sub-lists, one containing young pages and the other old pages – the paging subsystem attempts to keep the ratio of the lengths of the two lists at a value called the ‘young/old ratio.’ The paging subsystem uses the MMU to make all young pages accessible to programs but all old pages inaccessible. However, the contents of old pages are preserved and they still count as being live. If a program accesses an old page, this causes a page fault. The paging subsystem then turns that page into a young page (rejuvenates it), and at the same time turns the last young page into an old page. See Section 4.1.4 for more detail and illustration of this process.
3.10 XIP ROM Paging: Paging In
When a program attempts to access a page that is paged out, the MMU generates a page fault and the executing thread is diverted to the Symbian exception handler. This then performs the following tasks:
1. Obtains a page of RAM from the system’s pool of unused RAM (the ‘free pool’) or, if this is empty, pages out the oldest live page and uses that instead
2. Reads the content for this page from some media, such as NAND flash
3. Updates the paging cache’s live list as described in the previous section
4. Uses the MMU to make this RAM page accessible at the correct linear (virtual) address
5. Resumes execution of the program’s instructions, starting with the one that caused the initial page fault.
Note that these actions are executed in the context of the thread that tries to access the paged memory. Paging in on a code-paged system is described in Section 4.1.6.2.
3.11 XIP ROM Paging: Paging Out
If the system needs more RAM and the free pool is empty, then RAM that is
being used to store paged memory is freed up for use. This is called ‘paging out’ and happens using the following steps:
1. Remove the ‘oldest’ RAM page from the paging cache.
2. Use the MMU to mark the page as inaccessible.
3. Return the RAM page to the free pool.
Freeing a page on a code-paged system is described in Section 4.1.6.5.
3.12 The Paging Configuration
Demand paging introduces three new configurable parameters to the system. These are:
1. The amount of code and data that is marked as unpaged
2. The minimum size of the paging cache
3. The ratio of young pages to old pages in the paging cache.
The first two parameters are the most important and they are discussed in the following sections. The third has a less dramatic effect on the system and is usually left unchanged at its default value of 3. There are existing parameters that also affect how demand paging performs. Optimizing the configuration of all these together is discussed in Chapter 7.
3.13 Unpaged Files
It is important that all areas of the operating system that are involved in servicing a paging fault are protected from blocking on the thread that took the paging fault (directly or indirectly). If they are not, a deadlock situation could occur. This is partly achieved in Symbian by ensuring that all kernel-side components are always unpaged. Section 5.2.1 looks at the problem of page faults in kernel-side code in more detail.
In addition to kernel-side components, there is likely to be a number of components that you will want to explicitly set as unpaged, so as to meet the functional and performance requirements of the device. The performance overhead of servicing a page fault is unbounded and variable – although typically around 1ms – so some critical code paths may need to be protected by making files unpaged. You might have to make chains of files and their dependencies unpaged to achieve this, but could possibly reduce the set of unpaged components by breaking unnecessary dependencies and separating critical code paths from non-critical ones. If making a component paged or unpaged is a straightforward performance/RAM trade-off, you can make it configurable, thus allowing the decision to be made later based on the system requirements of the particular device.
3.14 Paging Cache Sizes
If the system needs more RAM but the free RAM pool is empty, then pages are removed from the paging cache in order to service the memory allocation. This cannot continue indefinitely because a situation will arise where the same pages are continually paged in and out of the paging cache. This is known as page thrashing. Performance is dramatically reduced in this situation.
To avoid catastrophic performance loss, a minimum paging cache size can be defined. If a system memory allocation would cause the paging cache to drop below the minimum size, then the allocation fails. As data is paged in, the paging cache grows, but any RAM used by the cache above the minimum size does not contribute to the amount of used RAM reported by the system. Although this RAM is really being used, it will be recycled whenever anything else in the system requires the RAM. So the effective RAM usage of the paging cache is determined by its minimum size.
In theory, it is also possible to limit the maximum paging cache size. However, this is not useful in production devices because it prevents the paging cache from using all the otherwise unused RAM in the system. This may reduce performance for no effective RAM saving.
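Thrashing is easy to demonstrate with a simulation. The sketch below (illustrative only, not Symbian code) counts page faults for a program that cycles through a fixed working set of pages, using a simple FIFO cache of page frames. When the cache is smaller than the working set, every single access faults on the second and later passes; once the cache fits the working set, only the warm-up pass faults:

```cpp
#include <deque>
#include <set>
#include <cstddef>

// Count page faults for a cyclic access pattern over `workingSet`
// distinct pages, using a FIFO paging cache of `cacheSize` frames.
std::size_t CountFaults(std::size_t workingSet, std::size_t cacheSize,
                        std::size_t passes) {
    std::deque<std::size_t> fifo;     // eviction order
    std::set<std::size_t> resident;   // pages currently cached
    std::size_t faults = 0;
    for (std::size_t p = 0; p < passes; ++p) {
        for (std::size_t page = 0; page < workingSet; ++page) {
            if (resident.count(page)) continue;  // hit: no fault
            ++faults;                            // page-in
            if (fifo.size() == cacheSize) {      // cache full: page out
                resident.erase(fifo.front());
                fifo.pop_front();
            }
            fifo.push_back(page);
            resident.insert(page);
        }
    }
    return faults;
}
```

With a working set of 8 pages and a 4-frame cache, three passes fault 24 times (every access); with an 8-frame cache, they fault only 8 times. This is the cliff-edge behaviour that the minimum paging cache size is there to avoid.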
3.15 Effective RAM Saving
The easiest way to visualize the RAM saving achieved by demand paging is to compare the most simplistic configurations. Consider a non-demand-paged ROM consisting of a core with no ROFS (as in Figure 1, layout A). Compare that with a demand-paged ROM consisting of an XIP ROM paged core image, again with no ROFS (similar to Figure 3, layout C, but without the ROFS). The total ROM contents are the same in both cases. Figure 4 depicts the effective RAM saving.
Figure 4: The effective RAM saving when paging a simple XIP ROM.
The effective RAM saving is the size of all paged components minus the minimum size of the paging cache. Note that when a ROFS section is introduced, this calculation is much more complicated because the contents of the ROFS are likely to be different between the non-demand-paged and demand-paged cases.
You can increase the RAM saving by reducing the set of unpaged components and/or reducing the minimum paging cache size, making the configuration more ‘stressed.’ You can improve the performance (up to a point) by increasing the set of unpaged components and/or increasing the minimum paging cache size, making the configuration more ‘relaxed.’ However, if the configuration is too relaxed, it is possible to end up with a net RAM increase compared with a non-demand-paged ROM.
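The calculation is simple arithmetic, and is worth writing down because it makes the ‘too relaxed’ failure mode obvious. The figures below are invented for illustration:

```cpp
// Effective RAM saving for a paged XIP ROM, as described above: the
// total size of all paged components minus the minimum size of the
// paging cache. A negative result means demand paging is costing
// RAM rather than saving it.
long EffectiveRamSavingKB(long pagedComponentsKB, long minPagingCacheKB) {
    return pagedComponentsKB - minPagingCacheKB;
}
```

For example, with (hypothetically) 20 MB of paged components and a 2 MB minimum cache the saving is 18 MB, but if only 1 MB of code is left paged against the same 2 MB cache, the device ends up using a net 1 MB more RAM than the non-demand-paged configuration.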
3.16 Byte-Pair Compression
The default ‘deflate’ compression algorithm used by Symbian only allows a particular size of compressed data to be decompressed as a whole unit. This is fine when decompressing a complete XIP ROM image or a whole executable, but it is not acceptable for demand paging where only a single page of an image/executable needs to be decompressed during a page-in event.
Because of this, Symbian has introduced the ‘byte-pair’ compression algorithm, which allows data to be compressed and decompressed in individually addressable blocks. Each decompressed 4 KB block can then be mapped directly to a page of RAM. (It is only possible to demand page data that is byte-pair-compressed or uncompressed.) Support for byte-pair compression was introduced in Symbian OS v9.2. For more details, see Section 4.1.13.
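The idea behind byte-pair compression can be shown with a toy version of the classic algorithm: repeatedly replace the most frequent adjacent pair of symbols with a fresh code, and decompress by replaying the substitutions in reverse. This sketch is in the spirit of the technique only – it is not the Symbian implementation, and it works on int symbols (fresh codes start at 256) purely to keep the example short:

```cpp
#include <vector>
#include <map>
#include <utility>
#include <cstddef>

typedef std::vector<int> Symbols;
typedef std::vector<std::pair<int, std::pair<int,int> > > Rules;

// Each round replaces the most frequent adjacent pair with a fresh
// symbol code and records the substitution in `rules`.
Symbols Compress(Symbols data, int rounds, Rules& rules) {
    for (int r = 0; r < rounds; ++r) {
        std::map<std::pair<int,int>, int> freq;
        for (std::size_t i = 0; i + 1 < data.size(); ++i)
            ++freq[std::make_pair(data[i], data[i+1])];
        std::pair<int,int> best; int bestCount = 0;
        for (std::map<std::pair<int,int>, int>::iterator it = freq.begin();
             it != freq.end(); ++it)
            if (it->second > bestCount) { best = it->first; bestCount = it->second; }
        if (bestCount < 2) break;            // nothing worth replacing
        int fresh = 256 + r;                 // code unused by raw bytes
        rules.push_back(std::make_pair(fresh, best));
        Symbols out;
        for (std::size_t i = 0; i < data.size(); ) {
            if (i + 1 < data.size() && data[i] == best.first
                                    && data[i+1] == best.second) {
                out.push_back(fresh); i += 2;
            } else {
                out.push_back(data[i]); ++i;
            }
        }
        data = out;
    }
    return data;
}

// Expand the substitutions in reverse order of creation.
Symbols Decompress(Symbols data, const Rules& rules) {
    for (int r = (int)rules.size() - 1; r >= 0; --r) {
        Symbols out;
        for (std::size_t i = 0; i < data.size(); ++i) {
            if (data[i] == rules[r].first) {
                out.push_back(rules[r].second.first);
                out.push_back(rules[r].second.second);
            } else out.push_back(data[i]);
        }
        data = out;
    }
    return data;
}
```

Because each block carries (or shares) its own substitution table, any single block can be decompressed on its own, which is exactly the property a page-in event needs; deflate's sliding-window history is what prevents the same trick there.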
4 Under the Hood: The Implementation of Demand Paging
4.1 Kernel Implementation
4.1.1 Key Classes in Demand Paging
The majority of the demand-paging implementation within the kernel lies in a singleton object of class MemModelDemandPaging (xmmu.cpp), which is derived from DemandPaging (demand_paging.h), which in turn derives from RamCacheBase (ramcache.h). This last provides an interface between the MMU implementation and any form of dynamic use of the system’s free memory, such as demand paging. This interface is mainly used for transferring ownership of physical pages of RAM.
The kernel manages the free memory in the system via the class MmuBase (mmubase.h). This uses RamCacheBase to access the paging system and provide RAM from both the free pool and, if necessary, live pages. To do this, RamCacheBase makes use of MmuBase’s RAM allocator object (DRamAllocator, found in ramalloc.h).
The singleton MemModelDemandPaging object is created at system boot (in the MmuBase::Init() method) if demand paging is enabled – otherwise a RamCache object (ramcache.h), which also derives from RamCacheBase, is created as usual. The MMU page tables are initialized for the demand-paged part of the ROM. The live list is also populated with RAM pages, up to its minimum size.
Note: Where possible, I have given the name of the file in which you can find more information about the class in question, although some of these files may not initially be available under the Eclipse Public License. Where filenames are not given, you may find further information about the class or method given by using the code search available on developer.symbian.org.
4.1.2 Page-Fault Handler
When a thread accesses demand-paged memory for which the content isn’t currently loaded, an ARM data/prefetch abort is generated. The exception-handling code calls the demand-paging fault handler, which then pages in the memory. This involves the following steps, which are executed in the context of the thread that is trying to access the pageable memory:
1. Obtain a page of physical RAM. This is done in the usual way – refer to Symbian OS Internals for more details.
2. Determine the storage media location where the page’s contents are stored. To read demand-paged content from media, the kernel must first determine its storage location. For the contents of ROM, it does this using an index stored in the unpaged part of ROM. For non-XIP code, it uses the blockmap structure representing the particular executable. This is stored in the kernel’s code segment object for that executable. The kernel searches an address-sorted list to determine which code segment to use.
3. Read (and decompress) the contents into the RAM page. The kernel uses the paging device APIs to perform the media read. I describe these in Section 4.2.2. Demand-paged content may be stored in a compressed format (byte-pair compression) and so may require decompression after reading.
4. Use the MMU to make this RAM page accessible at the correct virtual address. The kernel stores all the memory that is currently being used to store demand-paged content in the ‘live list,’ discussed in Section 4.1.4.
5. Resume execution. The kernel resumes execution of the code that triggered the page-in event.
4.1.3 Page-Information Structures
Each physical page of RAM managed by the kernel has an associated page-information structure, SPageInfo, which maintains information about the current state and usage of the page. (For background, see Symbian OS Internals: Real Time Kernel Programming, Jane Sales et al., 2005, John Wiley & Sons. The entire text of the book can be found online at developer.symbian.org/wiki/index.php/Symbian_OS_Internals.)
As part of the demand-paging and file-system caching implementations, Symbian has modified SPageInfo to make it more efficient, more robust, easier to use and easier to understand. There were two main effects of this work on device manufacturers. The first was a modest increase of RAM usage by the system – around 0.58% of total RAM. The second was an effect on people porting Symbian to new platforms. If you are doing this, you will need to recompile the bootstrap.
Implementation Overview
In brief:
• The SPageInfo structure was simplified so that members only store one datum.
• New members were added to SPageInfo to allow the implementation of demand paging and file-system caching.
• SPageInfo is now stored in a sparse array, indexed by a page’s physical address. The memory allocation for the array has been moved into the generic bootstrap code (bootmain.s).
• All use of page numbers in APIs was removed, because information pertaining to a RAM page can now be directly obtained using only the physical address of that page.
• All updates to SPageInfo objects are now performed by functions that assert that the system lock is held, ensuring that all updates have been coded in a safe and atomic manner.
• Simulated OOM conditions have been added to low-level memory-management functions. This improves coverage on existing and new OOM testing.
Implementation Detail
SPageInfo is defined in mmubase.h and looks as follows:

struct SPageInfo
    {
    enum TType
        {
        EInvalid=0,        // No physical RAM exists for this page
        EFixed=1,          // RAM fixed at boot time
        EUnused=2,         // Page is unused
        EChunk=3,          // iOwner=DChunk* iOffset=index into chunk
        ECodeSegMemory=4,  // iOwner=DCodeSegMemory*
                           // iOffset=index into CodeSeg memory
        EHwChunk=5,        // Not used
        EPageTable=6,      // iOwner=0
                           // iOffset=index into KPageTableBase
        EPageDir=7,        // iOwner=ASID
                           // iOffset=index into Page Directory
        EPtInfo=8,         // iOwner=0 iOffset=index into iPtInfo
        EShadow=9,         // iOwner=phys ROM page,
                           // iOffset=index into ROM
        EPagedROM=10,      // iOwner=0, iOffset=index into ROM
        EPagedCode=11,     // iOwner=DCodeSegMemory*,
                           // iOffset=index into code chunk
        EPagedData=12,     // NOT YET SUPPORTED
        EPagedCache=13,    // iOwner=DChunk*, iOffset=index into chunk
        EPagedFree=14,     // In demand paging ‘live list’
                           // but not used for any purpose
        };
    enum TState
        {
        EStateNormal = 0,      // no special state
        EStatePagedYoung = 1,  // demand paged and on the young list
        EStatePagedOld = 2,    // demand paged and on the old list
        EStatePagedDead = 3,   // demand paged, currently being modified
        EStatePagedLocked = 4  // demand paged but temporarily not paged
        };
    ...
private:
    TUint8 iType;        // enum TType
    TUint8 iState;       // enum TState
    TUint8 iSpare1;
    TUint8 iSpare2;
    TAny* iOwner;        // owning object
    TUint32 iOffset;     // page offset within owning object
    TAny* iModifier;     // pointer to object currently manipulating page
    TUint32 iLockCount;  // non-zero if page acquired
                         // by code outside of the kernel
    TUint32 iSpare3;
public:
    SDblQueLink iLink;   // used for placing page into linked lists
    };
The iType member indicates how the RAM page is currently being used. The iOwner and iOffset members then typically specify a particular kernel object (for example, DEpocCodeSeg) and location within it. For demand paging, the key TType attributes for a page are:
• EPagedROM – Contains contents for demand-paged XIP ROM. The iOffset member contains the page’s index into the ROM, that is, (address-of-page – address-of-ROM-header) / 4096.
• EPagedCode – Contains contents for demand-paged RAM-loaded code. iOwner points to the DEpocCodeSegMemory object to which the page belongs, and iOffset gives the page’s location within the code chunk (not within the individual CodeSeg memory).
• EPagedFree – Page is in the demand-paging live list but not being used to store anything. A page is placed in this state when its contents are discarded but it is not possible to return it to the system’s free pool because of locking or performance constraints.
The iState member was added to support demand paging and has one of these values from enum TState:
• EStatePagedYoung – Page is in the demand-paging live list as a ‘young’ page.
• EStatePagedOld – Page is in the demand-paging live list as an ‘old’ page.
• EStatePagedDead – Page has been removed from the live list and is in the process of being modified for a new use.
• EStatePagedLocked – Page has been removed from the live list to
prevent it from being reclaimed for new usage, that is, it is ‘pinned.’ The page’s contents remain valid and mapped by the MMU.
The iLink member is used by the paging system to link page-information structures together into the live page list. This member is also re-used to store other data when the page isn’t in the live list – for example, when a page is in the EStatePagedLocked state, iLink.iPrev is used to store a lock count. The iModifier member is described in Section 4.1.3.5. The remaining iSpare members pad the structure size to a power of two and will be used in the implementation of future OS features.
Storage Location for SPageInfo Structures
SPageInfo objects are stored in a sparse array. A sparse array is an array where nearly all of the elements have the same value (usually zero), which means that the array can be stored more efficiently in computer memory than by using the standard methods. In this particular case, the array is sparse because it represents the physical memory of the system, and this will not all be in use at any one time.
The sparse array lives at a fixed virtual address, KPageInfoLinearBase, which means that the virtual address of the SPageInfo for a particular physical address can be calculated efficiently for performing page-table walks in software. The reverse look-up is equally efficient – by knowing the number of SPageInfos that fit on a page (remember that the size of an SPageInfo is always a power of two), we can derive the physical address that a particular SPageInfo represents. The actual functions for translating between an address and an SPageInfo are:

// Return the SPageInfo for a given page of physical RAM.
inline SPageInfo* SPageInfo::FromPhysAddr(TPhysAddr aAddress)
    { return ((SPageInfo*)KPageInfoLinearBase)+(aAddress>>KPageShift); }

// Return physical address of the RAM page with which
// this SPageInfo object is associated
inline TPhysAddr SPageInfo::PhysAddr()
    { return ((TPhysAddr)this)<<(KPageShift-KPageInfoShift); }
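The shift arithmetic that makes both look-ups O(1) can be checked in isolation. The sketch below mirrors the index calculations without the fixed-address trickery of the kernel (the function names and the 32-byte struct size are assumptions for illustration; only the page-shift value of 12 for 4 KB pages comes from the text):

```cpp
#include <cstdint>

const std::uint32_t KPageShift = 12;    // 4 KB pages
const std::uint32_t KPageInfoShift = 5; // assumed 32-byte SPageInfo

// One array slot per 4 KB page of physical RAM.
std::uint32_t InfoIndexFromPhysAddr(std::uint32_t aAddr) {
    return aAddr >> KPageShift;
}

// Inverse: the page-aligned physical address for an array slot.
std::uint32_t PhysAddrFromInfoIndex(std::uint32_t aIndex) {
    return aIndex << KPageShift;
}

// Byte offset of a page's info structure from the array base: the
// struct size being a power of two reduces this to two shifts.
std::uint32_t InfoOffsetFromPhysAddr(std::uint32_t aAddr) {
    return (aAddr >> KPageShift) << KPageInfoShift;
}
```

Because both directions are pure shifts, neither look-up needs a search or a division, which is what allows them to be used on the page-fault path.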
As the page-info array is sparse, an invalid physical address may cause an exception when indexing the array. So, to make it possible to validate physical addresses, a bitmap is also made available at address KPageInfoMap. This contains a set bit for each 4 KB page of the array that is present in memory, and is used by SPageInfo::SafeFromPhysAddr(), which performs the same task as SPageInfo::FromPhysAddr() but returns NULL if the physical address is invalid rather than causing a data abort.
Before Symbian implemented demand paging, page-information structures were stored in a flat array, and the kernel provided a few non-trivial functions to translate between a physical page address and an array index (that is, a page number). Because of this, many internal APIs passed page numbers as additional arguments to avoid the translation process. With the new storage method for SPageInfo objects, all APIs can operate efficiently if they use only the physical addresses of pages, so we have removed all page-number arguments and methods.
Initialization
The bootstrap code (in bootmain.s) creates both the array at KPageInfoLinearBase and the bitmap at KPageInfoMap. Initially, all SPageInfo structures have type EInvalid. During kernel boot, the MmuBase::Init2() function sets the correct state for each page by scanning the RAM bank list provided by the bootstrap, and scanning all MMU page tables for RAM already allocated.
Locking and Concurrency Handling
Modifications to any SPageInfo must be performed while holding the system lock, which guarantees exclusive access to the object. Unfortunately, some operations on pages are long running or unbounded – for example, where the kernel must map demand-paged code into many processes. In situations such as these, we cannot keep the system locked, because this would affect the real-time performance of the system. In this situation, system
software should flash (release/re-acquire) the system lock and then detect if another thread has modified the page in question. This is done using the iModifier member. The kernel sets iModifier to zero whenever the SPageInfo object is changed in any way. So if a thread sets this to a suitable unique value (for example, the current thread pointer) then that thread may flash the system lock and then find out whether another thread has modified the page by checking whether iModifier has changed. The kernel provides functions for this – SetModifier() and CheckModified(). An example of their use is as follows:

NKern::LockSystem();
SPageInfo* thePageInfo = GetAPageInfo();
NThread* currentThread = NKern::CurrentThread();
// this thread is modifying the page’s usage
thePageInfo->SetModifier(currentThread);
while(long_running_operation_needed)
    {
    do_part_of_long_running_operation(thePageInfo);
    if(NKern::FlashSystem() && thePageInfo->CheckModified(currentThread))
        {
        // someone else got the System Lock and modified our page...
        reset_long_running_operation(thePageInfo);
        }
    }
NKern::UnlockSystem();
4.1.4 Live Page List
Efficient performance of the paging subsystem depends on choosing a good algorithm for selecting the pages that are live at any given time and, conversely, the live pages that should be made dead. The paging system approximates an LRU algorithm for deciding which pages to page out.
All live pages are stored on the ‘live page list,’ which is a linked list of SPageInfo objects, each of which refers to a specific page of physical RAM on the device. The list is ordered chronologically by time of last access, to enable a least recently used (LRU) algorithm to be used when discarding paged content. To keep the pages in chronological order, the kernel needs to detect when they are accessed. It does this by splitting the live list into two sub-lists, one containing ‘young’ pages and the other ‘old’ pages. It uses the MMU to make all young pages accessible, but old pages inaccessible. However, the contents of old pages are still preserved, and these pages are still considered to be live. When an old page is next accessed, there is a data abort. The fault handler can then simply move the old page to the young list and make it accessible again so the program can continue as normal. The net effect is of a first-in, first-out list in front of an LRU list, which results in less page churn than a plain LRU. This is shown in more detail in Figure 5:
Figure 5: Live Page Lists – Young and Old Pages
When a page is paged in, it is added to the front of the young list, making it the youngest page in the system, as shown in Figure 6:
Figure 6: Paging-In Page J
The paging system aims to keep the relative sizes of the two lists equal to a value called the ‘young/old ratio.’ So, if this ratio is R, the number of young pages is Ny, and the number of old pages is No, then if Ny > R × No, a page is taken from the back of the young list and placed at the front of the old list. This process is called ‘aging’ and is shown in Figure 7:
Figure 7: Aging Page F
If a program accesses an old page, it causes a page fault, because the MMU has marked old pages as inaccessible. The fault handler then turns the old page into a young page (‘rejuvenates’ it), and, at the same time, turns the last young page into an old page. This is shown in Figure 8.
Figure 8: Rejuvenating Page H (and Aging Page E)
When the kernel needs more RAM (and the free pool is empty), it needs to reclaim the RAM used by a live page. In this case, the oldest live page is selected for paging out, turning it into a dead page, as Figure 9 shows.
Figure 9: Paging Out Page I
If paging out leaves the system with too many young pages according to the young/old ratio, then the kernel would age the last young page on the list (in Figure 9, that would be Page D). The net effect of this is that if the page is accessed at least once between every page fault, it will just cycle around the young list. If it is accessed less often, relative to the page-fault rate, it will appear in the old list – appearing further and further back the less often it is accessed.
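The young/old mechanics described above can be sketched as a small simulation. This is an illustration of the algorithm only, not kernel code; integer page IDs stand in for physical pages, and the class and method names are invented:

```cpp
#include <deque>
#include <algorithm>

// Two-list live page list: young pages are MMU-accessible, touching
// an old page faults and rejuvenates it. R is the young/old ratio.
class LiveList {
public:
    explicit LiveList(int aRatio) : iRatio(aRatio) {}

    // Page-in: the new page becomes the youngest in the system.
    void PageIn(int aPage) { iYoung.push_front(aPage); Balance(); }

    // Touching an old page moves it to the front of the young list
    // and ages the last young page in exchange. Touching a young
    // page causes no fault, so nothing moves.
    void Touch(int aPage) {
        std::deque<int>::iterator it =
            std::find(iOld.begin(), iOld.end(), aPage);
        if (it == iOld.end()) return;   // young or absent: no fault
        iOld.erase(it);
        iYoung.push_front(aPage);
        Balance();
    }

    // Page-out: reclaim the oldest live page; returns -1 if empty.
    int PageOut() {
        if (!iOld.empty()) {
            int p = iOld.back(); iOld.pop_back(); Balance(); return p;
        }
        if (!iYoung.empty()) {
            int p = iYoung.back(); iYoung.pop_back(); return p;
        }
        return -1;
    }

    int Oldest() const {
        if (!iOld.empty()) return iOld.back();
        return iYoung.empty() ? -1 : iYoung.back();
    }

private:
    // Keep Ny <= R * No by aging pages off the back of the young list.
    void Balance() {
        while ((int)iYoung.size() > iRatio * (int)iOld.size()) {
            iOld.push_front(iYoung.back());
            iYoung.pop_back();
        }
    }
    int iRatio;
    std::deque<int> iYoung, iOld;
};
```

Running a few operations through the model shows the behaviour described in the figures: a page that is touched migrates back to the front of the young list, while untouched pages drift to the back of the old list and are the first to be paged out.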
4.1.5 RAM Cache Interface
The kernel manages the free memory in the system via the class MmuBase,
which in turn uses a DRamAllocatorBase object, iRamPageAllocator. Formerly, this object managed all the unused RAM pages in the system, but with the advent of demand paging, there may also be free memory available from the live list. These two memory pools interact through the class RamCacheBase, which provides:
• An abstract interface allowing the MmuBase class to make requests of the paging system. (The class DemandPaging derives from RamCacheBase and implements this interface.)
• Implementation of functions for the paging system to make requests of the MmuBase class.
The class definition for RamCacheBase is as follows:

class RamCacheBase
    {
public:
    // Initialisation called during MmuBase:Init2.
    virtual void Init2();

    // Initialisation called from M::DemandPagingInit.
    virtual TInt Init3()=0;

    // Remove RAM pages from cache and return them to the system.
    virtual TBool GetFreePages(TInt aNumPages)=0;

    // Attempt to free-up a contiguous region of pages
    // and return them to the system.
    virtual TBool GetFreeContiguousPages(TInt aNumPages, TInt aAlign)=0;

    // Give a RAM page to the cache system for managing.
    virtual void DonateRamCachePage(SPageInfo* aPageInfo)=0;

    // Attempt to reclaim a RAM page given to
    // the cache with DonateRamCachePage.
    virtual TBool ReclaimRamCachePage(SPageInfo* aPageInfo)=0;

    // Called by MMU class when a page is unmapped from a chunk.
    virtual TBool PageUnmapped(SPageInfo* aPageInfo)=0;

    // Return the maximum number of pages which could be
    // obtained with GetFreePages.
    inline TInt NumberOfFreePages() { return iNumberOfFreePages; }

    // Put a page back on the system's free pool.
    void ReturnToSystem(SPageInfo* aPageInfo);

    // Get a RAM page from the system's free pool.
    SPageInfo* GetPageFromSystem();

public:
    MmuBase* iMmu;            // Copy of MmuBase::TheMmu
    TInt iNumberOfFreePages;  // Number of pages that could
                              // be freed by GetFreePages()
    static RamCacheBase* TheRamCache;  // Pointer to the single
                                       // RamCache object
    };
Memory Allocation
When the kernel wants to allocate memory, it uses MmuBase::AllocRamPages(), which requests a number of pages from the RAM allocator object. If there are insufficient pages to meet this request, then this method calls RamCacheBase::GetFreePages(), which is actually implemented in DemandPaging::GetFreePages(). DemandPaging::GetFreePages() checks to see if the live list has the required number of spare pages and, if so, selects the oldest pages to return to the main RAM allocator using RamCacheBase::ReturnToSystem(). The RAM allocator then completes the original memory-allocation request. If a memory allocation requires physically contiguous memory, then the corresponding methods are MmuBase::AllocContiguousRam() and RamCacheBase::GetFreeContiguousPages().
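The fall-back from the free pool to the paging cache, bounded by the cache's minimum size, can be modelled in a few lines. This is a sketch of the accounting only – the class, its names and the page counts are invented, and it is not the Symbian implementation:

```cpp
#include <cstddef>

// Free pool plus reclaimable paging cache, in whole pages. An
// allocation is served from the free pool first, then by reclaiming
// cache pages, but never pushes the cache below its minimum size.
class PagePools {
public:
    PagePools(std::size_t aFree, std::size_t aCache, std::size_t aCacheMin)
        : iFree(aFree), iCache(aCache), iCacheMin(aCacheMin) {}

    // Try to allocate aN pages; returns false (and changes nothing)
    // if even reclaiming cache pages cannot satisfy the request.
    bool AllocPages(std::size_t aN) {
        std::size_t reclaimable = iCache - iCacheMin;
        if (iFree + reclaimable < aN) return false;  // allocation fails
        if (iFree >= aN) { iFree -= aN; return true; }
        std::size_t fromCache = aN - iFree;  // cf. GetFreePages()
        iCache -= fromCache;
        iFree = 0;
        return true;
    }

    // 'Free' memory as reported: free pool plus reclaimable cache.
    std::size_t ReportedFree() const { return iFree + (iCache - iCacheMin); }

    std::size_t CachePages() const { return iCache; }

private:
    std::size_t iFree, iCache, iCacheMin;
};
```

Note how the reported free memory includes the reclaimable part of the cache: this is the same accounting described in the Free Memory Accounting subsection below, where cache pages above the minimum are counted as free even while they hold paged or cached content.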
Dynamic RAM Cache
The unused RAM in the system needs to be managed so that it is available for allocating to programs as well as being used for demand paging and/or the file-system cache. This is done using the singleton RamCacheBase object, which allows the paging system to request memory from the RAM allocator, and vice versa. This object is created at system boot. The kernel’s existing memory-allocation functions have been modified so that upon allocation failure they attempt to reclaim RAM from the paging system and the file-system cache via RamCacheBase. Similarly, the free RAM size APIs have been updated to account for memory used for the paging system and file-system cache.
When file-cache memory is unlocked, the kernel calls DonateRamCachePage() to place each selected page on the live list. These pages are placed at the front of the list, becoming the youngest pages in the system, and they are given the type EPagedCache. When the memory is locked again, the kernel calls ReclaimRamCachePage() to remove the pages from the live list, assuming they are still present. Both of these functions are implemented in the class DemandPaging.
Whenever memory is decommitted from a chunk by the ArmMmu::UnmapPages() method, the kernel calls the RamCacheBase::PageUnmapped() method implemented in class MemModelDemandPaging. This checks if the RAM page is currently on the live list as type EPagedCache. If it is, the page is unmapped from the chunk and placed on the end of the live list as type EPagedFree. It is now the oldest page in the system and so will become the first page to be reclaimed when free RAM is required. We use this mechanism rather than ReturnToSystem() because it is faster, and we wish to keep execution time to a minimum while the system lock is held.
Allocating Memory for Paging
Whenever the paging system needs RAM to store demand-paged content, it makes a request to the RAM allocator via RamCacheBase::GetPageFromSystem().
If there is no memory available, the oldest page in the live list is reclaimed instead.
Free Memory Accounting
During system start-up, the bootstrap calls the RamCacheBase::Init2() method, and the DemandPaging class initializes itself by populating the live list with the configured minimum number of pages. The amount of memory used for demand paging or file-system caching will never go below this level. RamCacheBase’s member iNumberOfFreePages is the count of the excess number of in-use pages over this minimum – that is, it is the number of pages that the kernel can safely reclaim to satisfy memory-allocation requests. These excess pages are counted as free memory by the system, even though at a particular time they may be being used to store demand-paged or file-system cache content. iNumberOfFreePages is incremented by GetPageFromSystem() and DonateRamCachePage() to count pages added to the paging system, and decremented by ReturnToSystem() and ReclaimRamCachePage() as pages are removed.
4.1.6 Code Paging
The code paging of RAM-loaded executables described in this section was first supported in Symbian OS v9.3 (although originally intended for Symbian OS v9.4). XIP ROM paging, which is a subset of this functionality, was also first supported in Symbian OS v9.3, and is discussed in Section 4.1.7.

DCodeSeg

When an executable binary is loaded into the system, the kernel creates a DCodeSeg (kern_priv.h) object to represent it. This mainly contains two broad groups of information:
• The dependency graph (DCodeSeg objects representing binaries that this binary links against)
• The memory used to store the contents of the executable.
During implementation, we realized that complexity could be reduced by moving the second group of information into a separate object, DEpocCodeSegMemory (plat_priv.h). This insulates the demand-paging implementation from lifetime and locking issues associated with the main DCodeSeg object.
Under the Hood: The Implementation of Demand Paging
Paging In

When a demand-paged DCodeSeg is first loaded, the contents of the .text section of the executable are not present in RAM. All that exists is a reserved region of virtual address space in the code chunk. This means that when a program accesses the contents, the MMU generates a data abort. The exception handler calls MemModelDemandPaging::HandleFault(), which then has to obtain RAM, copy the correct contents into it and map it at the correct virtual address. This is known as 'paging in.' Paging in consists of the following steps:
1. Check the MMU page table entry for the address that caused the abort. If the entry is KPteNotPresentEntry, then there is no memory mapped at this address and it may need paging in.
2. Verify that the exception was caused by an access to the code chunk memory region.
3. Find the DCodeSeg at this address by searching the sorted list with DCodeSeg::CodeSegsByAddress.Find(aFaultAddress).
4. Verify that the DCodeSeg is one that is being demand paged.
5. Call MemModelDemandPaging::PageIn(), which then performs the following steps:
6. Obtain a DemandPaging::DPagingRequest (demand_paging.h) object using DemandPaging::AcquireRequestObject().
7. Obtain a physical page of RAM using DemandPaging::AllocateNewPage().
8. Map this RAM at the temporary location DemandPaging::DPagingRequest::iLoadAddr.
9. Read the correct contents into the RAM page by calling DemandPaging::ReadCodePage().
10. Initialize the SPageInfo structure for the physical page of RAM, marking it as type EPagedCode.
11. Map the page at the correct address in the current process.
12. Add the SPageInfo to the beginning of the live page list. This marks it as the youngest (most recently used) page.
13. Return, and allow the program that caused the exception to continue executing.
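The fault-handling flow in the steps above can be sketched as follows. This is a simplified model in standard C++: the structures, address ranges and helpers are invented stand-ins for the kernel's page tables and live list.

```cpp
#include <cstdint>
#include <list>
#include <map>

using Addr = std::uint32_t;
constexpr Addr kPageSize = 0x1000;
constexpr Addr kCodeChunkBase = 0x70000000; // assumed code-chunk region
constexpr Addr kCodeChunkEnd  = 0x80000000;

struct FaultModel
	{
	std::map<Addr, int> pageTable; // page address -> frame (absent = not present)
	std::list<Addr> liveList;      // front = youngest

	// Stand-in for reading (and decompressing) a page from media.
	int ReadCodePage(Addr page) { return static_cast<int>(page / kPageSize); }

	// Returns true if the fault was resolved by paging in.
	bool HandleFault(Addr faultAddr)
		{
		Addr page = faultAddr & ~(kPageSize - 1);
		if (pageTable.count(page))
			return false;               // already mapped: not a page-in fault
		if (page < kCodeChunkBase || page >= kCodeChunkEnd)
			return false;               // outside the code chunk: a real fault
		int frame = ReadCodePage(page); // obtain RAM and load the contents
		pageTable[page] = frame;        // map at the faulting address
		liveList.push_front(page);      // youngest page in the system
		return true;                    // resume the faulting program
		}
	};
```
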
In the multiple memory model, there are separate MMU mappings for each process into which a DCodeSeg is loaded – but step 11 in the previous sequence updates the mapping only for the process that caused the page-in to occur. This is a deliberate decision because:
• It avoids the overhead of updating a large number of MMU mappings, possibly unnecessarily.
• The pseudo-LRU algorithm used by the demand-paging implementation is improved if accesses to a page by other processes generate their own page faults.
To avoid having duplicate copies of the same DCodeSeg page, the multiple memory model keeps a list of pages that a DCodeSeg has paged in (DMemModelCodeSegMemory::iPages). Then, before calling MemModelDemandPaging::PageIn() in step 5 of the previous sequence, it checks to see if there is already a page loaded and, if so, simply maps that page into the current process and ends.

Aging a Page

The pseudo-LRU algorithm used by demand paging means that as pages work their way down the live list, they eventually reach a point where they change from young to old. At this point, the kernel changes the MMU mappings for the page to make it inaccessible. It does this using the method MemModelDemandPaging::SetOld() (xmmu.cpp).
In the moving memory model, SetOld() simply has to find the single page table entry (MMU mapping) for the page in the user code chunk, and clear the bits KPtePresentMask. In the multiple memory model, there can be many page table entries that need updating. In this case, the kernel calls the method DoSetCodeOld() (xmmu.cpp) to do the actual work. DoSetCodeOld() examines the bit array DMemModelCodeSegMemory::iOsAsids to determine the processes into which the DCodeSeg is loaded, and then updates each mapping in turn. Because the system lock must be held, this can affect the real-time performance of the system, and so the technique described in Section 4.1.3.5 is used.
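The aging transition can be sketched as a page-table-entry manipulation. KPtePresentMask is the name used in the text, but its value here, and these helper functions, are illustrative assumptions rather than the kernel's definitions.

```cpp
#include <cstdint>

// Assumed value for illustration only: low-order "present" type bits.
constexpr std::uint32_t KPtePresentMask = 0x3;

// Clear the present bits, as SetOld() does for a single mapping: the
// translation is kept, but the next access will fault.
inline std::uint32_t MakePteOld(std::uint32_t pte)
	{
	return pte & ~KPtePresentMask;
	}

// An entry is "old" if some mapping bits remain but the present bits
// are clear; an all-zero entry is not present at all (needs paging in).
inline bool PteIsOld(std::uint32_t pte)
	{
	return pte != 0 && (pte & KPtePresentMask) == 0;
	}
```
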
If the page's status changes before DoSetCodeOld() has modified the mappings in all processes, then DoSetCodeOld() simply ends. This is the right thing to do, because the page's status can change in one of two ways: either
by the page being rejuvenated (which I'll discuss next) or by the page being removed from the live list (ceasing to be demand paged). Both of these events logically supersede any page-aging operation.

Rejuvenating a Page

When a program accesses an old page, it generates a data abort because the MMU has marked these pages as inaccessible. The fault handler (MemModelDemandPaging::HandleFault()) deals with this using the following sequence of actions:
1. Get the MMU page table entry for the address that caused the abort. If the bits KPtePresentMask are clear, then this is an old page that needs rejuvenating. (If all bits are clear, then the page needs paging in instead.)
2. Find the SPageInfo for the page, using the physical address stored in the page table entry.
3. If the state of the page is EStatePagedDead, then change the page table entry to KPteNotPresentEntry and proceed to the paging-in operation (described in Section 4.1.6.2) instead of rejuvenating it. This is because a dead page is in the process of being removed from the live list and it should be treated as though it were not present.
4. Otherwise, update the page table entry to make the page accessible again.
5. Move the page's SPageInfo to the beginning of the live page list. This marks it as the youngest page in the system.
Similarly to paging in, we only update the page table entry for the current process. The whole rejuvenation operation is performed with the system lock held.

Freeing a Page

When a physical page of RAM that holds demand-paged code is needed for other purposes, it must be freed by calling MemModelDemandPaging::SetFree(). This mainly involves setting all page table entries that refer to the page to KPteNotPresentEntry. However, again the multiple memory model needs to update many page table entries, so Symbian factored this implementation out into a separate method called DoSetCodeFree(). Unlike the rejuvenation code, this method does not
need to pay attention to whether pages are changed while it is processing them. This is because all pages that are being freed have their state set to SPageInfo::EStatePagedDead first. This prevents other parts of the demand-paging implementation from changing the page.
The free operation is performed with the system lock held. However, as is the case for the aging code, DoSetCodeFree() flashes the system lock while freeing, and this makes it possible for the code segment to be unloaded and its DMemModelCodeSegMemory destroyed while these data structures are being used. To prevent this situation, SetFree() must be called with the RAM-allocator mutex held. As the destructor for DMemModelCodeSegMemory also acquires this mutex during its operation, it cannot complete while one of its (former) RAM pages is in the process of being freed.
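The lifecycle rules for aging, rejuvenation and freeing can be summarized as a small state machine. This is a hedged model of the behavior described above, not kernel code; only the state names are taken from the text.

```cpp
// Simplified model of a demand-paged page's lifecycle.
enum class PageState { Young, Old, Dead, Freed };

struct PagedPage
	{
	PageState state = PageState::Young;

	// Aging: a young page becomes old (its mappings are made inaccessible).
	void SetOld()
		{
		if (state == PageState::Young)
			state = PageState::Old;
		}

	// Rejuvenation: an access to an old page makes it young again.
	// A dead page must be treated as not present, so it stays dead
	// and the caller takes the paging-in path instead.
	bool Rejuvenate()
		{
		if (state == PageState::Dead)
			return false;
		state = PageState::Young;
		return true;
		}

	// Freeing always marks the page dead first, which blocks any
	// concurrent rejuvenation, and only then releases the RAM.
	void SetFree()
		{
		state = PageState::Dead;  // other paging code now leaves it alone
		state = PageState::Freed; // page table entries cleared, RAM released
		}
	};
```
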
4.1.7 ROM Paging
The kernel demand-pages the contents of ROM in a similar way to code paging, but ROM has several attributes that make the implementation much simpler, namely:
• The virtual address and the size of the ROM are fixed and known at boot time, so it is a trivial matter to determine whether a particular memory access occurred in demand-paged memory or not.
• The ROM cannot be unloaded, so the kernel does not need to guard against as many race conditions.
• The ROM is globally mapped by the MMU, so even in the multiple memory model there is only a single MMU mapping that needs updating when the demand-paging subsystem manipulates pages of RAM.
ROM paging was first supported in Symbian OS v9.3.

ROM Format

When the ROMBUILD tool generates ROM images for demand paging, it divides the contents of the ROM into two sections. All unpaged content is placed in the first section, with paged content following it. The sizes of the two sections are stored in the ROM header – the unpaged part occupies ROM offset zero through TRomHeader::iPageableRomStart, and the paged part occupies ROM offset TRomHeader::iPageableRomStart through
TRomHeader::iUncompressedSize. For each 4 KB page of ROM, there is an entry (SRomPageInfo) in the array TRomHeader::iRomPageIndex giving information about the storage location and compression used for that page's contents. This index is stored in the unpaged part of the ROM. If a ROM image is not to be demand paged, iPageableRomStart and iRomPageIndex are both zero.

Initialization

The first, unpaged, part of the ROM is loaded into RAM by the system's boot/core loader before any Symbian code is executed. Then, during kernel start-up, MemModelDemandPaging::Init3() (xmmu.cpp) initializes ROM paging. This method checks the ROM header information and allocates MMU page tables for the virtual memory region to which the ROM will be mapped.

Paging In a ROM Page

When paging in ROM, it is easy to find out which page is needed by performing pointer arithmetic on the virtual address being accessed. The storage information for this page can then be obtained from the array stored at TRomHeader::iRomPageIndex. Each entry in this array has this structure:
// e32rom.h
struct SRomPageInfo
	{
	enum TAttributes
		{
		EPageable = 1<<0
		};
	enum TCompression
		{
		ENoCompression,
		EBytePair,
		};
	TUint32 iDataStart;
	TUint16 iDataSize;
	TUint8 iCompressionType;
	TUint8 iPagingAttributes;
	};
iDataStart gives the offset of the page's data from the start of the ROM, and iDataSize gives the number of bytes actually used. iCompressionType indicates how the data is compressed. This can be ENoCompression, in which case iDataSize is always 0x1000 (4 KB), or it may be EBytePair, in which case iDataSize indicates the size of the compressed data. iPagingAttributes contains EPageable to indicate that the page is demand paged, or zero if it is not. (There are entries in iRomPageIndex for the unpaged part of the ROM, even though the paging system doesn't access them. This allows them to be used by the boot/core loader.)
As the ROM image is always stored in its own partition, starting at storage block zero, these attributes are all that is required to locate, read and decompress the data for a ROM page.
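Putting this together, locating a ROM page's data can be sketched as follows. SRomPageInfo mirrors the declaration above (in standard C++ types), while the lookup helper, ROM base address and index contents are illustrative assumptions.

```cpp
#include <cstdint>
#include <vector>

// Mirror of the SRomPageInfo layout shown above, using standard types.
struct SRomPageInfo
	{
	enum { EPageable = 1 << 0 };
	enum { ENoCompression, EBytePair };
	std::uint32_t iDataStart;
	std::uint16_t iDataSize;
	std::uint8_t  iCompressionType;
	std::uint8_t  iPagingAttributes;
	};

constexpr std::uint32_t kRomPageSize = 0x1000;

// Given the faulting virtual address, the fixed ROM base and the page
// index array, find where the page's (possibly compressed) data lives.
inline const SRomPageInfo& LookupRomPage(std::uint32_t faultAddr,
                                         std::uint32_t romBase,
                                         const std::vector<SRomPageInfo>& index)
	{
	std::uint32_t pageNumber = (faultAddr - romBase) / kRomPageSize;
	return index[pageNumber];
	}
```
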
4.1.8 Blockmap Data Structures
When a thread takes a fault in paged code, the kernel must load the data corresponding to that page from the relevant file on the media drive. Because the kernel doesn’t use the file server for this, there needs to be a way for it to work out which parts of the media to read. This calculation is complicated by several factors. The data may or may not be compressed, so its size is not necessarily the same as the size of the page. The data can start at any offset into the file, because the file contains a small header and any number of previous (possibly compressed) pages. And finally, the file itself may be split over multiple, discontinuous sectors on the media. What is needed is a representation of how the file is laid out on the media. We call this abstraction a ‘blockmap,’ because it is structured in terms of blocks that
roughly correspond to sectors on the media. It defines a logical-to-physical mapping for some portion of a file. There are two types of blockmap used in the demand-paging system: the user-side blockmap and the kernel blockmap.

User-Side Blockmap

The file system provides the user-side blockmap to the loader, which in turn passes it to the kernel. The user-side blockmap is defined by the SBlockMapInfoBase and TBlockMapEntryBase classes. It consists of a single context structure, SBlockMapInfoBase, that contains information relating to the blockmap as a whole, and a series of TBlockMapEntryBase structures describing the file layout. These structures are defined in e32ldr.h.

// e32ldr.h
struct SBlockMapInfoBase
	{
	TUint iBlockGranularity;
	TUint iBlockStartOffset;
	TInt64 iStartBlockAddress;
	TInt iLocalDriveNumber;
	};

class TBlockMapEntryBase
	{
public:
	TUint iNumberOfBlocks;
	TUint iStartBlock;
	};
The user-side blockmap describes a series of contiguous runs of blocks by one or more TBlockMapEntryBase structures. Each one holds the number of blocks in the run (iNumberOfBlocks) and the index of the first block in the run (iStartBlock). The SBlockMapInfoBase structure defines context that applies to the whole blockmap. The size of each block in bytes is given by iBlockGranularity. This will typically be the sector size for the media. The address of the first block on the partition is held in iStartBlockAddress – this is a byte offset from the
start of the partition. This is necessary because some file systems (for example, FAT) have data before the first sector, and the size of this data may not be a multiple of the block size. The blockmap does not have to start at offset zero in a file, and even if it does, the first byte of the file may not lie on a block boundary; so iBlockStartOffset is the offset of the first byte represented by the blockmap from the start of the first block given. Finally, iLocalDriveNumber indicates the media the file is stored on. It is a local drive number (rather than one of the more widely used drive numbers that correspond to letters of the alphabet), because media drivers only operate in terms of the underlying local drive number.

The Kernel Blockmap

The kernel blockmap is defined by the TBlockMap class in kblockmap.h. Its data members are shown as follows:

// kblockmap.h
class TBlockMap
	{
	struct SExtent
		{
		TInt iDataOffset;
		TUint iBlockNumber;
		};
	TInt iDataLength;
	TInt iExtentCount;
	SExtent* iExtents;
	// ...
	};
While the user-side blockmap is stored as a list of runs of blocks starting at a particular block, the kernel blockmap is stored as a list of logical file offsets that start at a particular block. The list is ordered, so the length of a run can be found by calculating the difference between successive file offsets in the list. The kernel creates the kernel blockmap from the user-side blockmap so that it can look up physical media locations efficiently, finding the block corresponding to a particular file offset via a binary search.
The kernel blockmap may have a different block size to its user-side equivalent, because the kernel needs to communicate with media drivers in terms of read units, which may be different to the sector size of the media. The read unit size is usually 512 bytes. The kernel blockmap provides a method that reads part of a file into memory. It takes a logical file offset and a size describing the area of the file to read, a buffer, and a function that performs the actual task of reading blocks from the media.
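The binary-search lookup performed on the kernel blockmap can be sketched like this. SExtent mirrors TBlockMap's inner structure (in standard C++ types), but the function itself is a model rather than the kernel implementation.

```cpp
#include <algorithm>
#include <vector>

// Mirror of TBlockMap::SExtent using standard types.
struct SExtent
	{
	int          iDataOffset;  // logical file offset where this extent starts
	unsigned int iBlockNumber; // read unit at which that offset is stored
	};

// Find the extent covering aOffset: the last entry whose iDataOffset
// is <= aOffset. The extent list is ordered by iDataOffset, so a
// binary search suffices.
inline const SExtent* FindExtent(const std::vector<SExtent>& extents, int aOffset)
	{
	auto it = std::upper_bound(extents.begin(), extents.end(), aOffset,
		[](int offset, const SExtent& e) { return offset < e.iDataOffset; });
	if (it == extents.begin())
		return nullptr; // offset precedes the mapped region
	return &*(it - 1);
	}
```
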
4.1.9 Code Segment Initialization
A code segment is a kernel data structure representing a unit of executable code, whether it is XIP or RAM loaded, or an executable, a DLL or a device driver. Code segments can be mapped into individual processes or they can be mapped globally, so that they appear in the address space of all processes. Code segments can be mapped into more than one process at a time, so that (in general) when multiple processes load the same library, only one code segment will be used. In this section, I will discuss the use of code segments for code-paged executables. All references to code segments will refer to non-global RAM-loaded code segments. When a new executable or DLL is loaded, a code segment object is created and initialized. This is a three-stage process. First the loader calls the kernel to create the code segment, then it loads the code into memory and fixes it up, and finally it calls the kernel a second time to complete initialization and indicate that the code can now be mapped into processes ready for execution. For a demand-paged code segment, the procedure is similar, except that the loader does not actually load the code itself. Instead, it provides the necessary information to the kernel, which performs load and fix-up on demand. The three stages are still necessary because the loader needs to access the code itself to generate the code-relocation table. I describe these stages in detail in the next section.
Code Segment Creation

The loader asks the kernel to create the code segment by calling E32Loader::CodeSegCreate(). This results in a call to DCodeSeg::Create(), which in turn calls DEpocCodeSeg::DoCreate() and ultimately DMemModelCodeSeg::DoCreateRAM().
E32Loader::CodeSegCreate() passes a TCodeSegCreateInfo structure, defined in e32ldr.h, to the kernel. This structure is also passed to E32Loader::CodeSegLoaded(), which is called in the third stage. The data members relevant to code paging are:

class TCodeSegCreateInfo
	{
	TUint32* iCodeRelocTable;
	TInt iCodeRelocTableSize;
	TUint32* iImportFixupTable;
	TInt iImportFixupTableSize;
	TUint32 iCodeDelta;
	TUint32 iDataDelta;
	TBool iUseCodePaging;
	TUint32 iCompressionType;
	TInt32* iCodePageOffsets;
	TInt iCodeStartInFile;
	TInt iCodeLengthInFile;
	SBlockMapInfoBase iCodeBlockMapCommon;
	TBlockMapEntryBase* iCodeBlockMapEntries;
	TInt iCodeBlockMapEntriesSize;
	RFileClamp iFileClamp;
	};
The location of the executable file on the media is given by a user-side blockmap, which is made up of iCodeBlockMapCommon, iCodeBlockMapEntries and iCodeBlockMapEntriesSize. The position of the text section in the executable file is given by iCodeStartInFile and iCodeLengthInFile. If the text section is compressed, iCompressionType is set to the unique identifier (UID) of the compression type. Otherwise, it is set to KFormatNotCompressed. Symbian OS v9.4 supports only byte-pair compression (KUidCompressionBytePair). Individual pages of code are compressed
independently, and iCodePageOffsets is used to pass a look-up table of the offsets from the start of the text section to the compressed data for each page. The length of this table is calculated from the size of the code.
A table of code-relocation information is passed in iCodeRelocTable, with its size indicated by iCodeRelocTableSize. Similarly, import fix-up information is passed in iImportFixupTable and iImportFixupTableSize. iCodeDelta and iDataDelta contain the offsets to be added to code and data relocations. These are not valid during the initial call to CodeSegCreate(), but are filled in later when CodeSegLoaded() is called.
The loader passes iFileClamp to stop the file being deleted while it is being used for paging, and sets the flag iUseCodePaging to tell the kernel to page this code segment.
When the kernel creates a non-paged code segment, its main task is to allocate RAM for the code and map it into the fileserver process (in which the loader thread runs). This is done using conventional means – in the moving memory model, the kernel commits memory to the global code chunk, and in the multiple memory model, it allocates physical RAM pages and then maps them into the fileserver process's code chunk.
For a demand-paged code segment, the kernel must allocate not physical RAM but address space in the appropriate chunk. This is done using a new commit type, ECommitVirtual. This marks part of the chunk's address space as used, so that nothing else may be committed there.
For non-paged code segments, the loader would load any static data directly following the code in memory. This is not possible with code paging, because the kernel loads code on demand but the loader loads the data. So the kernel allocates memory, starting at the page after the end of the code, ready for the loader to load the data into.
As well as committing address space for the code, the kernel must initialize its internal data structures and read all the relevant parts of TCodeSegCreateInfo.
It copies the code page offsets table to the kernel heap, and creates a kernel blockmap from the user-side version passed. Finally, the code segment
is marked as paged – from now on, any read access will cause code to be loaded from media and decompressed, although no relocation or fix-up will be performed yet.

Loading of Code and Data

For a non-paged code segment, this is when the code itself is loaded. The loader reads the text and data sections and decompresses them into the RAM allocated by the kernel in the previous stage. It relocates the code, loads dependencies and fixes up imports.
For a paged code segment, the loader does not need to load the code, because the kernel handles this later, when a fault is taken. This means that the kernel must do the relocations and fix-ups, so the loader must now generate the data the kernel will need to perform these operations. The loader does load static data now, to the address allocated by the kernel, and then loads dependencies in the usual way. The loader then calls back to the kernel, passing the relocation and fix-up tables in the TCodeSegCreateInfo structure.

Finalization of Loaded Code Segment

The loader calls E32Loader::CodeSegLoaded(), which results in a call to the kernel method DMemModelCodeSeg::Loaded().
For a non-paged code segment, all that the kernel has to do at this stage is the cache maintenance necessary to ensure that the instruction cache is up to date with the contents of RAM. It also unmaps the code segment from the fileserver process.
For a paged code segment, the kernel reads the relocation and import tables and stores them on the kernel heap, as it does the initial static data and the export table (both part of the text section). It then frees the memory allocated for the loader to write the static data into. Any page that has already been loaded (because the loader accessed it when compiling the relocation table) is fixed up. Cache maintenance happens as before, and the code segment is unmapped from the file server.
4.1.10 Page Locking
In Symbian OS v9.3, kernel-side code must, at times, be able to access memory without the risk of generating paging faults – for example, media drivers used by the paging system must not cause faults when they are used for normal file-system access, otherwise deadlock would result. (The driver would be called to read the demand-paged data it was trying to access itself!) For situations such as this, the kernel provides the DDemandPagingLock class for temporarily locking (pinning) demand-paged contents so that they will not be paged out.

class DDemandPagingLock : public DBase
	{
public:
	// Reserve memory so that this object can be
	// used for locking up to aSize bytes.
	IMPORT_C TInt Alloc(TInt aSize);

	// Perform Unlock(), then free the memory reserved by Alloc().
	IMPORT_C void Free();

	// Ensure all pages in the given region are present and lock
	// them so that they will not be paged out. If the region
	// contained no demand-paged memory, then no action is performed.
	// This function may not be called again until the previous
	// memory has been unlocked.
	IMPORT_C TBool Lock(DThread* aThread, TLinAddr aStart, TInt aSize);

	// Unlock any memory region which was previously locked with Lock().
	inline void Unlock();

	IMPORT_C DDemandPagingLock();
	inline ~DDemandPagingLock() { Free(); }
	};
Initialization (Reserving Memory)

When kernel-side code locks demand-paged memory, there must be sufficient free RAM into which to load the contents of that memory. But, usually, in situations in which we wish to lock memory, the operation must not fail due to out-of-memory conditions. Because of this, the DDemandPagingLock object provides the Alloc() method, which reserves memory for later use. Any code that needs to lock pages should create a DDemandPagingLock object and allocate memory during its initialization phase. It can later lock this memory without risk of failure.
To avoid wasting reserved memory when it is not being used, the kernel does the reservation by increasing the minimum size of the live list (iMinimumPageCount) by the number of reserved pages (iReservePageCount). The kernel calls DemandPaging::ReserveAlloc() to do this. After this, because the live list is now larger than it would otherwise have been, more demand-paged and file-cache content can reside in RAM, so the memory reserved for locking is being put to good use until it is needed.
So, at any one time, a page is in one of the following states:
• young
• old
• transitioning (dead)
• locked

Locking Memory

To lock memory ready for safe access, kernel-side code calls DemandPaging::LockRegion(), specifying a region of virtual memory in a given thread's process. This method first checks whether the region can contain demand-paged memory, and immediately returns false if it can't. This provides very fast execution for typical use cases. (The only memory that can be demand paged is code and constant data in executable images, and it is very unlikely that an application would ask a device driver to operate on this sort of data.)
If the memory region to be locked does reside in a pageable area – that is, it is a ROM or code chunk – then the method goes on to call LockPage() repeatedly for each page in the region. LockPage() calls EnsurePagePresent() to page the memory in (if it is not
already present), then examines the SPageInfo for the page to determine its type. If the page is pageable and on the live list, then it is removed from the live list and its state is changed to EStatePagedLocked with a lock count of one. If the page was already locked, then the lock count is simply incremented. Because locked pages are not on the live list, they will not be selected for paging out.

Locking Large Regions

When locking memory, the system must reserve it ahead of time. This means that the kernel-side code doing the locking must know the maximum amount of RAM that it will need. But in practice, this size is often not known, is unbounded or is deemed too large to reserve in advance. In situations such as these, operations on memory must be broken down into smaller fragments, so that only a small region of memory is locked at any one time.
Symbian provides a way to optimize this fragmentation: DDemandPagingLock::Lock() returns a truth value indicating whether the memory is in a pageable region. This means that in the typical case, where this is false, no fragmentation is necessary. The following code example demonstrates this, but omits error checking for brevity.

const TInt KMaxFragmentSize = 0x8000;

// one-time initialisation...
DDemandPagingLock* iPagingLock = new DDemandPagingLock;
iPagingLock->Alloc(KMaxFragmentSize);

// example function which locks memory in fragments...
void DoOperation(TUint8* buffer, TInt size)
	{
	while(size)
		{
		TUint8* fragmentStart = buffer;
		do
			{
			TInt lockSize = Min(size, KMaxFragmentSize);
			TBool locked = iPagingLock->Lock(&Kern::CurrentThread(),
			                                 (TLinAddr)buffer, lockSize);
			// expand fragment...
			buffer += lockSize;
			size -= lockSize;
			if(locked)
				break; // lock used so now process fragment
			}
		while(size);
		// process fragment...
		TInt fragmentSize = buffer - fragmentStart;
		DoOperationOnFragment(fragmentStart, fragmentSize);
		iPagingLock->Unlock();
		}
	}
Note that although the previous code makes use of the DDemandPagingLock API, as it should, the actual work is done by the DemandPaging object.
4.1.11 Chunk APIs
The kernel makes memory available to applications via chunk objects (class RChunk). These represent a region of virtual address space, reserved upon creation, into which physical memory may be committed or de-committed. The granularity of this allocation is governed by the MMU page size – this is 4 KB on all current CPU architectures. Various changes have been made to the RChunk API in support of demand paging.

Virtual Memory Commit

New RChunk APIs have been added to allow virtual memory to be committed and de-committed. This reserves address space in the chunk, but does not allocate physical RAM or change any memory mappings. The paging subsystem uses these APIs to reserve address space for code that will be paged in on use. A new commit type ECommitVirtual has been added, and the DMemModelChunk::Decommit() method has been expanded to take a TDecommitType argument, which allows the caller to specify EDecommitVirtual. These new APIs are present in non-demand-paged systems, but they are never called there.
Additions to Disconnected Chunk API

The existing API for disconnected chunks is briefly summarized as follows:

RChunk::CreateDisconnectedLocal(TInt aInitialBottom, TInt aInitialTop,
                                TInt aMaxSize, TOwnerType aType);
This API creates a chunk that is local (that is, private) to the process creating it. The size of the reserved virtual address space is given by aMaxSize. Memory may be committed at creation by setting appropriate values for aInitialBottom and aInitialTop.

RChunk::Commit(TInt aOffset, TInt aSize);
Commit (allocate) aSize bytes of memory at position aOffset within the chunk.
RChunk::Decommit(TInt aOffset, TInt aSize);
Decommit (free) aSize bytes of memory at position aOffset within the chunk.
New Lock/Unlock APIs
In the past, physical RAM could be in one of two states – committed and owned by a thread or process, or uncommitted and owned by the kernel. To support file caching, Symbian has introduced a new, intermediate, state – ‘unlocked.’ The kernel may de-commit this memory whenever it needs to, but a thread may relock the memory, which returns it to the committed state with its contents preserved. Two new methods have been added for disconnected chunks:
RChunk::Unlock(TInt aOffset, TInt aSize);
This method places aSize bytes of memory at position aOffset within the chunk in the unlocked state. While the memory is in this state, the system is free to reclaim it for other purposes, so its owner must no longer access it. Both aSize and aOffset should be a multiple of the MMU page size.
Demand Paging on Symbian
RChunk::Lock(TInt aOffset, TInt aSize);
This method reverses the operation of Unlock() for aSize bytes of memory at position aOffset. It returns the RAM to the fully committed state, which means its owner can access it once more, and its contents are exactly as they were when Unlock() was originally called. Both aSize and aOffset should be a multiple of the MMU page size. However, if in the interim the system has reclaimed any RAM in the region, then the method fails with KErrNotFound and the region remains in the unlocked state. RChunk::Commit(TInt aOffset, TInt aSize);
The existing Commit() function has been modified to operate on a region of the chunk that has been unlocked. It performs the lock operation on all pages that have not been reclaimed by the system and allocates new pages where the previous ones have been reclaimed. After this operation, the contents of the region are undefined. This change was made to optimize situations in which a virtual address region is to be reused for new cache contents. It avoids the extra overhead involved in separate Decommit() and Commit() operations, which could be significant because any new RAM allocated to a chunk must be wiped clean first, to avoid security issues with data leakage between processes.
RChunk::Decommit(TInt aOffset, TInt aSize);
The existing Decommit() function has been modified so that it also returns any unlocked memory in the region back to the system.
4.1.12 File-System Cache Support
Support for file-system caching first appeared in Symbian OS v9.4. It is built upon the disconnected chunk, which allows physical RAM to be committed and de-committed anywhere within the virtual address space of the chunk, with no requirement that it is contiguous. Most of the changes to the disconnected chunk mentioned above were made in support of file-system caching.
File-System Cache in Use
Here is a brief overview of the interactions the file server makes with the chunk API to implement caching:
Initial file read request
1. File read request received by file server.
2. File data determined not to be in cache.
3. Cache buffers allocated using RChunk::Commit().
4. File system reads data from media into cache.
5. Some data copied from cache to client and read request completed.
6. Cache buffer is unlocked with RChunk::Unlock().
At the end of this operation, the pages used for this cache are placed at the front of the live list because they are now the youngest pages in the system. As demand-paging or other file-system caching operations occur, the cache pages will grow older and move down the live list.
Subsequent file read request (cache hit)
1. File read request received by file server.
2. File data determined to be in cache.
3. Cache buffer locked with RChunk::Lock(). This removes the buffer’s pages from the live list.
4. Some data copied from cache to client and read request completed.
5. Cache buffer unlocked again with RChunk::Unlock().
At the end of this operation, the pages used for this cache are back on the live list as the youngest pages in the system. The cached data for the file will stay in RAM for as long as it is accessed sufficiently often to remain in the live list. It will only be lost from the live list if there is no free memory in the RAM allocator and all the other memory used for paging and caching has been more recently accessed than it has (that is, when these cache pages have become the oldest pages in the system).
Subsequent file read request (cache miss)
If the cache pages are removed from the live list, then the file-system cache will behave like this:
1. File read request is received by file server.
2. File data determined to be in cache.
3. Cache buffer locked with RChunk::Lock(). This returns an error because the cache pages have been reclaimed by the kernel.
4. Cache buffer re-allocated with RChunk::Commit().
5. File system reads data from media into cache.
6. Some data copied from cache to client and read request is completed.
7. Cache buffer unlocked with RChunk::Unlock().
File-system caching uses a ‘backing off’ algorithm, so that when cache contents are lost, the size of the memory being used for caching is reduced, reducing the likelihood of future cache misses.
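The three flows above can be modeled as a small state machine for a cache page: committed, unlocked (reclaimable), or reclaimed. The sketch below is illustrative standard C++, not the real RChunk code; a failing Lock() stands in for the KErrNotFound return on a cache miss.

```cpp
#include <map>

// Toy model of the states a file-cache page moves through under the
// chunk API. Names and structure are illustrative only.
enum class PageState { Free, Committed, Unlocked };

class ToyChunk {
    std::map<int, PageState> pages_;  // page index -> current state
public:
    // Commit(): allocate the page (cache buffer allocation/re-allocation).
    void Commit(int page) { pages_[page] = PageState::Committed; }

    // Unlock(): make the page reclaimable but keep its contents for now.
    void Unlock(int page) {
        if (pages_[page] == PageState::Committed)
            pages_[page] = PageState::Unlocked;
    }

    // Lock(): re-commit an unlocked page; fails if it was reclaimed.
    bool Lock(int page) {
        if (pages_[page] != PageState::Unlocked) return false;
        pages_[page] = PageState::Committed;
        return true;
    }

    // The kernel may reclaim any unlocked page when RAM runs short.
    void SystemReclaim(int page) {
        if (pages_[page] == PageState::Unlocked)
            pages_[page] = PageState::Free;
    }
};
```

A cache hit is Lock() succeeding; a cache miss is Lock() failing, after which the buffer must be re-allocated with Commit().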
4.1.13 Byte-Pair Compression
The default ‘deflate’ compression algorithm used in Symbian only allows compressed data to be decompressed in one whole block. This is fine when decompressing a complete XIP ROM image or a whole executable, but it is not acceptable for demand paging where only a single page of an image/executable needs to be decompressed during a page-in event. To support demand paging, Symbian introduced the ‘byte-pair’ compression algorithm in Symbian OS v9.3, which allows data to be compressed and decompressed in individually addressable blocks. Each decompressed 4 KB block can then be mapped directly to a page of RAM. (Symbian can only demand-page data that is byte-pair-compressed or uncompressed.) Another reason to use byte-pair compression is its fast decompression time. Decompressing byte-pair-compressed data is approximately twice as fast as decompressing deflate-compressed data. Compression time is slower but this is not a concern because compression usually occurs at build-time and performance here is less critical than performance at run-time. (A possible exception to this is flash over the air (FOTA) functionality, which may require run-time compression of data.)
The compressed size of byte-pair-compressed data is around 10%-20% larger than deflate-compressed data due to the additional administrative overhead required for each 4 KB block. Where performance is more important than ROM size, it may be better to use byte-pair compression instead of deflate compression, even outside the context of demand paging.
Byte-pair compression can be applied while building executables or during ROM building. It is implicitly applied as appropriate when using the demand paging keywords (discussed further in Section 7.15). The Symbian Developer Library documentation at developer.symbian.org explains how to apply this compression method explicitly, if you need it for purposes other than demand paging.
Support for byte-pair compression has been part of the loader since Symbian OS v9.2. It is only possible to demand-page data that has either been byte-pair compressed or is uncompressed. Byte-pair decompression is implemented in the method BytePairDecompress(). The loader calls this method on byte-pair compressed, non-paged executables, and the kernel calls it on similar demand-paged code.
Algorithm
The input stream is compressed one 4 KB block at a time. This block size was chosen to match the MMU page size, enabling the paging subsystem to decompress each page as needed. Fortuitously, this seems to be the optimum block size for compression efficiency. In some cases, the block size may be less than 4 KB – for example, when the last block in a compression stream is ‘short.’ The compression algorithm is as follows:
1. Find the least frequently occurring byte, X. This will be used as the escape byte.
2. Replace all occurrences of X with the pair of bytes {X,X}.
3. Find the new least frequently occurring byte, B.
4. Find the most frequently occurring pair of consecutive bytes, {P,Q}.
5. Check for the terminating condition.
6. Replace all occurrences of B with {X,B}.
7. Replace all occurrences of {P,Q} with token B.
8. Go to step three.
When calculating frequencies in steps three and four, exclude those that involve any bytes from the escaped form {X,?}. When calculating frequencies in step four, do not count overlapping pairs that occur in a repeated byte sequence. So the sequence of five identical bytes (P,P,P,P,P) contains two pairs and a singleton ({P,P},{P,P},P) – not four pairs.
The terminating condition in step five checks whether the substitutions performed in steps six and seven would yield any further compression. This depends on the storage format for the compressed data (discussed in the next section) and is calculated as follows:
1. Let f(B) be the frequency of byte B, and f(P,Q) the frequency of the pair {P,Q}.
2. If the number of byte-pair tokens created so far is <32, then terminate if f(P,Q) - f(B) <= 3.
3. If the number of byte-pair tokens created so far is >=32, then terminate if f(P,Q) - f(B) <= 2.
Storage Format
The compressed data for each block is stored in one of three forms, depending on the number of tokens created during compression:
0 Tokens
When the compression hasn’t performed any token substitutions, the ‘compressed’ data for the block is stored in the format shown in Table 1.
Table 1
Values   Meaning
0x00     One byte token count value (zero)
...      Bytes of uncompressed data
1 to 31 tokens
When the compression step produced between one and 31 token substitutions, the compressed data for the block is stored in the format shown in Table 2.
Table 2
Values                           Meaning
N                                One byte token count value N
X                                One byte with the value of the escape character X
{B0,P0,Q0}...{BN-1,PN-1,QN-1}    N times 3 bytes of token/pair values {B,P,Q}
...                              Bytes of compressed data
32 to 255 tokens
When the compression step produced between 32 and 255 token substitutions, the format is modified to store the token values in a bit vector, as shown in Table 3.
Table 3
Values                   Meaning
N                        One byte token count value N
X                        One byte with the value of the escape character X
(32-byte bit vector)     32 bytes containing a vector of 256 bits. The least significant bit of the first byte is index zero. A set bit indicates that the corresponding token value B is used and has its substitution pair {P,Q} as follows.
{P0,Q0}...{PN-1,QN-1}    N times two bytes of pair values {P,Q}. These are stored in ascending order of the token values that represent them.
...                      Bytes of compressed data
4.1.14 Debugger Breakpoint Support
The kernel provides the DebugSupport::ModifyCode() method to allow debuggers to modify the contents of code to implement breakpoints, and this method has been updated to work as expected when demand paging is in use. Each breakpoint now makes use of the new DDemandPagingLock object to force paged memory to be loaded and locked before inserting breakpoints. This prevents the paging system from discarding the page in question and losing the breakpoint. When a breakpoint is removed, the kernel unlocks the page and demand pages it as usual. The code to create shadow pages now makes use of the new M::LockRegion() and M::UnlockRegion() methods to force demand-paged ROM to be loaded into memory before the page is shadowed.
4.2 Media Driver Support 4.2.1 Media Drivers
Since they form the interface to the backing store for the paging system, media drivers are of prime importance. I shall discuss the modifications you will need to make to them if you are enabling demand paging for the first time in Section 5.4. Here, I will limit myself to a brief overview of the new classes provided by the kernel in support of media drivers and demand paging.
4.2.2 Paging Device APIs
Media drivers must now implement and expose a new API, DPagingDevice, which allows the kernel to access storage media in support of demand paging. This class provides methods for reading data from the storage media. It also supplies media metrics. Media drivers that support paging should register themselves with the paging system by calling Kern::InstallPagingDevice() during system boot. If demand paging is to operate for all user-side code, media drivers must be created and installed via kernel extensions, rather than waiting until they are loaded
by EStart. To support this, Symbian has added two new APIs to enable device driver creation from kernel-side code – Kern::InstallLogicalDevice() and Kern::InstallPhysicalDevice(). The DPagingDevice class is shown as follows:
// kernel.h
class DPagingDevice : public DBase
    {
public:
    enum TType  // The type of device this represents.
        {
        ERom  = 1<<0,  /**< Paged ROM device type. */
        ECode = 1<<1   /**< Code paging device type. */
        };
    virtual TInt Read(TThreadMessage* aReq, TLinAddr aBuffer,
                      TUint aOffset, TUint aSize, TInt aDrvNumber) = 0;
public:
    TUint32 iType;
    TUint32 iDrivesSupported;
    const char* iName;
    TInt iReadUnitShift;
    TInt iDeviceId;
    };
The read method is called by the paging system to read data from the media represented by this device. This method should read aSize bytes of data from the media, starting at offset aOffset, and store them in the buffer aBuffer. iType is the type of paging device: ERom or ECode. iDrivesSupported tells the system which local drives are supported for code paging. It is a bitmask containing one bit set for each local drive supported, where the bit set is 1 << the local drive number. If this device does not support code paging, iDrivesSupported should be zero.
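The bitmask and shift conventions above can be sketched with two small helpers. These are my own illustrative functions in standard C++, not Symbian APIs.

```cpp
#include <cstdint>
#include <initializer_list>

// iDrivesSupported: one bit per supported local drive, bit = 1 << drive.
uint32_t DrivesMask(std::initializer_list<int> drives) {
    uint32_t mask = 0;
    for (int d : drives) mask |= 1u << d;
    return mask;
}

// iReadUnitShift: log2 of the read unit size; e.g. 512-byte NAND pages
// give a shift of nine.
int ReadUnitShift(uint32_t readUnitBytes) {
    int shift = 0;
    while ((1u << shift) < readUnitBytes) ++shift;
    return shift;
}
```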
iName is a zero-terminated string representing the name of the device. This is only used for debug tracing purposes.
iReadUnitShift is the Log2 of the read unit size. A read unit is the number of bytes that the device can optimally read from the underlying media. For example, for small-block NAND, a read unit would be equal to the page size, 512 bytes, and iReadUnitShift would be set to nine.
iDeviceId is the value, chosen by the kernel, that the device should use to identify itself.
Paging Request Objects
When loading a page from media, the kernel needs a buffer into which the media driver will load the page’s data, and some virtual address space into which it can temporarily map the new page, before copying (and possibly decompressing) the data from the buffer. The kernel can’t rely on allocating a buffer every time a page fault occurs, because of performance issues and the possibility that there might not be enough free RAM. Accordingly, these resources are packaged up into paging request objects, and a fixed number of them are allocated when the system is initialized.
The paging request object is implemented by the DemandPaging::DPagingRequest class, declared in demand_paging.h. The relevant parts of the class declaration are:
class DemandPaging : public RamCacheBase
    {
    ...
    // Resources needed to service a paging request.
    class DPagingRequest : public SDblQueLink
        {
    public:
        ~DPagingRequest();
    public:
        TThreadMessage iMessage;
        DMutex* iMutex;
        TInt iUsageCount;
        TLinAddr iBuffer;
        TLinAddr iLoadAddr;
        TPte* iLoadPte;
        };
    TUint iPagingRequestCount;
    DPagingRequest* iPagingRequests[KMaxPagingRequests];
    SDblQue iFreeRequestPool;
    TInt iNextPagingRequestCount;
    ...
    };
Overview
When a thread takes a paging fault, it acquires a paging request object and maps a new page of RAM at the temporary address (iLoadAddr). It then calls the media driver to load the data into the request object’s buffer (iBuffer) and copies or decompresses the data from it into the page at iLoadAddr. Finally, the request object is released and the new page mapped in to the correct location in the memory map. This process is implemented in DemandPaging::ReadRomPage() and DemandPaging::ReadCodePage() in mmubase.cpp.
Concurrency
Since request objects are only created at boot, the number of them limits the number of paging faults that can be processed concurrently. Because there can be many threads taking paging faults at the same time, some care is needed to co-ordinate access to request objects. One problem that must be avoided is priority inversion, in which a high-priority thread waits to get its paging fault processed because low-priority threads are holding all the request objects. To avoid this, Symbian added a mutex (iMutex) and a usage count (iUsageCount) to each request object. The kernel maintains a pool of free request objects (iFreeRequestPool) as well as an array of all the request objects in the system (iPagingRequests). When a thread tries to acquire a request object, the kernel checks the free pool first. If it is non-empty, an unused request object is taken from the pool. If not, the
kernel selects a request object at random. The kernel increments the request object’s usage count and makes the thread wait on its mutex. When a thread releases a request object, the kernel signals the object’s mutex and decrements its usage count. Once the count reaches zero, the kernel places the object back in the free pool. Both these operations occur with the system lock held, to synchronize access to the relevant data structures. They are implemented in DemandPaging::AcquireRequestObject() and ReleaseRequestObject() in mmubase.cpp. This scheme distributes free objects while they are available, and then makes threads queue for a randomly selected object. Use of a mutex for this purpose provides priority inheritance, avoiding the issue of priority inversion. Random selection mitigates the possibility of pathological behavior. A media driver may also need to queue multiple incoming requests, and so we added a TThreadMessage member (iMessage) to DPagingRequest, allowing these request objects to be placed on a message queue and used in inter-thread communications.
Initialization
During initialization, the kernel creates a fixed number of paging request objects per paging device in the system. This is set by the constant KPagingRequestsPerDevice, which is (at the time of writing) two. This number was chosen because media drivers cannot issue more than one request to hardware at a time, and if we have two objects, then one thread can be waiting for data to be read in and another thread can be decompressing data. Adding more request objects would only result in more threads waiting for data to be read. The request objects are created by DemandPaging::CreateRequestObject(), defined in mmubase.cpp. This is called repeatedly from DemandPaging::InstallPagingDevice() when a paging device is installed. CreateRequestObject() creates a single request object. It allocates an ID for the object (by atomically incrementing iNextPagingRequestCount with NKern::LockedInc()).
It allocates the object’s buffer from a chunk used for
the purpose, basing the offset on the ID. It calls down to the memory-model-specific code to allocate the temporary virtual address (using AllocLoadAddress()), passing the ID. It creates the mutex. CreateRequestObject() also resizes the live list if necessary, to ensure that if every request object is in use (and the corresponding number of pages removed from the live list), then there are still enough pages on the live list to satisfy the system’s constraints. The ResizeLiveList() method makes use of iNextPagingRequestCount, which now holds the number of request objects that will be present in the system if this call to CreateRequestObject() succeeds. When the object has been initialized, CreateRequestObject() adds it to the free pool and request object array, and increments the request object count (iPagingRequestCount). This is done with the system lock held to serialize access to the data structures. No locking is necessary for the rest of the function because the system will only use the new request object when iPagingRequestCount is updated at the end.
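The acquire/release policy described above can be modeled with a toy pool: hand out free request objects while any remain, then queue callers on a randomly chosen object. This is an illustrative standard-C++ sketch only; the real kernel uses per-object mutexes (which provide priority inheritance) under the system lock.

```cpp
#include <cstdlib>
#include <vector>

// Toy model of the paging-request-object pool. Names are illustrative.
struct ToyRequest { int usageCount = 0; };

class ToyPool {
    std::vector<ToyRequest> all_;  // all request objects in the system
    std::vector<int> free_;        // indices of unused request objects
public:
    explicit ToyPool(int n) : all_(n) {
        for (int i = 0; i < n; ++i) free_.push_back(i);
    }
    int Acquire() {
        int idx;
        if (!free_.empty()) { idx = free_.back(); free_.pop_back(); }
        else idx = std::rand() % static_cast<int>(all_.size());  // random pick
        ++all_[idx].usageCount;
        return idx;
    }
    void Release(int idx) {
        // Back to the free pool only once the last user lets go.
        if (--all_[idx].usageCount == 0) free_.push_back(idx);
    }
    int FreeCount() const { return static_cast<int>(free_.size()); }
};
```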
4.3 File Server Changes 4.3.1 File Clamping
If a file system is to support code paging (first available in Symbian OS v9.3), it must implement file clamping. This is because when the kernel is demand paging read-only code, the content and location of the files it is paging from must remain unchanged throughout the duration of this paging. Symbian introduced file-clamping functionality to the file server in support of this. While a file is clamped, calls to methods that would otherwise modify its content are prevented from doing so and return the error code KErrInUse (-14). Similarly, if an attempt is made to perform a synchronous dismount of a file system mount while one of the files is clamped, the dismount will be prevented and KErrInUse will be returned. Attempts to perform an asynchronous dismount will be deferred until all clamps have been closed.
Only open, read-only3, non-empty files may be clamped. For each clamp, a handle is generated (encapsulated by the RFileClamp class) and returned to the user. Files may be clamped multiple times. A new handle is generated in each case. To close (remove) a clamp, the specific handle must be passed back to the file server (invalid handles lead to the error code KErrNotFound). It is only when all the clamps for a file have been closed that the file can be considered unclamped.
Operations Affected by File Clamping
The following list shows the APIs that are affected by file clamping.
RFs
• Replace()
• Delete()
• NotifyDismount() (with argument EFsDismountNotifyClients and EFsDismountForceDismount)
• AllowDismount()
• DismountFileSystem()
• SwapFileSystem()
RFile
• Replace()
• SetSize()
• Open() (with argument EFileWrite)
File System Support for File Clamping
The FAT and ROFS file systems both support file clamping (they provide a unique identifier for a file when clamping is requested). The composite and ROM file systems provide ‘pseudo’ support for file clamping – they always return a zero-value identifier for a file when clamping is requested. Because files cannot be modified on these file systems, nor can the mount be dismounted, clamping them is pointless.
3 To ensure that their contents are not changed.
Other file systems will show the default behavior. Requests for file clamping will return the error code KErrNotSupported.
4.3.2 Implementation and Operation of File Clamping
When a file is clamped, the mount generates a handle. This handle is encapsulated by the RFileClamp class:
// e32ldr.h
class RFileClamp
    {
public:
    inline RFileClamp()
        { iCookie[0] = 0; iCookie[1] = 0; }
    IMPORT_C TInt Clamp(RFile& aFile);
    IMPORT_C TInt Close(RFs& aFs);
public:
    TInt64 iCookie[2];
    };
The class keeps two cookies. The first, iCookie[0], is generated by the file system to which the file belongs and represents a unique identifier for the file. The second, iCookie[1], is made up of two 32-bit values: the drive number and a count value that is incremented by the file-server mount instance on creation of each new clamp. The RFileClamp class provides two APIs:
EXPORT_C TInt RFileClamp::Clamp(RFile& aFile)
Called on an uninitialized RFileClamp object. Clamps the supplied file, stores the cookie in the RFileClamp object and returns an error code.
EXPORT_C TInt RFileClamp::Close(RFs& aFs)
Unclamps the file that was clamped with this RFileClamp object. It is safe to call this function with a handle that was not successfully opened.
Classes and Basic Clamp Functionality
The bulk of the file clamping functionality is provided by the class CMountBody, a composite member of the file-server class that represents a file-system-independent mount, CMountCB. (If CMountBody has not been initialized, calls to file clamping functionality receive the return value KErrNotSupported.) Two key data members of CMountBody are:
RArray<RFileClamp> iClampIdentifiers;
TInt32 iClampCount;
iClampIdentifiers holds RFileClamp instances, ordered to facilitate searching, while iClampCount is a one-based count value that increments for each new clamp. When a user-side thread makes a request to clamp a file, CMountBody seeks a file-system specific unique identifier for the file and stores it in iCookie[0] of the RFileClamp instance. It then increments iClampCount and stores a 64-bit composite of the drive number and the incremented count in iCookie[1]. Finally, it inserts the RFileClamp instance into its rightful place in iClampIdentifiers and sets a flag for the mount to indicate that (at least) one file is clamped. When a request is made to unclamp a file, CMountBody searches iClampIdentifiers for an RFileClamp instance that matches both the value in iCookie[0] and the count value in iCookie[1]. If no matching instance is found, it returns KErrNotFound. If it does find a match, the RFileClamp instance is removed from iClampIdentifiers. If iClampIdentifiers is now empty, the flag for the mount is cleared and any pending dismount is instigated.
File System Functionality
To support file clamping, the file system of the clamped file must implement the CMountCB::MFileAccessor interface. This provides the method:
TInt GetFileUniqueId(const TDesC& aName, TInt64& aUniqueId)
The parameter aName is the name of the file and aUniqueId is the value returned by the file system. During initialization of the CMountCB object, its InitL() method is invoked, which executes these lines of code:
MFileAccessor* fileAccessor = NULL;
GetInterface(CMountCB::EFileAccessor, (void*&) fileAccessor, NULL);
iBody = new(ELeave)CMountBody(this, fileAccessor);
The GetInterface() call to the file system determines whether it provides support for the MFileAccessor interface. If not, the CMountBody is passed a NULL parameter for fileAccessor, and requests for clamping functionality receive the error code KErrNotSupported. Otherwise, the CMountBody is passed a pointer to the file-system mount object – the object on which the GetFileUniqueId() method is invoked.
Clamping and Deferred Dismount
To support deferred dismount, the CMountBody class includes the following member data:
TBool iDismountRequired;
TInt (*iCallBackFunc)(TAny*);
TDismountParams* iCallBackParams;
iDismountRequired indicates if there is a dismount pending. iCallBackFunc is a function to call when dismount may proceed. iCallBackParams is a list of parameters to pass to iCallBackFunc. These members are initialized when an asynchronous request to dismount fails because the mount has one or more clamps. The details of the request are stored for later and used to dismount when the last clamp is removed.
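The deferred-dismount bookkeeping can be sketched as follows: if clamps are outstanding, remember the dismount request and run it when the last clamp is removed. This is an illustrative standard-C++ sketch, not the CMountBody implementation.

```cpp
#include <functional>

// Toy model of deferred dismount on a clamped mount. Names are mine.
class ToyMount {
    int clamps_ = 0;
    std::function<void()> pendingDismount_;
public:
    void Clamp() { ++clamps_; }

    void Unclamp() {
        if (--clamps_ == 0 && pendingDismount_) {
            auto cb = pendingDismount_;
            pendingDismount_ = nullptr;
            cb();  // last clamp removed: instigate the deferred dismount
        }
    }

    // Returns true if the dismount ran immediately, false if deferred.
    bool DismountAsync(std::function<void()> cb) {
        if (clamps_ == 0) { cb(); return true; }
        pendingDismount_ = std::move(cb);
        return false;
    }
};
```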
4.3.3 RFile::BlockMap() API
In Symbian OS v9.3, a new API was added to provide a mechanism for retrieving a map of the logical sectors representing the file to be paged. The API enables code to access the paged file by first obtaining its blockmap, and then accessing the media directly. Each file on the media will consist of a number of groups of contiguous blocks. Each group is represented by its starting address, the media block size, the address of the first block on the media and the number of consecutive blocks in the group. If the file is not fragmented, an RFile::BlockMap() call will retrieve the blockmap for the whole file (or a specified part of it). If the file is fragmented, subsequent calls to RFile::BlockMap() will return the blockmap information for the next group of contiguous blocks in the file. RFile::BlockMap() will return KErrCompletion when it has reached the end of the requested part of the file.
Implementation
Symbian has added another new class. This one represents a group of contiguous blocks, and is called TBlockMapEntry. It contains the number of blocks in the group and the number of the first block in that group:
class TBlockMapEntry
    {
public:
    TBlockMapEntry();
    void SetNumberOfBlocks( TUint aNumberOfBlocks );
    void SetStartBlock( TUint aStartBlock );
public:
    TUint iNumberOfBlocks; // number of contiguous blocks in map
    TUint iStartBlock;     // number for first block in the map
    };
A container structure, SBlockMapInfo, describes a group of such blockmaps, and also carries information such as the media block size in bytes, the offset to start of the file or the requested file position within a block and the address of the first block on the media. This container structure is passed as a parameter to the following API to carry blockmap information:
TInt RFile::BlockMap(SBlockMapInfo& aInfo, TInt64& aStartPos, TInt64 aEndPos=-1, TInt aBlockMapUsage=EBlockMapUsagePaging);
Parameters:
SBlockMapInfo& aInfo:
A structure describing a group of blockmaps:
const TUint KMaxMapsPerCall=8;
typedef TBuf8<KMaxMapsPerCall*sizeof(TBlockMapEntry)> TBlockArrayDes;

struct SBlockMapInfo
    {
    TUint iBlockGranularity;    // size of a block in bytes
    TUint iBlockStartOffset;    // offset to start of the file or
                                // requested file position within a block
    TInt64 iStartBlockAddress;  // address of the first block of the file
    TBlockArrayDes iMap;
    };
iBlockGranularity is the size of the block for a given device in bytes. This field is only filled on the first call to RFile::BlockMap(). iBlockStartOffset is the offset into the first block of a file containing the start of the file, or the start of the section of the file that has been requested. This field is only filled on the first call to RFile::BlockMap(). iStartBlockAddress is the address of the first block of the file. This field is only filled on the first call to RFile::BlockMap(). iMap is a descriptor holding an array of up to KMaxMapsPerCall TBlockMapEntry entries.
If you don’t need the blockmap for the whole file, then you can specify a start and
Demand Paging on Symbian
end position for a section of the file (aStartPos, aEndPos). This is useful for demand paging, where only the executable section of a file will be needed. Both of these parameters specify offsets from the start of the file in bytes, and if they are not passed, the whole file is assumed. aBlockMapUsage is the reason for using the blockmap API, which is set to EBlockMapUsagePaging by default. The return value is one of the following:
• Until the end of the file or the file section is reached, KErrNone is returned.
• When the end of the file is reached, KErrCompletion is returned. In this case the length of iMap may be smaller than its maximum.
• An error code.
RFile::BlockMap() was implemented by extending the interface of CFileCB using the GetInterface() mechanism, and modifying the derivative classes of CFileCB for file systems that support blockmaps (CRofsFileCB and CFatFileCB). A new API, BlockMapReadFromClusterListL(), has been added to CFatFileCB. This is used to calculate physical addresses from FAT table entries. There is no ROFS equivalent function, as ROFS organizes all its data contiguously. The default implementation conveniently returns KErrNotSupported for all other file systems. Example:

RArray<SBlockMapInfo> map;
SBlockMapInfo info;
do
	{
	r = myFile.BlockMap(info, aStartPos, aEndPos);
	map.Append(info);
	} while (r == KErrNone);

TInt granularity;
for (TInt c = 0; c < map.Count(); c++)
	{
	granularity = map[c].iMap.Size()/KMaxMapsPerCall;
	TBlockMapEntry* myBlockMapEntry = (TBlockMapEntry*)map[c].iMap.Ptr();
	// Then read the contents of iMap using
	// the myBlockMapEntry pointer
	}
map.Close();
if (KErrCompletion != r)
	… // deal with error
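The arithmetic for turning a file offset into a media address from such a blockmap is worth sketching. The following is a plain C++ analogue (not Symbian code; the struct and function names here are invented for illustration) that walks the groups of contiguous blocks, using the block granularity and the start-of-file offset described above:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Plain C++ stand-in for TBlockMapEntry (illustrative only).
struct BlockMapEntry
	{
	uint32_t iNumberOfBlocks;	// contiguous blocks in this group
	uint32_t iStartBlock;		// media block number of the first block
	};

// Translate a byte offset within the (requested part of the) file into an
// absolute byte address on the media. aBlockStartOffset is the offset of
// the file data within the first block, as carried by SBlockMapInfo.
// Returns -1 if the offset lies beyond the mapped groups.
int64_t MediaAddress(const std::vector<BlockMapEntry>& aMap,
                     uint32_t aBlockGranularity,
                     uint32_t aBlockStartOffset,
                     int64_t aFileOffset)
	{
	int64_t pos = aBlockStartOffset + aFileOffset;	// position in the chain of groups
	for (const BlockMapEntry& e : aMap)
		{
		int64_t groupBytes = (int64_t)e.iNumberOfBlocks * aBlockGranularity;
		if (pos < groupBytes)
			return (int64_t)e.iStartBlock * aBlockGranularity + pos;
		pos -= groupBytes;
		}
	return -1;	// past the end of the mapped region
	}
```

For example, with 512-byte blocks, a 100-byte start offset and a first group of four blocks starting at block 10, file offset 0 maps to media byte 10*512 + 100 = 5220.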
This API will not be available to any process other than the file server. It is used in the loader (for demand paging) and in the loopback proxy extension. Appropriate security vetting by SID will be required on the file server.

Changes to File Systems

The API obtains the information it needs by calling into the file system. To ensure binary compatibility with earlier versions, the API uses the GetInterface() method of the CFileCB-derived object. This means that file systems that do not store files in a manner consistent with this API will not require any changes, because their current implementations will automatically return KErrNotSupported. If a file system does implement block-based storage, you will have to add a new interface and return it through the GetInterface() method. For example, CFileCB gets a new inline:

inline TInt CFileCB::BlockMap(SBlockMapInfo& aInfo, TInt64& aStartPos, TInt64 aEndPos = -1)
	{
	CBlockMapInterface* pM;
	TInt r = GetInterface(EBlockMapInterface, (TAny*&)pM, (TAny*)this);
	if (KErrNone != r)
		return r;
	return pM->BlockMap(aInfo, aStartPos, aEndPos);
	}
Supporting file systems such as FAT could then implement the interface like this:

class CFATBlockMapInterface : public CBlockMapInterface
	{
	...
	// override pure virtual
	TInt BlockMap(SBlockMapInfo& aInfo, TInt64& aStartPos, TInt64 aEndPos = -1);
	};
TInt CFatFileCB::GetInterface(TInt aInterfaceId, TAny*& aInterface, TAny* aInput)
	{
	switch (aInterfaceId)
		{
	case EBlockMapInterface:
		aInterface = (TAny*)&iBlockMapInterface;
		return KErrNone;
	//...
		}
	return KErrNotSupported;
	}
The same applies for ROFS. You need a CRofsBlockMapInterface class to inherit CBlockMapInterface and implement a BlockMap() function. Additionally, CRofsFileCB::GetInterface() needs to be EBlockMapInterface aware. Finally, the FAT file system itself would return the blockmap, using the mount object to walk the FAT table to obtain the requisite information.
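The pattern at work here – a catch-all GetInterface() on the base class, so that only file systems that opt in need change – can be sketched in ordinary C++. All class and constant names below are simplified stand-ins for the Symbian originals, with the real signatures elided:

```cpp
#include <cassert>

const int KErrNone = 0;
const int KErrNotSupported = -5;          // Symbian's value for this error
enum TInterfaceId { EBlockMapInterface = 1 };

// The base class offers a catch-all GetInterface(), so adding a new
// interface later does not break existing file systems: they inherit the
// default and automatically report KErrNotSupported.
class FileCB
	{
public:
	virtual ~FileCB() {}
	virtual int GetInterface(int /*aId*/, void*& /*aInterface*/)
		{ return KErrNotSupported; }
	};

class BlockMapInterface
	{
public:
	virtual ~BlockMapInterface() {}
	virtual int BlockMap() = 0;           // stand-in for the real signature
	};

// A FAT-like file object that opts in to the new interface.
class FatFileCB : public FileCB, public BlockMapInterface
	{
public:
	int GetInterface(int aId, void*& aInterface) override
		{
		if (aId == EBlockMapInterface)
			{
			aInterface = static_cast<BlockMapInterface*>(this);
			return KErrNone;
			}
		return KErrNotSupported;
		}
	int BlockMap() override { return KErrNone; }
	};

// Caller side: query for the interface and fall back cleanly if absent.
int QueryBlockMap(FileCB& aFile)
	{
	void* p = nullptr;
	int r = aFile.GetInterface(EBlockMapInterface, p);
	if (r != KErrNone)
		return r;
	return static_cast<BlockMapInterface*>(p)->BlockMap();
	}
```

The design choice being illustrated: the query goes through a single virtual function that already existed, so new interfaces can be added without changing the base class's virtual table layout.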
4.3.4 Loader Server Loading Code to be Paged
In Symbian OS v9.3 we have updated the loader so that it can load code-paged binaries. When loading code into RAM, the loader now checks whether the code should be paged. If so, it does not load, relocate or fix up the code, but instead collates extra information that it passes to the kernel, allowing the latter to perform these steps when each page is loaded. The loader also prevents the image file from being deleted or modified while it is being paged.
Implementation

The method E32Image::ShouldBeCodePaged() has been added to determine whether an image should be code paged. This decision is based on several factors:
1. Whether the kernel’s paging policy allows code paging
2. Whether the code is appropriately compressed (uncompressed or bytepair compressed)
3. Whether the code itself is marked as paged or unpaged.
The file clamp API is used to prevent the image file from being deleted. The method E32Image::BuildCodeBlockMap() has been added to retrieve the blockmap data for the file being paged. The kernel uses the blockmap to establish where the code is located on the media. E32Image::LoadCompressionData() and associated methods have been added to locate where each page of code resides in the image file, for the supported compression types. E32Image::BuildFixupTable(), E32Image::AllocateRelocationData() and associated methods have been added to build the import fix-up table and relocation tables to pass to the kernel. The functions that perform the relocation and import fix-up are still called for code-paged images; however, rather than doing the fix-up themselves, they populate the appropriate tables.

Summary

This chapter has covered a lot of ground! You should now have the details necessary to understand the implementation, and you can find out more by consulting the documentation and source available on developer.symbian.org.
Enabling Demand Paging on a New Platform
5 Enabling Demand Paging on a New Platform

In this section, we’ll discuss the main factors that should influence the form of demand paging you choose to implement on your system. For example, demand paging can have a significant impact on device driver code; we’ll discuss the reasons and some ways to manage the changes necessary. We’ll also discuss some of the broader impacts of demand paging on the underlying system.
5.1 Choosing Which Type of Demand Paging to Implement

Here we examine the different paging scenarios and describe factors to consider when choosing which type to implement.
5.1.1 All Paging Scenarios
If large parts of the system have to be wired for real-time/performance reasons, then the RAM savings achieved by demand paging may not be worth the cost of implementation. You should also note that device driver and system server APIs may need changing to support the implementation changes you’ll have to make to enable demand paging.
5.1.2 ROM Paging (Symbian OS v9.3)
This is the simplest form of demand paging to implement, but offers the least RAM saving, especially if large parts of the ROM have to be wired for real-time and performance reasons. Also, over-the-air updates are more complex for XIP ROM images.
5.1.3 Code Paging (Symbian OS v9.3)
This allows the paging of after-market applications, but is the scheme that takes the longest to implement. Code paging requires a new executable image format,
in which data compression and relocation information are based around page-sized data blocks. With code paging, there is much scope for deadlock and poor performance, because:
• File systems must provide a control path that doesn’t cause paging faults
• File system meta-data caching is required for acceptable performance
• Third-party plug-ins must not take paging faults or use any service that might do so.
5.1.4 Data Paging (not yet implemented)
At the time of writing, data paging is not implemented on Symbian. Data paging has the potential to offer the greatest RAM saving, but can only work on devices with a suitable backing store. Power management would be a significant problem because the backing store is likely to be heavily used. You would need to implement a smart memory-caching system to mitigate this.
5.2 Migrating Device Drivers to a Demand-Paged System

Impact of Demand Paging on Kernel-Side Code
I have already referred (in Section 2.2) to some of the disadvantages of a demand-paged operating system. In this section, I discuss in more detail the impacts of these on kernel-side code, such as device drivers, and show how affected areas may be identified and modified.

Page faults lead to unpredictable delays

If a thread takes a paging fault, it is blocked for a significant and unpredictable time while the kernel services the fault. As I have already explained in Section 2.2.1, the delay will be of the order of one millisecond (assuming that the media you are paging from is not being used for other file system activity), and in busy systems the delay for a single fault could be hundreds of milliseconds or more. You should also note that a page fault can occur for each separate 4 KB page of memory accessed, so an object straddling two pages may cause two page faults.
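The 4 KB granularity makes the worst-case fault count easy to bound: it is simply the number of pages the accessed range straddles. A small self-contained helper (plain C++, illustrative only, not a kernel API) makes the point:

```cpp
#include <cassert>
#include <cstdint>

const uint32_t KPageSize = 4096;   // the 4 KB page size assumed in the text

// Worst-case number of page faults when touching [aAddr, aAddr + aSize):
// one fault per distinct page the range straddles.
uint32_t PagesTouched(uintptr_t aAddr, uint32_t aSize)
	{
	if (aSize == 0)
		return 0;
	uintptr_t first = aAddr / KPageSize;
	uintptr_t last = (aAddr + aSize - 1) / KPageSize;
	return (uint32_t)(last - first + 1);
	}
```

For example, an 8-byte object placed at address 0x1FFC spans two pages, so touching it can cost two faults even though the object itself is tiny.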
A faulting driver can have a wider impact on system performance:
• The Symbian kernel services page faults at the priority of the NAND media driver thread (KNandThreadPriority = 24). This means that any thread of higher priority effectively has its priority reduced while the driver faults.
• In a system in which two or more device drivers share the same thread, a page fault taken by one driver can reduce the performance of the other drivers using that thread.

Mutex misuse causes system deadlock

When the kernel services a page fault, it must execute system code that uses certain system mutexes. If the thread that caused the fault already holds one of those mutexes, system deadlock will result. The only safe rule to apply is that demand-paged memory must not be accessed while holding any kernel-side mutex. In the following sections, I’ll look at these problem areas in more detail and discuss the actions you can take to mitigate them.
5.2.1 Page Faults and Device Drivers: Where Can Page Faults Occur?
Device drivers, like all kernel-side components, are wired in memory; once loaded, their code and read-only data sections will never be paged out. The ROMBUILD tool ensures this by placing drivers in the unpaged part of an XIP ROM. If a device driver is not resident in ROM, the loader will always copy it (in its entirety) into RAM, and wire it there. Correctly written kernel-side code should only access user memory using special functions provided by the kernel, which I discuss next.

1. Functions for Accessing Memory in an Arbitrary Process

If a device driver needs to access data structures via pointers passed by its paged clients, and these data structures live in the client’s read-only (constant) data section, then passing a pointer to this paged data to the driver could result in a page fault when the driver de-references the pointer. (This situation might arise, for example, when accessing image data that is built into the ROM.) Typically, if the client needs to give the driver access to an arbitrary amount of data in its
address space, it will pass a descriptor encapsulating the data, or a pointer to a buffer containing the data. In this case, the driver needs to access the data from a different address space from the one associated with the thread it executes in, using the inter-thread kernel APIs listed as follows:

// kernel.h
Kern::ThreadDesRead()
Kern::ThreadGetDesLength()
Kern::ThreadGetDesMaxLength()
Kern::ThreadGetDesInfo()
Kern::ThreadRawRead()
Note that, at the time of writing, Symbian only supports the demand paging of read-only data, so writes to pageable memory will not cause paging faults (but will cause the normal permissions-check exception). The write functions can therefore be considered safe for demand paging:

// kernel.h
Kern::ThreadDesWrite(DThread*, TAny*, const TDesC8&, TInt, TInt, DThread*)
Kern::ThreadRawWrite(DThread*, TAny*, const TAny*, TInt, DThread*)
Kern::ThreadDesWrite(DThread*, TAny*, const TDesC8&, TInt, DThread*)
2. Functions for Accessing Memory in the Current Process

Driver code may also need to access user memory directly using the following functions. These functions cause an exception if the user memory address is invalid, so they should not be used while the driver thread is in a critical section; this means the driver thread should not hold any mutexes. If this condition is met, then deadlock caused by demand paging is impossible, unless the function is called under an XTRAP harness. In that case, the code could be holding a mutex and you should check it for demand-paging safety.

// klib.h
kumemget(TAny* aKernAddr, const TAny* aAddr, TInt aLength)
kumemget32(TAny* aKernAddr, const TAny* aAddr, TInt aLength)
umemget(TAny* aKernAddr, const TAny* aUserAddr, TInt aLength)
umemget32(TAny* aKernAddr, const TAny* aUserAddr, TInt aLength) // kernel.h Kern::KUDesGet(TDes8& aDest, const TDesC8& aSrc) Kern::KUDesInfo(const TDesC8& aSrc, TInt& aLength, TInt& aMaxLength) Kern::KUDesSetLength(TDes8& aDes, TInt aLength)
Similarly, you can also assume the following write functions to be demand-paging safe: // klib.h kumemput(TAny* aAddr, const TAny* aKernAddr, TInt aLength) kumemput32(TAny* aAddr, const TAny* aKernAddr, TInt aLength) kumemset(TAny* aAddr, const TUint8 aValue, TInt aLength) umemput(TAny* aUserAddr, const TAny* aKernAddr, TInt aLength) umemput32(TAny* aUserAddr, const TAny* aKernAddr, TInt aLength) umemset(TAny* aUserAddr, const TUint8 aValue, TInt aLength) // kernel.h Kern::KUDesPut(TDes8& aDest, const TDesC8& aSrc)
The following functions are Symbian’s internal technology and so shouldn’t be used in partner code. Because they do not generate exceptions, you would need to check them for demand-paging safety, irrespective of whether an XTRAP harness is (unnecessarily) used. // kernel.h Kern::KUSafeRead(const TAny* aSrc, TAny* aDest, TInt aSize) Kern::KUSafeWrite(TAny* aDest, const TAny* aSrc, TInt aSize) Kern::KUSafeInc(TInt& aValue) Kern::KUSafeDec(TInt& aValue) Kern::SafeRead(const TAny* aSrc, TAny* aDest, TInt aSize) Kern::SafeWrite(TAny* aDest, const TAny* aSrc, TInt aSize)
3. Access to ROM Headers for User-Mode Executables

You should check both of the following functions for demand-paging safety. The first is defined in kernel.h and the second in platform.h.
// kernel.h
Kern::CodeSegGetMemoryInfo(DCodeSeg&, TModuleMemoryInfo&, DProcess*)

// platform.h
Epoc::RomProcessInfo(TProcessCreateInfo&, const TRomImageHeader&)
4. Debugger Support APIs that Set Breakpoints

The kernel implements the functions below in a demand-paging-aware manner. You do not need to modify code that uses them, unless such code isn’t tolerant of the indefinite delay caused by a page fault.

// platform.h
DebugSupport::CloseCodeModifier()
DebugSupport::ModifyCode(DThread*, TLinAddr, TInt, TUint, TUint)
DebugSupport::RestoreCode(DThread*, TLinAddr)
5. Direct Access to XIP ROM

Any code that reads directly from the contents of an XIP ROM may cause a page fault. The only parts of the ROM that you can safely assume not to be demand paged are:
• The ROM header (TRomHeader)
• The contents of any kernel-mode executable
• The ROM C++ exception search table (addressed by TRomHeader::iRomExceptionSearchTable).
You should assume that all other parts of the ROM could be demand paged.

Mutex Problems with Page Faults in Drivers

Page faults in device drivers can cause a different problem, related to mutex ordering. Page faults are handled in the context of the thread that took the fault. The code that handles page faults makes use of kernel resources and requires the use of synchronization objects such as NFastMutex and DMutex. NFastMutex objects are not nestable, and the nesting of DMutex objects can lead to deadlock if the mutex ordering is violated. If the order of the DMutex used by the device driver is higher than any other
DMutex objects used by the kernel in all possible operations where the device driver may take a page fault, then the mutex order will not be violated. In practice, it is very difficult for the device driver writer to guarantee this in all the possible situations where mutex nesting could take place. Because of this, Symbian has decided that the only safe rule to apply is that demand-paged memory must not be accessed while holding any kernel-side mutex. The following lists the code in the base port or debugging modules that is most likely to hold system mutexes.

1. Code that Creates its Own Mutex Using Kern::MutexCreate()

This is to be expected.

2. Device Driver Power Controllers

Base port code implements classes deriving from DPowerController (kpower.h). The kernel calls the following methods in those classes with the power-management feature lock mutex held:
• DPowerController::DisableWakeupEvents()
• DPowerController::EnableWakeupEvents()
• DPowerController::PowerDown(TTimeK aWakeupTime)

3. The Power Handler

Base port code implements classes deriving from DPowerHandler (kpower.h). The kernel calls the following methods in those classes with the power-management feature lock mutex held:
• DPowerHandler::PowerUp()
• DPowerHandler::Wait()
• DPowerHandler::PowerDown(TPowerState aState)

4. DKernelEventHandler Implementations

The kernel calls these in many different places while it holds internal kernel mutexes.
5. Code that Examines Kernel Containers

This means code that calls Kern::Containers() and then calls Wait() on the container mutex. Note that these APIs are internal to Symbian and shouldn’t be used in partner code. Changing this code so that it doesn’t wait on the mutex is NOT a good solution, because a container cannot be used safely unless the caller holds its mutex lock. In the next section, I discuss how to avoid this mutex issue, and other issues arising from demand paging.
5.2.2 Addressing Issues Arising from Demand Paging

Each device driver should have its own DFC thread

A good first step is to investigate whether any part of the driver executes in a shared kernel thread context. Device drivers execute operations in a kernel thread context by placing deferred function calls (DFCs) on a queue, from which they are later executed sequentially by the corresponding DFC thread. DFCs are typically (but not exclusively) queued in response to interrupts, so that they can perform operations that are not possible in an interrupt service routine (ISR). Since EKA2, there has been support for multiple DFC queues and threads. However, it is common for device drivers that execute in a context other than their client’s thread to share the kernel’s DFC thread zero, which is the thread associated with DFC queue zero. This approximates to the behavior of drivers on EKA1, where only a single DFC queue is supported. Any driver that uses DFC thread zero will execute in a shared thread context. So Symbian now recommends that each driver uses its own thread and DFC queue. We have already modified all of our own drivers in this way. To assist you in this change, the kernel (from Symbian OS v9.3 onwards) provides a new dynamic queue class, TDynamicDfcQue.

// TDynamicDfcQue derives from TDfcQue and adds
// a method to destroy the queue easily:
class TDynamicDfcQue : public TDfcQue
	{
public:
	TDynamicDfcQue();
	IMPORT_C void Destroy();
private:
	TDfc iKillDfc;
	};
You create queues by calling a new method in the Kern class:

TInt Kern::DynamicDfcQCreate(TDynamicDfcQue*& aDfcQ, TInt aPriority, const TDesC& aBaseName);
The arguments are used as follows:
• aDfcQ is set to the created queue if the operation is successful.
• aPriority is the priority of the thread created.
• aBaseName is used to name the thread; an eight-digit hex number is appended to it to make it unique.
The method returns KErrNone if successful, or one of the standard error codes. For example, the following code could be added to a physical device driver (PDD) entry point to create a DFC queue (error handling omitted for brevity):

const TInt KThreadPriority = 27;
_LIT(KThreadBaseName, "DriverThread");

TDynamicDfcQue* pDfcQ;
TInt r = Kern::DynamicDfcQCreate(pDfcQ, KThreadPriority, KThreadBaseName);
pdd->SetDfcQ(pDfcQ);
Remember to delete the DFC queue from the PDD object’s destructor:
DPddObject::~DPddObject()
	{
	if (iDfcQ)
		iDfcQ->Destroy();
	}
Note that if you have several drivers making use of a common peripheral bus, then you will need to ensure that the code managing the bus is thread safe. You will do this by using mutexes to protect state, rather than relying on only one driver being able to execute at once, as you might have done before. There are several types of device driver architecture. A driver may be dynamically loaded or boot loaded (if it is a kernel extension). It might have a single channel or multiple channels, or a PDD and an LDD, or an LDD only. Each of these different architectures needs you to create the dedicated DFC queue at a different place in your code. I discuss this in detail, with code examples, in Section 5.3.

Code running in DFCQue1 must not access user memory

Symbian’s system timer code uses DFCQue1, so any paging fault caused in this queue’s associated DFC thread one will have a serious negative impact on the system and could result in deadlock.

Try to access user memory in the context of the client thread

If the driver takes a page fault while it executes in the client thread’s context, or in a thread context exclusively used by the driver channel associated with the client, then it will only affect the performance of the driver and its client. It will not delay the running of other drivers as it might if the fault were taken in a kernel context. This makes it a better design decision to derive your device drivers from DLogicalChannelBase rather than DLogicalChannel. This ensures that code will access user memory in the context of the client thread rather than the kernel’s DFC thread, and means that only clients accessing demand-paged memory will suffer the impacts of paging. This is especially important if it’s possible for the driver to have more than one client. Following this rule effectively moves the impact of demand paging into the user-
side client, which can then choose to use demand-paged memory (or not), knowing that other clients won’t affect this choice.

Avoiding page faults in your kernel-side driver code

Instead of the driver copying demand-paged client memory directly into kernel memory that uses mutex protection, it should copy this data to a temporary kernel-side buffer first, and then copy it safely to its final location. There are a couple of techniques you can use for this:

Exchanging data using shared chunks

Your device driver can create a shared chunk and map it to a linear address space in the kernel process, which is never paged out. Future accesses to the data in this shared chunk will not cause page faults. But note that changing an existing driver that exchanges data using buffers or descriptors to use shared chunks is not a trivial task. I recommend that you only use this method in exceptional circumstances.

Copying data to the kernel stack

Another useful technique is to copy the data to a kernel-side stack-based buffer in the context of the client’s thread. This means that any accesses to paged data will fault in the context of the client thread. This method is not efficient because it involves a data copy. It should only be used when the amount of data is known to be small (less than about 512 bytes), and when the additional overhead is considered acceptable. Unfortunately, apart from the methods mentioned above, there is no general technique you can use when reworking kernel-side code to avoid accessing pageable memory while holding mutexes. You will need to find specific solutions for your own particular situation, and may have to re-architect your code. If it were feasible to have complete knowledge of all the software on the phone and its interactions, then it might be possible to prove that a certain mutex usage could never cause deadlock, and so was ‘safe’.
This is almost impossible on a complex phone and, even if safe mutex usage could be proven, this is likely to be a fragile situation, susceptible to breaking when system code changes. I repeat that the only safe assumption you can make is that any access to pageable memory while holding a mutex has the potential to cause system deadlock.
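The "copy first, lock later" rule generalizes beyond Symbian. As a sketch in standard C++ (std::mutex standing in for a kernel-side DMutex; the class and its names are invented for illustration), the potentially faulting read happens before the lock is taken, so no page fault can occur while the mutex is held:

```cpp
#include <cassert>
#include <cstring>
#include <mutex>
#include <vector>

// Illustrative sketch only (not Symbian code): the read of the client's
// potentially pageable memory happens BEFORE the lock is acquired, so any
// page fault is taken with no mutex held; the locked region then touches
// only wired, kernel-owned storage.
class SafeSink
	{
public:
	void Write(const char* aPagedSrc, size_t aLen)
		{
		std::vector<char> tmp(aLen);
		if (aLen)
			std::memcpy(tmp.data(), aPagedSrc, aLen);	// may fault: no lock held
		std::lock_guard<std::mutex> g(iLock);			// fault-free from here on
		iData = tmp;									// publish the copy
		}
	size_t Size()
		{
		std::lock_guard<std::mutex> g(iLock);
		return iData.size();
		}
private:
	std::mutex iLock;
	std::vector<char> iData;
	};
```

The cost is one extra copy, which is why the text recommends this only for small amounts of data.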
Mutex use

If a device driver accesses paged data from a thread other than that of its client, it should use one of the Kern::ThreadXxx() APIs listed in Section 5.2.2.1. These APIs use the system lock, which automatically excludes the use of another NFastMutex. You should ensure that they are never called while holding a DMutex. If a driver reads from its client’s user-side memory space while executing in its client’s thread context, it must use one of the following published APIs:

// klib.h
umemget()
umemget32()
kumemget()
kumemget32()

// kernel.h
Kern::KUDesGet()
Kern::KUDesInfo()
Kern::InfoCopy()
These APIs have a precondition that excludes their use with a DMutex, because they cannot be called from a critical section. Again, you must ensure they are not called while holding an NFastMutex.

Kernel ASSERTs

The functions listed in Section 5.2.3.1 contain asserts, active in UDEB builds, which cause a system fault if called while (most kinds of) system mutex are held. This can help you to identify code that needs modification for demand paging. However, to ease the integration of demand paging, these asserts do permit mutexes with an order value of KMutexOrdGeneral0 through KMutexOrdGeneral7. This should not be taken as indicating that these mutexes are safe for use with demand paging. The assertion statements in Kern::ThreadRawWrite() do not trigger if the source address is in the kernel heap or a thread’s supervisor stack. This is because this usage cannot cause paging faults and is explicitly allowed.
The assertions are not active unless there is demand-paged memory in the system, so will not affect products that do not make use of demand paging.
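The idea behind these asserts can be sketched in a few lines of standard C++: a per-thread count of held mutexes, checked before any access to pageable memory. Everything here (names included) is invented for illustration; the real kernel assertions are more nuanced, as the exemptions above show:

```cpp
#include <cassert>

// Illustrative sketch only: a thread-local count of held mutexes is
// enough to express the basic rule - no access to pageable memory while
// any kernel-side mutex is held.
thread_local int gHeldMutexCount = 0;

struct DebugMutex
	{
	void Wait()   { ++gHeldMutexCount; }   // acquire
	void Signal() { --gHeldMutexCount; }   // release
	};

// True if, by the "no mutex held" rule, it is safe to touch demand-paged
// memory right now; a UDEB-style build would assert on this before the
// access.
bool PagedAccessAllowed()
	{
	return gHeldMutexCount == 0;
	}
```

A driver writer could sprinkle such a check at every point where client memory is dereferenced, turning the deadlock risk into a deterministic debug-build failure.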
5.3 Guidelines for Migrating Device Drivers

5.3.1 Typical Device Driver Architectures in EKA2
In this section, I’ll take a look at typical device driver architectures in EKA2 and point out those that are more susceptible to taking page faults. As we saw in Section 5.2.2, if a device driver needs to access data structures via pointers passed by its paged clients, and these data structures live in the client’s read-only (constant) data section, then passing a pointer to this paged data to the driver could result in a page fault when the driver de-references the pointer. I’ll also point out those architectures that are most likely to affect the performance of other device drivers or other clients of those drivers.

Boot-loaded non-channel-based drivers

Boot-loaded device drivers are built as kernel extensions. Typically, this form is used by simple device drivers with limited or no user-side client interface. Kernel extensions are used to provide user-interface services, such as support for hardware keypads, keyboards, LCD and touch-sensitive screens. The interface to the user-side client (that is, the window server and associated components such as the user-mode screen driver) is either event-based (when the driver needs to pass data to the client) or uses the hardware abstraction layer (HAL). Kernel extensions in this category may execute some of their operation in the context of a kernel thread. Although HAL calls can be used to pass data structures (often configuration) to drivers in this category, their operation is usually safe because, typically, these data structures are accessed in the context of the calling client thread. But you must take care to validate this assumption, especially when HAL calls are used to reconfigure the driver and/or hardware. When a HAL call needs to synchronize with driver operation, it will be done in a kernel context, and paging issues could arise.
Another possible use of extensions is to provide services to other kernel-side components, such as other device drivers. These extensions do not have a user-side client interface and are typically used to provide access to services offered by hardware such as direct memory access (DMA) controllers, I2C (inter-integrated circuit) buses and power resource controllers. They may execute in the context of a kernel thread, which may not be the same one as their client device driver’s thread. In this situation, you need to take care when the extension accesses user-side data passed in by its client. The client driver may pass on a pointer to a data structure received from its own user-side client, without verifying the pointer or accessing the data itself; in this situation, the extension could fault.
One issue arises with media drivers that service page-in requests: if their client – a file system running on an associated file-server drive thread – passes an address in memory that is paged out, and the driver needs to read from that address in the context of its unique kernel thread, a deadlock could occur, with the media driver thread taking a page fault and thus becoming unable to service the ensuing page-in request. To mitigate this, the media subsystem now ensures that the data is paged in (if necessary taking a page fault in the context of the file-server drive thread) before passing it to the media driver thread. Please refer to Section 5.5 for more detail on media driver migration, and to Section 4.3 for more on demand page locking.

Dynamically loaded, channel-based IO device drivers

Channel-based IO drivers may derive from DLogicalChannelBase or DLogicalChannel. They may require a PDD to interface to the hardware, or the LDD may interface to the hardware directly. They may enforce a single-client policy, or allow multiple clients to open channels or share an open channel handle. They may allow multiple channels to be opened or enforce a single-channel policy. They may support one or many hardware units (typically there is a one-to-one relationship between channels and units, but a driver may support multiple channels on the same unit). These different options affect the likelihood of the issues described in previous sections occurring.

Drivers derived from DLogicalChannelBase usually execute in the context of their client. In this case, there is no impact on other drivers if they take a page fault. Multi-threaded drivers may also derive from DLogicalChannelBase, in situations where accesses to hardware can be done concurrently. In this case, the driver will typically create separate kernel threads rather than using shared DFC queues. This architecture is demand-paging safe. If the driver uses a shared DFC queue and a single associated kernel thread, then the discussion in the next paragraph about drivers derived from DLogicalChannel applies.

The DLogicalChannel framework requires a message queue, a DFC queue and an associated kernel thread. The class has a pointer to a DFC queue, so it is possible to have each channel execute on a separate kernel thread. Although this is not enforced, the expectation is that each channel will operate on a separate hardware unit; if more than one channel per hardware unit is supported, those channels should use the same DFC queue, to avoid complex synchronization mechanisms.
If the driver needs a PDD to interface to the hardware, it makes sense to delegate the decision about which DFC queue and kernel thread to use to the PDD, which creates the mapping between channels and hardware units. Although the mechanism exists to allow unique channel contexts, it has been common practice for drivers derived from DLogicalChannel to use the shared DFC thread zero. Much of the following discussion will consider this type of driver.
I shall next propose some simple solutions for migrating ‘problem’ drivers to a demand-paged system. The overriding principle, as we have seen, is that drivers accessing data structures in their client’s user-side address space must do so either from the client’s thread context or from the context of the driver’s unique kernel thread – and if the driver can have more than one client, from a driver kernel-thread context that is particular to that client. Device drivers execute operations in a kernel-thread context by queuing DFCs on a DFC queue running on that thread. DFCs may be queued as a result of client requests, interrupts, IDFCs, timer expiration and system power up/down. So the requirement for a unique kernel thread translates into one for a unique DFC queue.
5.3.2 Boot-Loaded Device Drivers
You will typically create a DFC queue and associated kernel thread in the kernel extension entry point:

const TInt KMyDriverDfcQuePriority = XX;
_LIT(KMyDriveThreadName,MY_DRIVER_THREAD_NAME);

class DMyDriver : public DPowerHandler
	{
	...
public:
	...
	TDfcQue* iDfcQ;
	};

DECLARE_STANDARD_EXTENSION()
	{
	TInt r=KErrNoMemory;
	DMyDriver* pH=new DMyDriver;
	if (pH)
		{
		r = Kern::DfcQCreate(pH->iDfcQ, KMyDriverDfcQuePriority,
			&KMyDriveThreadName);
		if(KErrNone==r)
			{
			// second phase construction of DMyDriver
			}
		}
	return r;
	}
Kern::DfcQCreate() creates a DFC queue in the kernel heap, sets iDfcQ to point to it and then invokes Kern::DfcQInit(), which creates the kernel thread associated with this DFC queue. Both these APIs must be called in a critical section, which is always the case in an extension entry point. One variation of this form has a global DFC queue and simply invokes Kern::DfcQInit() from the entry point – this is the construct used by media drivers:

TDfcQue MyDriverDfcQ;
const TInt KMyDriverDfcQuePriority = XX;
_LIT(KMyDriveThreadName,MY_DRIVER_THREAD_NAME);

DECLARE_STANDARD_EXTENSION()
	{
	TInt r=KErrNoMemory;
	DMyDriver* pH=new DMyDriver;
	if (pH)
		{
		r = Kern::DfcQInit(&MyDriverDfcQ, KMyDriverDfcQuePriority,
			&KMyDriveThreadName);
		if(KErrNone==r)
			{
			// second phase construction of DMyDriver
			}
		}
	return r;
	}
Kernel extensions are never unloaded, so there is no need to call destructors on the DFC queue or the associated thread.
5.3.3 Use of Unique Threads

Ownership of DFC queues

If a channel-based device driver does not need a PDD, then the DFC queue should be associated with either the LDD factory object (DLogicalDevice-derived) or the logical channel object (DLogicalChannel- or DLogicalChannelBase-derived). The following discussion – on associating the DFC queue with the LDD factory or the logical channel, and on when the queue is created and destroyed – applies to LDDs as well as to PDDs.

1. If the driver enforces a single-channel policy, then the DFC queue should be associated with the PDD factory object (DPhysicalDevice-derived). The DFC queue should be created as a result of loading the driver and destroyed as a result of unloading the driver. To enforce a single-channel policy, the Create() function of the LDD factory object (DLogicalDevice-derived) will typically include something like the following:
TInt DMyLDDFactory::Create(DLogicalChannelBase*& aChannel)
	{
	if(iOpenChannels!=0) // iOpenChannels is a member of DLogicalDevice
		return KErrInUse;
	... // now create the Logical Channel
	}
2. If the driver does not support more than one hardware unit, then the DFC queue should again be associated with the PDD factory object (DPhysicalDevice-derived), created when the driver is loaded and destroyed when it is unloaded. The constructor of the LDD factory object of a driver that does not support more than one unit will not set bit one (KDeviceAllowUnit) of DLogicalDevice::iParseMask.

3. If a driver supports more than one hardware unit, it might be that the units are implemented by the same hardware block with a shared control interface. In this case, it might be possible to bring the device to an inconsistent state if the shared control interface is accessed from multiple threads. Rather than implementing complex synchronization mechanisms, it may be easier to have all channel operations on the shared interface execute from the same kernel-thread context. Again, the DFC queue is associated
with the PDD factory object, and the queue and kernel thread are created when the driver is loaded and destroyed when it is unloaded.

4. If a driver supports multiple hardware units that are independent of each other and independently controlled, then ownership of the DFC queue should be given to the PDD object – the physical channel. The DFC queue (and its associated thread) should be created whenever a channel is opened, and destroyed when the channel is closed.

Creation and destruction of DFC queues

To support the use of DFC queues in dynamically loaded drivers, a new class, TDynamicDfcQue, has been added to the kernel. It derives from TDfcQue and adds a method to destroy the queue easily:

class TDynamicDfcQue : public TDfcQue
	{
public:
	TDynamicDfcQue();
	IMPORT_C void Destroy();
private:
	TDfc iKillDfc;
	};
You create queues by calling a new method in the Kern class: TInt Kern::DynamicDfcQCreate(TDynamicDfcQue*& aDfcQ, TInt aPriority, const TDesC& aBaseName);
Where:
• aDfcQ is set to the created queue if the operation is successful
• aPriority is the priority of the thread created
• aBaseName is used to name the thread; an eight-digit hex number is appended to it to make the name unique.
The method returns KErrNone if successful, or one of the standard error codes. The destruction of the DFC queue used by a device driver should be triggered by
the destruction of the object it is associated with. So, to destroy the DFC queue and terminate the thread associated with it, the Destroy() method must be called:
• from the LDD factory destructor or PDD factory destructor, whichever owns the DFC queue (steps 1 to 3 from Section 5.3.3.1)
• from the LDD destructor or PDD destructor, whichever owns the DFC queue (point 4 in Section 5.3.3.1).
The DLogicalChannel class holds a pointer to the DFC queue used by each logical channel object derived from it. This pointer is typically set up during the second-phase construction of the LDD, in the DoCreate() function, which means the DFC queue must have been created by the time the LDD’s DoCreate() is invoked. This is guaranteed when the DFC queue is owned by the LDD or PDD factory objects, because these are created when the logical or physical device is first loaded. It is also guaranteed when the PDD object owns the DFC queue, as the order of channel construction is as follows:
1. LDD constructor
2. PDD constructor
3. PDD DoCreate()
4. LDD DoCreate()
However, when the LDD owns and creates the DFC queue, it is down to you, the developer, to guarantee that the correct pointer to the DFC queue is stored in DLogicalChannel::iDfcQ as part of DoCreate(). Note that the LDD has access to the LDD factory object, the PDD factory object and the PDD through the iDevice, iPhysicalDevice and iPdd pointers in the DLogicalChannelBase base class. In the next four sections, I’ll give code examples for these different situations.
5.3.4 DFC Queue in Logical Device

const TInt KMyDriverThreadPriority = 27;
_LIT(KMyDriverThread,"MyDriverThread");

class DMyLogicalDevice : public DLogicalDevice
	{
public:
	DMyLogicalDevice();
	~DMyLogicalDevice();
	void Construct(TDynamicDfcQue* aDfcQ);
	virtual TDfcQue* DfcQ();
	...
public:
	...
	TDynamicDfcQue* iDfcQ;
	};

DMyLogicalDevice::DMyLogicalDevice()
// Constructor
	{
	// sets iVersion and iParseMask but leaves bit 2
	// (KDeviceAllowPhysicalDevice) unset
	}

DMyLogicalDevice::~DMyLogicalDevice()
// Destructor
	{
	... // cancel any other DFCs owned by this device
	if (iDfcQ)
		iDfcQ->Destroy();
	}

void DMyLogicalDevice::Construct(TDynamicDfcQue* aDfcQ)
	{
	iDfcQ=aDfcQ;
	}

TDfcQue* DMyLogicalDevice::DfcQ()
	{
	return iDfcQ;
	}

DECLARE_STANDARD_LDD()
	{
	DMyLogicalDevice* pD=new DMyLogicalDevice;
	if(pD)
		{
		TDynamicDfcQue* q;
		TInt r = Kern::DynamicDfcQCreate(q, KMyDriverThreadPriority,
			KMyDriverThread);
		if(KErrNone==r)
			{
			pD->Construct(q);
			return pD;
			}
		pD->AsyncClose();
		}
	return NULL;
	}

// Logical Channel
TInt DMyDriverLogicalChannel::DoCreate(TInt aUnit, const TDesC8* /*anInfo*/,
	const TVersion& aVer)
	{
	...
	SetDfcQ(iDevice->DfcQ());
	...
	}
In the previous code extract, we create a dynamic DFC queue on the kernel heap and arrange for the logical device object to have a pointer to it. When the logical device is loaded, the DLL entry point is invoked with KModuleEntryReasonProcessAttach, which invokes the LDD-specific initialization DECLARE_STANDARD_LDD(). The LDD-specific entry point creates the LDD factory object and, if successful, creates the dynamic DFC queue (and
associated thread). The logical channel uses the pointer to the logical device to obtain the DFC queue.
A possible variation of this scheme creates the dynamic DFC queue in the DLogicalDevice-derived Install() function. This simplifies the entry point:

_LIT(KLddName,"MyDriver");

TInt DMyLogicalDevice::Install()
// Install the device driver.
	{
	TDynamicDfcQue* q;
	TInt r = Kern::DynamicDfcQCreate(q, KMyDriverThreadPriority,
		KMyDriverThread);
	if(KErrNone==r)
		{
		Construct(q);
		r=SetName(&KLddName);
		}
	return r;
	}

DECLARE_STANDARD_LDD()
	{
	return new DMyLogicalDevice;
	}
5.3.5 DFC Queue in Physical Device

class DMyPhysicalDevice : public DPhysicalDevice
	{
public:
	DMyPhysicalDevice();
	~DMyPhysicalDevice();
	void Construct(TDynamicDfcQue* aDfcQ);
	virtual TDfcQue* DfcQ();
	...
public:
	...
	TDynamicDfcQue* iDfcQ;
	};

DMyPhysicalDevice::DMyPhysicalDevice()
// Constructor
	{
	// sets iVersion and iUnitMask (if required)
	}

DMyPhysicalDevice::~DMyPhysicalDevice()
// Destructor
	{
	... // cancel any other DFCs owned by this device
	if (iDfcQ)
		iDfcQ->Destroy();
	}

void DMyPhysicalDevice::Construct(TDynamicDfcQue* aDfcQ)
	{
	iDfcQ=aDfcQ;
	}

TDfcQue* DMyPhysicalDevice::DfcQ()
	{
	return iDfcQ;
	}

DECLARE_STANDARD_PDD()
	{
	DMyPhysicalDevice* pD=new DMyPhysicalDevice;
	if(pD)
		{
		TDynamicDfcQue* q;
		TInt r = Kern::DynamicDfcQCreate(q, KMyDriverThreadPriority,
			KMyDriverThread);
		if(KErrNone==r)
			{
			pD->Construct(q);
			return pD;
			}
		pD->AsyncClose();
		}
	return NULL;
	}
// Logical Channel DMyDriverLogicalChannel::DoCreate(TInt aUnit, const TDesC8* /*anInfo*/, const TVersion &aVer) { ... SetDfcQ(iPhysicalDevice->DfcQ()); ... }
The previous code extract uses the same principles as Section 5.3.4, with the main differences being:
• The pointer to the DFC queue is owned by the physical device.
• The PDD entry point creates the physical device and the DFC queue.
• The logical channel uses the pointer to the physical device to obtain the DFC queue.
Again, you can create the DFC queue in the DPhysicalDevice-derived Install() function.
5.3.6 DFC Queue in Logical Channel

class DMyDriverLogicalChannel : public DLogicalChannel
	{
public:
	DMyDriverLogicalChannel();
	virtual ~DMyDriverLogicalChannel();
	virtual TInt DoCreate(TInt aUnit, const TDesC8* anInfo,
		const TVersion& aVer);
	...
public:
	...
	// DLogicalChannel has a public pointer to a TDfcQue, iDfcQ
	};

// Logical Device
DMyLogicalDevice::DMyLogicalDevice()
// Constructor
	{
	... // sets iVersion and iParseMask with bit 1 (KDeviceAllowUnit)
	// set and bit 2 (KDeviceAllowPhysicalDevice) unset
	}

TInt DMyLogicalDevice::Create(DLogicalChannelBase*& aChannel)
	{
	aChannel=new DMyDriverLogicalChannel;
	if(!aChannel)
		return KErrNoMemory;
	return KErrNone;
	}

// Logical Channel
DMyDriverLogicalChannel::DMyDriverLogicalChannel()
// Constructor
	{
	// may set up pointer to owning client’s thread
	// and increase its reference count
	iDfcQ=NULL;
	}

DMyDriverLogicalChannel::~DMyDriverLogicalChannel()
// Destructor
	{
	// may also decrease the owning client’s thread reference count
	... // cancel any other DFCs owned by this channel
	if (iDfcQ)
		((TDynamicDfcQue*)iDfcQ)->Destroy(); // iDfcQ is a TDfcQue*
	}
TInt DMyDriverLogicalChannel::DoCreate(TInt aUnit, const TDesC8* anInfo, const TVersion& aVer) { // check platform security capabilities ... TInt r= KErrNoMemory;
	TDynamicDfcQue* q;
	r = Kern::DynamicDfcQCreate(q, KMyDriverThreadPriority,
		KMyDriverThread);
	if (KErrNone==r)
		{
		SetDfcQ(q);
		iMsgQ.Receive();
		}
	return r; // if error, framework will delete Logical Channel
	}
In this example, the DFC queue is owned by the logical channel. The second-phase constructor (DoCreate()) creates the queue and, if successful, sets the DLogicalChannel DFC queue pointer (iDfcQ). When the channel is closed, the destructor of the logical channel is invoked, and this destroys the DFC queue.
5.3.7 DFC Queue in PDD
class DMyDriver : public DBase
	{
public:
	DMyDriver();
	~DMyDriver();
	TInt DoCreate(TInt aUnit, const TDesC8* anInfo);
	virtual TDfcQue* DfcQ(TInt aUnit);
	...
public:
	...
	DLogicalChannel* iLdd;
	TInt iUnit;
	TDynamicDfcQue* iDfcQ;
	};

// Logical Device
DMyLogicalDevice::DMyLogicalDevice()
// Constructor
	{
	// Sets iVersion and iParseMask with bit 1 (KDeviceAllowUnit)
	// and bit 2 (KDeviceAllowPhysicalDevice) set
	...
	}

// Physical Device
TInt DMyPhysicalDevice::Create(DBase*& aChannel, TInt aUnit,
	const TDesC8* aInfo, const TVersion& aVer)
	{
	DMyDriver* pD=new DMyDriver;
	aChannel=pD;
	TInt r=KErrNoMemory;
	if (pD)
		r=pD->DoCreate(aUnit,aInfo);
	return r;
	}

// PDD
DMyDriver::DMyDriver()
// Constructor
	{
	...
	iDfcQ=NULL;
	}

DMyDriver::~DMyDriver()
// Destructor
	{
	... // cancel any other DFCs owned by this channel
	if (iDfcQ)
		iDfcQ->Destroy();
	}

TInt DMyDriver::DoCreate(TInt aUnit, const TDesC8* /*anInfo*/)
	{
	iUnit=aUnit;
	TDynamicDfcQue* q;
	TInt r = Kern::DynamicDfcQCreate(q, KMyDriverThreadPriority,
		KMyDriverThread);
	if (KErrNone==r)
		{
		iDfcQ=q;
		}
	return r; // if error, framework will delete LDD and PDD
	}

TDfcQue* DMyDriver::DfcQ(TInt aUnit)
	{
	TDfcQue* pDfcQ=NULL;
	if(aUnit==iUnit)
		pDfcQ=iDfcQ;
	return pDfcQ;
	}

// Logical Channel
TInt DMyDriverLogicalChannel::DoCreate(TInt aUnit, const TDesC8* /*anInfo*/,
	const TVersion& aVer)
	{
	...
	SetDfcQ(iPdd->DfcQ(aUnit));
	...
	}
Key points:
• The PDD owns the pointer to the DFC queue.
• The PDD requires a second-phase construction, which creates the DFC queue.
• The DFC queue is associated with a hardware unit.
• The logical channel obtains the DFC queue through its pointer to the PDD.
5.3.8 Changes to Symbian Device Drivers
In Symbian driver code, we have moved the initialization of DFC queues from platform-independent code to platform-specific code. This provides a mechanism for platform-specific code to create its own DFC queue as recommended in Section 5.2.4.1. To make use of this mechanism, some specific drivers have been modified in Symbian OS v9.3, breaking binary compatibility in the process. These cases are described next. Base ports that use these drivers will need to be migrated. The base ports supplied by Symbian have already been migrated.
USB driver

The USB driver originally used DfcQue0 for its iPowerUpDfc and iPowerDownDfc members and set this in the platform-independent layer. From e32/drivers/usbcc/ps_usbc.cpp:

DUsbClientController::DUsbClientController()
	{
	__KTRACE_OPT(KUSB,
		Kern::Printf("DUsbClientController::DUsbClientController()"));
#ifndef SEPARATE_USB_DFC_QUEUE
	iPowerUpDfc.SetDfcQ(Kern::DfcQue0());
	iPowerDownDfc.SetDfcQ(Kern::DfcQue0());
#endif
	}
The driver now requires initialization of iPowerUpDfc and iPowerDownDfc in the platform-specific layer. In the Symbian-provided base ports, a dedicated DFC queue is also created. From omap/shared/usb/pa_usbc.cpp:

#ifdef SEPARATE_USB_DFC_QUEUE
const TInt KUsbThreadPriority = 27;
_LIT8(KUsbThreadName,"UsbThread");

TInt TOmapUsbcc::CreateDfcQ()
	{
	TInt r=Kern::DfcQCreate(iDfcQ,KUsbThreadPriority,&KUsbThreadName);
	if (KErrNone != r)
		{
		__KTRACE_OPT(KHARDWARE, Kern::Printf("PSL: > Error initializing USB client support. Can't create DFC Que"));
		return r;
		}
	iPowerUpDfc.SetDfcQ(iDfcQ);
	iPowerDownDfc.SetDfcQ(iDfcQ);
	return KErrNone;
	}
#endif
You can restore the original functionality by undefining the SEPARATE_USB_DFC_QUEUE macro in e32/kernel/kern_ext.mmh.

Sound driver

A new pure virtual method has been added to the class DSoundPDD:
virtual TDfcQue* DfcQ() = 0;
You must define this method in the base-port-derived object and ensure it returns the DFC queue to use. I recommend that you change the driver so that it creates its own DFC queue, as described in Section 5.2.4.1. However, the minimum required change is to implement the function to return a pointer to DFC queue zero. For example, the following implementation would suffice:

TDfcQue* DSoundPddDerived::DfcQ()
	{
	return Kern::DfcQue0();
	}
DDigitiser

The class DDigitiser no longer initializes the member variable iDfcQ (the DFC queue pointer). You must initialize this variable in the base-port-derived object. We recommend that you change the driver so that it creates its own DFC queue, as described in Section 5.2.4.1. However, the minimum required change is to set the variable to DFC queue zero in the derived object. For example, you could add the following code to the derived class constructor:

DDigitiserDerived::DDigitiserDerived()
	{
	// ...
	iDfcQ = Kern::DfcQue0();
	}
5.3.9 Additional Impact of Migrating Device Drivers
Once you have implemented the recommendations in Section 5.2.4, you will have a more multi-threaded base port. In general, this is a good thing, because it provides fairer scheduling of kernel-side code. However, there may be negative impacts that need considering:
• A base-port component may provide a service (in hardware or software) to device drivers which, once initiated, should not be disturbed until it completes – but now the service might be pre-empted by another kernel thread.
• A hardware component may have a control interface that can be used by a number of drivers. Operations on the control interface, although almost instantaneous, may not be atomic and therefore should not be interrupted.
In the first situation, when the state of a resource needs to be protected from the effects of pre-emption for an appreciable period of time, the recommended approach is to use mutual exclusion, protecting the resource with a DMutex. An exception to this is where the only risk is of the same driver triggering the same operation before the previous one completes – that is, when an operation is non-blocking and occurs from different thread contexts. In that case, an NFastMutex should suffice.
An example of the second situation is a ‘set-clear’ control interface, with a pair of registers, where one (A) contains bits to be set and the other (B) contains bits to be cleared. You have to write to both registers to produce the desired state. If the operation is pre-empted after A is set but before B is cleared, and a new set-clear operation is initiated, the final state of the interface may be undetermined. Pre-emption protection in this case may be achieved by simply locking the kernel (using NKern::Lock()) before the operation starts and unlocking it (using NKern::Unlock()) after it completes. If the interface is to be used from an interrupt context, it is sufficient to disable all interrupts around it to protect against thread concurrency.
5.4 Media Driver Migration

The media driver is a key component in the operation of demand paging. As you can see in Figure 2, the NAND driver on a NAND XIP ROM is responsible for servicing page-in requests from the paging subsystem. This means that if you are implementing demand paging on your platform, you will have to modify your media drivers so that they support these additional requests. As before, it is essential that the thread in which the driver runs does not itself take a page fault, otherwise deadlock will occur.

A media driver is typically a PDD with a filename of the form ‘med*.pdd’. Like other kernel-side components, it is always marked as unpaged, which means that its code and read-only data sections will never be paged out. The only time the media driver could theoretically take a page fault is when it accepts a write request from a user-side client whose source data is paged out – this could be data in the paged area of an XIP ROM, or code that has been loaded into RAM from code-paging-enabled media. To remedy this, Symbian has modified the local media subsystem to ensure that the source data in a write request is paged in before the request is passed to the media driver thread. This may mean taking a page fault in the context of the file-server drive thread before passing the request on. Large write requests of paged data are fragmented into a series of smaller ones, to avoid exhausting available RAM. Such fragmentation is quite rare, but it might happen, for example, when copying a large ROM data file to a temporary location on the user data drive.

I explain the steps needed to enable a media driver to support XIP ROM and/or code paging in the following sections. For the specific changes required to support paging from an internal MMC/SD card, see Section 5.5.6.
5.4.1 Changes to variantmediadef.h
To support paging, you should define the following parameters using appropriate macro names (the names are not important) in the variant’s variantmediadef.h file: 1. The paging flags – whether code paging and/or XIP ROM paging is supported.
2. The paging fragment size. If a write request points to paged data, then the request will be split into separate fragments of this size. Choose this value with care: if it is too small, writes may take an unacceptably long time to complete; if it is too large, paging requests may take an unacceptably long time to be satisfied.
3. The number of drives that support code paging. If code paging is not supported (that is, only XIP ROM paging is supported), this should be zero.
4. The list of local drives that support code paging (if code paging is supported). This should be a subset of the overall drive list supported by the media driver.
For example, here (in bold italics) are the changes made to support paging on NAND on the H4 reference platform:

// Variant parameters for NAND flash media driver (mednand.pdd)
#define NAND_DRIVECOUNT 8
#define NAND_DRIVELIST 2,3,5,6,7,9,10,11
#define NAND_NUMMEDIA 1
#define NAND_DRIVENAME "Nand"
#define PAGING_TYPE DPagingDevice::ERom | DPagingDevice::ECode
// code paging from writeable FAT, Composite FAT and first ROFS
#define NAND_PAGEDRIVELIST 2,5,6
#define NAND_PAGEDRIVECOUNT 3
#define NUM_PAGES 8 // defines the size of fragment
The macros can then be picked up in the media driver source code and passed to LocDrv::RegisterPagingDevice(). This function is similar to LocDrv::RegisterMediaDevice() in that it takes a drive list as a parameter but in this case it identifies the drive(s) to be used for code paging (if any).
5.4.2 Changes to the Driver’s Kernel Extension Entry Point
There are two initial stages in a media driver’s lifetime that need to be considered:
1. The kernel extension entry point – normally identified by the DECLARE_STANDARD_EXTENSION macro
2. The PDD entry point – identified by the DECLARE_EXTENSION_PDD macro.
A media driver’s kernel extension entry point is called very early in the boot sequence. Some time later, the file server loads all media drivers and calls their PDD entry points. Each PDD exports a single function at ordinal one for creating the PDD factory object. When the file server issues the first request to a drive object associated with the media, the local media subsystem calls the factory object’s Create() function to instantiate the media driver object. However, for demand paging to start as soon as possible in the boot sequence, we need to instantiate and install the PDD factory object earlier – in the kernel extension entry point.
Some media drivers may have no kernel extension entry point defined (for example, the MMC media driver). These will have a DECLARE_STANDARD_PDD macro defined rather than DECLARE_EXTENSION_PDD. You will need to modify these to have a DECLARE_EXTENSION_PDD / DECLARE_STANDARD_EXTENSION pair.
The kernel extension entry point must create a dedicated DFC queue (as discussed earlier) – otherwise a page fault in a drive thread cannot be satisfied. The entry point must then create a DPrimaryMediaBase object and register it with the local media subsystem. To support demand paging, you should modify the entry point to register the paging device with the paging subsystem, and to instantiate and install the driver factory object. The following is an example of such a change (changes in bold italics):

static const TInt NandPagingDriveNumbers[NAND_PAGEDRIVECOUNT+1] =
	{NAND_PAGEDRIVELIST};

DECLARE_STANDARD_EXTENSION()
	{
	TInt r=Kern::DfcQInit(&NandMediaDfcQ, KNandThreadPriority,
		&KNandMediaThreadName);
	if (r!=KErrNone)
		return r;

	DPrimaryMediaBase* pM=new DPrimaryMediaBase;
	if (!pM)
		return KErrNoMemory;
	pM->iDfcQ=&NandMediaDfcQ;

	r=LocDrv::RegisterMediaDevice(MEDIA_DEVICE_NAND, NAND_DRIVECOUNT,
		NandDriveNumbers, pM, NAND_NUMMEDIA, KNandDriveName);
	if (r != KErrNone)
		return r;

	r = LocDrv::RegisterPagingDevice(pM, NandPagingDriveNumbers,
		NAND_PAGEDRIVECOUNT, PAGING_TYPE, SECTOR_SHIFT, NUM_PAGES);
	if (r == KErrNone)
		{
		device = new DPhysicalDeviceMediaNand;
		if (device == NULL)
			return KErrNoMemory;
		r = Kern::InstallPhysicalDevice(device);
		}
	// Ignore error if demand paging not supported by kernel
	else if (r == KErrNotSupported)
		r = KErrNone;
	else
		return r;

	pM->iMsgQ.Receive();
	return KErrNone;
	}
Note that:
• A hardware component may have a control interface that can be used by a number of drivers. Operations on the control interface, although almost instantaneous, may not be atomic and therefore should not be interrupted.
• The DECLARE_EXTENSION_PDD entry point will still be called some time later when the file server tries to load all the media drivers in the system. When this happens, the media driver will create a second factory object, but this will be deleted by the kernel when it discovers that another factory object bearing the same name is already in its internal list.
• The fifth parameter passed to LocDrv::RegisterPagingDevice() is the log2 of the sector size for the given media, for example, nine (corresponding to a sector size of 512) for most media.
• To prevent compilation errors when code paging is disabled (NAND_PAGEDRIVECOUNT is zero), the drive number array passed to LocDrv::RegisterPagingDevice() is one greater in length than the drive count.
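The log2 relationship in the third note can be sketched in ordinary, standalone C++. The helper below is hypothetical (it is not part of the Symbian API) and assumes a power-of-two sector size, as is the case for real media:

```cpp
#include <cassert>

// Hypothetical helper: derive the log2 'read shift' value that a media
// driver would pass as the fifth parameter of RegisterPagingDevice().
// Assumes the sector size is a power of two.
static int ReadShiftForSectorSize(unsigned int sectorSize)
    {
    int shift = 0;
    while ((1u << shift) < sectorSize)
        ++shift;
    return shift; // 512 -> 9, 2048 -> 11
    }
```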
5.4.3 Changes to TLocalDriveCaps
You should modify the TLocalDriveCaps structure so that:
• The KMediaAttPageable flag is set in iMediaAtt.
• The KDriveAttPageable flag is set if the particular drive has been registered as a code-paging drive (determined by testing TLocDrvRequest::Drive()->iPagingDrv).
Here is an example (changes in bold italics):

TInt DMediaDriverNand::Request(TLocDrvRequest& aRequest)
    {
    TInt r=KErrNotSupported;
    TInt id=aRequest.Id();
    if (id == DLocalDrive::ECaps)
        {
        TLocDrv* drive = aRequest.Drive();
        TLocalDriveCapsV4& c = *(TLocalDriveCapsV4*)aRequest.RemoteDes();
        r=Caps(*drive,c);
        }
    // etc
    }

TInt DMediaDriverNand::Caps(TLocDrv& aDrive, TLocalDriveCapsV4& caps)
    {
    // fill in rest of caps structure as usual…
    if(aDrive.iPrimaryMedia->iPagingMedia)
        caps.iMediaAtt|=KMediaAttPageable;
    if(aDrive.iPagingDrv)
        caps.iDriveAtt|=KDriveAttPageable;
    }
Additionally, the TLocalDriveCaps::iDriveAtt member must have the KDriveAttLocal and KDriveAttInternal flags set, and the KDriveAttRemovable flag cleared. Demand paging is only supported for internal, non-removable media.
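The required attribute adjustments amount to simple bit operations. Here is a standalone sketch; the flag values below are illustrative stand-ins (the names match the Symbian constants, but the values do not):

```cpp
#include <cassert>

// Illustrative flag values -- stand-ins for the real constants defined
// by the local media subsystem.
const unsigned int KDriveAttLocal     = 0x01;
const unsigned int KDriveAttInternal  = 0x02;
const unsigned int KDriveAttRemovable = 0x04;
const unsigned int KDriveAttPageable  = 0x08;

// A pageable drive must be local and internal, and must not be removable.
static unsigned int MakePageableDriveAtt(unsigned int aAtt)
    {
    aAtt |= (KDriveAttLocal | KDriveAttInternal | KDriveAttPageable);
    aAtt &= ~KDriveAttRemovable;
    return aAtt;
    }
```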
5.4.4 Handling Paging Requests
You need to handle four new request types to support paging; the enumeration is TPagingRequestId in the DMediaPagingDevice class.
• ERomPageInRequest – treat this as a normal read, except that the position stored in the request is the offset from the start of the XIP ROM image, not the start of the media. This is because the local media subsystem has no way of knowing the absolute position of a particular XIP ROM page from the start of the media. Also, to write the data back to the
client, use TLocDrvRequest::WriteToPageHandler() instead of TLocDrvRequest::WriteRemote().
• ECodePageInRequest – treat this as a normal read, but use TLocDrvRequest::WriteToPageHandler() instead of TLocDrvRequest::WriteRemote() to write data back to the client. The position in the request is the offset from the start of the media, as for a normal read.
• EWriteRequestFragment, EWriteRequestFragmentLast – these requests mark the start, middle or end of a sequence of writes. Each sequence is terminated by an EWriteRequestFragmentLast request (unless one of the previous requests completes with an error).
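The positional difference between the two page-in request types can be sketched in standalone C++. The enum values and the helper below are illustrative only, not part of the local media subsystem:

```cpp
#include <cassert>
#include <cstdint>

// Illustrative request IDs -- stand-ins for the TPagingRequestId values.
enum TReqId { ERomPageInRequest, ECodePageInRequest };

// Hypothetical helper: convert the position carried in a paging request
// into an absolute media position. A ROM page-in position is relative to
// the start of the XIP ROM image, so the driver must add the image's
// offset on the media (known to the driver/variant, not the subsystem).
static std::int64_t AbsoluteMediaPos(TReqId aId, std::int64_t aReqPos,
                                     std::int64_t aRomImageMediaOffset)
    {
    if (aId == ERomPageInRequest)
        return aRomImageMediaOffset + aReqPos;
    return aReqPos; // code page-in positions are already media-relative
    }
```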
5.4.5 Coping with Fragmented Write Requests
In many respects, you can treat EWriteRequestFragment and EWriteRequestFragmentLast as normal write requests. However, you should note that any of these write requests may be interleaved with requests from other file-server drive threads (assuming the media supports more than one partition) – which could be seen as a functional break in behavior. If you need to maintain backwards compatibility, and prevent write requests from being interleaved in this way, it is up to the media driver itself to keep track of the 'current' write request chain and defer requests from other drive threads while a write fragment chain is in progress. To achieve this, two steps are necessary:
1. Ensure the local media subsystem LDD (elocd.ldd) has been built with the __ALLOW_CONCURRENT_FRAGMENTATION__ macro undefined. This ensures that the local media subsystem never issues more than one write fragment at a time.
2. Change the paging media driver so that it keeps track of write request chains and defers any read or format requests received after the first fragment and before the last in a sequence. Note that write fragments should never be deferred.
One way in which you could implement step two is for the media driver to maintain a bit mask, with each bit representing a 'write fragment in progress' flag for a particular drive. For example:
iFragmenting |= (0x1 << aRequest.Drive()->iDriveNumber);

Then, if a read or format request is received while any of the bits in iFragmenting are set, the request may be deferred.
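The bookkeeping described above can be sketched as a small standalone class (all names here are illustrative, not taken from the Symbian media driver):

```cpp
#include <cassert>

// Sketch of per-drive 'write fragment in progress' tracking, one bit
// per drive, as described in the text.
class TFragmentTracker
    {
public:
    TFragmentTracker() : iFragmenting(0) {}
    // Called on the first fragment of a write chain for aDrive.
    void StartFragmentChain(int aDrive) { iFragmenting |= (1u << aDrive); }
    // Called on EWriteRequestFragmentLast (or on an error) for aDrive.
    void EndFragmentChain(int aDrive) { iFragmenting &= ~(1u << aDrive); }
    // Reads and formats are deferred while any chain is in progress;
    // write fragments themselves are never deferred.
    bool ShouldDeferReadOrFormat() const { return iFragmenting != 0; }
private:
    unsigned int iFragmenting; // one bit per drive number
    };
```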
5.4.6 Paging From an Internal MMC/SD Card

MMC PSL changes
You can enable ROM and code paging for an MMC card, provided the card is non-removable. (If a page-in request were issued when the card was removed, the kernel would fault.) Because the MMC media driver is entirely generic, we need a way of returning the paging-related information contained in variantmedia.def to the generic part of the MMC stack. We do this by modifying the PSL layer of the MMC stack to implement the (new) DMMCStack::MDemandPagingInfo interface method, as shown in the following code block (new code is in bold italics).

// mmc.h
class DMMCStack : public DBase
    {
public:
    ...
    // Demand paging support
    // see KInterfaceDemandPagingInfo
    class TDemandPagingInfo
        {
    public:
        const TInt* iPagingDriveList;
        TInt iDriveCount;
        TUint iPagingType;
        TInt iReadShift;
        TUint iNumPages;
        TBool iWriteProtected;
        TUint iSpare[3];
        };

    class MDemandPagingInfo
        {
    public:
        virtual TInt DemandPagingInfo(TDemandPagingInfo& aInfo) = 0;
        };
    ...
    };
Here is an example, taken from the H4 HRP: variantmedia.def changes (shown in bold italics):
// Variant parameters for the MMC Controller (EPBUSMMC.DLL)
#define MMC_DRIVECOUNT 1
#define MMC_DRIVELIST 1
#define MMC_NUMMEDIA 1
#define MMC_DRIVENAME "MultiMediaCard0"

#define MMC_PAGING_TYPE DPagingDevice::ERom | DPagingDevice::ECode
#define MMC_PAGEDRIVELIST 1 // code paging from user data
#define MMC_PAGEDRIVECOUNT 1
#define MMC_NUM_PAGES 8
H4 MMC stack class definition (changes shown in bold italics):

class DDemandPagingInfo : public DMMCStack::MDemandPagingInfo
    {
public:
    virtual TInt DemandPagingInfo(DMMCStack::TDemandPagingInfo& aInfo);
    };

class DOmapMMCStack : public DCardStack
    {
public:
    virtual void GetInterface(TInterfaceId aInterfaceId,
        MInterface*& aInterfacePtr);
    ...
private:
    DDemandPagingInfo* iDemandPagingInfo;
    ...
    };
H4 MMC stack class implementation:

TInt DOmapMMCStack::Init()
    {
    if((iDemandPagingInfo = new DDemandPagingInfo()) == NULL)
        return KErrNoMemory;
    ...
    }

void DOmapMMCStack::GetInterface(TInterfaceId aInterfaceId, MInterface*& aInterfacePtr)
    {
    if (aInterfaceId==KInterfaceDemandPagingInfo)
        aInterfacePtr=(DMMCStack::MInterface*)iDemandPagingInfo;
    }

TInt DDemandPagingInfo::DemandPagingInfo(DMMCStack::TDemandPagingInfo& aDemandPagingInfo)
    {
    static const TInt pagingDriveNumbers[MMC_PAGEDRIVECOUNT+1] = {MMC_PAGEDRIVELIST};
    aDemandPagingInfo.iPagingDriveList = pagingDriveNumbers;
    aDemandPagingInfo.iDriveCount = MMC_PAGEDRIVECOUNT;
    aDemandPagingInfo.iPagingType = MMC_PAGING_TYPE;
    aDemandPagingInfo.iReadShift = 9;
    aDemandPagingInfo.iNumPages = MMC_NUM_PAGES;
    return KErrNone;
    }
Preparing an internal MMC card for ROM paging – MMCLoader
The MMCLoader utility can be found in e32utils/mmcloader. It is used to write a ROM image to the internal MMC card, ready for paging. The syntax is as follows:

mmcloader <RomSrcFileName> <UnpagedRomDstFileName> <PagedRomDstFileName>
For example:

mmcloader z:\\core.img d:\\sys$rom.bin d:\\sys$rom.pag
MMCLoader performs the following steps:
1. Splits RomSrcFileName into non-paged and paged files.
2. Formats the MMC card.
3. Writes the paged part of the ROM to a standard FAT image file on the MMC card.
4. Checks that the file's sectors are contiguous (which should normally be the case, because the card has just been formatted).
5. Stores a pointer to the image file in the boot sector.
Then, when the board is rebooted, the MMC/SD media driver reads the boot sector and uses the stored pointer to determine the location of the image file, so that it can begin to satisfy paging requests.

Modifying EStart
Now we need to prevent the paged and unpaged image files from being unintentionally deleted from the internal MMC drive. To support this, Symbian has added a new mechanism to EStart to allow it to permanently clamp the image files. The variant part of EStart must now implement a new virtual function that returns the image file names:

TInt TFSStartup::SysFileNames(RArray& aFileNames);

Here is an example taken from the H4 variant layer (in \omap_hrp\h4\estart\estartmain.cpp):

// Return the filenames of any "System" files on
// a writeable drive (e.g. internal MMC).
// If the files are found, then they are clamped
// (and never unclamped) to prevent them
// from being overwritten.
TInt TH4FSStartup::SysFileNames(RArray& aFileNames)
    {
    _LIT(KPagedRomFileName,"\\SYS$ROM.BIN");
    aFileNames.Append(KPagedRomFileName());
    _LIT(KUnPagedRomFileName,"\\SYS$ROM.PAG");
    aFileNames.Append(KUnPagedRomFileName());
    return KErrNone;
    }
5.5 Implementing File Clamping
To implement support for file clamping in other file systems, the file system mount class (derived from CMountCB) must implement the MFileAccessor interface. This requires that:
• A call to GetInterface() with CMountCB::EFileAccessor updates the aInterface argument to point to the mount class.
• A call to GetFileUniqueId() updates the aUniqueId argument to a valid, file-specific identifier.
Here are examples for the ROFS:

TInt CRofsMountCB::GetInterface(TInt aInterfaceId,
    TAny*& aInterface, TAny* aInput)
    {
    TInt r = KErrNone;
    switch(aInterfaceId)
        {
        case (CMountCB::EFileAccessor):
            ((CMountCB::MFileAccessor*&) aInterface) = this;
            break;
        ...
        }
    }

TInt CRofsMountCB::GetFileUniqueId(const TDesC& aName, TInt64& aUniqueId)
    {
    // Get unique identifier for the file
    const TRofsEntry* entry=NULL;
    TInt err;
    TRAP(err,iDirectoryCache->FindFileEntryL(aName, entry));
    if(err!=KErrNone)
        return err;
    aUniqueId = MAKE_TINT64(0,entry->iFileAddress);
    return KErrNone;
    }
If pseudo clamping is required (file content will not be modified, nor will dismount be attempted, but the kernel is required to load executables from the file system), then a ‘random’ value may be provided for aUniqueId – an example of this is
provided by the ROM and composite file systems.
In addition, the file system implementations of the methods affected by file clamping must check for the existence of file clamps. An example of this in writeable file systems is provided in the FAT code. ROFS provides an example for a read-only file system. Here is the FAT DeleteL() method:

void CFatMountCB::DeleteL(const TDesC& aName)
    {
    __PRINT(_L("CFatMountCB::DeleteL"));
    CheckStateConsistentL();
    CheckWritableL();

    TFatDirEntry fileEntry;
    TEntryPos fileEntryPos(RootIndicator(),0);
    FindEntryStartL(aName,KEntryAttMaskSupported,fileEntry,fileEntryPos);
    TEntryPos dosEntryPos=fileEntryPos;
    TFatDirEntry dosEntry=fileEntry;
    MoveToDosEntryL(dosEntryPos,dosEntry);
    if ((dosEntry.Attributes()&KEntryAttReadOnly) ||
        (dosEntry.Attributes()&KEntryAttDir))
        User::Leave(KErrAccessDenied);

    // Can not delete a file if it is clamped
    CMountCB* basePtr=(CMountCB*)this;
    TInt startCluster=StartCluster(dosEntry);
    if(basePtr->IsFileClamped(MAKE_TINT64(0,startCluster)) > 0)
        User::Leave(KErrInUse);

    EraseDirEntryL(fileEntryPos,fileEntry);
    FAT().FreeClusterListL(StartCluster(dosEntry));
    FAT().FlushL();
    }
5.6 System-Wide Impact of Demand Paging

5.6.1 Binary Compatibility Impact for All Systems
The demand paging functionality added to Symbian is largely transparent on platforms that do not support it. However, there are some subtle binary-compatibility (BC) breaks that do affect all platforms built on Symbian OS v9.3. These are listed in the following sections, together with their Symbian break request (BR) number.

BR1924: Bootstrap changes for demand paging
Symbian has modified the data structures used for kernel memory management to support demand paging. This has necessitated changes to the platform-independent part of the Symbian bootstrap to make use of these structures. Also, Symbian now copies just the unpaged part of the core ROM image to RAM, rather than the entire core image. Since these changes are compiled into the platform-specific bootstrap, you will need to rebuild it.

BR1982: Kernel-side read from user memory must not occur while holding a mutex
As discussed earlier, kernel-side code must not read from paged (user-side) memory while the current thread holds any mutexes. The kernel functions that access user memory have been changed in debug builds to assert some (but not all) of the new restrictions. It is possible (though unlikely) that existing kernel-side code may panic in debug builds if it doesn't conform to the new restrictions. In release builds, it may intermittently hang.

BR1988: Device driver deferred function call (DFC) queue migration
When Symbian migrated device drivers to use their own DFC queues, we added a new pure virtual method to the sound driver, and the digitizer must now initialize a new iDfcQ member variable. The required changes are explained in Sections 5.3.8.2 and 5.3.8.3.
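The rule behind BR1982 can be illustrated in standard (non-Symbian) C++: copy potentially pageable client data into a local buffer while no lock is held, then take the lock and touch only non-paged memory. All names here are illustrative; this is an analogue of the kernel rule, not Symbian kernel code:

```cpp
#include <cassert>
#include <cstring>
#include <mutex>

static std::mutex gLock;
static char gShared[16];

static void SafeWrite(const char* aUserData, std::size_t aLen)
    {
    char local[16];
    std::memcpy(local, aUserData, aLen);  // 1. copy with no lock held
                                          //    (a page fault is safe here)
    std::lock_guard<std::mutex> g(gLock); // 2. now take the lock
    std::memcpy(gShared, local, aLen);    // 3. touch only local/non-paged data
    }
```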
BR1991: USB DFC queue performance improvement
Just as Symbian device drivers now use their own DFC queues, USB has its own separate DFC queue. (Originally, USB made use of DfcQue0.) Also, the DFC queue initialization has moved from the platform-independent layer to the platform-dependent layer. The required changes are explained in Section 5.3.8.1.
5.6.2 Binary Compatibility Impact for Systems with Demand Paging Switched On
In addition to the BC breaks listed in the previous section, there are other BC issues that are only relevant when demand paging is switched on. I discuss these next.

Behavior of file modification operations on paged executables
If an executable on non-XIP media is unpaged, the loader loads the entire executable from storage media into RAM when it is first accessed. The kernel does not need to access the executable on the storage media again (unless the executable is unloaded and loaded again). This means that any attempt to modify the executable while it is loaded will succeed.
If an executable on non-XIP media is code paged, the loader loads individual pages of the executable from media as they are accessed, one by one. It is important that the partially loaded executable is not modified on media. To prevent this, the executable is 'clamped' while any part of it is being used by the paging subsystem. Any attempt to modify the 'clamped' executable will fail with KErrInUse.
The impact of the above is that any code that modifies RAM-loaded executables may now unexpectedly fail. Since executables are stored in the \sys\bin\ directory,
and only components with the TCB capability can modify or delete files in this directory, this compatibility issue should be limited to a very few components (for example, software install, debuggers and possibly some Java code).
Symbian has introduced a new API called RLoader::Delete() to support the deletion of code-paged executables. This is used by the Symbian software installer instead of RFs::Delete() when uninstalling components. This API should also be used by other components affected by this issue. When using the
new function, the system keeps track of any pages that are in use within the executable to be deleted, and only deletes the executable when it is no longer used by the paging subsystem. As a result, disk space may not be released until some time after the call completes.

Modification of read-only code areas
This issue affects any code that writes into code chunks or an XIP code area. If the target code area is paged, the page will eventually be evicted from the paging cache, causing any code modifications to be forgotten. In practice this problem is very rare, since the DebugSupport API is available to support kernel-side components (such as debuggers) that need to modify code, and it works transparently with demand paging. Potentially, any code that uses Cache::IMB_Range() is affected, but there may be other causes. You should modify any source that writes into code chunks or an XIP code area, so that it makes its own copy of the code before writing back into it. For some components, such as FOTA clients, this may not be possible. In these cases, you will have to take application-specific measures to ensure the code you are writing to is not paged.

Visibility of paged data for tools
This issue affects any tools that make assumptions about the visibility of code segments or an XIP ROM area. Existing stop-mode debuggers with Symbian awareness assume that:
• XIP ROM memory is visible at all times after the system has booted.
• RAM-loaded code segments are visible at all times.
Demand paging makes these code areas intermittently unavailable – if your tool accesses them at such a time, it will be faulted by the MMU. This means that existing stop-mode debuggers (or similar tools) will not work reliably on demand-paged ROMs.

Image format changes for tools
The XIP ROM image format is now different, since it needs to support a paged area of XIP ROM, which can be byte-pair compressed, as can executable files.
Any tools that parse ROM images or dynamically analyze executables will need
to be changed to support the new format. This also affects any tool that updates the XIP ROM image (such as FOTA clients). It is likely that most tools affected by this are contained within a software development context, and so the impact is likely to be limited.

Compatibility of installed executables
With the introduction of code paging in Symbian OS v9.3, the possibility arises of third parties developing software on an older SDK, which is installed on a demand-paged device, or vice versa. Whether these executables run as expected depends on a number of factors:
• Whether the device supports code paging (Symbian OS v9.3+).
• Whether the device supports the byte-pair compression format (Symbian OS v9.2+).
• What the compression format of the executables is. This will probably be the default compression format used by the SDK build tools, since developers are unlikely to change this. The Symbian build tools compress executables in the 'deflate' format by default.
• Whether the executables are marked as paged or unpaged. The Symbian build tools automatically compress executables marked as paged in the byte-pair format (unless they are explicitly made uncompressed).
• Whether the loader demand-pages executables that are neither marked as paged nor as unpaged. The default Symbian configuration does demand-page unmarked executables.
Table 4 below lists a number of possible scenarios and their impact. Scenario 1 is the default case if the device manufacturer does not alter the Symbian build tools. Scenarios 2 to 4 look at what happens if the device manufacturer makes different modifications to the tools. Please note that scenarios 5 to 7 require support in Symbian that does not exist at the time of writing; these scenarios are included for completeness only.
Table 4

Scenario 1: Unmodified Symbian tools. Device manufacturer build tools compress executables in the 'deflate' format by default.
Impact: Third-party executables are compatible with pre-Symbian OS v9.2 devices. Executables are not paged by default. The developer has to explicitly mark executables as paged or change the compression format to enable paging. Not all developers will choose to do this, so RAM savings for third-party code will be reduced. The additional RAM usage may affect the performance of other software in the device, including ROM-based software.

Scenario 2: Device manufacturer build tools compress executables in the 'byte-pair' format by default.
Impact: Most third-party software is automatically paged. The developer has to make a conscious effort to disable demand paging. RAM savings are maximized. Disk usage is slightly increased. Executables will not run on pre-Symbian OS v9.2 devices.

Scenario 3: Device manufacturer build tools leave executables uncompressed by default.
Impact: Third-party executables are compatible with pre-Symbian OS v9.2 devices. Most third-party software is automatically paged. Default size of executables on disk increases by ~50%. The developer can explicitly compress executables if required. SIS files will be approximately the same size since they are compressed.

Scenario 4: Hybrid solution. Device manufacturer build tools build 'deflate' format executables by default, then switch to 'byte-pair' at some future point.
Impact: Allows compatibility with pre-Symbian OS v9.2 devices up to a certain point in time. At that time, there will be more third-party software that doesn't support demand paging compared to an early switch.

Scenario 5: Device manufacturer build tools build 'deflate' executables by default and the Symbian software installer converts these to 'byte-pair' at installation time on Symbian OS v9.3+ devices.
Impact: Not supported at the time of writing. Would also have platform security implications.

Scenario 6: 'Fat' SIS solution. Device manufacturer tools build both 'deflate' and 'byte-pair' binaries. The MAKESIS tool puts both in the SIS package, and the Symbian software installer selects the appropriate binary at install time.
Impact: Not supported at the time of writing. Larger SIS files.

Scenario 7: Patch pre-Symbian OS v9.2 devices that don't support the byte-pair compression format.
Impact: Not supported at the time of writing.
6 Component Evaluation for Demand Paging
This chapter describes how you might evaluate the impact of demand paging on a component or a group of components, and how to mitigate any negative impact. The processes that are described here are heavily influenced by the Symbian system-wide evaluation that was carried out during the prototype phase of demand paging. You are encouraged to use these ideas in whole or in part for your own components. At a minimum, the paging categories of Symbian-owned components (Section 6.4) must be respected.
6.1 Static Analysis
You should analyze several aspects of the component architecture before even considering demand paging. We first consider static analysis techniques, before moving on to dynamic analysis in the next section.
6.1.1 Dependency Information
First, list all the executables belonging to the component under evaluation, together with their static dependencies. For each executable, note its size and pageability (if known). Do the same for all key components that are statically linked to executables in the component. Also note dynamic dependency information – for example, whether the component is a plug-in to a framework or is a framework itself. This information is an important input into subsequent sections. It enables you to identify the key interactions between components and to identify the executables that are likely to be affected most by demand paging.
6.1.2 Use-Case Analysis
List all the real-time and performance-critical use cases for the component, including
which individual executables are used and to what extent. 'Real-time' here does not mean hard real-time – it also includes use cases that lead to an acceptable user perception of responsiveness. Here are some examples of real-time use cases:
• Video playback. Dropped frames are not acceptable.
• VoIP phone call. Audio quality must be maintained.
• File download over USB. Unbounded packet-response times could cause a catastrophic performance drop.
The performance-critical category contains use cases that are benchmarked to measure the overall performance of the component (or the OS as a whole). For instance:
• Standard boot time
• Application start-up time
• Camera image-capture time.
This category could also include complex compound use cases such as 'Receive text message while playing mp3 audio.' Other use cases may have a benchmarked performance that is important to the component owner but would not be considered real-time or performance-critical from a system-wide perspective. For example: 'time taken to notify user of text message reception.' In this case, it is unlikely that the user would notice any time delay caused by demand paging, because the text message is processed in the background. There will also be some use cases that fall into grey areas, and their importance may be somewhat subjective. In these cases, it's probably appropriate for the component owner to negotiate with the system-wide design authority.
6.1.3 IPC Analysis
Another important piece of information is whether paged clients interact with real-time or performance-critical servers – reading paged data may have a negative impact on the performance guarantees of the server. Note down any cases in which a server reads paged data from a client's address space. Paged data is RAM-loaded code and any read-only XIP data structures – for example, XIP bitmaps, constant descriptors, constant data arrays in code, XIP data files accessed through a pointer, or exported DLL data. The list should also include any cases of custom IPC architectures where one thread reads from another thread's address space, for instance by using RMessage::Read().
6.1.4 Analysis of Components Affected by BC Issues
Also list any components affected by the binary compatibility issues mentioned in Section 5.6.2.
6.2 Dynamic Analysis

6.2.1 Functional Equivalence
The most important practical test is whether the test code of the component under evaluation has the same pass rate on a demand-paging ROM as on a non-demand-paging ROM. You should choose the most stressed configuration possible – the more stressed the configuration, the greater the confidence in the robustness of the component. You should, at the very least, ensure that all the executables under evaluation are paged. For evaluation purposes, you should limit the maximum paging cache size to simulate OOM behavior. In general, a sensible maximum cache size used during evaluation translates into a sensible minimum cache size to use in production. Some experimentation may be needed with various configurations.
If the pass rate is not as high as expected, then you need to either fix the code or understand the reasons for the failure. If an immediate fix is not possible, then the demand-paging configuration should be relaxed (either by making more dependent components unpaged or by increasing the minimum/maximum size of the paging cache), to establish the point at which functional equivalence is achieved.
Note that this is a temporary measure for evaluation: eventually, all known defects exposed by demand paging should be fixed. Marking a component unpaged to make it more robust should only be done in extenuating circumstances, and increasing the minimum paging cache size for this reason is never a good idea.
6.2.2 Performance Data
Your aim here is to characterize the RAM footprint versus performance trade-off for the component – that is, to find the minimum acceptable performance configuration. This allows use-case-specific optimization and gives you an indication of the performance profile as free RAM declines. You can use your existing benchmarking code for the component, or write new code specifically for demand paging. If you don’t have any benchmarking then re-running existing test code with time stamping enabled may provide you with sufficient information. You should run the tests using several demand-paging configurations with different maximum paging cache sizes (to simulate OOM behavior). A higher number of different configurations gives a more accurate picture of the performance impact of demand paging. When you make a graph of maximum paging cache size versus performance, there is often a point at which performance drops off dramatically. This indicates that page thrashing is occurring. Sometimes this is so dramatic that you will need a logarithmic scale on one or both axes to determine the drop-off point. At other times, the performance drop-off is more gradual, indicating that the use case is less sensitive to page faults. Figure 10 gives some example performance data. The performance profiles for two use cases, A and B, are presented in two different ways. The top graph has linear axes and appears to show that performance for use case A drops sharply as the maximum paging cache goes below 96 pages. For use case B, performance drops less sharply at around 128 pages. The bottom graph has a logarithmic Y-axis and shows that the drop-off point for use case A is actually nearer 160 pages. For use case B, no additional information is revealed.
Component Evaluation for Demand Paging
131
Figure 10: The change in performance with maximum paging cache size for two use cases. The same data is presented in both graphs, using a linear Y-axis (top) and a logarithmic Y-axis (bottom).
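The drop-off analysis described above can also be done programmatically rather than by eye. Here is a minimal sketch of one way to do it (the function name, the tolerance value and the benchmark data are all hypothetical, not from the book): it scans the results in ascending cache-size order and reports the smallest maximum-paging-cache size whose score is still within a tolerance of the best observed score — everything below it is in the thrashing region.

```python
def find_dropoff(results, tolerance=0.10):
    """Return the smallest maximum-paging-cache size (in pages) whose
    benchmark score is within `tolerance` of the best observed score.

    results: list of (cache_pages, score) pairs, sorted by ascending
    cache_pages. Sizes below the returned value fall in the region
    where page thrashing degrades performance.
    """
    best = max(score for _, score in results)
    for pages, score in results:          # ascending cache sizes
        if score >= best * (1.0 - tolerance):
            return pages
    return None

# Hypothetical benchmark data for one use case: (pages, relative score)
use_case_a = [(32, 5.0), (64, 20.0), (96, 80.0), (128, 98.0), (160, 100.0)]
print(find_dropoff(use_case_a))  # -> 128
```

With a looser tolerance the detected knee moves to a smaller cache size, mirroring the way a logarithmic axis can reveal a different drop-off point than a linear one.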
6.2.3 Demand-Paging Logs
The logging tools described in Chapter 8 enable the production of demand-paging logs while running component tests. Analysis of these logs may reveal page thrashing behavior and dynamic dependency information. You could try making a key executable unpaged and re-running the tests to give better results. For example, the functional pass rate may be improved or the performance drop-off may occur at a lower cache size and/or be less dramatic.
6.3 Identifying Demand-Paging Problems and Mitigation Techniques
Some demand-paging problems identified during evaluation, such as defects, may have a clear solution. Others may require more work and I'll discuss those next.
6.3.1 Protecting Real-Time and Performance-Critical Code Paths
Ideally, the static analysis (Section 6.1) will make clear the separation between the code paths involved in real-time or performance-critical use cases and those that are not – for example, the separation of data and control planes in a communications protocol. You must protect the former from page faults, because of the unbounded and unpredictable cost of servicing a fault. There are two possible strategies for doing this:
• Ensure the minimum paging cache size is large enough to accommodate the protected code path and any other paged data required at the same time, for all use cases.
• Make the protected code path unpaged.
In practice, option one is difficult to achieve because there is usually a compound use case that means more data enters the paging cache than you had budgeted for. For example, you might choose the minimum paging cache size so that it is large enough to accommodate all the code required for audio playback. Then, in testing, you might find that ordinary audio playback works fine, but problems occur when the user navigates the file system at the same time. This is because the code executed when navigating the file system ejects some of the code executed when playing audio from the paging cache, inducing additional page faults. These page faults result in the audio buffers not filling in time, so audio playback stutters. In general, it is not wise to allow real-time or performance-critical code (such as audio playback) to compete for space in the paging cache with non-critical code (such as file system navigation). We recommend option two, which has the further advantage that it may well cost less RAM than option one.
6.3.2 Improving the Component Architecture
Your analysis may also reveal deficiencies in the component architecture. For example, there may not be a clear separation between the data plane and control plane, or a monolithic library may contain just a small amount of performance-critical code. In these cases, the ideal solution is to redesign the component appropriately or split the monolithic library into two parts: one paged, one unpaged. Then you need only set the minimum amount of code to be unpaged. This refactoring may produce benefits unrelated to demand paging, such as breaking unnecessary static dependencies and making the component easier to maintain. Conversely, demand paging may help you to disguise architectural problems, because it lazily loads only the code currently in use. A monolithic library with many static dependencies may only need to be partially loaded, and some dependencies may not need to be loaded at all. However, demand paging should not be used as a cure for these kinds of problems – there is no substitute for good architecture. Major component redesign may be impractical in the short term. In this case, you may temporarily have to set large parts of the component to be unpaged, which greatly reduces the potential RAM saving.
6.3.3 Building a Candidate Unpaged List
One of the outputs from the static analysis (Section 6.1) should be an initial candidate list of unpaged files. Dynamic analysis (Section 6.2) enables you to build on that by refining the unpaged list. For example, you may find that an executable that you think is performance critical is rarely used in practice. Conversely, a seemingly innocuous library may actually contain a utility function that is heavily used by real-time code.
Dynamic analysis may also reveal that some executables are paged in for much of the time, despite not being involved in any real-time or performance-critical use cases. In this case, since the executable is always in RAM, you might gain by making the executable unpaged and reducing the minimum size of the paging cache accordingly. As well as identifying unpaged files within the component, your evaluation may give you data on other dependent components, possibly requiring those components to be unpaged to meet certain performance guarantees. For example, an unpaged real-time server may require that all its third-party plug-ins are also unpaged. It is important that these cross-component requirements are considered from a system-wide perspective. Once you have a candidate unpaged list, how to act upon it is a decision shared between the component owner, the system architects of the platform and the (software) customers of the platform (if any). A good unpaged list should guarantee the robustness and functionality of the system irrespective of how small the paging cache is.
6.3.4 Choosing a Minimum Paging Cache Size
I have previously noted that production devices should not limit the maximum size of the paging cache. You do need to choose a minimum paging cache size, however, to guarantee a sensible minimum level of performance. This minimum is heavily dependent on the amount of paged code in the system. A value that is suitable for a device with a simple UI may be far too small for a device with a complex UI, where there is a lot more paged code. It is not practical to find an optimum minimum cache size from static analysis. Device manufacturers will determine it empirically, according to the final contents of the ROM and the performance requirements of the device. One way they might do this is to build a performance profile (as in Section 6.2.2) using a selection of the most code-intensive use cases on the device.
6.4 Symbian's Pageability Categories
In Section 6.1.2, I mentioned that Symbian makes a distinction between real-time and performance-critical use cases. This helps to distinguish between those files that must always be unpaged in all configurations and those that ought to be unpaged in all configurations for the best performance, but don't have to be. These groups are called the 'mandatory unpaged' and 'recommended unpaged' lists respectively. The existence of the latter provides some flexibility to customers who wish to save more RAM at the expense of performance. The complete list of categories and their inclusion criteria are described in Sections 6.4.1 to 6.4.6. Symbian now maintains a list of all the files in Symbian and their demand-paging categories, and monitors it to ensure conformance.
6.4.1 Kernel Unpaged
This group defines all the kernel-side files that are implicitly unpaged in all demand paging configurations. No action needs to be taken to make these files unpaged but it is useful to separate them from other classifications for audit purposes.
6.4.2 Mandatory Unpaged
This category consists of the files that are explicitly made unpaged in all demand paging configurations. There are several criteria for being in this category, described in Table 5 on the following page.
Table 5
Device stability: When the file is paged, the device is unstable. The instability should be fixed by normal coding methods where possible.
Functional equivalence: When the file is paged, there is a functional failure. The failure should be fixed by normal coding methods where possible. Being involved in a real-time use case is a valid reason for being in this category.
Permanent presence: When the file is paged, its contents are largely present in the paging cache for all use cases. Therefore there is little or no benefit to paging it.
General performance: When the file is paged, performance for all use cases is degraded due to page thrashing.
Security: It is necessary to prevent the file from paging for security reasons (for example, the file is located in a special area or must be excluded from any integrity checks made while paging in).
Power management: When the file is paged, battery life is reduced unacceptably for all devices.
There is some overlap between these criteria. For instance, it is likely that any file that is made mandatory unpaged due to ‘permanent presence’ will also satisfy ‘general performance’. The list of criteria is not exhaustive and may expand in the future.
6.4.3 Recommended Unpaged
This category is for those files that should be unpaged to sustain or improve the existing level of performance for performance-critical use cases, where this term has the same definition as in Section 6.1.2. Note that paging can improve performance in some use cases, so there should be evidence that making a file (or group of files) unpaged is better than making it (or them) paged.
6.4.4 Test/Reference Unpaged
In this category, place any test files that you as the component owner feel should be unpaged. This category is also for any reference code (such as a plug-in) where the equivalent production code would be unpaged. You should use the same criteria as in Sections 6.4.2 and 6.4.3, but the level of justification required for being in this category is lower.
6.4.5 Optional Unpaged
This category contains the files that you could set as unpaged to sustain or improve the existing level of performance for any other use cases not covered in Sections 6.4.2 and 6.4.3. By default, this category is not treated any differently to the ‘paged’ category but it is helpful to separate out the files that you might make unpaged if you wanted to improve performance. Within this section, it is helpful to group files by use case in case any customer has strict performance requirements for those use cases.
6.4.6 Paged
This is the ‘catch-all’ category for files that don’t fit into any of the other categories. By default, all files will be paged unless otherwise stated.
7 Configuring Demand Paging on a Device
This chapter describes how to use and configure demand paging, assuming that the platform already has support for it enabled. I will cover the most sensible ways to switch on demand paging for XIP ROM images and executables.1
7.1 Building a Basic Demand-Paged XIP ROM
This section discusses the configuration changes necessary to introduce demand paging to a basic XIP ROM.
7.1.1 OBY File
At least three OBY file keywords should be present to create a demand-paged XIP ROM and I describe these in the following sections. Only the first is mandatory but the others are usually required in any meaningful setup. It is important that these keywords are applied to the core ROM image rather than any ROFS images. You can make sure of this by enclosing the keywords in a block as follows:

ROM_IMAGE[0] {
    <ROM_IMAGE[0] keywords>
}
In addition to the new keywords, the location of files in the ROM must be considered. I will describe this in Section 7.1.2, following it with an XIP ROM paging example.

pagedrom keyword

The pagedrom keyword takes no arguments. It instructs ROMBUILD to sort the contents of the core ROM image so that all the unpaged files appear at the start of the image, followed by all the paged files. This is so that the kernel can copy the unpaged part of the image into RAM during boot, while leaving the rest of the image to be paged into RAM on demand (see Figure 2 on page 11). If the keyword compress is also specified, then the paged part of the image will be compressed using the byte-pair algorithm. The unpaged part of the image remains uncompressed. Contrast this with the behavior of compress when pagedrom is not specified. In that case, the entire image is compressed using the deflate algorithm.

1 For further information about the basic usage of demand-paging tools and keywords, please consult the Symbian Developer Library documentation at developer.symbian.org/sfdl

pagingoverride keyword

The pagingoverride keyword determines the pageability of the executables in the ROM/ROFS section it is defined in. It is operated on by ROMBUILD/ROFSBUILD and takes a single argument, which can be one of those shown in Table 6:

Table 6
nopaging: Marks all executables unpaged, irrespective of whether they are already marked as paged or unpaged in their MMP file.
alwayspage: Marks all executables paged, irrespective of whether they are already marked as paged or unpaged in their MMP file. This can be useful for debugging or analysis.
defaultunpaged: All executables that are neither marked as paged nor unpaged in their MMP file are marked as unpaged.
defaultpaged: All executables that are neither marked as paged nor unpaged in their MMP file are marked as paged.
The default value of the keyword is nopaging, so it is important to specify a different value if paging of executables is required. The most common value is defaultpaged, because it is usually best to page all executables except those that are marked otherwise. Note this keyword has no effect on the pageability of non-executable files in an XIP ROM. These will always be paged unless explicitly configured to be unpaged (see Section 7.3).
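The interaction between pagingoverride and any marking in the executable's MMP file can be summarized as a small decision function. The following sketch is mine, not part of the build tools; it simply encodes the rules of Table 6:

```python
def effective_pageability(pagingoverride, mmp_marking=None):
    """Pageability of an executable after ROMBUILD/ROFSBUILD, per Table 6.

    pagingoverride: 'nopaging', 'alwayspage', 'defaultunpaged' or 'defaultpaged'
    mmp_marking:    'paged', 'unpaged', or None if the MMP file says nothing
    """
    if pagingoverride == 'nopaging':     # forces everything unpaged
        return 'unpaged'
    if pagingoverride == 'alwayspage':   # forces everything paged
        return 'paged'
    if mmp_marking is not None:          # default* values respect the MMP file
        return mmp_marking
    return 'paged' if pagingoverride == 'defaultpaged' else 'unpaged'

print(effective_pageability('defaultpaged'))       # -> 'paged'
print(effective_pageability('nopaging', 'paged'))  # -> 'unpaged'
```

Note how defaultpaged and defaultunpaged only decide the fate of executables with no explicit MMP marking, while nopaging and alwayspage ignore the MMP file entirely.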
demandpagingconfig keyword

The demandpagingconfig keyword takes the following arguments in the order shown in Table 7:

Table 7
<MinLivePages>: The minimum number of RAM pages to reserve for the paging subsystem. The number must be at least equal to 2*(YoungOldPageRatio+1). If a smaller number is specified, a number equal to this formula is used instead. If zero is specified or the demandpagingconfig keyword is missing, then a value of 256 is used.
<MaxLivePages>: The maximum number of RAM pages the paging subsystem may use. The number must be greater than or equal to MinLivePages. If zero is specified or the demandpagingconfig keyword is missing, then the system uses the maximum possible value (1,048,575). On a production system, it should always be set to zero, so that as many pages as possible are used. Low values may be used to test paging under more stressed conditions.
<YoungOldPageRatio>: The ratio of young to old pages maintained by the paging subsystem. This is used to maintain the relative sizes of the two live lists. The default value is three.
Some demandpagingconfig statements specify additional arguments to those specified in Table 7, but these are now obsolete and will be ignored by the paging subsystem.
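The defaulting rules in Table 7 can be made concrete with a short worked sketch. This is an illustration of my own (the function is not part of the build tools, and the clamping of MaxLivePages up to MinLivePages is my assumption about how the 'must be greater than or equal to' rule is enforced):

```python
def normalise_config(min_live=0, max_live=0, ratio=3):
    """Apply the defaulting rules of Table 7 to demandpagingconfig arguments.

    A zero (or missing) MinLivePages means 256; any value below
    2*(YoungOldPageRatio+1) is raised to that floor. A zero (or missing)
    MaxLivePages means the maximum possible value, 1,048,575. MaxLivePages
    is assumed (my reading) to be raised to MinLivePages if it is smaller.
    """
    floor = 2 * (ratio + 1)
    if min_live == 0:
        min_live = 256
    if min_live < floor:
        min_live = floor
    if max_live == 0:
        max_live = 1048575
    if max_live < min_live:
        max_live = min_live
    return min_live, max_live, ratio

print(normalise_config())        # -> (256, 1048575, 3)
print(normalise_config(4, 512))  # -> (8, 512, 3)
```

The second call shows the floor in action: with the default ratio of three, a requested minimum of 4 pages is raised to 2*(3+1) = 8.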
7.1.2 Arranging the Core/ROFS for XIP ROM Paging
As we have seen in Section 3.3, a typical NAND ROM has a small core ROM image and a large primary ROFS image. To make best use of XIP ROM paging, you need to move any files that should be paged from the ROFS image to the core image. In Section 7.4 I explain more sophisticated ways of doing this, but the simplest method is to remove the ROFS altogether so that all files are in the core ROM image. Each platform can have its own way of configuring the primary ROFS, so there is no generic way of removing it. On Symbian's reference platform, you would configure out the 'ROM_IMAGE[1] {' statement in \epoc32\rom\include\base.iby, which defines the start of the ROFS section.

Basic XIP ROM paging example

Using the instructions in Section 7.1.1 with typical keyword arguments, a simple demand-paged OBY file might look like this:

// MyDPConfig.oby
#if !defined PAGED_ROM
#define PAGED_ROM
#endif

ROM_IMAGE[0] {
pagedrom
compress
//                 Min   Max    Young/Old
//                 Live  Live   Page
//                 Pages Pages  Ratio
demandpagingconfig 512   32767  3
pagingoverride defaultpaged
}
If the OBY file that defines the start of the primary ROFS image (for example, base.iby) contains a section like this:

#if defined(_NAND) || defined(_NAND2)
REM Start of ROFS image
ROM_IMAGE[1] {
#endif
then it should be adjusted as follows (changes shown in bold italics):

#if defined(_NAND) || defined(_NAND2)
#if !defined PAGED_ROM
REM Start of ROFS image
ROM_IMAGE[1] {
#endif
#endif
To build a demand-paged ROM, simply include the new OBY file in the buildrom statement. For example, when using the Symbian Techview reference platform on H4:

buildrom -D_NAND2 MyDPConfig h4hrp techview
It is important that MyDPConfig.oby appears before base.iby in the buildrom command to ensure that the PAGED_ROM flag is acted upon; base.iby is included by techview.oby in the example above. Alternatively, the flag can be defined via '-DPAGED_ROM' in the buildrom command, in which case the ordering of OBY files is irrelevant. You may still end up with a small ROFS image being produced if one of the included OBY/IBY files explicitly places files in the primary ROFS section. However, most or all of the ROM contents will be in the core image.
7.2 Building a Basic Code-Paged ROM
When you are building a code-paged ROM, you can choose whether to enable XIP ROM paging or not. The pagedrom keyword only affects XIP ROM paging, but the demandpagingconfig keyword applies to code paging as well, because both types of demand paging use the same underlying paging cache. There are two further things to consider: the pagingpolicy keyword and the location of executables to be code paged (for example, in ROFS or in the user data area).
7.2.1 pagingpolicy Keyword
The pagingpolicy keyword takes a single argument, which can be one of the values specified in the pagingoverride table in Section 7.1.1.2. It sets a flag in the core ROM that tells the loader, at runtime, what the code paging policy should be. The default is nopaging so you need to change this for any code paging to occur.
Note the difference between this keyword, which operates on files at runtime, and pagingoverride, which operates at rombuild/rofsbuild time. Note also that pagingpolicy only has any meaning in the core ROM image, so it should be directed to ROMBUILD, not ROFSBUILD. You can ensure that this is the case by enclosing the keyword in a ‘ROM_IMAGE[0] {}’ block.
7.2.2 Code Paging from ROFS
You will usually want to favor XIP ROM paging over code paging, since the RAM and performance overhead of code paging is generally higher. However, there may be circumstances in which it is necessary to code page executables in ROFS. For example:
• For testing purposes – to compare the functional equivalence and performance of a code-paged system with an XIP-ROM-paged or non-demand-paged system.
• To reduce the size of the unpaged part of the core ROM image. If a paged executable has a number of unpaged static dependencies, there may be a RAM benefit to code paging the executable and placing its dependencies in ROFS. This would be the case if the dependencies are not statically linked to by other executables in the core image.
The easiest way to use code paging instead of XIP ROM paging is to ensure paged executables are placed in a ROFS partition that supports code paging, instead of in the core ROM image. This may involve reversing the decision to move the contents of the ROFS to the core, as required for XIP ROM paging. Furthermore, you need to add a pagingoverride keyword for each ROFS partition that you want to code page from. Use a 'ROM_IMAGE[<partition>] {}' block to distinguish each pagingoverride statement. As well as ensuring that executables are correctly marked as paged or unpaged (rather than making everything unpaged), this keyword implicitly ensures that executables are compressed in the byte-pair format (if compression is specified). This may involve decompressing executables from the default deflate format and recompressing them into the byte-pair format during the ROFSBUILD process.
7.2.3 Code Paging from Internal Writeable Media
If you only want to code page from internal writeable media, such as the user data drive, then no change is needed to the location of executables in ROM/ROFS. However, simply specifying a pagingpolicy is not enough. Even if the value given is defaultpaged, the default behavior of the Symbian build tools is to compress executables using the deflate algorithm. If these executables are installed into the \sys\bin\ directory and are still deflate compressed, they will not be code paged. You need to do one of the following for an executable to be code paged from internal writeable media:
1. In the MMP file of the executable, explicitly specify the paged keyword. This implicitly ensures the executable is byte-pair compressed (or uncompressed if compression is disabled).
2. Explicitly convert the executable to the byte-pair or uncompressed format. For example, use the elftran command as follows: elftran -compressionmethod bytepair \epoc32\release\armv5\urel\mylibrary.dll
3. Modify the build tools to compress executables in the byte-pair format by default.
4. Modify the build tools to uncompress executables by default.
All these options except option four have BC implications (see Section 5.6.2 for the BC impact on installed executables).
7.2.4 Basic Code-Paging Example
Using the instructions in Section 7.1.1, a typical OBY file that uses code paging from ROFS might look as follows (changes from the XIP ROM paging example in Section 7.1.2.1 are in bold italics):

// MyDPConfig.oby
#if !defined PAGED_ROM
#define PAGED_ROM
#endif

#if !defined USE_CODE_PAGING
// Uncomment next line if code paging is wanted
//#define USE_CODE_PAGING
#endif

#if !defined CODE_PAGING_FROM_ROFS
// Uncomment next line if code paging from primary rofs is wanted
//#define CODE_PAGING_FROM_ROFS
#endif

ROM_IMAGE[0] {
pagedrom
compress
//                 Min   Max   Young/Old
//                 Live  Live  Page
//                 Pages Pages Ratio
demandpagingconfig 256   512   3
pagingoverride defaultpaged
#if defined USE_CODE_PAGING
pagingpolicy defaultpaged
#endif
}

#if defined CODE_PAGING_FROM_ROFS
ROM_IMAGE[1] {
pagingoverride defaultpaged
}
#endif
You would also adjust the OBY file that determines the start of the primary ROFS partition (for example, base.iby) like this:

#if defined(_NAND) || defined(_NAND2)
#if !defined PAGED_ROM || defined CODE_PAGING_FROM_ROFS
REM Start of ROFS image
ROM_IMAGE[1] {
#endif
#endif
No change to the buildrom command is required if MyDPConfig.oby appears before base.iby. If this is not the case, then USE_CODE_PAGING and CODE_PAGING_FROM_ROFS must be defined on the command line (as well as PAGED_ROM). For example, for Techview on H4:

buildrom -D_NAND2 -DPAGED_ROM -DUSE_CODE_PAGING -DCODE_PAGING_FROM_ROFS h4hrp techview MyDPConfig
7.3 Fine-Grained Configuration
In the previous two sections, I explained how to create demand-paged ROMs and configure the general paging behavior (that is, the default paging policy and the size of the paging cache). This section describes how to configure whether individual files are paged or not.
7.3.1 MMP File Configuration
Symbian has extended the MMP file syntax to add two new keywords, unpaged and paged, which mark the executable as unpaged or paged respectively. Each keyword should appear on a line of its own and takes no arguments. If the build tools are configured to compress executables while building (the default behavior), then executables marked with the paged keyword will be byte-pair compressed. If a ROM/ROFS partition has defined a pagingoverride of defaultpaged or defaultunpaged, then the pageability indicated in the MMP file will be respected when the executable is placed in ROM/ROFS. If the pagingoverride is nopaging or alwayspage, then that setting takes precedence: the executable is placed in ROM/ROFS as unpaged or paged respectively, and the pageability indicated in the MMP file is ignored.
7.3.2 OBY File Configuration
Symbian has extended the OBY file syntax to include two new modifiers, unpaged and paged, which mark an object as unpaged or paged during the rombuild/rofsbuild process. The modifiers should appear at the end of an OBY statement like this:

file=ABI_DIR\DEBUG_DIR\MyLibrary.dll \sys\bin\MyLibrary.dll unpaged
You should only use these modifiers for user-side ROM objects (such as ‘file=’, ‘dll=’, ‘data=’ and ‘secondary=’ statements). Kernel-side ROM objects (‘primary=’, ‘extension=’, ‘variant=’ and ‘device=’ statements) are always unpaged so any modifier will be ignored. Furthermore, the modifier will be ignored for ‘data=’ statements if the object is in a ROFS partition; the pageability of non-executable files is only relevant in an XIP ROM image (see Section 3.4). If ROM/ROFS compression is switched on, objects marked as paged will be bytepair-compressed automatically. The OBY file pageability modifier overrides any pageability defined in the executable’s MMP file. However, it does not override the pagingoverride statement in the case of nopaging or alwayspage.
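Putting the last two sections together, the full precedence for a user-side executable can be sketched as a single resolution function. This is my own illustrative encoding (not part of the build tools) of the rules just described: pagingoverride with nopaging or alwayspage wins over everything, then the OBY modifier overrides the MMP keyword, and the default* value applies only when neither is given:

```python
def resolve_pageability(pagingoverride, mmp=None, oby=None):
    """Illustrative precedence chain for a user-side ROM object.

    pagingoverride: 'nopaging', 'alwayspage', 'defaultunpaged' or 'defaultpaged'
    mmp: 'paged'/'unpaged' keyword from the MMP file, or None
    oby: 'paged'/'unpaged' modifier from the OBY statement, or None
    """
    if pagingoverride == 'nopaging':
        return 'unpaged'
    if pagingoverride == 'alwayspage':
        return 'paged'
    if oby is not None:        # OBY modifier overrides the MMP keyword
        return oby
    if mmp is not None:
        return mmp
    return 'paged' if pagingoverride == 'defaultpaged' else 'unpaged'

print(resolve_pageability('defaultunpaged', mmp='paged', oby='unpaged'))  # -> 'unpaged'
```

The example shows an OBY unpaged modifier trumping a paged keyword in the MMP file, exactly the override described above.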
7.3.3 Central Configuration Via the Configpaging Tool
You may find it difficult to manage the configuration of the pageability of files using the methods discussed in Sections 7.3.1 and 7.3.2 if you need a different pageability for the same file on different devices. If each MMP or OBY/IBY file has two possible configurations and there are many such files in ROM, then maintaining the overall configuration could become very expensive. When you need a more 'fluid' configuration, it is better to use the configpaging tool as a single place to manage the paging configuration of many files. This optional tool runs during the OBY pre-processing phase, before ROMBUILD and ROFSBUILD are executed. You provide it with a centralized list of paged and unpaged files, and the tool uses this to add paged/unpaged modifiers to individual statements in the intermediate OBY files. This centralized list overrides any individually specified paged/unpaged modifiers and any pageability defined in individual MMP files. However, the list will not override the pagingoverride statement in the case of nopaging or alwayspage.
There are two approaches for invoking the configpaging tool:
1. By using the externaltool statement in an input OBY file: externaltool=configpaging[:<configfile>]
2. By using the '-e' buildrom command line option: buildrom -econfigpaging[:<configfile>]
where <configfile> is a text file in the directory \epoc32\rom\configpaging\. The default file used is configpaging.cfg. The syntax of <configfile> is as shown in Table 8.

Table 8
defaultpaged: Sets all unspecified files to be paged. Note this will effectively override any pagingoverride statement in the case of defaultpaged or defaultunpaged, because all ROM files will then contain a pageability modifier.
defaultunpaged: Sets all unspecified files to be unpaged. Note this will effectively override any pagingoverride statement in the case of defaultpaged or defaultunpaged, because all ROM files will then contain a pageability modifier.
unpaged <regexp> or unpaged:<regexp>: Sets the file(s) specified by the regular expression in <regexp> to unpaged.
paged <regexp> or paged:<regexp>: Sets the file(s) specified by the regular expression in <regexp> to paged.
include "<file>": Includes another file in \epoc32\rom\configpaging\ specified by <file>. Included files will be processed before the remaining lines of the parent file. Included files can themselves include other files.
7.4 Optimizing the Configuration
The primary purpose of demand paging is to save RAM, so optimization in this context usually means 'how to save the most RAM'. In a given ROM, there are a number of configurable variables that may affect this:
• The amount of code and data that is marked as paged or unpaged
• The size of the paging cache
• The ratio of young pages to old pages in the paging cache
• Whether ROM or executable compression is switched on and which algorithm is used
• Whether XIP ROM paging and/or code paging is used
• Whether files are located in the core ROM image or the primary ROFS – also known as the core/ROFS split.
The first four variables are essentially RAM/performance/disk usage trade-off decisions and were discussed in Section 3.11. Variable five is relatively easy to decide on: in general, the performance and RAM saving of XIP ROM paging is better than code paging so you should use XIP ROM paging where possible. However, code paging can give extra RAM savings, so you should enable this too, if possible. Variable six is interesting, because if you get this right it can give you greater RAM savings with little or no performance trade-off. So it is worth investing some effort in this, and I discuss it further in the rest of this section. The simplest core/ROFS split is to put everything in the core image with an empty ROFS – this is the optimum split if you want to page as much code as possible. However, it is suboptimal when there is a significant amount of unpaged code. In fact, if the amount of unpaged code plus minimum paging cache size approaches the size of code that needs to be loaded in a non-demand-paged ROM, there may be a net RAM loss (see Section 3.14).
A good heuristic to follow is to try to arrange as much paged data as possible in the core ROM image (where it can be XIP ROM paged), while leaving as much unpaged code as possible in the primary ROFS (where it can be loaded as required – because any unpaged code in the core ROM permanently occupies RAM). This heuristic is complicated by the fact that all static dependencies of code in the core ROM image must also be in the core, even if those dependencies are rarely used. The RAM saved by having a paged executable in the core ROM might be offset by the overhead of having the unpaged dependencies of that executable in the core. You would usually put unpaged files in the ROFS to avoid the overhead of them permanently occupying RAM by being in the core ROM. You might also put some paged code in the ROFS if it means its unpaged dependencies can also be in ROFS, rather than placing the paged code in the core and being forced to place all the unpaged dependencies in the core. The RAM saving of paging additional code in the core can be outweighed by the RAM overhead of a bigger unpaged core. There are various strategies for dealing with this issue, and I discuss these in the following sections.
7.4.1 efficient_rom_paging.pm
You can invoke this optional tool, which runs during the buildrom phase between configpaging and the execution of rombuild/rofsbuild. It searches the intermediate ROFS OBY file for paged files and moves them to the core ROM OBY file, together with their static dependencies (both paged and unpaged). This ensures that as much paged code as possible is in the final core ROM image, making best use of XIP ROM paging. However, the tool makes no effort to limit the amount of unpaged code in the core ROM image. You invoke the tool in one of two ways:
1. By using the 'externaltool=' OBY file syntax, with the following statement: externaltool=efficient_rom_paging
2. By using the '-e' buildrom command-line option, for instance: buildrom -efficient_rom_paging
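The tool's core behavior – pulling each paged file, plus the transitive closure of its static dependencies, out of the ROFS list and into the core list – can be sketched as follows. This is a simplification in Python with made-up module names; the real script works on OBY file entries and DLL import information:

```python
# Hypothetical module table: name -> (is_paged, static dependencies).
# The real tool derives this from the intermediate OBY files and import tables.
modules = {
    "app.exe":    (True,  ["mw.dll"]),
    "mw.dll":     (True,  ["legacy.dll"]),
    "legacy.dll": (False, []),   # unpaged, but dragged into core as a dependency
    "tool.exe":   (False, []),   # unpaged with no paged clients: stays in ROFS
}

def closure(name, deps_of, seen=None):
    """Transitive closure of static dependencies, including the root module."""
    seen = seen if seen is not None else set()
    if name not in seen:
        seen.add(name)
        for dep in deps_of[name]:
            closure(dep, deps_of, seen)
    return seen

deps_of = {name: deps for name, (_, deps) in modules.items()}
core = set()
for name, (is_paged, _) in modules.items():
    if is_paged:                        # every paged file moves to core...
        core |= closure(name, deps_of)  # ...along with all its dependencies
rofs = set(modules) - core
print(sorted(core), sorted(rofs))
```

Note how `legacy.dll` ends up in the core even though it is unpaged – this is exactly the "no effort to limit unpaged code" behavior described above.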
152
Demand Paging on Symbian
7.4.2 Limiting Unpaged Code in the Core ROM Image
You can limit unpaged code in the core ROM image – either by using regexps in the config file, or by writing a tool (a Perl script) similar to efficient_rom_paging that only allows a 'privileged' set of unpaged executables to exist in the core image. Only paged executables whose dependencies are in this privileged set (or paged) would be allowed in the core image. Other executables, both paged and unpaged, would be placed in the primary ROFS. The difficulty here is choosing the privileged set. If the set is too small, then many paged executables will have to be in the primary ROFS, because they have non-privileged, unpaged dependencies. If the set is too large, then the configuration would be much the same as with efficient_rom_paging. An ideal set would be one that contains the unpaged executables that are always loaded, plus those that have a significant amount of paged code dependent upon them.
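The privileged-set strategy can be sketched as a variation on the same dependency-closure idea. Again this is illustrative Python with hypothetical module names: a paged executable is admitted to the core only if every unpaged module in its dependency closure belongs to the privileged set.

```python
# Hypothetical module table: name -> (is_paged, static dependencies).
modules = {
    "ui.exe":    (True,  ["core.dll"]),
    "core.dll":  (False, []),    # always loaded: worth privileging
    "extra.exe": (True,  ["rare.dll"]),
    "rare.dll":  (False, []),    # rarely used unpaged code: keep it in ROFS
}
privileged = {"core.dll"}        # the chosen privileged set of unpaged executables

def dep_closure(name):
    """Transitive closure of static dependencies, including the root module."""
    out, stack = set(), [name]
    while stack:
        n = stack.pop()
        if n not in out:
            out.add(n)
            stack.extend(modules[n][1])
    return out

core = set(privileged)
for name, (is_paged, _) in modules.items():
    if is_paged:
        unpaged_deps = {d for d in dep_closure(name) if not modules[d][0]}
        if unpaged_deps <= privileged:   # all unpaged deps are privileged
            core.add(name)
print(sorted(core))  # ui.exe joins the core; extra.exe stays in ROFS with rare.dll
```

Here `ui.exe` is paged into the core because its only unpaged dependency is privileged, while `extra.exe` is excluded because paging it would force the rarely used `rare.dll` to permanently occupy RAM.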
7.5 Other Demand-Paged ROM Building Features
Since the introduction of demand paging, you can pass a '-geninc' switch to the buildrom command or to ROMBUILD to output additional ROM building information. When this switch is used, the tools create an .inc file in the ROM building directory. The file has the following format:

#define SYMBIAN_ROM_UNPAGED_SIZE unpaged_size
#define SYMBIAN_ROM_PAGED_SIZE paged_size
REM Start of ROFS image
where unpaged_size is the hexadecimal uncompressed size of the unpaged part of the core image and also defines the offset to the start of the paged part of the core image. In non-demand-paged ROMs, this is the uncompressed size of the whole core image. paged_size is the hexadecimal uncompressed size of the paged part of the core image. In non-demand-paged ROMs, this is zero. You can use this information in your custom ROM building tools that wrap Symbian’s ROM building tools.
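If you wrap the ROM tools with your own scripts, parsing the generated .inc file is straightforward. A minimal sketch in Python (the example file content and sizes are invented for illustration, and the parser keys only on the two #define lines):

```python
import re

# Example content of a geninc output file; the sizes are hexadecimal and illustrative.
inc_text = """\
#define SYMBIAN_ROM_UNPAGED_SIZE 0x00A2F000
#define SYMBIAN_ROM_PAGED_SIZE 0x01D40000
REM Start of ROFS image
"""

sizes = dict(re.findall(r"#define\s+(\w+)\s+(0x[0-9A-Fa-f]+)", inc_text))
unpaged = int(sizes["SYMBIAN_ROM_UNPAGED_SIZE"], 16)
paged = int(sizes["SYMBIAN_ROM_PAGED_SIZE"], 16)

# As noted above, the unpaged size doubles as the offset of the
# paged part within the uncompressed core image.
paged_part_offset = unpaged
print(hex(unpaged), hex(paged))
```

A custom tool could use these values, for example, to report the paged/unpaged split of each build or to sanity-check that the paged part is non-zero in demand-paged configurations.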
7.6 Using the Symbian Reference Configurations
Symbian defines three standard demand-paging configurations that are suitable for a reference environment. Two of the configurations are based on the mandatory and recommended unpaged lists mentioned in Section 6.4. These can be used as the basis for a custom demand-paging configuration, with the caveats defined in Section 7.6.4.
7.6.1 The Default Demand-Paging Configuration
The first reference configuration is in \epoc32\rom\include\pagedrom.oby – this is the default demand-paging configuration. It looks similar to the example OBY file in Section 7.2.4, but with the following differences:
1. The pagedrom and compress keywords are not defined, because these are already defined in the reference board IBY files (such as \epoc32\rom\include\base_h4hrp.iby for H4).
2. CODE_PAGING_FROM_ROFS is disabled.
3. configpaging.pm (see Section 7.3) is used to include the recommended unpaged list of components. The mandatory unpaged components are configured via their MMP files, so they do not need to be configured centrally.
4. efficient_rom_paging.pm (see Section 7.4.1) is used.
5. Different demandpagingconfig parameters are used.
To build a reference 'Techview' ROM using the default configuration, simply add the pagedrom parameter to any NAND Techview buildrom command line, like this:

buildrom -D_NAND2 pagedrom h4hrp techview
It is important that pagedrom appears before techview so that the flags defined in pagedrom.oby are parsed before base.iby, which is included by techview.iby.
The default configuration provides a generous paging cache. The aim of this is to provide a modest RAM saving compared with a non-demand-paged NAND Techview ROM, while maintaining performance for all performance-critical use cases. Performance for some use cases is actually improved.
7.6.2 The Functional Demand-Paging Configuration
This configuration appears in \epoc32\rom\include\pagedrom_functional.oby. It differs from the default configuration in the following ways:
1. configpaging.pm is not used. Only the mandatory unpaged components are unpaged, and these are configured via their MMP files; no central configuration is required.
2. A more restrictive demandpagingconfig is used.
You can build a ROM using this configuration in the same way as for the default configuration, but using pagedrom_functional in place of pagedrom. The purpose of the functional configuration is to provide a more aggressive paging environment, while maintaining functional equivalence with a non-demand-paged NAND Techview ROM. As a result, performance is worse for most use cases, but there are significant RAM savings.
7.6.3 The Stressed Demand-Paging Configuration
This configuration appears in \epoc32\rom\pagedrom_stressed.oby. It differs from the default configuration in the following ways:
• configpaging.pm is used with an alternative configuration file that pages as many files in the system as possible, overriding any mandatory unpaged components.
• A more restrictive demandpagingconfig is used.
The purpose of this configuration is to provide an extremely aggressive paging environment for stress testing. Although Techview ROMs using this configuration are functional at a basic level, functional equivalence with a non-demand-paged NAND Techview ROM is not guaranteed and performance is much worse. RAM savings are maximized.
7.6.4 Defining Custom Demand-Paging Configurations
The configurations defined in Sections 7.6.1 to 7.6.3 are only suitable for Techview ROMs. Developers using demand paging are free to define their own configurations, either from scratch using the information in the earlier parts of this document, or by basing them on one of the Symbian-provided configurations. Here are some things to bear in mind when defining custom configurations:
• Symbian only warrants the functionality of the OS when all mandatory unpaged components are unpaged. A configuration that overrides the mandatory unpaged components, like the stressed configuration mentioned above, is not warranted.
• Symbian only warrants the performance of Symbian components involved in key use cases (see Section 6.1.2) when all mandatory and recommended unpaged components are unpaged.
• The size of the paging cache depends on the amount of code loaded during the most extreme use case. Simple UI platforms (such as Techview) need a smaller cache, whereas larger ones (such as S60) need a larger cache.
• On platforms that add significant additional code to Symbian, those additional components should be evaluated (perhaps using the guidelines in Section 6) to see if there are any further unpaged components that should be added to the mandatory and recommended unpaged lists.
• The minimum paging cache size should be large enough that when you set the maximum cache size to the same value as the minimum (in testing), the functional equivalence and robustness of the platform are maintained, and performance is at an acceptable minimum level. Lower values may cause stability problems when the device is low on free RAM.
• On production devices, the maximum paging cache size should be set to the maximum possible value to minimize the number of page faults. The configurations provided by Symbian place a somewhat low upper limit on the paging cache size to induce additional page faults for testing purposes.
8 Testing and Debugging in a Demand-Paged Environment This chapter discusses tracing in a demand-paged environment, using BTrace and the DPTest API. It also describes stop-mode hardware debugging and the potential issues that demand paging may cause or expose during testing, including strategies you can use to expose such problems.
8.1 Tracing and Debugging with Demand Paging 8.1.1 Demand-Paged BTrace Logging
Symbian provides a binary trace-logging framework called BTrace. BTrace logs are not human-readable but they are compact and in an ideal format for postprocessing by an analysis tool. BTrace has sub-categories to allow tracing of different functional areas in the system. The one for the kernel paging subsystem is EPaging, and for the media paging subsystem, EPagingMedia. Tracing the former should provide enough information for most purposes. The latter is probably only useful for detailed analysis of the paging implementation itself. The EThreadIdentification subcategory should also be enabled to receive useful context information for paging events. If you include the BTRACE.EXE console application in an OBY file called MyDPBTrace.oby, then you can create a basic demand-paged Techview ROM with tracing enabled using the following command: buildrom –D_NAND2 –DBTRACE pagedrom h4hrp techview MyDPBbuildrom –D_NAND2 –DBTRACE pagedrom h4hrp techview MyDPBTrace Trace
To enable demand-paging BTrace logging in the kernel's paging code while running a test program RUNNYCHEESE.EXE, you should use the following commands from a command prompt on the device:

btrace -f3,10 -m1 -b1024
RUNNYCHEESE.EXE
btrace d:\MyDPLog.txt
The first line does the following things:
• Enables the EThreadIdentification and EPaging sub-categories (3 and 10 respectively)
• Sets the trace buffer to be enabled but not in 'free running' mode
• Sets the trace buffer size to 1024 KB.
The second line executes RUNNYCHEESE.EXE and the third line dumps the BTrace log to d:\MyDPLog.txt. Sometimes you may need to enable demand-paging tracing immediately after boot. In this case, you'll need to enable tracing in the OBY file. To do this using the same BTrace parameters as the above example, MyDPBTrace.oby should look as follows:

// MyDPBTrace.oby
file=ABI_DIR\DEBUG_DIR\btrace.exe \sys\bin\btrace.exe
ROM_IMAGE[0] {
// Set the BTrace flag (EThreadIdentification = 3) + (EPaging = 10)
BTrace 1032
// Set the trace mode (enabled/not free running)
BTraceMode 1
// Set the buffer size
BTraceBuffer 1024000
}
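The BTrace flag value 1032 in the OBY example is simply a bit mask with the sub-category numbers as bit positions, which you can verify:

```python
# BTrace sub-category numbers given in the text.
E_THREAD_IDENTIFICATION = 3
E_PAGING = 10

# The OBY 'BTrace' keyword takes a bit mask of enabled sub-categories:
# bit 3 (value 8) + bit 10 (value 1024) = 1032.
flags = (1 << E_THREAD_IDENTIFICATION) | (1 << E_PAGING)
print(flags)  # 1032
```

The same arithmetic lets you build masks for any other combination of sub-categories you want to enable at boot.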
Then you can dump the BTrace log from a command prompt in the same way as before:
btrace d:\MyDPLog.txt
8.1.2 Demand-Paged Kernel Trace Logging
The older, less sophisticated kernel tracing method can also be used to obtain the BTrace logging information mentioned above. These log events are human-readable, but they are not as well defined as those in BTrace, so this approach may not be suitable for post-processing of events. Also, the trace output is very verbose and so has a significant performance impact on the system; you should only use it if fast tracing hardware is available. However, this approach does report additional events, especially during the boot sequence, which may be of use if you are debugging a new hardware platform with demand paging enabled. The kernel trace flags for the paging subsystem and the paging media subsystem are bits 62 and 61 respectively. To enable both these flags, adjust the kerneltrace keyword in the relevant OBY file as follows:

kerneltrace 0x00000000 0x60000000
The flags should be OR’d with any other relevant kernel trace flags. You must use a debug version of the kernel (kernel trace logging is not enabled in release versions).
8.1.3 DPTest API
The DPTest API is documented in the Symbian Developer Library documentation, developer.symbian.org/sfdl, so I will not cover it in detail here. It consists of a number of static functions declared in \epoc32\include\dptest.h and is implemented in dptest.dll. Its API classification tag is @Test, meaning it shouldn’t be used in any production device. Essentially, this API allows the caller to retrieve information about which demand paging attributes are enabled, how many page faults and page ins have taken place, the paging cache size parameters and the current cache size. The API also allows the executable it runs in to flush the cache and change the cache size, so long as that executable has the WriteDeviceData platform security capability.
Symbian also provides a console program called dptestcons.exe that exercises the DPTest API. This can be easily included in a ROM by adding dptestcons.oby to the buildrom command:

buildrom -D_NAND2 pagedrom h4hrp techview dptestcons
For usage information, simply run dptestcons from an eshell instance with no parameters.
8.1.4 Stop-Mode Hardware Debugging
Stop-mode debugging of a demand-paged ROM (for example, using a Lauterbach via the JTAG interface) is supported, just as it is for non-demand-paged ROMs. However, you should disable the trapping of data and program aborts, since the paging subsystem relies on these aborts to operate. If you don't, execution will break on every page fault. On Trace32, you can do this in either of these ways:
1. From the menu, select Break -> OnChip Trigger... and, in the dialog that appears, uncheck DABORT and PABORT in the 'Set' group of checkboxes.
2. Use the following Trace32 commands at the B:: prompt (these can be added to a script):
TrOnChip.Set DABORT off
TrOnChip.Set PABORT off
8.2 Testing In this section, I discuss the potential issues that demand paging may cause or expose during testing, and strategies you can use to expose such problems. The unpredictable nature of demand paging makes it very difficult to anticipate
problems in the system. However, there are some patterns to the kinds of problems that are likely to be observed. I describe some of these and their possible solutions in the next sections.
8.2.1 Testing Approach
When demand-paging support is added to a platform, you will have to test at least one additional ROM configuration – a new demand-paged ROM as well as the existing non-demand-paged ROM. If you are supporting several different demand-paging configurations, then your testing burden will be increased further – in fact, this may introduce an intolerable test overhead on a project. In this section, I discuss strategies for maximizing the test benefit while minimizing the overhead.

Demand-paged versus non-paged ROMs

In a platform that supports both demand-paged and non-demand-paged configurations, the simplest test strategy is to duplicate all testing on both configurations. This may not be feasible, so it is worth comparing the configurations to see whether any duplication of effort can be removed.

Functional differences

Demand-paged ROMs contain code paths that are not executed in non-demand-paged ROMs – but is the converse true? Is there functionality in non-demand-paged ROMs that is no longer required (and hence not executed) in a demand-paged environment? If this were true, there would be no option but to test both demand-paged and non-demand-paged ROMs, because neither of them is a subset of the other. To answer that question, we need to look again at the NAND flash layouts in Figures 1 and 3 in Chapter 3. Figure 1, layout B is a typical non-demand-paged NAND layout. When comparing this with Figure 3, layout C or D, it can be seen that all the elements of 1B are also present in 3C and 3D. There is a permanently RAM-shadowed area in 1B (the core image), which corresponds to the unpaged part of the core image in 3C and 3D. In all three layouts, there is a ROFS section, where whole executables are loaded into RAM as required. There is also a user data area in all three layouts. Assuming a typical demand-paged ROM layout like 3C or 3D, it is safe to say that
comprehensive functional testing of a demand-paged ROM will also exercise all code paths of an equivalent non-demand-paged ROM.

Performance differences

In Section 2, I discussed the trade-off between RAM and performance in demand-paged ROMs. We've seen that for some configurations it is difficult to predict whether a particular use case will be quicker on a demand-paged or a non-demand-paged ROM. However, we can choose a sufficiently aggressive demand-paged configuration such that all use cases run more slowly than on a non-demand-paged ROM. Analysis of demand-paging defects found thus far shows that those exposed by timing differences are only reproducible when the use case runs more slowly than on a non-demand-paged ROM. Furthermore, some defects are only reproducible on aggressive demand-paged configurations but not on less aggressive configurations (or non-demand-paged ROMs). There have been no cases of defect reproducibility increasing as the configuration is made less aggressive. So, assuming you make the demand-paged ROM configuration aggressive enough, there is no need for you to test for timing-related problems on an equivalent non-demand-paged ROM.

RAM differences

The primary purpose of demand paging is to save RAM, so any sensible demand-paged configuration will result in more free RAM than an equivalent non-demand-paged ROM. It is important that you continue to run any use cases and tests related to out-of-memory conditions in demand-paged ROMs, where out-of-memory conditions are harder to reproduce. You can reproduce the behavior of the paging cache in out-of-memory conditions by limiting the maximum paging cache size, but this does not limit other system memory allocations. In theory, some out-of-memory-related defects on a non-demand-paged ROM may not be reproducible on a demand-paged ROM. This is one of the benefits of paging, but it means that test code must be written well enough to exercise out-of-memory conditions if only demand-paged ROMs are tested.
Functional testing

We know that the more aggressive the demand-paged configuration, the greater the chances of reproducing problems such as those discussed in Section 8.2.1, and that Symbian only warrants configurations in which all the Symbian-mandatory unpaged components are unpaged. So, a good configuration to use for functional testing would be one that only has these components unpaged (plus any additional non-Symbian components that fit the same criteria), together with a small maximum paging cache size. However, you should not choose a maximum paging cache size so small that the time taken to execute the tests is unreasonably long. The Symbian functional configuration (see Section 7.6.2) fulfils these requirements for the Techview reference environment.

Performance testing

The relatively aggressive configuration used for functional testing may not be suitable for performance testing. Some tests may have to complete within a certain time interval to pass. So, to test performance, you may need to mark additional components as unpaged, and/or choose a larger paging cache. However, it is still sensible to limit the maximum paging cache size to reproduce out-of-memory behavior. Remember: Symbian only warrants the performance of performance-critical use cases when the Symbian-recommended unpaged components are unpaged. We also recommend that any non-Symbian components involved in performance-critical use cases are unpaged for performance testing. The Symbian default configuration (see Section 7.6.1) matches these requirements for the Techview reference environment.

User testing

The configurations that you use for functional or performance testing are not suitable for production devices. At some point, you will need to perform wider system testing with a production demand-paged configuration.
Creating this configuration should be a simple matter of taking the configuration used for performance testing, changing the minimum paging cache size to that configuration's maximum size, and changing the maximum size to the largest possible value. This means that the paging cache can
grow much larger, which means that fewer defects will be reproducible. Testing with this configuration should be delayed until as late as possible in the project, otherwise some problems in out-of-memory situations may be hidden.
9 In Conclusion
In this book, I've looked at demand paging at many levels – from a high-level overview in Chapters 2 and 3, to an in-depth study of the implementation in Chapter 4. In Chapter 5, I've given you a practical hands-on guide to using demand paging yourself, whether you are working with device drivers in a demand-paged system or enabling demand paging in a new device (in which case you'll also find Chapter 7, on configuring device parameters, very useful). If you're working at a higher level, then Chapter 6 gives you the nitty-gritty on getting your component ready for demand paging. Finally, in Chapter 8, I look at testing and debugging in a demand-paged environment.
Demand paging on Symbian has been a great success. Not only does demand paging increase free RAM, it also speeds device boot and application start-up times, and makes for a much more robust device under low-memory conditions. So successful was demand paging that it has been backported two operating-system generations, to devices that have already been released into the market. I wish you all the best in working with it.
Index

A
Aging a Page 36
Algorithm 55
Allocating Memory 33

B
Binary Compatibility 120
Boot-Loaded 90
BTrace 157
Byte-Pair Compression 54

C
Cache Support 52
Candidate Unpaged List 134
Chunk APIs 50
Clamping 63
Code-Paged ROM 143
Code Paging 13, 34
Code Segment Initialization 43
Component Architecture 134
Composite File System 8
Configpaging 148
Core/ROFS 141
Critical Code Paths 133
Custom Demand-Paging Configurations 155

D
Data Paging 76
Data Structures 40
DDigitiser 105
Debugger Breakpoint 58
Debugging 157
Default Demand-Paging Configuration 153
defaultpaged 149
defaultunpaged 149
demandpagingconfig 141
Dependency Information 127
Device APIs 58
Device Stability 137
Disconnected Chunk API 51
DPTest API 159
Dynamic Analysis 129
Dynamic RAM Cache 33

E
Effective RAM Saving 19
efficient_rom_paging.pm 151
EStart 117

F
File Server 63
File System Caching 14
Fine-Grained Configuration 147
Fragmented Write Requests 113
Freeing a Page 38
Functional Equivalence 129, 137

G
General performance 137

H
Hardware Debugging 160

I
Implementing File Clamping 118
Improved application start-up times 3
Internal MMC/SD Card 114
Internal Writeable Media 145
IPC Analysis 128

K
Kernel 136
Kernel Containers 82
Kernel Extension Entry Point 109
Kernel Implementation 21
Kernel RAM Usage 6
Kernel Trace Logging 159
Key Classes 21

L
Live Page List 28
Locking Memory 48
Logical Channel 99

M
MaxLivePages 141
Media Driver Migration 107
Media Drivers 58
Memory Accounting 34
Memory Allocation 32
Migrating 76
Migrating Device Drivers 87
Minimum Paging Cache Size 135
MinLivePages 141
Mitigation Techniques 133
MMP File Configuration 147

N
NAND Flash 7
NAND Flash Structure 12

O
OBY File 139
OBY File Configuration 147
Optimizing 150

P
Pageability Categories 136
Paged 138
pagedrom keyword 140
Page-Fault Handler 22
Page-Information Structures 22
Page Locking 47
Paging Cache Sizes 18
Paging In 16
Paging In a ROM Page 40
Paging Out 17
pagingoverride 140
pagingpolicy 143
Paging Requests 112
Paging Scenarios 75
PDD 101
Performance Data 130
Permanent Presence 137
Physical Device 97
Physical RAM 22
Power management 137
Problems 133

R
RAM Cache Interface 30
Rejuvenating a Page 36
RFile::BlockMap() API 67
ROFS 144
ROM 7
ROM paging structure 12

S
Security 137
Shared Chunks 85
Sound driver 105
Static Analysis 127
Stressed Demand-Paging Configuration 154
Symbian OS v9.3 74
Symbian Reference Configurations 152

T
Testing 159
The Kernel Blockmap 42
The Paging Algorithm 15
The Paging Configuration 17
TLocalDriveCaps 111
Tracing 156

U
Unique Threads 92
Unpaged 137
Unpaged Files 17
USB driver 104
Use-Case Analysis 128
User-Side Blockmap 41

V
Virtual Memory 50