Victor Lazzarini

Computer Music Instruments II
Realtime and Object-Oriented Audio

Victor Lazzarini
Department of Music
Maynooth University
Maynooth, Kildare, Ireland

ISBN 978-3-030-13711-3
ISBN 978-3-030-13712-0 (eBook)
https://doi.org/10.1007/978-3-030-13712-0
Library of Congress Control Number: 2017953821

© Springer Nature Switzerland AG 2019

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Foreword

Today’s tools for music production have become increasingly democratised. Since the advent of the personal computer in the 1980s, means of audio synthesis, recording, editing and processing have become available to the general public. Before that time, a composer or other creative individual would need to go to a big studio or a computer centre to be able to work professionally with sonic creations. Likewise means of content distribution and tools for reaching audiences have become generally available, both for passive and for interactive listening media. Seen together, these technological changes have deeply affected the conditions for creative audio work. With wider and more affordable access, many more individuals from diverse backgrounds can work in this manner, and also the possible outcomes have multiplied.

In tandem with this evolution, we have seen that the tools have become easier and easier to use. Many aspects of the expert knowledge of audio practitioners of earlier decades have been coded into the tools. Any piece of technology will affect the possible outcomes of a production process utilising it. This is also the case with audio production tools, by means of the affordances given to the creative individuals working with them. With ease of use comes also a delimitation of possible outcomes: some of the tools offered to the broad mass of creative consumers can be said to offer ‘off-the-shelf creativity’. The individual using these tools is not so much creating but instead recombining the elements offered in pleasing ways.

With some of the creative decisions being aided by the properties of the tools used, it becomes increasingly important to be able to make our own tools. This book provides a solid basis for doing so by introducing computational concepts and audio programming paradigms together with a firm foundation in programming. As the book starts with the basics of the operating system, we are never lost for context. We then deal with compiling and running programs, getting to know C and C++ from the ground up and then proceed directly into realtime audio programming. There’s as much DSP as we need to get to work and make things. Then, by the time the need for more occurs, the reader’s general acquaintance with the field through practical work means that they should be well equipped to understand the literature needed to solve specialised problems outside the scope of this book. The interleaving of programming languages, by means of interfacing them with each other, allows freedom to choose the best tool for the job. This ability to create freely also allows freedom from the imperatives of commercial actors, as well as freedom to create commercial products should one wish to do so.

I got to know Victor through international communities for open source audio programming, first and foremost through the Csound community. I deeply respect Victor’s skills as a programmer, composer, musician, researcher and writer. His productivity seems to know no limit. I had the good fortune of contributing to the book ‘Csound: A Sound and Music Computing System’ together with Victor, John ffitch, Steven Yi, Iain McCurdy and Joachim Heintz in 2016. I also count myself lucky to be working with Victor in a current research project on crossadaptive processing, where we have also developed new methods of live convolution together with Sigurd Saue.

With all of the creative freedom afforded by the knowledge presented in this book, one could easily forget an additional benefit of this manner of working: transparency. For any future research on the creation process, to be able to trace the steps taken in the production, and to be able to study the intentions and incentives invested in the process of a work’s creation could be of great value. Many of today’s tools for the creative industry are closed source commercial products that are not compatible across versions of the same tool. This makes archiving for one’s own purposes a hard task, and archiving for longer-term purposes nearly impossible. This is not to say that all our current creations deserve to be studied in the future, but it might just happen that someone sometime may be interested in knowing what we did and how we worked. Working with open source software does not in any way guarantee that our projects can be run on future versions of the same software. It merely allows the possibility for someone interested to be able to decode how the software was supposed to work, and then by careful reconstruction to be able to create the environment to open those saved projects. Reconstruction will always be time consuming, but by using open source, at least we offer the opportunity to do so.

Trondheim, March 2018

Øyvind Brandtsegg

Preface

This book can be read in a number of different ways. First and foremost, it is a companion volume to Computer Music Instruments: Foundations, Design and Development. Here, many ideas and concepts introduced in that book are broken down and explored at a lower level. Another way to read this book is to take it as a fairly complete course on C11 programming, with a slant towards sound and music computing, and an added introduction to key concepts of C++ and object-oriented programming (OOP). It is also possible to take this as an applied Digital Signal Processing text, which uses programming to discuss mathematical concepts. I would also think that a number of other readings can be attempted.

In any case, this book is complementary to its companion, but can also be taken on its own, as an independent text. It is true that many ideas explored here at an implementation level work out the elements of what was described there in more formal ways. There is however a conscious choice (in both books actually) to develop everything from first principles. In this text, we will also pay some attention to the discipline involved in writing code, and for this reason, programming problems are suggested in each chapter. It is my belief that we can only achieve fluency with plenty of practice, and readers who want to achieve a good level of C/C++ programming skills should attempt to solve every exercise proposed.

The book is divided into two parts, the first of which, as I have outlined above, is a comprehensive exploration of the C programming language and fundamental programming concepts, from the ground up. The fact that this language can be discussed fully in this space is one of the great attributes of C: being small. Part I traces a journey from zero to complete realtime audio programming. It equips readers with all the tools necessary to create realtime audio instruments at a reasonably low level. From early on, it prioritises examples and applications that have direct relevance to making sound with computers.

Chapter 1 introduces the reader to the desktop programming environment. In some ways, it picks up where we left off in the first Computer Music Instruments book, where a description of modern computing platforms for music making was offered. In the following chapters, we introduce all the components of C programming in a stepwise manner: data types, variables, arithmetics, input and output, control of flow, arrays, pointers, functions, and data structures. By the time we reach Chapter 8, all of the language has been dealt with, and we start looking at key elements of the C standard library, such as memory allocation, and file input and output.

From Chapter 10 onwards, the focus is completely turned on to sound computing. In fact, we had introduced principles of audio signals as early as Chapter 4. As soon as we find some means of iterating operations, we are off producing sound waveforms. We discuss realtime audio synthesis and processing in Chapter 11 and complement it with MIDI control in the last chapter of Part I. At this stage, many key concepts of audio programming have been explored and we are ready to dive into DSP components, which is one of the main themes of Part II.

The other theme, of course, is OOP. Throughout the chapters in Part II, we continuously demonstrate how this paradigm is extremely useful for the modelling of computer music instruments. In Chapter 13, we introduce it gently by applying its principles to the development of a cornerstone of sound synthesis: the oscillator. Each chapter in Part II is devoted to a set of instrument components that are paired with key C++ programming concepts. Midway through, we are able to discuss the development of a fully-fledged object-oriented library, AuLib, which is used to illustrate the discussion of DSP algorithms, as well as OOP.

The following two chapters are devoted to specific audio processing concepts: delay lines and spectral manipulation. The latter connects very firmly with its companion text, Chapter 7 of Computer Music Instruments, and provides a complementary perspective to it. It covers similar ground, but uses programming as the main means to explore frequency-domain processing in a mostly non-mathematical way. The book closes with a look at the concept of plugins, also from an object-oriented perspective. At this point, we return, full circle, to Csound and study the means of developing the building blocks of instruments, opcodes, using C++. This final chapter connects very closely with the topics in the companion text, as it provides the means to implement in a native form many of the principles outlined in that earlier book.

The target audience of this book is aligned with that of its predecessor. While some understanding of acoustics and electronic music would be helpful in assisting the reader to understand some applications, it is not strictly necessary to have prior knowledge of audio DSP or even programming. Familiarity with other languages is also not a requirement, but may allow a faster progression through the first part of the book. C/C++ programmers with no experience with audio may be able to jump into the specific sections dealing with sound and music computing. Together with its companion volume, the present book aims to provide a comprehensive discussion of computational instruments for sound and music.

Maynooth, March 2018

Victor Lazzarini

Acknowledgements

Much of this book has been the result of over fifteen years of audio programming teaching at postgraduate level to music technology students. The flow and balance of topics has been tested in a large number of classes and seminars over the years. So I am deeply indebted to all of the students who have worked with me over the years, some of whom have gone on to become researchers, lecturers, and developers, and have made great contributions to the field themselves. In particular, I would like to thank Rory Walsh for taking the time to read some of the trickier sections of this book, helping me to pitch them at the right level, and providing useful comments.

I would also like to acknowledge the help and encouragement of the computer music community, as well as the various contributions to software development, ideas, and concepts that have arisen from them. Special thanks should go to colleagues in the Csound development team John ffitch, Steven Yi, Tarmo Johannes, Joachim Heintz, Stephen Kyne, François Pinot, Alex Hoffmann, and Bernt Isak Waerstad, for their input into this open-source project and also for the enlightening discussions on all matters to do with audio programming and beyond.

I am very grateful for the endorsement given by Øyvind Brandtsegg, who very kindly wrote the foreword for this book. Our collaboration stretches back many years, and recently I have had the chance to work closely with him and Sigurd Saue on some very interesting musical signal processing bits and pieces, which have indirectly contributed to elements in this book.

It is important to note the continued support of Ronan Nugent at Springer, who has been very helpful in facilitating the editorial process for this book. As ever, the work for this book has been thoroughly supported by the patience and help I get from my wife Alice, and our children Danny, Ellie, and Chris. They are an integral part of any achievements I might be in a position to claim.

Contents

Part I Towards Realtime Audio in C 1

Introduction to the Programming Environment . . . . . . . . . . . . . . . . . . . . 1.1 The Operating System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.1.1 The File System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.1.2 The Terminal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.1.3 Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.1.4 The Manual . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.1.5 The POSIX Standard . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.2 The C/C++ Toolchain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.2.1 Compilers and Interpreters . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.2.2 Compiling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.2.3 Running Programs from the Terminal . . . . . . . . . . . . . . . . . . . 1.3 Introduction to C Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.3.1 Character and Keyword Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.3.2 Entry Point . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.3.3 The shin Program . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.3.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

3 3 4 5 8 9 9 9 9 10 11 11 12 12 14 15 16 17

2

Data Types and Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.1 Variables and Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.1.1 Encoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.1.2 Integers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.1.3 Real Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.1.4 Characters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2 Initialisation, Assignment and Arithmetic Operations . . . . . . . . . . . . . 2.2.1 Variable Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2.2 Constants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2.3 Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

19 19 20 22 22 23 24 24 25 26 xi

xii

Contents

2.2.4 Conversion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2.5 Arithmetic Order . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2.6 The sizeof Operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.3 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

27 28 28 28 29

3

Standard Input and Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.1 Printing to the Terminal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.1.1 The Format String . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.2 Getting Input from the Terminal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.2.1 Pattern Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.3 Character Input and Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.4 The calc Program . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

31 31 32 34 34 35 36 37 37

4

Control of Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.1 Conditional and Logical Expressions . . . . . . . . . . . . . . . . . . . . . . . . . . 4.2 Conditional Execution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.2.1 Conditional Operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.3 Switch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.4 Iteration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.4.1 The while and do – while Loops . . . . . . . . . . . . . . . . . . . . 4.4.2 The for Loop . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.4.3 The break and continue Statements . . . . . . . . . . . . . . . . . 4.5 A First Synthesis Program . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.5.1 Plotting the Waveform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.5.2 Playing the Sound . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.5.3 Other Waveforms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

39 39 40 42 43 45 45 47 48 48 49 52 52 53 53

5

Arrays and Pointers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.1 Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.1.1 Two-Dimensional Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.2 Strings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.3 Pointers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.4 Pointers and Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.4.1 Pointer Arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.4.2 Pointers and Strings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

55 55 57 57 58 60 60 63 65 65

Contents

xiii

6

Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.1 Function Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.1.1 Arguments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.1.2 Variable Lifetime . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.1.3 Call Semantics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.1.4 Function Prototypes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.1.5 Parametrised Macros and Inline Functions . . . . . . . . . . . . . . . 6.1.6 Variable Argument Lists . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.1.7 Recursive Calls . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.2 Modular Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.3 Pointers to Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.4 The C Standard Library . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.5 Another Synthesis Program . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.5.1 Plotting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.5.2 Realtime . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.6 Arguments to main() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.6.1 Translating Arguments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

67 67 68 69 69 70 70 72 73 73 75 77 77 79 80 81 82 83 83

7

Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.1 Defining a New Type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.1.1 Member Access . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.1.2 Pointers to Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.2 Functions in Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.3 Unions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.4 Enumerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.5 Bitwise Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.5.1 Bitwise Logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.5.2 Bitshift Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

85 85 86 87 88 89 89 90 90 92 93 93

8

Memory Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95 8.1 Allocating Memory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95 8.1.1 Reallocation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96 8.1.2 Freeing Memory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97 8.1.3 Setting and Copying Memory Blocks . . . . . . . . . . . . . . . . . . . 97 8.2 Dynamic Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97 8.3 Linked Lists . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99 8.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103

xiv

Contents

9

File Input and Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105 9.1 Standard C Library File IO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105 9.2 Text File Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107 9.3 Direct File IO Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108 9.3.1 Reading/Writing Position . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109 9.3.2 Error Reporting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109 9.4 File System Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110 9.5 Programming Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110 9.5.1 The tobin Program . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110 9.5.2 External Score Generation for Csound . . . . . . . . . . . . . . . . . . . 111 9.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113

10

Soundfiles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115 10.1 Digital Audio . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115 10.1.1 Sampling Frequency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116 10.1.2 Sample Precision . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117 10.1.3 Audio Channels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118 10.2 Basic Operations on Signals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119 10.2.1 A Synthesis Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120 10.2.2 Byte Order . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121 10.2.3 Self-Describing Soundfile Formats . . . . . . . . . . . . . . . . . . . . . . 121 10.3 The libsndfile Library . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122 10.3.1 Opening Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122 10.3.2 Reading and Writing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124 10.3.3 Seeking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125 10.3.4 An Example Program . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126 10.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128

11

Realtime Audio . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131 11.1 Portaudio . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132 11.1.1 Listing Devices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133 11.1.2 Stream Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134 11.1.3 Opening Devices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135 11.1.4 Synchronous Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136 11.1.5 Asynchronous Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137 11.1.6 Closing Up . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139 11.1.7 The todac Program . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139 11.1.8 An Audio Effect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141 11.2 The Jack Connection Kit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145 11.2.1 Opening a Client . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146 11.2.2 Registering Ports . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147 11.2.3 The Processing Callback . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148 11.2.4 Connecting Ports . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148

Contents

xv

11.2.5 Closing a Client . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149 11.2.6 Application Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149 11.3 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154 12

Realtime MIDI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155 12.1 The Protocol . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155 12.1.1 Hexadecimal Notation Revisited . . . . . . . . . . . . . . . . . . . . . . . 156 12.1.2 MIDI Messages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156 12.1.3 Packing and Unpacking the Status Byte . . . . . . . . . . . . . . . . . 158 12.2 MIDI Programming Basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158 12.2.1 MIDI on MacOS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159 12.3 MIDI Programming with Portmidi . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163 12.3.1 Timers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164 12.3.2 Opening Devices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165 12.3.3 Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166 12.3.4 Input . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168 12.3.5 A MIDI Synthesiser . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169 12.4 MIDI on Jack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174 12.4.1 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176 12.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182

Part II Object-Oriented Audio in C++

13 Oscillators
  13.1 Moving to C++
    13.1.1 C++ Structures
    13.1.2 Overloading and Optional Parameters
    13.1.3 Memory Management
  13.2 The Table Lookup Oscillator
  13.3 Conclusions
  Problems

14 Interpolation
  14.1 Linear Interpolation
  14.2 Cubic Interpolation
  14.3 Inheritance
    14.3.1 Polymorphism
    14.3.2 Oscillator Inheritance Tree
  14.4 Function Table Objects
  14.5 Reference Types
    14.5.1 Copy Constructors
    14.5.2 Object Reference Arguments
    14.5.3 Self References
  14.6 Phase Generators and Table Readers
    14.6.1 The Phasor
    14.6.2 Table Reader
  14.7 Conclusions
  Problems

15 Envelopes
  15.1 Envelope Generators
    15.1.1 Linear Envelopes
    15.1.2 Exponential Envelopes
  15.2 Access Control and Classes
    15.2.1 Namespaces
    15.2.2 A Line Class
  15.3 Operator Overloading
    15.3.1 Standard IO Revisited
  15.4 An Audio Output Class
  15.5 Conclusions
  Problems

16 Filters
  16.1 Feedback Filters
    16.1.1 First-Order Tone Filters
    16.1.2 Second-Order Filters
    16.1.3 Fourth-Order Filters
    16.1.4 Balancing
  16.2 Templates
    16.2.1 Templates in the Standard C++ Library
    16.2.2 Range-Based Loops
  16.3 Conclusions
  Problems

17 AuLib
  17.1 Object-Oriented Audio Systems
  17.2 Library Design
    17.2.1 Stateful versus Stateless Representations
    17.2.2 Abstraction and Encapsulation
    17.2.3 Code Reuse
    17.2.4 Connectivity
  17.3 A Tour of the Library
    17.3.1 Signal Generators
    17.3.2 Signal Processors
    17.3.3 Audio Input and Output
  17.4 Synthesis and Processing Control
  17.5 An AuLib Instrument
  17.6 Conclusions
  Problems

18 Delay Line Processing
  18.1 Circular Buffers
  18.2 Fixed-Delay Effects
    18.2.1 Comb Filters
    18.2.2 All-Pass Filters
  18.3 Variable Delay Lines
  18.4 Multiple Taps
    18.4.1 Convolution
  18.5 Lambda Functions
    18.5.1 Auto Types
  18.6 Conclusions
  Problems

19 Frequency-Domain Processing
  19.1 Fundamental Principles
    19.1.1 Complex Numbers
    19.1.2 Spectral Analysis
  19.2 The Fast Fourier Transform
    19.2.1 Real-to-Complex and Complex-to-Real Transforms
  19.3 Fast Convolution
    19.3.1 Overlap Add
    19.3.2 Overlap Save
    19.3.3 Multiple Partitions
    19.3.4 Convolution Reverb
  19.4 Streaming Spectral Processing
    19.4.1 Spectral Analysis
    19.4.2 Resynthesis
    19.4.3 Spectral Manipulation
  19.5 Conclusions
  Problems

20 Plugins
  20.1 Plugins in Csound
  20.2 Framework Design
    20.2.1 The Base Classes
    20.2.2 Deriving Opcode Classes
    20.2.3 Registering Opcodes with Csound
  20.3 The Csound Engine Object
  20.4 Opcode Programming
    20.4.1 Delay Line
    20.4.2 Table-Lookup Oscillator
    20.4.3 Text Processing
    20.4.4 Spectral Processing
    20.4.5 Array Processing
    20.4.6 External Resources
    20.4.7 Multithreading Opcodes
  20.5 Conclusions
  Problems

Appendix A AuLib Reference
  A.1 Library-Wide Definitions
  A.2 AudioBase
  A.3 Deriving New Classes
  A.4 Audio DSP Classes
  A.5 Control Classes
    A.5.1 MIDI Synth Example
  A.6 Other Classes
  A.7 Building AuLib

References
Index

Acronyms

0dbfs   Zero decibel full scale
ADC     Analogue-to-Digital Converter
ADSR    Attack-Decay-Sustain-Release
AP      All Pass
API     Application Programming Interface
BP      Band Pass
BR      Band Reject
cps     cycles per second
DAC     Digital-to-Analogue Converter
dB      Decibel
DFT     Discrete Fourier Transform
DSP     Digital Signal Processing
FFT     Fast Fourier Transform
FIFO    First In First Out
FIR     Finite Impulse Response
FS      File System
GUI     Graphical User Interface
HAL     Hardware Audio Layer
HP      High Pass
Hz      Hertz
IDFT    Inverse Discrete Fourier Transform
IF      Instantaneous Frequency
IIR     Infinite Impulse Response
IO      Input-Output
IR      Impulse Response
ISTFT   Inverse Short-Time Fourier Transform
LFO     Low Frequency Oscillator
LP      Low Pass
LSB     Least Significant Byte
MIDI    Musical Instrument Digital Interface
MSB     Most Significant Byte
OLA     Overlap-Add
OLS     Overlap-Save
OOP     Object-Oriented Programming
OS      Operating System
PCM     Pulse Code Modulation
PID     Process Identifier
PV      Phase Vocoder
RMS     Root Mean Square
STFT    Short-Time Fourier Transform

Part I

Towards Realtime Audio in C

Chapter 1

Introduction to the Programming Environment

Abstract The desktop programming environment is explored from the perspective of its major software components. We begin by discussing the concept of operating systems and their main components: file system, terminal, and commands. The C/C++ toolchain is introduced as the fundamental collection of software that will support all the work in this book. Finally, we take a first look at the C language and its basic elements.

The C/C++ programming environment of a modern desktop computer comprises a complex collection of software sometimes called the compiler toolchain. It includes programs to transform code written in plain text into a form that can be executed, as well as a number of utilities to help the development process. In addition to these, two other key components are essential. The first of these is a program called a text editor, which is used to create the plain text files that contain program source code. The other is a command interpreter, sometimes called the terminal, used by the developer to invoke the different programs needed to build software. In this chapter, we will introduce these components of the programming environment, which will be used throughout the book to develop software in C/C++.

1.1 The Operating System

In order to run any programs, computers generally depend on a fundamental software set called the operating system (OS) [60], which is made of several components that provide the support for applications to run. At the core of the OS sits the kernel, which provides the basic functionality for the operation of a computer, for instance, the instructions to communicate with the different peripherals, memory, input devices (e.g. keyboard, mouse), output (screens, etc.), and disk files, and to load and run programs, among other things. For personal computers, the most commonly used OSs are MS Windows, MacOS and Linux. In mobile environments, iOS and Android are fairly ubiquitous. This book will concentrate on development under and for UNIX-like operating systems [27], which include MacOS and Linux on the desktop side¹ as well as iOS and Android for mobile devices². From now on, all discussion will assume this type of development environment.

© Springer Nature Switzerland AG 2019 V. Lazzarini, Computer Music Instruments II, https://doi.org/10.1007/978-3-030-13712-0_1

1.1.1 The File System

A key component of the OS is the file system (FS) [53], which is the software responsible for the storing of data in a permanent (or in some cases temporary) form. There are different types of FS, but we normally do not need to worry about the specific characteristics of these in normal use. Most of them operate in a similar way to organise stored data in terms of its logical units, files and directories. The former store data (of various types, such as text, images, sound, etc.), and the latter are used as containers for files or directories themselves. The FS is organised hierarchically as a tree, starting from a root, with forward slashes (/) symbolically representing the separation between levels. The root will generally contain files and directories that have a system-wide relevance, such as user programs and configuration data. Under this, we will also find a directory containing a set of user directories, one for each user registered in the system. A user directory for a given username is known as its home directory. That is where all files or directories created and manipulated by that user will be stored. For example, a user directory for the username jane in the MacOS FS is denoted by /Users/jane, with the different directory levels separated by the forward slash symbol (/). Its location in the FS tree is shown in Fig. 1.1.

/
  /Users
    /jane

Fig. 1.1: The directory /Users/jane in the MacOS FS tree.

¹ It is possible to emulate a UNIX-like environment under Windows, using the Msys/MinGW or Cygwin software tools. See http://www.mingw.org and https://www.cygwin.com for more details on these tools.

² The common practice with mobile applications is to develop them on a desktop system, rather than directly on the devices themselves. Thus, we will concentrate on the use of desktop OSs in this book.


Its parent directory is /Users (holding all user directories), and the parent directory to that is / (the root directory, which contains all directories). The unique directions under the FS to that given home directory, called the path, are given as /Users/jane. Thus, each file or directory in the FS has a given path to it, for example:

• /: the root directory
• /Users/jane/mysrc.c: a file in Jane’s home directory
• /usr/bin/cc: the cc command in the /usr/bin directory

As hinted above, files can be of various types, but a fundamental distinction can be made between two types of files:

1. Those that hold data (text, sound, photos, etc.).
2. Programs: executables.

The basic difference is that program files are marked by the FS in a way that identifies them as executables, i.e. containing code that can be loaded and run. Data files are not marked in this way and thus cannot be run (but can be opened in programs for viewing, editing, playing, etc.). Another distinction can be made between two types of data file with regards to the format of their contents:

1. Plain text.
2. Other unspecified data (sound, photos, word-processed text, etc.).

The first type is very important for us, as we will use it to hold the source code for programs. It holds only text encoded using a given character set. For C/C++ programs, these files should use the ASCII character set and nothing else. This means that we need to be careful that the files we are using are produced correctly, without any extraneous characters. To ensure this, we should always edit source code using a plain text editor (and not, for instance, a word processor³).

1.1.2 The Terminal

The terminal [3, 46, 59] is an application that contains a command interpreter program, called the shell, which allows the user to type in and execute programs (or commands). In general, the OS allows any user program to be run from the terminal, including graphical and non-graphical programs. The former will in most cases be launched as a separate window, whereas the latter will run under the shell. A terminal can hold more than one command interpreter, either at the same time (separately in a different window or tab) or as a subprocess of a parent shell. An OS often has several different shell programs, which can be chosen by the user. The most common of these is bash (the default in MacOS), or /bin/bash (full path), which is based on the original UNIX Bourne shell⁴ [27].

Once the terminal is open, it will start the default system shell. The following discussion assumes this to be bash. The shell gives you a prompt denoted by a symbolic character⁵ (for instance, $ is commonly used for this purpose, and we will adopt it throughout this book) where you can type commands and press the enter key to execute them. In most shell programs, the up and down arrow keys allow you to recall older command lines. Commands are made up of

$ [command] [argument] [argument] ...

where [command] stands for the program you want to run, and is followed by a number of optional or required arguments (depending on the command), which are passed as parameters to the program. Programs and arguments are separated by blank spaces.

The shell is always opened in a given directory of the FS. This is called the working directory, and it is normally the user home directory when the shell is started. The working directory can also be identified by a dot (./); its parent (the one that contains it) is denoted by a double dot (../). We can get the path to the working directory with the command pwd (print working directory):

$ pwd
/Users/jane

It is possible to navigate around the FS using the command cd (change directory):

$ cd [directory]

where [directory] is the path of the directory we want to go to. The path can be relative to the current (working) directory or absolute from the FS root. For instance:

$ pwd
/Users
$ cd /
$ pwd
/
$ cd
$ pwd
/Users/jane

Note that the cd command with no arguments always brings us back to the home directory. We can navigate to anywhere in the FS where we have the right permissions to do so. In particular, we should be able to go anywhere in our user directory.

³ Word processors produce files in different formats that often include a mixture of plain text and other formatting information, which puts these in the second category of data files as described above.

⁴ The command /bin/sh can also be used, generally invoking the default system shell.

⁵ In some cases this is preceded by the machine name, working directory, and/or username. For example, the full prompt for where I am working now is ligeti:src victor$.


A number of commands are going to be useful for looking at and manipulating files through the shell. The command ls is used to list files in a directory. You can check that it matches the names that you can get using the graphical file finder/manager program in your system (e.g. Finder on MacOS). The ls command can also show hidden files and a long listing of file names and attributes if you use the optional arguments -a (all) and -l (long). These types of options that are given to some commands are also known as flags. The long listing shows us the owner of the file, its group, and the permissions associated with the owner, members of the group, and all other users in the system. For instance, the following two entries

drwxr-xr-x 6 jane staff 204 13 Jun 13:44 audio
-rw-r--r-- 1 jane staff 2371 12 Jul 2016 voice.txt

can be interpreted as follows:

1. The first letter: d (directory) or - (file).
2. The first group of three letters, rwx: permissions to read, write or execute (if present) for the owner. In order for directories to be opened, they need to have the x permission.
3. The second group of three letters: permissions for the members of the group staff.
4. The third group of three letters: permissions for all other users.

The other information in the long list provides the owner, group, size (in bytes), date and name. Generally speaking, files created by the user will be owned by her and will generally give read-only permissions to the group and others. Executable files (programs) will have x permissions.

The OS provides commands for moving (renaming), copying, deleting, and viewing contents of files. It also provides means of making new directories and removing empty directories. Here is a short list of these commands:

• mv: move files from one name (path) to another.
• cp: copy files from one location to another.
• rm: remove files permanently.
• cat: concatenate (show) the contents of a file.
• mkdir: create a new directory.
• rmdir: remove an empty directory.

The shell and some of the commands it runs can be configured through the use of environment variables. These hold values that can influence how the shell or other programs behave. An important such variable is PATH, which keeps the names of directories where the shell will look to find executables to run. If a command file is not in this list of directories, it will not be found and cannot be executed. The system gives users a basic pre-filled PATH with the most common executable directories in it. In order for us to check the value of an environment variable we prepend a dollar sign ($) to it, and pass it as an argument to the echo command:

$ echo $PATH
/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin


The echo program prints to the terminal (shows) all of its arguments, in this case the value ($) of the variable. Each directory in the PATH is separated from the next by a colon (:), as we can see in the example. Generally your working directory (.) is not in the path. This means that any programs in it will not be found and cannot be executed unless the full path is given. You can type ./ before the program name in this case to indicate that you want to run an executable file from your directory.

1.1.3 Processes

When programs are executed by the OS, they do so under a process. For example, the shell program, which takes in input from the user and can start other programs, is a process run by the OS. Several such processes are being executed concurrently in a system, each one with its own access to resources, memory space, etc. A process has one or more threads executing at the same time, which run independently but share resources. Processes have an owner, which, for programs started by the user, is generally the user herself, and a number called the process identifier (PID). It is possible to get a list of active processes, their PIDs, as well as their owners using the command ps. For instance, the following command prints the PIDs and full pathnames to all processes running on a system:

$ ps -A

A user may kill her own processes using the kill command and the relevant PID. Alternatively, a process can be stopped by name, using killall:

$ kill PID
$ killall program_name

Finally, a process can be started in the background, returning to the shell immediately, before it completes execution. This is often used with graphical user interface (GUI) programs, when run from the shell. In this case they will open the program window and return to the shell for the user to continue to type commands into it. To run a process in the background we use an ampersand (&) at the end of the command line. Once the program starts, the shell reports its PID,

$ emacs &
[1] 20331

which can also be used to stop the program if we need to:

$ kill 20331
[1]+ Stopped    emacs


1.1.4 The Manual

The system manual can be accessed directly from the shell with the command man. This can be used to print information about commands, as well as C programming subroutines (as we will see later), and specific topics. The command is

$ man [topic]

where [topic] stands for the topic you want to get information about (e.g. a command). The manual is arranged into sections, which you can access by passing the section number (optional) before the topic name.

1.1.5 The POSIX Standard

Many of the concepts introduced here are defined as part of the POSIX (Portable Operating System Interface) standard [26]. This is a specification that encompasses much of the programming environment discussed in this book, and in some ways it can be taken as the basic specification for UNIX-like operating systems. While MacOS is POSIX-certified, and thus fully compliant, Linux adheres to it very closely, but does not have a certification. The standard defines the interface, not the implementation, of a variety of components of the OS. It also aligns closely with the ISO specification of the C language [24], which is followed by this book.

1.2 The C/C++ Toolchain

In order to make a working program from C/C++ code, we need to build it. This is a multi-stage process in which compilation is one of the key steps, but not the only one. Although building is a more accurate term for this, we often use compiling in an informal way to denote the complete process. To support this, the compiler toolchain provides a series of programs, which can be invoked with a single command, or in separate steps.

1.2.1 Compilers and Interpreters

The central component of the C/C++ development toolchain is called the compiler. This is a program that takes the code as a plain text file and translates it into binary instructions that can be understood by the computer to execute the intended computation. The binary file that is produced by the compiler needs to be combined with other binary data, generally from other system files, in order to produce the full executable program. This is done in the final stages of the process.


C and C++ are languages designed to be compiled in this way, producing highly efficient programs. In contrast, there are other languages, such as Python and Lisp, that are not dependent on compilers, but on an interpreter program, which does the translation from code text to computation directly, without the need for a compilation stage. These are generally less efficient from a pure computation point of view, but have the advantage of being generally more interactive, and they work at a higher level (i.e. they demand fewer programming steps or lines of code in a program). For the type of computation involved in audio and music applications, we often require the efficiency of compiled code. Languages that are run on optimised virtual machines, such as Java and JavaScript, can be seen as an in-between solution, where compilation to an intermediate bytecode representation is used in place of direct interpretation or machine code.

1.2.2 Compiling

In the first part of this book, we will concentrate solely on the C language, and thus the discussion from now on will turn to the specific tools used to build programs written in that language. The command cc is used to invoke the C compiler⁶, to which we need to pass the name of the file to be compiled, and the name of the output program we want to create:

$ cc mysource.c -o myprog

where we are passing mysource.c, called the source file, containing the code for the program. We are also using the flag -o to indicate the name of the output file myprog, which will hold the compiled program. We can see that this file has been created in the current directory by listing it:

$ ls -l myprog
-rwxr-xr-x 1 jane staff 8432 13 Jun 21:42 myprog

Note that the file has execute permissions as it was created as a binary executable. Using the cc command in this way invokes all the toolchain commands in one single step, behind the scenes, to build the new program. The main stages of this process can be listed as:

1. Preprocessing.
2. Compiling.
3. Linking.

In the first step, the code text in the source file is manipulated to produce the input to the compilation process. One of the typical aspects of this preprocessing is the addition of code taken from other existing files called header files (because they are often placed at the top of the source file). These files usually have names that use the extension .h (although this is not mandatory) and contain standard lines of code that are used by many programs. They are used to facilitate programming, reducing the need for these lines to be rewritten in every new source file. Other preprocessing operations can be invoked, such as text substitution (also known as macros).

Once the final program code in text form is ready, with all preprocessing done, the compiler translates it into object code. The output from this stage will contain only the compiled binary version of the code that was written in the source file, nothing else. In the majority of cases, to make a full executable, we require some extra chunks of object code to allow the OS to load and run it. These come from existing pre-compiled components that are kept in library files. Again, much of this binary code is standard and does not need to be compiled every time a program is built. To bring in these extra components and combine them with our compiled object code, we need the third step, linking, from which emerges the full program.

While it is possible to perform these three stages in separate calls to the different compiler tools, we will not need to do this in most of the examples in the early part of this book. With larger and more complex projects containing multiple source files, it will make sense to split the build process into separate compiling and linking steps.

⁶ We assume you have the compiler toolchain installed on your system. If it is not, please refer to the installation instructions for your specific platform. You can check whether the tools are installed by typing the cc command and checking whether it exists in the system.

1.2.3 Running Programs from the Terminal

The compiler places the newly built program in your current directory. It can be run from the terminal like any other command/program in the system. For this, we give the full path to the file, as in the following example,

    $ /Users/jane/myprog

Alternatively, we can use the . shorthand,

    $ ./myprog

which is more convenient as it does not require us to remember the full path to the working directory. The explicit ./ prefix is needed because we assume that the working directory is not in the PATH list.

1.3 Introduction to C Programming

Now that we have introduced the environment in which we will be developing our programs, we can turn our attention to the C language. In this section, we will explore the fundamental elements of program structure, layout, compilation, and execution. This will be done by looking at a simple program, which, although trivial, will illustrate all of these basic aspects of programming.


1.3.1 Character and Keyword Sets

All C programs may use the following set of distinct characters [24]:

1. The 26 uppercase letters of the Latin alphabet
   A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
2. The 26 lowercase letters of the Latin alphabet
   a b c d e f g h i j k l m n o p q r s t u v w x y z
3. The 10 decimal digits
   0 1 2 3 4 5 6 7 8 9
4. The 29 graphic characters
   ! " # % & ' ( ) * + , - . / : ; < = > ? [ \ ] ^ _ { | } ~
5. The space character and the control characters representing horizontal tab, vertical tab, and form feed.

Since uppercase and lowercase letters are distinct characters, the language is case-sensitive: identifiers that differ only in capitalisation are different names. In addition to this character set, we should note that the language reserves a series of specific keywords for particular uses. The following is the list of these as defined by the C language standard [24]:

    auto break case char const continue default do double else enum
    extern float for goto if inline int long register restrict return
    short signed sizeof static struct switch typedef union unsigned
    void volatile while _Alignas _Alignof _Atomic _Bool _Complex
    _Generic _Imaginary _Noreturn _Static_assert _Thread_local

1.3.2 Entry Point

Programs are organised in structural blocks called functions⁷. All the code that performs computations, the program statements, is then placed inside these programming units, which are executed by the computer. Programs are executed statement by statement in a sequential manner, until the last one is performed, when the program exits.

⁷ A more precise definition will be given in Chapter 6.


C programs will consist of at least one function, called main() [24]. The main function is the first function that is called when your program runs. It is known as the program's entry point, from where the OS makes the program start the execution of a process. From there onwards, all instructions in the program source code are executed in sequence. When the last instruction is performed, the process exits, returning an exit code to the OS (to indicate successful completion or otherwise). A flowchart demonstrating this operation is shown in Fig. 1.2.

[Figure: flowchart with the following steps in order: user starts the program; OS loads the executable; entry point is found; instructions are executed in sequence; last instruction executed, program exits; OS takes a return code from program.]

Fig. 1.2: Running a program.

A simple program may therefore consist of a single function, called main(), with a sequence of statements inside it:

    int main() {
      statement_1;
      statement_2;
      ...
      statement_N;
    }

Each statement is terminated by a semicolon (;). This serves as a full stop for C program code. Without it, the compiler will not know where one statement ends and where the next starts. Statements may span multiple lines, so it is very important to pay attention to the placement of semicolons.

All our C programs will need at least one of the C standard libraries, which deals with standard input and output of data. Its associated header file is stdio.h. We add it to the program code with this preprocessor command at the top of the source file:

    #include <stdio.h>

All lines starting with a # (hash) are preprocessor commands. The include command effectively copies all the text data from a header file into the position where the preprocessor finds it in the source file.


1.3.3 The shin Program

The archetypal first program is an analogue to the classic Hello World by Kernighan and Ritchie [28]. This is simple enough to demonstrate the basic C program structure and layout introduced in the previous section:

    1  #include <stdio.h>
    2
    3  int main() {
    4    printf("Live Long and Prosper.\n");
    5    return 0;
    6    /* end */
    7  }

This program contains one function, main, which holds two statements: printf(...); and return 0;, each one duly terminated with a semicolon. Note that, for the sake of clarity, we have placed each statement on a separate line. This is not actually required by the C syntax in order to distinguish them. As we have noted before, only the semicolon is used for this purpose. Single statements can span multiple lines; a single line can contain multiple statements.

The first statement in line 4,

    printf("Live Long and Prosper.\n");

calls the printf() function, which is defined outside this program. We did not write its code; it is provided by a library. The C program knows about printf() because it is listed in the stdio.h header we are including at the top. This function is part of the standard C library and is used to display text. The characters that make up the text are passed to the function inside double quotes. This is called a string, which is how C programs store text. All parameters to functions are always placed inside parentheses after the function name. The expected result of this call is that the text characters are printed to the standard output, which is by default the terminal.

The final statement of the main() function in line 5,

    return 0;

is used to yield a result (0) as the output from this function, which is the numeric code returned to the OS to indicate all went well and the process finished cleanly. The final line of the body of the main() function (line 6) is a comment, defined by the /* and */ delimiters, which contains no program statements and is therefore ignored by the compiler.


Compiling and running

Using the text editor of choice⁸, this code is placed in a file called shin.c and compiled with

    $ cc -o shin shin.c

producing a program called shin. Note the use of the -o flag, indicating that the output of cc is a file called shin. The cc command will invoke the preprocessor to deal with the #include line, then the compiler itself to transform the preprocessed code into binary form, and finally the linker to insert the extra externally-defined bits, such as the printf() function. We can run it with the following command (which is the name of the program file):

    $ ./shin
    Live Long and Prosper.

where ./ means the file is in the current working directory. As we have seen before, in order to run the program, the command-line interface (shell) looks for executable files (programs) in certain directories indicated by the environment variable PATH. Only directories in the path will be searched. The current directory might not be in the path; to make sure you are running the right file, always type in the full path to it: ./shin, which is a program file called shin in the current directory.

⁸ GNU Emacs (https://www.gnu.org/software/emacs, also called Aquamacs on MacOS, http://aquamacs.org) and Atom (http://atom.ie) are good examples of text editors that are available for a variety of platforms.

1.3.4 Summary

The following is a summary of some of the fundamental details of program structure that we should be aware of:

• Comments: programmers can add comments using the /* and */ delimiters anywhere in the program source code. Anything placed in between these will not be read by the compiler. They can span multiple lines:

    /* shin.c
       author: V Lazzarini, 2018
    */
    #include <stdio.h> /* header file for stdio */
    int main() /* main function */
    {
      /* this prints a message */
      printf("Live Long and Prosper.\n");
      return 0;
    }

  The C language standard [24] also allows single-line comments beginning with //, running to the end of the line:

    int main() // this is a comment until the end of the line

  Use comments wisely: do not over-annotate. The code should be readable without any external references, if at all possible. Comments can also be used to isolate (comment out) code statements when diagnosing a problem or trying alternative versions of a program.

• Entering and exiting: as we have discussed above, main() is the entry point of the program. Thus, when this function reaches its end, the program stops. The C language standard mandates that we define main() with a return type int⁹:

    int main() { ... }

  Thus, by this definition, main() is expected to return a numeric code to the OS. This is generally 0 if everything was OK, and anything else if not. Since int is a keyword for an integral data type (a whole number), a statement will need to be provided to return a value of this type. This is what the function does at the end, using the keyword return:

    int main() /* main function returns integers */
    {
      printf("Live Long and Prosper.\n");
      return 0; /* we return 0, meaning 'OK' */
    }

• Standard IO: text output to the terminal is handled by the C standard input output (IO) library. The printf() function is declared in stdio.h and implemented by the C library. To use it, we have to include that header file. Similarly, as we will see, to get input from the terminal, we can use other stdio.h functions.

⁹ Types will be discussed in the next chapter.

1.4 Conclusions

In this chapter, we have seen that the OS is a collection of software that provides the environment for programming and running applications. As part of this, it includes a file system (FS) that organises files and directories (folders) and allows these to be manipulated. Directories hold files and other directories; files can hold data or programs (executables). The terminal (through a program called the shell) can be used to run programs (also known as commands). The PATH is used by the shell to locate commands.

C programs are built in three major stages: preprocessing, compiling and linking. Header files contain definitions that are required, for instance, by programs using code from libraries. They are added to programs using a preprocessor directive, #include. All programs have an entry point, which is usually the main() function. C programs are run sequentially, statement by statement. They terminate by returning a value to the OS.

Next, we will start looking at the fundamental elements of programming in C, using the tools and concepts developed in this chapter. In particular, we should try to be comfortable with the development environment described here, and bear in mind what has been discussed with regard to the overall structure of a program, its compilation, and execution.

Problems

1.1. Modify the existing lines of the shin program, compile it and observe the result:
(a) What happens if you add copies of the line containing printf("Live Long and Prosper.\n");?
(b) What happens if you modify the text inside the double quote marks?

Chapter 2

Data Types and Operators

Abstract In this chapter, some fundamental concepts of programming are discussed. Data types and variables are introduced, as well as the principles of binary encoding, bits, bytes, and endianness. We then look at the different built-in types that are available in the C language and the arithmetic operations that can be applied to them.

The C language is fundamentally oriented towards executing operations with numeric data, in particular for the applications we will be targeting in this book. Everything we program will ultimately be based on arithmetic and logic operations, even if on the surface, the resulting software might not immediately appear to be so. This furnishes us with a good starting point to learn the language. We will start by introducing the concepts of variables, their types and the basic operations we can apply to them.

2.1 Variables and Types

Variables, in a programming context, are memory locations that we can address directly or indirectly to store numbers or text characters. They are also called objects in the C language standard [24], when referring particularly to those that can be modified. Types are provided to determine the meaning of the contents that are stored in a variable. The following are some of the fundamental C language types that can be employed in a program:

• Integer: whole numbers.
• Floating-point: real numbers¹.
• Character: text characters.

¹ Actually, a finite representation of a real number, as some of these may have an infinite decimal expansion [29].

© Springer Nature Switzerland AG 2019 V. Lazzarini, Computer Music Instruments II, https://doi.org/10.1007/978-3-030-13712-0_2


Before any variable can be used in a program, it needs to be declared appropriately. In addition to being given one of the types above, each variable is identified by a symbolic name, which must begin with a letter or a _ (underscore) character. Every variable occupies a certain amount of space in memory, which is determined by its type. The bit is the name we will use for a unit of data that can hold one of two states, 0 or 1. Based on this, we can define a byte, the implementation-dependent addressable unit of storage² [6], which for our present purposes is equivalent to 8 bits. Each specific type defined in the language has a given size in bytes, which is also implementation dependent.

2.1.1 Encoding

In binary architectures, all numbers are ultimately encoded using base 2 [6]. Although we will not generally use a binary representation directly in our programs, it is important to know some fundamental principles related to this. For instance, to translate a non-negative integer from decimal to binary, we can state it in terms of a series of powers of 2 (see footnote 3):

$$13_{10} = (1)2^3 + (1)2^2 + (0)2^1 + (1)2^0 = 1101_2 \qquad (2.1)$$

More generally, we have, for a decimal integer d and its binary encoding b of size N in bits [10],

$$d = \sum_{n=0}^{N-1} b(n)\,2^n \qquad (2.2)$$

where b(n) is the (n + 1)-th binary digit of b, counting from the right. Notice that the lower-order bits (low n) are less significant than the higher-order ones. This means that a change in one of those bits leads to a smaller change in value than does a change in a higher-order bit. The least-significant bit is of order zero, associated with $2^0$ in eqs. 2.1 and 2.2, assuming the standard right-to-left positional notation.

² This means that in normal situations the effective minimum storage size of a variable is a byte.
³ We use the notation $x_N$ to mean x in base N.

Byte order

Data types that hold more than one byte (all types listed above except for characters) can also be described in terms of their most-significant byte (MSB) and least-significant byte (LSB) [6]. This follows the same idea as for bits: a change of 1 in the least-significant byte produces the minimum change in value, whereas a change of 1 in the MSB produces a bigger change. For instance, if we write a 2-byte number in MSB to LSB arrangement, then

    0000 0000 0000 0000 = 0
    0000 0000 0000 0001 = 1
    0000 0001 0000 0000 = 256

Byte ordering in computer memory is system dependent. There are two typical arrangements: big-endian and little-endian ordering [10]. In the first case, the MSB comes first, at the lowest address, whereas in the other case bytes are addressed in increasing order of significance, LSB to MSB. For example, as shown in Fig. 2.1, if we write a 4-byte number from MSB to LSB, its bytes will be found at addresses 0, 1, 2, 3 in big-endian architectures, whereas in the little-endian case they would be at 3, 2, 1, 0. The x86-64/i386 family of processors has a little-endian architecture.

[Figure: the four bytes of a number drawn from MSB (left) to LSB (right), labelled with their address order: 3, 2, 1, 0 for little-endian and 0, 1, 2, 3 for big-endian.]

Fig. 2.1: Little-endian and big-endian byte order for a 4-byte number.

Note also that, since the C language has a byte as its lowest addressable data unit, we are not concerned in general with how bits are stored inside a byte. Additionally, the underlying byte order is not relevant when we are denoting literal constants, which are always written using the right-to-left positional convention from mathematics⁴. Generally, we only need to be careful with byte ordering when we need to transfer data from one system to another (e.g. by copying files, see Sect. 10.2.2) or when accessing individual bytes packed in a multi-byte data type. Therefore, this is an issue that will not be significant immediately, but we will meet a number of situations where it is at later stages in this book.

⁴ We always write the most significant digit to the left of the number. This can be viewed from a little-endian or big-endian perspective, depending on the way we read it. Cohen [10], for instance, considers this to be a big-endian order as the ‘wider’ end of the number comes first if we are reading it as we would do a text in English; a little-endian ordering under this perspective is akin to Arabic or Hebrew writing. In [6], it is concluded that big-endian ordering is superior in terms of computer architecture design.


2.1.2 Integers

An int variable is used to store signed whole numbers. For example, the C statement

    int a;

declares an int variable and calls it a. There are altogether five standard types of signed integers: signed char, short int, int, long int, and long long int. For each one of these, there is a corresponding unsigned type, declared with the unsigned keyword. The C language standard [24] requires that a signed char occupies at least a single byte (minimum range: -127 to +127) and that a short integer holds at least two bytes (-32767 to +32767). The long type is defined as using at least four bytes (-2147483647 to +2147483647) and the long long type, eight bytes (-9223372036854775807 to +9223372036854775807). Unsigned integers hold only non-negative values, which allows a maximum of about twice that of the corresponding signed type (for instance, at least 0 to 65535 for an unsigned short).

The exact size of each data type in C is always implementation dependent. In most modern 64-bit Unix-like systems, the five standard integer types listed above will be stored in 1, 2, 4, 8, and 8 bytes, respectively (on 64-bit Windows, a long typically takes 4 bytes). The following are some examples of type declarations:

    unsigned int ua;    /* an unsigned integer */
    unsigned long ulb;  /* an unsigned long integer */
    short sample;       /* a signed 16-bit integer */

The C language standard [24] defines the following exact size integer types in the stdint.h header file. If we include this file, we can use them in a program:

    int8_t   int16_t   int32_t   int64_t
    uint8_t  uint16_t  uint32_t  uint64_t

As can be inferred, u* means unsigned, and *N_t means N bits of precision. The 64-bit sizes might not be present on some platforms. If the size of an integer variable is crucial for an application, we should use these whenever our compiler toolchain is compliant with the C99 (or later) version of the standard.

2.1.3 Real Numbers

Floating-point numbers are so named because they store a real number in two parts: an exponent (which tracks the point position) and a mantissa (which holds the actual digits over which the point floats). For example,

$$2.56 = 256 \times 10^{-2} \qquad (2.3)$$

where 256 is the mantissa (or significand) and −2 the exponent⁵, and this can be represented as 256e-2. There are two sizes of floats (as defined by the IEEE 754 standard [21]) commonly used in the C language:

• float: a single-precision floating-point number has about seven decimal digits of precision. Single-precision floats take four bytes to store, with 24 bits of precision for the mantissa (23 of which are stored explicitly), eight bits for the exponent, and one sign bit.

    float result;

• double: a double-precision number has about fifteen decimal digits of precision. A double takes eight bytes to store, with fifty-three bits of precision for the mantissa (52 of which are stored explicitly), eleven bits for the exponent, and one sign bit.

    double value;

A long double type is also defined by the language, which may implement the ten-byte IEEE extended format on some commonly-used computer architectures (such as x86).

2.1.4 Characters

The type char holds a single character, stored in one byte. For example:

    char c;

This type is most often used to store ASCII characters (which are themselves 7-bit codes), but can be used for any single-byte numerical use. The type char can either be signed or unsigned⁶.

In the shin program of Sect. 1.3, we used a sequence of characters to print a message to the screen, and called this a string. We also noted that this is the usual form for C programs to handle text data. Each character in a string is effectively a char, but the complete sequence is treated as a single block. We will leave the details of how strings can be manipulated as variables for later, but will discuss literal strings in Sect. 2.2.2 of this chapter. For now, we will just determine that strings will be defined by the char* type (note the asterisk).

⁵ In this case, we are using 10 as the base for the exponent. Other bases may be employed.
⁶ The C language does not specify whether char is signed or unsigned. If you are using it for numeric applications, you might need to explicitly declare it as signed or unsigned, or use int8_t/uint8_t if you have them.


2.2 Initialisation, Assignment and Arithmetic Operations

When first declared, variables can be initialised to a given value:

    int a = 0;

Multiple variables can be declared and/or initialised in a single statement, separated by commas:

    int a = 0, b, c = 2, d = 3, e;

In general, the comma can be used to place two or more operations or expressions in a single statement. Operations are ordered left to right. If a variable declared inside a function is not initialised, its value will be undefined until some data is written into it. You can store a value in a variable using an assignment operation:

    name = value;

For instance,

    a = 10;

stores the value 10 in the variable a, which was previously declared⁷.

2.2.1 Variable Scope

The scope of a variable is the extent of a program in which it is relevant. Variables declared within a program block are valid, and in existence, only inside that block (and in all enclosed blocks). A program block is delimited by braces ({ ... }); thus, a function is a program block. In general, blocks can be used freely to define variable scope, if needed. Variables declared inside a function are known as local, to separate them from variables declared outside any function, which are global. Global variables are seen by all functions within a source code file. It is best practice to avoid global variables whenever possible.

The lifetime of a C variable is generally automatic (implying the storage class auto), that is, variables come into being when declared and are destroyed when they go out of scope. Local variables will have function or block lifetime, whereas global variables will last until the program exits. It is possible to make a local variable have program lifetime by marking it as static (instead of the default auto), which will also mean that it refers to a single memory location that is shared by all accesses to that particular variable.

⁷ Note that = is the assignment operator and does not mean identity (or equality) (which is denoted by ==, as we will see later).


2.2.2 Constants

Constants are numeric values that cannot be changed throughout a program. Literal integer constants are normally written in base-10 format (decimal system): 1, 2. For long integer constants, an L is added: 2L, 10L. For explicitly unsigned constants, we can use a U: 2U, 10UL⁸. Literal floating-point constants have two forms: with an f at the end for floats, and with just a decimal point somewhere for doubles (2.f is a float; 2.0 is a double). Integer literals can also be written as either hexadecimals (base 16) or octals (base 8):

1. Octal constants are preceded by a 0. The decimal 31 ($00011111_2$) can be written as:

       int a = 037; // 037 in octal is 31 in decimal

   Octal digits range from 0 to 7. Each one holds 3 bits ($000_2$ to $111_2$).

2. Hexadecimal constants are preceded by 0x:

       int a = 0x1F; // 0x1F in hexadecimal is 31 in decimal

   Hexadecimal digits range from 0 to F, with A to F representing the decimals 10 to 15. Each digit holds 4 bits, so two of them encode 1 byte. The digit F represents four set (1-valued) bits. For instance, the 16-bit (2-byte) bitmask⁹ 0xFF00 is a series of 8 set bits followed by 8 zeros (1111 1111 0000 0000).

Floating-point literals may be written in exponential form. For example, the double constant 0.004 can be notated as

    double f = 4e-3;

and an f may be appended to it to make it a single-precision float.

Macros¹⁰ can also be used to give names to constants. The preprocessor statement #define will do this for you, and so

    #define VALUE 10000

will substitute the integer literal 10000 for any instances of the word VALUE, so that you can use VALUE as a constant in your code. The preprocessor takes care of all replacements for you.

Single-character literals are defined by single quotes:

    char c = 'a';

will store the code for the single ASCII character a in the variable c. Literal strings are defined inside double quotes " ":

    "Live Long and Prosper."

is an example. They are used to define constant text objects to be employed in programs, such as a message printed by the printf() function. String literals are read-only, and any attempt to modify them leads to undefined behaviour. C string constants cannot span multiple lines inside a single pair of double quotes, but can be split into two or more sets inside multiple pairs of double quotes, which are concatenated by the compiler. For instance,

    "Live "
    "Long "
    "and "
    "Prosper.";

is a valid string literal. Alternatively, and more generally, the backslash character \ can be used as a line continuation character to indicate the absence of a line break at that point:

    "Live \
    Long \
    and \
    Prosper.";

Finally, C also includes a const keyword, which can be used to declare variables that are read only, which effectively makes them constants:

    const int end = 0;

in which case we require an initialisation (since the identifier end is not modifiable). Read-only variables and literal constants are distinct: in some cases where a constant is called for, compilers might require a literal to be given explicitly instead of a constant that is defined by a const object.

⁸ Lower-case u and l can also be used.
⁹ Bitmasks are used in bitwise operations, which we will see later in the book.
¹⁰ Macro is the general name given to the token replacement operation supported by the preprocessor.

2.2.3 Operations

The fundamental arithmetic operators are:

• addition: a + b
• subtraction: a - b
• multiplication: a * b
• division: a / b
• remainder: a % b

For both division and remainder, if the value of the second operand (b) is zero, the behaviour of the operation is undefined [24]. When mixing variable types, as in

    a = 20.0/6

care needs to be taken. The actual result will depend on the types involved (in this case, we know there is a double constant being divided by an int constant). If a is an integral variable, then the result will be truncated to 3. If it is a floating-point variable, it will be expanded up to the type precision (single or double). Note that:

1. Integer division may truncate the result (in which case the remainder will be non-zero).
2. If a floating-point type is included in the expression, an integer variable will be upgraded to an equivalent floating-point type before the operation is carried out.

The operator % returns the remainder of an integer division:

    int a = 5, b = 2;
    int q, r;
    q = a / b; /* q = 2 */
    r = a % b; /* r = 1, thus a = b*q + r */

For unsigned numbers, it can also be interpreted as a modulo operator. In general, this is defined to match the following relation, for r = a mod b and the integer quotient q = ⌊a/b⌋,

$$r = a - bq \qquad (2.4)$$

with non-negative integers (and b > 0) [29]. We can think of it as counting up from 0 to b − 1, then starting back at 0 and counting to b − 1 again, repeatedly, until we have counted a + 1 numbers: 5 mod 2 is 1 (0, 1, 0, 1, 0, 1). Conversely, 2 mod 5 is only 2 (0, 1, 2), which is the same as 7 mod 5 (0, 1, 2, 3, 4, 0, 1, 2). This is sometimes called clock arithmetic, as it follows the idea that the hours are calculated modulo 12, and minutes modulo 60.

2.2.4 Conversion

Data types can be explicitly converted into one another by using a cast, defined by the operator (type):

    int a = 1;
    float b = 1.f;
    a = (int) b;
    b = (double) a;

Conversions from floating-point to integral types may cause truncation, as the fractional part of the number is lost. It is also important, when converting types, to ensure that the recipient has enough range to hold the data, or overflow might occur.


2.2.5 Arithmetic Order

Arithmetic ordering puts multiplication, division and remaindering at a higher precedence than addition and subtraction. All of these operations are left-to-right associative, so operators of the same level of precedence are executed in that order of appearance. To eliminate any confusion, we can use parentheses, ( and ), to group operations. These have the highest precedence of all, so whatever is placed inside them is evaluated first:

1. Addition and subtraction:

       1 - 2 + 3    /* 2 */
       1 - (2 + 3)  /* -4 */

2. Multiplication and division:

       18 / 2 * 3    /* 27 */
       18 / (2 * 3)  /* 3 */

2.2.6 The sizeof Operator

As we have noted, most of the data types defined in the C language standard have implementation-dependent sizes. To get the exact size of a variable or a type, we can employ the sizeof operator. This can be used with any operand whose size is known at the time of compilation. The result of this operation is the size in bytes occupied by the operand, and the type of this result is the unsigned integer type size_t¹¹ (itself an implementation-dependent type) [24]. For example,

    size_t int_size = sizeof(int);

can be used to get the size of an integer in the system. Likewise, we can check the size of a given variable:

    float f;
    size_t f_size = sizeof(f);

This operator will allow us to verify requirements in certain situations when we will need to manage memory space ourselves in a program.

¹¹ Defined in stddef.h.

2.3 Conclusions

We have examined some of the most fundamental aspects of C programming in this chapter. In particular, the concepts of variable and type are crucial to the functioning of a program. We should try to make sure the general principles outlined here are well understood as they will serve as the basis for the remainder of this book. Unfortunately, however, what we have explored so far does not allow us to write our first fully-functional program, as we are missing one key element: the capacity to interact with the external world. This is what we call input/output, and we will introduce it in the next chapter.

Problems

2.1. As a pen and paper exercise, do the following:
(a) Write 32, 55 and 102 in binary form (using as many bits as you need).
(b) For each of these binary numbers, shift all bits by one position to the left (adding a zero to the new lowest order bit, i.e. 101 → 1010). Convert the results into decimal form and compare with the original numbers.
(c) Do a similar operation with the same original binary numbers, but instead shift by one to the right, i.e. 101 → 10. Convert them to decimal and compare.
(d) What is the effect of these shifting operations?

2.2. What are the results of these operations with C constants?
(a) 1 + 2 / 3 * 4
(b) 3 * 3 / 4.5
(c) 10 / 3 / 2

Chapter 3

Standard Input and Output

Abstract

This chapter covers the basic means of input and output that are available to C programs. We introduce the principles of formatted input and output, which will provide the most generic methods of getting data in and out of programs. In addition, we also explore other methods of single character input and output, and string output. With the ideas presented in this and the previous chapter, we are able to start writing our first straight-line programs.

Before we are able to write our first programs, we need to find a means of interfacing with the world outside them. For this purpose, we have a variety of input and output (IO) means, the simplest of these being the standard IO functions. With them, we will be able to feed data into our program and display the results. This functionality interacts with the shell in a very tight way, which can be used for more than just typing inputs and printing data.

3.1 Printing to the Terminal

The most general way to output results from a program is through the printf function, which we first encountered in Chapter 1. It takes a constant string and a number of optional arguments. The function prototype, which tells us its overall form, is

  int printf(const char *format, ...)

where the ellipsis indicates that we can use zero or more extra parameters at the end of the argument list, all separated by commas. The format string¹ determines how many parameters we will need. If it contains any format specifiers [24] introduced by the % character, it will call for one or more extra arguments.

¹ As we indicated earlier in Sect. 2.1.4, the char* type defines a string, and the const keyword indicates it will be used read-only (constant) in the function. More details on strings will be furnished later in the book.

© Springer Nature Switzerland AG 2019 V. Lazzarini, Computer Music Instruments II, https://doi.org/10.1007/978-3-030-13712-0_3


We have seen the case of

  printf("Live Long and Prosper.");

where we only have the format string and nothing else. As this contains no % characters, it results in the string literal being printed without anything extra. The function always returns the number of characters printed (or a negative value if an error occurs), but we can ignore this value if we want.

3.1.1 The Format String

In the format string, the characters following the % indicate how the value of a corresponding parameter is displayed, the conversion specification. This is defined by a sequence containing, in the following order:

1. Zero or more flags, modifying the meaning of the conversion specification.
2. An optional minimum field width, to determine how many characters are to be displayed. The field width is defined by an asterisk (*) or a non-negative decimal number. If the converted value has fewer characters than the field width, it will be padded with spaces.
3. An optional precision, giving the minimum number of digits for numeric conversions. The precision is defined by a period (.) followed by an asterisk or an optional decimal integer.
4. An optional length modifier, specifying the size of the argument.
5. The actual conversion specifier character to determine the type of conversion.

For each conversion specifier, we need to supply an argument to be converted. The format specifier determines the type of the argument expected. If you use a specifier with the wrong argument type, the behaviour is undefined and printf() will not work properly. The basic conversion specifiers that you can use in the C language are shown in Table 3.1.

Table 3.1: Basic format specifiers.

specifier    type               printed output
%c           char               single character
%d (%i)      int                signed integer
%e (%E)      float or double    exponential format
%f           float or double    signed decimal
%s           char*              sequence of characters
%u           unsigned int       unsigned integer
%x (%X)      int                unsigned hex value
%o           int                unsigned octal value


The optional length modifiers are:

• hh: specifies that the integer conversion applies to a char argument.
• h: specifies that the integer conversion applies to a short argument.
• l: specifies that the integer conversion applies to a long argument.
• ll: specifies that the integer conversion applies to a long long argument.
• z: specifies that the integer conversion applies to a size_t argument.
• L: specifies that the floating-point conversion applies to a long double argument.

When the field width, precision, or both are indicated by an asterisk, an extra int argument needs to be supplied to determine it. In this case, such argument should be provided before the corresponding argument that will be converted. The precision gives the minimum number of digits for integer conversions, the number of digits after the decimal point for floating-point conversions, and the maximum number of bytes for the string conversions. The optional flags are as follows:

• -: left justify.
• +: always display sign.
• space: display space if there is no sign.
• 0: pad with leading zeros.
• #: use alternate form of specifier.

The alternate form # of the modifier can be used as follows:

• %#o: adds a leading 0 to the octal value.
• %#x: adds a leading 0x to the hex value.
• %#f or %#e: ensures decimal point is printed.
• %#g: displays trailing zeros.

Format strings may contain any ASCII characters, including some special formatting codes. These are always escaped with a backslash:

• \b: backspace.
• \f: formfeed.
• \n: newline.
• \r: carriage return.
• \t: horizontal tab.
• \v: vertical tab.
• \': single quote.
• \": double quote.
• \0: null character.
• \a: sound/bell alert.

Examples:

– A message including an integer, followed by a newline:

  int a = -10;
  printf("This is an integer: %d \n", a);

– Two unsigned integers separated by a tab and followed by a newline:

  unsigned int a = 1, b = 4;
  printf("%u \t %u \n", a, b);

– A long integer with ten characters of field width, right justified (no newline):

  long int a = 100;
  printf("%10ld", a);

– A floating-point number with three decimal digits of precision, that is the result of an expression, followed by a newline:

  int a = 100;
  printf("%.3f\n", a/3.);

– A vertical tab and three characters inside double quotes, followed by a newline:

  printf("\v\"%c%c%c\"\n", 'h', 'i', '!');
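The asterisk notation and the flags can be tried out with the following sketch. It uses snprintf(), the standard library variant of printf() that writes into a character buffer instead of the terminal, so that the result can be inspected; the helper names format_value and format_hex are assumptions made for this example.

```c
#include <stdio.h>

/* Hypothetical helpers (not from the text). */

/* Field width and precision are supplied at run time through the
   two asterisks; they come before the value to be converted. */
int format_value(char *buf, size_t size, int width, int prec, double x) {
  return snprintf(buf, size, "%*.*f", width, prec, x);
}

/* Alternate form (#) plus zero padding (0) in a field of 6 characters. */
int format_hex(char *buf, size_t size, unsigned int n) {
  return snprintf(buf, size, "%#06x", n);
}
```

With a width of 8 and a precision of 3, the value 100/3. is written as 33.333 preceded by two spaces of padding; format_hex() writes 255 as 0x00ff. Both functions return the number of characters produced, just as printf() does.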

3.2 Getting Input from the Terminal

Data from the standard input can be retrieved with scanf(), which has a similar prototype to printf(),

  int scanf(const char *format, ...)

This function will return the number of items assigned, but this value can be ignored if not needed. In some cases, as we will see later, it can return a special code defined by the macro EOF, indicating that there is no more input to be read.

3.2.1 Pattern Matching

The format string in the scanf() case will perform pattern matching, reading what has been typed at the input and placing it in one or more corresponding variables. The main difference here is that each argument will receive data (rather than providing it, as in the case of printf()), and for this reason we will need to expose the memory address of each parameter. This will allow scanf() to use these parameters as output rather than input. Once an address has been passed, the function can place data in it. In the C language, the address of a variable can be obtained using the & operator:

  int a;  // variable a
  &a;     // the address of a


Thus, to get two integers from the input, we can use

  int i, j;
  scanf("%d %d",&i,&j);

which will read two whole numbers into i and j, and ignore any whitespace or new lines in the input. The following rules apply as far as the formatting string is concerned:

• Any format specifiers will be used to translate a given input into a variable address provided. For example, the %c places a single character typed at the terminal into a char variable:

  char c;
  scanf("%c",&c);

• Any whitespace characters in the formatting string will match any number of such characters typed at the input. For instance

  char c;
  scanf("%c ",&c);

will ignore any number of spaces, newlines or tabs typed after a single character.

• Any ordinary character (except %) will match a corresponding character in the input. This means that a scanf() call will attempt to match an input to a format string. If it cannot, it will return without scanning any further inputs. For instance

  char c;
  scanf("hello %c ",&c);

will look for an input that matches the string "hello" followed by any number of spaces and a single character.
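The matching rules above can be explored with the following sketch. It relies on sscanf(), the standard library variant of scanf() that reads from a string instead of the terminal and follows the same format rules; the helper names read_pair and read_after_hello are assumptions for this example.

```c
#include <stdio.h>

/* Hypothetical helpers (not from the text). */

/* Returns the number of items assigned: 2 if both integers matched. */
int read_pair(const char *text, int *i, int *j) {
  return sscanf(text, "%d %d", i, j);
}

/* The literal "hello" must match before a character is read. */
int read_after_hello(const char *text, char *c) {
  return sscanf(text, "hello %c", c);
}
```

Checking the return value tells us how far the match went: read_pair() gives 2 for "2   3", 1 for "2 x" (only the first number matched) and EOF for an empty string, while read_after_hello() gives 0 when the input does not begin with "hello".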

3.3 Character Input and Output

In addition to the formatted IO functions outlined above, which provide a comprehensive means of IO for programs, we can avail of single and multi-byte character functions provided by the C library. These are

  int putchar(int c);
  int getchar();

for single characters (which are converted to/from int), and

  int puts(const char *s);

for character strings. With the latter function, in particular, we could have written the shin program as

  #include <stdio.h>
  int main() {
    puts("Live Long and Prosper.");
    return 0;
  }

Note that puts() appends a newline of its own to the string it prints. These functions have more limited applications than the general-purpose printf() and scanf(). However, they might be more appropriate for some specific tasks such as retrieving individual characters from the standard input, printing user messages, and character-by-character output.

3.4 The calc Program

The following program implements an interactive calculator that outputs the sum of two whole numbers:

  1  #include <stdio.h>
  2
  3  int main() {
  4    int a,b;
  5    printf("\n Please enter the two numbers: ");
  6    scanf("%d %d",&a, &b);
  7    printf("%d + %d = %d \n", a, b, a+b);
  8    return 0;
  9  }

Line 1 includes the stdio.h header, which contains the declarations for the functions printf() and scanf(). Two variables, used as memory to hold each input number separately, are declared in line 4. The next line prints an instruction to the terminal, which is followed in line 6 by a call to scanf() to get the input data. This will block execution until the pattern in the format string (two numbers separated by spaces) is matched by the user input. Once this happens, the numbers are placed in variables a and b. Line 7 prints the two numbers and their sum in a format string. If we place this program in a file called calc.c, we can compile and run it as shown below:

  $ cc -o calc calc.c
  $ ./calc

   Please enter the two numbers: 2 3
  2 + 3 = 5
  $

Note that a newline is printed at the start of the program, as we had \n as the first character of the message string, followed by a white space. This string did not terminate with a newline, so the program waited for input at the same line it printed to the shell. Two numbers were typed followed by an ‘enter’, leading to the result being printed out in the next line.

3.5 Conclusions We are now in good shape to attempt to program some of our first software. This will be very simple at first, but we should be paying a lot of attention to the details of getting data into the program, performing the required computation, and producing the output. These first programs are based on straight-line code: we start at the top of the main() function, and perform a sequence of steps, exiting at the last statement. Once we are comfortable with this, we will be able to start adding detours and repeats, which are collectively known as control of flow, as we will see next.

Problems 3.1. Ask for a distance in feet, convert it to metres and print out the result. (1 ft = 0.3048 m). 3.2. Calculate the average of three numbers input at the terminal. 3.3. Write a program to calculate travel expenses. Request the payable rate (cents per kilometre), then the start and finish odometer readings and output the payable expenses in euros. 3.4. A winery produces N litres of wine per kilogram of grapes. Calculate (1) how many 50-litre barrels will be needed to store a certain weight of produce; and (2) the remaining volume in the last barrel (if not completely full). Request as input the yield N and the weight of fruit.

Chapter 4

Control of Flow

Abstract

The methods of controlling and directing the flow of execution of a program are the main topics of this chapter. We first look at branching, which can be controlled by logical tests, or by pattern matching. Then we introduce the principle of iteration and the three types of loop constructs available in the C language. With this, we are able to start generating audio waveforms that can be displayed in graphs or played back after a minor conversion step.

Computer programs normally require means of selecting statements (or blocks of statements) for execution while ignoring others, in order to provide more flexibility for developers. Straight-line code, such as the one employed in the previous chapter, is very rarely used. We also need to provide means of iterative (repeating) computation to implement loops, which are fundamental for certain applications. All of these aspects of programming are provided by control-of-flow constructs. In all of these, we will need to provide a decision procedure that will determine what gets executed. This is called a condition, which is made out of a logical expression.

4.1 Conditional and Logical Expressions

Conditional and logical expressions are made up of operations that result in a binary outcome: they are either false (0) or true (1). Unlike arithmetic, they only evaluate to one of these two values. Thus they can be used to test a condition and provide a means of selecting the subsequent sequence of execution in a program. The basic operators in such an expression are called relational operators,

  >, <, >=, <=, ==, !=

evaluating a condition of greater than, less than, greater than or equal to, less than or equal to, equal to, or not equal to, respectively¹. The result of any of these operations is either 0 or 1. The first four operators in the list above have the same precedence level, which is higher than the next two, which also have the same priority. All relational expressions have a lower priority than arithmetic ones. To these conditional expressions, we can add a set of logical operators, which are used to combine one or more relational expressions. The two fundamental operations are AND and OR, denoted by && and ||, respectively. A truth table can be constructed for each one of them, indicating the outcome of the expression for two operands. Note that in C, while false (F) is only represented by 0, true (T) can actually be given by any non-zero value. Tables 4.1 and 4.2 show the truth values for each combination of operands.

¹ The equality operator is == and not simply =, which is used for assignment instead.

© Springer Nature Switzerland AG 2019 V. Lazzarini, Computer Music Instruments II, https://doi.org/10.1007/978-3-030-13712-0_4

Table 4.1: Truth table for AND.

a && b    T    F
T         T    F
F         F    F

Table 4.2: Truth table for OR.

a || b    T    F
T         T    T
F         T    F

The priority of logical AND is higher than that of logical OR. Both operators have less precedence than relational expressions. In addition to these two operators, we should also mention the unary negation operator !, which returns false for a true operand and true for a false operand. The result of a relational or logical expression has type int. The C language standard [24] also defines the type _Bool for uses where the value can only be 0 or 1, which could also be employed to hold the results of such expressions, known as Boolean expressions.
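As a sketch of how relational and logical operators combine, consider the following two tests. The function names in_range and is_leap_year are made up for this example; the parentheses in the second one are not strictly required, since && has priority over ||, but they make the grouping explicit.

```c
/* Hypothetical helpers (not from the text). Both return an int that
   is either 1 (true) or 0 (false). */

/* True when x lies between lo and hi, inclusive. */
int in_range(int x, int lo, int hi) {
  return x >= lo && x <= hi;
}

/* True for years divisible by 4, except century years that are
   not divisible by 400. */
int is_leap_year(int y) {
  return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
}
```

Since any non-zero value counts as true, the negation operator can be used to reduce a value to 0 or 1: !5 is 0 and !!5 is 1.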

4.2 Conditional Execution

The if() expression is the most basic means of conditional execution, allowing us to select one or more statements depending on the result of a logical expression:

  if(a < 0) printf("%d is negative\n", a);

If the result of the test is false, then the program skips the execution of that particular statement. In general, the if() expression is defined as

  if(logical_expression) ...

where what follows the expression is either a single statement or a group of them inside a program block² (Fig. 4.1).

Fig. 4.1: The if ... statement.

To the single if() expression we may add a complement that will be executed when the condition defined by the logical expression turns out to be false:

  if(logical_expression) ...
  else ...

where the else keyword labels the statement or block that will be executed when the logical expression evaluates to false (Fig. 4.2). For instance,

  if(a == 0) printf("zero \n");
  else printf("%d is nonzero \n", a);

Fig. 4.2: The if ... else ... statements.

We can also have an alternative form of the if() expression that checks for several alternatives (Fig. 4.3), with an optional catch-all, none-of-the-above else at the end:

  if(logical_expression1) ...
  else if(logical_expression2) ...
  ...
  else if(logical_expressionN) ...
  else ...

Note that all of these expressions can be nested inside others using block delimiters if needed.

² We have already noted that blocks are defined by brackets.
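The chained form can be illustrated with a sketch that sorts a number into one of three cases. The function name sign_of is an assumption for this example.

```c
/* Hypothetical helper (not from the text): returns -1, 0 or 1
   according to the sign of a. Only one branch is ever taken. */
int sign_of(int a) {
  int result;
  if (a < 0) result = -1;
  else if (a == 0) result = 0;
  else result = 1;   /* none of the above: a must be positive */
  return result;
}
```

The tests are tried in order from the top, so the final else is only reached when all of the preceding conditions have turned out false.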

4.2.1 Conditional Operator

The if() expression has an operator version, whose result can be assigned to a variable,

  logical_expression ? true_expression : false_expression;

the result of which is defined by the logical expression: if it is true (non-zero), the result is the value of the true expression, otherwise it is the value of the false expression. This can be used as a means of selecting a value to be assigned to a variable:

  a = b > c ? b : c;

Here, when b is bigger than c, b is assigned to a; otherwise c is (an example of how to select the maximum value of two inputs). This is equivalent to:

  if(b > c) a = b;
  else a = c;

Fig. 4.3: The if ... else if ... else ... statements.

4.3 Switch

The switch block is another example of conditional execution. Here we will have a series of discrete options defined by labels that will be compared with a value. If one of them is equal to it, then the program executes from that point. If no options match, it looks for a default label. The expression passed to the switch statement needs to evaluate to an integral type. Each label is composed of the keyword case followed by spaces and an integer constant and completed by a colon. The default case is defined by the keyword default. The break statement can be used to exit the switch block after the desired statement has been executed to avoid the execution from continuing on to the next statement (called a fall through). The most common form of this construct is

  switch(expression) {
    case constant1:
      ...
      break;
    case constant2:
      ...
      break;
    ...
    case constantN:
      ...
      break;
    default:
      ...
  }

For instance, we can use this mechanism to select the result in a multiple-option question:

  switch(i) {
    case 1:
      printf("option one selected\n");
      break;
    case 2:
      printf("option two selected\n");
      break;
    case 3:
      printf("option three selected\n");
      break;
    default:
      printf("no selection\n");
  }

Note that it is perfectly possible to use switch statements that include legitimate uses of fall through. It is also possible to use multiple cases mapping to a single statement:

  switch(i) {
    case 3:
      printf("the selection is bigger than 2\n");
      /* fall through */
    case 2:
      printf("the selection is bigger than 1\n");
      /* fall through */
    case 1:
      printf("the selection is positive\n");
      break;
    case 4:
    case 5:
      printf("the selection is 4 or 5 \n");
      break;
    default:
      printf("the selection is > 5 or < 1\n");
  }
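Another sketch of several case labels mapping to one statement is given below. The function name days_in_month is an assumption for this example, and leap years are deliberately ignored to keep it short.

```c
/* Hypothetical helper (not from the text): number of days in a
   month (1 to 12), ignoring leap years; 0 for an invalid month. */
int days_in_month(int month) {
  int days;
  switch (month) {
    case 4: case 6: case 9: case 11:   /* four labels, one statement */
      days = 30;
      break;
    case 2:
      days = 28;
      break;
    case 1: case 3: case 5: case 7: case 8: case 10: case 12:
      days = 31;
      break;
    default:                           /* no label matched */
      days = 0;
  }
  return days;
}
```

Each break leaves the switch block once days has been set; without it, execution would carry on into the statements of the following label.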


4.4 Iteration

In complement to conditional execution, it is possible to write programs whose flow of control produces iterations of the same computation sequence. This is enabled by two types of loop constructs, both of which will depend on the result of a logical expression following similar principles to those observed in the if() statement. Loops are essential for many applications. For example, graphical user interface programs, such as those we use on a daily basis, require some sort of loop to keep them open and ready to receive input from the user; otherwise they would reach the end of the program statements and exit.

4.4.1 The while and do – while Loops

The while loop will repeat a statement or block depending on the result of a logical expression:

  while(logical_expression) ...

Effectively, it is a version of if() that will carry on executing until the condition becomes false (Fig. 4.4). If the logical expression is constant and true, the program will enter an infinite loop. If there are no other means of exiting the loop built into the program or the loop block, it may be hard to close the application. Thankfully, operating systems have means of signalling to a program to make it interrupt execution, so in most cases, this should not be an issue.

Fig. 4.4: The while loop.

The do – while loop has the following structure (Fig. 4.5):

  do ...
  while(logical_expression);

which allows the program to execute the body of the loop (its statement or block) at least once before checking the result of the condition.

Fig. 4.5: The do – while loop.

The iterations of a loop are generally controlled by a variable that will make the logical expression false at some point. A typical way of doing this is to use a counter that can control the number of iterations. This will keep track of how many repeats the program has gone through and exit the loop at the expected time. For instance,

  int cnt = 0;
  while(cnt < 10) {
    ...
    cnt = cnt +1;
  }

will iterate ten times and then exit. The expression cnt = cnt +1 can be understood as taking the value of the variable, adding one to it and storing it back in the same place. This is called an increment (by one). It is so common that two shorthand forms exist, one with a prefix operator, and the second with a postfix one:

  ++cnt; // prefix increment
  cnt++; // postfix increment


The difference between these is that while ++cnt increments the variable before using its value, cnt++ will use the value of the variable, and then increment it. This only has an impact if we are using the value (assigning or checking it). For instance: int cnt = 0; while(++cnt < 10) { printf("%d \n", cnt); } will print the numbers 1 to 9, whereas if we had used cnt++, the printing would go one step further, to 10, as the check would be made before the variable was incremented. Decrement operators (--) can also be used in a similar way. Postfix operators have a higher precedence level than prefix ones, which themselves have higher priority than normal arithmetic expressions. Assignment operators, += and -= can also be used for increment or decrement. They have a right-hand side step value or expression (e.g. cnt+=2 for an increment of 2). Such operators have a lower priority than the relational and arithmetic expressions, so they need to be placed inside parentheses if we want to check their value correctly. Similarly, we have a *= b (for a = a * b), as well as a /= b and a %= b. In addition to counting variables, there are other ways of controlling a loop that can be used. We could request the user to enter specific values via scanf(), which are then checked for a given condition. We could also examine the value of an arithmetic expression and trigger a new iteration based on it, and so on.
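The difference between the two forms can be checked by counting how many times the loop body runs. This is a sketch only; the function names are assumptions for this example.

```c
/* Hypothetical helpers (not from the text): each returns the number
   of times the loop body is executed for a given limit. */

/* Prefix: cnt is incremented first, then compared with the limit. */
int iterations_prefix(int limit) {
  int cnt = 0, runs = 0;
  while (++cnt < limit) runs++;
  return runs;
}

/* Postfix: the old value of cnt is compared, then cnt is incremented. */
int iterations_postfix(int limit) {
  int cnt = 0, runs = 0;
  while (cnt++ < limit) runs++;
  return runs;
}
```

With a limit of 10, the prefix version runs the body 9 times and the postfix version 10 times, matching the printed ranges described above.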

4.4.2 The for Loop

Given the widespread use of counting variables in loops, a specialised version is available to facilitate this use. The while loop

  cnt = 0;
  while(cnt < 10) {
    ...
    cnt++;
  }

can be implemented in the compact form of the following for loop:

  for(cnt = 0; cnt < 10; cnt++) ...

As with conditional execution statements, loops can be nested within the body of other loops. This is particularly useful if we have to execute repeated operations for each iteration of an outer loop (for instance, to trace a two-dimensional figure).


4.4.3 The break and continue Statements

As we have seen before, the break statement makes a program exit a block from anywhere within it. It can be used as a means of exiting a loop in the middle of its body if we require it. In addition to this, loops can avail of the continue statement, which is used to jump directly to the logical expression evaluation from anywhere in a block, skipping any statements after it (in a for loop, the increment expression is executed before the test).
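A sketch combining both statements in a for loop follows. The function name sum_odd_until and its parameters are made up for this example.

```c
/* Hypothetical helper (not from the text): adds up the odd numbers
   from 1 to limit, stopping early if the sum would exceed cap. */
int sum_odd_until(int limit, int cap) {
  int n, sum = 0;
  for (n = 1; n <= limit; n++) {
    if (n % 2 == 0) continue;   /* even: skip to n++ and the test */
    if (sum + n > cap) break;   /* leave the loop altogether */
    sum += n;
  }
  return sum;
}
```

With a limit of 10 and a cap of 100, all five odd numbers are added, giving 25. With a cap of 10, the loop is left when 7 is reached, and the result is 1 + 3 + 5 = 9.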

4.5 A First Synthesis Program

With loops and branching, we can write programs that do a lot of work with only a few lines. This allows us to have our first go at sound synthesis. The principle is very simple: we will generate a sequence of numbers that can be interpreted as a digital audio signal. When we do that, we will hear a tone. So let’s approach this in parts. First we will write a program to print a series of numbers to the terminal. This sequence will have a repeating pattern: at regular intervals it will look the same. Each repeated set of numbers is called a period, and if we interpret this series of numbers as a signal, we have a periodic signal. The pattern we will create first is a ramp, numbers that will increase from zero to a maximum. We can do this by using the modulo operator in a loop:

  while(n < END) {
    s = n % max;
    n++;
  }

This is the core of our synthesis program. Let’s complete the rest around it and call the resulting executable saw:

  #include <stdio.h>
  #define END 44100
  int main(){
    unsigned int n = 0, max = END/441;
    float fmax = (float) max, s;
    while(n < END) {
      s = (n % max) / fmax;
      printf("%f \n", s);
      n++;
    }
    return 0;
  }


Note that we have made sure the numbers are output as floats in the 0 to 1 range. This will facilitate the later translation into a digital signal.
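The computation inside the loop can also be looked at in isolation. The following sketch, which assumes a made-up function name saw_sample, gives the value of the ramp at any position n for a given period length (100 in the saw program).

```c
/* Hypothetical helper (not from the text): value of the ramp at
   sample position n, for a period of the given length in samples.
   The result is always in the range 0 (inclusive) to 1 (exclusive). */
float saw_sample(unsigned int n, unsigned int period) {
  return (n % period) / (float) period;
}
```

The cast to float is what keeps the division from being an integer one; without it, every result would be 0.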

4.5.1 Plotting the Waveform

If we run this program, we will see the following pattern at the terminal: a series of floating-point numbers moving from 0.0 to close to 1.0, repeatedly:

  $ ./saw
  0.000000
  0.010000
  0.020000
  ...
  0.980000
  0.990000
  0.000000
  0.010000

Now we can interpret this as a digital audio signal. In doing so, we can, for instance, plot the waveform it produces. A simple graphic display can be made with a separate standard IO program, which can feed off the data we produced with the saw program. For this purpose, we introduce two important concepts of shell operation:

1. Redirection: the output of printf(), i.e. the standard output, or stdout, is normally directed to the terminal screen by the shell. We can redirect it to a different destination, for instance to a file in the file system, which will be filled with the contents produced by printf(). To do this, we use the output redirection symbol > after the program name, and the name of the file after that. For instance,

  $ ./myprog > output.txt

Likewise, the input to scanf(), i.e. the standard input, or stdin, normally comes from the terminal, but we can redirect it from a file. The process is similar: we use the input redirection symbol < to take the input from a named file:

  $ ./myprog < input.txt

So we could write a program to plot this output as a waveform, vertically, on the terminal, using this principle (let’s call it plot):

  #include <stdio.h>
  #include <math.h> /* round() is declared here */
  int main(){
    float sample;
    int i = 0, s, nsamp = 0;
    do {
      i = scanf("%f", &sample); /* read sample */
      if(i == EOF) break; /* no more input: nothing to plot */
      s = (int) round(sample * 100); /* scale it */
      printf("[%5d]", nsamp++); /* sample index */
      while(--s >= 0) printf("-"); /* plot the value */
      printf("*\n");
    } while(i != EOF);
    return 0;
  }

This program scans the standard input for float samples and then prints an equivalent number of dashes to the terminal, terminating the line with an asterisk. Each line also receives the corresponding sample index as a time reference. Note that, in order to keep the plot aligned, we print enough spaces to hold up to 5 digits (by setting the field width to 5 in the formatting string, "%5d"), since the biggest index we will print, 44099, contains 5 digits. The program checks for a special end-of-file code (the constant EOF), which is returned by scanf() once the stream of characters is finished³, and leaves the loop without plotting anything when it is found. With this in hand, we can now produce a simple plot of the waveform:

  $ ./saw > wave.txt
  $ ./plot < wave.txt
  [    0]*
  [    1]-*
  [    2]--*
  [    3]---*
  [    4]----*
  [    5]-----*
  [    6]------*
  [    7]-------*
  [    8]--------*
  [    9]---------*
  [   10]----------*
  ...

While this is not a standard way of plotting data, and the program can only cope with non-negative numbers, it is about the best we can do at the moment. In Chapter 6 we will develop a better terminal plotting program.

2. Pipes: in addition to redirection, we can send the standard output of one program into the standard input of another using the pipe symbol |:

  $ ./saw | ./plot

³ The EOF condition can also be signalled to a program by typing the ctl-d key sequence at the terminal.


These same principles can be applied to more advanced plotting programs, such as gnuplot. For example, the waveform graph in Fig. 4.6 was created from the data produced by the saw program using the following command line:

  $ ./saw | gnuplot -p -e "set xrange[0:400]; \
    plot '-' with lines"

Fig. 4.6: The sawtooth waveform generated by the saw program, as produced by gnuplot.

This pipes the output of saw to gnuplot, with commands to create a line plot using the first 400 numbers taken from the standard input⁴. This particular gnuplot command is fairly general-purpose, and can be used with any single-column standard input data.

⁴ For further information, see http://www.gnuplot.info/.

4.5.2 Playing the Sound

Since our program generates an audio waveform, we can just as easily listen to the sound it produces. To do this, we have to first convert the numbers from text (ASCII) to a binary encoding, place them into a file and then open that file with a sound editor. The following are the steps to run the synthesis program, perform the text-to-binary conversion, and produce an audio file for listening:

1. The conversion is done by another program, tobin.c⁵, which we compile as tobin.

2. We connect the output of our synthesis program, let’s call it saw, to the input of tobin using a pipe (|):

  $ ./saw | ./tobin

3. We redirect the output of tobin from stdout to a file (e.g. output.raw) using >:

  $ ./saw | ./tobin > output.raw

4. We import the file as raw data into the sound editor, with the encoding set to 32-bit floating-point data, the sampling rate to 44100, and channels to 1.

What we have done in the last step is the interpretation of the sequence as making up an audio signal with 44100 samples (numbers⁶) in one second, containing one channel of audio, with each number to be read as a 32-bit float with little-endian byte order⁷. So, the 44100 numbers we generated will constitute a 1-second tone, whose frequency is going to be 441 Hz⁸ (because the ramp pattern is repeating every 100 samples, there will be 441 periods, or cycles, in one second). This is a very simple digital sawtooth wave [36].

4.5.3 Other Waveforms

If we replace the synthesis loop with this:

    float s = 1.f;
    while(n < END) {
      if((n % max) == 0) s *= -1.f;
      printf("%f \n", s);
      n++;
    }

we can generate a digital square wave. When this is played back, note that the pitch will have dropped by one octave. This is because the square wave we generated has a period that is twice the size of the original sawtooth. Note that the loop alternates between −1 and 1 every max samples, so the whole cycle takes twice the time to complete. Problem 4.4 prompts you to think about how you could generate another one of these waveforms based on simple geometric shapes.

⁵ We will study this code in Chapter 9, where you can find the source code for it.
⁶ A sample is the name we give to each individual element of a sequence that represents the digital audio signal.
⁷ We are assuming this is being built and run in a little-endian architecture, such as the x86.
⁸ 1 Hz = 1 cycle per second [36].
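The octave drop can be checked with a little arithmetic. In this sketch, the helper names square_sample() and period_to_hz() are assumptions introduced for illustration, and the counter n is assumed to start at 0, as in the loop above.

```c
/* Sample n of the square wave produced by the loop above:
   the sign flips every 'max' samples, starting at -1. */
float square_sample(int n, int max)
{
    return ((n / max) % 2 == 0) ? -1.f : 1.f;
}

/* Frequency in Hz of a waveform that repeats every 'period'
   samples at a sampling rate of 'sr' samples per second. */
float period_to_hz(float sr, int period)
{
    return sr / period;
}
```

With max set to 100, the sawtooth repeats every 100 samples (441 Hz), while the square wave only repeats every 200 samples, so period_to_hz(44100.f, 200) gives 220.5 Hz, one octave lower.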

4.6 Conclusions

This chapter has introduced some important concepts of structured programming, such as conditional execution and loops. We are now at the stage where we can create programs that generate sequences of numbers which can be interpreted as digital audio signals. This is a significant development. To build on it, we will move on to a deeper level, where we can manipulate the program memory and compute larger blocks of data. This will be the topic of the next chapter, where we will encounter another set of fundamental programming concepts.

Problems

4.1. Write a program to read in three numbers and write the smallest.
4.2. Travel expenses are paid as follows: 15c per km for cars up to and including 1.5 litre engines; and 20c per mile for cars with engines above that size. Write a program to calculate travel expenses which takes as input the car engine size and the distance travelled.
4.3. Add N input numbers and write out the result. Ask for the number of inputs (N) first.
4.4. Write a version of the synthesis program that can generate a triangle wave.

Chapter 5

Arrays and Pointers

Abstract This chapter introduces the principles behind the composite data types called arrays. It discusses their memory layout and how to manipulate them. We then introduce the more advanced topic of memory addresses and pointer variables, showing how they relate to arrays. Finally, strings are presented as a special kind of character array. The chapter concludes by exploring ways of manipulating string variables.

In this chapter, we will look at how we can create lists or sequences of the various built-in data types, called arrays, and manipulate memory addresses through the specially-defined pointer variables. These objects will be fundamental to many of the sound and music computing applications we will be working with throughout this book. In particular, they will allow us to access contiguous blocks of digital audio data, which will be essential for all synthesis and processing techniques.

5.1 Arrays

All the variables we have so far used have been able to store only a single value (of a given type). In many applications, however, it is common to group a whole block of data together, so that we can store multiple values of a certain type. In order to do this, we introduce the concept of arrays. For example, let's say we would like to hold ten integers together. The following declaration

    int numbers[10];

declares an array called numbers with ten elements. The general form of an array declaration is

    type name[size];

© Springer Nature Switzerland AG 2019 V. Lazzarini, Computer Music Instruments II, https://doi.org/10.1007/978-3-030-13712-0_5


where the variable name is an array of type type and contains size elements, which in general needs to be a constant expression¹. Arrays declared in this way are not initialised, and might contain garbage. We can initialise them using the following notation:

    int numbers[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};

where each integer literal will be stored in the array in the order given in the initialisation list. In fact, with such a list, we do not need to declare the array size directly, as it will be implied by the number of items in it:

    int numbers[] = {1, 2, 3, 4, 5};

Alternatively, we can also initialise members out of order using designators, which are indices declared inside square brackets, as in

    int numbers[5] = { [4] = 1, [1] = 2 };

and, in general, initialiser lists (whether using designators or not) may be incomplete, in which case the remaining elements are set to zero.

Once we create an array, we can use the array indexing notation to select individual items, e.g. a[n], where n can be an integer variable or a constant. Array indices are zero-based; that is, the first element is in index 0, and the last in size-1 (Fig. 5.1). Arrays are stored as contiguous memory locations; thus the indices are used to select a given offset from a start location in memory. We should never try to access data beyond the end of an array, as this leads to undefined behaviour, which may show up as corrupted data or as a segmentation fault during execution. This is a different problem from a syntax or compilation issue, which is caught when we are trying to build the program, and it can be more difficult to fix. It is important to know that the C compiler does not check for these mistakes, so the programmer should always be aware of them.

    a[0]  a[1]  a[2]  a[3]  a[4]
     1     2     3     4     5

Fig. 5.1: A graphic representation of the array int a[5] = {1, 2, 3, 4, 5}.

Arrays can be manipulated very easily via loops. In particular, the for loop is well suited to accessing values in them. For instance, we use

    for (i=0; i<10; i++) a[i] = i+1;

to fill the array with 1, 2, ..., 10.

¹ The C language standard [24] defines the concept of a variable-length array (VLA), which can optionally be implemented (but is not strictly required). The C++ language does not define these. To avoid compatibility problems with some compilers, and to provide a better forward link to C++, we should always treat array sizes as constant.


5.1.1 Two-Dimensional Arrays

Two-dimensional arrays are also possible in C, where, instead of a single row of elements, we have several, stacked. They are declared by passing the number of rows and columns required:

    type name[rows][columns];

For example, if we want to create a 10 by 10 array, we use

    int matrix[10][10];

which is a 10-element array of 10-element arrays of int. Thus, two-dimensional arrays are initialised and accessed in row-column order. For instance, a 2 by 3 array is effectively an array of two elements, each of which is itself an array of three numbers:

    int mat[2][3] = {{0, 1, 2}, {3, 4, 5}};

where mat[0][0] = 0, mat[0][1] = 1, mat[0][2] = 2, mat[1][0] = 3, mat[1][1] = 4, and mat[1][2] = 5. In these cases, we have an array of arrays of a given fundamental type. Likewise, higher-dimensional arrays are also defined by the C language standard [24], and can be declared by adding further array size specifications.
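As a minimal sketch of row-column access, the following helpers work on the 2 by 3 array given above. The function names mat_at() and row_sum() are assumptions introduced for this illustration, and functions themselves are only covered in Chapter 6.

```c
/* The 2 by 3 array used in the text. */
static const int mat[2][3] = {{0, 1, 2}, {3, 4, 5}};

/* Element at a given row and column, using array indexing. */
int mat_at(int row, int col)
{
    return mat[row][col];
}

/* Sum of the elements in one row: each row is itself an
   array of three int. */
int row_sum(int row)
{
    int col, sum = 0;
    for (col = 0; col < 3; col++)
        sum += mat[row][col];
    return sum;
}
```

Here the first index always selects the row and the second the column, so row_sum(1) adds up 3, 4 and 5.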

5.2 Strings

We have seen that in C, text is kept in strings. The underlying storage of this pseudotype is an array of type char. The convention employed in the language is that each string is terminated by a null character ('\0'). This is automatically added by the compiler in the case of a string constant. For example:

    char s[2] = "a";

where we note that the array has two elements. The first one is initialised by the character 'a', whereas the second holds '\0'. Notice also that we are using double quotes to declare a string constant, which means it will always contain one extra character in addition to those inside the quote marks (Fig. 5.2). Remember that a single character constant will always be declared in single quotes.

    s[0]  s[1]  s[2]  s[3]  s[4]  s[5]
    'H'   'e'   'l'   'l'   'o'   '\0'

Fig. 5.2: A graphic representation of the string char *s = "Hello".


Since strings are arrays, we have to be careful when manipulating them. We cannot assign them directly and expect that all of their elements will be magically copied from one character array to another. It is possible, however, to initialise a character array with a constant string, as we have seen above. In order to manipulate them, we will need to access each character individually. As we will see later, there are a number of C library functions that do just that, copying, checking, duplicating, and concatenating strings.

5.3 Pointers

We have seen that a variable declared in a program reserves a certain storage space to hold a value of a given type. We give this memory location a name and we can proceed to assign and retrieve data to/from it. The OS organises all such locations through unique addresses that are used to identify them. We have seen that the address-of operator (&) can be used to obtain this location reference so that some functions such as scanf() can place their results directly there. This is complemented in the C language by the ability to store such addresses in special variables called pointers. These are not the usual regular types we have seen before, but ones designed to hold the memory location for a variable of a given type. We declare them by using an asterisk. For instance:

    int *p;

is a pointer to an address that can hold an integer. The position of the asterisk in the declaration is not important; we can equally well use int* p or int * p, as long as there is an asterisk somewhere after the type name. In any case, the variable p we declared is not yet pointing to any valid location, because it has not been initialised or assigned to a location yet. We can initialise it with a given memory address:

    int n;
    int *p = &n;

where the pointer variable p is initialised to the address of the variable n. Once a pointer is holding a memory location, we can access it using the dereferencing operator (also known as indirection), as illustrated in Fig. 5.3. This is represented, again, by an asterisk, but now with a different meaning. A * placed in front of a pointer accesses the value stored in the memory address that is held by the variable (rather than its contents, which are the address itself). For example

    /* n is declared and initialised with 10, k is just declared */
    int n = 10, k;
    /* a pointer p is initialised with the memory address of n */
    int *p = &n;
    /* k is assigned the contents of n, 10 */
    k = *p;
    /* the contents of n now hold 12 */
    *p = 12;

    address    LL        NN
    name       p         n
    contents   NN  --->  10

Fig. 5.3: A pointer int *p = &n, whose contents are the address of n (NN). The pointer memory address is LL. The arrow indicates the indirection operation, which yields the value of n, 10.

Note that the asterisk has two functions in this context, which have slightly different semantics. These are some basic principles of pointer syntax:

1. When declaring a variable, it marks the variable as a pointer to a data type: int *, float *. It is possible to have a generic pointer with type void *, when it is not clear what pointer type we need. However, in this case we cannot do much apart from storing an address. It cannot be used to access data because we do not know the size of the allocated space we are pointing at.
2. When using a pointer variable, it dereferences it, so that the expression refers to the contents of the address pointed at. In this case, it is a unary operator applied to the variable or expression to the right of it.
3. Precedence and associativity: the unary dereferencing * operator binds to the expression to its right, and has a higher priority than all binary arithmetic operators (+, -, *, /, %). It has the same priority level as the prefix increment/decrement, but lower priority than postfix increment/decrement.

The asterisk is also the symbol used as the multiplication sign. However, this is a binary operator, applying to both sides. Care needs to be taken not to confuse the two. Parentheses could be used to clarify their application, if they happen to appear together in an expression.

Pointers can be assigned to other pointers, in which case, as indirection is not involved, we are assigning memory addresses:

    int n, k;
    int *p = &n;
    int *q;
    q = p;  /* q now points to n */
    p = &k; /* p now points to k */

Location identifiers are just unsigned integers (of a given size, dependent on the OS and hardware). However, we try not to confuse the two and always keep addresses as pointers, and never as integers. The pointer type is also very important as it tells the compiler the size of the object stored at the memory location it is holding. This is important in some operations, such as those involving pointer arithmetic, which we will study below.
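The precedence rule for the dereferencing operator and the postfix increment is easier to see in a small sketch. The function names used here are assumptions introduced only for this illustration.

```c
/* *p++ : postfix ++ binds tighter, so it is the pointer that moves;
   the value read is the one it pointed to before the move. */
int read_then_advance(void)
{
    int a[2] = {10, 20};
    int *p = a;
    int first = *p++;   /* first = 10, p now points to a[1] */
    return first + *p;  /* 10 + 20 */
}

/* (*p)++ : the parentheses make ++ apply to the value pointed at. */
int increment_target(void)
{
    int n = 10;
    int *p = &n;
    (*p)++;             /* n becomes 11, p is unchanged */
    return n;
}
```

When in doubt, adding parentheses as in the second function makes the intention explicit.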


5.4 Pointers and Arrays

A variable declared as

    int a[10];

is also a constant pointer to the start address of the array memory; thus, in practice, a and &a[0] are the same. As it is constant, we cannot point it to anywhere else. But we can use it like any other pointer. We can dereference it, for instance, to obtain the value of the first position of the array (e.g. *a), and apply offsets, as described in the next section. Similarly, a two-dimensional array can be described as a pointer to a pointer, having two levels of indirection. Dereferencing it once will yield a pointer, and dereferencing it twice will give us access to the actual data stored in memory.

5.4.1 Pointer Arithmetic

We can manipulate pointers through integer arithmetic operations. For example, let's say we have the following declarations:

    /* an array */
    int a[5];
    /* p is pointing to the start of the array */
    int *p = a;

We have already seen that we can use indexing to access an array, so in this case p[n] and a[n] are exactly equivalent. However, given that p is a variable, we can then manipulate this pointer to access the different locations in the array (Fig. 5.4).

[Figure: the offsets a, a+1, a+2, a+3, a+4 and a+5, each marking a successive int-sized location from the start of the array.]

Fig. 5.4: The array a[5] and the various offsets to the location of each of its elements. Note that the memory range of this array is [a, a+5), and thus a+5 points to an address outside it.

The following operations are commonly used:

• +: adding an integer moves the pointer a certain number of memory positions ahead. For instance,

    p = a + 4;

  places p 4 int-size memory locations beyond a. This means that after this *p and a[4] will yield the same value.
• -: likewise, subtraction will move the pointer a number of locations backwards.
• ++, +=, --, -=: these are also used to increment and decrement pointer positions.

This means that we can access a given space in memory using either pointer arithmetic or array indexing. For instance, for(i=0; i

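[Figure: (i) the values 0 to 5 arranged in three rows of two columns; (ii) the same values laid out contiguously in memory as a[0][0], a[0][1], a[1][0], a[1][1], a[2][0], a[2][1].]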

Fig. 5.5: Two graphic representations of the array int a[3][2] = {{0, 1},{2, 3},{4, 5}}: (i) two-dimensional row-column arrangement; (ii) flat memory layout.

[Figure: the pointer p holds the address *a, the first of the contiguous locations containing 0, 1, 2, 3, 4 and 5.]

Fig. 5.6: Graphic representation of int *p = *a, for the array a in Fig. 5.5.

    int b = *(++p);
    /* c is assigned 4, NB: double dereferencing */
    int c = **(a+2);
    /* pp is a pointer to two-int arrays, NB: parentheses */
    int (*pp)[2] = a;
    /* d is assigned 5, pp[2][1] */
    int d = *(*(pp+2)+1);

The final three lines require some further explanation. We need to double dereference the pointer expression because it is a pointer to a pointer of int. Since a is an array of arrays of two int elements, it can be represented by a pointer, the type of which is int (*)[2]. We can assign a to a variable pp of this type. The variable declaration needs an extra set of parentheses around the name, otherwise the type would be int *[2], which is an array of two pointers to int. It can then be accessed as normal through array indexing, or through dereferencing, as shown in the last line. For that, we need to first dereference to get the pointer to the row, and then dereference again to get the element value at the desired column.
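The offsets discussed in this section can be sketched with two small helpers. Their names, sum_range() and element_at(), are assumptions made for this illustration; the second one uses the same 3 by 2 array as Fig. 5.5.

```c
/* Sum of n ints, walking a pointer over the range [a, a+n). */
int sum_range(const int *a, int n)
{
    const int *p;
    int sum = 0;
    for (p = a; p != a + n; p++)
        sum += *p;
    return sum;
}

/* Element of a 3 by 2 array reached by double dereferencing:
   *(pp + row) yields the row, and dereferencing that row at
   offset col yields the element. */
int element_at(int row, int col)
{
    int a[3][2] = {{0, 1}, {2, 3}, {4, 5}};
    int (*pp)[2] = a;   /* pointer to arrays of two int */
    return *(*(pp + row) + col);
}
```

Note that in sum_range() the address a + n is only ever compared against, never dereferenced, since it lies just outside the array.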

[Figure: four panels showing the flat array 0, -1, 2, 3, 4, 5 and the location addressed at each step: (i) p+1, counted from *a; (ii) p at *(a+1); (iii) b = *p; (iv) c = **(a+2).]

Fig. 5.7: Graphic representation of: (i) *(p+1) = -1; (ii) *p = *(a+1); (iii) b = *(++p); and (iv) c = **(a+2), using the array a in Fig. 5.5.

5.4.2 Pointers and Strings

As we have already seen, strings are arrays of characters. This means that we can use a char* variable to refer to strings in a program in a convenient manner. For example, we can declare a pointer and initialise it with a string literal without necessarily needing to set aside a certain amount of memory in an array (since the compiler will deal with allocating the space for the constant):

    char *string = "Live Long and Prosper.";


In this case, since string is a variable, it can be pointed somewhere else to a different address at a later time. However, as long as it is pointed to a string literal, we cannot modify its contents, since that memory location is read only. To avoid problems, we might like to mark the variable as such. This is done with the addition of the const keyword to the variable declaration, for instance

    const char *string = "Live Long and Prosper.";

This means that the string pointed to by the variable cannot be modified (since the pointer is to a const char). However, the pointer variable itself is not constant, and can be reassigned. In this case, whatever address the variable points to cannot be modified via this variable. Note, however, that

    char string[] = "Live Long and Prosper.";

is a different case. Here, we have declared a char array with enough memory to hold the characters in the string literal (including the terminating null character), and initialised it. We are able to modify the contents of that array, but we cannot point string elsewhere, as it is a constant pointer.

While initialising a string to a literal is one of the basic operations we can do with strings, it is clearly not enough to fully manipulate text. If we were, for instance, to copy a string from one place to another, the code to do that would look like this:

    const char *src = "Live Long and Prosper";
    char dest[30]; /* enough space for 29 characters */
    int i = 0;
    do {
      /* bounds check */
      if(i == 29) {
        dest[i] = '\0';
        break;
      }
      dest[i] = src[i];
    } while(src[i++] != '\0');

where we copy each character until we find a null character or we reach the end of memory (note the use of the postfix increment operator). To facilitate these types of operations, a number of subroutines are offered by the C library, most of them under the header string.h:

• strlen(const char *s): checks for null terminator and returns the number of characters in the string (excluding the terminator).
• strcpy(char *dest, const char *src): copies strings, from a source src to a destination dest, which needs to contain enough space for the full string to be copied. The strncpy() variant allows for a limit on the number of characters copied and is therefore safer, although it does not write a terminating null character if the limit is reached first.
• strcat(char *dest, const char *src): concatenates two strings into dest (which needs to have enough space for the resulting string). Similarly, strncat() is its length-limited version.
• sprintf(char *dest, const char *fmt, ...): prints a formatted string to a destination. This is a version of printf() that outputs to a string instead of the standard output (it is declared in stdio.h rather than string.h). A snprintf() variant is available, which checks the size of the destination memory.
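As a minimal sketch of a bounded copy built on these library functions, the hypothetical helper below (copy_string() is a name assumed for this example, not a library function) relies on snprintf(), which always terminates the destination when its size is at least 1.

```c
#include <stdio.h>

/* Copies src into dest, which holds 'size' bytes, truncating
   if necessary. The result is always null-terminated, provided
   that size is at least 1. Returns 1 if the whole string fitted,
   0 if it was truncated. */
int copy_string(char *dest, size_t size, const char *src)
{
    /* snprintf() returns the length the full output would need */
    int needed = snprintf(dest, size, "%s", src);
    return needed >= 0 && (size_t) needed < size;
}
```

A call such as copy_string(dest, sizeof dest, src), with dest declared as a char array, keeps the size of the destination next to the copy itself, so the manual bounds check of the loop above is no longer needed.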

5.5 Conclusions

In this chapter, we have focused on data types that can hold sequences and how we can manipulate these. We have also explored some fundamental issues of memory access by introducing the idea of pointers, which are data types that operate at a lower level. They are, however, widely used in C programming, as they provide a level of flexibility that makes this language very well suited to sound and music computing. Complementing this, we will see in the next chapter another key aspect of structured programming, subroutines, in which pointers also have an important role to play.

Problems

5.1. Write a program to read in ten integers and write them in reverse order. Use loops to read and write the numbers.
5.2. Sort a sequence of ten integers in ascending order. Here's a simple sorting algorithm for a sequence of N elements [17]:
    (a) Find the location I of the largest element from A[0] to A[N-1].
    (b) Interchange A[I] with A[N-1].
    (c) Decrease N by 1.
    (d) If N == 0 finish else repeat from (a).

Chapter 6

Functions

Abstract In this chapter, functions are introduced as the fundamental organising element in the C language. Topics related to their definition, argument passing, and call semantics are presented first. This is followed by a discussion of the principle of recursion. The paradigm of modular programming as implemented in C is discussed. The standard C library is introduced, allowing us to develop a sine wave synthesis program. Finally, we develop an ASCII-based terminal plotting function as an example of the ideas presented in the chapter.

In C programming, the concept of a function may or may not conform to the mathematical sense, which is narrower. Here, a function is better equated to the idea of a subroutine, that is, a self-contained section of code that can be invoked by other parts of the program. Together with control of flow constructs, subroutines support a paradigm called structured programming, which is a fundamental form of computer programming.

6.1 Function Definition

A function is defined by four elements:

1. Return type: the type of the result returned by the function. If the function does not return anything, it can be set to void.
2. Name: a symbolic name that will identify the function. Similarly to variables, it needs to start with a letter or an underscore character.
3. Argument list: inside parentheses, we have a list of arguments, each one with a type and a name. Multiple arguments are separated by commas. Once declared, all arguments become local variables for the function.
4. Body: inside braces (curly brackets), this is the code block for the function. The function exits if it finds a return statement or the end of the block.

© Springer Nature Switzerland AG 2019 V. Lazzarini, Computer Music Instruments II, https://doi.org/10.1007/978-3-030-13712-0_6


We have seen all of these elements in the definition of the main() function in the programs encountered so far. In addition to this, we can define other functions that we can call whenever we need to. A simple example is given by this version of the shin program:

    #include <stdio.h>

    void shin() {
      printf("Live Long and Prosper.\n");
    }

    int main() {
      shin();
      return 0;
    }

In this program, main() delegates all of its work to shin(), which calls printf() to display the message. Since the function does not have a result, it is defined with void as return type, and it does not necessarily require a return statement. This also means that it would be a compilation error to try to assign the return value of this function to a variable. In some cases, where an early exit is required, we can use the return keyword, with no arguments as the function has been declared with the void type. The function also has no inputs, so its argument list is empty. Alternatively, it could also have been declared void:

    void shin(void) {
      printf("Live Long and Prosper.\n");
    }

Note that functions are always defined in global space, outside any other functions. The C language standard does not allow for local functions that are placed inside the body of other functions (including main()).

6.1.1 Arguments

We can pass parameter values to a function via its arguments. For instance, this is a simple function taking two integers:

    int sum(int a, int b) {
      return a + b;
    }

This calculates the sum of its parameters. If we want to use it, we call it by passing integer constants, variables, or expressions. For example, we can call sum() in

    a = sum(b,2);

where a variable and a constant are used,

    a = sum(x+y,z*w);

where two expressions involving four different variables are used, and

    a = sum(sum(a,b),3);

where the result of a function call and a constant are used.

6.1.2 Variable Lifetime

As we have already seen, variables are treated as auto (automatic) by default. In a function, this means that all local variables, including its arguments, which are also local variables, will come into being as the function is called and cease to exist once the function exits. An exception to this is if we explicitly mark a variable as static. In that case, the variable will persist between calls, which also implies that a single instance of it exists for all separate invocations made to the given subroutine. This specific use of the static keyword should be avoided if at all possible.
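The persistence of a static local variable can be seen in a minimal sketch. The function name call_count() is an assumption introduced for this illustration.

```c
/* Counts how many times it has been called. The static
   variable is initialised once and keeps its value between calls. */
int call_count(void)
{
    static int count = 0;
    count++;
    return count;
}
```

Successive calls return 1, 2, 3, and so on. Because every caller shares the same hidden counter, the result of a call depends on the history of all previous ones, which is one reason to be wary of this construct.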

6.1.3 Call Semantics

In C functions, all arguments are passed by value. A local variable is created in the function and the value of the input is copied into it (for each argument). Since the local variable disappears once the function has finished executing, this means that the function cannot modify the values of variables that are passed to it as arguments. However, if we are using pointers, since the addresses are being copied rather than the actual contents, we can modify variable values indirectly. Consider the following case:

    void reset(int *p){
      *p = 0;
    }

This function will reset to 0 the value stored in a memory address that is passed to it. This means that a variable that is held outside the function can be modified through a pointer. The following program demonstrates this:

    #include <stdio.h>

    /* reset() as defined above */
    void reset(int *p){
      *p = 0;
    }

    int main(){
      int a = 0;
      printf("value of a: %d\n", a); // prints 0
      a = 1; // assign 1 to a
      printf("value of a: %d\n", a); // prints 1
      reset(&a); // pass the address of a to reset()
      printf("value of a: %d\n", a); // prints 0
      return 0;
    }

Arrays, as they have some equivalence to pointers, are also passed through memory addresses. For instance, let's extend the function reset() to work on an array:

    void reset(int *p, int size){
      while(--size>=0) *(p++) = 0;
    }

In this case, we also need to pass the size of the array, as it cannot be inferred directly (only strings are terminated by an ASCII null, ordinary numeric arrays are not). Note that since p is a pointer to int, we can apply pointer arithmetic to it, or alternatively we could have used array indexing (p[size] = 0, for example). On the other hand, if we had declared the function as explicitly taking an array, as in

    void reset(int a[], int size){
      while(--size>=0) a[size] = 0;
    }

then the notation makes it explicit that an array is expected, and array indexing is the natural way to access it. Note, however, that the compiler treats such a parameter as an ordinary pointer (int *a), so it is not constant inside the function. In any case, we recommend always using array indexing, which is less confusing. In summary, an argument type *var is a pointer that can accept any memory address, including that of an array. An argument type var[] is equivalent to it as far as the compiler is concerned, but documents that the function expects the start address of an array.
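The contrast between passing a value and passing an address can be sketched with two small functions. The names reset_copy() and swap() are assumptions made for this illustration.

```c
/* Receives a copy of the argument: the caller's variable
   is not affected by the assignment. */
void reset_copy(int v)
{
    v = 0;
    (void) v;  /* silences unused-value warnings */
}

/* Receives two addresses and exchanges the values stored there. */
void swap(int *a, int *b)
{
    int tmp = *a;
    *a = *b;
    *b = tmp;
}
```

Given int x = 1, y = 2, a call to reset_copy(x) leaves x untouched, whereas swap(&x, &y) leaves x holding 2 and y holding 1.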

6.1.4 Function Prototypes

A function prototype is the declaration of its return type, name and argument types. For example,

    int sum(int, int);

tells the compiler that there is a sum() function somewhere in the code that takes two integers and returns an integer. Once this is declared, we can use the function in our code. The definition needs to exist somewhere, either in the same source file where the declaration was placed, in a different file, or in an external library, precompiled. If it is not defined anywhere, the linker will issue an error ('symbol not found').

6.1.5 Parametrised Macros and Inline Functions

The C language preprocessor supports the use of macros containing parameters, which in some cases can be used as an alternative to function definitions. The fundamental difference is that macro declaration and use is a text-replacement operation, rather than a function call. This means that code gets inserted at the point where the macro is used. The macro definition is done as we have seen before, through #define, but now it features one or more parameters given as a comma-separated list inside parentheses. For instance,

    #include <stdio.h>
    #define SUM(a, b) a + b

    int main(){
      printf("%d \n", SUM(2,2));
      return 0;
    }

where the macro instance SUM(2, 2) gets replaced by 2 + 2. We should note that this replacement is strictly textual, so if we were to use it with two variables, say i and j, SUM(i, j) would yield i + j. The macro definition is delimited by a newline (not a semicolon), and we should be careful not to add a semicolon to the definition if it is not needed. In the case above, it is clear that adding one would not allow us to use it inside printf(), but we can always use the same definition in a well-delimited statement such as a = SUM(b, c);.

Macros of this type can have more than one statement, provided that the macro declaration itself is in a single line. To allow a code line to extend over several lines (for visual or formatting purposes), we can use a backslash character (\), as already discussed in Sect. 2.2.2:

    #include <stdio.h>
    #define SWAPINT(a, b) { \
      int tmp = a; \
      a = b; \
      b = tmp; }

    int main(){
      int a = 1, b = 2;
      printf("original: %d %d \n", a, b);
      SWAPINT(a, b)
      printf("swapped: %d %d \n", a, b);
      return 0;
    }

Note that there is no need to place a semicolon after SWAPINT(a, b), since the final statement of the macro already includes one. Also, the braces are only added to the macro definition to allow the code to compile in the pre-C99 standard, where variable declarations could only happen at the start of a block. This also ensures that tmp only exists inside that block, which might prevent variable name clashes. We can use these types of macros for any useful text replacement. For instance,

    #include <stdio.h>
    #define CAST(type, var) (type) var

    int main(){
      int a = 1;
      float f = 2.0;
      printf("%f %d \n", CAST(float, a), CAST(int, f));
      return 0;
    }

allows us to pass in a type and an object to the macro (something we cannot define for a function, for instance).

A similar but not exactly identical mechanism of substitution is defined in the C standard [24] by the inline function specifier. The standard determines that if this is present, the compiler is advised to make the call to the function as fast as possible (and the extent of this is implementation defined). This is usually done through inline substitution, where the body of the function replaces the function call at the compilation stage (not at preprocessing). This mechanism can provide some gains in performance (by eliminating the calls), at the expense of executable file size.
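Because macro replacement is strictly textual, operator precedence in the surrounding expression can change the result. The sketch below contrasts the plain definition with a more defensive one and with an inline function; the names SUM_PLAIN, SUM_SAFE and sum_inline() are assumptions introduced for this illustration.

```c
/* As in the text: parameters and result are not parenthesised. */
#define SUM_PLAIN(a, b) a + b

/* A more defensive variant: each parameter and the whole
   expression are wrapped in parentheses. */
#define SUM_SAFE(a, b) ((a) + (b))

/* An inline function alternative; the arguments are evaluated
   once and type-checked as int. */
static inline int sum_inline(int a, int b)
{
    return a + b;
}
```

The expression 2 * SUM_PLAIN(1, 2) expands to 2 * 1 + 2, which is 4, whereas 2 * SUM_SAFE(1, 2) and 2 * sum_inline(1, 2) both give 6.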

6.1.6 Variable Argument Lists

It is possible to define a function with a variable list of arguments. In this case, we will use an ellipsis delimiter (...) in place of an argument. This is required to be in the rightmost position in an argument list (if there are other arguments). Once we define this, we will need to use some definitions from stdarg.h to access each argument:

• va_list is the type holding a variable argument list.
• va_start(va_list ap, parmN) is a macro that initialises an argument list, and parmN is the name of the rightmost parameter before the variable argument list.
• type va_arg(va_list ap, type) is a macro that retrieves each successive argument in the list.
• va_end(va_list ap) is a macro used to close the argument list access operation.

The following example demonstrates how to access each argument in a variable argument list. Note that it is important that the number and type of arguments passed to the function are known:

    #include <stdio.h>
    #include <stdarg.h>

    void func(int n, ...) {
      va_list ap;
      int i;
      va_start(ap, n);
      for(i = 0; i < n; i++)
        printf("%d", va_arg(ap, int));
      va_end(ap);
      printf("\n");
    }

    int main() {
      func(2, 1, 2);
      func(3, 1, 2, 3);
      return 0;
    }
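A variation on the example above, given here as a minimal sketch, returns a result instead of printing. The function name sum_n() is an assumption made for this illustration.

```c
#include <stdarg.h>

/* Adds up n int arguments. The caller must pass exactly n
   values of type int after the count. */
int sum_n(int n, ...)
{
    va_list ap;
    int i, total = 0;
    va_start(ap, n);
    for (i = 0; i < n; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}
```

As before, the first argument is what tells the function how many values to retrieve; nothing in the language checks that the count matches what was actually passed.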

6.1.7 Recursive Calls

Although not a common practice in C programming, the use of recursion is supported. This takes place when a function is defined to call itself to repeat some computation. It is normally the main means of iteration in some other programming environments, and for certain applications, it allows for very elegant and compact code. A typical example is the computation of a factorial, which can be defined recursively as N! = N × (N − 1)!, with 0! = 1. This can be implemented using recursion, by splitting the base case (for N = 0) from the rest:

    unsigned int fact(unsigned int z) {
      if(z == 0) return 1;       /* base case */
      else return fact(z-1)*z;   /* recursion */
    }

If we unwrap the calls, we will see that the function will recurse until it gets to the base case, and then go back executing the multiplication part until it exits the first call (Fig. 6.1). While this is a very compact way of implementing an algorithm, it does not always compile to the fastest code. In this particular example, it is more efficient to calculate the factorial of a number using a loop if it is very large. However, there will be applications in which recursion is the correct method to use.
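For comparison, a loop-based version could look like the sketch below. The function name fact_loop() and the wider return type are assumptions of this example, not part of the text; with a 32-bit unsigned int, the recursive fact() above is only exact up to 12!.

```c
/* Iterative factorial. The wider return type postpones overflow:
   results are exact up to n = 20 when unsigned long long
   has 64 bits. */
unsigned long long fact_loop(unsigned int n)
{
    unsigned long long result = 1;
    unsigned int k;
    for (k = 2; k <= n; k++)
        result *= k;
    return result;
}
```

The loop performs the same multiplications as the recursive version, but without the chain of nested calls.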

[Fig. 6.1: Recursive factorial calculation. To compute the factorial of 5, fact(5) calls fact(4), which calls fact(3), and so on down to fact(0); the calls then return in reverse order: return 1, return 1*1, return 1*2, return 2*3, return 6*4, and finally return 24*5, giving 120.]

6.2 Modular Programming

In C, each source file is treated as a separate translation unit. This means that some functions (and their local variables) and global variables can be made internal to that unit, which we can call a module, and hidden from the rest of the program code. Conversely, we can open up access to some of the functions, which then provide an interface to the module. In this way, we can separate different components into separate source code files and provide a means of accessing these through a set of interfaces. This may be a useful strategy for mid- to large-size projects. While so far we have only been using a single source file for the whole of our program, it is very common for code to be split into separate files that are compiled and then linked together to make up the software (Fig. 6.2). In practice this is what we need to do:

1. In a given source file, mark all functions that should only be accessible locally as static¹, e.g.

    static int my_local_func(int a, float f) { ... }

where the static keyword prevents access from outside the module by making the function invisible to the rest of the program. Likewise, any global variables, that is, those declared outside functions, are marked static to make them accessible only within the file (that is, their scope is the file). For these purposes, the source file is understood as the translation unit controlling the scope of objects that are internal to it.

2. Functions that make up the interface to the module, i.e. those that will open up the functionality to the rest of the program, should be declared in a header file that can be included in another file². Additionally, if we need to make any global variables accessible outside the translation unit, they should not have the static storage-class specifier, and need to be declared as extern in the outside module where they are accessed. However, we should in general avoid using global variables of any kind, preferring instead to pass values cleanly as parameters to functions if at all possible.

¹ Note that this is a slightly different use of the keyword from the one we have encountered before.

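[Fig. 6.2: Modular programming. The interfaces are declared in module1.h and module2.h and implemented in module1.c and module2.c; main.c includes both headers (#include "module1.h", #include "module2.h"), and the three source files are compiled and linked into the program.]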

A note about the #include statement: you might have noticed that all header files included from the C library (e.g. stdio.h) are enclosed by angle brackets (< and >). This is the common procedure when including headers that have interfaces to the external libraries we are using. The compiler will search for them in standard locations, as well as in directories we pass to it using the optional -I flag. For header files that accompany our own source code and are located in the same directory as the implementation code, we should use double quotes instead. That will indicate that the file should be searched for locally. Thus, for a module.h used by a source file in the same directory, we should include it with the line #include "module.h".
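To make the idea of a module more concrete, here is a minimal sketch of one, shown as a single listing for brevity. The file names counter.h and counter.c, and all the function names, are hypothetical; in a real project the two parts marked by the comments would live in separate files.

```c
/* --- would go in counter.h: the public interface (hypothetical names) --- */
void counter_reset(void);
int  counter_next(void);

/* --- would go in counter.c: the implementation --- */
static int count = 0;              /* global, but visible in this file only */

static int clamp(int x, int max)   /* internal helper, hidden by static */
{
    return x > max ? max : x;
}

void counter_reset(void)           /* part of the interface */
{
    count = 0;
}

int counter_next(void)             /* part of the interface */
{
    count = clamp(count + 1, 3);   /* counts 1, 2, 3 and then stays at 3 */
    return count;
}
```

Code in other source files can call counter_reset() and counter_next() after including the header, but it has no means of reaching count or clamp() directly, so these can be changed later without affecting the rest of the program.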

² Alternatively, they can be declared in the parts of the program that will call them. The keyword extern can be used, but it is not necessary, as functions have external linkage by default.

6.3 Pointers to Functions

In C, functions and numeric data are distinct, so we cannot assign a function directly to a variable³. However, it is possible to use pointers to refer to subroutines, and assign these to other pointers. These are called pointers to functions. In fact, just as the array variable name is a constant pointer to the start address of the array, a function name is a constant pointer to a given subroutine and can be treated as such. A function pointer declaration is a little convoluted:

    type (*pointer_name)(arguments);

declares a pointer called pointer_name, which can be used to store the address of a function with a given type as its return type and arguments as its argument types. It can only store a function with that prototype (the compiler will complain about any other). For example,

    int (*func)(int, int);

is a pointer to a function with two int arguments that returns an int. We could assign an existing function to it, for instance sum(), defined earlier in the chapter:

    func = sum;

We could then call it in a program:

    a = func(b,c);

³ This is possible in some languages, for instance, those where the functional paradigm is implemented.

The most common application of these ideas is to employ function pointers as arguments to other functions. Consider the following example. We would like to process the contents of two integer arrays, element by element in various ways: adding, subtracting, multiplying, taking the maximum or minimum of the two, and so on. This involves repeated application of a function that takes two int parameters and returns another int as a result. To implement this, we could design a subroutine that takes four inputs: two arrays, their length and a function to process them. The output can be kept in place in one of the arrays or placed in another. Let’s use the first option (in-place processing). Here is the code:

    void process(int *data1, int *data2, int len,
                 int (*func)(int, int)) {
        int i;
        for(i = 0; i < len; i++)
            data1[i] = func(data1[i], data2[i]);
    }

This code takes care of the function application. To use it, we have to pass the arrays, length and an existing function to do the job:

    int a[5] = {1,2,3,4,5};
    int b[5] = {6,7,8,9,10};
    process(a,b,5,sum); /* result: {7,9,11,13,15} */

Other functions can be passed if they match the prototype. For instance,

    int prod(int a, int b) { return a*b; }
    int max(int a, int b) { return a > b ? a : b; }
    ...
    /* the results below assume that a holds {1,2,3,4,5} again
       before each call, since process() overwrites it */
    process(a,b,5,prod); /* result: {6,14,24,36,50} */
    process(a,b,5,max);  /* result: {6,7,8,9,10} */

This mechanism will be very useful in a number of applications. Another name given to the routine passed to the function is a callback, in other words, a routine we are supplying to be called later by a program. This is in contrast to all the other functions that we call directly in our code.
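As a variation on the in-place routine, the following sketch writes the results to a separate output array, so the inputs are left untouched and can be reused with different callbacks. The names binop, process_to, sum_of and min_of are our own, introduced only for this example; the typedef also shows a common way of making function pointer parameters easier to read.

```c
/* a type name for "function taking two ints and returning an int" */
typedef int (*binop)(int, int);

int sum_of(int a, int b) { return a + b; }
int min_of(int a, int b) { return a < b ? a : b; }

/* apply op element by element, storing the results in out;
   the input arrays are not modified */
void process_to(const int *in1, const int *in2, int *out,
                int len, binop op)
{
    int i;
    for (i = 0; i < len; i++)
        out[i] = op(in1[i], in2[i]);
}
```

With this version, successive calls such as process_to(a, b, out, 5, sum_of) and process_to(a, b, out, 5, min_of) each start from the same input data.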

6.4 The C Standard Library

The C language is very lean. It does not include many built-in resources for programming beyond its formal syntax. For instance, as we have noted, for standard IO we need to employ calls to externally-defined functions that are not part of the language per se. It is common, however, to treat the C standard library as an integral part of the whole C programming environment, if not of the language itself. Any compilation toolchain that does not supply it is seriously limited. The C standard library contains a large number of function, type, and constant definitions that are widely used in programming. These include IO, mathematical routines, string manipulation, and a series of other utilities. Each particular header file allows us to access the prototypes and declarations for a given set of functionality. We can find extensive information about each subroutine in the library in section 3 of the system manual. This can be accessed from the shell by the command man. For instance,

    $ man 3 printf

will print the complete information for the printf() function, including header file, prototype, arguments, return value, etc.

6.5 Another Synthesis Program

With the standard library functions, we can start doing proper sound synthesis. We will leave discussing any further details of digital audio theory, such as the concept of sampling rate (or frequency, $f_s$), for later. For now, as we have done before, it is sufficient to say that we will be generating a pulse code modulation (PCM) signal with a certain number of samples per second (so we will have $f_s$ numbers for each second of audio). We will generate the purest signal of all, a sine wave. We can do that by using the sine function. For each number we output, we calculate the sine of an angle $\omega$, and as it increases from 0 to $2\pi$ and then to $4\pi$ and $6\pi$, we will generate complete sine wave cycles. This uses the sin() function from the C math library, declared in math.h, which implements the following expression⁴:

$x = \sin(\omega)$    (6.1)

If we use a frequency multiplier $f$, then we can generate as many cycles as we want over a given period:

$x = \sin(2\pi f t)$    (6.2)

where $t$ is just the time in seconds. Since we are generating $f_s$ numbers per second, we need a time index $n$ in samples, so $t = n/f_s$ [36]:

$x = \sin\left(\frac{2\pi f n}{f_s}\right)$    (6.3)

The following program implements this expression directly:

    #include <stdio.h>
    #include <math.h>
    #define FREQ 440
    #define SR 44100
    #define DUR 2.0
    #define TWOPI 6.283185307179586

    int main(){
        int n;
        double pi2osr = TWOPI/SR;
        for(n=0; n < DUR*SR; n++)
            printf("%f\n", sin(FREQ*pi2osr*n));
        return 0;
    }

This program will generate a 2-second digital sine wave at 44100 Hz sampling rate, as ASCII formatted floating-point numbers printed to the terminal (stdout). If you run it, you will see the numbers that compose the digital signal. As before, we can use pipes, redirection and the tobin program:

    $ ./sine | ./tobin > waveform.raw

As we have seen in Sect. 4.5.2, you can open waveform.raw in a sound editor, as a 32-bit float-encoded raw soundfile with $f_s$ = 44100 and one channel of audio.

⁴ Some compilers require the command-line option -lm to link to the standard C math library (libm). You can add this if an undefined symbol error is reported by the linker.


6.5.1 Plotting

Now that we are able to store data in arrays, we can create a better terminal plotting program to display this waveform. The idea is that we will use a buffer, which is a block of memory used to hold data temporarily, to accumulate input samples. When the buffer is full, we will plot it. The buffer will hold enough numbers to print the maximum number of columns in the terminal (e.g. 80). To plot the data, we will check whether each sample matches the number of the line we are currently printing. The input data is expected to be in the normal range [−1.0, 1.0] and is scaled up to the plot range. Since the standard output is line oriented, we have no choice but to print line by line, even if the intention is to print the data in columns. As the printing position can only move to the right and downwards, we will need to pay attention to this when plotting. Here is a function that does this: it takes a data buffer (array), the minimum and maximum plot values, and the number of samples in the buffer:

    void plot(float *data, int ymin, int ymax, int nx) {
        int n, m;
        /* for each value in the range [ymin, ymax] */
        for(m=ymax; m >= ymin; m--) {
            /* on each column */
            for(n = 0; n < nx; n++) {
                /* print zero line */
                if(m == 0) printf("-");
                /* print star if rounded value matches */
                else if(lround(data[n]*ymax) == m) printf("*");
                /* else print blank */
                else printf(" ");
            }
            /* move to the next line */
            printf("\n");
        }
    }

We proceed from the top left of the figure, from line ymax to line ymin, and plot an asterisk if the value of the waveform at a given column matches the line number. Since the signal sample is a floating-point number, we use the standard library function lround() to round it to the nearest integer before we compare it. When we reach the end of the line (nx columns), we move to the next line, decrementing the line count. With this function, we can write a simple program, plot2, to take data from the standard input and print it to the terminal. In this case, the code assumes an 80 column by 24 line display, but this can be modified by setting the COLS and LINS constants⁵:

    #include <stdio.h>
    #include <math.h>
    #define COLS 80
    #define LINS 24

    /* the plot() function listed above goes here */

    int main(){
        float buffer[COLS];
        int err, n;
        do {
            /* get data input from stdin into buffer */
            for(n=0; n < COLS; n++)
                err = scanf("%f", &buffer[n]);
            plot(buffer, -(LINS-1)/2, (LINS-1)/2, COLS);
            /* clear buffer */
            for(n=0; n < COLS; n++) buffer[n] = 0;
        } while(err != EOF);
        return 0;
    }

The program will read ASCII float samples from the standard input until an EOF signal is detected. If we set the sine wave frequency to 550⁶, by modifying the FREQ constant, and run the program, piping its output to the plot2 input,

    $ ./sine | ./plot2

we will get the plot to the terminal shown in Fig. 6.3.

[Fig. 6.3: A plot to the terminal using plot2.]

⁵ In fact, we will be using 23 lines. In order to accommodate the [-1,1] range, we need an extra line to account for values at 0. Therefore the plot requires an odd number of lines.
⁶ This is to line up a single period with the terminal size. Actually, a 551.25 Hz wave would complete a single cycle in 80 samples at 44100 Hz.

6.5.2 Realtime

Furthermore, if we have a program that can send ASCII samples directly to the soundcard digital-to-analogue converter (DAC), then we can also use the sine program to generate audio in realtime. This will employ the same pipe mechanism as in the raw-waveform writing and terminal plotting programs, except that the destination is now the default soundcard in the system. Supposing this program is called todac⁷, then

    $ ./sine | ./todac

will play a 440 Hz sine wave.

⁷ We will study this program later, in Chapter 11, where we will also find its source code.

6.6 Arguments to main()

C programs can accept initial parameters when they start. These are normally passed from the shell in the form of separate arguments when the program is invoked. Depending on the shell and on the system, there may be other ways to pass these parameters. However, they are generally accepted in a C program in the same manner, regardless of their source, as arguments to the main() function, the entry point to the program. To give arguments to a program, we use a second form of main(), which we have not yet discussed. Arguments are passed to any program through two parameters declared in the main() function. These are usually called argc and argv, but these names can be anything. What is important is that the types match what is expected as the main function prototype:

    int main(int argc, const char *argv[]);

The argc parameter gives the number of arguments passed to the program and is declared as an int. Programs receiving no parameters will have an argument count of one. The argv parameter is an array of constant strings containing any arguments passed to the program. The first string in this array is always the program name. For example,

    #include <stdio.h>
    int main(int argc, const char *argv[]) {
        int i;
        for (i=0; i<argc; i++)
            printf("%s\n", argv[i]);
        return 0;
    }

This program will print out all of its arguments, starting with the program name. Note that the argv parameter can also be declared as const char**, which indicates a two-level indirection (pointer to pointer to char), as used to access an array of strings. In this particular case, the two forms are equally applicable.

6.6.1 Translating Arguments

Each argument is a string. In some cases these arguments might need to be converted or translated into numeric data of different types. The following standard C library functions declared in stdlib.h can be used for this:

    int atoi(const char *string);    // string to integer
    double atof(const char *string); // string to double

The following example demonstrates their use:

    #include <stdio.h>
    #include <stdlib.h>
    int main(int argc, const char *argv[]) {
        double a,b;
        if (argc < 3) {
            printf("too few arguments \n");
            return 1;
        }
        a = atof(argv[1]);
        b = atof(argv[2]);
        printf("%f \n", a + b);
        return 0;
    }

In addition to these functions, conversions from ASCII strings to numeric data can alternatively be done with strtof() (float), strtod() (double), and strtol() (long int). These allow an initial portion of a longer string to be converted. Unlike atof() etc., they also output a pointer to the remainder of the input string so that further conversions or other operations on it can be carried out. This is useful when a string contains several numbers that need to be retrieved. See also sscanf(), which has a similar form to scanf() but takes its input from a string.
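To illustrate how the pointer returned by strtod() can be used to walk through a string, here is a short sketch that adds up all the numbers it finds. The function name sum_numbers is our own invention for this example.

```c
#include <stdlib.h>

/* Sum all the numbers found at the start of a string such as "1.5 2.5 3".
   Parsing stops at the first position where no number can be read.
   sum_numbers is an illustrative name. */
double sum_numbers(const char *s)
{
    double total = 0.0;
    char *end;
    for (;;) {
        double v = strtod(s, &end);  /* end points past what was converted */
        if (end == s)                /* nothing converted: stop */
            break;
        total += v;
        s = end;                     /* carry on from the remainder */
    }
    return total;
}
```

Comparing end with the start position is the usual way to detect that no conversion took place, which is something atof() cannot report.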

6.7 Conclusions

This chapter set out to discuss subroutines as the final element to complement structured programming in C. We have examined all relevant aspects of functions, from definitions to call semantics and prototypes. We saw how arrays and pointers can be used as arguments, allowing functions to reference externally-defined memory in addition to local variables. An alternative form of the main() function was also introduced, together with the mechanism for passing arguments to a program. We concluded the chapter with a new digital synthesis example, which was an improvement on the earlier example, since we were then able to use the C standard library mathematical functions to accurately generate pure tones of a given frequency. In the next chapter, we will be able to conclude the study of the C language per se, so that we can move on to looking at specific libraries that will be relevant to musical applications.

Problems

6.1. Write a program that takes any number of arguments and reports the number of characters in each of them.

6.2. Write a version of the synthesis program presented in Sect. 6.5 that takes a frequency value as a command-line parameter.

6.3. Write a function to print the first N numbers of the Fibonacci sequence, defined as $F_0 = 0$, $F_1 = 1$, $F_{n+1} = F_n + F_{n-1}$ [29]: {0, 1, 1, 2, 3, 5, 8, 13, ...}.

6.4. Write a function that takes an input pressure amplitude in N/m² and converts it into a sound pressure level (SPL) value in decibels. Write a program that will print an SPL value given a certain pressure value at the command line. Use the expression $\mathrm{SPL}(a) = 20\log_{10}\left(\frac{a}{20\times 10^{-6}}\right)$ and the math library function log10() (header file: math.h).

Chapter 7

Structures

Abstract User-defined types are the main topic of this chapter. We look at how these can be defined via C-language structures and unions. We show how to manipulate these new types, and how they can be treated like other built-in types, through standard variables, arrays, and pointers. The chapter concludes with a look at bit-oriented operations.

Arrays allow us to store contiguously in memory a number of data items of the same type. The C language complements this with a means to reserve a non-uniform block of memory that can contain a combination of elements of various different types. This is implemented through structures.

7.1 Defining a New Type

In addition to all the built-in data types we have been using so far, including arrays, it is possible to define new ones based on a combination of these. This is done by creating a struct block, giving it a name, and adding elements to it called member variables:

    struct name {
        type member_name;
        ...
    };

Once a new type is defined, we can use it in our program to declare any variables we need. To do this, we use again the keyword struct followed by the name we gave our new type, and the variable name:

    struct name var;

© Springer Nature Switzerland AG 2019 V. Lazzarini, Computer Music Instruments II, https://doi.org/10.1007/978-3-030-13712-0_7


To facilitate the use of the new type, we can employ the typedef mechanism. This allows us to define new names for already existing types:

    typedef type new_type;

This can be applied to a structure name after it has been defined,

    typedef struct name my_type;

or directly to the structure definition,

    typedef struct name {
        type member_name;
        ...
    } my_type;

allowing my_type to be used in variable declarations:

    my_type var;

7.1.1 Member Access

Structure members can be accessed by joining the name of a variable with a dot and the name of the member we want to select, as in var.member. For example, let’s consider the case of a data structure that models a synthesiser note. For this new type, we need one integer to keep a note number (i.e. using the MIDI protocol¹), and two floating-point members to keep the amplitude and duration (in seconds) of the note:

    typedef struct note {
        int number;
        float amp;
        float dur;
    } NOTE;

Once that is defined, we can declare one or more variables of this type:

    NOTE a = {0, 0.f, 0.f}, b = {60, 0.f, 0.f};

noting that we can initialise each member using a comma-separated list of constants/variables inside braces, matching the order of declaration of the structure members. Alternatively, we can initialise members out of order by quoting their names following a dot:

    NOTE d = { .amp = 0.f, .dur = 0.f, .number = 62};

¹ See Chapter 12.


To assign a value to a member variable, or to access it, we use a dot notation similar to that used for initialisation, but prefixing it with the variable name instead:

    a.amp = 1.f;
    a.number = b.number;
    b.dur = a.dur + 1.f;

It is also possible to assign a whole structure to another, which will copy all of its members across to the destination:

    a = b;

This also means that structures, like other types, can be used as function arguments. As we have seen in Chapter 6, this implies copying parameters into the arguments, which work as local variables for the function. For example, if we want to implement operations on the NOTE data type, we need to supply a set of functions to do that:

    /* transpose pitch */
    NOTE transpose(NOTE x, int semitones) {
        x.number += semitones;
        return x;
    }

    /* scale amplitude */
    NOTE ampscale(NOTE x, float gain) {
        x.amp *= gain;
        return x;
    }

    /* change duration */
    NOTE temposcale(NOTE x, float amt) {
        x.dur *= amt;
        return x;
    }

Note that all of this copying in and out of the function (the return value is also copied out) is probably OK for a structure that is small in size like NOTE, but for larger ones it might create an overhead that is not ideal. In this case, we should consider keeping the structures in place and passing pointers to variables as parameters.

7.1.2 Pointers to Structures

A pointer to a structure works under the same principles as a pointer to a built-in type. We can declare it, as usual, by using an asterisk,

    my_type *p;

and assign it to an existing memory address, dereference it, etc.:

    NOTE *p;
    NOTE a = {60, 1.f, 0.25f};
    NOTE melody[7];
    p = melody;
    while(a.number < 67) {
        *p++ = a;
        a.number++;
    }

To access the members, we need to dereference it first, and then apply dot selection. Since the latter operation has higher priority than the former, we need to use parentheses to ensure the correct order:

    (*p).amp = 0.f;

This is slightly awkward, but thankfully there is a simpler version provided by the -> selector, which is the dot counterpart for pointers:

    p->amp = 0.f;
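Pointers also give us a way to operate on a structure in place, without copying it in and out of a function. The sketch below assumes the NOTE type with the three members described in Sect. 7.1.1; the function names transpose_inplace and ampscale_all are ours, chosen for this example only.

```c
typedef struct note {
    int number;   /* MIDI note number */
    float amp;    /* amplitude */
    float dur;    /* duration in seconds */
} NOTE;

/* transpose a note in place: only a pointer is passed,
   no copy of the structure is made */
void transpose_inplace(NOTE *x, int semitones)
{
    x->number += semitones;
}

/* scale the amplitude of every note in an array of len notes */
void ampscale_all(NOTE *notes, int len, float gain)
{
    int i;
    for (i = 0; i < len; i++)
        notes[i].amp *= gain;
}
```

A call such as transpose_inplace(&a, 7) modifies the variable a directly, which is the main difference from a version that takes a NOTE by value and returns the modified copy.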

7.2 Functions in Structures

Structure members can be of any built-in or user-defined type. This excludes functions, which are not types themselves, but allows pointers of any kind, including pointers to functions. Sometimes it is useful to pack together a series of operations inside a data structure on which they are supposed to work. For instance, it would be nice to be able to have a function that outputs the frequency in Hz (cycles per second, cps) corresponding to a note number. We could include this as part of the NOTE type to keep things together:

    typedef struct note {
        int number;
        float amp, dur;
        double (*cps)(struct note);
    } NOTE;

This only creates a slot to hold the function. We now need to define the function and then add it to an instance of the type as part of its declaration²:

    /* pow() is declared in math.h */
    double func(NOTE x){
        return 440.*pow(2., (x.number - 69.)/12.);
    }
    ...
    /* initialise a */
    NOTE a = {60, 1.f, 1.f, func}, b;
    /* get the pitch of the note in Hz */
    double hz = a.cps(a);
    /* copy a to b */
    b = a;

Note that the pointer to func, held in the cps member, is copied from variable a to variable b as part of the assignment in the last line. The operation is then available for that variable also. While it looks a bit awkward in this trivial example, adding function pointers to structures can facilitate some important means of coding that will lead us to object-oriented programming.

² See Sect. 12.3.5 for more details on the expression used for the note number to cps conversion.

7.3 Unions

Similarly to structures, C has a mechanism to create a hybrid type that can have two or more different interpretations, called a union. In this case all members share the same memory space, so, if one of them gets modified, this will be reflected in the others. For instance,

    typedef union _conv {
        unsigned char bytes[4];
        int whole;
        float real;
    } converter;

makes a union of four bytes, an integer and a floating-point number (assuming, as is usual on the systems we use, that int and float each occupy four bytes). It allows us to access the memory as an integer, a real, or four individual bytes:

    converter a;
    a.whole = 0;       /* sets it to 0, as an int */
    a.real = 3.5;      /* sets it to 3.5 as a float */
    a.bytes[3] = 255;  /* sets the fourth byte (index 3) */

Note that each access above will modify the variable memory in some way. The first one resets it to zero, the second sets its four bytes to carry a floating-point number, and the third modifies only the fourth byte by setting all of its bits.
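As a small sketch of writing through one member of a union and reading through another, the function below adds up the four bytes that make up a 32-bit value. We use the fixed-width types from stdint.h to avoid any assumption about the size of int; the names word32 and byte_sum are made up for this example.

```c
#include <stdint.h>

/* two views of the same four bytes of memory (illustrative type) */
typedef union {
    uint8_t  bytes[4];
    uint32_t whole;
} word32;

/* add up the four bytes that make up a 32-bit value */
unsigned int byte_sum(uint32_t x)
{
    word32 w;
    w.whole = x;                        /* write through one member */
    return w.bytes[0] + w.bytes[1] +    /* read through the other */
           w.bytes[2] + w.bytes[3];
}
```

The sum does not depend on the order in which the machine stores the bytes. Which of the bytes[] elements holds the most significant byte, however, does vary between systems, so code that looks at individual positions needs to take that into account.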

7.4 Enumerations

C provides a means of easily making enumerations, i.e. sequential lists of integer constants:

    enum {ZERO, ONE, TWO, THREE};


This creates four constants set to 0, 1, 2, 3, which can be used in the program as ZERO, etc. This is what we call an anonymous enumeration. We can also give it a name:

    enum numbers {ZERO, ONE, TWO, THREE};

and declare variables of the type enum numbers to use in the program. A new type can also be created with typedef, as before:

    typedef enum numbers {ZERO, ONE, TWO, THREE} nums;
    nums b = ZERO;
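Enumerations are often paired with a switch statement to select between a fixed set of alternatives. The following sketch uses a hypothetical list of waveform types (the names waveform, SINE, SQUARE, SAW, NUM_WAVES and wave_name are all our own) to show the idea.

```c
/* an enumeration of waveform shapes; NUM_WAVES comes last,
   so its value equals the number of shapes before it */
typedef enum waveform { SINE, SQUARE, SAW, NUM_WAVES } waveform;

/* map an enumeration constant to a printable name */
const char *wave_name(waveform w)
{
    switch (w) {
    case SINE:   return "sine";
    case SQUARE: return "square";
    case SAW:    return "saw";
    default:     return "unknown";
    }
}
```

Placing a counting constant such as NUM_WAVES at the end of the list is a common idiom: since the constants are numbered from zero, it can be used as a loop limit or an array size.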

7.5 Bitwise Operations

As a final C language topic, we will look at a set of low-level facilities that allow us to work on individual bits of an integer. These are known as bitwise operations, and differ fundamentally from the kinds of expression we have seen so far. Two main groups of operators exist: those dealing with binary logic and those implementing the shift of bits in a variable.

7.5.1 Bitwise Logic

A number of operators are defined for bitwise logic operations, which treat integers as bit fields rather than as a binary representation of a given decimal number. They compare each bit of one operand with the corresponding bit of another operand.

1. &: bitwise AND.
2. |: bitwise inclusive OR.
3. ^: bitwise exclusive OR.
4. ~: bitwise negation (one’s complement, unary operator).

The bitwise AND (&) returns a set bit (1) only when both sides of the operation have that bit set. It is often used with bitmasks to filter bytes off an integer:

    unsigned short mask = 0xFF00, value, masked;
    value = 0x0111;
    masked = mask & value;

In the example above, the mask will only let the higher byte pass, filtering off the lower one. So the value of masked will be 0x0100:

      0000 0001 0001 0001
    & 1111 1111 0000 0000
      -------------------
      0000 0001 0000 0000

The bitwise OR (|) returns a set bit when either of the operands has a set bit. It is used to turn bits on (and to combine bytes).

    masked = mask | value;

will turn the higher-order byte to 0xFF, resulting in 0xFF11:

      0000 0001 0001 0001
    | 1111 1111 0000 0000
      -------------------
      1111 1111 0001 0001

The bitwise exclusive-OR (^) returns a set bit when only one operand has a set bit, otherwise it will return a zero. The unary one’s complement operator (~) converts each set bit into a zero and vice versa. Bitwise logic operators can be combined in shorthand expressions with the assignment operator, for the updating of variables, for example:

    value &= mask; // same as value = value & mask;

There are several uses for bitwise logic. The most common of them is to use each bit of a number to determine whether an option is turned on or off in a program. For example, the following program uses an 8-bit integer to hold eight different options that can be selected individually. If a given bit is set, the option is selected. We have a list of constants in an array, each defining one bit. When an option is selected, we OR it with the options list, so that the given bit is set. Later, when we want to check which options have been chosen, we AND the list of options and each different option constant:

    #include <stdio.h>
    int main() {
        unsigned int i = 1;
        unsigned char options = 0;
        unsigned char opt[8] = {0x01,   // 0000 0001
                                0x02,   // 0000 0010
                                0x04,   // 0000 0100
                                0x08,   // 0000 1000
                                0x10,   // 0001 0000
                                0x20,   // 0010 0000
                                0x40,   // 0100 0000
                                0x80};  // 1000 0000
        while(i != 0) {
            printf("select an option 1-8 (0 to quit): ");
            if(scanf("%u", &i) != 1) break;  // stop on invalid input
            if(i && i <= 8)
                options |= opt[i-1];         // select the option
        }
        for(i=0; i < 8; i++)
            if(options & opt[i])             // if the option was selected
                printf("selected option %u \n", i+1);
        return 0;
    }
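The basic operations on a set of option bits can be collected into a few small helper functions. This is only a sketch: the option names (OPT_LOOP, OPT_MUTE, OPT_STEREO) and the flag_ functions are hypothetical and were made up for this example.

```c
/* one bit per option (hypothetical option names) */
enum {
    OPT_LOOP   = 0x01,   /* 0000 0001 */
    OPT_MUTE   = 0x02,   /* 0000 0010 */
    OPT_STEREO = 0x04    /* 0000 0100 */
};

/* turn a bit on with OR */
unsigned int flag_set(unsigned int f, unsigned int bit)    { return f | bit; }
/* turn a bit off by ANDing with the complement of the bit */
unsigned int flag_clear(unsigned int f, unsigned int bit)  { return f & ~bit; }
/* flip a bit with exclusive OR */
unsigned int flag_toggle(unsigned int f, unsigned int bit) { return f ^ bit; }
/* check whether a bit is on with AND */
int flag_test(unsigned int f, unsigned int bit)            { return (f & bit) != 0; }
```

For instance, flag_set(0, OPT_MUTE) gives 0x02, toggling OPT_LOOP on that result gives 0x03, and clearing OPT_MUTE again leaves 0x01.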

7.5.2 Bitshift Operators

Two operators can be used to shift bits in an integer:

    <<   /* left shift */
    >>   /* right shift */

They shift bits by a number of positions specified by the right-hand operand:

    x << 1   // shifts all bits by 1 position to the left
    x >> 2   // shifts all bits by 2 positions to the right

Left shifts fill the vacated bits with 0-bits. Right shifts will depend on the type of the operand: for unsigned types, bits will be filled with 0s; for signed types holding negative values, the result is platform-dependent, but the norm in the systems we use is that the sign is preserved and the vacated bits are filled with copies of the sign bit (the first bit). These systems employ a representation for signed integers called two’s complement. In it, the first bit (sign) is 1 for negative numbers and 0 for positive ones. Left shifts offer no such protection for the sign bit: shifting a negative value, or shifting a set bit into the sign position, is undefined in C, so left shifts are best applied to unsigned values or to non-negative values small enough not to overflow. Within these limits, left shifts are equivalent to multiplication (a fast way of doing it; see Fig. 7.1):

    x << n   // multiplication by 2^n

Likewise, right shifts are equivalent to division (with rounding, see Fig. 7.2):

    x >> n   // division by 2^n

So, a fast way of multiplying or dividing by 2 is to left or right shift a number by one position. The division will be rounded down to an integer.


    a = 91;      //  0 1 0 1 1 0 1 1
    a << 1;      //  1 0 1 1 0 1 1 0  (182)

Fig. 7.1: Bitwise left shift.

    a = 91;      //  0 1 0 1 1 0 1 1
    a >> 1;      //  0 0 1 0 1 1 0 1  (45)

Fig. 7.2: Bitwise right shift.
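Shifts and bitwise logic are often used together, for instance to combine two bytes into a 16-bit value and to take them apart again. The sketch below uses unsigned integers throughout, so that the shifts are fully defined; the function names pack16, high_byte and low_byte are our own.

```c
/* combine a high and a low byte into a 16-bit value */
unsigned int pack16(unsigned int hi, unsigned int lo)
{
    /* keep 8 bits of each, move hi up by one byte, then merge with OR */
    return ((hi & 0xFF) << 8) | (lo & 0xFF);
}

/* extract the high byte: shift down, then mask */
unsigned int high_byte(unsigned int x)
{
    return (x >> 8) & 0xFF;
}

/* extract the low byte: mask only */
unsigned int low_byte(unsigned int x)
{
    return x & 0xFF;
}
```

As an example, pack16(0x01, 0x11) gives 0x0111, the value used in the masking example of Sect. 7.5.1, and high_byte() and low_byte() recover 0x01 and 0x11 from it.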

7.6 Conclusions

In this chapter, we have seen the final elements of the C language syntax. It is a wonder that we can introduce the whole of the language in a few chapters, but that is a significant characteristic of C: it is small. From now on, we will be concerned with the libraries that make up a modern computing environment, in particular those that deal with sound and music computing. The power of the C language resides in the combination of this simple, small set of rules with the huge variety of system libraries that provide specific functionality for particular tasks. In the next chapter, we will start the next stage of our journey by looking at memory management.

Problems 7.1. Using a bitwise operation, write a program that checks if a user-provided number is a power of two. 7.2. Algorithmic Music Composer: the task in this problem is to develop a program that can generate scores using Stochastic Music principles. The music will be written as a numeric score for a system such as Csound (or equivalent). This score should be printed to the terminal (using printf()). (a) General outline:

94

7 Structures

– The program should ask for three inputs from the user: (i) the total number of notes; (ii) the initial note; and (iii) the random walk interval (> 1). – The program should generate five parameters for the score: (1) the instrument number: a discrete random choice of a minimum of 2 instruments (2) the note start time: random number values (starting from 0 secs). The sequence of notes will have to be increasing in time, each note should not start earlier that the previous (but can start at the same time). The random values should be limited so that the next note never starts more than 1 sec after the current one. (3) the note duration: a random value between 0.5 and 1.5 (secs) (4) the note amplitude: a random value between 0.0 and 1.0. (5) the note number (pitch): apply a random walk algorithm3 over a closed range from 0 to 127 (MIDI note numbers4 ).

Notes:
– The C standard library function rand() can be used for all random number generation. See the relevant manual page for more details on how it works. Note that you will need to keep the random numbers within various ranges (use the modulo operation).
– The score can use any numeric format, but should contain the five parameters as outlined above. We suggest the use of the Csound standard numeric score as the output as it provides a simple but structured format, which can be played directly.
– A data structure holding note parameters might be useful for modelling each note in a sequence/list.

³ See [15, 365–8] for details on this algorithm.
⁴ See Chapter 12.

Chapter 8

Memory Management

Abstract With most of the C language already covered, this chapter looks at the fundamental principles of dynamic memory allocation and management. The main C standard library functions designed to create, expand, and dispose of free-store memory are introduced. We employ these in two basic applications: dynamic arrays and linked lists.

Up to now, we have not been concerned with how memory is allocated in a program. All we know is that when we declare a variable in a block, it comes into existence while that block is active (i.e., if it is a function, during a call) and then gets destroyed when the program leaves the block. This is the type of storage called automatic. The mechanisms for it are managed by the compiler at compile time, regardless of whether we are using a single variable, an array, or a structure. We do not have to worry too much about the details of memory allocation, as it is generally seamless. However, this can be problematic in two particular cases:
1. When we do not know how much memory we will need at compile time. As we have seen, for instance, it is not possible to use a variable to define the size of an array.
2. When the memory space required is substantial. Automatic variables and arrays are allocated in a part of the program memory space called the stack, which might not have enough space for very large memory blocks.
To cover these cases, we need to be able to manage the program memory in a more precise way. This is done through dynamic memory allocation.

8.1 Allocating Memory
Memory management is provided by the C standard library, whose stdlib.h header file supplies functions to allocate and dispose of memory space. These use a different part of the program memory space, called the heap, which can handle larger blocks in a dynamic way. The basic allocation function is malloc():

void *malloc(size_t size);

This allocates a certain number of bytes (size, where size_t is an unsigned integer type) and returns the address of that location as a generic pointer (type void*). With the built-in sizeof() operator, we can retrieve the size of any data type at compile time. For example, with

int *pa = (int *) malloc(sizeof(int)*N);

we can create an int array dynamically, where N is the number of items in it. Because malloc() is used to allocate an unspecified memory block, it returns a void pointer, which then needs to be cast to the right type (int *, in this case). After the memory is allocated, you can use pa as an array. For example, pa[n] is the element n of the array. In addition to malloc(), we have calloc():

void *calloc(size_t count, size_t size);

which allocates a count number of items of size size and resets the memory to zero.

Note that when allocating space for strings, we need to take account of the terminating '\0' character, so we should always add one extra character to the length of the string. The strlen() function returns the length of a given string without its terminating character and can be used in calculating the necessary space. The function strdup() duplicates a string, allocating memory for it:

char *strdup(const char *s1);

The memory it returns should be disposed of after use.
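As a minimal sketch of the string case just described, the function below allocates strlen(s) + 1 bytes and copies the string, including its terminator. The name copy_string() is hypothetical (it is not a library function), and the check on the value returned by malloc() is an addition that the text does not discuss.

```c
#include <stdlib.h>
#include <string.h>

/* copy_string() is a hypothetical helper, named here for illustration.
   It returns a newly allocated copy of s, or NULL if allocation fails. */
char *copy_string(const char *s) {
    size_t len = strlen(s);                 /* length without the '\0' */
    char *copy = (char *) malloc(len + 1);  /* one extra byte for the '\0' */
    if (copy == NULL) return NULL;          /* malloc() can fail */
    memcpy(copy, s, len + 1);               /* copies the terminator too */
    return copy;
}
```

The caller owns the returned block and is expected to pass it to free() once it is no longer needed.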

8.1.1 Reallocation
If the memory has already been allocated, but it needs to be expanded or contracted, the realloc() function can be used. It allocates new space and copies the existing data to it, returning a pointer to this new location:

void *realloc(void *ptr, size_t size);

where ptr is the original memory address. This will be disposed of once the reallocation is completed. When a region allocated with calloc() is extended, there is no guarantee that the extra memory will also be filled with zeros.
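One detail worth illustrating is what happens when realloc() cannot satisfy the request: it returns NULL and leaves the original block untouched. The sketch below, with the hypothetical name grow_floats(), keeps the old pointer until the call is known to have succeeded, so that the block is never lost.

```c
#include <stdlib.h>

/* grow_floats() is a hypothetical helper: it resizes the block at *pp
   to hold n floats. Returns 1 on success and 0 on failure, in which
   case the original block is still valid and *pp is unchanged. */
int grow_floats(float **pp, size_t n) {
    float *tmp = (float *) realloc(*pp, n * sizeof(float));
    if (tmp == NULL) return 0;  /* old block untouched, nothing leaked */
    *pp = tmp;                  /* the block may have moved */
    return 1;
}
```

Assigning the result of realloc() directly to the original pointer would overwrite it with NULL on failure, leaving no way to free the old block.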


8.1.2 Freeing Memory
It is left to the programmer to dispose of memory that has been allocated dynamically. If this is not done, the program will leak memory, which is never a good state of affairs. To free memory we use

void free(void *ptr);

8.1.3 Setting and Copying Memory Blocks
The C standard library provides functions to reset and copy whole memory blocks. These functions are declared in string.h. To set each byte of a memory location to a given value we can use memset():

void *memset(void *b, int c, size_t len);

This function writes len bytes of value c, converted to an unsigned char, to the memory b. It returns b. Note that this function is almost exclusively used with c = 0, to set an area of memory to 0. To copy data from one block to another, we can use

void *memcpy(void *dst, const void *src, size_t n);

This function copies n bytes from the memory area src to the memory area dst. The memory blocks should not overlap. If they do overlap, then memmove() should be used instead.
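The following sketch shows one situation where the distinction between memcpy() and memmove() matters: shifting the contents of a buffer within itself. The function names are hypothetical, and the code assumes that all-bits-zero represents 0.0f, which holds on IEEE 754 systems.

```c
#include <string.h>

/* clear_floats(): set n floats to zero with memset().
   Assumes all-bits-zero is 0.0f (true on IEEE 754 systems). */
void clear_floats(float *buf, size_t n) {
    memset(buf, 0, n * sizeof(float));
}

/* shift_left(): discard the first k items of an n-item buffer and move
   the rest to the front. Source and destination overlap, so memmove()
   is used rather than memcpy(). */
void shift_left(float *buf, size_t n, size_t k) {
    if (k >= n) { clear_floats(buf, n); return; }
    memmove(buf, buf + k, (n - k) * sizeof(float));
    clear_floats(buf + (n - k), k);   /* zero the vacated tail */
}
```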

8.2 Dynamic Arrays
We can take advantage of the memory management functions provided by the C library to implement storage that can be dynamically resized. It is often the case that we need to expand an array according to changes in program state. We can thus design a module to provide this facility to our programs. For example, let’s consider a data structure to model a variable-size floating-point array:

typedef struct _dynarray {
    unsigned int size;    /* number of items in use */
    unsigned int length;  /* number of items allocated */
    float *array;
} dynarray;

With this, a dynamic array can be created to have a given initial size. The array should be allocated with some space to spare for future growth, and this is determined by the underlying length of the memory location we are using (Fig. 8.1).


This allows its size to grow without the need for reallocation, which can be an expensive operation. Under these conditions, the module can define a function that will create a dynamic array, as well as another one to release the allocated memory:

dynarray *dynarray_create(unsigned int size) {
    dynarray *p = (dynarray *) malloc(sizeof(dynarray));
    p->size = size;
    p->length = size * 2;
    p->array = (float *) calloc(p->length, sizeof(float));
    return p;
}

void dynarray_delete(dynarray *p) {
    free(p->array);
    free(p);
}

Fig. 8.1: Dynamic array. [Diagram labels: length, size.]

We also need to provide means to access the data (getter and setter functions). Since we are holding the size of the array, we can protect against fencepost errors (i.e. accessing beyond the array size):

float dynarray_get(unsigned int index, dynarray *p) {
    if (index < p->size) return p->array[index];
    else return 0.f;
}

void dynarray_set(unsigned int index, dynarray *p, float val) {
    if (index < p->size) p->array[index] = val;
}

Finally, we need to provide a means of resizing the array that will trigger a reallocation if we exceed the underlying memory space:

void dynarray_resize(unsigned int size, dynarray *p) {
    if (size < p->length) p->size = size;


    else {
        unsigned int oldlen = p->length;
        /* keep the old pointer until we know realloc() succeeded */
        float *tmp = (float *) realloc((void *) p->array,
                                       size * 2 * sizeof(float));
        if (tmp == NULL) return;  /* array is left unchanged */
        p->array = tmp;
        p->length = size * 2;
        /* clear the newly allocated space */
        memset((char *) (p->array + oldlen), 0,
               (p->length - oldlen)*sizeof(float));
        p->size = size;
    }
}

Note that we make sure the newly allocated space is cleared (set to 0), as we did in the dynarray_create() function (by using calloc()). With this module in place, we should have enough flexibility to manipulate arrays that need to grow (or indeed shrink).
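To see the module at work, the sketch below adds a hypothetical dynarray_append() function on top of it. The structure and the create, delete, and resize functions are condensed versions based on the text, repeated only so that the block compiles on its own; the append function is an illustration, not part of the original module.

```c
#include <stdlib.h>
#include <string.h>

/* Condensed definitions based on the module in the text. */
typedef struct _dynarray {
    unsigned int size;    /* items in use */
    unsigned int length;  /* items allocated */
    float *array;
} dynarray;

dynarray *dynarray_create(unsigned int size) {
    dynarray *p = (dynarray *) malloc(sizeof(dynarray));
    p->size = size;
    p->length = size * 2;
    p->array = (float *) calloc(p->length, sizeof(float));
    return p;
}

void dynarray_delete(dynarray *p) {
    free(p->array);
    free(p);
}

void dynarray_resize(unsigned int size, dynarray *p) {
    if (size < p->length) p->size = size;
    else {
        unsigned int oldlen = p->length;
        float *tmp = (float *) realloc(p->array, size * 2 * sizeof(float));
        if (tmp == NULL) return;
        p->array = tmp;
        p->length = size * 2;
        memset(p->array + oldlen, 0, (p->length - oldlen) * sizeof(float));
        p->size = size;
    }
}

/* dynarray_append() is a hypothetical addition: it grows the array by
   one item and stores val in the new last position. */
void dynarray_append(dynarray *p, float val) {
    unsigned int n = p->size;
    dynarray_resize(n + 1, p);  /* reallocates only when space runs out */
    if (n < p->size) p->array[n] = val;
}
```

Because the allocated length doubles each time it is exceeded, a long run of appends triggers only a few reallocations.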

8.3 Linked Lists As we have seen above, the combination of dynamic memory allocation and structures allows us to design a new data type that can be grown or shrunk. However, for some applications, array-style storage, where we use contiguous memory locations for each data object, is not always ideal. This is especially the case if we need to insert, delete, or reorder items. For these applications, we can avail of a linked list [29]. Each element of a linked list is defined by a structure that will normally hold two kinds of members: the data it holds and one or more link addresses (Fig. 8.2). These are used to connect elements together (hence the name) so that we can manage the list more cohesively.

Fig. 8.2: Linked list. [Diagram: elements linked in sequence, the last link pointing to NULL.]

For example, a singly-linked list of integers would look like this:

typedef struct _elem {
    int data;
    struct _elem *next;
} elem;


To create a list we start with an empty list¹:

elem *head = NULL;

¹ The NULL pointer is used to indicate that a pointer is not pointing to any address.

We can add items to the list (appending them):

elem *append_elem(elem *p, int data){
    elem *newp = (elem *) calloc(1,sizeof(elem));
    if(p != NULL) {
        /* find the last element */
        while(p->next != NULL) p = p->next;
        /* link the new element in */
        p->next = newp;
    }
    newp->data = data;
    return newp;
}

The function above returns a pointer to the last element of the list. Note the use of calloc(), which ensures that the structure pointers are reset at the start. It is also important to be able to delete each element (from the end of the list):

elem *remove_last(elem *p){
    elem *r = NULL;
    if(p != NULL){
        /* find the last element */
        while(p->next != NULL){
            r = p;
            p = p->next;
        }
        /* free the memory */
        free(p);
        /* unlink the deleted element */
        if(r != NULL) r->next = NULL;
    }
    return r;
}

This also returns the last element so we can keep track of the end of the list. When the only remaining element is removed, the function returns NULL, so we could use it to destroy the whole list (in a loop).

Lists are particularly flexible for inserting, as well as removing links, without the need to move elements around (Fig. 8.3). To do this, once we have created a new link, we only need to modify the links at the relevant position:

elem *insert_elem(elem *p, unsigned int pos, int data){

    if(p != NULL) {
        unsigned int n = 0;
        elem *newp = (elem *) calloc(1,sizeof(elem)), *head = p;
        newp->data = data;
        if(pos == 0) {
            /* insert before the first element: this is the new head */
            newp->next = p;
            return newp;
        }
        /* find the insert position */
        while(++n < pos && p->next != NULL) p = p->next;
        /* insert the element */
        newp->next = p->next;
        p->next = newp;
        return head;
    }
    else return NULL;
}

Fig. 8.3: Inserting a new item into a linked list.

The following program demonstrates these principles:

int main(){
    elem *head = NULL, *p;
    int i = 0;
    head = append_elem(head, 0);
    printf("head: %d \n", head->data);
    while(++i < 5) {
        p = append_elem(head, i);
        printf("added %d to list\n", i);
    }
    head = insert_elem(head, 2, -2);
    do printf("deleting %d from list \n", p->data);
    while((p = remove_last(head)) != NULL);
    return 0;


}

When this program is run, it will print the numbers appended to the list, insert one new element, and then print the numbers deleted from it. Note how we proceed by removing items from the end of the list, in this case:

$ ./list
head: 0
added 1 to list
added 2 to list
added 3 to list
added 4 to list
deleting 4 from list
deleting 3 from list
deleting 2 from list
deleting -2 from list
deleting 1 from list
deleting 0 from list

Other operations can be added to navigate, search, set and get elements, etc. The example provided here is of a singly-linked list, which is the simplest kind. It is also possible to add a double link (both forward and backward), which can be more useful for some applications. The principle of linked lists is very useful in applications where we want to work with a variable-size collection of data elements.
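As a sketch of the kind of additional operations mentioned above, the functions below count the elements of a list, search it for a value, and free it from the head onwards. The names list_length(), list_find(), and list_destroy() are hypothetical; the elem structure is repeated from the text so that the block compiles on its own.

```c
#include <stdlib.h>

typedef struct _elem {   /* as defined in the text */
    int data;
    struct _elem *next;
} elem;

/* counts the elements by following the links until NULL */
unsigned int list_length(const elem *p) {
    unsigned int n = 0;
    for (; p != NULL; p = p->next) n++;
    return n;
}

/* returns the first element holding data, or NULL if none does */
elem *list_find(elem *p, int data) {
    while (p != NULL && p->data != data) p = p->next;
    return p;
}

/* frees every element, starting from the head */
void list_destroy(elem *p) {
    while (p != NULL) {
        elem *next = p->next;  /* save the link before freeing */
        free(p);
        p = next;
    }
}
```

Freeing from the head visits each element once, whereas calling remove_last() in a loop walks the whole list for every element removed.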

8.4 Conclusions In this chapter, we have introduced some key mechanisms of memory management. We have seen that it is possible to access large quantities of memory space, from an area called the heap, to use them in a program. It is very important that we are careful when allocating space that we avoid leaks, areas of unused or unreachable memory that we have reserved for our programs but never managed to release. We have also seen how dynamic memory allocation can be used in the creation of linked lists that can grow and shrink as required. Memory management will also be very important when we start dealing with file data, in the next chapter. We will see that in many applications we need to set aside specific portions of memory to copy data into for processing. Since we might not know how much of it we need, we will have to use dynamic memory allocation.


Problems
8.1. Write a program that takes in any number of non-negative integers as command-line arguments and sorts them in ascending order. Use dynamic memory allocation and arrays, check for valid inputs, and free the memory when finished.
8.2. Write a monophonic sine wave synthesis program that will read a sequence of pitches in Hz from the standard input and play them in a sequence (each one of them lasting for one second). Use a linked list to store the pitch data and check for EOF (ctl-d) to signal the end of input.

Chapter 9

File Input and Output

Abstract This chapter expands our means of input and output by introducing file operations defined by the standard C library. We first look at formatted text output and then explore the principles of generic binary file access. The chapter concludes with an application example of file IO that is supported by the sound and music computing system Csound.

As with other types of IO, file access is not provided directly by the C language. This type of service relies on libraries or system calls provided by the OS. The low-level form of file access in UNIX-like systems is given by the functions open() (declared in fcntl.h), read(), and write() (both declared in unistd.h). This is often not portable to other platforms. However, where the C standard library is present, we can use a higher-level interface provided by that library, which is more programmer friendly (and portable). This chapter will concentrate on the major functions for file manipulation found in the standard C library.

9.1 Standard C Library File IO
All file IO functions, data structures, macros and type definitions in the C library are defined in stdio.h along with the other standard IO functions we have already seen. They provide means for reading and writing text files and/or, more generally, binary data files, such as sound and MIDI files.

The C standard [24] defines that any IO operation, whether it is directed to or from various types of hardware, or from files on storage devices, is mapped through logical data streams. Two distinct types of mapping are identified, text and binary. The latter is an ordered sequence of characters that matches the internal data used by the computer, whereas the former is a line-oriented sequence of characters, each line being made up of zero or more characters terminated by a newline character. Implementations may or may not distinguish between these two types, but they are commonly treated separately. A stream can also have an orientation, which may be byte-oriented or wide-oriented. The orientation of a stream is determined by the first use of either a byte-oriented IO function, or a wide-character IO function. In this book, we will only discuss byte-oriented streams.

Independently of the type of file we want to open, we use fopen():

FILE *fopen(const char *filename, const char *mode);

This function opens a file stream defined by a FILE structure. The name of the file to be opened is filename (which must be a valid name). The mode string determines how the file may be accessed [24]:

• "r": open file for reading.
• "r+": open for reading and writing.
• "w": truncate to zero length or create text file for writing.
• "w+": open for reading and writing. The file is created if it does not exist, otherwise it is truncated.
• "a": open for appending, write-only. The file is created if it does not exist.
• "a+": open for appending, read and write. The file is created if it does not exist.

The stream is positioned at the beginning of the file for the reading and writing modes and at the end of the file for the appending modes. The C standard [24] also asks for the inclusion of the letter b in the case of opening non-text (binary) files. Some implementations do not require this, making no distinction between file types. The standard also provides for an exclusive mode, denoted by the letter x, which will require a new file to be created for the writing modes ("wx", "w+x"), and will make the function return an error if the file already exists. With a file opened using the append mode, all subsequent writes to the file are forced to the then current end of file, regardless of any calls to fseek() or similar functions (see Sect. 9.3.1 for these).

If the open operation is successful, fopen() returns a valid file stream handle. This FILE* handle will be used with all other functions that operate on the open file and it is opaque, i.e. should not be touched or changed directly. If fopen() fails, it returns a NULL pointer so the return value must always be checked for this. For example,

FILE *fp;
if ((fp = fopen("myfile", "r")) == NULL) {
    printf("Error opening file\n");
}

To close a file, we use fclose(), whose prototype is

int fclose(FILE *fp);

This function closes the file stream associated with fp, which must be a valid handle previously obtained using fopen(), and disassociates the stream from the file. The fclose() function returns 0 if successful and EOF (the end-of-file constant) if an error occurs. Any open file streams are also closed when the main() function returns.
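The open, check, and close sequence described above can be sketched with the append mode as an example. The function name append_line() is hypothetical, and it relies on fprintf(), which is introduced in Sect. 9.2.

```c
#include <stdio.h>

/* append_line() is a hypothetical helper: it appends one line of text
   to the named file, creating the file if it does not exist.
   Returns 0 on success and -1 on failure. */
int append_line(const char *filename, const char *text) {
    int err;
    FILE *fp = fopen(filename, "a");   /* writes always go to the end */
    if (fp == NULL) return -1;         /* always check fopen() */
    err = (fprintf(fp, "%s\n", text) < 0);
    if (fclose(fp) == EOF) err = 1;    /* fclose() can also fail */
    return err ? -1 : 0;
}
```

Calling the function twice on the same file name leaves both lines in the file, since the "a" mode never truncates existing contents.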


The OS provides three open file streams that can be used with the file-writing or reading functions. These correspond to the standard input, stdin (open for reading); the standard output, stdout (open for writing); and the standard error, stderr. Programs should not open or close these streams, as they are provided by the system.

9.2 Text File Functions
A number of functions are provided for text file IO. First we have fputs() and fgets(), which write and read a string to and from a file, respectively. Their prototypes are:

int fputs(const char *str, FILE *fp);
char *fgets(char *str, int num, FILE *fp);

The fputs() function writes the string str to the file stream fp. It returns EOF if an error occurs and a non-negative value if successful. The null that terminates str is not written. The fgets() function reads characters from the file fp into a string str until num-1 characters have been read, a newline character is encountered, or the end of the file is reached. The string is null-terminated and the newline character is retained. The function returns str if successful, or NULL if an error occurs or if the end of the file is reached before any characters are read.

Single-character functions are also available:

int fputc(int c, FILE *fp);
int fgetc(FILE *fp);

where the character in c gets written into the stream after conversion to unsigned char, both functions returning the character written or read, or EOF if an error occurred (or the end of the file was reached). A character can also be pushed back into the stream using

int ungetc(int c, FILE *fp);

and subsequent reads to the stream after calls to this function will retrieve the pushed characters in reverse order.

The two remaining text IO functions are fprintf() and fscanf(). These functions operate in a similar fashion to printf() and scanf() except that they work with files. Their prototypes are

int fprintf(FILE *fp, const char *fmt, ...);
int fscanf(FILE *fp, const char *fmt, ...);

and they write to and read from an open file stream, respectively. Note that the calls

fscanf(stdin, fmt, ...);
fprintf(stdout, fmt, ...);

are equivalent to scanf() and printf(), respectively, since they use the stdin and stdout streams for input and output.
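Since fgets() retains the newline character, code that processes text line by line often strips it. The sketch below does this in a hypothetical read_line() wrapper, using strcspn() from string.h to locate the newline; this is one possible approach, not something prescribed by the text.

```c
#include <stdio.h>
#include <string.h>

/* read_line() is a hypothetical wrapper around fgets(): it reads one
   line (at most num-1 characters) into str and removes the newline
   that fgets() retains. Returns str, or NULL at end of file or error. */
char *read_line(char *str, int num, FILE *fp) {
    if (fgets(str, num, fp) == NULL) return NULL;
    str[strcspn(str, "\n")] = '\0';   /* cut at the newline, if any */
    return str;
}
```

A typical use is a loop of the form while (read_line(buf, sizeof buf, fp) != NULL), which ends when the stream has no more data.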


The following is a simple example of a text-writing program:

#include <stdio.h>
#include <string.h>

int main(int argc, char **argv) {
    FILE *fp;
    char buffer[1024];
    if (argc < 2) {  /* a file name is required */
        printf("usage: %s filename \n", argv[0]);
        return 1;
    }
    fp = fopen(argv[1], "w");
    if (fp != NULL) {
        printf(" Type in your text (use 'end' to finish) \n");
        do {
            /* read one word, leaving room for the terminating '\0' */
            if (scanf("%1023s", buffer) != 1) break;
            if (strcmp("end", buffer) == 0) break;
            fprintf(fp, "%s ", buffer);
        } while (1);
        fclose(fp);
        return 0;
    }
    else printf("could not open the file %s \n", argv[1]);
    return 1;
}

9.3 Direct File IO Functions
The standard C library includes two general-purpose direct file IO functions, fread() and fwrite(). These functions can read and write any type of data. Their prototypes are:

size_t fread(void *buffer, size_t size, size_t num, FILE *fp);
size_t fwrite(const void *buffer, size_t size, size_t num, FILE *fp);

The fread() function reads num items from the file fp, each of them size bytes long, into buffer. It returns the number of items actually read. If this value is 0, no objects have been read. The fwrite() function does the opposite of fread(). It writes num items to the file fp, each item size bytes long, from buffer. It returns the number of items written. This value will be less than num only if an output error has occurred. The buffer argument in these functions holds the address of a block of memory with enough space to hold the data that will be read or written.
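A short round trip illustrates how the item counts returned by these functions can be used. The helper names write_floats() and read_floats() below are hypothetical, and the files are opened with the b mode letter as recommended for binary data.

```c
#include <stdio.h>

/* write_floats() is a hypothetical helper: writes n floats from buf to
   the named file and returns the number of items actually written. */
size_t write_floats(const char *filename, const float *buf, size_t n) {
    size_t written;
    FILE *fp = fopen(filename, "wb");  /* b: binary stream */
    if (fp == NULL) return 0;
    written = fwrite(buf, sizeof(float), n, fp);
    fclose(fp);
    return written;                    /* less than n signals an error */
}

/* read_floats() is a hypothetical helper: reads up to n floats into
   buf and returns the number of items actually read. */
size_t read_floats(const char *filename, float *buf, size_t n) {
    size_t got;
    FILE *fp = fopen(filename, "rb");
    if (fp == NULL) return 0;
    got = fread(buf, sizeof(float), n, fp);  /* may be less than n */
    fclose(fp);
    return got;
}
```

Asking read_floats() for more items than the file holds is not an error: the return value simply reports how many were available.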


9.3.1 Reading/Writing Position
We can position the file stream reading/writing position to the start of the file using rewind(). Its prototype is

void rewind(FILE *fp);

It is possible to place the stream pointer at a certain position in bytes in a file, by using

int fseek(FILE *fp, long offset, int whence);

This will position the read/write pointer at the offset position (in bytes), relative to the value of the whence parameter, which can be one of:
1. SEEK_SET: the offset is the absolute position from the beginning of file.
2. SEEK_CUR: the offset is the position from the current read/write pointer position.
3. SEEK_END: the offset is calculated in relation to the end of the file. The offset can then be negative or positive (extending the length of the file).
The function returns 0 if successful, or a non-zero value if not. We can find the current position by using

long ftell(FILE *fp);

The position of a stream can also be manipulated via fgetpos() and fsetpos():

int fgetpos(FILE *restrict fp, fpos_t *restrict pos);
int fsetpos(FILE *fp, const fpos_t *pos);

These work with an opaque object pos of type fpos_t, which is unspecified. The first function records stream positions and the second can set the stream to an earlier recorded position. It is not possible to increment or decrement the position given by fgetpos(), but we can use it to position the stream with fsetpos().
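A common use of fseek() and ftell() together is to find the size of a file, sketched below with the hypothetical helper file_size(). This assumes a system where seeking to the end of a binary stream is supported, which is the case on the usual desktop platforms but is not guaranteed by the C standard.

```c
#include <stdio.h>

/* file_size() is a hypothetical helper: returns the size in bytes of
   the named file, or -1 on error. */
long file_size(const char *filename) {
    long size = -1;
    FILE *fp = fopen(filename, "rb");
    if (fp == NULL) return -1;
    if (fseek(fp, 0L, SEEK_END) == 0)  /* move to the end of the file */
        size = ftell(fp);              /* position = number of bytes */
    fclose(fp);
    return size;
}
```

For a file of 32-bit floats, dividing the result by sizeof(float) gives the number of samples it holds.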

9.3.2 Error Reporting
Diagnostics on IO operations are provided by three functions:

int feof(FILE *fp);
int ferror(FILE *fp);
void perror(const char *s);

The first of these reports on the end-of-file (EOF) indicator for the stream, whereas the second checks for the error indicator, both returning non-zero if these are set, or zero if not. The final function prints an error message to the standard error stream (stderr), with an optional prefix message taken from the string s. This message will be relevant to the latest IO operation attempted by the program.
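These functions are typically consulted after a read loop has stopped, to find out why. In the sketch below, the hypothetical function count_floats() reads binary floats until fread() returns a short count and then checks the error indicator.

```c
#include <stdio.h>

/* count_floats() is a hypothetical helper: it counts the binary floats
   remaining in an open stream. Returns the count, or -1 if reading
   stopped because of an error rather than the end of the file. */
long count_floats(FILE *fp) {
    float f;
    long n = 0;
    while (fread(&f, sizeof(float), 1, fp) == 1) n++;
    if (ferror(fp)) {            /* the error indicator is set */
        perror("count_floats");  /* the message goes to stderr */
        return -1;
    }
    /* no error: reading stopped because the end-of-file indicator is set */
    return n;
}
```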


9.4 File System Functions
The standard C library also includes means to manipulate the file system, so that programs can remove, rename, or create temporary files. Under the stdio.h header file, we have:
1. The remove() function, which deletes a file, preventing any subsequent access to it:
   int remove(const char *filename);
2. The rename() function, which changes the name of a file from old to new:
   int rename(const char *old, const char *new);
3. The tmpfile() function, which creates and opens a temporary file in mode wb+. This file is removed when the stream is closed:
   FILE *tmpfile(void);
According to the standard [24], it should be possible to open at least TMP_MAX temporary files. This constant is defined in the header file.

9.5 Programming Examples In this section, we look at two examples of file reading and writing. The first is the implementation of a text-to-binary conversion program. This is followed by a computer-aided composition application that is designed to work with the Csound [7, 39] software.

9.5.1 The tobin Program
We now present the code for the tobin program, with which, in Chapter 4, we were able to convert a stream of audio data as text-character floats into a sequence of binary numbers (32-bit floats). The input is read from stdin and the output is written to stdout (Fig. 9.1). The code to realise this is minimal: it takes data from the input until the stream is finished (EOF) and places it in the output, one number at a time:

#include <stdio.h>

int main(){
    float f;
    while(fscanf(stdin, "%f", &f) > 0)
        fwrite(&f, sizeof(float), 1, stdout);
    return 0;
}

Fig. 9.1: ASCII to binary conversion in tobin. [Diagram: stdin (text), fscanf(), float f, fwrite(), stdout (binary).]

9.5.2 External Score Generation for Csound
Csound is a sound and music computing system and a domain-specific language [34], which can be used in a variety of ways. One of these is to furnish a numeric score for its instruments [36] to perform. Scores, alongside the sound synthesis code the system uses, are provided via XML-like script files called CSD files. We can configure these to call an external score generator program to provide a new numeric score every time we run the CSD file through the system [39]. This allows us to use the C language directly in computer-aided or algorithmic composition applications.

This is done using the bin attribute of the score tag in the CSD file (as demonstrated below). This attribute names an external executable which is expected to take in an input text file name as its first argument, and to write to another text file whose name is the second argument. Csound will invoke this user-supplied program passing these files as arguments. The input file will receive the contents of the score section of a CSD file. This allows the program to receive any text parameters defined there. The output of the program has to be a score in the standard numeric format, which is written to the file named as the second argument. Csound will then use this file as its score.

In the example below, the program will look for a single floating-point number in the score. With this in hand it will write 10 lines, each one containing an i-statement [39] that will run instrument 1 defined in Csound code. The input parameter is used as the starting pitch (in octave.pitchclass notation) of a chromatic-scale sequence:

#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[]){


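    int i;
    FILE *fp;
    float f = 0.f;  /* starting pitch, used if no value can be read */
    if(argc < 3) {  /* input and output file names are required */
        fprintf(stderr, "usage: %s infile outfile \n", argv[0]);
        return 1;
    }
    if((fp = fopen(argv[1], "r")) != NULL){
        fscanf(fp, "%f", &f);
        fclose(fp);
    }
    if((fp = fopen(argv[2], "w")) != NULL) {
        for(i=0; i < 10; i++) {
            fprintf(fp, "i1 %d %d %f %f \n", i, 1, 0.1+i/10.0, f+i/100.);
        }
        fclose(fp);
    }
    else fprintf(stderr, "could not open file \n");
    return 0;
}

If the program above is compiled to a command named scoret, then the following Csound CSD code can be used with it:

<CsoundSynthesizer>
<CsOptions>
-odac
</CsOptions>
<CsInstruments>
0dbfs=1
instr 1
 out oscili(p4,cpspch(p5))
endin
</CsInstruments>
<CsScore bin="./scoret">
8.00
</CsScore>
</CsoundSynthesizer>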

9.6 Conclusions
This chapter has introduced the fundamental means to manipulate text and binary files in a program. We saw how they are opened for reading or writing, and how we can get or store data from or to them. We saw that the OS provides three special streams that we can use to write to the standard IO in the same way as we write to files, and we demonstrated this in our tobin program, which we used in Chapter 4 to convert from text to a binary representation so that our synthesis data could be read by a sound editor. We will see in the next chapter how we can do this directly via soundfiles.

Problems 9.1. Write a program that writes the command-line arguments to a file called test.txt. 9.2. Write a program that can open a text file (such as test.txt above) and print its contents to the terminal. 9.3. Write a version of the tobin program that reads from a file and writes to another. Take the names of the input and output files from the command line. 9.4. Write a version of the sine wave synthesis program in Chapter 6 that writes directly to a binary file. Take as arguments the frequency and the output filename.

Chapter 10

Soundfiles

Abstract The specific case of soundfile IO is discussed in detail in this chapter. Some principles of digital audio are outlined: sampling, digital-to-analogue and analogue-to-digital conversion, data precision, channels, and basic operations. To complement this discussion, a widely used library for soundfile IO, libsndfile, is introduced. In this chapter, we will be discussing the basic aspects of sound storage in computer files. Soundfiles are very important for music programming, as they provide a medium for manipulating audio in a computer. Historically, they were the first type of support for computer music and until very recently they were the typical means of input and output for a sound-generating program. Soundfiles provide a way of implementing computer musical signal processing in a platform/device-independent way, without the need to consider more complex issues relating to realtime performance, audio device access, etc.

10.1 Digital Audio
We have seen in the examples of sound synthesis developed in Chapters 4 and 6 that an audio waveform is treated by the computer as a sequence of numbers defining it at regular points in time. This is a type of digital encoding called pulse code modulation (PCM) [36]. In addition to this, there are other ways to represent an audio waveform in digital form, but these are not generally used directly in audio synthesis and processing. Some of them are designed for data compression, reducing the size of the information that is required to be stored or transmitted. In these applications, data is converted from PCM into one of these formats as needed (and back to PCM for manipulation). The process of encoding a waveform into a digital form is called analogue-to-digital conversion, and its converse is digital-to-analogue conversion.

PCM encoding provides us with a transparent and straightforward way to treat a waveform. It is based on the principle of periodic sampling (Fig. 10.1), that is, taking measurements of a waveform at regular time intervals, and quantising (Fig. 10.2), which is finding an output number that will best represent the instantaneous value of the waveform at the sampling point. Each number of a digital signal is called a sample, and a sequence of these will make up a digital waveform that is the computational form of the real-world acoustic waveform. This sequence can take many forms: floating-point numbers, integers, ASCII-encoded (text) or binary. In the case of the programs we developed earlier on, we were using a text encoding of floating-point numbers, which we then translated to binary for storage or playback. The binary form is the most common way of handling digital audio, although, as we have seen, text can be used as well, for simplicity (and portability).
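The quantising step can be sketched in code for one particular case. The function below assumes floating-point input samples in the range −1.0 to 1.0 and a 16-bit integer output; the name quantise16(), the scaling by 32767, and the clipping of out-of-range values are all choices made for this illustration.

```c
/* quantise16() is a hypothetical helper: maps a sample in the range
   -1.0 to 1.0 to the nearest 16-bit integer step, clipping any input
   that falls outside that range. */
short quantise16(float x) {
    float scaled = x * 32767.f;
    if (scaled > 32767.f) scaled = 32767.f;
    if (scaled < -32768.f) scaled = -32768.f;
    /* round to the nearest integer, away from zero at the midpoint */
    return (short) (scaled >= 0.f ? scaled + 0.5f : scaled - 0.5f);
}
```

The difference between the scaled input and the integer returned is the quantisation error, which is at most half a step for in-range samples.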

Fig. 10.1: Sampling a waveform (adapted from [36]).

A program that can open binary files for reading and writing can be used to manipulate digital audio data directly. However, interpreting the contents of a digital audio signal will depend on some knowledge about its characteristics. In particular, three aspects are significant:
1. How often the samples are taken: the sampling frequency.
2. How the samples are encoded: the sample precision.
3. How many channels the audio signal carries.

10.1.1 Sampling Frequency

The fundamental parameter that defines how we are supposed to interpret an audio signal is the sampling frequency, or rate. This is actually a form of playback speed: how fast the different numbers are supposed to exit the computer through the DAC. In synthesis, this will also determine the pitch of a signal, since changing the sampling rate will speed up or slow down the playback. Normally, the sampling rate is set as a constant, and we can then calculate all other parameters in relation to it. We determine it in terms of samples per second (also written as Hz). The CD standard demands a sampling frequency of 44,100 Hz, but it is also common to see higher rates such as 48,000 and 96,000 Hz used in production settings.

The choice of sampling frequency has two implications:

1. In accordance with the Sampling Theorem [58, 49], it determines the frequency range of a system. No signal with frequencies over half the sampling rate can be encoded properly in a digital signal. Any such signals will be aliased to frequencies below this threshold; that is, they will be indistinguishable from other signals originally present at those frequencies.
2. The storage and data processing rate will increase with the sampling frequency. Higher rates will demand more storage space, faster processing, faster transmission, etc.

The frequency threshold of half the sampling rate is known as the Nyquist frequency and it is a very important constant in digital signal processing. The range of frequencies below this threshold is also known as the digital baseband [61].

10.1.2 Sample Precision

Digital audio samples can be encoded in integral or floating-point formats [36]. The type of encoding will determine how much precision is available to the quantiser to represent the sample. For instance, 8-bit integers can be used to hold 256 different values. The quantising stage of the ADC will divide the range of values of a waveform between its minimum and maximum into however many regions are available in a given format (see the example in Fig. 10.2, where for a 5-bit number there are 32 distinct regions). This discretisation process will be more error prone if there are fewer steps, and the result will include a higher level of noise [66]. Integral encoding precision is determined directly by the number of bits, and the maximum signal-to-noise ratio is roughly defined as 6 dB per bit, improving as we increase the size of storage (e.g. 48 dB for 8 bits, 96 dB for 16 bits, 144 dB for 24 bits) [48, 61]. The performance of floating-point encoding is generally at least as good as 24-bit integer for single precision, and much better in the case of double precision [36]. Note also that increasing the number of bits in each sample will require more capacity for the storage of an audio signal block.

Fig. 10.2: Linear quantisation into 32 regions (5-bit samples).

For integral encodings, the maximum amplitude of a signal will also vary according to the number of bits employed. For instance, for 8 bits, the maximum absolute amplitude of a bipolar signal is 128 (a range of −128 to 127). In the case of 16 bits, this maximum is 32768. In the case of floating-point formats, the amplitude is always expected to be in the normal range of −1.0 to 1.0. This is another reason why it is preferable to handle audio signals as floating-point numbers, which can then ultimately be scaled and converted to one of the integer formats if required.

10.1.3 Audio Channels

Finally, audio signals can hold one or more independent channels. When it comes to computing a multichannel stream, there are two ways we may treat it:

1. In an interleaved form, whereby each sample point refers to a frame of samples, one for each channel.
2. As completely separate single-channel data, held in separate locations (non-interleaved).

The first form is fairly common. In this case the audio stream is made up of a sequence of frames, and the sampling frequency then refers to a frame rate. Each frame is composed of a series of samples in ascending channel order. In this case, if we want to access channel n of N channels (counting channels from 0), we need to start with an offset of n samples and then pick every Nth sample after that.
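The access rule for interleaved data can be sketched as follows. The function name extract_channel() and its parameters are assumptions made for illustration; channels are counted from zero, so the sample for channel chn in frame i sits at index i*nchnls + chn.

```c
#include <stddef.h>

/* Hypothetical helper: copy one channel (chn, counted from 0)
   out of an interleaved buffer holding nframes frames of
   nchnls samples each. */
void extract_channel(const float *interleaved, float *out,
                     size_t nframes, int nchnls, int chn) {
    for (size_t i = 0; i < nframes; i++)
        out[i] = interleaved[i * nchnls + chn];  /* offset chn, stride nchnls */
}
```

Applied to a stereo buffer, a call with chn set to 1 collects every second sample starting from index 1, which is the right channel.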


10.2 Basic Operations on Signals

Some basic operations can be summarised as follows:

• Gain: gain scaling, or changing the amplitude of an audio signal, is done by applying a multiplier (called the gain) to each sample in the stream, e.g.

    out[n] = in[n] * gain;

  If the gain value changes (slowly) over time, we can have an amplitude envelope (or modulation, if the variation is periodic).

• Mix: mixing signals is equivalent to adding them together (summing):

    out[n] = in1[n] + in2[n];

• Stereo pan: to place a signal between two stereo channels, we can apply proportionally different amplitudes to each. This is called amplitude panning. For instance, to place a signal at the left speaker, we apply 1 and 0 to L and R samples, respectively. For a midway placement, we use 0.5 and 0.5. A simple pan control between 0 and 1 (L and R) could be implemented with this code (Fig. 10.3):

    left[n] = in[n]*(1.0 - pan);
    right[n] = in[n]*pan;

  While this algorithm does not provide equal-power panning from centre to left or centre to right, it demonstrates the principle of amplitude panning in a simple way.

Fig. 10.3: Simple amplitude panning flowchart (the input is multiplied by (1 − p) to produce the L channel and by p to produce the R channel).

When scaling or mixing two or more streams we have to be careful that the resulting signal does not exceed the maximum amplitude for the given sample format.
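One way to guard against this is to track the peak of the result while mixing, so that the caller can decide whether to rescale. The following is a sketch under that assumption; mix_block() and its parameters are hypothetical names, and the signals are taken to be normalised floating-point samples, for which a peak above 1.0 would indicate an overload.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: mix two blocks of n samples with separate
   gains and return the peak absolute amplitude of the result. */
float mix_block(const float *in1, const float *in2, float *out,
                size_t n, float g1, float g2) {
    float peak = 0.f;
    for (size_t i = 0; i < n; i++) {
        out[i] = in1[i] * g1 + in2[i] * g2;  /* gain scaling and summing */
        float a = fabsf(out[i]);
        if (a > peak) peak = a;              /* keep the largest magnitude */
    }
    return peak;
}
```

If the returned peak exceeds 1.0, the block can be mixed again with smaller gains (for example, both divided by the peak) before being converted to an integer format.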


10.2.1 A Synthesis Example

The following example shows a simple synthesis program, which will generate a single-channel soundfile containing a fixed-frequency sine wave using the method demonstrated in Section 6.5. It illustrates the principles of digital signals outlined above and produces a soundfile directly, as shown in Listing 10.1. Note that we generate data in blocks rather than single samples, as it is in general more efficient to do so [13].

Listing 10.1: Soundfile sine wave synthesis program.

    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char** argv){
      FILE *fpout;           // output file pointer
      float *audioblock;     // audio memory pointer
      int end, i, j;         // dur in frames, counter vars
      int sr = 44100;        // sampling rate
      int blockframes = 441; // audio block size in frames
      unsigned int ndx = 0;  // phase index for synthesis
      float dur, freq;       // duration, frequency
      double twopi;          // 2*PI

      if(argc != 4) {
        printf("usage: %s outfile dur freq \n", argv[0]);
        exit(-1);
      }
      /* command line parameters */
      dur = atof(argv[2]);
      freq = atof(argv[3]);
      /* set the value of 2*PI */
      twopi = 8*atan(1.);
      /* set the total duration in frames */
      end = (int)(dur*sr);
      /* open the file (binary mode) */
      fpout = fopen(argv[1], "wb");
      /* allocate memory */
      audioblock = (float *) malloc(sizeof(float)*blockframes);
      /* this is the synthesis loop */
      for(i=0; i < end; i+=blockframes){
        for(j=0; j < blockframes; j++, ndx++){
          /* calculate the samples of a sinewave */
          audioblock[j] = (float)(0.5*sin(ndx*twopi*freq/sr));
        }
        /* write to the output */
        fwrite(audioblock, sizeof(float), blockframes, fpout);
      }
      /* de-allocate memory and close file */
      free(audioblock);
      fclose(fpout);
      return 0;
    }

In order to interpret the audio data stored in the resulting file, we need to provide the sampling rate, the encoded format, and the number of interleaved channels in the stream, as well as the byte order (44100, 32-bit little-endian float, 1). Without this information, it is hard to interpret the raw data.
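To make the dependence on this external information concrete, the sketch below computes the duration of a raw file of 32-bit floats. The function raw_float_duration() is a hypothetical helper, and the sampling rate and channel count must be supplied by the caller, since nothing in the file itself records them.

```c
#include <stdio.h>

/* Hypothetical helper: duration in seconds of a raw file of
   32-bit floats, given an assumed sampling rate and channel
   count. Returns -1.0 if the file cannot be examined. */
double raw_float_duration(const char *path, int sr, int nchnls) {
    FILE *fp = fopen(path, "rb");
    long bytes;
    if (fp == NULL) return -1.0;
    if (fseek(fp, 0, SEEK_END) != 0) {   /* move to the end of the data */
        fclose(fp);
        return -1.0;
    }
    bytes = ftell(fp);                   /* file size in bytes */
    fclose(fp);
    if (bytes < 0) return -1.0;
    /* bytes per second = sample size * channels * sampling rate */
    return (double) bytes / (sizeof(float) * nchnls * sr);
}
```

The same file yields a different answer for each set of assumptions: 176,400 bytes correspond to one second of mono audio at 44,100 Hz, but only half a second if the data is taken to be stereo.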

10.2.2 Byte Order

Raw soundfile data is generally not portable across multiple platforms. As we have seen in Chapter 2 (Sect. 2.1.1), multi-byte numbers can be stored in different byte orders, depending on the hardware. Little-endian ordering puts the least significant byte (LSB) first and then the remaining bytes in increasing order of significance. Big-endian ordering puts the most significant byte (MSB) first and the other bytes in decreasing significance order. This is yet another reason to avoid the use of raw data as the sole means of audio storage.
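A short sketch of how a program might deal with byte order is given below. The names is_little_endian() and swap32() are assumptions for illustration: the first inspects the first byte of a known 16-bit value to detect the host ordering, and the second reverses the four bytes of a 32-bit word, which is the operation needed when raw 32-bit data was written on a machine of the opposite ordering.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 on a little-endian host,
   0 on a big-endian one. */
int is_little_endian(void) {
    uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);   /* look at the byte stored first */
    return first == 1;           /* LSB first means little-endian */
}

/* Hypothetical helper: reverse the byte order of a 32-bit word. */
uint32_t swap32(uint32_t x) {
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}
```

To apply swap32() to float samples, the bytes of each float can be copied into a uint32_t with memcpy(), swapped, and copied back.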

10.2.3 Self-Describing Soundfile Formats

The fact that sample data is meaningless without any information as to how it represents a digital signal points to the need for additional elements to be stored with the sound itself. So far, we have been handling raw soundfiles, because we know what to expect from the sample data. However, if we want to make our soundfiles more flexible and portable, we will need to use a self-describing soundfile format. This will store, along with the audio, information about the sampling rate, the number of channels, the sample width (precision), the number of sample frames in the file and other useful information. Each soundfile type will also imply a certain byte ordering, which will be adopted across all platforms. Programs handling these formats will have to be prepared to read and write all this extra information alongside the audio data in a standard binary form for a particular format. Supporting the huge variety of soundfile types that are available to users is a significant issue for software developers.

10.3 The libsndfile Library

The best way to handle different file formats is to use a dedicated library that can manipulate them seamlessly. Currently, libsndfile [40] is one of the best such libraries, supporting several soundfile types with a transparent interface. All the different elements that make up the various formats are hidden away and the library provides a unified way of accessing all of them. There is no need to write code that targets a specific format, as the library will take care of that for us.

10.3.1 Opening Files

The libsndfile application programming interface (API) provides a single function to open files for reading or writing. This takes a name (or full path) string, an opening mode SFM_READ, SFM_WRITE or SFM_RDWR, and a pointer to an existing SF_INFO variable (defined, alongside all libsndfile functions, in sndfile.h):

    SNDFILE *sf_open(const char *path, int mode, SF_INFO *sfinfo);

It returns an opaque pointer¹ to a SNDFILE structure. The reading or writing operations will depend heavily on the contents of the SF_INFO variable, whose type is the following structure:

    typedef struct SF_INFO {
      sf_count_t frames;
      int samplerate;
      int channels;
      int format;
      int sections;
      int seekable;
    } SF_INFO;

¹ Opaque here means we will use it as a black box only, not accessing its contents directly.


Each call to the open function should refer to a separate instance of this data structure. If we are to open a file for reading, then we need to pass a pointer to an empty variable of this type, which will then be filled with information on the various parameters from the data in the file. If we are opening a file for writing, then we need to fill the variable with the desired values for its members before calling sf_open().

Not all structure members are relevant to our discussion here. We need only be concerned with samplerate (sampling frequency), channels, and format. While the first two are self-evident and will carry the values for sampling frequency and number of channels, the third requires some further explanation. The format, in the case of libsndfile, is a code to determine two things: (a) the soundfile format we want to write, or are reading, and (b) the sample and encoding format used in storage. The first corresponds to the major format and the second, to the subtype. We combine these options together using a bitwise OR (|). The following list comprises a selection of the most important formats and subtypes supported by libsndfile:

• Major formats:

    SF_FORMAT_WAV        /* Microsoft WAV */
    SF_FORMAT_AIFF       /* Apple/SGI AIFF format */
    SF_FORMAT_AU         /* Sun/NeXT AU format */
    SF_FORMAT_RAW        /* RAW PCM data. */
    SF_FORMAT_PAF        /* Ensoniq PARIS file format. */
    SF_FORMAT_SVX        /* Amiga IFF / SVX8 / SV16 format. */
    SF_FORMAT_NIST       /* Sphere NIST format. */
    SF_FORMAT_VOC        /* VOC files. */
    SF_FORMAT_IRCAM      /* Berkeley/IRCAM/CARL */
    SF_FORMAT_W64        /* Sonic Foundry's 64 bit RIFF/WAV */
    SF_FORMAT_MAT4       /* Matlab (tm) V4.2/GNU Octave 2.0 */
    SF_FORMAT_MAT5       /* Matlab (tm) V5.0/GNU Octave 2.1 */
    SF_FORMAT_PVF        /* Portable Voice Format */
    SF_FORMAT_XI         /* Fasttracker 2 Extended Instrument */
    SF_FORMAT_HTK        /* HMM Tool Kit format */
    SF_FORMAT_SDS        /* Midi Sample Dump Standard */
    SF_FORMAT_AVR        /* Audio Visual Research */
    SF_FORMAT_WAVEX      /* MS WAVE with WAVEFORMATEX */
    SF_FORMAT_SD2        /* Sound Designer 2 */
    SF_FORMAT_FLAC       /* FLAC lossless file format */
    SF_FORMAT_CAF        /* Core Audio File format */

• Subtypes:

    SF_FORMAT_PCM_S8     /* Signed 8 bit data */
    SF_FORMAT_PCM_16     /* Signed 16 bit data */
    SF_FORMAT_PCM_24     /* Signed 24 bit data */
    SF_FORMAT_PCM_32     /* Signed 32 bit data */
    SF_FORMAT_PCM_U8     /* Unsigned 8 bit data (WAV/RAW) */
    SF_FORMAT_FLOAT      /* 32 bit float data */
    SF_FORMAT_DOUBLE     /* 64 bit float data */
    SF_FORMAT_ULAW       /* U-Law encoded. */
    SF_FORMAT_ALAW       /* A-Law encoded. */
    SF_FORMAT_IMA_ADPCM  /* IMA ADPCM. */
    SF_FORMAT_MS_ADPCM   /* Microsoft ADPCM. */
    SF_FORMAT_GSM610     /* GSM 6.10 encoding. */
    SF_FORMAT_VOX_ADPCM  /* OKI / Dialogix ADPCM */
    SF_FORMAT_G721_32    /* 32kbs G721 ADPCM encoding. */
    SF_FORMAT_G723_24    /* 24kbs G723 ADPCM encoding. */
    SF_FORMAT_G723_40    /* 40kbs G723 ADPCM encoding. */
    SF_FORMAT_DWVW_12    /* 12 bit Delta Width Var Word */
    SF_FORMAT_DWVW_16    /* 16 bit Delta Width Var Word */
    SF_FORMAT_DWVW_24    /* 24 bit Delta Width Var Word */
    SF_FORMAT_DWVW_N     /* N bit Delta Width Var Word */
    SF_FORMAT_DPCM_8     /* 8 bit differential PCM */
    SF_FORMAT_DPCM_16    /* 16 bit differential PCM */

A WAVE file with float (single precision) encoding is defined by the following format code:

    sfinfo.format = SF_FORMAT_WAV | SF_FORMAT_FLOAT;

10.3.2 Reading and Writing

The libsndfile reading and writing functions are defined in two ways:

• By the type of audio data buffer we are supplying to them.
• By how we are counting the data, in samples or in frames.

Tables 10.1 and 10.2 list the names of the functions for each of these categories. Their general form is

    sf_count_t sf_xxxxx_type(SNDFILE *sf, type *data, sf_count_t n);

where xxxxx determines whether it is a write or a read function, and whether we are counting in frames or samples. The argument sf is a handle to an open soundfile, data is an array from which we will read or to which we will write, and n is the size of the data in samples or frames, depending on the specific function employed. The read/write functions will return the number of samples or frames read/written, as sf_count_t, which is an integer type defined by the library to hold values up to SF_COUNT_MAX.


Table 10.1: libsndfile reading functions

    type      samples             frames
    short     sf_read_short()     sf_readf_short()
    int       sf_read_int()       sf_readf_int()
    float     sf_read_float()     sf_readf_float()
    double    sf_read_double()    sf_readf_double()

Table 10.2: libsndfile writing functions

    type      samples             frames
    short     sf_write_short()    sf_writef_short()
    int       sf_write_int()      sf_writef_int()
    float     sf_write_float()    sf_writef_float()
    double    sf_write_double()   sf_writef_double()

As we noted in Sect. 10.1.2, floating-point data will default to the normalised (−1.0, 1.0) range, whereas the two integer formats will have a range that depends on their minimum and maximum signed values. Regardless of the type of data we are using when reading or writing, libsndfile will make sure it is converted correctly to the format and ranges defined by the subtype we are using in storage. It is also possible to configure the behaviour of libsndfile so that the floating-point range is not normalised by default.

10.3.3 Seeking

It is possible to move the reading or writing position to any existing position in the file. We can do this using the sf_seek() function, which will offset the current position in a similar way to fseek(), but specifically in relation to the start of the audio data:

    sf_count_t sf_seek(SNDFILE *sndfile, sf_count_t frames, int whence);

The offset is always calculated in frames, and the whence parameter can be either SEEK_SET, SEEK_CUR or SEEK_END, determining that the offset refers to the start of the waveform data, the current position, or the end of the data, respectively.


10.3.4 An Example Program

The following program opens an input soundfile and pans it into a stereo output. The input and output formats will be the same, except for the number of channels. The program checks for a minimum number of arguments (three plus the program name), that both files have been opened, and that the input is mono. If one of these conditions is not true, it will exit with an error message. The processing core is composed of this loop:

    do {
      cnt = sf_readf_double(fin, inbuf, bframes);
      for(i = j = 0; i < cnt; i++) {
        outbuf[j++] = inbuf[i] * (1. - pan);
        outbuf[j++] = inbuf[i] * pan;
      }
      sf_writef_double(fout, outbuf, cnt);
    } while (cnt > 0);

where we read a number of frames of one channel into the array inbuf, which is the input buffer. As we have seen in Sect. 6.5, this is a block of memory we use to keep data in temporarily before processing. Then we enter an inner loop, which processes every single sample of the input, placing it in the two channels of the output buffer, scaled appropriately to implement the amplitude panning (Fig. 10.3). Note that while the input buffer counts using the variable i, the output uses j, which increases by two in each iteration of this loop. The output buffer is written to the open file. We only process and output as many frames as we have read (cnt). Once the input data is exhausted, the program frees the memory, closes the files and exits. The full program is shown in Listing 10.2.

Listing 10.2: Soundfile panning program.

    #include <stdio.h>
    #include <stdlib.h>
    #include <sndfile.h>

    int main(int argc, const char *argv[]){
      const int bframes = 512;   /* buffer size */
      double *inbuf, *outbuf;    /* buffers */
      SNDFILE *fin, *fout;       /* file ptrs */
      /* format info, zeroed before opening */
      SF_INFO info_in = {0}, info_out = {0};
      double pan;                /* pan position */
      if(argc > 3) {
        if((fin = sf_open(argv[1], SFM_READ, &info_in)) != NULL) {
          if(info_in.channels == 1) {
            info_out.format = info_in.format;
            info_out.samplerate = info_in.samplerate;
            info_out.channels = 2;
            if((fout = sf_open(argv[2], SFM_WRITE, &info_out)) != NULL) {
              sf_count_t cnt, i, j;
              inbuf = (double *) calloc(bframes, sizeof(double));
              outbuf = (double *) calloc(bframes*2, sizeof(double));
              pan = atof(argv[3]);
              do {
                cnt = sf_readf_double(fin, inbuf, bframes);
                for(i = j = 0; i < cnt; i++) {
                  outbuf[j++] = inbuf[i] * (1. - pan);
                  outbuf[j++] = inbuf[i] * pan;
                }
                sf_writef_double(fout, outbuf, cnt);
              } while (cnt > 0);
              free(inbuf);
              free(outbuf);
              sf_close(fin);
              sf_close(fout);
            } else {
              sf_close(fin);
              printf("ERROR: could not open %s \n", argv[2]);
              return 1;
            }
          } else {
            sf_close(fin);
            printf("ERROR: input %s not mono\n", argv[1]);
            return 1;
          }
        } else {
          printf("ERROR: could not open %s \n", argv[1]);
          return 1;
        }
      } else {
        printf("usage: %s input output pan \n", argv[0]);
        return 1;
      }
      return 0;
    }


Compiling and linking

Since we are now using external libraries, and not only the C standard library, we have to tell the compiler where to find the headers and the library. To compile and link to libsndfile, first we need to know where it is installed. If it is in the system directories, then we only need to add the linker flag -lsndfile, which will cause the program to be linked to the library routines. If, however, the library is not installed there, we need to indicate where its files are to be found. For headers, we can give a directory to be searched with -I /path/to/includes, where /path/to/includes should be replaced by the path to the directory where sndfile.h is located. For library binaries, we need to do the same, but using the -L flag instead. For example, if the library is installed in /usr/local/lib and the headers in /usr/local/include, the full command will be

    $ cc -o pan pan.c -I/usr/local/include \
         -L/usr/local/lib -lsndfile

10.4 Conclusions

The libsndfile API is very well documented; its website www.mega-nerd.com/libsndfile contains excellent reference documentation on the programming interface. We strongly advise readers to refer directly to this information as a complement to the basic principles outlined in this chapter. Since the library is always evolving, the details of any slight change in the interface or addition of new features will be fully documented there. With this library under our belt, we are now ready to start writing complete offline applications to process audio. This capacity will be enhanced by realtime audio, which will be explored in the next chapter.

Problems

10.1. Write a program that synthesises two sine waves of different frequencies lasting one second, each one panned midway from centre to the left and right sides, producing a raw binary soundfile with 16-bit 44,100 Hz samples. The program should take three arguments: filename, left frequency and right frequency.

10.2. Write a program using libsndfile that changes the gain of an input file, writing a new file as its output.

10.3. Write a program using libsndfile that mixes the two channels of a stereo file into a mono file output.

10.4. Write a program for mixing soundfiles, with the following characteristics:

(a) Accepting any soundfile formats supported by libsndfile.
(b) Taking only uncompressed PCM format, in any (integer or floating-point) precision (8-bit (signed/unsigned), 16-bit, 24-bit, 32-bit, floats, doubles).
(c) Accepting only matching sampling rate values (print an error message otherwise).
(d) Producing stereo files from mono and/or stereo input files; mono files should be panned, stereo files are mixed as they are.
(e) Expecting a mix gain to be set for each soundfile.

Chapter 11

Realtime Audio

Abstract This chapter discusses the fundamental aspects of realtime audio programming and access to sound devices. Two widely used APIs are introduced and contrasted: Portaudio and the Jack Connection Kit. Programming examples are offered for each, demonstrating realtime processing in C.

Realtime audio synthesis and processing depend on a number of components of a computer system:

1. Hardware: the right kind of hardware, containing a fast central processing unit and peripherals that can feed a digital-to-analogue converter with enough data to ensure an uninterrupted audio stream. For audio processing, we also need an analogue-to-digital converter, which will provide the source data for computation. The hardware should ideally provide very small latencies (time delays) between input and output, on the order of a few milliseconds. Some latency is inevitable as data is processed in blocks rather than in single units (samples), but it should be minimal.
2. Software: an operating system that can communicate with the ADC/DAC with very little latency, which depends on fast and flexible switching of tasks, sometimes also referred to as realtime preemption; and a suitable API to allow programmers to write applications that access the audio hardware (soundcards/devices) directly.

In order to provide realtime audio capabilities to a program, we will need to call on the services of system libraries that allow access to the audio devices. These are platform-dependent: each OS will provide a different library to do the low-level device communication. In Linux, this is normally done by the ALSA (Advanced Linux Sound Architecture) subsystem. In MacOS, the CoreAudio and AudioUnit frameworks are responsible for this functionality. These libraries are also called hardware audio layers (HALs), as they work very closely with the OS components that manage the audio devices. Programs can use these services directly or use higher-level APIs that provide an intermediary layer. The advantage of operating at this higher level is that the APIs will most likely be implemented across various platforms. In this case, we do not need to rewrite any of the audio IO code when porting a program from one OS to another. Among these APIs, we can cite Portaudio [5] and the Jack Connection Kit [14].

© Springer Nature Switzerland AG 2019 V. Lazzarini, Computer Music Instruments II, https://doi.org/10.1007/978-3-030-13712-0_11

Regardless of the choice of API, we will see that a number of things are constant across different systems. Audio signals are, as we have seen in Chapter 10, a sequence of frames of sample data. These will be produced as a stream by the soundcard at the rate of fs frames per second (where fs is the sampling frequency). Each sample in a frame will be encoded in some way, as an integer or floating-point number, depending on the system options available. The job of a realtime audio program is to pick up this data stream, process it as efficiently as possible and then send a corresponding audio signal to the output device. The program has to deliver enough data, at a speed that must exceed the sampling rate, to keep the stream continuous, without gaps. If it cannot keep up, the result will be drop-outs: the output buffer will contain silent or garbage frames that will interrupt the audio waveform with clicks and pops.

Buffering (i.e. placing the audio data in memory blocks for processing) is required for continuous and smooth IO operation. Generally speaking, the larger the buffer size, the less likely it is that the stream will be interrupted by gaps in computation. On the other hand, buffering introduces a degree of latency between input and output, and for true realtime operation, we should attempt to limit this to a minimum. IO latencies of over 20 ms are likely to be perceived by users, depending on the type of processing applied. The amount of buffering required will depend significantly on the computation load, on the OS, and on the audio subsystem. A well-tuned Linux or MacOS computer should be capable of achieving latencies close to the millisecond mark.
We can determine the latency l introduced by buffering as the total number of buffer frames in the input (n) and the output (m) divided by the sampling frequency f_s:

$$ l = \frac{m + n}{f_s} \tag{11.1} $$
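As a worked instance of Eq. 11.1, the sketch below computes the buffering latency in seconds. The function name buffer_latency() is an assumption for illustration, and the result covers only the program's own buffers.

```c
/* Hypothetical helper: buffering latency in seconds for n input
   frames and m output frames at the sampling frequency fs (Hz),
   following Eq. 11.1. */
double buffer_latency(unsigned long n, unsigned long m, double fs) {
    return (double)(m + n) / fs;
}
```

With 441 frames of buffering in each direction at 44,100 Hz, the total is 882 frames, which gives a latency of 0.02 s (20 ms), the figure quoted above as the point where delays become noticeable.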

In addition to this latency, which is attributed to the program code, there can be other latencies introduced by software and hardware buffers in the layers below the user code. Generally speaking, accessing the HAL directly should minimise these, but that will depend on the OS and its audio subsystem.

11.1 Portaudio

In this section, we will introduce Portaudio¹ as an example of a cross-platform realtime audio IO library. This API allows users to write programs that can take advantage of various lower-level audio host libraries implemented across different OSs. It also supports interfacing with other higher-level systems such as Jack (in both Linux and MacOS) and Pulseaudio (Linux) (Fig. 11.1). The Portaudio functions, constants, and data structures are defined in its public header portaudio.h, which should be included in any source code employing them.

¹ http://www.portaudio.com.

Fig. 11.1: Portaudio and its underlying APIs (MS Windows APIs, Jack, ALSA and Pulseaudio on Linux, Coreaudio on MacOS).

Prior to its use, we initialise the library with a call to Pa_Initialize(). If this call is successful, we can go ahead and call other functions. The library defines a type PaError for error codes, and the constant paNoError indicates success:

    PaError err;
    err = Pa_Initialize();
    if(err == paNoError)
      printf("Portaudio initialised\n");

If, on the other hand, an error is returned, we can retrieve a diagnostic error string with Pa_GetErrorText(err):

    else
      printf("%s \n", Pa_GetErrorText(err));

11.1.1 Listing Devices

The library provides a means of listing existing devices in a system. We can get the total number of logical devices (which are mapped to existing physical audio devices) with Pa_GetDeviceCount(). Devices may be configured for input or output only, or both (bidirectional). By checking the number of channels in a device, we can tell whether it is capable of one or both directions. This is one of the fields in the PaDeviceInfo structure,

    typedef struct PaDeviceInfo {
      int structVersion;
      const char *name;
      PaHostApiIndex hostApi;
      int maxInputChannels;
      int maxOutputChannels;
      PaTime defaultLowInputLatency;
      PaTime defaultLowOutputLatency;
      PaTime defaultHighInputLatency;
      PaTime defaultHighOutputLatency;
      double defaultSampleRate;
    } PaDeviceInfo;

which holds other characteristics of a given logical device. We can query each device listed by calling Pa_GetDeviceInfo() and passing the device number to it. The following code demonstrates this:

    ndev = Pa_GetDeviceCount();
    for(i=0; i < ndev; i++){
      info = Pa_GetDeviceInfo(i);
      if(info->maxOutputChannels > 0)
        printf("output device: ");
      if(info->maxInputChannels > 0)
        printf("input device: ");
      printf("%d: %s\n", i, info->name);
    }

From this list and the information provided, it is possible to choose one of the devices by selecting its numerical index. The functions Pa_GetDefaultInputDevice() and Pa_GetDefaultOutputDevice() can also be used to retrieve the indices of the respective default input and output devices.

11.1.2 Stream Parameters

Before opening a device, we will need to configure it with the desired stream parameters. This determines the characteristics of the audio signals we are going to be processing. The parameters include the chosen device number, number of channels, sample format, and estimated latency, and are kept in a PaStreamParameters data structure:

    typedef struct PaStreamParameters {
      PaDeviceIndex device;
      int channelCount;
      PaSampleFormat sampleFormat;
      PaTime suggestedLatency;
      void *hostApiSpecificStreamInfo; /* NULL */
    } PaStreamParameters;

For example, if we wish to select the default devices for mono, using a single-precision floating-point data format, and with a buffer containing bufframes frames, the data structures should be filled as follows:

    PaStreamParameters inparam, outparam;
    inparam.device = Pa_GetDefaultInputDevice();
    inparam.channelCount = 1;
    inparam.sampleFormat = paFloat32;
    /* cast before dividing, to avoid integer division */
    inparam.suggestedLatency = ((PaTime) bufframes)/sr;
    inparam.hostApiSpecificStreamInfo = NULL;
    outparam.device = Pa_GetDefaultOutputDevice();
    outparam.channelCount = 1;
    outparam.sampleFormat = paFloat32;
    outparam.suggestedLatency = ((PaTime) bufframes)/sr;
    outparam.hostApiSpecificStreamInfo = NULL;

Stream parameters are defined separately for the input and output streams. Note that by employing a float format we imply that the audio data will range from −1.0 to 1.0.

11.1.3 Opening Devices

We call Pa_OpenStream() to open devices for input and/or output, passing to it the address of an opaque pointer to PaStream, which is the stream handle:

    PaError Pa_OpenStream(PaStream** stream,
                const PaStreamParameters *inputParameters,
                const PaStreamParameters *outputParameters,
                double sampleRate,
                unsigned long framesPerBuffer,
                PaStreamFlags streamFlags,
                PaStreamCallback *streamCallback,
                void *userData );

This is slightly different from what we have seen with other similar functions (like, for instance, sf_open(), where the function returns an opaque handle), but it works in a similar way. By passing the pointer address (a pointer to a pointer), we allow the function to fill it with the correct pointer value and the net result is the same: we end up with a handle for use in other functions. As can be seen, the function returns an error code, and for this reason it has been designed to provide the handle via a pointer. The other parameters in Pa_OpenStream() are, in order:

• Stream parameters for input and output, respectively. Devices can be opened for input, output, or both. By supplying stream parameters for input and/or output, we are determining how we want the streams to be opened. By passing a NULL instead of an address to a PaStreamParameters variable, we are choosing not to open the device for a given direction.
• Sample rate.
• Buffer size in frames.
• Stream options, via constants that can be combined with a bitwise OR.
• Callback, a function that will be invoked to process input and/or output buffers. This is used only in asynchronous mode; otherwise it is set to NULL.
• Callback user data, a data structure that will be passed to the callback function; it can be NULL if the callback is not defined.

For example, the following call opens devices for input and output streams, passing sr and frames as the sampling rate and buffer size, respectively. It does not use any special stream options and its IO mode is defined as synchronous (no callback required).

    PaStream *handle;
    err = Pa_OpenStream(&handle, &inparam, &outparam,
                        sr, frames, paNoFlag,
                        NULL, NULL);

On return, the function will give an error code, which should be checked before proceeding. If successful, the call will place a valid stream handle in handle. This can be used to start audio IO through the following code line:

    err = Pa_StartStream(handle);
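The pointer-to-pointer idiom used by Pa_OpenStream() can be sketched in isolation. The following code is not Portaudio: Stream, open_stream(), close_stream() and the error constants are all hypothetical names, chosen only to show how a function can report an error through its return value while delivering a handle through an argument.

```c
#include <stdlib.h>

typedef struct Stream Stream;       /* handle type, used as a black box */
struct Stream { double sr; };       /* hypothetical contents */

enum { STREAM_OK = 0, STREAM_BAD_RATE = -1, STREAM_NO_MEMORY = -2 };

/* Returns an error code; on success, places the new handle in
   the pointer whose address was passed in. */
int open_stream(Stream **stream, double sr) {
    Stream *s;
    if (sr <= 0.0) return STREAM_BAD_RATE;
    s = malloc(sizeof *s);
    if (s == NULL) return STREAM_NO_MEMORY;
    s->sr = sr;
    *stream = s;                    /* fill the caller's pointer */
    return STREAM_OK;
}

/* Releases the resources held by a handle. */
void close_stream(Stream *stream) {
    free(stream);
}
```

A caller declares a Stream pointer, passes its address to open_stream(), and checks the returned code before using the handle, which mirrors the sequence shown above for Pa_OpenStream().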

11.1.4 Synchronous Mode

The synchronous IO operation is very similar to what we have seen for file reading and writing. It is sometimes called the push form of audio IO. Functions are provided to take data from a stream and place it in a buffer, and conversely, to put the contents of a buffer into a stream. They take the handle, a pointer to the buffer, and the number of frames in it:

PaError Pa_ReadStream(PaStream* stream, void *buffer,
                      unsigned long frames);
PaError Pa_WriteStream(PaStream* stream, const void *buffer,
                       unsigned long frames);

The following code shows a direct-through example, where the data is copied from the input to the output without any changes. It can be used to test the IO of a system, as well as to give an aural indication of the latencies involved. The function Pa_GetStreamTime() can be used to get the current stream time in seconds and check if we have reached the end of processing. Since the origin of this clock is unspecified, we record the time at the start and measure the elapsed time against it:

PaTime start = Pa_GetStreamTime(handle);
while(Pa_GetStreamTime(handle) - start < duration){
  err = Pa_ReadStream(handle, buffer, frames);
  if(err == paNoError){
    err = Pa_WriteStream(handle, buffer, frames);
    if(err != paNoError)
      printf("%s \n", Pa_GetErrorText(err));
  } else printf("%s \n", Pa_GetErrorText(err));
}

Synchronous mode is blocking: the program will not continue until the read or write operation has returned (this is also the behaviour in file IO). It is less responsive and requires more buffering than the asynchronous mode, resulting in longer latencies.
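To get a feel for the latencies mentioned above, it can help to work out how much time one buffer represents. The sketch below is only a back-of-the-envelope aid: buffer_latency_ms() is a hypothetical helper, and the real latency of a system also depends on the driver and hardware, which are not modelled here.

```c
/* Duration of one buffer of 'frames' sample frames at rate 'sr' (Hz),
   in milliseconds. This is only the buffering delay; driver and
   hardware delays are not included. */
double buffer_latency_ms(unsigned long frames, double sr) {
    return 1000.0 * (double) frames / sr;
}
```

At 44100 Hz, a 128-frame buffer lasts a little under 3 ms, while a 4096-frame buffer lasts nearly 93 ms, which is the kind of difference that becomes audible in a direct-through test.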

11.1.5 Asynchronous Mode

Using a callback is non-blocking and tends to be the recommended way to implement low-latency realtime audio. It is also called pull mode, because the system will seek audio data when it needs it, rather than have it supplied regularly by a program. At the core of this method, we have an audio callback whose signature is defined by

typedef int PaStreamCallback(const void *input, void *output,
                             unsigned long frameCount,
                             const PaStreamCallbackTimeInfo* timeInfo,
                             PaStreamCallbackFlags statusFlags,
                             void *userData);


where we have as arguments the input and output data buffers, the number of frames in these buffers, a timestamp indicating the stream time of the buffer data, status flags, and a pointer to a user data structure variable, which is used to communicate between the program and the callback. The callback is executed in a separate thread², which is started and managed by Portaudio. Since this thread is running under the same process as the main program thread, it will share resources with it, such as memory. The equivalent direct-through processing is implemented by the following callback:

int audio_callback(const void *input, void *output,
                   unsigned long frameCount,
                   const PaStreamCallbackTimeInfo *timeInfo,
                   PaStreamCallbackFlags statusFlags,
                   void *userData){
  unsigned long i;
  float *inp = (float*) input, *outp = (float*) output;
  for(i=0; i < frameCount; i++)
    outp[i] = inp[i];
  return paContinue;
}

The callback should be written in such a way that it does not block execution and does not perform any operations that might be too onerous, such as memory allocation, printing to the terminal, or reading from and writing to files. We call this approach realtime safe. As a rule of thumb, we should only use code that involves signal processing computation, so that the callback can be invoked regularly without compromising the continuous operation of input and output. Any other types of action should be placed in a different thread (e.g. the main program thread). If communication is needed between the callback and the rest of the program, it should be done in a non-blocking way to ensure smooth realtime operation, as will be demonstrated by examples in this and later chapters.

Note that when employing an asynchronous IO approach, we will need to provide means of keeping the program open while audio processing is happening. This is because we are not directly calling the IO function, but instead the audio subsystem is, in parallel to what is happening in the main() function. As we have noted above, the two parts of the program are run on separate threads (the main program and the Portaudio IO callback thread). If the program falls through the main() function, for instance, it will exit before there is a chance for the callback to start processing. As we have seen before, a program will start at the top of this function and finish when it returns, so we have to delay reaching the end until we are ready to quit. A simple means of achieving this is to have a simple empty loop (maybe with a call to usleep()³ to avoid excessive use of resources) that checks for time elapsed, measured from the stream time at the start:

PaTime start = Pa_GetStreamTime(handle);
while(Pa_GetStreamTime(handle) - start < duration)
  usleep(1000);

² Threads are sections of code that are made to execute in parallel. The audio callback function is an example of a separate thread that is started and managed by the Portaudio library. There is also dedicated support for programs to do this in their own code if required. This is provided by the pthread library [22, 26].
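One common way of meeting the non-blocking communication requirement mentioned in this section is a single-writer, single-reader ring buffer. The sketch below is an illustration under stated assumptions, not the method used in the book's later examples: it relies on C11 atomics (stdatomic.h), the names RingBuf, rb_init(), rb_push() and rb_pop() are hypothetical, and it assumes exactly one thread pushes (e.g. the main program) and exactly one thread pops (e.g. the audio callback).

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RB_SIZE 256   /* capacity in values; must be a power of two */

typedef struct ringbuf {
    float data[RB_SIZE];
    atomic_size_t wpos;   /* advanced only by the writer thread */
    atomic_size_t rpos;   /* advanced only by the reader thread */
} RingBuf;

void rb_init(RingBuf *rb) {
    atomic_init(&rb->wpos, 0);
    atomic_init(&rb->rpos, 0);
}

/* Writer side (e.g. main thread). Never blocks:
   returns false if the buffer is full. */
bool rb_push(RingBuf *rb, float v) {
    size_t w = atomic_load_explicit(&rb->wpos, memory_order_relaxed);
    size_t r = atomic_load_explicit(&rb->rpos, memory_order_acquire);
    if (w - r == RB_SIZE) return false;
    rb->data[w & (RB_SIZE - 1)] = v;
    /* publish the value before making the new position visible */
    atomic_store_explicit(&rb->wpos, w + 1, memory_order_release);
    return true;
}

/* Reader side (e.g. audio callback). Never blocks:
   returns false if nothing is available. */
bool rb_pop(RingBuf *rb, float *v) {
    size_t r = atomic_load_explicit(&rb->rpos, memory_order_relaxed);
    size_t w = atomic_load_explicit(&rb->wpos, memory_order_acquire);
    if (r == w) return false;
    *v = rb->data[r & (RB_SIZE - 1)];
    atomic_store_explicit(&rb->rpos, r + 1, memory_order_release);
    return true;
}
```

In a callback, one would typically call rb_pop() in a loop at the start of each block to pick up any new parameter values, and simply carry on with the previous value when it returns false. Whether the atomic operations are truly lock-free depends on the platform and is worth checking on the target system.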

11.1.6 Closing Up

The following sequence of calls can be used to stop processing, close the devices, and terminate the use of the library:

Pa_StopStream(handle);
Pa_CloseStream(handle);
Pa_Terminate();

11.1.7 The todac Program

In Chapter 6, we discussed a program that took ASCII samples from the standard input and placed them directly in the audio device. This program can easily be implemented using Portaudio, following the principles outlined above. It uses the synchronous/blocking IO mode, since it is more suited for picking data using a function such as fscanf(), which is itself blocking. This program can be used with any software that generates floating-point samples as text. It can take as parameters the desired number of channels and sampling rate, which should match what the input stream contains.

The full code for the program is shown in Listing 11.1. As with libsndfile, we need to pass the name of the library, as well as its location, to the compiler in the command line. With the library installed, the flag for Portaudio is -lportaudio. Assuming the library exists in /usr/local/, the command line will then be:

$ cc -o todac todac.c -I/usr/local/include \
     -L/usr/local/lib -lportaudio

³ A system call defined in unistd.h that suspends processing for a number of microseconds (1 second = 1000000 microseconds).

Listing 11.1: The todac program.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>   /* for memset() */
#include <portaudio.h>
#include <math.h>
#define BUFFRAMES 4096

void usage() {
  fprintf(stderr, "usage: todac [sr] [channels] < input\n");
  exit(1);
}

int main(int argc, const char* argv[]){
  PaError err;
  PaStreamParameters outparam;
  PaStream *handle = NULL;
  int chn=1, bufsize, sr=44100, dev;
  float *buf;
  if(argc > 1) sr = atoi(argv[1]);
  if(argc > 2) chn = atoi(argv[2]);
  if(argc > 3) usage();
  err = Pa_Initialize();
  if(err == paNoError){
    dev = Pa_GetDefaultOutputDevice();
    bufsize = BUFFRAMES*chn;   /* buffer size in samples */
    buf = (float *) malloc(sizeof(float)*bufsize);
    memset(buf, 0, sizeof(float)*bufsize);
    outparam.device = (PaDeviceIndex) dev;
    outparam.channelCount = chn;
    outparam.sampleFormat = paFloat32;
    outparam.suggestedLatency = (PaTime) (BUFFRAMES/(double)sr);
    outparam.hostApiSpecificStreamInfo = NULL;
    /* the buffer size is given to Portaudio in frames */
    err = Pa_OpenStream(&handle, NULL, &outparam,
                        sr, BUFFRAMES, paNoFlag,
                        NULL, NULL);
    if(err == paNoError){
      err = Pa_StartStream(handle);
      if(err == paNoError){
        long cnt, i;
        do {
          cnt = 0;
          /* count only the samples that were actually read;
             stop at the end of input or on malformed text */
          for(i = 0; i < bufsize; i++) {
            if(fscanf(stdin, "%f", &buf[i]) != 1) break;
            cnt++;
          }
          if(cnt > 0) {
            err = Pa_WriteStream(handle, buf, cnt/chn);
            if(err != paNoError)
              printf("write error: %s \n", Pa_GetErrorText(err));
          } else break;
        } while(cnt > 0);
        Pa_StopStream(handle);
      } else printf("%s \n", Pa_GetErrorText(err));
      Pa_CloseStream(handle);
    } else printf("%s \n", Pa_GetErrorText(err));
    free(buf);
    Pa_Terminate();
  } else printf("%s \n", Pa_GetErrorText(err));
  return 0;
}

Note that because we are using fscanf() amongst the realtime audio output processing, this program is not realtime safe. If that function is not provided with input for a long period, we will have interruptions in the audio stream. However, in the simple applications for which it is designed, it performs reasonably well, and it has the advantage of being conceptually very simple.
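The input-reading step of todac can be isolated and tested on its own. The function below is a small sketch of that step, with the hypothetical name read_block(); it counts only the values that fscanf() actually converts, since fscanf() returns EOF (a negative value) at the end of input rather than zero.

```c
#include <stdio.h>

/* Reads up to n whitespace-separated floats from fp into buf.
   Returns the number of values actually converted, which is
   less than n (possibly 0) at end of input or on malformed text. */
long read_block(FILE *fp, float *buf, long n) {
    long cnt = 0;
    while (cnt < n && fscanf(fp, "%f", &buf[cnt]) == 1)
        cnt++;
    return cnt;
}
```

With this helper, an output loop can keep writing blocks while read_block() returns a positive count, passing the count divided by the number of channels as the number of frames to write.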

11.1.8 An Audio Effect

The next example implements an audio effect: amplitude modulation (or tremolo) [15, 36]. The principle is straightforward; we take in an audio signal and multiply it by a sine waveform. This makes the amplitude of the signal vary according to the modulating wave. If the sine wave frequency is in the audio range (> 20 Hz), we will have an amplitude modulation effect, which results in the sum and difference of the input signal and sine wave frequencies. If the frequency is in the sub-audio range, we will hear a tremolo (fluctuating amplitude). The amount of modulation is controlled by an amplitude parameter a (Fig. 11.2). If this is 1, we have the full effect. If it is 0, we have just the original input. The expression implementing this is

\[ y(t) = x(t)\,\bigl(1 - a\,(0.5 + 0.5\sin(2\pi f_m t))\bigr) \tag{11.2} \]

where a is the effect amplitude in the [0,1] range and f_m the modulation frequency (Fig. 11.3).

Fig. 11.2: Tremolo effect with a = 0 (black dots), a = 0.5 (blue), and a = 1 (red), using a sine wave input.

Fig. 11.3: Tremolo effect flowchart.

This example employs a callback to enable low-latency IO and realtime safety. All of the processing is implemented in this function. It uses a user data structure UDATA to get the parameters from the main program and also to store the sine wave generator time index from call to call:

int audio_fn(const void *input, void *output,
             unsigned long frameCount,
             const PaStreamCallbackTimeInfo *timeInfo,
             PaStreamCallbackFlags statusFlags,
             void *userData){
  unsigned long i;
  UDATA *p = (UDATA *) userData;
  float *inp = (float*) input, *outp = (float*) output;
  float fr = p->freq;
  float amp = p->amp;
  float sr = p->sr;
  unsigned long n = p->n;
  for(i=0; i < frameCount; i++, n++)
    outp[i] = inp[i]*(1 - amp*(0.5 + 0.5*sin(n*TWOPI*fr/sr)));
  p->n = n;
  return paContinue;
}

Note that the callback uses only signal processing code and that the operation is fully non-blocking, as per the realtime requirement. The only function call is to sin(), which incurs very little computational overhead. The full program is shown in Listing 11.2. It can be built with the following compiler options:


$ cc -o tremolo tremolo.c -I/usr/local/include \
     -L/usr/local/lib -lportaudio

Listing 11.2: Tremolo program.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>   /* for usleep() */
#include <portaudio.h>
#include <math.h>

typedef struct udata {
  float amp;        // effect amplitude
  float freq;       // effect frequency
  float sr;         // sampling rate
  unsigned long n;  // time ndx
} UDATA;

int usage();
int audio_fn(const void *input, void *output,
             unsigned long frameCount,
             const PaStreamCallbackTimeInfo *timeInfo,
             PaStreamCallbackFlags statusFlags,
             void *userData);

int main(int argc, const char *argv[]){
  PaError err;
  PaStreamParameters inparam, outparam;
  PaStream *handle = NULL;
  PaTime start;
  int chn = 1, frames = 128, sr = 44100;
  float duration;
  UDATA parms;
  if(argc > 3) {
    parms.amp = atof(argv[1]);
    parms.freq = atof(argv[2]);
    parms.sr = sr;
    parms.n = 0;
    duration = atof(argv[3]);
  } else return usage();
  err = Pa_Initialize();
  if(err == paNoError){
    inparam.device = Pa_GetDefaultInputDevice();
    outparam.device = Pa_GetDefaultOutputDevice();
    inparam.channelCount = outparam.channelCount = chn;
    inparam.sampleFormat = outparam.sampleFormat = paFloat32;
    inparam.suggestedLatency = outparam.suggestedLatency =
      (PaTime) (frames/(double) sr);
    inparam.hostApiSpecificStreamInfo =
      outparam.hostApiSpecificStreamInfo = NULL;
    err = Pa_OpenStream(&handle, &inparam, &outparam,
                        sr, frames, paNoFlag,
                        audio_fn, &parms);
    if(err == paNoError){
      err = Pa_StartStream(handle);
      if(err == paNoError){
        /* the stream clock has an unspecified origin,
           so measure elapsed time from the start */
        start = Pa_GetStreamTime(handle);
        while(Pa_GetStreamTime(handle) - start < duration)
          usleep(1000);
        Pa_StopStream(handle);
      } else printf("%s \n", Pa_GetErrorText(err));
      Pa_CloseStream(handle);
    } else printf("%s \n", Pa_GetErrorText(err));
    Pa_Terminate();
  } else printf("%s \n", Pa_GetErrorText(err));
  return 0;
}

#define TWOPI 6.283185307179586

int audio_fn(const void *input, void *output,
             unsigned long frameCount,
             const PaStreamCallbackTimeInfo *timeInfo,
             PaStreamCallbackFlags statusFlags,
             void *userData){
  unsigned long i;
  UDATA *p = (UDATA *) userData;
  float *inp = (float*) input, *outp = (float*) output;
  float fr = p->freq;
  float amp = p->amp;
  float sr = p->sr;
  unsigned long n = p->n;
  for(i=0; i < frameCount; i++, n++)
    outp[i] = inp[i]*(1 - amp*(0.5 + 0.5*sin(n*TWOPI*fr/sr)));
  p->n = n;
  return paContinue;
}

int usage() {
  fprintf(stderr, "usage: tremolo amp freq dur\n");
  return 1;
}

As indicated by the message in the usage() function, the program takes three arguments: the amplitude, the frequency, and the duration (in seconds), the last of which determines how long the program will run for. Any process can have its execution interrupted by sending a SIGINT signal to it, by typing the Ctrl-C key sequence at the terminal. Thus, if we wish, we can also stop the tremolo program in this way, before its run time has elapsed.

11.2 The Jack Connection Kit

The Jack Connection Kit⁴ is another cross-platform API for audio IO. It is well supported on UNIX-like systems (Linux, MacOS), and available on Windows, although its status on that platform is not as firmly established. Jack was originally designed to overcome the shortcomings of the lower-level audio API on Linux (ALSA), which was never very well designed to work as a user-level programming interface. In addition to this, Jack also provides a very robust inter-application routing mechanism. This in fact has become its most popular feature, allowing users to connect diverse programs together and use the system as a virtual studio. It has become the de facto standard for realtime IO in professional audio applications on Linux, and, to a lesser extent, on MacOS. In fact, in systems where Jack is present, Portaudio can also use it as one of its listed device sources and destinations.

Jack works as a client-server system. Applications that want to provide audio IO connect to the server, registering input or output ports. These are then made available to all other clients running in the system. Connections can be made programmatically in the client programs, or via patching (see Fig. 11.4), using a graphical user interface or a text-based command-line program. A fully-functional API⁵ is provided for clients that are to be linked to the Jack library (-ljack). In the following sections, we outline the basic operations for starting clients, registering and connecting ports, and processing audio.

⁴ http://www.jackaudio.org.
⁵ See http://jackaudio.org/api/index.html for its full reference manual.


Fig. 11.4: Jack patcher window on MacOS (JackPilot).

11.2.1 Opening a Client

A client program can connect to the server through the jack_client_open() function, defined along with the rest of the API in jack.h:

jack_client_t* jack_client_open(const char *client_name,
                                jack_options_t options,
                                jack_status_t *status, ...)

which opens a client session with a server. Its parameters are

• client_name: this provides the name by which this client will be known to the other clients in the server.
• options: a bitwise-OR combination of options:
  – JackNullOption: no options.
  – JackNoStartServer: do not attempt to start a Jack server if there is none running.
  – JackUseExactName: always use the exact name requested, otherwise Jack may generate a unique one.
  – JackServerName: connect to a specific server name, passed as an extra optional argument (const char *).
  – JackSessionID: pass a token to allow a session manager program to identify this client at a later time.
• status: if non-NULL, this provides an address for the server to return information about the open operation.
• optional parameter: the Jack server name (if explicitly requested by the option).


Given that we are connecting to a server rather than opening a device, there is not much else we need to do. System parameters such as sampling rate, sample type, and channels are determined by the server. Jack defines each sample as jack_default_audio_sample_t. Each client defines a certain number of input and output streams, each one containing a single channel. So, for multichannel audio, all we need to do is connect to different client ports. The sampling rate is given by the server and we can query it using

jack_nframes_t jack_get_sample_rate(jack_client_t *client)

where jack_nframes_t is an integral type also used to count frames in other settings.

11.2.2 Registering Ports

Signal connections to other clients on the server are made through ports, which are handled by opaque objects of type jack_port_t. In order for these to be made available, we need to register them with Jack. This is done through the following function

jack_port_t* jack_port_register(jack_client_t *client,
                                const char *port_name,
                                const char *port_type,
                                unsigned long flags,
                                unsigned long buffer_size)

where a port on a given client is identified by a port_name string and should be of a given type (JACK_DEFAULT_AUDIO_TYPE in this case). Options can be passed via flags (as usual, more than one of these can be combined with a bitwise OR), which define the characteristics of the port:

• JackPortIsInput: the port can receive data.
• JackPortIsOutput: the port can send data.
• JackPortIsPhysical: the port corresponds to some physical/hardware input and/or output.
• JackPortIsTerminal: for an input port, this means that the data received by it will not be passed out of the client; for an output port, this means that the data sent out does not originate from any other port.

The buffer size parameter is only used in the case of non built-in ports (e.g. special types of ports), and is ignored otherwise. It is ignored for audio data ports⁶, which are one of the standard port types. Once a port is successfully registered, we obtain a handle to it, which can be used to read data from it or write data to it.

⁶ Defined by the JACK_DEFAULT_AUDIO_TYPE port type.
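Combining and testing flags with bitwise operators, as described for the port flags above, follows a general C idiom. The sketch below illustrates it with stand-in constants: the Demo... names and their numeric values are assumptions made for this example, and in a real client the actual constants come from the Jack headers.

```c
#include <stdbool.h>

/* Illustrative stand-ins for port flag constants. Each flag
   occupies its own bit, so flags can be combined with |. */
enum demo_port_flags {
    DemoPortIsInput    = 0x1,
    DemoPortIsOutput   = 0x2,
    DemoPortIsPhysical = 0x4,
    DemoPortIsTerminal = 0x10
};

/* True if every bit in 'wanted' is also set in 'flags'. */
bool has_flags(unsigned long flags, unsigned long wanted) {
    return (flags & wanted) == wanted;
}
```

For instance, a hardware playback destination could be described by DemoPortIsInput | DemoPortIsPhysical, and has_flags() can then be used to pick out ports with a given combination of characteristics.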


11.2.3 The Processing Callback

Jack operates asynchronously, which means that we will need to supply a callback function to the server for reading and/or writing audio data⁷. This function has the following signature:

typedef int (*JackProcessCallback)(jack_nframes_t nframes,
                                   void *arg);

In the processing callback, the number of audio frames and the user data are passed as arguments. This means that we will need to query the server for the locations of the input and/or output data. Since these are held by each port defined by the client, we can use an API function to obtain the buffer pointers:

void* jack_port_get_buffer(jack_port_t *port,
                           jack_nframes_t nframes)

which returns a pointer to a location that can be written to, or that holds data that we can read from. In the case of audio IO, the pointers are cast to the Jack floating-point audio sample type (jack_default_audio_sample_t*), which can then be used to access each sample in the buffer. The client-defined callback, of type JackProcessCallback, is registered with the server using

int jack_set_process_callback(jack_client_t *client,
                              JackProcessCallback process_callback,
                              void *arg)

which takes in the client handle, the callback name, and the location of the user data arg to be passed to the callback. If successful, the registering function returns 0. Once the callback is set, we can start processing audio. For this, we need to activate the client, which is done through

int jack_activate(jack_client_t *client)

Note that, as in the Portaudio case, we will need to limit the code inside the callback to non-blocking operations in order to ensure smooth realtime operation.

11.2.4 Connecting Ports

When a client is activated, it can connect to any ports in the server. From the client program itself, we can name a port to connect to, either for input or for output. The following function does this:

int jack_connect(jack_client_t *client,
                 const char *source_port,
                 const char *destination_port)

where ports are referred to by their full name. This is normally a concatenation of the client and port names, as in client_name:port_name. For the ports defined by the client, we can use the jack_port_name(const jack_port_t *port) function to get the full name of a port. The physical ports of a server are often named system:capture_N for inputs and system:playback_N for outputs, where N is the channel number.

⁷ Other callbacks for a variety of operations can also be set; for more details, see http://jackaudio.org/api/index.html.
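When the full name of a port belonging to another client is needed, it can be assembled from the client and port names following the client_name:port_name convention described above. The helper below, full_port_name(), is a hypothetical utility written for this illustration and is not part of the Jack API; for a client's own ports, jack_port_name() remains the simpler route.

```c
#include <stdio.h>

/* Builds "client:port" into dst, which has room for 'size' chars
   including the terminator. Returns 0 on success, or -1 if the
   full name would not fit (dst is then truncated). */
int full_port_name(char *dst, size_t size,
                   const char *client, const char *port) {
    int n = snprintf(dst, size, "%s:%s", client, port);
    return (n < 0 || (size_t) n >= size) ? -1 : 0;
}
```

For example, calling it with "MonoGain" and "input" produces "MonoGain:input", which could then be passed as one of the port names in a connection request.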

11.2.5 Closing a Client

When an application is about to exit, we should deactivate and then close its client(s). This is done using

int jack_deactivate(jack_client_t *client)

and

int jack_client_close(jack_client_t *client)

11.2.6 Application Example

The following example creates a simple program with one input and one output port, which applies a gain to the signal. It follows the principles outlined in the previous sections:

1. A client is opened:

   client = jack_client_open("MonoGain", JackNoStartServer, NULL);

2. Two ports are registered:

   state.inport = jack_port_register(client, "input",
                    JACK_DEFAULT_AUDIO_TYPE, JackPortIsInput, 0UL);
   state.outport = jack_port_register(client, "output",
                    JACK_DEFAULT_AUDIO_TYPE, JackPortIsOutput, 0UL);

3. A callback is set:

   jack_set_process_callback(client, jackProcess, (void*) &state);

4. The client is activated:

   jack_activate(client);

5. The ports are connected:

   jack_connect(client, "system:capture_1",
                jack_port_name(state.inport));
   jack_connect(client, jack_port_name(state.outport),
                "system:playback_1");

The processing callback needs to access the ports to get the audio buffers, so we define a data structure to hold them. This also holds the gain value that is supplied by the user:

typedef struct UDATA {
  jack_port_t *inport;
  jack_port_t *outport;
  float gain;
} udata;

The definition of the callback is fairly straightforward. The buffer pointers are obtained and a loop is used to apply the gain to the input signal, writing the result to the output:

static int jackProcess(jack_nframes_t nframes, void *pp) {
  jack_default_audio_sample_t *in, *out;
  float gain;
  jack_nframes_t n;
  udata *p = (udata *) pp;
  in = jack_port_get_buffer(p->inport, nframes);
  out = jack_port_get_buffer(p->outport, nframes);
  gain = p->gain;
  for (n = 0; n < nframes; n++)
    out[n] = in[n]*gain;
  return 0;
}

While the audio is being processed by the server, we need to keep the program open. In order to do so, we check the current time and loop until a set duration has elapsed:


now = jack_get_time();
end += now;
while(time < end) {
  usleep(500000);
  time = jack_get_time();
  printf("%.2f \n", (time-now)/1000000.);
}

Time is measured in microseconds (1/1,000,000 sec, as noted earlier). Alternatively, we could have blocked the main program under scanf(), waiting for the user to close the program by typing some input. Once the set duration is reached, the program proceeds to deactivate and close the client.

The full code for the Jack gain program is shown in Listing 11.3. Provided that Jack is installed in the system (e.g. in /usr/local), we can compile it with the following command line:

$ cc -o jgain jgain.c -I/usr/local/include \
     -L/usr/local/lib -ljack

Listing 11.3: Jack example program.

#include <jack/jack.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>   /* for usleep() */

#define MICROS 1000000

typedef struct UDATA {
  jack_port_t *inport;
  jack_port_t *outport;
  float gain;
} udata;

static int jackProcess(jack_nframes_t nframes, void *pp) {
  jack_default_audio_sample_t *in, *out;
  float gain;
  jack_nframes_t n;
  udata *p = (udata *) pp;
  in = jack_port_get_buffer(p->inport, nframes);
  out = jack_port_get_buffer(p->outport, nframes);
  gain = p->gain;
  for (n = 0; n < nframes; n++)
    out[n] = in[n]*gain;
  return 0;
}

int main(int argc, const char **argv) {
  if (argc < 3) {
    printf("jgain gain dur \n");
  } else {
    jack_client_t *client;
    client = jack_client_open("MonoGain", JackNoStartServer, NULL);
    if (client != NULL) {
      udata state;
      unsigned long end, time = 0, now;
      state.gain = atof(argv[1]);
      end = (unsigned long) (atof(argv[2])*MICROS);

      /* register input port */
      state.inport = jack_port_register(client, "input",
                       JACK_DEFAULT_AUDIO_TYPE,
                       JackPortIsInput, 0UL);
      if (state.inport == NULL) {
        jack_client_close(client);
        printf("Could not open input port");
        return -1;
      }

      /* register output port */
      state.outport = jack_port_register(client, "output",
                        JACK_DEFAULT_AUDIO_TYPE,
                        JackPortIsOutput, 0UL);
      if (state.outport == NULL) {
        jack_client_close(client);
        printf("Could not open output port");
        return -1;
      }

      /* set process callback */
      if(jack_set_process_callback(client, jackProcess,
                                   (void*) &state) != 0) {
        jack_client_close(client);
        printf("Could not set Jack callback");
        return -1;
      }

      /* activate Jack */
      if(jack_activate(client) != 0) {
        jack_client_close(client);
        printf("Could not start Jack processing");
        return -1;
      }

      /* connect ports to system in and out */
      if(jack_connect(client, "system:capture_1",
                      jack_port_name(state.inport)) != 0)
        printf("could not connect %s automatically "
               "to system:capture_1 \n",
               jack_port_name(state.inport));
      if(jack_connect(client, jack_port_name(state.outport),
                      "system:playback_1") != 0)
        printf("could not connect %s automatically "
               "to system:playback_1 \n",
               jack_port_name(state.outport));

      /* keep track of time */
      now = jack_get_time();
      end += now;
      while(time < end) {
        usleep(MICROS/2);
        time = jack_get_time();
        printf("%.2f \n", (float)(time-now)/MICROS);
      }

      /* close client */
      jack_deactivate(client);
      jack_client_close(client);
      printf("closed Jack client \n");
      return 0;
    } else {
      printf("Could not open Jack client\n");
      return -1;
    }
  }
  return 0;
}

The program, as indicated by the usage message, takes in the gain to be applied and a duration, which will determine how long the program is to run for. In order to execute this program, we also need the Jack server to be running, as the program will not be able to start the server by itself (the JackNoStartServer option has been used). It is possible, however, to leave that option out (passing JackNullOption instead) to allow programs to get the server running if they need to, which might be more suitable in other applications.

11.3 Conclusions

This chapter has concentrated on the principles of realtime audio IO. We selected a cross-platform library, Portaudio, and an audio server, Jack, as our main vehicles for exploring audio processing. These allow programs to be easily ported from one OS to another. We saw the two main modes of realtime IO operation, synchronous (push) and asynchronous (pull). While the latter allows for more reactive, low-latency, and realtime safe performance, the former is simpler conceptually, as it follows similar principles to other types of IO such as file access. We presented three examples, one demonstrating how we can read an ASCII stream from the standard input and send it to a DAC, another showing a low-latency audio effect, and a third demonstrating how to connect to a Jack server. Realtime audio is nicely complemented by interactive controls, and the next chapter will introduce a very important protocol that can be used to implement them.

Problems

11.1. Write a realtime-output sine wave synthesis program that takes the amplitude and frequency as parameters, in two versions: synchronous and asynchronous.
11.2. Write a program using libsndfile and Portaudio to play back a soundfile.
11.3. Write a version of the tremolo program to work with the Jack server.

Chapter 12

Realtime MIDI

Abstract The MIDI protocol is presented in this chapter as one of the typical ways in which realtime audio instruments can be controlled. The native MacOS API CoreMIDI is introduced as a system-dependent means of accessing MIDI devices. This is complemented by a discussion of cross-platform support for realtime MIDI, which is provided by Portmidi or Jack.

MIDI (Musical Instrument Digital Interface) [47] is a long-established communication protocol. It can be used to control synthesisers and other musical equipment, as well as a range of software applications. Most OSs provide some form of MIDI support, and some systems provide internal or built-in MIDI devices (either in software form or as part of the sound hardware). In this chapter, we will study how to program MIDI in C with the aim of developing realtime interactive applications.

12.1 The Protocol

The MIDI protocol has the following fundamental characteristics:

• It employs one-way transmission, from a MIDI OUT port to a MIDI IN port.
• The MIDI THRU port copies the data from the MIDI IN port.
• It uses 16 channels per port (or device).
• Start and stop bits frame an 8-bit byte of data (3125 10-bit bytes can be delivered per second over a physical MIDI connection).
• It supports four channel modes:
  1. Mode I: omni on/poly (omni mode): responds to any channel, polyphonically.
  2. Mode II: omni on/mono (mono mode): responds to any channel, monophonically.
  3. Mode III: omni off/poly (multi mode): responds to specific channels, polyphonically.
  4. Mode IV: omni off/mono (mono mode): responds to specific channels, monophonically.

© Springer Nature Switzerland AG 2019 V. Lazzarini, Computer Music Instruments II, https://doi.org/10.1007/978-3-030-13712-0_12

12.1.1 Hexadecimal Notation Revisited

MIDI programs will often make use of hexadecimal constants, which we have already noted earlier in this book. Hexadecimal digits are useful because each of them has a 4-bit range (0–15, 16 states). They are notated 0–9 A–F, as shown in Table 12.1. A byte can be written very compactly as two hexadecimal digits. Some examples are presented in Table 12.2.

Table 12.1: Hexadecimal numbers.

BASE 10   BASE 16
0–9       0–9
10        A
11        B
12        C
13        D
14        E
15        F

Table 12.2: Bytes in base-2, 16 and 10.

base 2      base 16   base 10
0000 0000   0x00      0
1111 1111   0xFF      255
0000 1111   0x0F      15
0001 0000   0x10      16
0111 1111   0x7F      127
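The rows of Table 12.2 can be reproduced with a few lines of C, which may help when checking bit patterns by hand. The function below, byte_to_binary(), is a hypothetical helper written for this purpose; the base-16 and base-10 columns can be printed directly with the %02X and %d formats of printf().

```c
/* Writes the 8 bits of 'byte' as two groups of four, e.g.
   "0000 1111", into dst (9 characters plus the terminator).
   Each group of four bits corresponds to one hexadecimal digit. */
void byte_to_binary(unsigned char byte, char dst[10]) {
    int i, pos = 0;
    for (i = 7; i >= 0; i--) {
        dst[pos++] = ((byte >> i) & 1) ? '1' : '0';
        if (i == 4) dst[pos++] = ' ';   /* gap between the two nibbles */
    }
    dst[pos] = '\0';
}
```

The upper hexadecimal digit of a byte b is b >> 4 and the lower one is b & 0x0F, which is the same masking idea used later for the MIDI status byte.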

12.1.2 MIDI Messages

The following is an outline of the main message types defined by the protocol. We will be mostly interested in the channel messages, which are those that can be used to control the realtime operation of an application.


1. Channel Messages (Fig. 12.1):
   midi message: status byte + message byte1 + message byte2¹
   • status byte: message type (4 bits) + channel (4 bits), always starting with a set bit (1xxx xxxx):
     – message type:
       0x80 (NOTEOFF): sent to signal a key up².
       0x90 (NOTEON): sent to signal a key down.
       0xA0 (POLY AFTERTOUCH): key pressure (polyphonic).
       0xB0 (CONTROL CHANGE): sent by continuous controllers³.
       0xC0 (PROGRAM CHANGE): sent to request a preset change.
       0xD0 (AFTERTOUCH): encodes key pressure (monophonic).
       0xE0 (PITCHBEND CHANGE): sent by a pitchbend wheel.
     – channel: from 0x00 (1) to 0x0F (16)
   • message byte1, message byte2: these depend on message type and always start with a 0 bit (0xxx xxxx). The range of each byte is limited to 0-127 (7 bits). Table 12.3 shows the parameter details for each message type.

Fig. 12.1: MIDI channel message (status, data 1, data 2).

2. Global messages:
   System exclusive messages: status byte + manufacturer's ID + data
   System-realtime messages.
   System-common messages.

¹ In C, we can use the unsigned char type to represent a MIDI byte.
² Also NOTEON with data byte 2 (velocity) = 0.
³ Standard continuous controller numbers (data byte 1): 1 = modulation wheel; 2 = breath controller; 4 = adjustable foot-pedal; 5 = portamento time; 7 = volume; 8 = balance; 10 = pan; 11 = expression; and 121–127, channel-mode messages: reset, local control, all notes off, omni off, omni on, mono on, and poly on, respectively.


Table 12.3: Channel message types.

status byte        data byte 1       data byte 2
NOTE ON            note number       key velocity
NOTE OFF           note number       key velocity
AFTERTOUCH         amount            –
POLYTOUCH          note number       amount
PITCHBEND          amount (fine)     amount (coarse)
PROGRAM CHANGE     number            –
CONTROL CHANGE     number            amount

12.1.3 Packing and Unpacking the Status Byte

To get the channel number or the MIDI message type from a MIDI status byte, we use a bitmask with a bitwise logic AND (&) operator. The bitmask for extracting the channel is 0x0F (or 0000 1111). The logic operation is

    status_byte & 0x0F;

For example,

      0000 1111   (mask)
    & 1001 0001   (NOTEON, channel 2, 0x91)
    -----------
      0000 0001   (channel 2, 0x01)

Similarly, the bitmask 0xF0 (or 1111 0000) extracts the message type.

To combine a channel number and a message type to make up a MIDI status byte, we use a bitwise OR (|) operator. For example, with a message type, say NOTEON (0x90), and a channel number, say channel 9 (0x08), we have

    message_type | channel;

      1001 0000
    | 0000 1000
    -----------
      1001 1000   (0x98)
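The two operations above can be wrapped in small C functions. The following is a minimal sketch; the names pack_status, status_channel and status_type, and the two mask macros, are chosen here for illustration and are not part of any library.

```c
/* The message type occupies the top 4 bits of the status byte,
   the channel the bottom 4 bits. */
#define TYPE_MASK    0xF0
#define CHANNEL_MASK 0x0F

/* Combine a message type (e.g. 0x90) and a channel (0-15)
   into a status byte, using bitwise OR. */
unsigned char pack_status(unsigned char type, unsigned char channel) {
  return (unsigned char)((type & TYPE_MASK) | (channel & CHANNEL_MASK));
}

/* Extract the channel (0-15) from a status byte, using bitwise AND. */
unsigned char status_channel(unsigned char status) {
  return status & CHANNEL_MASK;
}

/* Extract the message type (e.g. 0x90 for NOTEON) from a status byte. */
unsigned char status_type(unsigned char status) {
  return status & TYPE_MASK;
}
```

With these, pack_status(0x90, 0x08) gives 0x98, and status_channel(0x91) gives 0x01, matching the two worked examples.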

12.2 MIDI Programming Basics

As in the case of realtime audio, MIDI programming in C is also platform-dependent. Each system will have its own hardware and software interfaces, which will in general be different and incompatible with each other. The MIDI messages and the protocol of communication, of course, will be the same, but the means of sending and receiving data will depend on the OS.


A system that supports MIDI programming will include libraries (compiled binary code) and an exposed API to access these. Libraries are there to provide access to and communication with hardware. As we have seen in the previous chapter, the API is the public face of these libraries: the functions, data structures, etc. that are offered to applications that use MIDI. Often system-provided APIs are quite low-level, i.e. they provide fine-grained functionality, which sometimes makes their use more involved (more lines of code are needed to achieve a particular effect). They will also provide services to cover all aspects of MIDI use, often offering more than we need. In modern operating systems, examples of such APIs are found in ALSA (Linux) and CoreMIDI (MacOS). While it is sometimes advantageous or necessary to write applications using system APIs, in most cases it is probably best to use a higher-level API, such as Portmidi, which has the advantage of being cross-platform. The portability of the code, plus the advantage of having to learn and deal with only one API, is a great incentive for this. However, it is useful to look a little closer into a system API to understand a bit more about MIDI programming.

12.2.1 MIDI on MacOS

As an example of a system API, we will look at developing a program that outputs MIDI using the CoreMIDI framework.

Frameworks

First, a note about terminology: on MacOS, system libraries and APIs are called frameworks. These are present in the OS as directories containing the given (dynamic-link) library, header files and other resources. The names of these directories are given the extension .framework, which identifies them as such. Special MacOS-specific compiler flags are used to link to them. For MIDI, MacOS offers the CoreMIDI framework (located in /System/Library/Frameworks). Other frameworks that will be used in MIDI programming are CoreAudio (for timing functions) and CoreFoundation (for text strings). Header files should be included in the format

    #include <Framework/Header.h>

For CoreMIDI, we have

    #include <CoreMidi/CoreMidi.h>

To link to the framework, we use -framework followed by the framework name, as in

    $ cc ... -framework CoreMidi


The CoreMIDI API

CoreMIDI treats MIDI streams as separate destinations (for output) and sources (for input). Sources and destinations are offered by the various physical MIDI devices that a system can have. Each of these can have one or more streams. The full hierarchy in CoreMIDI, defined in CoreMidi/CoreMidi.h, is shown in Fig. 12.2.

device (physical) → entity (one or more) → destination/source (one or more)

Fig. 12.2: CoreMIDI hierarchy.

Usually, the first thing we should attempt to do when learning about a MIDI API is to find a way of searching the system for MIDI devices (or here, destinations and sources). With CoreMIDI, as you would expect, it is possible to query a system about its devices, the entities in each of these, and the destinations and sources in each entity, which seems a little unwieldy. Thankfully, there is also a means of checking directly for all destinations and all sources in a system. Sources and destinations can also be virtual, i.e. created by applications, and CoreMIDI provides means of creating these. Since these would not be linked to any device, they would only appear on direct lists of sources/destinations (another reason for using this method of querying).

In order to access a destination or source for IO, we need to create a MIDI client for our application. This handles general aspects of communication with devices that span the application lifetime. We will then create a port to process either input or output for this client. In our example here, we will create an output port. With this, we can then package and send MIDI messages to a destination. Note that the port is also application-wide and we can use it to send MIDI data to separate destinations. MIDI clients should be disposed of (i.e. closed) when we are finished with them, whereas ports do not need to be explicitly closed.

In CoreMIDI, messages are packaged in MIDI packet lists. A MIDI packet contains the given MIDI bytes plus a timestamp value that will indicate when the MIDI message should be sent out. The timestamp unit is the host time, which can be obtained from the time in nanoseconds⁴ using a utility function (in the CoreAudio framework, CoreAudio/HostTime.h). We can also query the current host time to synchronise messages correctly. A timestamp of 0 indicates that the message should be sent immediately. Time is kept in UInt64 (unsigned 64-bit integer) types.
For instance, to convert a time in milliseconds⁵ to a timestamp, we have (NANOS is 1000000, the number of nanoseconds in a millisecond)

    now = AudioGetCurrentHostTime();
    timestamp = now+AudioConvertNanosToHostTime(NANOS*msec);

Packet lists can be built using functions provided by CoreMIDI. The steps are

1. Initialise the list:

    cur = MIDIPacketListInit(mlist);

2. Add a packet with a message:

    cur = MIDIPacketListAdd(mlist,sizeof(buffer),
                            cur,timestamp,3,mess);

Once a packet list has been built, it can then be sent to a destination:

    endpoint = MIDIGetDestination(dest);
    MIDISend(mport, endpoint, mlist);

If a new set of MIDI messages is to be sent, we need to build a new packet list, repeating the steps above, before sending it to the output. It is important that the port and client are still open/active up until the last MIDI message timestamp, otherwise some messages might not be sent. As a precaution, we can send NOTEOFF messages for each note, with a timestamp of 0, before closing the client, to stop any hanging notes.

Finally, a word about some of the types used by CoreMIDI functions. Strings are expected to be placed in CFString objects (CoreFoundation.h), and there are functions to convert to and from C strings (null-terminated character arrays). MIDI messages are placed in unsigned char arrays, a type which in MacOS is defined as Byte. These and other types used (such as those for clients, ports, packets etc.) are fully discussed in the CoreMIDI reference documentation; please refer to it for more details.

⁴ 1/1,000,000 millisec.
⁵ 1/1000 sec.

Example

The example in Listing 12.1 shows a simple program that demonstrates MIDI output using CoreMIDI. The program plays a chromatic scale starting from middle C (note number 60). It can be built with the following compiler options:

    cc -o cmidiout cmidiout.c \
      -framework CoreMidi -framework CoreFoundation \
      -framework CoreAudio

Listing 12.1: CoreMIDI example.

    #include <CoreMidi/CoreMidi.h>
    #include <CoreAudio/HostTime.h>
    #include <CoreFoundation/CoreFoundation.h>
    #include <unistd.h> /* for sleep() */
    #include <stdio.h>

    #define NANOS 1000000
    #define MD_NOTEON 0x90
    #define MD_NOTEOFF 0x80

    int main(){
      int k, endpoints, dest;
      CFStringRef name = NULL, cname = NULL, pname = NULL;
      CFStringEncoding defaultEncoding =
        CFStringGetSystemEncoding();
      MIDIClientRef mclient =
        (MIDIClientRef) NULL; /* client object */
      MIDIPortRef mport =
        (MIDIPortRef) NULL; /* port object */
      MIDIEndpointRef endpoint;
      Byte buffer[1024];
      MIDIPacketList *mlist = (MIDIPacketList *) buffer;
      Byte mess[3];
      MIDIPacket *cur = MIDIPacketListInit(mlist);
      UInt64 timestamp, now, dur;
      OSStatus ret;

      /* MIDI client */
      cname = CFStringCreateWithCString(NULL, "my client",
                                        defaultEncoding);
      ret = MIDIClientCreate(cname, NULL, NULL, &mclient);
      if(!ret){
        /* MIDI output port */
        pname = CFStringCreateWithCString(NULL, "outport",
                                          defaultEncoding);
        ret = MIDIOutputPortCreate(mclient, pname, &mport);
        if(!ret){
          /* list destinations */
          endpoints = MIDIGetNumberOfDestinations();
          for(k=0; k < endpoints; k++){
            endpoint = MIDIGetDestination(k);
            MIDIObjectGetStringProperty(endpoint,
                                        kMIDIPropertyName,
                                        &name);
            printf("destination %d = %s\n", k,
                   CFStringGetCStringPtr(name,
                                         defaultEncoding));
          }
          /* select destination */
          dest = 0;
          printf("select destination number: ");
          scanf("%d", &dest);
          dur = 1000; /* 1000 ms */
          /* fill MIDI packet list */
          for(k=0; k < 12; k++){
            mess[0] = MD_NOTEON;
            mess[1] = 60+k;
            mess[2] = 40;
            now = AudioGetCurrentHostTime();
            timestamp = now +
              AudioConvertNanosToHostTime(NANOS*k*dur);
            cur = MIDIPacketListAdd(mlist, sizeof(buffer),
                                    cur, timestamp, 3, mess);
            mess[0] = MD_NOTEOFF;
            mess[1] = 60+k;
            mess[2] = 40;
            timestamp = now +
              AudioConvertNanosToHostTime(NANOS*(k+1)*dur*2);
            cur = MIDIPacketListAdd(mlist, sizeof(buffer),
                                    cur, timestamp, 3, mess);
          }
          /* send messages */
          endpoint = MIDIGetDestination(dest);
          MIDISend(mport, endpoint, mlist);
          /* wait for messages to play */
          sleep(1+((k+1)*dur*2)/1000);
        }
        /* close MIDI client */
        MIDIClientDispose(mclient);
        if(name) CFRelease(name);
        if(pname) CFRelease(pname);
        if(cname) CFRelease(cname);
      }
      return 0;
    }

12.3 MIDI Programming with Portmidi

While CoreMIDI provides a very complete API for MIDI programming, programs using it will not be portable to other systems. For this reason, using a cross-platform library that is placed at a slightly higher level might be more useful in certain situations. One such library is Portmidi [12], a MIDI counterpart to Portaudio, which provides a common interface to the different platform-dependent MIDI implementations. A Portmidi program requires the following headers:

    #include <portmidi.h>
    #include <porttime.h>

Before Portmidi is used, we need to call Pm_Initialize() to initialise the library. As part of this process, library code will query the system for existing logical devices. These can then be searched for and listed. The total number of devices can be found with Pm_CountDevices(). For each device registered with the library, we can get its details, stored in a PmDeviceInfo data structure:

    typedef struct {
      int structVersion;
      const char *interf; /* underlying API */
      const char *name;   /* device name */
      int input;          /* 1 if input */
      int output;         /* 1 if output */
      int opened;
    } PmDeviceInfo;

Using Pm_GetDeviceInfo() we can obtain the details of each MIDI device in the system. The complete code for listing output devices is

    int cnt, i;
    const PmDeviceInfo *info;
    if((cnt = Pm_CountDevices()) != 0){
      for(i=0; i < cnt; i++){
        info = Pm_GetDeviceInfo(i);
        if(info->output)
          printf("%d: %s \n", i, info->name);
      }
    }
    else printf("no device found\n");

which will print the names of all available output devices to the terminal, allowing users to choose one.

12.3.1 Timers

In order to guarantee the correct timing of MIDI messages, we will need to find a means of keeping track of time. The Porttime library, which accompanies Portmidi, offers a timer that can be used for that purpose. The timer is started using the following code, which should be called before attempting to open a device:

    Pt_Start(1, NULL, NULL);


Applications can choose to use their own timebase function. If so, this should be passed to the library when a device is being opened, as a callback.

12.3.2 Opening Devices

As we have seen above, devices are identified using a numeric index. Similarly to the process we have seen before in Chapter 11, a pointer to an opaque handle is passed to the Pm_OpenOutput() function, which returns an error code that can be used to check for success. The prototype for this function is

    PmError Pm_OpenOutput(
      PortMidiStream** stream,
      PmDeviceID outputDevice,
      void *outputDriverInfo,
      int32_t bufferSize,
      PmTimeProcPtr time_proc,
      void *time_info,
      int32_t latency);

Note that the library offers distinct functions for each direction (input or output), so a given stream can only be opened in one of them. The outputDriverInfo is normally NULL, and the buffer size determines the amount of buffering used for MIDI message output. Depending on the platform, Portmidi may not employ a buffer, and may simply pass the data directly to the lower-level MIDI system library. If the timing is not to be obtained from Porttime, then a timing callback can be passed (as time_proc, with an associated data space time_info); otherwise we just pass null pointers to both parameters. The latency field is used to add an extra time offset to the output messages (in milliseconds), and is normally 0. As an example, the following code opens an output device:

    int dev;
    PmError retval;
    PortMidiStream *mstream;
    retval = Pm_OpenOutput(&mstream, dev,
                           NULL,512,NULL,NULL,0);
    if(retval != pmNoError)
      printf("error: %s \n", Pm_GetErrorText(retval));

When Pm_OpenOutput() returns successfully, the handle to the MIDI output stream is ready to be used.


12.3.3 Output

To output a MIDI channel message, we can use the function Pm_WriteShort(), which is designed for non-system-exclusive output, and thus is suited to our purposes in this chapter. Its prototype is

    PmError Pm_WriteShort(PortMidiStream *stream,
                          PmTimestamp when,
                          int32_t msg);

Taking an open stream, it outputs a MIDI message encoded as an integer. The timestamp parameter is only used if we have defined a latency above 0 when opening the device. Otherwise, messages are sent immediately. If timestamps are used, they should be non-decreasing (i.e. the message sequence should be sorted in time before the messages are passed to successive function calls). The encoding of channel messages is assisted by the macro Pm_Message(), defined in portmidi.h as

    #define Pm_Message(status, data1, data2) \
      ((((data2) << 16) & 0xFF0000) | \
       (((data1) << 8) & 0xFF00) | \
       ((status) & 0xFF))

with which we can pack the status and data bytes of a message into an integer variable. In addition to this, we can define another macro ourselves to pack a message type and a channel into a status byte:

    #define SBYTE(msg,chn) ((msg) | (chn))

To send messages at the correct time, we can call the Pt_Time() function to get the current device time and decide whether we need to output a message at that time. For instance, to send messages to play a note for 1 second, we can do the following:

    time = Pt_Time(NULL);
    Pm_WriteShort(mstream, 0,
      Pm_Message(SBYTE(MD_NOTEON,chan), note, vel));
    while(Pt_Time(NULL) - time < 1000) usleep(100);
    Pm_WriteShort(mstream, 0,
      Pm_Message(SBYTE(MD_NOTEOFF,chan), note, vel));

In this particular example, all we do is wait until the time is right to send the NOTEOFF message. In other applications, a more sophisticated time management approach might be needed. To close a MIDI output stream and finish using Portmidi, we can use the following functions:

    PmError Pm_Close(PortMidiStream* stream);
    PmError Pm_Terminate();
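Since timestamped messages need to be passed to the library in non-decreasing time order, an application that builds its events out of order could sort them first. The sketch below shows one way of doing this with the standard qsort() function. The SchedEvent type and the sort_events() function are hypothetical names introduced for this illustration only.

```c
#include <stdlib.h>

/* A pending message: when to send it and the packed message itself.
   SchedEvent is a hypothetical type for this sketch. */
typedef struct {
  long when;  /* timestamp in milliseconds */
  long msg;   /* status | data1 << 8 | data2 << 16 */
} SchedEvent;

/* qsort comparison function: ascending timestamps. */
static int by_time(const void *a, const void *b) {
  long ta = ((const SchedEvent *) a)->when;
  long tb = ((const SchedEvent *) b)->when;
  return (ta > tb) - (ta < tb);
}

/* Sort n events so that their timestamps are non-decreasing. */
void sort_events(SchedEvent *ev, size_t n) {
  qsort(ev, n, sizeof(SchedEvent), by_time);
}
```

After sorting, each event could be passed in turn to Pm_WriteShort(), using ev[i].when as the timestamp and ev[i].msg as the message, on a stream opened with a latency above 0. Note that qsort() is not guaranteed to preserve the relative order of events with equal timestamps, which may matter if, say, a NOTEOFF and a NOTEON for the same key fall on the same millisecond.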


Example

An example program is shown in Listing 12.2. It follows more or less the same lines as the MIDI generator in Sect. 12.2.1, but also includes a program change message that is sent before each NOTEON, selecting a different sound for each step of the scale.

Listing 12.2: Portmidi output example.

    #include <stdio.h>
    #include <unistd.h> /* for usleep() */
    #include <portmidi.h>
    #include <porttime.h>

    #define MD_NOTEON 0x90
    #define MD_NOTEOFF 0x80
    #define MD_PRG 0xC0
    #define SBYTE(mess,chan) ((mess) | (chan))

    int main() {
      int cnt,i,dev;
      PmError retval;
      const PmDeviceInfo *info;
      PortMidiStream *mstream;
      Pm_Initialize();
      if(cnt = Pm_CountDevices()){
        for(i=0; i < cnt; i++){
          info = Pm_GetDeviceInfo(i);
          if(info->output)
            printf("%d: %s \n", i, info->name);
        }
        printf("choose device: ");
        scanf("%d", &dev);
        Pt_Start(1, NULL, NULL);
        retval = Pm_OpenOutput(&mstream, dev,
                               NULL,512,NULL,NULL,0);
        if(retval != pmNoError)
          printf("error: %s \n", Pm_GetErrorText(retval));
        else {
          char chan = 0;
          int prg = 0;
          long time = 0;
          for(i=60; i < 72; prg+=4, i++){
            Pm_WriteShort(mstream, 0,
              Pm_Message(SBYTE(MD_PRG,chan), prg, 0));
            time = Pt_Time(NULL);
            Pm_WriteShort(mstream, 0,
              Pm_Message(SBYTE(MD_NOTEON,chan), i, 120));
            while(Pt_Time(NULL) - time < 1000) usleep(100);
            Pm_WriteShort(mstream, 0,
              Pm_Message(SBYTE(MD_NOTEOFF,chan), i, 120));
          }
        }
        Pm_Close(mstream);
      } else printf("No available output devices\n");
      Pm_Terminate();
      return 0;
    }

Assuming that Portmidi is installed in /usr/local, we can use the following command line to build this example:

    $ cc -o midiout midiout.c -I/usr/local/include \
      -L/usr/local/lib -lportmidi

12.3.4 Input

Most of the steps used in MIDI output can be retraced and modified for input. Searching for devices is just a matter of checking the input member of the device info structure. Opening the device uses Pm_OpenInput() instead of Pm_OpenOutput(), with similar parameters:

    retval = Pm_OpenInput(&mstream,dev,NULL,512,NULL,NULL);

Polling for data

The main difference between input and output in terms of programming is that we will need to be listening to the device for incoming messages. These are going to be intermittent and asynchronous, so we need a method to do this in a clean and efficient way. Portmidi implements polling, that is, querying the device for new data, which tells the program whether it needs to go and read it. The function Pm_Poll() returns true if there is data to be read, and false otherwise. We can check it regularly and proceed to call Pm_Read() if we need to:

    int Pm_Read(PortMidiStream *stream,
                PmEvent *buffer, int32_t length);

This function takes a stream and reads the incoming data into a buffer, which is an array of length items. Each one of these is a PmEvent:

    typedef long PmTimestamp;
    typedef long PmMessage;
    typedef struct {
      PmMessage message;
      PmTimestamp timestamp;
    } PmEvent;

The timestamp member will provide a non-decreasing value that can be used to determine the sequence of events. Each message is defined, as before, as a single item, and we can use the following macros to extract the individual MIDI status and data bytes from it:

    #define Pm_MessageStatus(msg) ((msg) & 0xFF)
    #define Pm_MessageData1(msg) (((msg) >> 8) & 0xFF)
    #define Pm_MessageData2(msg) (((msg) >> 16) & 0xFF)

The incoming data is copied into a user-supplied buffer. The number of messages received is returned by Pm_Read() and can be used by the program to loop over the array data to retrieve each individual item.
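To see how packing and unpacking fit together, the following sketch encodes a channel message into an integer and decodes it again. It does not require Portmidi: the macros are local copies of the layout described in the text, given hypothetical names (MSG_PACK, MSG_STATUS, and so on) so that they do not clash with those in portmidi.h. The ChannelMsg type and decode_message() function are likewise assumptions made for this example.

```c
/* Local copies of the packing layout: status in the lowest byte,
   then data byte 1, then data byte 2. Names are hypothetical. */
#define MSG_PACK(status, d1, d2) \
  ((((d2) << 16) & 0xFF0000) | \
   (((d1) << 8) & 0xFF00) | \
   ((status) & 0xFF))
#define MSG_STATUS(msg) ((msg) & 0xFF)
#define MSG_DATA1(msg)  (((msg) >> 8) & 0xFF)
#define MSG_DATA2(msg)  (((msg) >> 16) & 0xFF)

/* Decoded view of a channel message (hypothetical type). */
typedef struct {
  unsigned char type;     /* e.g. 0x90 for NOTEON */
  unsigned char channel;  /* 0-15 */
  unsigned char data1;
  unsigned char data2;
} ChannelMsg;

/* Split a packed message into type, channel and data bytes. */
ChannelMsg decode_message(long msg) {
  ChannelMsg m;
  m.type = MSG_STATUS(msg) & 0xF0;
  m.channel = MSG_STATUS(msg) & 0x0F;
  m.data1 = MSG_DATA1(msg);
  m.data2 = MSG_DATA2(msg);
  return m;
}
```

For example, MSG_PACK(0x91, 60, 100) yields 0x643C91, and decoding that value gives back a NOTEON (0x90) on channel index 1 with note number 60 and velocity 100.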

12.3.5 A MIDI Synthesiser

As an example of realtime interaction, we present here a very simple MIDI-controlled synthesiser, which will respond to incoming NOTE messages and play a sine wave monophonically. Note that, since it has only the bare minimum components to make sound, it will not have any means of shaping the amplitude of the sound over time (envelopes), or responding to pitch bend controls. However, it will be simple enough to allow us to understand the principles developed in this chapter. We will use both the Portmidi and the Portaudio libraries to implement MIDI and audio IO. The design of this program is as follows.

• The program is launched by the shell and will be kept running for 60 seconds (once it starts listening to MIDI). The user can optionally pass a parameter to keep the program open for a set duration in seconds.
• The user is asked to select a MIDI device from a list.
• The program uses the default output audio device.
• Callback audio is used to allow low-latency operation.
• A listening loop will keep the program open, polling for MIDI input:
  – If MIDI data is received, its status byte is checked.
  – NOTEON and NOTEOFF messages will be responded to by the program, setting the amplitude and frequency of a sine wave generator (running in the audio callback).


In the main program, instead of solely counting out time (as in the audio effect example in Chapter 11), we will be listening for MIDI. When a message (or messages) comes in, we will respond to it if it matches what we are looking for. A NOTEON message supplies the current note number and velocity. A NOTEOFF message sets the amplitude to zero if it also matches the current note (to turn it off). Because some devices send NOTEON with velocity (data byte 2) = 0 instead of NOTEOFF, we need to check for that as well:

    if(Pm_Poll(mstream)) {
      unsigned char data1, data2, status;
      cnt = Pm_Read(mstream, msg, 32);
      for(i=0; i

In the audio callback, the frequency f (in Hz) is obtained from the MIDI note number N using

$$f = 440 \times 2^{\frac{N-69}{12}} \qquad (12.1)$$

while the amplitude is just normalised to the range [0, 1.0]. The code for these operations and the sine wave synthesis is then

    fr = 440.*pow(2., (p->note - 69.)/12);
    amp = p->vel/128.;
    for(i=0; i < frameCount; i++, n++)
      outp[i] = amp*sin(n*TWOPI*fr/sr);

As we have mentioned above, this is a very simple and rough implementation of synthesis. On NOTEON, sound will start immediately, and on NOTEOFF, it will stop dead. If a NOTEON is followed by another NOTEON, the pitch will jump to the next value, with no gliding or smoothing. All of these transitions will cause clicks in the output waveform, which in more advanced examples we will be able to avoid. The full code for this example is shown in Listing 12.3. Again, assuming that Portmidi and Portaudio are installed in /usr/local, we can use the following command line to build the program:

    $ cc -o midisynth midisynth.c -I/usr/local/include \
      -L/usr/local/lib -lportmidi -lportaudio

Listing 12.3: MIDI synthesiser example.

    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>
    #include <portmidi.h>
    #include <porttime.h>
    #include <portaudio.h>

    #define TYPEMASK 0xF0
    #define MD_NOTEON 0x90
    #define MD_NOTEOFF 0x80

    typedef struct udata {
      unsigned char vel;
      unsigned char note;
      float sr;
      unsigned long n;
    } UDATA;

    int audio_fn(const void *input, void *output,
                 unsigned long frameCount,
                 const PaStreamCallbackTimeInfo *timeInfo,
                 PaStreamCallbackFlags statusFlags,
                 void *userData);

    int main(int argc, const char *argv[]) {
      int cnt,i,dev;
      PmError retval;
      const PmDeviceInfo *info;
      PmEvent msg[32];
      PortMidiStream *mstream;
      PaError err;
      PaStreamParameters param;
      PaStream *handle;
      int bufsize = 128, sr = 44100;
      UDATA udata;
      unsigned long end =
        (argc > 1 ? atof(argv[1]) : 60)*1000;
      unsigned char note = 0;

      Pa_Initialize();
      Pm_Initialize();
      dev = Pa_GetDefaultOutputDevice();
      param.device = (PaDeviceIndex) dev;
      param.channelCount = 1;
      param.sampleFormat = paFloat32;
      param.suggestedLatency =
        (PaTime) (bufsize/(double)sr);
      param.hostApiSpecificStreamInfo = NULL;
      udata.sr = sr;
      udata.n = 0;
      udata.note = 0;
      udata.vel = 0;

      cnt = Pm_CountDevices();
      if(cnt == 0) {
        printf("No available MIDI devices\n");
        return 1;
      }
      for(i=0; i < cnt; i++){
        info = Pm_GetDeviceInfo(i);
        if(info->input)
          printf("%d: %s \n", i, info->name);
      }
      printf("choose device: ");
      scanf("%d", &dev);

      err = Pa_OpenStream(&handle,NULL,&param,
                          sr,bufsize,paNoFlag,
                          audio_fn, &udata);
      if(err != paNoError) {
        printf("Error opening audio output\n");
        Pa_Terminate();
        Pm_Terminate();
        return 1;
      }
      Pt_Start(1, NULL, NULL);
      retval = Pm_OpenInput(&mstream, dev, NULL, 512,
                            NULL, NULL);
      if(retval != pmNoError) {
        printf("error: %s \n", Pm_GetErrorText(retval));
        Pa_CloseStream(handle);
        Pa_Terminate();
        Pm_Terminate();
        return 1;
      }
      Pa_StartStream(handle);
      while(Pt_Time(NULL) < end){
        if(Pm_Poll(mstream)) {
          unsigned char data1, data2, status;
          cnt = Pm_Read(mstream, msg, 32);
          for(i=0; i

      Pa_CloseStream(handle);
      Pa_Terminate();
      Pm_Terminate();
      return 0;
    }

    #define TWOPI 6.283185307179586

    int audio_fn(const void *input, void *output,
                 unsigned long frameCount,
                 const PaStreamCallbackTimeInfo *timeInfo,
                 PaStreamCallbackFlags statusFlags,
                 void *userData){
      int i;
      UDATA *p = (UDATA *) userData;
      float *inp = (float *) input,
            *outp = (float *) output;
      float fr, amp, sr = p->sr;
      unsigned long n = p->n;
      fr = 440.*pow(2., (p->note - 69.)/12);
      amp = p->vel/128.;
      for(i=0; i < frameCount; i++, n++)
        outp[i] = amp*sin(n*TWOPI*fr/sr);
      p->n = n;
      return paContinue;
    }
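The note-handling rules described at the start of this section (a NOTEON sets the note number and velocity; a NOTEOFF, or a NOTEON with zero velocity, silences the note if it matches the one sounding) can be sketched independently of Portmidi as follows. This is an illustrative reconstruction under those stated rules, not the code of Listing 12.3: the NoteState type and the handle_note() function are hypothetical names, and the status and data bytes are assumed to have already been extracted from the incoming message.

```c
#define TYPEMASK   0xF0
#define MD_NOTEON  0x90
#define MD_NOTEOFF 0x80

/* The two values shared with the audio side in this sketch. */
typedef struct {
  unsigned char vel;   /* 0 means silence */
  unsigned char note;  /* current MIDI note number */
} NoteState;

/* Update the state from one channel message.
   handle_note() is a hypothetical helper for illustration. */
void handle_note(NoteState *s, unsigned char status,
                 unsigned char data1, unsigned char data2) {
  unsigned char type = status & TYPEMASK;
  if (type == MD_NOTEON && data2 > 0) {
    /* key down: take the new note number and velocity */
    s->note = data1;
    s->vel = data2;
  } else if (type == MD_NOTEOFF ||
             (type == MD_NOTEON && data2 == 0)) {
    /* key up: silence only if it refers to the sounding note */
    if (data1 == s->note) s->vel = 0;
  }
  /* all other message types are ignored */
}
```

Checking the note number on key up means that releasing an earlier key does not cut off a newer note that has since replaced it, which suits a monophonic instrument.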

12.4 MIDI on Jack

As introduced in Sect. 11.2, the Jack Connection Kit is an API and a media server that can be used to connect applications to physical and software endpoints. As with audio, the Jack server provides a space where applications can open clients, which are made available for IO to/from other clients. The API for MIDI is very similar to what has already been explored in the realtime audio case. Similar steps need to be performed, namely:

1. Opening a client (jack_client_open()).
2. Registering ports (jack_port_register()), using a port with its type set to JACK_DEFAULT_MIDI_TYPE.
3. Setting a callback (jack_set_process_callback()).
4. Activating the client (jack_activate()).
5. Optionally, connecting to other clients (jack_connect()).
6. When done, the client should be deactivated and closed (jack_deactivate() and jack_client_close()).


The main difference is that MIDI data has a different format from audio, and will be accessed in the callback using a different means, although we will still look to get the data from a port, as before. A MIDI event is encapsulated by the data structure jack_midi_event_t, which has the following members:

• jack_nframes_t time: time reference for the MIDI event, in frames.
• size_t size: buffer size.
• jack_midi_data_t *buffer: MIDI message data bytes.

The size of the MIDI event will be determined by its message type. For channel messages, it will be either two or three bytes. The first item in the buffer will be the status byte, followed by one or two data bytes. To obtain an event from an input port, we use:

    int jack_midi_event_get(jack_midi_event_t *event,
                            void *port_buffer,
                            uint32_t event_index)

This retrieves an event from a port_buffer, indexed by event_index. When the process callback is called, there may be one or more events in the port buffer. By incrementing the index, starting from 0, we can retrieve all events, one by one; the function will return 0 on success. When there are no events left, ENODATA is returned.

MIDI data is sent as individual bytes (jack_midi_data_t) to the output port. This is done through

    int jack_midi_event_write(void *port_buffer,
                              jack_nframes_t time,
                              const jack_midi_data_t *data,
                              size_t data_size)

MIDI messages of data_size length should be written as a complete event (e.g. a status byte followed by one or two message bytes), and can be sorted by a time offset in frames. However, if offsets are given, messages need to be written in ascending time order, as Jack will not sort them, and will not store out-of-order events.

If a program is processing audio and MIDI at the same time (as in the MIDI synth example in Sect. 12.3.5), then it makes sense to pick up the MIDI input data in the same processing callback as that used for the audio data. This will be an optimal arrangement, which will not require any control data sharing between the main program and the callback. Moreover, the MIDI data coming in may also have a time offset, which will align the message to a specific sample in the audio buffer (something that we did not provide for in the Portaudio/Portmidi example).
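Because the size of a channel message depends on its type, a program handling raw event buffers may want to check that an event is complete before using it. The sketch below shows one way of doing this. It uses plain unsigned char in place of jack_midi_data_t so that it compiles without the Jack headers, and the function names are hypothetical. It relies on the fact that, in the MIDI specification, status types 0xC0 and 0xD0 carry a single data byte, while the other channel messages carry two.

```c
#include <stddef.h>

/* Expected size in bytes of a channel message, given its status
   byte: 2 for types 0xC0 and 0xD0 (one data byte), 3 for the other
   channel messages. Returns 0 if the byte is not a channel-message
   status byte. channel_message_size() is a hypothetical helper. */
size_t channel_message_size(unsigned char status) {
  unsigned char type = status & 0xF0;
  if (type < 0x80 || type == 0xF0) return 0;
  return (type == 0xC0 || type == 0xD0) ? 2 : 3;
}

/* Check that a raw event buffer of the given size holds exactly
   one complete channel message. */
int is_complete_channel_message(const unsigned char *buffer,
                                size_t size) {
  size_t expected;
  if (size == 0) return 0;
  expected = channel_message_size(buffer[0]);
  return expected != 0 && size == expected;
}
```

In a Jack process callback, the buffer and size members of a jack_midi_event_t could be passed to is_complete_channel_message() before the status and data bytes are interpreted.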


12.4.1 Example

As we have seen in Chapter 11, a characteristic of Jack operation is that its processing is asynchronous. Therefore, alongside the main program, we will have a processing thread managed by the server that runs parallel to it. Because of this, if we want to access the data that is sent to the client in our main program, we will need to proceed carefully. In particular, we will want to avoid problems with access from two separate threads to the same memory location. Equally important is to ensure that any communication does not block the callback, so that realtime-safe operation is preserved.

In the example in Sect. 12.3.5, since we were sharing single bytes of data, where one thread was writing and another one reading, we were able to implement a simplistic approach. Although we could potentially have mismatching data bytes being read in the callback thread, this is very unlikely. At this point, however, it is worth introducing a more robust approach to dealing with data being shared between two threads. The idea is still that the callback can place MIDI data in memory and the main program can read it from that location, but we will try to synchronise access to avoid the concurrency issues known as data races.

For this, we will employ a circular buffer (or queue). This is a data structure made up of an array which is written to and read from in a circular fashion, using a first-in first-out (FIFO) access sequence. Alongside the block of memory to write and read data to/from, a single-writer, single-reader queue needs three counting variables:

1. A writer position tracker.
2. A reader position tracker.
3. The number of items waiting in the queue.

The position trackers will be incremented modulo the queue size, to implement circular access, so that when they reach the end of the array, their position is reset to the start.
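A queue along these lines can be sketched in a few lines of C. The version below stores plain integers rather than MIDI events, and keeps the item count in a C11 atomic variable (from stdatomic.h) so that the writer and reader sides can update it safely from different threads. The IntQueue type, its functions and the small QSIZE value are all assumptions made for this illustration.

```c
#include <stdatomic.h>

#define QSIZE 4  /* small size chosen for illustration */

/* Single-writer, single-reader circular buffer of ints.
   IntQueue and its functions are hypothetical names. */
typedef struct {
  int buf[QSIZE];
  unsigned int wp;             /* writer position (writer only) */
  unsigned int rp;             /* reader position (reader only) */
  _Atomic unsigned int items;  /* written but not yet read */
} IntQueue;

void queue_init(IntQueue *q) {
  q->wp = q->rp = 0;
  atomic_init(&q->items, 0);
}

/* Writer side: returns 1 on success, or 0 if the queue is full,
   in which case the item is dropped. It never blocks. */
int queue_write(IntQueue *q, int value) {
  if (atomic_load(&q->items) >= QSIZE) return 0;
  q->buf[q->wp] = value;
  q->wp = q->wp + 1 != QSIZE ? q->wp + 1 : 0;  /* wrap around */
  atomic_fetch_add(&q->items, 1);
  return 1;
}

/* Reader side: returns 1 and stores the oldest item in *value,
   or returns 0 if the queue is empty. */
int queue_read(IntQueue *q, int *value) {
  if (atomic_load(&q->items) == 0) return 0;
  *value = q->buf[q->rp];
  q->rp = q->rp + 1 != QSIZE ? q->rp + 1 : 0;  /* wrap around */
  atomic_fetch_sub(&q->items, 1);
  return 1;
}
```

The writer only touches wp and the reader only touches rp, so the item count is the single variable that both sides modify. This arrangement is only intended for exactly one writer thread and one reader thread.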
The number of items will be incremented on the writer side and decremented on the reader side, and will account for the items written but not read (Fig. 12.3). Since these two operations are not synchronised, we will need to use atomic operations to ensure that the order of operations is strictly enforced. Atomic access guarantees that only one side can modify the variable at one time, whereas ordinary access cannot ensure this. So, if we are incrementing and decrementing a variable, there is a possibility that the two operations may be attempted concurrently, which may lead to undefined results (due to a data race).

Fig. 12.3: Circular buffer, showing the reader position (rp), the writer position (wp) and the items waiting between them.

The C11 standard [24] defines the type qualifier _Atomic, which marks a variable as having atomic access. Such a variable can then be used with the various atomic functions provided by the header stdatomic.h. We will use three of these:

    unsigned int atomic_load(_Atomic unsigned int *obj)
    unsigned int atomic_fetch_add(_Atomic unsigned int *obj,
                                  int op)
    unsigned int atomic_fetch_sub(_Atomic unsigned int *obj,
                                  int op)

The first of these reads from the atomic variable, and the other two increment and decrement its value, respectively. They will guarantee that the variable is only accessed in the respective thread where they are called at any given time. Alongside the item count, we will be able to increment, independently, the writer and reader positions. The latter is only incremented if there are any items to be read, and the former will also only be incremented if there is space available in the buffer. If there is not, the data is discarded. In situations where there is no realtime pressure, we can block until there is space; in this case, however, nothing should block the processing callback, and so the function just carries on without writing to the buffer.

The following excerpt from the process callback demonstrates this. The variable wp tracks the writing position, and items is a pointer to the atomic item counter. Note that if the buffer is full, we just drop the data, but do not block the operation, to ensure realtime safety:

    while(jack_midi_event_get(&event,
            jack_port_get_buffer(in,nframes),
            i++) == 0) {
      /* echo input */
      jack_midi_event_write(
        jack_port_get_buffer(out,nframes),
        event.time, event.buffer, event.size);
      /* check for overflow */
      if(atomic_load(items) < JACK_MIDI_BUFFSIZE) {
        buf[wp] = event;
        wp = wp + 1 != JACK_MIDI_BUFFSIZE ? wp + 1 : 0;
        atomic_fetch_add(items, 1);
      }
    }

Likewise, the reading side in the main program implements a loop that checks whether any items are waiting in the buffer, reads them, increments the reader position rp and decrements the atomic variable state.items:


    while(atomic_load(&state.items)) {
      int size = state.buf[rp].size;
      int offs = state.buf[rp].time;
      jack_midi_data_t *mdata = state.buf[rp].buffer;
      ...
      rp = rp + 1 != JACK_MIDI_BUFFSIZE ? rp + 1 : 0;
      atomic_fetch_sub(&state.items, 1);
    }

This simple example prints the MIDI data to the terminal, and copies its input into the output. It runs for a set duration given in the command line. To compile it, we need the Jack library and headers to be installed:

    cc -o jmidi jmidi.c -I/usr/local/include \
       -L/usr/local/lib -ljack

The complete source code for this example is shown in Listing 12.4.

Listing 12.4: Jack MIDI example.

    #include <jack/jack.h>
    #include <jack/midiport.h>
    #include <stdio.h>
    #include <stdlib.h>    /* atof() */
    #include <stdatomic.h>

    #define JACK_MIDI_BUFFSIZE 1024
    #define MICROS 1000000

    typedef struct UDATA {
      jack_port_t *inport;
      jack_port_t *outport;
      jack_midi_event_t buf[JACK_MIDI_BUFFSIZE];
      _Atomic unsigned int items;
      unsigned int wp;
    } udata;

    static int jackProcess(jack_nframes_t nframes, void *pp) {
      udata *p = (udata *) pp;
      jack_midi_event_t event;
      jack_midi_event_t *buf = p->buf;
      int wp, i = 0;
      jack_port_t *in = p->inport;
      jack_port_t *out = p->outport;
      _Atomic unsigned int *items = &p->items;
      wp = p->wp;
      while(jack_midi_event_get(&event,
              jack_port_get_buffer(in,nframes), i++) == 0) {
        /* echo input */
        jack_midi_event_write(
            jack_port_get_buffer(out,nframes),
            event.time, event.buffer, event.size);
        /* check for overflow */
        if(atomic_load(items) < JACK_MIDI_BUFFSIZE) {
          buf[wp] = event;
          wp = wp + 1 != JACK_MIDI_BUFFSIZE ? wp + 1 : 0;
          atomic_fetch_add(items, 1);
        }
      }
      p->wp = wp;
      return 0;
    }

    int main(int argc, const char **argv) {
      if (argc < 2) {
        printf("jmidi dur\n");
      } else {
        jack_client_t *client;
        int rp = 0;
        client = jack_client_open("MIDIMon", JackNoStartServer, NULL);
        if (client != NULL) {
          udata state;
          unsigned long end, time = 0, now;
          end = (unsigned long) (atof(argv[1])*MICROS);
          state.items = 0;
          state.wp = 0;


          /* register input port */
          state.inport = jack_port_register(client, "input",
                           JACK_DEFAULT_MIDI_TYPE,
                           JackPortIsInput, 0UL);
          if (state.inport == NULL) {
            jack_client_close(client);
            printf("Could not open input port");
            return -1;
          }
          /* register output port */
          state.outport = jack_port_register(client, "output",
                            JACK_DEFAULT_MIDI_TYPE,
                            JackPortIsOutput, 0UL);
          if (state.outport == NULL) {
            jack_client_close(client);
            printf("Could not open output port");
            return -1;
          }
          /* set process callback */
          if(jack_set_process_callback(client, jackProcess,
                                       (void*) &state) != 0) {
            jack_client_close(client);
            printf("Could not set Jack callback");
            return -1;
          }
          /* activate Jack */
          if(jack_activate(client) != 0) {
            jack_client_close(client);
            printf("Could not start Jack processing");
            return -1;
          }
          now = jack_get_time();
          end += now;
          while(time < end) {


            time = jack_get_time();
            while(atomic_load(&state.items)) {
              int size = state.buf[rp].size;
              int offs = state.buf[rp].time;
              jack_midi_data_t *mdata = state.buf[rp].buffer;
              printf("%.2f : %d : ", (float)(time-now)/MICROS, offs);
              switch (*mdata & 0xF0) {
              case 0x80:
                printf("NOTEOFF");
                break;
              case 0x90:
                printf("NOTEON");
                break;
              case 0xA0:
                printf("POLYAFTOUCH");
                break;
              case 0xB0:
                printf("CTLCHG");
                break;
              case 0xC0:
                printf("PGMCHG");
                break;
              case 0xD0:
                printf("AFTOUCH");
                break;
              case 0xE0:
                printf("PBEND");
                break;
              }
              printf(" : CHAN %d : ", *mdata++ & 0x0F);
              size--;
              while (size--)
                printf("%d :", *mdata++);
              printf("\n");
              rp = rp + 1 != JACK_MIDI_BUFFSIZE ? rp + 1 : 0;
              atomic_fetch_sub(&state.items, 1);
            }
          }
          /* close client */
          jack_deactivate(client);
          jack_client_close(client);


          printf("closed Jack client \n");
          return 0;
        } else {
          printf("Could not open Jack client\n");
          return -1;
        }
      }
      return 0;
    }
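The circular buffer logic of Listing 12.4 can also be looked at in isolation from Jack. The following is a minimal sketch of the same idea, written in C++ (the language taken up in Part II) with std::atomic standing in for the C11 atomic functions. The ItemQueue type and its push() and pop() methods are hypothetical names used only for this illustration; as in the listing, it assumes exactly one writer thread and one reader thread.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical single-writer/single-reader queue (illustration only).
template <typename T, std::size_t N>
struct ItemQueue {
  T buf[N];
  std::atomic<unsigned int> items{0};  // items written but not yet read
  std::size_t wp = 0;                  // writer position (writer side only)
  std::size_t rp = 0;                  // reader position (reader side only)

  // Writer side: never blocks; returns false (item dropped) if full.
  bool push(const T &item) {
    if (items.load() >= N) return false;
    buf[wp] = item;
    wp = wp + 1 != N ? wp + 1 : 0;
    items.fetch_add(1);
    return true;
  }

  // Reader side: returns false if there is nothing waiting.
  bool pop(T &item) {
    if (items.load() == 0) return false;
    item = buf[rp];
    rp = rp + 1 != N ? rp + 1 : 0;
    items.fetch_sub(1);
    return true;
  }
};
```

The item is stored before the counter is incremented, and read before the counter is decremented, so each side only ever touches slots that the other side has already released.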

12.5 Conclusions

This chapter concludes the first part of our journey, from the shin program to realtime audio synthesis using the C language. We were able to cover all of the language syntax and semantics, plus a few key libraries. As far as C is concerned, this is of course only the beginning, as mastering it depends on quite a bit of practice, as well as some knowledge about the right APIs for a particular job (if nothing else, to stop us from reinventing the wheel, but also to be able to access some important system resources). So it is absolutely essential to be able to consult documentation (for instance, the system manual with the command man) and to follow it up. At this point, we should have built enough understanding of how the language and the systems that underpin it work to allow us to do that when we need to.

In the next part of the book, we will take a detour and move to a different language, C++, and a different programming paradigm, object orientation. However, we will do this in a continuous manner, introducing this new environment as a superset of what we have become familiar with in this part of the book.

Problems

12.1. Using the MIDI synthesiser example as a starting point, implement an added tremolo effect to the synthesis, whose amplitude (effect amount) is controlled by the modulation wheel (controller number 1) and whose frequency is controlled by another control change message, from a different controller number.

12.2. The MIDI synthesiser example produces a very simple waveform (a sine wave), which is composed of a single harmonic. How could you add more harmonics to this waveform? Design a program that would allow the user to control the number of harmonics in the sound using the modulation wheel.

Part II

Object-Oriented Audio in C++

Chapter 13

Oscillators

Abstract This chapter discusses one of the fundamental components of computer music instruments, the oscillator. It explores this first from the perspective of sinusoidal signal generation, discussing the concepts of phase, frequency, and sampling increment, and then introduces the principles of table lookup. Alongside this, we deal with the foundations of object-oriented programming, demonstrating how they can be employed to model sound computing components, such as the oscillator. As part of this, we swiftly move from C to the C++ language, introducing some of its basic elements.

Oscillators are used primarily to generate periodic signals, such as waveforms. Starting with the simplest of signals, the sinusoidal wave, we will introduce some key concepts that will allow us to design and implement such generators. As we have seen before, sine waves can be generated by invoking the sin() function of the standard C library (defined in math.h). This function takes an angle (or phase) and computes its sine value, which is equivalent to the length of the opposite side of a right triangle with its hypotenuse measuring 1. To generate a sine wave, we make the angle increase at a given rate, determined by the ratio of the desired frequency $f$ and the signal sampling rate $f_s$. Since the sine function is periodic in $2\pi$, we will use this to scale the phase values as they increase. The full expression for the varying phase becomes $2\pi f t / f_s$, where $t$ is the time in samples. This is translated into C code as

    s[n] = sin(2*pi*f*n/sr);

Such an implementation will work, as we have already shown, but only in cases in which the frequency f does not change. It fails whenever the frequency varies, e.g. in a glide/glissando, vibrato, etc. We may have noticed this very clearly in the MIDI synthesiser example in Sect. 12.3.5, where a change in frequency from one note to another causes a click, before stabilising as the frequency becomes fixed again.
As the sine function takes in an angle as input, we need to compute it accurately for each sample. To do that for an arbitrarily-varying frequency, we need to integrate it, as in [36]

© Springer Nature Switzerland AG 2019 V. Lazzarini, Computer Music Instruments II, https://doi.org/10.1007/978-3-030-13712-0_13

$$ s(t) = a(t)\,\sin\left(2\pi \int f(t)\,dt\right) \qquad (13.1) $$

This allows the frequency to assume different instantaneous values at each sample. To do this integration in a digital signal, we keep an account of the previous phase and add to it a sampling increment based on the currently calculated frequency (scaled by $2\pi/f_s$). The C code becomes

    s[n] = sin(ph);
    ph += 2*pi*f/sr;

It is trivial to show that, for the fixed-frequency case, the two code fragments are equivalent. However, if f varies, the first example will produce an incorrect output. Thus, a C implementation of such a function would have to take account of the sample-by-sample phase values that are produced by the integration of the time-varying frequency f(n). To make it more widely available to a program, we can turn this into a function, but the current value of the phase would be kept externally to it, and modified as a side-effect (Listing 13.1). The memory address of this variable is passed to the function as a pointer:

Listing 13.1: C function implementing Eq. 13.1.

    #include <math.h>
    #define twopi 6.283185307179586

    double sineosc(double a, double f, double *ph, double sr){
      double s = a * sin(*ph);
      *ph += twopi * f / sr;
      return s;
    }

For each independent oscillator that we would like to have, we will need to provide a separate variable to hold the current phase. For example, two oscillators playing two sine waves at 220 and 375 Hz would be programmed as follows:

Listing 13.2: Program using the function in Listing 13.1.

    #include <stdio.h>

    int main() {
      double ph1 = 0., ph2 = 0.;
      int i;
      double sr = 44100.;
      for(i = 0; i < sr; i++)
        printf("%f \n", sineosc(0.2, 220., &ph1, sr) +
                        sineosc(0.4, 375., &ph2, sr));
      return 0;
    }

We can see that this is a little awkward, since we need to remember to keep track of the phase of each oscillator. It would be much better if we could package up the oscillator and the memory it needs (called its state) together in one programming object. The good news is that we can. We could place the oscillator and the phase


variable in a new type defined by a data structure. C only allows functions in data structures as pointers and so we need to do something like this:

Listing 13.3: Self-contained oscillator types.

    #include <math.h>
    #include <stdio.h>

    typedef struct _osc_ {
      double ph;
      double sr;
      double (*process)(struct _osc_ *, double, double);
    } OSC;

    #define twopi 6.283185307179586

    double sineosc(OSC *p, double a, double f){
      double s = a * sin(p->ph);
      p->ph += twopi * f / p->sr;
      return s;
    }

    int main() {
      int i;
      OSC osc1 = { 0., 44100., sineosc},
          osc2 = { 0., 44100., sineosc};
      for(i = 0; i < osc1.sr; i++)
        printf("%f \n", osc1.process(&osc1, 0.2, 440) +
                        osc2.process(&osc2, 0.4, 375));
      return 0;
    }

This looks much better, as we have packaged everything together in one single data type, which we can instantiate many times. We can improve on this, but in order to do so, we will need to move to a different language, C++.

13.1 Moving to C++

C++ [62, 63] is, depending on which angle we approach it from, a completely different language to C, or an extension to it, as its name (C-increment) implies. It is also a big language, which stands opposed to the simplicity (and elegance) of C. In this and the following chapters, we will follow a route that takes it as an extended version of the C language. We will not hope to cover every single aspect of the language as we were able to do with C, but we will learn its most essential features, and those that will allow us to program music applications conveniently. The language devices will be introduced as we need them. Note also that the compiler command to be used from now on will be c++ rather than cc, to reflect the change in language.

Most of the C programs we have seen before will also be valid C++ code. We can continue using the C libraries and most of its syntax. In the case of the standard library, the only difference is that we normally employ the C++ versions of its headers. These have an added ‘c’ prefix and no ‘.h’ extension. For example, the C header file stdio.h becomes cstdio in C++.
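As a first piece of C++ using these renamed headers, the sketch below revisits the two sine-generation fragments from the start of the chapter. It measures the largest difference between consecutive output samples when the frequency switches from one value to another, which is a rough indicator of the click described earlier. The function names max_jump() and report(), and the particular frequencies and switch point, are assumptions made for this illustration only.

```cpp
#include <cmath>   // C++ version of math.h
#include <cstdio>  // C++ version of stdio.h

const double twopi = 8. * std::atan(1.);

// Largest difference between consecutive samples for a sine wave whose
// frequency switches from f1 to f2 at sample nswitch (values chosen
// arbitrarily). If accumulate is true, the phase is integrated sample
// by sample; otherwise the closed-form expression 2*pi*f*n/sr is used.
double max_jump(bool accumulate) {
  const double sr = 44100., f1 = 220., f2 = 330.;
  const int nswitch = 1000, nsamps = 2000;
  double ph = 0., prev = 0., maxd = 0.;
  for (int n = 0; n < nsamps; n++) {
    double f = n < nswitch ? f1 : f2;
    double s;
    if (accumulate) {
      s = std::sin(ph);
      ph += twopi * f / sr;
    } else {
      s = std::sin(twopi * f * n / sr);
    }
    double d = std::fabs(s - prev);
    if (n > 0 && d > maxd) maxd = d;
    prev = s;
  }
  return maxd;
}

// Prints both measurements for comparison.
void report() {
  std::printf("closed form: %f\n", max_jump(false));
  std::printf("accumulated: %f\n", max_jump(true));
}
```

With phase accumulation, no step can exceed the per-sample increment of the higher frequency; the closed-form version jumps by a much larger amount at the switch point, because its phase is recomputed from scratch with the new frequency.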

13.1.1 C++ Structures

The first main extension we would like to introduce is a significant change in how structures work to define new types:

1. Variables instantiated from data types defined by structures do not need the struct keyword to precede them. Once they are defined, they can be instantiated just as any other variable types are.
2. Functions are allowed in structures. These are called member functions or methods in this context.
3. Members can belong to instances of structures (objects), which is the general case, or to the structures themselves (and to no specific instance in particular). In this case they are marked as static.
4. Non-static methods may access directly all variables defined in the structure (called member variables or attributes in this context).
5. Structures can contain a special method called a constructor, which is used to initialise a variable (also called an object in this context). Constructors have the same name as the structure and are declared with no return type. They can have any number of parameters, like any other method (including zero). If the structure does not declare a constructor, the compiler will supply a default one, with no arguments and no function body.

So, with these extensions, we can rewrite our oscillator code more conveniently in C++:

Listing 13.4: C++ version of Listing 13.3.

    #include <cstdio>
    #include <cmath>

    const double twopi = 8. * atan(1.);

    struct Osc {
      double ph;
      double sr;

      Osc() : ph(0.), sr(44100.) { };

      double process(double a, double f){
        double s = a * sin(ph);
        ph += twopi * f / sr;
        return s;
      }
    };

    int main() {
      int i;
      Osc osc1, osc2;
      for(i = 0; i < osc1.sr; i++)
        printf("%f \n", osc1.process(0.2, 440) +
                        osc2.process(0.4, 375));
      return 0;
    }

As can be seen, the code simplifies somewhat. The data type is more compactly described and we do not need to pass in pointers to the function, or initialise a function pointer. In the processing method, we can access and modify the struct variables directly. The constructor declaration requires some explanation:

    Osc() : ph(0.), sr(44100.) { };

In C++, every single type has a constructor. This includes the fundamental built-in types we have already seen in the C language, which are also called trivial or trivially constructed. So a double will have a double(double x) constructor built into the language, which constructs a double variable with initial value x. We can invoke it by calling the name of the variable followed by the initialisation parameter, e.g. ph(0.) initialises the double ph member variable. A constructor then has the form:

    struct-name ( argument-list ) : member-initialisation-list { body }

and the member initialisation list is a comma-separated series of calls to constructors of each member variable. The function body and argument list can be empty (as in the present example). We can also declare the constructor to take in parameters to initialise the object:

    Osc(double phs, double esr) : ph(phs), sr(esr) { };

If we declare an object with no initialisation parameters, the default constructor that takes no parameters is used. If only a constructor that takes parameters is provided, the object will be required to be initialised with those parameters (the compiler will complain otherwise). Also note in Listing 13.4 that the headers for C standard library functions are named slightly differently in C++ (although the C headers would generally also work here).
We also introduced the const keyword, which is used to indicate that a constant (a read-only object) is created, rather than a variable.
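To make the two forms of construction concrete, the following sketch gives the oscillator structure both a constructor with no parameters and one taking an initial phase and a sampling rate. It is only an illustration of the syntax discussed above, under the assumption that both constructors are declared side by side in the same structure.

```cpp
#include <cmath>

const double twopi = 8. * std::atan(1.);

struct Osc {
  double ph;
  double sr;

  // used when an object is declared with no initialisation parameters
  Osc() : ph(0.), sr(44100.) { }

  // used when an initial phase and a sampling rate are given
  Osc(double phs, double esr) : ph(phs), sr(esr) { }

  double process(double a, double f) {
    double s = a * std::sin(ph);
    ph += twopi * f / sr;
    return s;
  }
};
```

An object declared as Osc osc1; starts at phase 0 with a sampling rate of 44100, whereas Osc osc2(twopi/4., 48000.); starts a quarter of a cycle ahead and runs at 48000.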


C++ structures are our first step into object-oriented programming, which, as we will see, is a very convenient way of programming audio and music applications. The idea is that we can create fully-fledged new types, from which any number of objects can be instantiated and manipulated. The example in Listing 13.4 demonstrates the idea fully: a type that encapsulates the model of a sine wave oscillator, with a method to manipulate it (i.e. generate audio).

13.1.2 Overloading and Optional Parameters

Another feature of C++ that can prove very useful for us is the possibility of supplying the same function name with different implementations for different argument types. For instance, it is legal to declare

    double process();                         // no arguments
    double process(double amp);               // one argument
    double process(double amp, double freq);  // two arguments

For each one of these we will provide a separate implementation. We could, for instance, modify our oscillator structure design to incorporate amplitude and frequency as member variables, and then provide different implementations for fixed or varying parameters:

    struct Osc {
      double fr;
      double amp;
      double ph;
      double sr;

      Osc(double a, double f) :
        amp(a), fr(f), ph(0.), sr(44100.) { };

      double process(double a, double f){
        amp = a;
        fr = f;
        double s = a * sin(ph);
        ph += twopi * f / sr;
        return s;
      }

      double process(double a) {
        amp = a;
        return process(amp, fr);
      }

      double process(){
        return process(amp, fr);
      }
    };

The user can then decide which one is needed, depending on whether the frequency, the amplitude, or both need to change. Constructors can also be overloaded, if we want to create objects with slightly different parameter configurations, or a default constructor in addition to a constructor taking parameters.

In complement to this, we can make some or all arguments have default values, which are used if a parameter is not supplied:

    double process(double amp = 0.5, double freq = 440.);

This can be used in a constructor to allow for some parameters to be optional; for example,

    Osc(double a, double f, double phs = 0., double esr = 44100.) :
      amp(a), fr(f), ph(phs), sr(esr) { };

Optional arguments always need to be towards the right (or the end) of the parameter list. For instance, the first is not allowed to be optional if the second is not, as the semantics would not be clear in this case.
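Overloading and default values can be combined in the same type. The sketch below uses a deliberately simple, hypothetical Ramp structure (not part of the oscillator code) whose constructor has two optional parameters and whose process() method is overloaded, so that the increment can either be supplied or reused from the previous call.

```cpp
// Hypothetical ramp generator, used here only to illustrate
// overloaded methods and default parameter values.
struct Ramp {
  double val;   // current output value
  double incr;  // increment applied after each call

  // both parameters are optional, the rightmost ones first
  Ramp(double start = 0., double inc = 1.) : val(start), incr(inc) { }

  // a new increment is supplied and stored
  double process(double inc) {
    incr = inc;
    double out = val;
    val += incr;
    return out;
  }

  // no argument: reuse the stored increment
  double process() { return process(incr); }
};
```

Declaring Ramp r; uses both defaults, Ramp q(10.); overrides only the starting value, and a call such as r.process(0.5) changes the increment used by all subsequent calls to r.process().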

13.1.3 Memory Management

C++ has three built-in memory management operators: new, delete, and delete[]. These replace the C library functions malloc() and free(). The two memory management systems should not be used interchangeably, and in C++ we should adopt the language standard operators. An object can be dynamically allocated with the following syntax:

    Osc *oscil = new Osc(0.5,440.);

Since this is a pointer, we need to use the correct syntax to access its members:

    oscil->process();

When we are done with it, we dispose of the memory using

    delete oscil;

One important reason for using new and delete is that this mechanism allows for correct object construction in all cases. It also implements destruction, which is the opposite process, when memory is disposed of and resources freed. As you might expect, a structure will also have a special method to do this, called a destructor. We do not need to define this in many cases, unless we ourselves have allocated memory or used any other resources that need to be freed (e.g. file handles, etc.). The compiler will provide a default destructor for each structure that does not define one. However, if we need to implement this, the signature for this method is

    ~struct-name ( )

that is, it is the structure name with a ~ in front of it and takes no parameters. Finally, we can also create arrays of objects dynamically using a slightly different syntax:

    double *samples = new double[size];

where size is an integer variable (or a constant). Memory deallocation is effected with the second version of delete:

    delete[] samples;

We need to make sure that the correct version of this operator is used. With these new C++ extensions, we can now proceed to designing a fully-fledged oscillator.
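The pairing of new with delete, and of new[] with delete[], can be seen together in the following sketch. The Buffer structure and the sum_of_ramp() function are hypothetical names introduced for this illustration: the structure allocates an array in its constructor and releases it in its destructor, and the function creates and disposes of one such object dynamically.

```cpp
// Hypothetical sample buffer that owns dynamically-allocated memory.
struct Buffer {
  double *data;
  unsigned int size;

  // the array is allocated (and cleared) on construction ...
  Buffer(unsigned int sz) : data(new double[sz]), size(sz) {
    for (unsigned int i = 0; i < size; i++) data[i] = 0.;
  }

  // ... and released on destruction, with the array form of delete
  ~Buffer() { delete[] data; }
};

// Fills a dynamically-created buffer with 0, 1, ..., n-1 and sums it.
double sum_of_ramp(unsigned int n) {
  Buffer *b = new Buffer(n);  // calls the constructor
  double sum = 0.;
  for (unsigned int i = 0; i < b->size; i++) {
    b->data[i] = i;
    sum += b->data[i];
  }
  delete b;                   // calls the destructor, then frees the object
  return sum;
}
```

Note that delete b only needs to be written once: the destructor takes care of the array, so the code using the object does not have to know how its memory was obtained.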

13.2 The Table Lookup Oscillator

The sinusoidal wave oscillator that we have been exploring so far has a couple of limitations. It does not allow us to generate an arbitrary waveform, and it makes one function call per output sample, which is not very efficient. So we can improve on this by designing a more flexible and general algorithm: the table lookup oscillator, which generates a vector of samples.

The idea of table lookup is that we have a memory block, which is a table of values, containing the output of some pre-computed function (e.g. a sine or any other shape). The table has a size, which is the number of values in memory and we can read it (look it up) to get the value of a function given an input argument, which is an index of a position in the table. In programming terms, we have an array, which we initialise with a set of values, and the oscillator uses it instead of calling a function directly. The algorithm is defined by a couple of equations:

$$ s(t) = a(t)\,T\left(\lfloor \theta(t) \rfloor \bmod N\right) \qquad (13.2) $$

$$ \theta(t+1) = \theta(t) + f(t)\,\frac{N}{f_s} \qquad (13.3) $$

You will recognise that the function $T()$, the table lookup, replaces the sin() function in our previous oscillator design. Also, because the phase $\theta(t)$ has to be within the bounds of the table used, we need to apply a mod $N$ operation to it, as we perform the lookup ($N$ is the table size). That will keep the index between 0 and $N - 1$, if it is below or above this range. Since the function ranges over these bounds, we will scale the frequency by $N/f_s$ instead of $2\pi/f_s$. Also, given that we are looking up an array, the index has to be a whole number. For this we need a floor operation ($\lfloor x \rfloor$). Now we have a couple of modifications to make to our previous oscillator code, such as

    double process(double a, double f){
      amp = a;
      fr = f;
      double s = a * table[(int)ph];
      ph += size * f / sr;
      while(ph >= size) ph -= size;
      while(ph < 0) ph += size;
      return s;
    }

to realise the table lookup oscillator, using a double table array as a function table with int size pre-computed values. Both of these variables are assumed to be in the scope of this method, placed inside the structure that will hold it.

To complete the algorithm, we will want to process a whole block of samples (a vector) instead of a single sample per function call. This is a more efficient way to proceed when computing audio [13]. Processing vectors will require us to loop over the output array to fill it:

    const double *process(double a, double f){
      double incr = size * f / sr;
      amp = a;
      fr = f;
      for(int i = 0; i < vsize; i++){
        s[i] = amp * table[(int)ph];
        ph += incr;
        while(ph >= size) ph -= size;
        while(ph < 0) ph += size;
      }
      return s;
    }

We are assuming that the array s exists inside the data structure (i.e. the object holds its output), and that it has size int vsize (also a member variable). Note also that since the frequency fr can change at most once every vsize samples, we can move the calculation of the amount of phase update needed (the increment) to outside of the processing loop. This saves a few operations per sample. In this code we have also introduced a couple of programming devices we have not yet used:

• C++ allows us to declare a local variable, whose scope is limited to the loop body, in the for initialiser. Note that although we have not used this before, it is a feature that is also present in the C99 standard.
• The function signature contains the const keyword. In this case, it means that we are returning a pointer to const double. It does not mean that the pointer itself is a constant, but that the data it is pointing at cannot be changed; the double array returned is read only. This is good practice since we want to prevent the oscillator output being modified externally.
We now have all the pieces that we need to create an Osc type that implements a general-purpose table-lookup oscillator:


Listing 13.5: Table-lookup oscillator.

    struct Osc {
      double fr;
      double amp;
      const double *table;
      unsigned int size;
      double ph;
      double *s;
      unsigned int vsize;
      double sr;

      Osc(double a, double f, const double *t,
          unsigned int sz, double phs = 0.,
          unsigned int vsz = 64, double esr = 44100.) :
        amp(a), fr(f), table(t), size(sz), ph(phs),
        s(new double[vsz]), vsize(vsz), sr(esr) { };

      ~Osc() { delete[] s; }

      const double *process(double a, double f){
        double incr = size * f / sr;
        amp = a;
        fr = f;
        for(int i = 0; i < vsize; i++){
          s[i] = amp * table[(int)ph];
          ph += incr;
          while(ph >= size) ph -= size;
          while(ph < 0) ph += size;
        }
        return s;
      }

      const double *process(double a) {
        amp = a;
        return process(amp, fr);
      }

      const double *process(){
        return process(amp, fr);
      }
    };

Note that in this code we have employed all the C++ devices we have learned so far:

• Overloaded methods: process() can be called in three different ways.


• Default parameters: the constructor has a number of defaults, so that the user does not need to provide them in most cases.
• Read-only variables: the table should not need to be modified by the oscillator, so we make it read-only. The output of process(), as we’ve outlined above, is also read-only.
• The output vector is created dynamically, since we do not know at compile time what size it will be. We use new to allocate it, initialising the pointer.
• Now the structure has some resources it needs to manage, so we have to supply a destructor, which calls delete[] to free the array (otherwise we would have a memory leak).

Since we have no built-in function table, we now need to supply one for this object. Any periodic function will do, but, of course, if we are generating audio, we should be trying to provide band-limited waveforms, rather than naïve geometric shapes. The simplest way is to use a Fourier series [18, 36, 56], summing sinusoidal waves. The example below creates a table with two harmonics and uses an Osc object to generate an output based on this:

Listing 13.6: Synthesis example.

    #include <cstdio>
    #include <cmath>

    const double twopi = 8. * atan(1.);

    int main() {
      const unsigned int size = 10000;
      double tab[size];
      const double *out;
      Osc osc(0.5, 440., tab, size);
      for(int i = 0; i < size; i++)
        tab[i] = 0.5*(sin(i*twopi/size) +
                      sin(2*i*twopi/size));
      for(int i = 0; i < osc.sr; i += osc.vsize){
        out = osc.process();
        for(int j = 0; j < osc.vsize; j++)
          printf("%f \n", out[j]);
      }
      return 0;
    }

To build the program, first the Osc class needs to be added to the code in Listing 13.6, and then we can compile this file with the c++ command:

    c++ -o osc osc.cpp


Figure 13.1 shows a plot of the output of this program. We can clearly see that the presence of two partials creates a wave shape that is different from a simple sine wave. The plot shows 200 samples, which is just short of two periods at 440 Hz.

Fig. 13.1: A plot of the oscillator output from Listing 13.6.
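The Fourier-series approach used to fill the table in Listing 13.6 extends naturally to any number of harmonics. The following sketch shows one possible helper for this; the function name fourier_table(), its parameter list, and the normalisation of the result to a peak of 1 are all assumptions made for this illustration, not part of the Osc code. For the output to remain band-limited, the highest harmonic multiplied by the oscillator frequency should stay below half the sampling rate.

```cpp
#include <cmath>

const double twopi = 8. * std::atan(1.);

// Fills tab[0..size-1] with one cycle of a sum of sine harmonics.
// amps[k] is the relative amplitude of harmonic k+1 (hypothetical
// interface). The table is scaled so that its peak magnitude is 1.
void fourier_table(double *tab, unsigned int size,
                   const double *amps, unsigned int nharms) {
  double peak = 0.;
  for (unsigned int i = 0; i < size; i++) {
    tab[i] = 0.;
    for (unsigned int k = 0; k < nharms; k++)
      tab[i] += amps[k] * std::sin((k + 1) * i * twopi / size);
    if (std::fabs(tab[i]) > peak) peak = std::fabs(tab[i]);
  }
  if (peak > 0.)
    for (unsigned int i = 0; i < size; i++) tab[i] /= peak;
}
```

A table built in this way can be passed to the Osc constructor in place of the hand-written loop, for instance with double amps[2] = {1., 1.}; to obtain the same two-harmonic shape (up to scaling).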

13.3 Conclusions

Oscillators are the workhorses of digital synthesis. The basic algorithm can be used to produce any type of periodic waveform. It can be used for sampled-sound playback (if we replace the single-waveform table by a whole block of recorded sound) and for envelope generation (if we use an envelope shape as the function table and adjust the frequency to be the inverse of the envelope duration).

We have shown that oscillators have state and that keeping it packaged in a structure is a very good idea. To do this in a convenient form, we have upgraded our implementation language from C to C++ and introduced some relevant programming devices. We will continue on this path in the following chapters, adding some more strings to our bow.

Problems

13.1. Write a program using the Osc structure that will produce a band-limited sawtooth wave, with its frequency and amplitude given as arguments to the program. Use either libsndfile or Portaudio to implement the audio output.

13.2. Modify the Osc structure to allow for (optionally) audio-rate amplitude and/or frequency modulation. Write a program using two of these objects to implement simple (sinusoidal) frequency modulation synthesis, taking the carrier and modulator frequencies, index of modulation and the signal amplitude as arguments. Use either libsndfile or Portaudio to implement the audio output.

Chapter 14

Interpolation

Abstract In this chapter we concentrate, on the signal processing side, on the concept of interpolation and how it can be applied to produce better oscillators. We also look at taking these synthesis components apart into their constituent elements, phase generation and table reading. From a programming perspective, the discussion of different kinds of oscillators allows us to introduce inheritance, and the concept of polymorphism. We also explore a new way of handling addresses of objects, which is provided by reference types in C++.

The table-lookup oscillator we introduced in the previous chapter is the simplest one of its kind, and it is not as precise as we would like it to be. If we compare the output of the original sine wave oscillator (using a direct call to sin()) and that of an oscillator reading a sine wave table, we will see that there are some small differences. The main reason for this is that while the sin() access translates an angle defined in double precision to a double precision result, in the table lookup we truncate the index to an integral value to be able to access the array memory. The sine wave that is stored in the function table is sampled at N points (N is the table size), and the error in the output will be inversely proportional to this size.

The solution to this problem is to be able to find intermediate values between table positions, so that we do not need to truncate the position index to get a result. For instance, if the index is 10.3, we need to be able to find a precise number that sits in between the values of positions 10 and 11. In order to do this, we interpolate [30]. While there are various methods we can apply to perform interpolation, the most common is to use a polynomial. The higher the order of the polynomial, the more precise the result will be, but this also increases computational complexity.
While there is a balance to be reached between output quality and efficiency, it is understood that the low computational load of truncation does not justify its poor precision and that we should use, at minimum, first-order interpolation. In the following sections, we will explore the principles of first- and third-order polynomial methods.
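The relationship between table size and truncation error mentioned above can be checked numerically. The sketch below builds a sine table of N points and compares a truncated lookup against a direct call to the sine function at a number of fractional positions. The function name max_trunc_error() and the probing resolution sub are assumptions made for this illustration.

```cpp
#include <cmath>

const double twopi = 8. * std::atan(1.);

// Largest absolute difference between sin() and a truncated lookup
// in an N-point sine table, probing sub positions per table point
// (hypothetical test function).
double max_trunc_error(unsigned int N, unsigned int sub = 16) {
  double *table = new double[N];
  for (unsigned int i = 0; i < N; i++)
    table[i] = std::sin(i * twopi / N);
  double maxerr = 0.;
  for (unsigned int i = 0; i < N * sub; i++) {
    double ph = (double) i / sub;  // fractional table index
    double err = std::fabs(std::sin(ph * twopi / N) - table[(int) ph]);
    if (err > maxerr) maxerr = err;
  }
  delete[] table;
  return maxerr;
}
```

Calling this with N = 1024 and then N = 2048 should show the maximum error roughly halving as the table size doubles.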

© Springer Nature Switzerland AG 2019 V. Lazzarini, Computer Music Instruments II, https://doi.org/10.1007/978-3-030-13712-0_14


14.1 Linear Interpolation

Linear interpolation, which uses a first-order polynomial, finds a value that is situated on a straight line between two table values in adjacent positions. Conceptually, if we have 10.3 as an index, we will mix 70% of position 10 with 30% of position 11 to get the result. The polynomial expression is

$$ f(x) = ax + b \qquad (0 < x < 1) \qquad (14.1) $$

where a and b are coefficients calculated from table values at adjacent positions, x is the fractional position between the two indices, and f(x) is the result we are interested in. It is easy to demonstrate that the coefficients can be computed as follows:

$$ a = y_2 - y_1, \qquad b = y_1 \qquad (14.2) $$

with $y_1 = T(\lfloor\theta(t)\rfloor \bmod N)$ and $y_2 = T((\lfloor\theta(t)\rfloor + 1) \bmod N)$, i.e. the values of two adjacent lookup positions (for a given phase $\theta(t)$). The extra cost is effectively one extra multiplication and two sums:

    const double *process(double a, double f){
      double frac;
      int posi;
      amp = a;
      fr = f;
      for(int i = 0; i < vsize; i++){
        posi = (int) ph;
        frac = ph - posi;
        s[i] = amp * (table[posi] +
                 frac*(table[posi+1] - table[posi]));
        ph += size * fr / sr;
        while(ph >= size) ph -= size;
        while(ph < 0) ph += size;
      }
      return s;
    }

To facilitate computation, we will assume that the table will have an extra point at the end, which is used when interpolating beyond the last point of the range. A table can be constructed to this specification. Linear interpolation does not add much computational load, and, as mentioned above, should be considered the basic oscillator lookup method. Truncation should not be used unless absolutely necessary.
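The extra table point assumed above is straightforward to provide. The sketch below shows one way of doing this, together with the interpolated lookup written as a standalone function. The names make_guarded() and lookup_linear() are hypothetical, and copying the first value into the extra position is an assumption that suits a periodic waveform, where the cycle wraps around to its starting point.

```cpp
// Copies an N-point table into a new (N+1)-point one, whose last
// position repeats the first value (assuming a periodic waveform).
// The caller is responsible for releasing the result with delete[].
double *make_guarded(const double *src, unsigned int N) {
  double *t = new double[N + 1];
  for (unsigned int i = 0; i < N; i++) t[i] = src[i];
  t[N] = src[0];
  return t;
}

// Linearly-interpolated lookup at fractional index ph, 0 <= ph < N,
// on a table that has N+1 points.
double lookup_linear(const double *table, double ph) {
  int posi = (int) ph;
  double frac = ph - posi;
  return table[posi] + frac * (table[posi + 1] - table[posi]);
}
```

With a table holding the values 0, 1, 2, ..., an index of 10.25 returns 10.25, while an index halfway between the last nominal point and the extra point returns the midpoint between the last and first table values.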


14.2 Cubic Interpolation

The next method of polynomial interpolation that is practical to adopt in table lookup is of third order, also known as four-point interpolation. Here we take the values of four points around the target index and trace a non-linear curve that will pass through all of these points, and get its value at the required position. The polynomial expression is

\[ f(x) = ax^3 + bx^2 + cx + d, \qquad 0 < x < 1 \tag{14.3} \]

where, again, we have x as the fractional position between table indices. The polynomial coefficients are obtained as follows:

1. Set $f(-1) = y_0$, $f(0) = y_1$, $f(1) = y_2$ and $f(2) = y_3$, where $y_n = T((\theta(t) - 1 + n) \bmod N)$.
2. Solve the system

\[
\begin{aligned}
y_0 &= -a + b - c + d \\
y_1 &= d \\
y_2 &= a + b + c + d \\
y_3 &= 8a + 4b + 2c + d
\end{aligned}
\tag{14.4}
\]

3. The coefficients are

\[
\begin{aligned}
a &= (y_3 - 3y_2 + 3y_1 - y_0)/6 \\
b &= (y_2 - 2y_1 + y_0)/2 \\
c &= -y_3/6 + y_2 - y_1/2 - y_0/3 \\
d &= y_1
\end{aligned}
\tag{14.5}
\]

As is obvious, there are many more operations involved in cubic interpolation. Coefficient calculation is more complex, and there is also the need to raise the time variable x to powers of 2 and 3. It is possible to factorise Eq. 14.5 to avoid repeated operations and allow for some efficiency gains, but overall there is much more computation involved in this method than in simple linear interpolation. Considering these points, we can implement a cubic table-lookup oscillator as follows:

    const double *process(double am, double f){
      // the amplitude parameter is named am so that it does not
      // clash with the local variable a declared below
      double frac, a, b, c, d;
      double tmp, fracsq, fracb;
      int posi;
      amp = am;
      fr = f;
      for(int i = 0; i < vsize; i++){
        posi = (int) ph;
        frac = ph - posi;
        // the four points around the index (y0 to y3 in Eq. 14.5)
        a = posi == 0 ? table[0] : table[posi - 1];
        b = table[posi];
        c = table[posi + 1];
        d = table[posi + 2];
        tmp = d + 3.f * b;
        fracsq = frac * frac;
        fracb = frac * fracsq;
        s[i] = amp * (fracb * (-a - 3.f * c + tmp) / 6.f +
                      fracsq * ((a + c) / 2.f - b) +
                      frac * (c + (-2.f * a - tmp) / 6.f) + b);
        ph += size * fr / sr;
        while(ph >= size) ph -= size;
        while(ph < 0) ph += size;
      }
      return s;
    }

Similarly to linear interpolation, we can expect the table to be extended by two points beyond the nominal range to allow for interpolation beyond the table size. We need, however, to protect the lookup from the case where the truncated position is 0 and the n − 1 sample needs to be read.

Higher-order interpolation methods can be devised, but as we can observe, although they will increase the precision of the lookup, the computational demands will grow significantly. Most of the applications will probably be covered with linear or cubic interpolation.

Fig. 14.1 shows a comparison of a test signal (a segment of a sine wave sampled at four points) and its approximation by linear and cubic interpolation. We can see how cubic interpolation does a good job of modelling the wave in between the two sample positions (1 and 2), while the linear curve is also acceptable in this case.

14.3 Inheritance

To implement the various versions of the table-lookup oscillator, we have two options: to provide a mode switch in which the object will be constructed to operate with one of a number of table access algorithms; or, alternatively, to create separate structures that will implement them. The second option is probably the cleanest, since we can keep the different interpolation implementations in well-separated components. However, once we decide on this, it would also be useful to add as little as possible to what we have already programmed, reusing as much as we can. How can we do that beyond cut-and-paste?

The answer is to adopt another aspect of the object-oriented approach: inheritance, which is very well supported by C++. This means that we can make a structure become a child or a derived structure of an existing one, which is its parent or base. We can make the two share the attributes that were defined in the original structure, and add new elements to it to complement the process. The C++ syntax for a structure definition that inherits and can access all members of a base structure is

    struct name : base-name { member-declarations };

Fig. 14.1: A comparison of a signal sampled at four points, linear interpolation (between points 1 and 2), and cubic interpolation. [Three panels: signal, linear interpolation, cubic interpolation; horizontal axis from 0 to 3, vertical axis from 0.0 to 1.0.]

Let’s see what we could do with the present oscillator cases. Starting from our existing Osc structure, we can define Osci (linear interpolation) and Oscc (cubic interpolation):

Listing 14.1: Derived structures.

    struct Osci : Osc {
      Osci(double a, double f, const double *t,
           unsigned int sz, double phs = 0.,
           unsigned int vsz = 64, double esr = 44100.) :
        Osc(a,f,t,sz,phs,vsz,esr) { };
      const double *process(double a, double f);
      const double *process(double a);
      const double *process();
    };

    struct Oscc : Osc {
      Oscc(double a,double f,const double *t,unsigned int sz,
           double phs = 0.,unsigned int vsz = 64,
           double esr = 44100.) :
        Osc(a,f,t,sz,phs,vsz,esr) { };
      const double *process(double a, double f);
      const double *process(double a);
      const double *process();
    };

The inheritance diagram for these three structures is shown in Fig. 14.2. Note that we have supplied a constructor for each structure, which calls the base structure constructor (passing, in this case, all parameters to it, since we have no other members specific to these derived structures). The sole reason we have created these structures is to provide new implementations of the processing methods in the base structure, which we declare here (and can implement elsewhere). These methods will hide the base structure ones, and take their place when an object of the derived structure is used.

[Diagram: Osci and Oscc each derive from Osc.]

Fig. 14.2: Inheritance diagram for Osc, Osci, and Oscc.


14.3.1 Polymorphism

However, we can do better than this. Instead of hiding the base methods, we can let the compiler decide which one to use, when it is most appropriate. Consider this case: a pointer to Osc is used to hold a dynamically-allocated object of one of its substructures. This is perfectly allowed by C++, since the child is just an extension of the parent and so access to memory is safe. If we use this pointer to access a process() method, however, the hiding mechanism will defeat us: the base structure code is used, not the intended derived one. So reimplementing via hiding is not a good idea as its semantics breaks down in some situations.

So, to improve on this, we use virtual methods, which allow the compiler to safely select the relevant function. It is just a matter of marking the base structure functions with the keyword virtual to warn that they might be reimplemented in a child:

Listing 14.2: Virtual methods

    struct Osc {
      ...
      virtual ~Osc() { delete[] s; }
      virtual const double *process(double a, double f);
      virtual const double *process(double a) {
        amp = a;
        return process(amp, fr);
      }
      virtual const double *process(){
        return process(amp, fr);
      }
    };

Then, in the derived structures, the functions will not be hidden, but will instead use the override mechanism. In this case, a pointer to the base structure will not necessarily imply that the functions defined there will be used. It will all depend on the actual type of the object that it holds. This feature of object-oriented programming is called polymorphism. The derived object becomes a specialised subtype of the base.
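The difference between hiding and overriding can be observed in a few lines. The sketch below uses two illustrative structures, Base and Derived, which are assumptions made for this example and stand in for Osc and its children; it also assumes a C++11 (or later) compiler, since it uses the override specifier.

```cpp
#include <string>

// Illustrative types only; they are not the Osc structures.
struct Base {
  std::string hidden() { return "Base"; }              // non-virtual
  virtual std::string overridden() { return "Base"; }  // virtual
  virtual ~Base() { }
};

struct Derived : Base {
  // hides Base::hidden(): selected by the static type of the pointer
  std::string hidden() { return "Derived"; }
  // overrides Base::overridden(): selected by the actual object type
  std::string overridden() override { return "Derived"; }
};
```

If a Base pointer holds a Derived object, calling hidden() through it runs the base code, whereas calling overridden() runs the derived code, which is the behaviour we want for the oscillators.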

14.3.2 Oscillator Inheritance Tree

With this in mind, it makes sense to reorganise the three structures in the oscillator inheritance tree to adopt these principles to reuse code as much as possible:

• In the base, we declare the processing ‘kernel’ as virtual, that is, the oscillator code is to be reimplemented (specialised) in the derived structures. Let’s call this method oscillator().
• In the base, we declare various interfaces to it, the overloaded process() methods, which will call the actual processing code.
• In the derived structures, we just reimplement the processing ‘kernel’.

When an object of any of the three structures is created and calls the processing methods, these will in turn call, through the virtual mechanism, the appropriate oscillator code. The remodelled structures would look like this:

Listing 14.3: Table-lookup oscillator structures declaration (oscillators.h).

    #ifndef _OSCILLATORS_H_
    #define _OSCILLATORS_H_

    struct Osc {
      double fr;
      double amp;
      const double *table;
      unsigned int size;
      double ph;
      double *s;
      unsigned int vsize;
      double sr;

      virtual void oscillator();

      Osc(double a,double f,const double *t,unsigned int sz,
          double phs = 0.,unsigned int vsz = 64,
          double esr = 44100.) :
        amp(a), fr(f), table(t), size(sz), ph(phs),
        s(new double[vsz]), vsize(vsz), sr(esr) { };

      virtual ~Osc() { delete[] s; }

      const double *process(){
        oscillator();
        return s;
      }

      const double *process(double a, double f){
        amp = a;
        fr = f;
        oscillator();
        return s;
      }

      const double *process(double a) {
        amp = a;
        oscillator();
        return s;
      }
    };

    struct Osci : Osc {
      // all parameters are forwarded to the base constructor
      Osci(double a,double f,const double *t,unsigned int sz,
           double phs = 0.,unsigned int vsz = 64,
           double esr = 44100.) :
        Osc(a,f,t,sz,phs,vsz,esr) { };
      void oscillator(); // overrides Osc::oscillator()
    };

    struct Oscc : Osc {
      // all parameters are forwarded to the base constructor
      Oscc(double a,double f,const double *t,unsigned int sz,
           double phs = 0.,unsigned int vsz = 64,
           double esr = 44100.) :
        Osc(a,f,t,sz,phs,vsz,esr) { };
      void oscillator(); // overrides Osc::oscillator()
    };

    #endif

In this code, we have not implemented the oscillator() ‘kernel’, but only declared it. We will define these functions elsewhere. This is a design choice which has a subtle implication. Any methods defined inside a structure definition are by default inline: the compiler may replace any code that calls these by a complete copy of the function, eliminating the function call (see Sect. 6.1.5). This has the potential to speed up code, but also to make binary executables bigger. We tend to inline short functions as the potential to improve performance trumps any small increase in program size. In the case of the oscillator methods, it is probably better to implement them outside the structure as they are far larger in size and do a lot of work when called, which then minimises any function invocation overheads.

To do this, we write an implementation file, generally with the extension .cpp, which will hold this code. In this case, the structures should be defined in a header file so that they are made accessible to programs (without having to copy the code to each one using it). The code implementing a structure method needs to use a qualified name, which has the form

    struct-name :: method ( argument-list )

The oscillator implementation file will look like this:


Listing 14.4: Oscillator implementation (oscillators.cpp)

    #include "oscillators.h" // header

    void Osc::oscillator(){
      for(int i = 0; i < vsize; i++){
        s[i] = amp * table[(int) ph];
        ph += size * fr / sr;
        while(ph >= size) ph -= size;
        while(ph < 0) ph += size;
      }
    }

    void Osci::oscillator(){
      double frac;
      int posi;
      for(int i = 0; i < vsize; i++){
        posi = (int) ph;
        frac = ph - posi;
        s[i] = amp * (table[posi] +
                      frac*(table[posi+1] - table[posi]));
        ph += size * fr / sr;
        while(ph >= size) ph -= size;
        while(ph < 0) ph += size;
      }
    }

    void Oscc::oscillator(){
      double frac, a, b, c, d;
      double tmp, fracsq, fracb;
      int posi;
      for(int i = 0; i < vsize; i++){
        posi = (int) ph;
        frac = ph - posi;
        a = posi == 0 ? table[0] : table[posi - 1];
        b = table[posi];
        c = table[posi + 1];
        d = table[posi + 2];
        tmp = d + 3.f * b;
        fracsq = frac * frac;
        fracb = frac * fracsq;
        s[i] = amp * (fracb * (-a - 3.f * c + tmp) / 6.f +
                      fracsq * ((a + c) / 2.f - b) +
                      frac * (c + (-2.f * a - tmp) / 6.f) + b);
        ph += size * fr / sr;
        while(ph >= size) ph -= size;
        while(ph < 0) ph += size;
      }
    }

A program using these oscillator structures would need to include the header file. When building the program, the implementation file should be compiled and linked to the main program using the standard c++ command. Assuming the header file to be in the same directory and the main() function in main.c, we have

    $ c++ -o program main.c oscillators.cpp -I.

Alternatively, we can compile the two files separately into object code and then link these in a separate step:

    $ c++ -c -o main.o main.c -I.
    $ c++ -c -o oscillators.o oscillators.cpp -I.
    $ c++ -o program oscillators.o main.o

This is often done in larger projects to avoid the need to recompile every single file when only one of them has been modified. Build system programs such as make are used for this purpose.

Note that we could go one step further in code reuse. The modulo operation, used in all three oscillators, is exactly the same. We can remove it from the code, replacing it by a function defined in the base structure. As we have seen, functions defined inside the data structure are treated as inline. Therefore, making this a separate function should not incur any overhead due to function calls. As we noted in Sect. 6.1.5, we can also request that a given function is treated this way by using the inline attribute, but this is not needed in this case.

A modification in the design of an existing structure, such as this one, where we might move code around, is often called refactoring. We have done this twice in this chapter: we have added virtual methods and reorganised the code into a processing kernel and an interface. This is very common in object-oriented programming, and we will keep doing this to refine the structures we are developing.
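One way this last refactoring step might look is sketched below. To keep it short, the example uses a cut-down stand-in structure, PhaseWrap, holding only the phase and table size; both the structure and the method names mod() and advance() are assumptions made for illustration, not part of the oscillator code.

```cpp
// Sketch only: a cut-down stand-in for the phase handling in Osc.
struct PhaseWrap {
  double ph;          // current phase, in table positions
  unsigned int size;  // table size

  PhaseWrap(double phs, unsigned int sz) : ph(phs), size(sz) { }

  // generalised modulo, defined inside the structure and
  // therefore implicitly inline
  void mod() {
    while (ph >= size) ph -= size;
    while (ph < 0) ph += size;
  }

  // advance the phase by an increment (which may be negative),
  // wrap it around and return the new value
  double advance(double incr) {
    ph += incr;
    mod();
    return ph;
  }
};
```

In the oscillators, the two while loops at the end of each oscillator() method would then be replaced by a single call to such a function in the base structure.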

14.4 Function Table Objects

Now that we have embarked more incisively on an object-oriented way of designing code, it might be useful to take a look at other components that could be modelled as structures for more convenient use. Function tables, as used by oscillators, appear to be a good target for this. Until now, they have been simple arrays with no particular regard to their size or contents. It would be useful to package them into a new type that would not only hold the data and its size but also construct the table properly according to a given algorithm.

We also know much better than to create isolated one-off structures, so we should start with a proper base structure design, which will be simple enough to accommodate the various subtypes that we might require later. Basically we need two attributes, which are common to all of these:

1. The table data array.
2. The table size.

The simplest type of table, which would serve well as the base, employs a generating algorithm that just copies data from an array into it:

Listing 14.5: Function table structure

    #include <cstring> // memcpy(), NULL

    struct Func {
      double *table;
      int size;

      Func(int siz, const double *in = NULL) :
        table(new double[siz+2]), size(siz) {
        if(in) {
          memcpy(table, in, siz*sizeof(double));
          // guard points for interpolation
          table[siz+1] = table[1];
          table[siz] = table[0];
        }
      }

      ~Func() { delete[] table; }
    };

Note that, in order to work with cubic and linear-interpolation oscillators, we allocate two extra points and fill these with the first two positions in the table, expecting that the oscillator will wrap around the ends of the table in performance. We only fill in the table if an input is supplied. Any Func-derived structures will inherit the basic attributes, but can be constructed differently. Oscillators using a table object can then access its table pointer and size, which are packaged together. We should derive from the Func structure to implement the various waveforms we require.
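As an illustration of such a derivation, the sketch below defines a table holding one cycle of a rising ramp. The RampTab structure is a hypothetical example, not one of the waveforms developed in the text, and Func is reproduced (with std::memcpy) so that the fragment is self-contained. Since the base constructor only sets the guard points when input data is supplied, the derived constructor fills them itself.

```cpp
#include <cstddef>
#include <cstring>

// Func reproduced from Listing 14.5 so that the sketch is self-contained
struct Func {
  double *table;
  int size;
  Func(int siz, const double *in = NULL)
      : table(new double[siz + 2]), size(siz) {
    if (in) {
      std::memcpy(table, in, siz * sizeof(double));
      table[siz + 1] = table[1];
      table[siz] = table[0];
    }
  }
  ~Func() { delete[] table; }
};

// Hypothetical derived table: one cycle of a ramp rising from -1
struct RampTab : Func {
  RampTab(int siz) : Func(siz) {
    for (int i = 0; i < size; i++)
      table[i] = 2. * i / size - 1.;
    // guard points: copies of the first two positions
    table[size] = table[0];
    table[size + 1] = table[1];
  }
};
```

An object created as RampTab tab(1024) could then be handed to any code that reads tab.table and tab.size, including the interpolating oscillators.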

14.5 Reference Types

We can also rewrite or add a new constructor to the oscillator structures to take in table objects directly, rather than have to look for their pointers and sizes. Given that we will need to pass a whole structure as a parameter to the constructor, we need to be careful how we do this. Recalling that arguments are always passed to functions by copy, we have two options:

1. Use Func as the argument type and then the whole object is copied into the constructor. This is very wasteful as we do not need copies to be made.
2. Use Func* as the argument and manipulate the address of a table object, which will just amount to copying a pointer.

Clearly, option 2 is much better as we should avoid at all costs copying structures, either as arguments or as return types. The only drawback with this is that we will need to work with pointers to structures, addresses and a slightly different syntax.

In C++, there is a third alternative, which is to use a reference to an object. References are similar to pointers in that we do not operate on an object directly, but through another variable that is referring to it. The main differences between pointers and references are:

• A reference binds to a single object at initialisation time; in that sense, it behaves similarly to a constant pointer (i.e. T* const) in that you cannot change where it points (but you can change the contents of the object that you are referencing).
• It is not possible to have a NULL reference.
• The reference variable does not need to be dereferenced to access the object; we can do this directly (i.e. no indirection operator is used).

A reference to an object of type T is declared and initialised as

    T& reference = object;

We always need to initialise a reference to an existing object. For example,

    Func tab(10000);
    // make tabref refer to tab
    Func &tabref = tab;
    // manipulate the object via the reference
    for(int i = 0; i < tabref.size; i++)
      tabref.table[i] = (double) i;

Most commonly, we use it to pass parameters to functions by reference rather than by copy:

    void swap(int &a, int &b) {
      int tmp = a;
      a = b;
      b = tmp;
    }

This is done without having to pass variable addresses and dereference pointers to access the memory. The function can be called just by using [1]

    int n = 1, m = 2;
    swap(n, m);

[1] In fact, a similar function, std::swap(), defined for arbitrary argument types, is provided by the C++ standard library.


14.5.1 Copy Constructors

One of the typical uses of reference type arguments is in the declaration of an explicit copy constructor, for example

    struct A {
      ...
      A(const A& x);
    };

where the argument may or may not be marked as const, but it is always of the structure reference type. Copy constructors are used to construct objects from other existing objects of the same type. If not given explicitly, the compiler generates one implicitly for the structure. However, in some cases, this is not suitable and a specially-written copy constructor has to be provided. This is the case for structures that include external resources (such as a dynamically-allocated memory block). In fact, our Osc and Func structures would require one if we were to copy them, or pass them as (non-reference) arguments to functions. Since we are not doing that in the current use of these structures, we may sidestep the question. However, this issue will need to be dealt with at some point if we are to make their code more robust.
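To make the problem concrete: the implicitly generated copy constructor copies the pointer member only, so two objects would end up sharing (and later both deleting) the same memory block. A minimal sketch of an explicit copy constructor that performs a deep copy is given below. The Buffer structure is an illustrative stand-in, not the code for Osc or Func, and the example assumes C++11 for the deleted assignment operator.

```cpp
#include <algorithm>

// Illustrative stand-in for a structure that owns a memory block
struct Buffer {
  double *data;
  int size;

  // allocates a zero-initialised block of siz elements
  Buffer(int siz) : data(new double[siz]()), size(siz) { }

  // explicit copy constructor: allocates a new block and
  // copies the contents instead of sharing the pointer
  Buffer(const Buffer &x) : data(new double[x.size]), size(x.size) {
    std::copy(x.data, x.data + x.size, data);
  }

  // copy assignment is disabled to keep the sketch short
  Buffer &operator=(const Buffer &) = delete;

  ~Buffer() { delete[] data; }
};
```

With this in place, a copy made with Buffer b(a) has its own memory, so changes to b.data do not affect a.data and each destructor releases a different block.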

14.5.2 Object Reference Arguments

The use of reference types for arguments more generally is very welcome. For instance, in the particular case of a typical constructor for the Osc structure, we could have

    Osc(double a, double f, const Func &tab, double phs=0.0,
        int vsiz = 64, double esr=44100.) :
      amp(a), fr(f), table(tab.table), size(tab.size),
      ph(phs), s(new double[vsiz]), vsize(vsiz), sr(esr) { };

The parameter type is const Func&, which means a reference to a const Func, since we want it to be read-only (the table does not get modified at any point). It is always good to let the compiler know what your intentions are: if you are passing a reference and you are not going to modify the underlying object, use const to make it read-only (the same applies to pointers). Since the table pointer in Osc is also const, we have no problems initialising it with the table pointer from a const Func&, as both are read-only in this case. Note also that members of a referenced object are accessed in the same way as before, without the need for any special indirection syntax.

It is true that we could have modified Osc to hold a const Func& member instead of a const double*. However, that would have prevented us from changing the table we are using at some stage in the lifecycle of the object, since a reference cannot be assigned to, but a pointer can. Perhaps this is something we do not want to do at this point. We may, for instance, want to add an Osc::SetTable() method at some point. Additionally, if we were to use a table object, we would need to modify the oscillator code to access the data, and this seems unnecessary now.

As a trivial example, we can modify the code in Listing 13.6 to use a function table object and the new Osc constructor that takes it:

Listing 14.6: Synthesis example with table object

    #include <cstdio> // printf()
    #include <cmath>  // sin(), atan()
    #include "oscillators.h"
    #include "func.h"

    const double twopi = 8. * atan(1.);

    struct SinTab : Func {
      SinTab(int siz) : Func(siz) {
        for(int i=0; i < size; i++)
          table[i] = sin(i*twopi/size);
      }
    };

    int main() {
      const unsigned int size = 10000;
      const double *out;
      SinTab tab(size);
      Osc osc(0.5, 440., tab);
      for(int i = 0; i < osc.sr; i+=osc.vsize){
        out = osc.process();
        for(int j = 0; j < osc.vsize; j++)
          printf("%f \n", out[j]);
      }
      return 0;
    }

In this example we have created a very simple new type that holds a sine wave. In a more developed context, we would expect that a function table structure implementing waveforms such as this one would be more general, allowing for, say, multiple harmonics rather than a single component (see also Prob. 14.1). In such a scenario, the encapsulation of function tables as objects in a program is well worth our while.


14.5.3 Self References

An object may, if required, reference itself through the use of the implicit member variable this, which is a pointer to its type. This member holds the address of the object in which it appears. We are allowed to employ it in any (non-static) method, as well as in constructors. For example,

    struct A {
      int a;
      int b;
      // b is initialised to 0, the value of a
      A() : a(0), b(this->a) { };
      void set(int a) { // parameter a hides member a
        // this pointer explicitly refers to member a
        this->a = a;
      }
      // returns a reference to this object
      A& ref() {
        return *this;
      }
    };

References to self are very useful in a number of situations, and can be easily facilitated through the this pointer mechanism.

14.6 Phase Generators and Table Readers

Oscillators are actually composite objects made up of three separate operations put together:

1. Table lookup: the actual reading of the function table values.
2. Phase update: incrementing/decrementing the phase value.
3. Amplitude scaling: applying a gain to the values obtained from table lookup before the output.

We can separate these into individual steps and model them as signal processing objects. In some applications this can be useful as it enables certain types of manipulation that are generally not available for a single-block oscillator. For instance, if we want to implement phase modulation, as opposed to frequency modulation, we need to be able to generate the phase as a separate signal to which we can apply sample-by-sample offsets.

The two main components we need to implement are the phase generator, or phasor, and the table reader. To allow for interpolation modes, we should actually implement three types of the latter operator. Both phasors and table readers will have plenty of applications in synthesis and processing, which will make it worth our while modelling them as structures.

14.6.1 The Phasor

A phase generator will produce a ramping signal going from 0 to 1 (or from 1 to 0) at a given rate. It is represented by the following expression:

\[ \theta(t + 1) = \left( \theta(t) + \frac{f(t)}{f_s} \right) \bmod 1 \tag{14.6} \]

While this can be programmed recursively, it is more suitable to implement it as a loop, as in

Listing 14.7: Phasor processing function

    const double *Phasor::process(){
      for (int i = 0; i < vsize; i++) {
        s[i] = phs;
        phs += incr;
        mod1();
      }
      return s;
    }

We set the increment to be $f(t)/f_s$, update the phase and apply a mod 1 to it. The output will be a rising or falling naïve (geometric) sawtooth that can be used as the (normalised) phase of a periodic function. We can even use this signal directly if we do not mind the aliasing distortion it produces. More commonly, though, we will use it as the phase input to table reading.
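The behaviour of Eq. 14.6 for both rising and falling ramps can be tried out with a small free function. This is only a sketch under stated assumptions: the name phasor_block(), its parameters and the use of std::vector are illustrative choices, not the Phasor structure of Listing 14.7, and the mod 1 operation is written out as two loops so that negative frequencies are also handled.

```cpp
#include <vector>

// Hypothetical free function (not the Phasor structure): produces n
// samples of a normalised phase ramp at frequency fr (Hz) for a
// sampling rate sr, updating the phase variable phs as it goes.
std::vector<double> phasor_block(double &phs, double fr, double sr,
                                 int n) {
  std::vector<double> out(n);
  double incr = fr / sr;          // f(t)/fs
  for (int i = 0; i < n; i++) {
    out[i] = phs;
    phs += incr;
    while (phs >= 1.) phs -= 1.;  // mod 1 for a rising ramp
    while (phs < 0.) phs += 1.;   // mod 1 for a falling ramp
  }
  return out;
}
```

With a frequency of a quarter of the sampling rate, for example, the output starting from phase 0 cycles through 0, 0.25, 0.5 and 0.75; a negative frequency of the same size gives 0, 0.75, 0.5 and 0.25.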

14.6.2 Table Reader

A table reader object would basically allow a function table to be accessed through a given index. There are two lookup modes: via raw index (varying from 0 to table size) or normalised (varying from 0 to 1). There are also two ways to deal with out-of-range indices:

1. Limiting: keep the phases within the table bounds.
2. Wrap-around: jump back from the ends of the table, implementing effectively a generalised modulo operation.

Here is what a skeleton TableRead structure would look like:

Listing 14.8: TableRead constructor

    struct TableRead {
      const double *table;
      double phs;
      bool nrm;
      bool wrp;
      unsigned int vsize;
      double sr;

      // constructor
      TableRead(const Func &tab, double phase = 0.,
                bool norm = true, bool wrap = true,
                unsigned int vsz=64, double esr = 44100.) :
        table(tab.table), phs(phase), nrm(norm),
        wrp(wrap), vsize(vsz), sr(esr) { };

      // process method taking phase indices
      const double *process(const double *ndx);
      ...
    };

In this example, we have also introduced a new built-in type: bool, which holds one of the constants true or false (these convert to the integers 1 and 0, respectively). Variables of this type are very useful as binary switches. In this case, they turn the normalised lookup and wrap-around on and off, in that order.
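The two lookup modes and the two out-of-range policies can be illustrated with a small index-mapping function. The sketch below is an assumption-laden example: the name map_index() and its signature are hypothetical, the table size is assumed to be greater than zero, and limiting is taken here to clamp at the last table position (size − 1), which is only one possible choice.

```cpp
// Hypothetical helper (not from the text): maps a lookup index onto
// a valid table position, either by wrapping or by limiting.
// If norm is true, the index is taken as normalised (0 to 1).
double map_index(double ndx, unsigned int size, bool norm, bool wrap) {
  double pos = norm ? ndx * size : ndx;
  if (wrap) {
    // generalised modulo: result is in the range 0 <= pos < size
    while (pos >= size) pos -= size;
    while (pos < 0) pos += size;
  } else {
    // limiting: clamp to the first and last table positions
    if (pos < 0) pos = 0;
    if (pos > size - 1) pos = size - 1;
  }
  return pos;
}
```

For a table of size 8, a normalised index of 1.25 maps to position 2 with wrap-around and to position 7 with limiting.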

14.7 Conclusions

This chapter has introduced, from the perspective of signal processing and audio programming, the important concept of interpolation, which is not only used in table lookup oscillators, but also, as we will see, in many other contexts. From the perspective of coding practice, we have introduced the twin ideas of inheritance and polymorphism, which are very useful to create relationships between types that emphasise common elements and allow us to reuse code. The advantage of this is that we can implement a feature only once and in one place, which will benefit code maintenance, bug fixing and improvement. The mechanism of reference variables was also discussed, which will allow more transparency and simplicity for passing arguments by reference rather than copy.


Problems

14.1. Derive a structure from Func that implements a Fourier series-based table, to allow for waves with any number of harmonics of different amplitudes, and an overall phase offset. Write a program to demonstrate its use (with libsndfile or Portaudio for output).

14.2. Design and implement frequency and amplitude modulation support for the oscillator structures, maximising code reuse via refactoring.

14.3. Implement a phasor structure to go around the phasor algorithm of Listing 14.7. Write a program to use it to produce a sine wave.

14.4. Implement the three table reader structures for truncation, linear interpolation, and cubic interpolation methods, using the same principles and layout introduced for the oscillator cases and using the constructor signature shown in Listing 14.8.

Chapter 15

Envelopes

Abstract This chapter discusses envelopes as an important component of computer instruments, which allow the shaping of synthesis and processing parameters over time. Their basic principles are derived from the ideas of interpolation discussed in the previous chapter. Two fundamental types are explored, linear and exponential envelopes, and a complete class example is provided to illustrate the discussion. The chapter also introduces the concept of data hiding and access control. This is complemented by a look at C++ operator overloading. We finish off with an interface design for a sound output class.

A key component in audio synthesis and processing is the envelope generator. This implements time functions that can be used to modify parameters such as amplitude and frequency. Most interesting and musical sounds are never static over time, and thus we need a way of making them vary. As a minimum requirement, we need to be able to shape the amplitude of a tone so that it does not click when we start and stop it. For this, we define one of many types of functions that can produce smoothly-changing gain values, which are then applied to the sound. As these will apply an external, enveloping, form to the signal waveform, we call them by the generic name envelopes [36].

15.1 Envelope Generators

Envelopes can be drawn using a variety of mathematical formulae. However, in order to simplify their specification, we tend to employ a piecewise approach, i.e. we split the total time function into segments and generate each curve separately. There are two fundamental methods that we can use to generate these: linear and exponential.

© Springer Nature Switzerland AG 2019 V. Lazzarini, Computer Music Instruments II, https://doi.org/10.1007/978-3-030-13712-0_15


15.1.1 Linear Envelopes

A linear segment is created using the exact same first-order polynomial we employed for interpolated table lookup. In fact, generating the envelope is nothing more than interpolating between two points. As we may recall, a linear function is defined by

\[ f(x) = ax + b \tag{15.1} \]

In this case, we make our time position x vary between 0 and 1, and we define the coefficients a and b as the linear interval we want to cover and the starting point, respectively:

\[ a = y_1 - y_0, \qquad b = y_0 \tag{15.2} \]

where $y_n$ are the extreme position values in this segment (counting from 0). The expression we will use then becomes, with t as the time in samples, and $x_n$ the time in samples corresponding to the value of $y_n$,

\[ f(t) = y_0 + (y_1 - y_0) \frac{t - x_0}{x_1 - x_0} \tag{15.3} \]

For a single segment starting from time 0, the expression simplifies considerably. We could also use an iterative method where we calculate an increment that is added to the current output, making the envelope generation very efficient:

\[ i = \frac{y_1 - y_0}{d}, \qquad y(t + 1) = y(t) + i \tag{15.4} \]

In this case $d = x_1$, but, more generally, we could calculate d as the segment duration $x_1 - x_0$ and subtract $x_0$ from t to offset it. For long, multi-segment envelopes, it is very important that we hold on to the start position of each segment, instead of just applying the recursive formula from the start. In other words, we reset the start of each portion to the previous end position and apply the iteration from there. With this in mind, we could design a processing function for a single-segment linear envelope as:

Listing 15.1: Linear envelope.

    void generate() {
      for (int i = 0; i < vsize; i++) {
        s[i] = y;
        if (cnt < x1) {
          y += incr;
          cnt++;
        }
        else y = y1;
      }
    }

Note that once the envelope segment time count (cnt) reaches the required duration, we sustain the last value generated. It will be useful, however, to be able to reset and retrigger the envelope generation, and thus we could have something like this in an envelope object,

Listing 15.2: Retriggering method.

    virtual void retrig() {
      cnt = 0;
      y = y0;
      incr = (y1 - y0) / x1;
    }

plus another method to reset parameters to other values, if necessary.
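The iterative scheme of Eq. 15.4 can be tried out in isolation with the following sketch. The function linear_segment() and its parameter names are assumptions made for this example, not the envelope object described above; it mirrors the logic of the generate() and retrig() methods for a single segment starting at time 0, with a duration d greater than zero.

```cpp
#include <vector>

// Hypothetical single-segment linear envelope following Eq. 15.4.
// y0, y1: start and end values; d: segment duration in samples;
// n: total number of samples to generate.
std::vector<double> linear_segment(double y0, double y1, int d, int n) {
  std::vector<double> out(n);
  double incr = (y1 - y0) / d;  // i = (y1 - y0)/d
  double y = y0;
  for (int t = 0; t < n; t++) {
    out[t] = y;
    if (t < d) y += incr;       // y(t + 1) = y(t) + i
    else y = y1;                // sustain the target value
  }
  return out;
}
```

A segment from 0 to 1 lasting 4 samples, for instance, outputs 0, 0.25, 0.5, 0.75 and 1, and then holds the value 1 for as long as it is run.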

15.1.2 Exponential Envelopes

While linear curves are very simple to generate, they are not perceptually accurate if employed to control amplitude or frequency. This is because we take notice of changes in terms of ratios rather than differences. For instance, a jump of 100 Hz from 100 to 200 Hz is heard as an interval of an octave, which is perceived as the same change as that from 500 to 1000 Hz or 1000 to 2000 Hz. What matters is that we have a ratio of 2:1 between these frequencies. Applying a linear envelope to control frequencies will translate to a non-linear perception of parameter change. This is also the case with amplitude envelopes, although there is more tolerance for the use of linear envelopes (especially for onsets) in these applications.

So, in order to address these issues, we can propose an exponential curve generator as an alternative to the linear function used before. This is defined by

\[ f(x) = a^{x} b \tag{15.5} \]

As before, the time position x varies between 0 and 1, and the coefficients a and b are the ratio we want to cover and the starting point, respectively:

\[ a = \frac{y_1}{y_0}, \qquad b = y_0 \tag{15.6} \]

Some limitations are naturally imposed by this formula: the envelope end point values cannot be 0 (or smaller), as they will stop the formula working (and, in the case of y0, a 0 leads to a singularity). So we have to protect the envelope from that by either checking for this condition or adding a very small number to each end point value. As with the linear case, it is possible to calculate the envelope efficiently by employing a multiplier in an iterative process:

$$ m = \left(\frac{y_1}{y_0}\right)^{\frac{1}{d}}, \qquad y(t+1) = y(t) \times m \tag{15.7} $$

As we can see, all we needed to do was to transform the value difference into a ratio and the multiplication into an exponentiation. This gets translated to the following envelope generator C++ method, in which incr now holds the multiplier m:

Listing 15.3: Exponential envelope.

    void generate() {
      for (uint32_t i = 0; i < m_vframes; i++) {
        s[i] = y;
        if (cnt < x1) {
          y *= incr;
          cnt++;
        } else y = y1;
      }
    }

At the end of the segment, the envelope sustains its target value. Similar retriggering and resetting can be implemented for this envelope. The two envelope methods can be designed to share/reuse code through inheritance. The choice of linear or exponential curves will depend on the application: exponential envelopes produce more realistic amplitude decay curves and frequency glides. Onsets may sometimes sound better with linear segments, and other control signal applications may require linear changes. A comparison of linear and exponential envelopes is shown in Fig. 15.1, where we should note that both envelopes start at a non-zero point, which, as we saw, is a requirement for exponential curve calculation.
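The end-point protection and the multiplier of Eq. 15.7 can be sketched together in a small, sample-by-sample generator. This is only an illustration: the name ExpSegment, the tick() method and the floor value of 1e-6 are assumptions made for this sketch and are not part of the listings above.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical helper: one exponential segment from y0 to y1 over dur samples.
struct ExpSegment {
  double y, y1, m;
  uint64_t cnt, d;

  ExpSegment(double y0, double end, uint64_t dur) {
    const double tiny = 1e-6;       // assumed floor, keeps the ratio finite
    y = std::max(y0, tiny);         // end points must be greater than 0
    y1 = std::max(end, tiny);
    d = dur > 0 ? dur : 1;          // avoid a division by zero below
    m = std::pow(y1 / y, 1.0 / d);  // multiplier, Eq. 15.7
    cnt = 0;
  }

  // returns the current value, then advances the envelope by one sample
  double tick() {
    double out = y;
    if (cnt < d) {
      y *= m;
      cnt++;
    } else y = y1;                  // sustain the target value
    return out;
  }
};
```

Clamping the end points is one of the two protections mentioned in the text; checking the arguments and rejecting invalid values would be an equally valid choice.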

15.2 Access Control and Classes In previous chapters, we introduced the idea of protecting parts of our new data structures by using the concept of read-only parameters and return types, which are a way of building robustness into our code, a form of defensive programming. We now want to extend this further by putting forward the idea of data hiding, which is enabled in C++ by its code access mechanisms.

Fig. 15.1: Comparison of linear and exponential envelope segments (two panels: linear segment and exponential segment).

In all of the types we have designed so far, it is possible to freely access any of their data members, and do whatever we want with them. This is acceptable in a small software project and might save us some lines of code. In a project of medium-to-large complexity, especially one involving more than one programmer, or targeting a wider user base (e.g. a library), this is dangerous. We should attempt to protect our code from unwanted modification as much as possible. When designing a new type, we need to be clear about what is the internal representation and what is the public interface. As a rule of thumb, we should not expose the type attributes (its member variables) to direct access, but should regulate it through a member function. In object-oriented programming, we will have a proliferation of getter/setter methods to provide this interface (of course we do not need, nor is it desirable, to have a means of accessing all attributes). The C++ language specification allows for three types of access control in new types, using specific keywords:

1. private: all members declared following this keyword are only accessible or visible to methods inside the structure to which they belong.
2. protected: all members declared following this keyword are only accessible or visible to methods inside the structure to which they belong, or to any substructures derived from it.
3. public: all members declared following this keyword are fully accessible from outside the structure to which they belong.

In addition to this, we can use the friend qualifier to allow other classes or functions to access private (or protected) code. Structures have all their members public by default. Another new type specifier in C++ is class, which is used in the same way as struct but has its members private by default. In fact, the name class is the more usual term for a type in object-oriented programming:

• A class is a kind-of thing, the model, type, or embodiment of it.
• An object is a thing, an instance of it.

Within this context, all structures (even the C ones used earlier on) are classes. We have avoided this terminology until now, but we can adopt it more generally from this point onwards. In terms of syntax, we have

    class T {
      // private members
     protected:
      // protected members
     public:
      // public members
    };

We can use the access declarations in any order; the only rule is that they will override the access rules defined before them and act on any members defined after them. In the case of derived classes, where X is the subclass and Y its base class, the following applies:

• class X : private Y – all public and protected members of the base class Y become private members in the subclass X.
• class X : protected Y – all public and protected members of the base class Y become protected members in the subclass X.
• class X : public Y – all protected members of the base class Y become protected members in the subclass X, and all public members are also public in the subclass.

Classes defined with the class keyword have private inheritance by default and those defined using struct use public inheritance if this is not specified.
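The access rules above can be illustrated with a short sketch. The classes Gain and ClippedGain and the structure Pair are hypothetical names used only for this example.

```cpp
// Hypothetical classes illustrating the access rules described above.
class Gain {
  double m_gain;  // private: class members are private by default
 protected:
  // visible to Gain and to classes derived from it, but not to other code
  void set_raw(double g) { m_gain = g; }
 public:
  explicit Gain(double g = 1.) : m_gain(g) {}
  double gain() const { return m_gain; }  // read-only getter
};

// public inheritance: Gain's public members remain public in ClippedGain
class ClippedGain : public Gain {
 public:
  void set(double g) {
    // m_gain = g;  // would not compile: m_gain is private to Gain
    set_raw(g < 0. ? 0. : (g > 1. ? 1. : g));  // protected setter is allowed
  }
};

// struct members are public by default
struct Pair {
  double a, b;
};
```

The setter in the derived class is the only route by which outside code can change the gain, which lets the class enforce the 0 to 1 range in a single place.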

15.2.1 Namespaces

Another mechanism in C++ that allows more robustness in symbol naming is the principle of namespaces. This is mostly used to prevent name clashes and to help programmers make sure that the function, class, etc. that is being used is the correct one. Namespaces are defined using the keyword namespace and can apply to a range of declarations by enclosing these inside a block:

    namespace mine {
      void f(int i);
      const int d = 1;
      class T { ... };
    }

To access names defined in a namespace, we can use a qualified name,

    namespace :: name

For example:

    mine::f(mine::d);
    mine::T obj;

We can also employ the using statement,

    using namespace mine;

to import the namespace fully into the current context (which can be a file, function, etc.). A very common namespace we will see in many examples is std, which identifies symbols from the standard C++ library (see, for instance, Sect. 15.3.1).
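As a small illustration of how namespaces prevent clashes, the sketch below defines two functions with the same name, one for each type of envelope curve discussed in this chapter. The namespace names lin and expo and the function interp are assumptions made for this example.

```cpp
#include <cmath>

// Two functions share the name interp without clashing, because each one
// lives in its own (hypothetical) namespace.
namespace lin {
// linear interpolation between a and b, with x in the range 0 to 1
inline double interp(double a, double b, double x) { return a + (b - a) * x; }
}  // namespace lin

namespace expo {
// exponential interpolation, as in Eq. 15.5; a and b must be greater than 0
inline double interp(double a, double b, double x) {
  return a * std::pow(b / a, x);
}
}  // namespace expo

// a qualified name selects the version we want explicitly
inline double mid_linear(double a, double b) { return lin::interp(a, b, 0.5); }

// a using statement imports the namespace into this function only
inline double mid_exponential(double a, double b) {
  using namespace expo;
  return interp(a, b, 0.5);
}
```

Restricting the using statement to the body of a function keeps the imported names from leaking into the rest of the file.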

15.2.2 A Line Class

Following these principles, we now give an example of a desirable access control for one of the signal processing classes we are considering in this chapter. A Line class, modelling the one-segment linear envelope, can be designed as follows:

Listing 15.4: Linear envelope class

    #include <cstdint>

    class Line {
     protected:
      double m_y;
      double m_y0;
      double m_y1;
      uint32_t m_x1;
      double m_incr;
      uint64_t m_cnt;
      uint32_t m_vframes;
      double *m_vector;
      double m_sr;

      /** process the output vector
      */
      virtual void generate() {
        for (uint32_t i = 0; i < m_vframes; i++) {
          m_vector[i] = m_y;
          if (m_cnt < m_x1) {
            m_y += m_incr;
            m_cnt++;
          } else m_y = m_y1;
        }
      }

      /** set the increment
      */
      virtual void update() {
        m_incr = (m_y1 - m_y0) / m_x1;
      }

     public:
      /** Line constructor \n\n
          start - start value \n
          end - end value \n
          time - duration(s) \n
          vframes - vector size \n
          sr - sampling rate
      */
      Line(double start = .0, double end = 1., double time = 1.,
           uint32_t vframes = 64, double sr = 44100.)
          : m_y(start), m_y0(start), m_y1(end), m_x1(time * sr),
            m_incr((end - start) / (time * sr)), m_cnt(0),
            m_vframes(vframes), m_vector(new double[vframes]),
            m_sr(sr) {};

      virtual ~Line() { delete[] m_vector; }

      /** process and return the output vector
          as a read-only array.
      */
      const double *process() {
        generate();
        return m_vector;
      }

      /** retrigger
      */
      void retrig() {
        m_cnt = 0;
        m_y = m_y0;
        update();
      }

      /** reset and retrigger
      */
      void reset(double start, double end, double time) {
        m_y0 = start;
        m_y1 = end;
        m_x1 = time * m_sr;
        retrig();
      }
    };

Note that we have a clear access control, separating the hidden (protected) members and the public interface that can modify them (nothing else can). The choice of protected instead of private members is made to allow derived classes to be built upon this, reusing code as much as possible. We have nominated overridable methods very clearly based on where we see scope for specialisation: in line generation and in increment update. Making these protected allows us to have a well-defined fixed interface but with the option of specialising the signal processing operations internally. Another style matter is the choice to prefix each member variable with an m_ so we can clearly see what is local to the function or to the object as a whole. Finally, we are being somewhat more definite about the numeric types we are using. In the cstdint header (the C++ form of stdint.h), which we have already seen in Chapter 2, there are a number of short-hand type definitions for integers, in terms of signedness and size, which we are taking advantage of here. Anything that is clearly never going to be negative will use an unsigned type. For variables that may need an extra range, we also make them 64-bit wide.


15.3 Operator Overloading

Since we are on the path to creating fully-fledged new types, we should try to make them behave as much as possible like built-in ones (as these, on the other hand, are all considered as classes as well). The compiler provides some support for simple operations such as copying (assigning) objects. However, for many types of manipulation, we will need to define them explicitly. We might wonder, for instance, what the meaning of standard language operators (such as the arithmetic ones) is when used with an object of a given class. The answer is, of course, that it is up to us to define this by overloading the operator for our new type. The way to go about it is to declare a public method named using the following syntax:

    return-type operator op ( arguments )

where op is the operator we want to overload. Here is a trivial example,

Listing 15.5: Overloading arithmetic operators

    class MyInt {
      int val;
     public:
      MyInt(int x) : val(x) { };

      const MyInt &operator+=(const MyInt &y) {
        val += y.val;
        return *this;
      }

      MyInt operator+(const MyInt &y) {
        MyInt x(*this);
        x += y;
        return x;
      }
    };

This class overloads the binary addition (+) and the addition-assignment (+=) operators. Note the use of the this pointer. This is an expression holding the address of the object on which the method was called, which gives the object a means of referring to itself. With it, in the addition operator, we create a local object as a copy of the current one and return it by value. In the addition-assignment operator, we use it to return a constant reference to the object itself so that we can chain operations. With this class, as defined above, we can write the following code:

    MyInt a(1), b(1), c(0);
    c = a + b;

Various other operators can be overloaded and we will see how we can use this mechanism to our benefit to allow for some easy-to-use syntax with signal processing objects. Depending on the class, we may need to provide assignment operators as well, since the compiler-generated one is sometimes not suitable. Classes that allocate external resources (such as the one in Listing 15.4) are among these, as the copy assignment operator will need to make sure these are dealt with properly. However, we will actively avoid these types of operations, which in most cases involve non realtime-safe code. If copying an object requires, for instance, that memory is freed and re-allocated, this is not to be done in realtime-critical sections, where audio computation is performed. In the examples above, the += operator is generally safe, as it modifies the object in place and only handles references. However, the binary addition is not: its use may lead to copying of data that might be problematic in a realtime audio context.
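To make the realtime argument more concrete, here is a minimal sketch of a signal block type that only offers in-place operators. The name Block and its fixed size of 64 frames are assumptions for this illustration; the point is that both operators work on storage that already exists, so no memory is allocated or freed while they run.

```cpp
#include <cstddef>

// Hypothetical fixed-size signal block: the samples live inside the object,
// so the in-place operators below never allocate memory.
struct Block {
  static const std::size_t size = 64;
  double data[size] = {};  // zero-initialised samples

  // in-place mix: adds another block to this one, sample by sample
  Block &operator+=(const Block &rhs) {
    for (std::size_t i = 0; i < size; i++) data[i] += rhs.data[i];
    return *this;
  }

  // in-place gain: scales this block by g
  Block &operator*=(double g) {
    for (std::size_t i = 0; i < size; i++) data[i] *= g;
    return *this;
  }
};
```

With this type, an expression such as (a += b) *= 0.5 mixes two blocks and scales the result in the storage of a. A binary operator+ would instead have to return a new Block by value, which is the kind of copying the text advises against in audio code.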

15.3.1 Standard IO Revisited

At this point, it is useful to revisit standard IO processes to see whether we can handle them in a more object-oriented way. The standard C++ library has a number of facilities that provide an object-oriented interface to common IO operations. The iostream classes in the library model various ways in which input and output streams can be handled. In the particular case of standard IO, the standard C++ library provides three objects of these classes, defined in the iostream header, to facilitate the process:

• std::cout – standard output, equivalent to stdout
• std::cin – standard input, equivalent to stdin
• std::cerr – standard error, equivalent to stderr

The classes std::istream (input streams) and std::ostream (output) overload operator>> and operator<<, respectively, which are used to input and output data. Formatted IO is provided through a series of overloaded operators of these two kinds for various types. For instance, to access stdout, we use std::cout:

    std::cout << "Live Long and Prosper.\n";

and we can concatenate various objects to put them into the stream:

    double a = 2.;
    std::cout << "this is a constant: " << 32
              << " and the value of a var: " << a << '\n';

Note how we have employed a mixture of strings, numeric and character constants and a variable to build a formatted stream that is sent to stdout. As another example, we could use std::cout to write a simple test program for the Line class in Listing 15.4:

    #include <iostream>

    int main() {
      Line line;
      for (int i = 0; i < 44100; i += 64)
        std::cout << line.process()[0] << "\n";
      return 0;
    }

For input, to get data from stdin we can use the other operator:

    double a;
    std::cin >> a;

Note that we can provide an overloaded operator for these stream operations for our new types. This is done through the mechanism of friend functions. These are free functions¹ that are granted direct access to private or protected data in a class. An operator can be given this access to allow the class data to be put in a stream. Adding to the example in Listing 15.5, we have:

Listing 15.6: Overloading stream operators

    class MyInt {
      ...
      friend std::ostream &operator<<(std::ostream &os, const MyInt &i) {
        return (os << i.val);
      }
      friend std::istream &operator>>(std::istream &is, MyInt &i) {
        return (is >> i.val);
      }
    };

allowing the MyInt class to interact with iostream objects:

    MyInt a(1), b(1), c(0);
    std::cin >> a;
    std::cin >> b;
    c = a + b;
    std::cout << c << "\n";
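Because these operators are written against std::ostream and std::istream, they work with any stream of those kinds, not just the standard IO objects. The sketch below uses the string streams from the standard sstream header to show this. The class Breakpoint and the function to_text are hypothetical names for this illustration.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical envelope breakpoint with private data and stream operators.
class Breakpoint {
  double m_time;
  double m_value;

 public:
  Breakpoint(double t, double v) : m_time(t), m_value(v) {}

  // writes "time value" to any output stream
  friend std::ostream &operator<<(std::ostream &os, const Breakpoint &b) {
    return os << b.m_time << " " << b.m_value;
  }

  // reads two numbers from any input stream
  friend std::istream &operator>>(std::istream &is, Breakpoint &b) {
    return is >> b.m_time >> b.m_value;
  }
};

// formats a breakpoint into a string using an in-memory output stream
inline std::string to_text(const Breakpoint &b) {
  std::ostringstream os;
  os << b;
  return os.str();
}
```

The same operator<< would serve std::cout, std::cerr or a file stream unchanged, since all of these are output streams.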

15.4 An Audio Output Class

The ideas we have discussed so far can be applied to provide us with an interface for audio output. We would be able to design a generic class that can be implemented with a range of backends (e.g. libsndfile for soundfiles, Portaudio, Jack, or another similar library for realtime audio, and so on). As long as we have our interface correctly set up, the implementation should be straightforward². The typical attributes we should be expecting for a sound output class are the sampling rate, number of channels, processing vector size and buffer size. A buffer memory should be allocated to accumulate the data before we send it to its destination. A handle will also be needed to refer to this destination (whether it is an open file or a device) and we will need a counting variable, as well as a total frame count. If we have different types of output, we should also include a mode switch. In this particular design, the class constructor would take care of any initialisation needed, opening devices or files, and setting up all the necessary elements to stream audio out. The destructor would, among other things, close any streams and/or devices, and do any required de-initialisation. The basic operational method for this class would take a block of audio frames and write it to the buffer (write()), invoking the destination write function once that is full. The class declaration for SoundOut is shown in Listing 15.7.

Listing 15.7: Audio output class

    class SoundOut {
      double m_sr;
      uint32_t m_nchnls;
      double m_vsize;
      const char *m_dest;
      uint32_t m_mode;
      uint32_t m_cnt;
      uint32_t m_framecnt;
      void *m_buffer;
      uint32_t m_bsize;
      void *m_handle;

     public:
      /** SoundOut constructor
          dest - output destination
          nchnls - number of channels
          sr - sampling rate
          vsize - vector size
          bsize - IO buffer size
      */
      SoundOut(const char *dest, uint32_t nchnls = 1,
               double sr = 44100., uint32_t vsize = 64,
               uint32_t bsize = 1024);

      /** SoundOut destructor
      */
      ~SoundOut();

      /** Writes sig (vsize frames) to the output
          destination. Returns number of samples written,
          or an error code.
      */
      uint32_t write(const double *sig);
    };

Note that the interface defined in Listing 15.7 is general enough to allow for various backend implementations, and is not dependent on any particular audio IO library. As an example, we would use it as follows ("dac" interpreted as realtime audio output):

    int main() {
      const unsigned int size = 10000;
      SinTab tab(size);
      Osc osc(0.5, 440., tab);
      SoundOut out("dac");
      for (int i = 0; i < osc.sr; i += osc.vsize)
        out.write(osc.process());
      return 0;
    }

¹ In C++, free functions are those defined outside a class, i.e. not belonging to any particular object.

² Such an implementation is provided, for instance, in AuLib, which is discussed in Chapter 17.
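The buffering scheme described for write() (accumulate vsize frames at a time and pass the buffer on once it is full) can be sketched independently of any audio library. In the illustration below, the class BufferedOut is hypothetical, it handles a single channel only, and an in-memory vector stands in for the file or device, so that the logic can be tested without a backend. A real destination write would replace flush(); note also that appending to a vector may allocate memory, which would not be acceptable in a realtime callback.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical mono output with an in-memory sink instead of a real backend.
class BufferedOut {
  uint32_t m_vsize;
  uint32_t m_cnt;                // frames currently held in the buffer
  std::vector<double> m_buffer;  // IO buffer of bsize frames
  std::vector<double> m_sink;    // stands in for the file or device

  // hands the buffered frames to the destination and empties the buffer
  void flush() {
    m_sink.insert(m_sink.end(), m_buffer.begin(), m_buffer.begin() + m_cnt);
    m_cnt = 0;
  }

 public:
  BufferedOut(uint32_t vsize = 64, uint32_t bsize = 1024)
      : m_vsize(vsize), m_cnt(0), m_buffer(bsize > 0 ? bsize : 1) {}

  // writes vsize frames from sig, flushing whenever the buffer fills up;
  // returns the number of frames accepted
  uint32_t write(const double *sig) {
    for (uint32_t i = 0; i < m_vsize; i++) {
      m_buffer[m_cnt++] = sig[i];
      if (m_cnt == m_buffer.size()) flush();
    }
    return m_vsize;
  }

  // number of frames that have reached the destination so far
  std::size_t delivered() const { return m_sink.size(); }
};
```

With a vector size of 64 and a buffer of 256 frames, nothing reaches the destination during the first three calls to write(); the fourth call fills the buffer and triggers the flush.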

15.5 Conclusions In this chapter, we have introduced the principle of linear and exponential envelope generators, alongside several object-oriented concepts and their realisation in C++. At this stage, we have amassed a handful of audio DSP classes, a collection that is already starting to look like a small code library (oscillators, phasor, table readers, envelopers, table generators, sound output). The idea of designing such a library a little more thoroughly using established C++ standards will start to become a central preoccupation for us in the following chapters, as we progress through the algorithms and object-oriented programming.

Problems

15.1. Declare and implement a one-segment exponential envelope class, derived from Line. Write a simple program demonstrating its use.

15.2. A common type of multi-segment envelope generator is the attack-decay-sustain-release (ADSR) generator, which is a four-stage envelope using linear curves. The first stage starts from 0 and leads to the maximum amplitude, followed by a decay segment from the maximum to the sustain amplitude. The sustain period holds the amplitude steady until the release is triggered. The trigger for the release stage will make the envelope jump to release from any of the earlier stages. In this final segment, the amplitude moves from whatever value it had at the trigger time to 0. Design and implement an ADSR envelope class. Write a simple program demonstrating its use.

15.3. Implement the audio output class using a backend of your choice (libsndfile, Portaudio, Jack, standard IO).

Chapter 16

Filters

Abstract In this chapter we look at how filters are constructed, concentrating on infinite impulse response types, typically found in computer music applications. After an introduction to the main signal processing aspects of filters, we explore the implementation of first-, second-, and fourth-order filters. From the programming side, we introduce the concept of templates, which offer compile-time support for polymorphism. This leads into a discussion of the standard C++ library, and, in particular, of container classes provided by it, which will be very useful in the development of an object-oriented library for audio processing.

© Springer Nature Switzerland AG 2019 V. Lazzarini, Computer Music Instruments II, https://doi.org/10.1007/978-3-030-13712-0_16

So far, we have been able to generate audio signals, modify their amplitude and frequency using envelopes, and choose what kind of waveform source we want for this. We can use frequency modulation to produce time-varying timbres [36]. It is also possible to put together a large set of sinusoidal oscillators, each one modelling a separate partial, to achieve a similar level of timbral manipulation. In this chapter, we will introduce a third means of processing waveforms, through filters. Filters are modifiers that can be used to shape the spectrum of an input sound in terms of the amplitudes and phases of its constituent partials. Their effect on amplitudes at various frequencies is called the amplitude response, while the various delays they can add to signals at various frequencies are collectively called the phase response.

Digital filters are implemented using a mix of direct and delayed signals. They can be classified in terms of [36]:

1. Their effect on an input:
• Low pass (LP): cuts (reduces) the amplitude of components above a certain frequency, called the cutoff frequency; this type is also known as high cut. The region above the cutoff frequency is called the stop band.
• High pass (HP): cuts the amplitudes below the cutoff frequency (also called low cut). The stop band is therefore below this.
• Band pass (BP): passes a given band around a centre frequency. The stop-band regions are outside this band.
• Band reject (BR): cuts a given band around a centre frequency.
• All pass (AP): passes all frequencies unaltered in amplitude, but affects the phases of an input signal (that is, adding delays to it).

2. Their order:
• First order: employs delays of only one sample.
• Second order: employs delays of up to two samples.
• Higher orders: employs a longer delay or delays, of several samples.

3. Their structure:
• Feedforward: uses a combination of an input and its delayed copies; also known as finite impulse response (FIR).
• Feedback: uses a combination of an input and delayed copies of its own output (i.e. it is recursive), and possibly input delays as well; also known as infinite impulse response (IIR).

While FIR filters are stable and can be designed to have attractive features such as a linear phase response, where the same amount of delay is applied to all frequencies, they are not very flexible for many musical applications. In this chapter, we will concentrate on the implementation of IIR filters, leaving the discussion of their feedforward counterparts to Chapter 18.
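The difference between the two structures can be seen by feeding each one a single unit impulse. The following sketch compares a first-order feedforward filter with a first-order feedback filter; the function names and the coefficient c are chosen for this illustration only.

```cpp
#include <vector>

// First n samples of the impulse response of y(t) = x(t) + c*x(t-1),
// a feedforward (FIR) filter.
inline std::vector<double> fir_impulse(double c, int n) {
  std::vector<double> y(n);
  double xdel = 0.;  // input delayed by one sample
  for (int i = 0; i < n; i++) {
    double x = (i == 0) ? 1. : 0.;  // unit impulse
    y[i] = x + c * xdel;
    xdel = x;
  }
  return y;
}

// First n samples of the impulse response of y(t) = x(t) + c*y(t-1),
// a feedback (IIR) filter.
inline std::vector<double> iir_impulse(double c, int n) {
  std::vector<double> y(n);
  double ydel = 0.;  // output delayed by one sample
  for (int i = 0; i < n; i++) {
    double x = (i == 0) ? 1. : 0.;  // unit impulse
    y[i] = x + c * ydel;
    ydel = y[i];
  }
  return y;
}
```

With c = 0.5, the feedforward response is 1, 0.5 and then zero from the third sample onwards, whereas the feedback response is 1, 0.5, 0.25, 0.125 and so on: it decays but in principle never reaches zero, which is the origin of the terms finite and infinite impulse response.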

16.1 Feedback Filters

Feedback filters come in various forms, and are usually packaged in such a way that we can control and modify their actions using standard parameters such as frequencies and bandwidths. We will tend to define them in terms of first- or second-order sections, but higher orders can be achieved by serial connections. Each filter design will have different characteristics and applications. We will start by looking at the simplest of them, the low- or high-pass tone controls, then move on to second-order designs and complete the discussion with fourth-order resonant filters.

16.1.1 First-Order Tone Filters

These are filters with very smooth and gentle amplitude response curves. They will tend to shape an input sound in a very light way, and can be likened to tone controls in consumer hi-fi equipment. Their structure combines an input signal with a one-sample delay (hence they are first-order filters) of their output. We can define these filters with the following equation:

$$ y(t) = a\,x(t) - b\,y(t-1) \tag{16.1} $$

where x(t) and y(t) are the filter input and output signals, respectively, and y(t − 1) is the output delayed by one sample (Fig. 16.1).

Fig. 16.1: First-order tone filter flowchart (the input x(t), scaled by a, is added to the output delayed by one sample and scaled by −b to give y(t)).

The filter will assume a low-pass or a high-pass amplitude response depending on the coefficients a and b. For a low-pass filter, we have

$$ r = 2 - \cos\left(\frac{2\pi f}{f_s}\right), \qquad b = \sqrt{r^2 - 1} - r, \qquad a = 1 + b \tag{16.2} $$

where f is the cutoff frequency and fs is the sampling rate. It is very easy to flip the filter into high-pass mode by modifying b:

$$ r = 2 + \cos\left(\frac{2\pi f}{f_s}\right), \qquad b = r - \sqrt{r^2 - 1}, \qquad a = 1 + b \tag{16.3} $$

These filters can be implemented with a common engine or kernel that would look like this:

    virtual const double *filter(const double *sig) {
      for (uint32_t i = 0; i < m_vframes; i++) {
        m_del = m_a * sig[i] - m_b * m_del;
        m_vector[i] = m_del;
      }
      return m_vector;
    }


A processing function would then invoke this kernel following the appropriate setting of low-pass or high-pass values for each coefficient. These do not need to be updated at each sample (unless some sort of audio-rate modulation is required), and in fact only need to be computed if the cutoff frequency has changed. So, we can track its latest value and then decide whether we need to recalculate a and b, using the following functions:

    void ToneLP::update() {
      double costh = 2. - cos(2. * pi * m_freq / m_sr);
      m_b = sqrt(costh * costh - 1.) - costh;
      m_a = (1. + m_b);
    }

    void ToneHP::update() {
      double costh = 2. + cos(2. * pi * m_freq / m_sr);
      m_b = costh - sqrt(costh * costh - 1.);
      m_a = (1. + m_b);
    }

Low-pass tone filters can also be used for smoothing control signals. One such application is in the computation of RMS (root-mean-square) estimates of a signal. The principle behind this is that the LP filter performs an averaging of an input, which would be equivalent to the type of operation employed in the ‘mean’ part of the RMS method. The root and square elements can be replaced by taking the absolute value of the signal (rectification) before filtering. We could re-implement the filter kernel to include a rectification operation through an inline function rect() so that the modified tone LP design can be used directly as an RMS estimator:

    virtual const double *filter(const double *sig) {
      for (uint32_t i = 0; i < m_vframes; i++)
        m_vector[i] = m_del = m_a * rect(sig[i]) - m_b * m_del;
      return m_vector;
    }
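As a self-contained illustration of the low-pass case, the sketch below packs the coefficient calculation of Eq. 16.2 and the kernel of Eq. 16.1 into a small sample-by-sample object, together with one possible rect() function. The name ToneLP1 and the tick() method are assumptions made for this example; the classes in the text process whole vectors instead.

```cpp
#include <cmath>

// rectification: the absolute value of a sample
inline double rect(double x) { return x < 0. ? -x : x; }

// Hypothetical sample-by-sample version of the low-pass tone filter.
struct ToneLP1 {
  double a, b, del;

  ToneLP1(double freq, double sr) : del(0.) {
    const double pi = 3.141592653589793;
    double r = 2. - std::cos(2. * pi * freq / sr);  // Eq. 16.2
    b = std::sqrt(r * r - 1.) - r;
    a = 1. + b;
  }

  // y(t) = a*x(t) - b*y(t-1), Eq. 16.1
  double tick(double x) {
    del = a * x - b * del;
    return del;
  }
};
```

Since a = 1 + b, a constant input passes through with unit gain once the filter has settled. Feeding rect(x) instead of x turns the same object into the amplitude estimator described above: for a signal alternating between 0.5 and −0.5, the output settles at 0.5.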

16.1.2 Second-Order Filters

First-order filters exhibit an amplitude rolloff of −6 dB per octave in their stop band, whereas second-order filters are more selective, with a steeper rolloff of −12 dB per octave. Another feature of these filters is that it is possible to define both band-pass and band-reject amplitude response curves (which is not possible in first-order designs). As we have mentioned above, these filters feature two-sample delays, and may also include feedforward signal paths. The simplest second-order design is called the resonator, which is defined by the following equation:

$$ y(t) = a\,x(t) - b_1 y(t-1) - b_2 y(t-2) \tag{16.4} $$

The coefficients b1 and b2 are determined from the basic filter parameters, the centre frequency fc and bandwidth B:

$$ R = \exp\left(-\frac{\pi B}{f_s}\right), \qquad b_1 = -\frac{4R^2}{1 + R^2}\cos\left(\frac{2\pi f_c}{f_s}\right), \qquad b_2 = R^2 \tag{16.5} $$

The coefficient a is an input scaling gain that is used to keep the filter under control. It prevents the filter amplitude from increasing out of control when the bandwidth is very small (high resonance) [61]:

$$ a = (1 - R^2)\,\sin\left(\cos^{-1}\left(\frac{b_1}{2R}\right)\right) \tag{16.6} $$

This scaling can be left out if we have another means of controlling the filter output amplitude (e.g. via balancing, which we will discuss later in this chapter). The filter kernel can be implemented as:

    const double *Reson::filter(const double *sig) {
      double y;
      for (uint32_t i = 0; i < m_vframes; i++) {
        y = sig[i] * m_scal - m_b[0] * m_del[0] - m_b[1] * m_del[1];
        m_del[1] = m_del[0];
        m_vector[i] = m_del[0] = y;
      }
      return m_vector;
    }

Resonators can be slightly modified by adding a two-sample delay feedforward path (Fig. 16.2):

$$ y(t) = a\,x(t) + a_2 x(t-2) - b_1 y(t-1) - b_2 y(t-2) \tag{16.7} $$

In order for us to implement this filter, we can rearrange it as a pair of equations, which allows the delays to be shared between the feedback and feedforward paths. The first one of them is the original feedback filter, and the second implements the feedforward section:

$$ \begin{aligned} w(t) &= a\,x(t) - b_1 w(t-1) - b_2 w(t-2) \\ y(t) &= w(t) + a_2 w(t-2) \end{aligned} \tag{16.8} $$

Fig. 16.2: Resonator filter flowchart (the input x(t), scaled by a, is combined with delayed signals taken from two one-sample delays in series, scaled by −b1 and −b2, to give y(t)).
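The resonator coefficients of Eqs. 16.5 and 16.6 can be checked numerically with a short sketch. It takes R = exp(−πB/fs), so that R is smaller than 1 and the filter is stable, and it evaluates the amplitude response of Eq. 16.4 at a given frequency. The names ResonCoefs, reson_coefs and reson_gain are hypothetical and used only in this example.

```cpp
#include <cmath>
#include <complex>

// Hypothetical container for the resonator coefficients of Eq. 16.4.
struct ResonCoefs {
  double a, b1, b2;
};

// Coefficients from centre frequency fc and bandwidth bw (both in Hz).
inline ResonCoefs reson_coefs(double fc, double bw, double sr) {
  const double pi = 3.141592653589793;
  double R = std::exp(-pi * bw / sr);  // pole radius, smaller than 1
  ResonCoefs c;
  c.b2 = R * R;
  c.b1 = -(4. * c.b2 / (1. + c.b2)) * std::cos(2. * pi * fc / sr);
  c.a = (1. - c.b2) * std::sin(std::acos(c.b1 / (2. * R)));  // Eq. 16.6
  return c;
}

// Amplitude response of y(t) = a*x(t) - b1*y(t-1) - b2*y(t-2) at f Hz.
inline double reson_gain(const ResonCoefs &c, double f, double sr) {
  const double pi = 3.141592653589793;
  std::complex<double> z1 = std::polar(1., -2. * pi * f / sr);  // z^-1
  return c.a / std::abs(1. + c.b1 * z1 + c.b2 * z1 * z1);
}
```

With this scaling the gain at the centre frequency works out as 1 regardless of the bandwidth, which is how the coefficient a keeps narrow (highly resonant) settings from overloading the output.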

For the coefficient a2, we have the choice of two values: −R or −1. The rearranged filter kernel has the following inner loop:

    for (uint32_t i = 0; i < m_vframes; i++) {
      w = sig[i] * m_scal - m_b[0] * m_del[0] - m_b[1] * m_del[1];
      y = w + m_a[2] * m_del[1];
      m_del[1] = m_del[0];
      m_del[0] = w;
      m_vector[i] = y;
    }

More generally, we can talk of a second-order filter section, which includes both feedback and feedforward delays, with five coefficients, one for each of the delays and one for the direct signal. With this in place, we can determine the values of the coefficients for different types of filter package. The second-order section equations are:

$$ \begin{aligned} w(t) &= x(t) - b_1 w(t-1) - b_2 w(t-2) \\ y(t) &= a_0 w(t) + a_1 w(t-1) + a_2 w(t-2) \end{aligned} \tag{16.9} $$

The filter structure denoted by these equations is known as direct form II (DF II). Alternatively, we have DF I, which uses separate delays for the feedback and feedforward paths, in a single expression:

$$ y(t) = a_0 x(t) + a_1 x(t-1) + a_2 x(t-2) - b_1 y(t-1) - b_2 y(t-2) \tag{16.10} $$

The two forms are generally equivalent (within a certain numeric range), and so we normally implement the DF II version as it is slightly more economical. For this, we only need to modify the alternative resonator implementation slightly by adding one extra term in the second equation and the a0 coefficient, and removing the input scaling:

    for (uint32_t i = 0; i < m_vframes; i++) {
      w = sig[i] - m_b[0] * m_del[0] - m_b[1] * m_del[1];
      y = m_a[0] * w + m_a[1] * m_del[0] + m_a[2] * m_del[1];
      m_del[1] = m_del[0];
      m_del[0] = w;
      m_vector[i] = y;
    }

With this filter kernel, we can provide several filter recipes to realise different types of curves: low-pass, high-pass, band-pass, band-reject, and all-pass. For example, a family of designs that is used widely in music applications is given by digital versions of the classic analogue Butterworth filters. The coefficients for the various response curves for these are shown in Table 16.1, with $L = \left[\tan\left(\pi f / f_s\right)\right]^{-1}$ and $M = \left[\tan\left(\pi B / f_s\right)\right]^{-1}$ [15].

Table 16.1: Butterworth filter coefficients.

coefficient   LP                              HP                              BP                              BR
a0            $(1 + \sqrt{2}L + L^2)^{-1}$    $(1 + \sqrt{2}L + L^2)^{-1}$    $(1 + M)^{-1}$                  $(1 + M)^{-1}$
a1            $2a_0$                          $-2a_0$                         $0$                             $-2\cos(2\pi f/f_s)\,a_0$
a2            $a_0$                           $a_0$                           $-a_0$                          $a_0$
b1            $2(1 - L^2)\,a_0$               $2(L^2 - 1)\,a_0$               $-2M\cos(2\pi f/f_s)\,a_0$      $a_1$
b2            $(1 - \sqrt{2}L + L^2)\,a_0$    $(1 - \sqrt{2}L + L^2)\,a_0$    $(M - 1)\,a_0$                  $(1 - M)\,a_0$
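As an illustration of how such a recipe plugs into the second-order section, the sketch below computes the low-pass column of Table 16.1 and runs it through a sample-by-sample DF II kernel (Eq. 16.9). Only the low-pass case is shown. The names Biquad and butter_lp are hypothetical, and the per-sample tick() method is an assumption made to keep the example short.

```cpp
#include <cmath>

// Hypothetical sample-by-sample direct form II section (Eq. 16.9).
struct Biquad {
  double a0, a1, a2, b1, b2;
  double d1 = 0., d2 = 0.;  // w(t-1) and w(t-2)

  double tick(double x) {
    double w = x - b1 * d1 - b2 * d2;
    double y = a0 * w + a1 * d1 + a2 * d2;
    d2 = d1;
    d1 = w;
    return y;
  }
};

// Butterworth low-pass coefficients (LP column of Table 16.1).
inline Biquad butter_lp(double freq, double sr) {
  const double pi = 3.141592653589793;
  double L = 1. / std::tan(pi * freq / sr);
  Biquad f;
  f.a0 = 1. / (1. + std::sqrt(2.) * L + L * L);
  f.a1 = 2. * f.a0;
  f.a2 = f.a0;
  f.b1 = 2. * (1. - L * L) * f.a0;
  f.b2 = (1. - std::sqrt(2.) * L + L * L) * f.a0;
  return f;
}
```

A quick sanity check on these values is that a0 + a1 + a2 equals 1 + b1 + b2, which means that a constant (0 Hz) input passes through the low-pass filter with unit gain.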

16.1.3 Fourth-Order Filters

A fourth-order filter will have a stop-band rolloff of −24 dB/octave. It is possible to construct such filters by connecting two second-order sections in series. Indeed, any higher-order filter can be achieved in this way. One typical design for fourth-order structures is the low-pass resonating filter, which is made up of a series of four first-order IIR sections with a feedback path connecting the output of the filter back into its input [36]. This can be used to emulate, for instance, the classic ladder filters employed in analogue synthesisers. For example, the following equations define a fourth-order resonant low-pass filter [20]:

$$ \begin{aligned} y_1(t) &= y_1(t-1) + V g \left[\tanh\left(\frac{x(t) - 4 r\, y_4(t-1)}{V}\right) - \tanh\left(\frac{y_1(t-1)}{V}\right)\right] \\ y_2(t) &= y_2(t-1) + V g \left[\tanh\left(\frac{y_1(t)}{V}\right) - \tanh\left(\frac{y_2(t-1)}{V}\right)\right] \\ y_3(t) &= y_3(t-1) + V g \left[\tanh\left(\frac{y_2(t)}{V}\right) - \tanh\left(\frac{y_3(t-1)}{V}\right)\right] \\ y_4(t) &= y_4(t-1) + V g \left[\tanh\left(\frac{y_3(t)}{V}\right) - \tanh\left(\frac{y_4(t-1)}{V}\right)\right] \end{aligned} \tag{16.11} $$

where the constant V is determined by the physical characteristics of the system (it refers to the thermal voltage of transistors), and $g = 1 - \exp\left(-2\pi f / f_s\right)$. This filter is computationally more complex than the previous ones we have seen, given the non-linear distortion terms using the tanh function (which acts as a soft overload limiter). Note that we can reduce the number of calls to this function if we cache (e.g. place in memory) the repeated terms in Eq. 16.11. Also, to provide better stability, a first-order averaging FIR can be placed in the feedback path between the output and input of the filter. This filter is given by the simple equation

$$ y(t) = \frac{x(t) + x(t-1)}{2} \tag{16.12} $$

A number of variations on this basic fourth-order design exist, for instance using other means of non-linear distortion in the signal path, but preserving the general structure of four first-order filters in series.
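The caching idea can be sketched as follows. Each term tanh(y_k/V) in Eq. 16.11 is needed twice: once by the following stage in the current sample, and once by the same stage in the next sample. Storing it brings the cost down from eight to five tanh calls per sample. The name Ladder, the per-sample tick() method and the default V = 1 (a normalised value chosen for this illustration, not a physical one) are assumptions; the averaging filter of Eq. 16.12 is left out for brevity.

```cpp
#include <cmath>

// Hypothetical sample-by-sample version of the filter in Eq. 16.11.
struct Ladder {
  double y[4] = {0., 0., 0., 0.};   // stage outputs y1..y4 at time t-1
  double th[4] = {0., 0., 0., 0.};  // cached tanh(y_k / V)
  double g, r, V;

  // freq: cutoff in Hz, res: feedback amount r, v: the constant V
  Ladder(double freq, double res, double sr, double v = 1.) : r(res), V(v) {
    const double pi = 3.141592653589793;
    g = 1. - std::exp(-2. * pi * freq / sr);
  }

  double tick(double x) {
    // input to the first stage, with feedback from the last one
    double in = std::tanh((x - 4. * r * y[3]) / V);
    for (int k = 0; k < 4; k++) {
      y[k] += V * g * (in - th[k]);  // th[k] still holds tanh(y_k(t-1)/V)
      th[k] = std::tanh(y[k] / V);   // reused by the next stage and sample
      in = th[k];
    }
    return y[3];
  }
};
```

With r = 0 the four stages simply follow their inputs, so a constant input eventually appears unchanged at the output; raising r adds the resonance around the cutoff frequency.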

16.1.4 Balancing

Filters can dramatically change the amplitude of an input sound, either by reducing or by increasing it. This can happen, for instance, if we have a bank of filters consisting of a series connection, where each one might further reduce the amplitude of parts of the spectrum that are already very soft. Or we might run the resonator without input scaling and squeeze the bandwidth to a point where the output overloads. For these applications (and others) we might want to have a way of balancing the output amplitude according to a comparator signal (e.g. the pre-filter audio). This can be easily achieved with a pair of RMS estimators, as follows:

$$
y(t) = x(t)\,\frac{\mathrm{RMS}(c(t))}{\mathrm{RMS}(x(t))}
\tag{16.13}
$$

where RMS() is the estimator and c(t) a comparator feed. The only consideration here is that we have to protect this operation from the case where the RMS amplitude of x(t) is 0, to avoid a singularity. This can be done by a check or by adding a very small value to the denominator. The balance operation can also be used as


an envelope follower, as it will apply the extracted time-varying amplitude of one sound into another.
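The balancing operation of Eq. 16.13 might be sketched as follows. The names RmsSketch and balance_sketch are hypothetical, and the RMS estimator used here (a one-pole low-pass filter applied to the squared signal, followed by a square root) is an assumption made for the sake of a short example. The singularity is avoided by adding a very small value to the denominator.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// One-pole RMS estimator: low-pass filters the squared signal and takes the
// square root. The smoothing scheme is an assumption made for this sketch.
struct RmsSketch {
  double p;        // low-pass pole
  double ms = 0.;  // running mean-square estimate
  RmsSketch(double cutoff, double sr)
      : p(std::exp(-2. * std::acos(-1.) * cutoff / sr)) {}
  double process(double x) {
    ms = (1. - p) * x * x + p * ms;
    return std::sqrt(ms);
  }
};

// Eq. 16.13, with a small constant added to the denominator so that a
// silent input does not cause a division by zero.
std::vector<double> balance_sketch(const std::vector<double>& x,
                                   const std::vector<double>& c,
                                   double sr, double cutoff = 10.) {
  const double eps = 1e-20;
  RmsSketch rx(cutoff, sr), rc(cutoff, sr);
  std::vector<double> y(x.size());
  for (std::size_t n = 0; n < x.size() && n < c.size(); n++)
    y[n] = x[n] * rc.process(c[n]) / (rx.process(x[n]) + eps);
  return y;
}
```

Passing the pre-filter signal as c and the filtered signal as x restores the original level, while passing an unrelated sound as c makes the operation behave as an envelope follower.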

16.2 Templates

Now we turn back to C++ to look at some new features that will be useful in our programming of DSP operations. In order to facilitate further code reuse, in the spirit of object-oriented programming, the language allows us to create families of entities (types or functions) from a single prescription. This is done through templates, as follows:

1. A template is defined using the keyword template followed by a parameter list and the template body:

    template < parameter-list > definition ;

   The parameter list can be made of types (classes etc.) and non-types (e.g. variables). The former are declared by the keywords typename or class and the latter by the variable type (which can be, for instance, an integral, pointer, or reference type).

    template <typename T> class X { T var; };

2. A template is instantiated by passing arguments to match each one of the template parameters:

    template-name < parameter-list > name;

    X<float> a;

Templates are commonly used to define classes that are similar in structure but depend on different types. For example, we could define a class that will hold an array of an arbitrary type. For this, we need a type and a non-type parameter, to define the array basic cell and how many of them we want:

    template <typename T, uint32_t N>
    class MyArray {
      T data[N];
     public:
      MyArray(T init) {
        for(uint32_t i = 0; i < N; i++)
          data[i] = init;
      }
      const T& operator[] (uint64_t n) const {
        return data[n];
      }
      T &operator[] (uint64_t n){
        return data[n];
      }
    };

This template provides the internal data storage (private data) with a constructor/initialiser, and the [] operator to access each member in the usual way. Note that there are two operators defined: one for read access (which returns a const T reference and is marked const, telling the compiler it does not modify the object it belongs to) and another for writing (which returns a reference to a memory location whose contents can be modified):

    int main(){
      MyArray<float, 10> f(0);
      f[0] = 1;
      std::cout << f[0] << "\n";
      return 0;
    }

Note that we can also use any user-defined type, such as, for instance, one of the classes we designed earlier on (here an oscillator class, called Osc):

    MyArray<Osc, 2> fm(Osc(0.5, 440, sine));
    ...
    fm[1].process(amp, fm[0].process(ndx));

It is also possible to define function templates, which can be applied to arbitrary types, generating a family of functions:

    template <typename T>
    void message(T a) {
      std::cout << "Message: " << a << std::endl;
    }

Function templates need to be instantiated before use. This can be done explicitly, e.g.

    template void message(const char*);

or, in many cases, this can be done implicitly at the time the function is called:

    message("hello");

where the type is inferred directly from the argument, and therefore we do not need to supply it. It is very common for template functions to be instantiated implicitly.
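As a compact illustration of how class and function templates work together, the following sketch repeats the array template and adds a function template operating on it. The helper total() is hypothetical and introduced only for this example. Its template arguments are deduced from the array passed to it, so both instantiations are implicit.

```cpp
#include <cstdint>

// The array template from the text, with a type and a non-type parameter
template <typename T, uint32_t N>
class MyArray {
  T data[N];

 public:
  explicit MyArray(T init) {
    for (uint32_t i = 0; i < N; i++) data[i] = init;
  }
  const T& operator[](uint64_t n) const { return data[n]; }
  T& operator[](uint64_t n) { return data[n]; }
};

// Hypothetical helper: sums the elements of any MyArray<T, N>.
// Both T and N are deduced from the argument.
template <typename T, uint32_t N>
T total(const MyArray<T, N>& a) {
  T sum = T();
  for (uint32_t i = 0; i < N; i++) sum += a[i];
  return sum;
}
```

Calling total() on a MyArray<double, 4> and on a MyArray<int, 3> makes the compiler generate two separate functions from the same prescription.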

16.2.1 Templates in the Standard C++ Library

The standard C++ library provides an extensive collection of templates for all sorts of applications (which also includes the previously-seen iostream classes). These can be accessed by including the relevant headers and using the namespace qualifier std. As part of this library, we have a number of container template classes that can be used very conveniently to create dynamically-allocated objects of arbitrary types. The advantage of using these is that we will eliminate the need to manage memory directly, by delegating this task to the container class. Objects of these types will take care of allocating and deallocating memory automatically as they go in and out of scope.

Using standard library containers allows us to avoid the need for new and delete, and, as a consequence, we will not in general be required to define destructors for our classes. This is particularly important as there are some complex issues associated with these, which we have avoided discussing so far. It is generally accepted that the presence of a destructor performing a non-trivial task (such as freeing resources) also requires the explicit definition of other special member functions. These are: a copy constructor (which allows classes to be copied properly, Sect. 14.5.1); a copy assignment operator (which allows classes to be assigned, Sect. 15.3); and a move constructor and assignment operator (both of which optimise copy/assignment operations¹). We have already noted in Sect. 15.3 that if a class holds external resources, we would need to define an assignment operator for it to handle these properly. In fact, the explicit definition of any of these five methods requires that the full complement should be given [63], since it implies that their compiler-generated versions are not suitable for the class design. By taking them out of our class definition, we can safely ignore this issue, which will reduce the complexity of the code structures we will be using. The added benefit is that we can concentrate more fully on the algorithms.

The fundamental container template we need to know about is std::vector (whose definition is found in <vector>).
This is a wrapper around a dynamically allocated array (very much in the direction of the array template example, but more complete and flexible). It should perform nearly as efficiently as an ordinary dynamically-allocated C array, so we will be able to replace all our audio data vectors with it. The template takes a type argument and the class constructor a size (which can be a variable), or an initialiser expression (inside braces { }):

    #include <vector>
    ...
    std::vector<double> data(size);

For instance, from now on, we should declare

    class Proc {
     protected:
      ...
      std::vector<double> m_vector;
     public:
      Proc(..., uint32_t vframes, ...) :
        ..., m_vector(vframes) { ... }
      ...
    };

and forget about the destructor, as we will not need it any more. The vector object will take care of all the allocation and deallocation of resources behind the scenes.

¹ The move operation was introduced in C++11 to allow better performance when copying nontrivial classes. It is beyond the scope of this text to discuss it. Readers are referred to [63] for a complete description.

The vector class has a number of important methods:

• operator=: assigns a vector to another vector.
• assign(): replaces the contents of the vector with new values.
• at(): accesses specified element with bounds checking.
• operator[]: accesses specified element without bounds checking.
• front(): first element.
• back(): last element.
• data(): returns a pointer to the data vector.
• size(): returns the number of elements.
• clear(): clears the vector.
• resize(): resizes the vector (which may involve reallocation).
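The following sketch shows some of these methods at work inside a class that holds its samples in a std::vector. BufferSketch and its methods are hypothetical names chosen for this example. Note that the class declares no destructor, copy constructor or assignment operator: the compiler-generated versions copy and release the vector correctly.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical class holding its samples in a std::vector
class BufferSketch {
  std::vector<double> m_vector;

 public:
  explicit BufferSketch(uint32_t vframes) : m_vector(vframes) {}
  std::size_t size() const { return m_vector.size(); }
  void resize(uint32_t vframes) { m_vector.resize(vframes); }
  // unchecked access, as with a plain array
  double& operator[](std::size_t n) { return m_vector[n]; }
  const double& operator[](std::size_t n) const { return m_vector[n]; }
  // checked read: at() throws std::out_of_range for an invalid index
  bool read(std::size_t n, double& out) const {
    try {
      out = m_vector.at(n);
      return true;
    } catch (const std::out_of_range&) {
      return false;
    }
  }
};
```

Copying a BufferSketch object yields an independent copy of the samples, so writing to one object leaves the other unchanged.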

Once we have a vector defined this way, we can treat it more or less like any array. This is because it includes a square-braces operator (operator[]()), which allows us access through the usual array index syntax. Therefore we do not need to modify much of the code we have been using so far in order to use a vector object: it is almost a drop-in replacement.

Vectors also provide iterator members, which can be used to traverse the container. The following methods return iterators to an object:

    iterator begin();
    const_iterator begin() const;

and

    iterator end();
    const_iterator end() const;

An iterator can be used to walk through the array and access it through dereferencing:

    int main(){
      std::vector<int> v{1,2,3,4,5,6,7,8,9,10};
      for(std::vector<int>::iterator i = v.begin(); i < v.end(); i++)
        std::cout << *i << "\n";
      return 0;
    }

As you can see, the iterators in this case wrap up pointers to the underlying data type, pointing to the beginning and end of the vector. For some applications, these are not necessary (we could use just as well a counting variable and the vector size), but for others, they will be useful. For instance, to copy memory, we will now use std::copy:

    std::copy(src.begin(), src.end(), dest.begin());

and, to fill in an array, we have

    std::fill(dest.begin(), dest.end(), value);

Both operations are defined in the header file <algorithm>, which also contains a number of other useful functions.
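One point to keep in mind with std::copy is that the destination must already be large enough to receive the copied range, since the function does not resize it. The sketch below, whose function name copy_padded is hypothetical, combines both operations: it fills a destination vector with a padding value and then copies as many source samples as will fit.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Returns a vector of n samples holding a copy of src, truncated or padded
// with the value pad as required.
std::vector<double> copy_padded(const std::vector<double>& src, std::size_t n,
                                double pad) {
  std::vector<double> dest(n);
  std::fill(dest.begin(), dest.end(), pad);
  // never copy more than the destination can hold
  std::size_t m = std::min(src.size(), n);
  std::copy(src.begin(), src.begin() + static_cast<std::ptrdiff_t>(m),
            dest.begin());
  return dest;
}
```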

16.2.2 Range-Based Loops

For containers of this kind, which include iterators, C++ provides a variation on the for loop syntax called a range-based for. This is defined as:

    for ( range-declaration : range-expression ) body

The range declaration provides a suitable local variable, which will be set to the various elements of the range expression. This needs to be a suitable object that provides begin() and end() methods, such as one from a vector class, or alternatively an array (but not one that is dynamically allocated, since the compiler needs to know the range of objects to iterate over). For example,

    int main(){
      std::vector<int> v{1,2,3,4,5,6,7,8,9,10};
      for(int i : v)
        std::cout << i << "\n";
      return 0;
    }
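In audio code, the choice of range declaration matters: declaring the loop variable by value gives a copy of each sample, whereas declaring it as a reference allows the samples to be modified in place. A minimal sketch, using the hypothetical helper functions scale() and peak():

```cpp
#include <vector>

// Multiplies every sample by gain. The loop variable is a reference,
// so the vector itself is modified.
void scale(std::vector<double>& sig, double gain) {
  for (double& s : sig) s *= gain;
}

// Returns the largest absolute sample value. The loop variable is a copy,
// which is sufficient for reading.
double peak(const std::vector<double>& sig) {
  double p = 0.;
  for (double s : sig) {
    double a = s < 0. ? -s : s;
    if (a > p) p = a;
  }
  return p;
}
```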

16.3 Conclusions

In this chapter, we have introduced the idea of filters and shown how they can be implemented, with a number of examples of different types. These DSP operations are central to sound synthesis and processing, so it is very important to be able to understand how they work and how they are programmed. Following this, we looked at further ideas from C++ that will enhance our capacity to program in an object-oriented way. Templates can be very useful for a number of applications. In particular, they feature very strongly in the standard C++ library. From this, we explored a very useful container, the vector class, and demonstrated how it can be used to simplify memory management for our classes. From now on, we can avoid having to allocate memory directly and can use standard library containers to provide support for this. At this point, we should be looking at completely refactoring our existing code library to take advantage of these new ideas and provide robust support for DSP programs. This task will be taken up in the next chapter.

Problems

16.1. Draw a flowchart for second-order DF I and DF II filters (Eqs. 16.10 and 16.9) and for a fourth-order low-pass filter (Eq. 16.11), using the same format as in Figs. 16.1 and 16.2.

16.2. Implement the ToneHP, RMS, and ToneLP classes using the snippets of code provided. Make two of them derive from the other.

16.3. Reimplement the oscillator classes using std::vector. Write a program to test these.

Chapter 17

AuLib

Abstract This chapter explores the design of a class library for computer music instrument development. We review some of the principles outlined in earlier chapters and explore the object-oriented principles that are relevant to the implementation of this library. A tour of the existing classes is offered, alongside a fully worked-out application example.

At this stage, we are just about ready to develop a library of classes for audio processing and synthesis, which we will call AuLib [35]. In this chapter, we will look at a design that will take advantage of the ideas sketched in the preceding discussions of object-oriented programming and signal processing components. We will take some time to reflect on the ideas exposed earlier and summarise them to provide the principles for the layout of this class library.

Similar work has been explored in earlier projects with cognate aims. The Sound Object (SndObj) library [38] was released in 1998 as one of the first generally available free and open-source general-purpose C++ class libraries for audio processing. The original SndObj code was mostly based on pre-standard C++ [62], evolving later to include other developments from C++98 and C++03. Another early C++ toolkit was the Synthesis Toolkit (STK) [11], which was mostly oriented to synthesis with physical models. The SndObj library included not only signal processing classes but also support for cross-platform realtime audio and MIDI, before this was provided by dedicated libraries such as Jack, Portaudio, and Portmidi. The project aimed to encompass all the general-purpose audio use cases [32] in the time and frequency domains [64]. In the decades since these early projects, many other C++ object-oriented audio projects were developed, as the language became firmly established as the pre-eminent platform for mid-to-low level sound and music computing.

For the reasons already implied in previous chapters, the OOP paradigm is very useful as the fundamental model for a DSP library design. Function-only libraries, such as the one explored in [31], are useful insofar as they expose the algorithms in a simple form, in which they can be studied and played with. However, such an approach is not robust enough to be used more generally.

© Springer Nature Switzerland AG 2019 V. Lazzarini, Computer Music Instruments II, https://doi.org/10.1007/978-3-030-13712-0_17


With the advent of C++11 [23], the language has reached a stable state, supporting a flexible approach to audio programming. In comparison to the C language, C++ is still a system of large proportions, lacking some of the qualities of small languages, such as lightness, simplicity, and, in the case of C, ubiquitous portability (although it is present on a great number of platforms, it is not as universally supported by devices). C++ has been described as having “the power, the elegance, and the simplicity of a hand grenade”¹. These caveats notwithstanding, we should be able to proceed with this language as our mainstay language for musical signal processing.

As mentioned above, the previous chapters in Part II of this book provide a good background to the development of AuLib. The main motivation is to create a simple, lightweight platform for the study of, and research into, audio DSP algorithms, taking advantage of the newer, more established, C++ standards. This will also give us the added benefit of developing efficient code that can be easily packaged and deployed in general-purpose applications. After all, we have to ensure that the programs we write can be realistically applied. Within the scope of this book, the fundamental aim is to provide wrappers for algorithms, where the audio processing code can be easily accessed, studied, and modified. By using a thin interface layer, we will attempt to support easy connectivity of objects, as well as a class hierarchy that emphasises common components and code reuse. The library code attempts to adhere strictly to C++14 [25] standards and best practice, as it aims to provide an example of robust software design.

In this chapter, we will present the library design within the context of audio programming systems in general. We will outline the decisions taken in its development, some of which are a reinforcement of ideas we have already rehearsed in earlier chapters.
Following this, we will take a tour of the library and its current constituent classes. We will explore it from the perspective of signal generation, processing, and input and output. The source code is available from the repository at https://github.com/aulib/aulib. A developer’s reference is provided at http://aulib.github.io. Full programming examples at the end of this chapter and in Appendix A illustrate some typical applications.

¹ Quote attributed to Kenneth C. Dyke, 5 April 1997.

17.1 Object-Oriented Audio Systems

It can be claimed that some form of object-oriented programming has been present in sound and music computing since the very beginning [34, 52, 55]. One way to look at the pioneering work by Max Mathews in MUSIC III [41, 42] and IV [43] is to say that he was designing incipient object-oriented systems for music programming. In those systems, it is possible to liken the concept of instruments and their instances to classes and objects. These principles have remained and evolved in some shape or form in all of their direct successors, MUSIC V [44], MUSIC 11 [65], cmusic [48], MUSIC 4C [4], and Csound [39], as well as in their indirect successors such as Pure Data [54].

It is interesting to note that even programming systems that support other paradigms, such as functional programming in FAUST [50], have backends written with, or are compiled to, OOP-based languages. This confirms much of what we have been stating in previous chapters: that object-orientation provides ways of modelling audio components in a very robust manner. This provides some confidence for us to embark on the design of a class library, whose details are outlined in the next section.

17.2 Library Design

The library design borrows from a number of sources which have shown the best practice in the implementation of audio programming code. One of the guiding aspects was to allow a good deal of flexibility in the construction of classes, instead of mandating the presence of specific components via an abstract base class with a number of empty virtual methods. Instead, processing methods may or may not exist in a derived class, depending on what they are supposed to implement. They can be given any name, although the established informal nomenclature is to call them process().

The principal base class in the library is AudioBase. This is subclassed to implement all the different audio-handling objects, from synthesis/processing to signal buffers, function tables, delay lines, and audio IO. The layout of AudioBase is informed by a number of basic design decisions that underpin the principles of AuLib.

17.2.1 Stateful versus Stateless Representations

One of the basic motivations for AuLib is to place self-standing algorithms, previously implemented as free functions, within an object that allows the safe-keeping of internal states. Let’s explore this idea by reviewing the simple example of a sinusoidal oscillator that we explored in Chapter 13. As we have noted, its algorithm is basically described by:

$$
s(t) = a(t)\sin\left(2\pi\int f(t)\,dt\right)
\tag{17.1}
$$

We have also seen that a C implementation of such a function would have to take account of the sample-by-sample phase values that are produced by the integration of the time-varying frequency f(t). Typically, in a sane implementation, the current value of the phase would be kept externally to the function, and modified as a side effect (Listing 17.1).

Listing 17.1: C function implementing Eq. 17.1.

    double sineosc(double a, double f, double *ph, double sr){
      double s = a * sin(*ph);
      *ph += twopi * f / sr;
      return s;
    }

where twopi is an externally-defined constant, set to 2π. While this is entirely appropriate to demonstrate and expose the oscillator algorithm for study, it is clearly not robust enough to be incorporated into a library. Quite rightly, users would expect to be able to use such functions to implement multiple oscillators, in banks, or for amplitude or frequency modulation. In this context, a programmer could inadvertently supply a single phase address to a series of calls to such functions when implementing a bank of oscillators and would clearly fail to get the intended result. While it could work when carefully employed, such a stateless presentation of the algorithm is clearly incomplete.

While there are ways of describing a sine wave oscillator in a stateless or purely functional fashion, once we are committed to defining the computation in a stateful form, we need to provide a means to keep an account of the current state. Clearly, a self-contained oscillator will need to maintain the last computed value of the phase, as the algorithm contains an integration. For this, we can wrap the whole algorithm in a class that models its state and the means to get an output sample. A minimal C++ class, similar to some of those discussed in Chapter 13, that can be used to implement such an oscillator is shown in Listing 17.2.

Listing 17.2: C++ class implementing Eq. 17.1.

    struct SineOsc {
      double m_ph;
      double m_sr;
      SineOsc(double ph, double sr) :
        m_ph(ph), m_sr(sr) {}
      double process(double a, double f){
        double s = a * sin(m_ph);
        m_ph += twopi * f / m_sr;
        return s;
      }
    };

With an object-oriented implementation, the stateful description of the algorithm is complete and provides enough robustness for use in a variety of contexts. Likewise, if we look across the various types of DSP operations that a library would hope to implement, we will see all sorts of state variables involved. This provides enough motivation for the wrapping of such algorithms in C++ classes.
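To illustrate the point about oscillator banks, the sketch below builds a small additive bank from the class in Listing 17.2, which is repeated so that the example is self-contained. The helper bank_sample() is hypothetical. Since every object carries its own phase, there is no shared phase address to get wrong.

```cpp
#include <cmath>
#include <vector>

const double twopi = 2. * std::acos(-1.);

// Listing 17.2, repeated here so that the example is self-contained
struct SineOsc {
  double m_ph;
  double m_sr;
  SineOsc(double ph, double sr) : m_ph(ph), m_sr(sr) {}
  double process(double a, double f) {
    double s = a * std::sin(m_ph);
    m_ph += twopi * f / m_sr;
    return s;
  }
};

// Hypothetical helper: one output sample from a bank of harmonic partials,
// partial n having frequency n*f0 and amplitude a/n.
double bank_sample(std::vector<SineOsc>& bank, double a, double f0) {
  double s = 0.;
  double n = 1.;
  for (SineOsc& osc : bank) {
    s += osc.process(a / n, f0 * n);
    n += 1.;
  }
  return s;
}
```

A bank of three partials can be created with std::vector<SineOsc> bank(3, SineOsc(0., 44100.)), after which each call to bank_sample() advances the three phases independently.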


17.2.2 Abstraction and Encapsulation

In fact, by clearly describing an algorithm as having a state and a means of computing its output, we are abstracting the DSP object as a specific data type. This encapsulates all the kinds of operations we would expect to be able to apply to such an object.

What are the things we would like any DSP algorithm to contain? It would be useful, for instance, for it to hold its output so that we only need to compute it once. Basic attributes such as the sampling rate and the frame size (number of channels in an interleaved signal) would also be essential. Additionally, we have noted before, in Chapter 13, that processing should not be limited to frame-by-frame computation (as in the minimal example of the oscillator in Listing 17.2). It has been firmly established that this is not the best practice for efficient audio computation [13]. Therefore, as we have already become used to seeing, a block of frames, which may vary in size, is generated for each call of a processing method. A means of registering whether the object is in an error state would also be useful for program diagnostics. In this formulation, a class that models a generic audio DSP object would contain the following attributes (Listing 17.3).

Listing 17.3: Attributes of the audio DSP base class

    class AudioBase {
     protected:
      uint32_t m_nchnls;             // no of channels
      uint32_t m_vframes;            // vector size
      std::vector<double> m_vector;
      double m_sr;                   // sampling rate
      uint32_t m_error;              // error code
      ...
    };

These are protected so that no unintended modification is allowed. This class is for all practical purposes a wrapper around an audio vector (of double floating-point samples). Methods for basic manipulation are also added: scale, offset, modulation, mixing, and sample access are provided through overloaded operators. Methods for setting and getting samples from the vector (single-channel samples, full blocks, etc.) and for modifying the vector size are also provided, as well as methods to get the values of the object attributes. It is important to take good care in the design of the base class, as this will pay good dividends as the library is developed.

17.2.3 Code Reuse

Since we have embraced, for good reasons, the object-oriented approach, it is very useful to take advantage of inheritance, as well as composition. For this reason, the class hierarchy has been designed from the most general to the most specific, although overall the tree is not very deep (six levels at most). As an example, the ResonZ class shows how the reuse of code can be employed. In Fig. 17.1, we see that it is subclassed from a series of parents.

Fig. 17.1: The ResonZ class and its parents.

At the top level, Iir implements the basic second-order IIR filter engine in direct form II (Eq. 17.2), with externally-defined coefficients.

$$
\begin{aligned}
w(t) &= x(t) - b_1 w(t-1) - b_2 w(t-2) \\
y(t) &= a_0 w(t) + a_1 w(t-1) + a_2 w(t-2)
\end{aligned}
\tag{17.2}
$$
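As an illustration of Eq. 17.2, a direct form II section might be sketched as follows. This is not the AuLib Iir class: the name Df2Sketch, its members, and the in-place block processing are assumptions made for this example. The coefficients are supplied from outside, in the same spirit as the externally-defined coefficients mentioned above.

```cpp
#include <vector>

// Hypothetical second-order section in direct form II (Eq. 17.2)
struct Df2Sketch {
  double a[3];     // feedforward coefficients a0, a1, a2
  double b[2];     // feedback coefficients b1, b2
  double w1 = 0.;  // w(t-1)
  double w2 = 0.;  // w(t-2)

  Df2Sketch(double a0, double a1, double a2, double b1, double b2)
      : a{a0, a1, a2}, b{b1, b2} {}

  // filters a block of samples in place
  void process(std::vector<double>& sig) {
    for (double& s : sig) {
      double w = s - b[0] * w1 - b[1] * w2;
      s = a[0] * w + a[1] * w1 + a[2] * w2;
      w2 = w1;
      w1 = w;
    }
  }
};
```

Only the two delayed values of w(t) need to be stored, which is the main attraction of this form.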

The LowP class holds a frequency parameter to calculate Butterworth low-pass coefficients; BandP adds a bandwidth attribute and re-implements the calculation of coefficients for a Butterworth band-pass configuration; ResonR re-implements the coefficient computation for a resonator with an extra zero at R; and ResonZ just sets the a2 coefficient to −1, otherwise using the coefficient update code from its parent. This shows an example of how each subclass represents a small modification of its parent, with most of its code reused. Another benefit is that if a modification needs to be made (e.g. a bug fix), it does not need to be reproduced at several places (which opens the door to introducing small errors at these different locations).

Code reuse through composition is also employed throughout the library. For example, the Delay class (see Sect. 18.2) holds an AudioBase object that implements its delay line, using the inlined access methods provided in that class. The Balance class, which implements envelope following and signal amplitude balancing, is made up of two Rms objects that are used to measure the RMS amplitude of input signals. Rms itself is a specialisation of a first-order low-pass filter class. In another example, the TableSet class, which is a utility class for the band-limited oscillator class BlOsc, is made up of a vector of FourierTable objects containing waveform tables.

17.2.4 Connectivity

Some special attention needs to be given to the ways in which objects can be easily connected with each other, forming high-level signal processing graphs. It is also important to consider how library objects can interact with code from other libraries (both in C and in C++). There are two major ways in which we could connect objects together in a graph:

1. Through raw pointers to data: these are presented in the form of const double* arguments to allow signals from other libraries and non-AuLib sources to be inserted as inputs to processes. These, in turn, also return a const double* to the object vector so that they can be sent to other destinations. This type of connectivity is unsafe from a C++ perspective, as it requires the programmer to carefully observe and match the vector boundaries, although it is commonplace within a C-language context.

2. Through object references: processing methods also allow connections to and from const AudioBase& variables, which provides more safety since vector boundaries are checked before access. They are the preferred way to pass signals from one library object to another. For convenience, classes overload operator() as a shortcut for processing methods using object references. This allows a function-like composition of operations, as in, for instance,

    out(obj1(obj2(in())));

While there is no mandatory way in which this is enforced in derived classes, an informal convention is to provide two processing methods as part of the public interface, in addition to the overloaded function operator. One of them would deal with data pointers (producing and/or consuming arrays) and the other would use object references as input and/or output. This is shown in Listing 17.4, where an AudioBase-derived class is laid out. These public methods delegate to a private virtual DSP method, which does the actual processing for the object and may be overridden in a derived class. In general, the AuLib design follows the best practice of avoiding virtual methods in the public interface. The only exception to this is in the definition of signal arithmetic operators (in AudioBase), where implementation simplicity is the main concern. Appendix A discusses further details on deriving new classes from AudioBase.

Listing 17.4: Processing methods and their connectivity in an AudioBase-derived class.

    class Proc : public AudioBase {
      virtual const double *dsp(const double *sig);
     public:
      const double *process(const double *sig) {
        return dsp(sig);
      }
      const Proc &process(const AudioBase &obj) {
        if(obj.vframes() == m_vframes &&
           obj.nchnls() == m_nchnls) {
          process(obj.vector());
        } else m_error = AULIB_ERROR;
        return *this;
      }
      const Proc &operator()(const AudioBase &obj) {
        return process(obj);
      }
      ...
    };

No blocking operations (and/or resource allocation) should take place in a processing method. This is also the case for all inline vector manipulation methods (operators, etc.) provided by the base class, which are all realtime safe (cf. Sect. 15.3). It is particularly important for the library design to be defensive with regard to these aspects, so that developers are not led into inadvertently writing code that may turn out not to be good for realtime use.
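The convention of Listing 17.4 can be tried out in isolation with the following self-contained sketch. None of these classes belong to AuLib: BaseSketch stands in for a much reduced AudioBase, SourceSketch is a trivial signal source, GainSketch is a hypothetical processor, and the error value 1 is arbitrary. The processing method only writes to a vector allocated at construction, so no allocation takes place at processing time.

```cpp
#include <cstdint>
#include <vector>

// Much reduced stand-in for a base class holding an audio vector
class BaseSketch {
 protected:
  uint32_t m_vframes;
  std::vector<double> m_vector;
  uint32_t m_error = 0;

 public:
  explicit BaseSketch(uint32_t vframes)
      : m_vframes(vframes), m_vector(vframes) {}
  virtual ~BaseSketch() = default;
  uint32_t vframes() const { return m_vframes; }
  const double* vector() const { return m_vector.data(); }
  uint32_t error() const { return m_error; }
};

// Trivial source: a vector filled with a constant value
class SourceSketch : public BaseSketch {
 public:
  SourceSketch(uint32_t vframes, double value) : BaseSketch(vframes) {
    for (double& s : m_vector) s = value;
  }
};

// Hypothetical processor following the two-method convention
class GainSketch : public BaseSketch {
  double m_gain;
  // private virtual method doing the actual processing
  virtual const double* dsp(const double* sig) {
    for (uint32_t i = 0; i < m_vframes; i++) m_vector[i] = sig[i] * m_gain;
    return m_vector.data();
  }

 public:
  GainSketch(uint32_t vframes, double gain)
      : BaseSketch(vframes), m_gain(gain) {}
  // raw pointer connection: the caller must guarantee the array size
  const double* process(const double* sig) { return dsp(sig); }
  // object connection: the vector size is checked before access
  const GainSketch& process(const BaseSketch& obj) {
    if (obj.vframes() == m_vframes)
      process(obj.vector());
    else
      m_error = 1;  // arbitrary error code for this sketch
    return *this;
  }
  const GainSketch& operator()(const BaseSketch& obj) { return process(obj); }
};
```

With a source src and two processors g1 and g2 of matching vector size, the expression g2(g1(src)) runs both in sequence, and a processor given an input of a different size flags an error instead of reading past the end of the input.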

17.3 A Tour of the Library

Most of the library classes sit in the main AudioBase tree. Figures 17.2 and 17.3 show some of these, in terms of generators, input and output, function tables, and processor classes. Many of these originated through refactoring of the code discussed in earlier chapters. The library classes can be loosely categorised as follows: processing (signal generators and processors, i.e. they implement process() methods); function tables, holding mostly constant buffers; and input/output, which allow some form of audio IO through read() or write() methods. In addition to these, a number of specialised classes have been designed for high-level control of signal processing and synthesis. Most of these classes take full advantage of the facilities and design of the AudioBase class.

Fig. 17.2: AuLib class hierarchy: generators, input and output, and function tables.

17.3.1 Signal Generators

The signal generators in AuLib include standard table-lookup oscillators, sampled-sound and band-limited waveform oscillators, a phase generator, table readers, and envelope generators. The SamplePlayer class takes a buffer/function table containing recorded samples and plays it back with pitch and amplitude control either in a loop or as a single-shot performance. It can handle multichannel sample tables, producing multichannel output, and uses linear interpolation for table lookup.

Fig. 17.3: AuLib class hierarchy: processors.

The library supports a number of function table classes (derived from a basic model given by FuncTable), which hold waveforms, envelopes, or signal samples. These can be read by oscillators or by table lookup objects, whose indices can be derived from any signal. A phase generator connected to a table reader implements an oscillator algorithm.

The BlOsc class implements band-limited waveform synthesis using wavetables stored in a TableSet object. This contains a set of band-limited tables that are selected according to the desired fundamental frequency. Currently, TableSet supports classic waveforms (such as sawtooth, square and triangle) constructed using FourierTable objects. However, the mechanism can be expanded to handle generalised band-limited waveforms.

The library contains single-segment linear and exponential signal generators, which can be triggered and reset. Extending these, a generalised multi-segment plus release envelope class Envel is provided. It uses a utility class, Segments, that is used to set up a segment list that can be shared among several envelopes (and also used for envelope tables). A pre-packaged four-segment envelope ADSR is derived from it as a convenient way to create simple envelopes. The release segment in these classes is triggered by a specific method (release()), which makes the envelope jump immediately to that stage.

17.3.2 Signal Processors

AuLib includes a basic set of signal processing classes. Seven types of second-order and two first-order (low- and high-pass) filters are present, alongside root-mean-square detection and signal balancing. As we will see in the next chapter, the Delay class implements fixed or variable delays (depending on the choice of overloaded processing functions), with or without feedback. It can implement comb filters, flangers, vibrato and chorus effects. Derived from it, we have a high-order all-pass filter and a general-purpose finite impulse response filter (implementing direct convolution, which is discussed in Chapter 18). Delay objects can be tapped by Tap (truncating) or Tapi (interpolating) processors.

Some signal-processing utilities are present. A channel extractor, Chn, takes an interleaved multichannel input and outputs a requested channel. This is needed to allow access to single channels for objects that are designed to manipulate mono signals only. A signal bus is provided by SigBus, which can be used as a mixing buffer with scaling and offset. Completing these, there is an equal-power panning class, Pan, that produces a stereo output from a mono input signal.

In addition to these time-domain processing classes, AuLib provides support for streaming spectral processing using the short-time Fourier transform and its derivative, the phase vocoder. Stateless free functions for complex and real-input discrete Fourier transform (using a radix-2 algorithm) are implemented from first principles. Chapter 19 provides a detailed discussion of these techniques as implemented in AuLib.
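The ideas behind the panning and channel extraction utilities can be illustrated with two free functions. These are sketches only and do not reproduce the AuLib Pan or Chn classes: the function names are hypothetical, and the sine/cosine panning law used here is one common equal-power formulation, assumed for the purposes of the example.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Mono input, interleaved stereo output. pos runs from 0 (left) to 1 (right).
// The squared gains always add up to 1, which keeps the total power constant.
std::vector<double> pan_sketch(const std::vector<double>& mono, double pos) {
  const double halfpi = std::acos(0.);
  double l = std::cos(pos * halfpi), r = std::sin(pos * halfpi);
  std::vector<double> out(mono.size() * 2);
  for (std::size_t n = 0; n < mono.size(); n++) {
    out[2 * n] = mono[n] * l;
    out[2 * n + 1] = mono[n] * r;
  }
  return out;
}

// Extracts channel chn (counting from 0) from an interleaved signal
// with nchnls channels per frame.
std::vector<double> chn_sketch(const std::vector<double>& in,
                               std::size_t nchnls, std::size_t chn) {
  std::vector<double> out(in.size() / nchnls);
  for (std::size_t n = 0; n < out.size(); n++) out[n] = in[n * nchnls + chn];
  return out;
}
```

Panning a mono block and then extracting channel 0 returns the left signal, which is the kind of step needed before feeding a stereo signal to a mono-only processor.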

17.3.3 Audio Input and Output A basic audio IO facility is provided as part of the library, through the SoundIn and SoundOut classes. This is to allow programs to be written without the need to access external libraries directly, rather than to provide a complete cross-platform


IO solution. The interface is fairly agnostic as far as its implementation is concerned. Currently, it provides a frontend to libsndfile [40], for soundfile IO; Portaudio [5], for realtime device IO; and std::iostream for standard text IO. It is implemented asynchronously, and it is capable of low-latency audio (at least as far as the underlying service allows it). Users of the library do not actually depend on these two IO classes. For instance, applications would place the processing classes directly in an audio system callback (e.g. through Jack), without the use of any AuLib IO object. Equally, a processing graph based on library objects can be incorporated into a variety of settings, such as embedded hardware, mobile devices, etc.

17.4 Synthesis and Processing Control AuLib includes support for controlling sound synthesis and processing at a higher level. This is provided by the following classes:
• AuLib::Note: this class provides support for composing signal processing graphs. It can be subclassed to provide a container object that will model a note on an instrument, with a well-defined control interface.
• AuLib::Instrument: this is a template class that takes in a Note-derived class to create an instrument based on it. This class is responsible for instantiating and controlling note objects.
• AuLib::MidiIn: this is a MIDI controller class that takes in instruments, listens for MIDI input, and dispatches control data to them. This class currently uses Portmidi, but as with audio IO, the backend implementation can be changed to a different MIDI library if needed.
• AuLib::ScorePlayer: this is a score controller, also taking in instruments and dispatching the control data from an AuLib::Score object to them.
These classes are built to take advantage of the framework given by AudioBase, so that data can be passed seamlessly from control to processing objects, whose output can be tapped very flexibly.

17.5 An AuLib Instrument To complete the discussion, we present a full programming example demonstrating some AuLib classes within a C++ OOP design. For this, we choose to implement a signal processing instrument that will take an input with any number of channels, apply a feedback delay effect and produce a stereo output with the input sources spread evenly between the two channels. The structure of this instrument, for a single channel, is shown in a flowchart in Fig. 17.4. Multiple channels will share the SoundIn, SigBus and SoundOut


objects, but will feature separate Chn, Delay, and Pan objects. The program takes in input and output names (source and destination), plus the delay time and feedback gain as arguments:

    $ delay <src> <dest> <delay time> <feedback gain>

The key signal processing object in this program is provided by the Delay class, which is discussed in Sect. 18.2 in the next chapter. It takes an input and puts it through a delay line, feeding its output back into it, scaled by a gain. Depending on the delay time, the effect can consist of a series of echoes (long delays) or of a resonating filter (short delays). The feedback gain needs to be less than 1, otherwise the output will grow out of control.

[Figure: SoundIn → Chn → Delay → (+) → Pan → SigBus → SoundOut]
Fig. 17.4: Signal flowchart for a single input channel in the delay program.

The complete program is shown in Listing 17.5. It uses an AuLib::SoundIn object to access its input, whose source may be a soundfile, the default soundcard ("adc"), or the standard input ("stdin"). The choice of input is taken as the first argument of the program, and if the object cannot be constructed without errors, the program exits. This input determines the number of channels that will be used in the instrument. Since this is dependent on the source, we will dynamically create vectors of objects to process each channel of input. This demonstrates yet another use for the std::vector container class of the C++ standard library. For each channel we will require a channel reader, a delay, and a panner object. The vectors holding these are constructed as soon as the input has been opened. Note that the delay time and feedback gain are taken from the third and fourth arguments in the command line. In order to prevent any blow-up, we coerce the feedback gain to be a non-negative number less than 1. Finally, a single AuLib::SigBus object is employed to accumulate the outputs of each separate channel and feed the AuLib::SoundOut output, which also takes its destination from the command line (second argument), with similar options to the input (soundcard, soundfile, or standard output). In order to facilitate the processing of each channel, we create an ordered list of channels as an integer vector, which can then be used in a range-based for loop that iterates over this list. The std::iota function is used to fill the vector with the channel numbers.
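Two of the small idioms just described, coercing the feedback gain and building the channel list, can be looked at in isolation. The sketch below uses only the standard library; the helper names safe_feedback and channel_list are hypothetical and do not appear in the program itself.

```cpp
#include <cmath>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical helper: force a feedback gain into the range [0, 1),
// falling back to 0.99 when the magnitude is 1 or more.
double safe_feedback(double g) {
  g = std::fabs(g);
  return g < 1.0 ? g : 0.99;
}

// Hypothetical helper: build the ordered list 0, 1, ..., nchnls - 1.
std::vector<std::uint32_t> channel_list(std::uint32_t nchnls) {
  std::vector<std::uint32_t> channels(nchnls);
  std::iota(channels.begin(), channels.end(), std::uint32_t{0});
  return channels;
}
```

A range-based loop such as for (std::uint32_t channel : channel_list(n)) then visits each channel index once, in order.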

Listing 17.5: Delay example program.

    #include <Chn.h>
    #include <Delay.h>
    #include <Pan.h>
    #include <SigBus.h>
    #include <SoundIn.h>
    #include <SoundOut.h>
    #include <atomic>
    #include <cmath>
    #include <csignal>
    #include <cstdlib>
    #include <iostream>
    #include <numeric>
    #include <vector>

    using namespace AuLib;
    using namespace std;

    // handle ctrl-c
    static atomic_bool running(true);
    void signal_handler(int signal) {
      running = false;
      cout << "\nexiting...\n";
    }

    int main(int argc, const char **argv) {
      if (argc > 4) {
        // audio input
        SoundIn input(argv[1]);
        if (input.error() != AULIB_NOERROR) {
          cout << "error opening input\n";
          return -1;
        }
        // input channels
        vector<Chn> chn(input.nchnls());
        // delay lines
        double fdb = fabs(atof(argv[4]));
        vector<Delay> delay(input.nchnls(),
                            Delay(atof(argv[3]), fdb < 1.0 ? fdb : 0.99,
                                  def_vframes, input.sr()));
        // stereo panning
        vector<Pan> pan(input.nchnls());
        // mixing bus
        SigBus mix(1. / input.nchnls(), 0., false, 2);
        // audio output
        SoundOut output(argv[2], 2, def_vframes, input.sr());
        if (output.error() != AULIB_NOERROR) {
          cout << "error opening output\n";
          return -1;
        }
        uint64_t end = input.dur() + 5 * output.sr(), t = 0;
        // list of channels
        vector<uint32_t> channels(input.nchnls());
        iota(channels.begin(), channels.end(), 0);
        signal(SIGINT, signal_handler);
        cout << Info::version();
        while (t < end && running) {
          input();
          for (uint32_t channel : channels) {
            chn[channel](input, channel + 1);
            delay[channel](chn[channel]);
            pan[channel](delay[channel] += chn[channel],
                         (1 + channel) * input.nchnls() / 2.);
            mix(pan[channel]);
          }
          t = output(mix);
          mix.clear();
        }
        return 0;
      } else
        cout << "usage: " << argv[0]
             << " <source> <dest> <delay> <feedback>\n";
      return 1;
    }


The processing loop handles one vector of audio at a time and continues until five seconds after the input ends, or until a ctrl-c (SIGINT) signal is sent to the program (for which a signal handler is registered just before the loop). We employ the function-call operators of each object to access their DSP operations (e.g. input(), delay[channel]()), and each channel is processed in the inner range-based for loop. We take advantage of the overloaded operator+= as a means of adding dry and wet-effect signals in the pan processing input. Note that the mix object is called to accumulate the outputs of each channel and is then cleared after use.

17.6 Conclusions This chapter has described the design of a simple, lightweight audio DSP library in C++, based on the principles developed in earlier chapters. The main motivation is to provide a platform to develop and collect algorithms for study, teaching and research in audio programming. The library classes are effectively thin wrappers that envelop succinct and efficient implementations of DSP operations. The code has been designed to be robust enough for general-purpose deployment in audio processing applications. The next chapters in the book will employ the library while exploring specific DSP algorithms. A general reference for the library is given in Appendix A. The library developer’s manual can be found at https://aulib.github.io, and the source code repository URL is https://github.com/aulib/aulib.

Problems
17.1. Write a version of the MIDI synthesiser program presented in Chapter 12 using AuLib classes and including an envelope.
17.2. Derive a class from AuLib::AudioBase to implement a fourth-order lowpass filter as described in Chapter 16 and [20]. Write a simple test program to demonstrate it.

Chapter 18

Delay Line Processing

Abstract In this chapter we explore the concept of delay lines and their applications, as found in computer music instruments. Following an introduction to the main aspects of delay line programming, we explore the implementation of fixed, variable and multitap delays. The convolution operation and finite impulse response filters are discussed as an extreme example of tapping a delay line at multiple points. From the programming perspective, we introduce the principles of lambda functions and closures, which provide elegant means of implementing certain types of computation. Delay lines are employed in a significant number of audio signal processing applications. Given that the methods we have been exploring so far are intrinsically linked to the timing of the samples in a waveform, this is not surprising. In fact, we can group all the techniques and algorithms we have discussed up to now under the time-domain designation. In this chapter, we will expand our understanding of what it is to apply delays to an input signal, as well as of the results of manipulating both the amounts of delay (delay times) and the mix of direct and delayed signals. A whole category of related effects can be derived from these principles. In general, we can distinguish between two groups of delay effects, according to whether they are based on fixed or variable delay times. In the first case, we have primarily the IIR filter equations we have already studied in Chapter 16, which employ mostly very short delays of one to a few samples, and the echo and reverberation effects that we will see in the present chapter. These, instead, are based on much longer delays, which can be up to several seconds long. The variable-delay effects also often employ delay times that will be modulated within a wide range, from zero to several milliseconds. These take advantage of side effects of varying the delay time that can have pitch and timbre modification consequences. 
Regardless of the type of delay effect, if it requires longer delays, it should use a basic algorithm to apply the expected time delay to the signal. This is called a circular buffer, and the one to be employed here is a variation on the principles we have already encountered in Chapter 12, Sect. 12.4. While there the requirement was to have a queue to keep two threads synchronised, here we only want a sequence of samples to be held in memory for a certain period of time, while the expected delay time elapses. It turns out that the most efficient way to do this for a large number of samples is also to employ a circular buffer.

© Springer Nature Switzerland AG 2019 V. Lazzarini, Computer Music Instruments II, https://doi.org/10.1007/978-3-030-13712-0_18

18.1 Circular Buffers Any signal delay is only meaningful if it has a time reference to which it is going to be compared. For instance, if we are writing audio samples to disk or to memory, then the time delay is just an offset in relation to a given frame position. If we are sending the data in realtime to a DAC, then, while the delay is still just a sample offset, it can also be conceived as diverting the signal to a memory location, from which we will read some time later. This is in fact the most general solution to implementing delays, regardless of the destination of the audio signal (realtime or not). So what we need is a queue where samples go in at one end and come out at the other end after a certain number of sampling periods. This is what we call a FIFO (first in, first out) queue (Fig. 18.1).

[Figure: in → queue → out]
Fig. 18.1: FIFO queue.

Naturally, we might think that such a queue should be implemented as follows:

    out = fifo[fifo_size-1];
    for(int i = fifo_size-1; i > 0; i--)
      fifo[i] = fifo[i-1];
    fifo[0] = in;

In other words, at each sampling period, we pop the head of the queue, and then move, one by one, each sample in the queue one position forward. Then we push in a new sample to the first position that is now vacant. While this is correct from the point of view of the inputs and outputs and the FIFO principles, it is hardly efficient from a computational perspective. For each sample processed, we have to move N numbers around (N being the delay time in samples). It is acceptable only for delays of a few samples, and not practical for delays of several milliseconds. The main observation that we can draw from looking at the FIFO layout is that we only need to care about pushing one sample in and popping another out from a buffer. This means we only need to replace one position in memory, if we accept that it can be read in a circular fashion. What we do then is to move the nominal head/tail position of the queue along the memory block, rather than move its contents. This is shown in Fig. 18.2. It only works if we are able to wrap around the end of the buffer, but we have been doing this all the time in table-lookup oscillators, which employ the same reading principle, without the overwriting part. Interestingly, with a simple modification of the naive FIFO code above, we can achieve an efficient circular-buffer implementation:

    out = fifo[rp];
    fifo[rp] = in;
    rp = rp != fifo_size - 1 ? rp + 1 : 0;

Note that, as if by magic, the processing loop has been disposed of. That is right: with this algorithm to impose a delay of N samples, we only need three operations: pop, push, and update. It has constant time complexity as opposed to being dependent on the delay size. All we needed to do is to keep track of the replacement position. In the simplest cases, we only need to keep track of this position, as the writing operation will always be preceded by the reading operation in the same buffer location. However, in the case of variable delays, we will need to keep track of the reading and writing indices separately.
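The three operations described above (pop, push, update) can be wrapped in a small self-contained class to make the behaviour easy to test. This is only a sketch: the class name CircularDelay and its interface are assumptions made for the example and are not taken from AuLib.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical class for illustration only (not the AuLib implementation):
// a fixed delay of `size` samples using a single read/write position.
class CircularDelay {
  std::vector<double> m_buf; // delay memory, initially silent
  std::size_t m_pos = 0;     // current read/write position

public:
  explicit CircularDelay(std::size_t size)
      : m_buf(size > 0 ? size : 1, 0.0) {}

  // Pop the oldest sample, push the new one in its place,
  // then advance the position, wrapping around at the end.
  double process(double in) {
    const double out = m_buf[m_pos];
    m_buf[m_pos] = in;
    m_pos = m_pos != m_buf.size() - 1 ? m_pos + 1 : 0;
    return out;
  }
};
```

With a buffer of three samples, an input sequence 1, 2, 3, 4, 5 comes out as 0, 0, 0, 1, 2: each sample reappears three calls after it was written, and the cost per sample does not depend on the buffer size.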

[Figure: in → circular buffer → out]
Fig. 18.2: Delay line circular buffer.

18.2 Fixed-Delay Effects Fixed-delay effects employ the simplest cases of circular-buffer queues. The delay time is determined by the size of the buffer, and therefore we will allocate memory according to the requested delay. This can be simply computed as the product of the sampling rate and the delay time. Once this is done, as part of the object initialisation, we will not change it (after all, it is fixed). The effect is given by the operation


of reading and writing to the circular buffer, as shown above, with the following order of operations:
• The output is read from the buffer at the current position.
• The input is written to the buffer at the same location.
• The position index is incremented, circularly (modulo the delay size).
The following processing function implements these steps for a delay-line class¹:

    const double *AuLib::Delay::dsp(const double *sig) {
      for (uint32_t i = 0; i < m_vframes; i++) {
        m_vector[i] = m_delay.set(sig[i], m_pos);
        m_pos = m_pos == m_delay.vframes() - 1 ? 0. : m_pos + 1;
      }
      return vector();
    }

where the member variable m_pos is responsible for keeping the current index. The delay line is an AuLib::AudioBase object (see Listing 18.1), and the AudioBase::set() method is used to set the delay line sample at a given position, returning the old sample stored in it (the first two steps above). AuLib::Delay is the base class for all delay-based objects and encapsulates this delay line. As with other classes in the library, it was designed in such a way that it could be specialised for the various different types of delay effects we will explore in this chapter, maximising code reuse. For this reason, you will see in the public interface some prototypes for variable delay lines, which are also implemented by this class and will be discussed later.

¹ From now on, examples will be given directly from code in AuLib; please refer to it for the complete classes.

Listing 18.1: The AuLib::Delay class.

    /** Fixed or variable delay line with optional feedback
        (Delay, Comb filter, Flanger)
    */
    class Delay : public AudioBase {
      virtual const double *dsp(const double *sig, double dt);
      virtual const double *dsp(const double *sig, const double *dt);
      virtual const double *dsp(const double *sig);

    protected:
      double m_fdb;
      uint32_t m_pos;
      AudioBase m_delay;

    public:
      /** Delay constructor \n\n
          dtime - delay time \n
          fdb - feedback gain \n
          vframes - vector size \n
          sr - sampling rate
      */
      Delay(double dtime, double fdb, uint32_t vframes = def_vframes,
            double sr = def_sr)
          : AudioBase(1, vframes, sr), m_fdb(fdb), m_pos(0),
            m_delay(1, 1, sr) {
        m_delay.resize_exact(dtime >= 0. ? dtime * sr : 1);
      }

      /** delay a signal sig for a fixed time */
      const double *process(const double *sig) { return dsp(sig); }

      /** delay a signal for dt seconds */
      const double *process(const double *sig, double dt) {
        return dsp(sig, dt);
      }

      /** delay a signal for dt seconds
          and with optional feedback fdb
      */
      const double *process(const double *sig, double dt, double fdb) {
        m_fdb = fdb;
        return dsp(sig, dt);
      }

      /** delay a signal for delay time
          taken from the signal dt
      */
      const double *process(const double *sig, const double *dt) {
        return dsp(sig, dt);
      }

      /** delay a signal for delay time
          taken from the signal dt and with optional feedback fdb
      */
      const double *process(const double *sig, const double *dt,
                            double fdb) {
        m_fdb = fdb;
        return dsp(sig, dt);
      }

      /** delay a signal in obj for a fixed time */
      const Delay &process(const AudioBase &obj) {
        if (obj.vframes() == m_vframes && obj.nchnls() == m_nchnls) {
          process(obj.vector());
        } else
          m_error = AULIB_ERROR;
        return *this;
      }

      /** delay a signal in obj, optionally for dt seconds. */
      const Delay &process(const AudioBase &obj, double dt) {
        if (obj.vframes() == m_vframes && obj.nchnls() == m_nchnls) {
          if (dt < 0)
            process(obj.vector());
          else
            process(obj.vector(), dt);
        } else
          m_error = AULIB_ERROR;
        return *this;
      }

      /** delay a signal in obj, optionally for dt seconds and
          with feedback fdb.
      */
      const Delay &process(const AudioBase &obj, double dt, double fdb) {
        m_fdb = fdb;
        return process(obj, dt);
      }

      /** delay a signal in obj for dt sec with
          variable delay time sig
      */
      const Delay &process(const AudioBase &obj, const AudioBase &dt) {
        if (obj.vframes() == m_vframes && obj.nchnls() == m_nchnls &&
            dt.vframes() == m_vframes && dt.nchnls() == m_nchnls) {
          process(obj.vector(), dt.vector());
        } else
          m_error = AULIB_ERROR;
        return *this;
      }

      /** delay a signal in obj for dt sec with
          variable delay time sig and with optional feedback fdb.
      */
      const Delay &process(const AudioBase &obj, const AudioBase &dt,
                           double fdb) {
        m_fdb = fdb;
        return process(obj, dt);
      }

      /** operator(a,b,c) convenience method */
      const Delay &operator()(const AudioBase &a, const AudioBase &b,
                              double c) {
        return process(a, b, c);
      }

      /** operator(a,b) convenience method */
      const Delay &operator()(const AudioBase &a, const AudioBase &b) {
        return process(a, b);
      }

      /** operator(a,b,c) convenience method */
      const Delay &operator()(const AudioBase &a, const double b,
                              double c) {
        return process(a, b, c);
      }

      /** operator(a,b) convenience method */
      const Delay &operator()(const AudioBase &a, double b) {
        return process(a, b);
      }

      /** operator(a) convenience method */
      const Delay &operator()(const AudioBase &a) { return process(a); }

      /** get the current write position */
      uint32_t pos() const { return m_pos; }

      /** get a reference to the delay line. */
      const AudioBase &delayline() const { return m_delay; }
    };

The Delay class and its subclass AllPass implement two typical algorithms for fixed-delay reverberation and echo: comb and all-pass filters. They can be combined together in different arrangements to construct artificial reverberation effects.

18.2.1 Comb Filters Comb filters are very similar to the straight fixed-delay algorithm as implemented in Sect. 18.2. The only difference is that they include a scaled feedback signal path from the output back to the delay line input (Fig. 18.3). The amount of feedback g determines how much of the output gets recirculated. Comb filters are named after their characteristic amplitude response, which resembles an upside-down comb (with teeth sticking upwards). This causes the effect of filtering out some bands (at the troughs of the amplitude response), while enhancing others (at the peaks). The peaks are spaced evenly in frequency, at 1/τ Hz, where τ is the delay (or loop) time. For more details on the signal processing characteristics of this processor, see [39]. The g parameter determines the total reverb time r of the comb filter for a given delay line length:

g = \left(\frac{1}{1000}\right)^{\tau / r} \qquad (18.1)

[Figure: comb filter flowchart with input, delay, feedback gain and output]
Fig. 18.3: Comb filter.

A simple modification of the Delay DSP method presented earlier implements a comb filter, where g is defined by the m_fdb member variable:

    const double *AuLib::Delay::dsp(const double *sig) {
      for (uint32_t i = 0; i < m_vframes; i++) {
        m_vector[i] =
            m_delay.set(sig[i] + m_delay[m_pos] * m_fdb, m_pos);
        m_pos = m_pos == m_delay.vframes() - 1 ? 0. : m_pos + 1;
      }
      return vector();
    }

In AuLib, rather than deriving a trivial class to implement a comb filter, the Delay class uses this code in place of the original feedback-free process. The plain delay can easily be recovered by setting the feedback parameter to zero. An example of the use of the Delay class was given in Listing 17.5.
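To see how the reverb time and the feedback gain relate in practice, the following sketch computes g from r and τ and applies it in a standalone comb filter. It assumes that Eq. 18.1 is read as g = (1/1000)^(τ/r), that is, a 60 dB decay after r seconds; the names comb_gain and Comb are hypothetical and this is not the AuLib implementation.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Feedback gain for a reverb time rvt and a loop time tau (both in
// seconds), assuming g = (1/1000)^(tau/rvt), i.e. -60 dB after rvt secs.
double comb_gain(double rvt, double tau) {
  return std::pow(0.001, tau / rvt);
}

// Hypothetical comb filter sketch (not AuLib code).
class Comb {
  std::vector<double> m_buf; // delay line of tau * sr samples
  std::size_t m_pos = 0;     // read/write position
  double m_g;                // feedback gain

public:
  Comb(double tau, double rvt, double sr)
      : m_buf(tau * sr >= 1. ? static_cast<std::size_t>(tau * sr) : 1, 0.),
        m_g(comb_gain(rvt, tau)) {}

  double process(double in) {
    const double out = m_buf[m_pos]; // delayed sample
    m_buf[m_pos] = in + out * m_g;   // input plus scaled feedback
    m_pos = m_pos != m_buf.size() - 1 ? m_pos + 1 : 0;
    return out;
  }
};
```

Feeding a single impulse into such a filter produces echoes every τ seconds with amplitudes 1, g, g², and so on. After r/τ trips around the loop the amplitude has dropped to 1/1000 of the original, which is the usual definition of reverb time.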

18.2.2 All-Pass Filters All-pass filters, unlike comb filters, feature a flat amplitude response, and therefore they do not impart a strong timbral colouration to their input, although they are prone to ringing at abrupt transitions (e.g. when a signal is turned on or off suddenly) [15]. To implement this, they require both a feedforward and a feedback signal path in addition to the delayed signal, as shown in Fig. 18.4.

[Figure: all-pass filter flowchart with input, delay, feedback path scaled by gain, feedforward path scaled by -gain, and output]
Fig. 18.4: Allpass filter.

Given these differences, a minimal derived class can be created to model this component. Note that since it implements solely fixed-delay processing, the DSP methods related to variable delay time (those with variable delay parameters) delegate to the fixed delay method, ignoring any changes in delay time. This can only be set when an object is constructed.

Listing 18.2: The AuLib::AllPass class.

    /** All-pass filter */
    class AllPass : public Delay {
      virtual const double *dsp(const double *sig, double dt) {
        return dsp(sig);
      }
      virtual const double *dsp(const double *sig, const double *dt) {
        return dsp(sig);
      }
      virtual const double *dsp(const double *sig);

    public:
      /** AllPass constructor \n\n
          dtime - delay time \n
          fdb - feedback gain \n
          vframes - vector size \n
          sr - sampling rate
      */
      AllPass(double dtime, double fdb, uint32_t vframes = def_vframes,
              double sr = def_sr)
          : Delay(dtime, fdb, vframes, sr){};
    };

The only method that needs to be implemented in this class is AllPass::dsp():

    const double *AuLib::AllPass::dsp(const double *sig) {
      double y;
      for (uint32_t i = 0; i < m_vframes; i++) {
        y = sig[i] + m_fdb * m_delay[m_pos];
        m_vector[i] = m_delay.set(y, m_pos) - m_fdb * y;
        m_pos = m_pos == m_delay.vframes() - 1 ? 0. : m_pos + 1;
      }
      return vector();
    }

As discussed in [39], comb and all-pass filters may be connected together to construct an artificial reverberator. A standard way of doing this is to connect a set of comb filters in parallel, which would feed a set of all-pass filters in series. The basic layout employs four comb filters, whose delay times are unrelated to each other (to even out the timbral colouration), followed by two all-pass filters with short delay times (of the order of a few milliseconds) and reverb times. The function of these is to provide early echoes and help thicken the reverberation, whereas the comb filters are responsible for the diffuse tail end of the effect. Therefore the total reverb time is defined by the comb filter g parameter².
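The arrangement just described, four parallel comb filters feeding two all-pass filters connected in series, can be sketched in a few lines of standalone C++. Everything here is an assumption made for illustration: the type names, the delay times (loosely based on commonly quoted values for this kind of design), the all-pass reverb time of 0.1 s and the 0.25 mixing scale are not taken from AuLib or from [39]. The feedback gains are computed as (1/1000)^(τ/r).

```cpp
#include <cmath>
#include <cstddef>
#include <initializer_list>
#include <vector>

// All names here are hypothetical; this is not the AuLib implementation.
struct DelayLine {
  std::vector<double> buf;
  std::size_t pos = 0;
  DelayLine(double secs, double sr)
      : buf(secs * sr >= 1. ? static_cast<std::size_t>(secs * sr) : 1, 0.) {}
  double read() const { return buf[pos]; }
  void write(double s) { // overwrite the oldest sample and advance
    buf[pos] = s;
    pos = pos != buf.size() - 1 ? pos + 1 : 0;
  }
};

struct CombUnit {
  DelayLine dl;
  double g;
  CombUnit(double tau, double rvt, double sr)
      : dl(tau, sr), g(std::pow(0.001, tau / rvt)) {}
  double process(double in) {
    const double out = dl.read();
    dl.write(in + g * out); // feedback path
    return out;
  }
};

struct AllPassUnit {
  DelayLine dl;
  double g;
  AllPassUnit(double tau, double rvt, double sr)
      : dl(tau, sr), g(std::pow(0.001, tau / rvt)) {}
  double process(double in) {
    const double y = in + g * dl.read();   // feedback path
    const double out = dl.read() - g * y;  // delayed signal plus feedforward
    dl.write(y);
    return out;
  }
};

class Reverb {
  std::vector<CombUnit> combs;
  std::vector<AllPassUnit> allpasses;

public:
  Reverb(double rvt, double sr) {
    // illustrative, mutually unrelated comb delay times (seconds)
    for (double tau : {0.0297, 0.0371, 0.0411, 0.0437})
      combs.emplace_back(tau, rvt, sr);
    // illustrative short all-pass delays with a short reverb time
    allpasses.emplace_back(0.005, 0.1, sr);
    allpasses.emplace_back(0.0017, 0.1, sr);
  }
  double process(double in) {
    double sum = 0.;
    for (auto &c : combs) sum += c.process(in);      // combs in parallel
    sum *= 0.25;                                     // scale the mix
    for (auto &a : allpasses) sum = a.process(sum);  // all-pass in series
    return sum;
  }
};
```

One property that can be checked numerically is that the impulse response of a single AllPassUnit has unit energy, which is consistent with the flat amplitude response mentioned earlier.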

² See also Problem 18.1.

18.3 Variable Delay Lines The AuLib code discussed so far takes advantage of the fact that the delay line size will not change once an object is constructed. This allows the code to be simplified: one single read and write index is needed, and the delay time is given by the size of the buffer. Also in this case, the delay time in seconds is rounded to the nearest integer length in samples. While this is not exact, it is not problematic in most cases. If a precise fractional-sample delay were required, then we would need to apply one of the methods described in the literature [30]. The need for a more precise delay line lookup also arises when we are implementing variable delays. For a number of applications where some sort of pitch modification is the target, linked or not to a delay effect, a modification of the delay algorithm admitting a change in delay time during performance is required. For this, we will need to decouple the reading and writing operations and use two separate indices. The writing into the delay line will always proceed in single-step increments, as we would expect, but the reading position may jump by more than one position, move backwards, or stay fixed. This is because the actual delay time will now be calculated on the basis of the difference between the reader and writer positions. So the read index will be a certain number of samples behind the writer, up to a maximum which is defined by the buffer size. If the requested delay time in seconds translates into a fractional number of samples, we must be careful not to always round it to an integral value. This, in many applications, will result in poor quality audio. For this reason, we should apply interpolation algorithms such as those discussed in Chapter 14. For these in particular, we will need to be careful when the interpolation needs to occur at the end of the delay line, since in this case we will need to look back at the first sample or samples in order to apply the algorithm correctly. In the case of cubic interpolation, we also need to be careful when the reader index is at the start of the delay line. All of these cases follow from experiences we have had with interpolating oscillators. The simplest applications of variable delays, however, may not require any interpolation at all. For delays that are only changing very slowly, we can implement truncated lookup. For instance, the Delay::dsp() method taking a scalar value argument for the delay time does this:

    const double *AuLib::Delay::dsp(const double *sig, double dt) {
      uint32_t ds = dt * m_sr;
      int32_t rp;
      if (ds > m_delay.vframes())
        ds = m_delay.vframes();
      for (uint32_t i = 0; i < m_vframes; i++) {
        rp = m_pos - ds;
        if (rp < 0)
          rp += m_delay.vframes();
        m_vector[i] = m_delay[rp];
        m_delay[m_pos] = sig[i] + m_vector[i] * m_fdb;
        m_pos = m_pos == m_delay.vframes() - 1 ? 0. : m_pos + 1;
      }
      return vector();
    }

Nevertheless, in general we will need some sort of interpolation. This is provided by the Delay::dsp() method that takes a signal vector instead to control the delay time:

    const double *AuLib::Delay::dsp(const double *sig,
                                    const double *dt) {
      double rp, ds, a, b, frac;
      uint32_t irp;
      for (uint32_t i = 0; i < m_vframes; i++) {
        ds = dt[i] < 0. ? 0 : dt[i] * m_sr;
        if (ds > m_delay.vframes())
          ds = m_delay.vframes();
        rp = m_pos - ds;
        if (rp < 0)
          rp += m_delay.vframes();
        irp = (uint32_t)rp;
        frac = rp - irp;
        a = m_delay[irp];
        // wrap around to the start when the next sample is past the end
        if (++irp == m_delay.vframes())
          irp = 0;
        b = m_delay[irp];
        m_vector[i] = a + frac * (b - a);
        m_delay[m_pos] = sig[i] + m_vector[i] * m_fdb;
        m_pos = m_pos == m_delay.vframes() - 1 ? 0. : m_pos + 1;
      }
      return vector();
    }

The following effects are typical applications of variable delays:
• Flanger: the flanger is a variable-delay comb filter, whose delay time varies between close to zero and a few milliseconds. The amount of feedback gain will determine the quality of the effect, by making the peaks of the comb more prominent. As noted before, the delay (or loop) time determines the spacing of the comb peaks, and therefore varying it will cause a filter sweep effect.
• Vibrato: vibrato is implemented by a straight variable delay with no feedback, which is modulated by a periodic source such as a low-frequency oscillator (LFO)³. By varying the delay time a little more than in the flanger effect, we will cause a pitch modulation effect. This is because of the difference in the delay reading and writing rates: if the delay is decreasing, then the read index is proceeding at a faster rate than the writing one (their difference decreases), and we have a rise in pitch; on the other hand, if the delay is increasing, the reading rate has to be slower, causing a drop in pitch. If the modulating source is made up of linear segments (e.g. a triangular wave), the result will be an alternation of two (or more) fixed pitch transpositions. In the case of non-linear sources (e.g. a sine wave), we will hear a smooth variation of pitch within a range that is determined by the amount of modulation applied.
• Chorus: the chorus effect tries to model two things: (a) the slightly asynchronous nature of multiple instruments playing together (time delays); and (b) a slight detuning that also takes place. The first is achieved by the delay time effect, and the second by a pitch modulation effect (a slow vibrato).
  The chorus effect is created by passing a signal through a modulated delay and mixing the result with its input (generally no feedback is used). The delay times are slowly modulated so as to create fine detuning effects and some delay asynchrony.
• Doppler: the Doppler effect is constructed by applying a change in delay time. Since distance can be equated with a given time delay, by varying this we can simulate a change in source sound position. Associated with this, a change in amplitude is also needed to make the effect realistic.
• Pitch shifter: as we have noted, the differences in reading and writing rates caused by delay time changes will create a pitch shifting effect. If these differences are constant, then a constant pitch shift will take place. However, with a finite delay buffer at some point one of the indices will overtake the other, causing the delay to go from maximum to minimum or the other way round. At this point there is a waveform discontinuity, and a click is heard. In order to remove this, we will need to fade the reading in and out in synchrony with this discontinuous position, using a periodic envelope (or windowing). With one reading ‘head’, this will add an amplitude modulation effect. With two or more of these, offset by a certain amount, this modulation artefact may be minimised.

³ Effectively, this is just an ordinary oscillator with sub-audio fundamental frequencies.
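As a concrete instance of the effects listed above, here is a minimal vibrato sketch built on a variable delay with linear interpolation. It is a simplified standalone variant, not the AuLib code: the names VarDelay and vibrato are hypothetical, there is no feedback, and the input is written before the delay line is read, so that a delay of zero returns the current input.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch (not AuLib code): a delay line with a write
// position and an interpolated, time-varying read position.
class VarDelay {
  std::vector<double> m_buf;
  std::size_t m_wp = 0; // write position
  double m_sr;

public:
  VarDelay(double maxdel, double sr)
      : m_buf(maxdel * sr >= 2. ? static_cast<std::size_t>(maxdel * sr) : 2,
              0.),
        m_sr(sr) {}

  // Write one input sample and read the signal dt seconds in the past.
  double process(double in, double dt) {
    const double size = static_cast<double>(m_buf.size());
    double ds = dt < 0. ? 0. : dt * m_sr;        // delay in samples
    if (ds > size - 1.) ds = size - 1.;          // limit to the buffer
    m_buf[m_wp] = in;                            // write first
    double rp = static_cast<double>(m_wp) - ds;  // fractional read position
    if (rp < 0.) rp += size;
    if (rp >= size) rp = 0.;                     // guard against rounding
    std::size_t irp = static_cast<std::size_t>(rp);
    const double frac = rp - static_cast<double>(irp);
    const double a = m_buf[irp];
    if (++irp == m_buf.size()) irp = 0;          // wrap to the first sample
    const double b = m_buf[irp];
    m_wp = m_wp != m_buf.size() - 1 ? m_wp + 1 : 0;
    return a + frac * (b - a);                   // linear interpolation
  }
};

// Vibrato: the delay time follows a sine LFO between 0 and 2*depth secs.
std::vector<double> vibrato(const std::vector<double> &in, double sr,
                            double rate, double depth) {
  const double twopi = 4. * std::acos(0.);
  VarDelay del(2. * depth + 2. / sr, sr);
  std::vector<double> out(in.size());
  for (std::size_t n = 0; n < in.size(); n++) {
    const double dt = depth * (1. + std::sin(twopi * rate * n / sr));
    out[n] = del.process(in[n], dt);
  }
  return out;
}
```

Mixing the output of vibrato() with its input, using a slow rate and a slightly longer delay, would give a basic chorus along the lines described in the list above.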

18.4 Multiple Taps

The example of the pitch shifter effect suggests an interesting approach, the use of more than one reading position, also called multiple taps into a delay line. In that particular application, it is used to smooth the modulation effects caused by the envelope, but in others it can be used for multiple delays or echoes. To take advantage of this, the AuLib library provides two classes, Tap and Tapi, that implement taps into an existing delay line object. The latter is an interpolating tap, and the former truncates the readout position. Here is the class interface for Tap:

Listing 18.3: The AuLib::Tap class.

/** Creates a tap for a Delay object
    truncating readout.
*/
class Tap : public AudioBase {
  virtual const Tap &dsp(const Delay &obj, double time);
  virtual const double *dsp(const Delay &obj, const double *time) {
    return dsp(obj, time[0]).vector();
  }

 public:
  /** Tap constructor \n\n
      vframes - vector size \n
      sr - sampling rate
  */
  Tap(uint32_t vframes = def_vframes, double sr = def_sr)
      : AudioBase(1, vframes, sr){};

  /** tap a delay object at time secs */
  const Tap &process(const Delay &obj, double time) {
    return dsp(obj, time);
  }

  /** tap a delay object according to time signal in secs */
  const double *process(const Delay &obj, const double *time) {
    return dsp(obj, time);
  }

  /** tap a delay object according to time signal from obj in secs */
  const Tap &process(const Delay &del, const AudioBase &obj) {
    if (obj.vframes() == m_vframes && obj.nchnls() == m_nchnls) {
      dsp(del, obj.vector());
    } else
      m_error = AULIB_ERROR;
    return *this;
  }

  /** operator () convenience method */
  const Tap &operator()(const Delay &del, const AudioBase &b) {
    return process(del, b);
  }

  /** operator () convenience method */
  const Tap &operator()(const Delay &del, double b) {
    return process(del, b);
  }
};

The Tapi class reuses much of the mechanism in Listing 18.3, as it is a derived class that only needs to reimplement the DSP methods.
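The multiple-tap idea can also be illustrated independently of AuLib. The following is only a minimal sketch, assuming a plain circular buffer with truncating readout; the names SimpleDelay, write, tap and two_taps are hypothetical and are not part of the library.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical minimal delay line (not the AuLib::Delay class).
struct SimpleDelay {
  std::vector<double> buf;
  std::size_t pos = 0; // next write position
  explicit SimpleDelay(std::size_t size) : buf(size, 0.) {}

  // store one sample and advance the write position circularly
  void write(double s) {
    buf[pos] = s;
    pos = pos != buf.size() - 1 ? pos + 1 : 0;
  }

  // truncating tap: the sample written d samples ago, 1 <= d <= buf.size()
  double tap(std::size_t d) const {
    return buf[(pos + buf.size() - d) % buf.size()];
  }
};

// equal mix of two echoes read from the same delay line
double two_taps(const SimpleDelay &dl, std::size_t d1, std::size_t d2) {
  return 0.5 * (dl.tap(d1) + dl.tap(d2));
}
```

Any number of taps can read the same buffer, since reading does not modify it; only write() moves the position.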


18.4.1 Convolution

The extreme case of multitap delays is where we have one tap at each buffer position. If we place a gain multiplier at each output, we are in effect implementing directly a signal processing operation called convolution, which also gives the general layout of a finite impulse response (FIR) filter. Such filters are the result of the sum of N delayed and scaled samples of a waveform; the sequence of scaling multipliers (or coefficients) is itself another signal called the impulse response (IR). This is also equivalent to the actual output of the delay line if we were to place a single impulse at its input. After N samples, the output would be zero (hence the FIR denomination, see Chapter 16).

The convolution algorithm is shown schematically in Fig. 18.5, where an input signal (x[t]) is placed into a delay line and each tap output is multiplied by a coefficient (the ir[] array), and mixed to yield the output y[t] (the time t is in samples). Therefore for this algorithm, we need a delay line that is tapped at each point, and a table of coefficients making up the impulse response.

[Figure: the input x[t] feeds a delay line whose taps x[t], x[t-1], x[t-2], x[t-3], ... are each multiplied by a coefficient ir[0], ir[1], ir[2], ir[3], ..., ir[N-1]; the products are summed to give the output y[t].]

Fig. 18.5: Delay line convolution.
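The structure in Fig. 18.5 can be sketched outside any library as a double loop over output samples and coefficients. This is only an illustration, assuming whole signals held in std::vector rather than a streaming delay line; the function name convolve is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Direct (time-domain) convolution of a signal x with an impulse response ir.
// The output has x.size() + ir.size() - 1 samples.
std::vector<double> convolve(const std::vector<double> &x,
                             const std::vector<double> &ir) {
  if (x.empty() || ir.empty()) return {};
  std::vector<double> y(x.size() + ir.size() - 1, 0.);
  for (std::size_t t = 0; t < y.size(); t++) {
    for (std::size_t n = 0; n < ir.size(); n++) {
      // y[t] is the sum over n of ir[n] * x[t - n];
      // samples outside the input are taken as zero
      if (t >= n && t - n < x.size()) y[t] += ir[n] * x[t - n];
    }
  }
  return y;
}
```

Feeding a single impulse, {1, 0, 0}, returns the impulse response itself followed by zeros, which matches the description of the IR as the output of the delay line for an impulse input.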


An AuLib class derived from Delay models the direct convolution processor (Listing 18.4). It takes in a function table object holding the impulse response, optionally truncating its size (i.e. using only a portion of the impulse response).

Listing 18.4: The AuLib::Fir class.

/** This class implements a direct convolution engine
    using an impulse response defined in a function table.
*/
class Fir : public Delay {
  virtual const double *dsp(const double *sig, double dt) {
    return dsp(sig);
  }
  virtual const double *dsp(const double *sig, const double *dt) {
    return dsp(sig);
  }
  virtual const double *dsp(const double *sig);

 protected:
  const double *m_ir;
  uint32_t m_ir_nchnls;
  uint32_t m_chn;

 public:
  /** Fir constructor \n\n
      ir - impulse response \n
      chn - selected channel from IR \n
      len - if > 0, set the FIR length \n
      vframes - vector size \n
      sr - sampling rate
  */
  Fir(const FuncTable &ir, uint32_t chn = 0, uint32_t len = 0,
      uint32_t vframes = def_vframes, double sr = def_sr)
      : Delay((len > 0 && len <= ir.vframes() ? len : ir.vframes()) / sr, 0,
              vframes, sr),
        m_ir(ir.vector()), m_ir_nchnls(ir.nchnls()),
        m_chn(chn < m_ir_nchnls ? chn : m_ir_nchnls - 1){};
};


To implement the DSP method, we only need to modify the original delay line algorithm so that we can get a delay out of every sample and apply the coefficient to it. For this, we need to source the data from a function table (holding the m_ir array). Another modification is that we can have an IR with multiple channels (interleaved), in which case we will select a specific channel (m_chn) from it to apply the convolution to the input signal:

const double *AuLib::Fir::dsp(const double *sig) {
  double out = 0;
  uint32_t N = m_delay.vframes();
  uint32_t nchnls = m_ir_nchnls;
  for (uint32_t i = 0; i < m_vframes; i++) {
    m_delay[m_pos] = sig[i];
    m_pos = m_pos != N - 1 ? m_pos + 1 : 0;
    for (uint32_t j = 0, rp = m_pos; j < N; j += nchnls) {
      out += m_delay[rp] * m_ir[N - 1 - j + m_chn];
      rp = rp != N - 1 ? rp + 1 : 0;
    }
    m_vector[i] = out;
    out = 0.;
  }
  return vector();
}

The direct convolution algorithm is computationally expensive, requiring N multiplications and additions for each sample. It is only practical for very short delay lengths. For anything longer, we will need to employ a spectral domain version, which can take advantage of the fast Fourier transform (FFT) algorithm [36], as we will see in Chapter 19.

18.5 Lambda Functions

Now we turn to a programming topic that might be useful for the implementation of these and other processes discussed in this book. In the C++11 ISO standard a number of new features were introduced. Among these, we have the concept of lambda functions, which are anonymous functions used in contexts where a small, temporary function is called for. This concept originated in the mathematics of the lambda calculus [9] and is commonly found in functional-style languages [1]. Associated with it, we have the principle of a closure, which is an environment, a set of variables etc., to which the function has access (it becomes part of its scope). C++ allows us to construct lambdas with a well-defined capture list that will define the closure.


The lambda syntax is as follows:

[ capture-list ] ( parameter-list ) -> return-type { body }

where capture-list is a comma-separated list of the variables that are captured as part of the closure. This can be empty (no captures), or it can name specific variables. Captures can be by copy or by reference (using &); a lambda inside an object can capture its members through its this pointer. The return type can be omitted in most cases if it is clear implicitly what the function return type is, in which case the syntax simplifies to:

[ capture-list ] ( parameter-list ) { body }

Lambda functions are useful when processing elements in a vector (or a list). For example, to change the gain of a signal vector, we can use a combination of iterators and a lambda, with the std::transform function from the standard library:

std::vector<double> audio(vecsize);
int gain;
...
std::transform(audio.begin(), audio.end(), audio.begin(),
               [gain](double s) { return s*gain; });

This particular example will multiply every element of audio by gain, with the lambda returning each scaled sample. The elements will be transformed in place. A similar operation can be applied to any AudioBase-derived object in the AuLib library, because these objects have iterators defined for them, which allows us to act on the audio vector directly.
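The gain example can be made self-contained as follows. This is a sketch only: the function names apply_gain and apply_gain_peak are hypothetical, and the second function is added here merely to contrast a by-copy capture with a by-reference one.

```cpp
#include <algorithm>
#include <vector>

// scale every sample in place; gain is captured by copy
void apply_gain(std::vector<double> &audio, double gain) {
  std::transform(audio.begin(), audio.end(), audio.begin(),
                 [gain](double s) { return s * gain; });
}

// scale in place and track the peak; peak is captured by reference,
// so the lambda can modify the variable in the enclosing scope
double apply_gain_peak(std::vector<double> &audio, double gain) {
  double peak = 0.;
  std::for_each(audio.begin(), audio.end(), [gain, &peak](double &s) {
    s *= gain;
    double a = s < 0 ? -s : s;
    if (a > peak) peak = a;
  });
  return peak;
}
```

Capturing peak by copy instead would not compile as written, since variables captured by copy are read-only inside a non-mutable lambda.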

18.5.1 Auto Types

A lambda expression has a type that is unique and unnamed (it is actually a temporary object of the type ClosureType). It is possible to assign this expression to a C-style function pointer containing the same type declaration as the resulting function, provided the lambda captures nothing:

int (*add)(int, int) = [](int a, int b) { return a + b; };
std::cout << add(2,1) << std::endl;

However, to simplify this and other use cases, the C++11 standard introduced the auto type specifier, which allows variables to have their types deduced from their initialisers. For example,

auto add = [](int a, int b) { return a + b; };

The auto specifier can be used for other variable types, whenever it is possible to deduce from the context what the type is. It cannot be used, however, to name a function parameter type. For these, we need to be clear what the type is. In this case, the exact function type is used, which can sometimes look complicated (e.g. int(*)(int,int) in the example above). Thankfully, in these cases we can also use the standard library utility class template std::function, with the function arguments and return type as the template parameter, which looks much simpler and C++-like: std::function<int(int,int)>. Consider this example, a classic functional-style operation, where we take a function in as an argument to another function and return a new function with fewer parameters:

auto curry = [](std::function<int(int,int)> f, int b) {
  return [f,b](int a){ return f(a,b); };
};
auto add1 = curry(add,1);
auto add2 = curry(add,2);
std::cout << add2(add1(1)) << std::endl;

The first line defines a lambda with two parameters: a function of two int variables returning an int, and an int. This function returns another lambda, of one int parameter. Note that we capture two variables from the enclosing environment, by copy: f and b. This allows us to access and use their values inside the lambda, as we have seen before. Alternatively, we could have created a closure over the two variables by using the default copy capture syntax, [=], instead of naming the two individual items:

auto curry = [](std::function<int(int,int)> f, int b) {
  return [=](int a){ return f(a,b); };
};

Given that the enclosing environment has only the two variables we need, it probably makes sense to use this notation. The curry function allows us to get two different one-parameter functions from a two-parameter function. This is an example of partial function application.
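For reference, the partial application discussed above can be written as a complete translation unit. This sketch assumes a named function in place of the outer lambda, so that the return type can be spelled out as std::function<int(int)>; curry_demo is a hypothetical helper added only to exercise it.

```cpp
#include <functional>

// partial application: fix the second argument of a two-parameter function
std::function<int(int)> curry(std::function<int(int, int)> f, int b) {
  return [f, b](int a) { return f(a, b); };
}

// builds two one-parameter functions from the same two-parameter lambda
int curry_demo() {
  auto add = [](int a, int b) { return a + b; };
  auto add1 = curry(add, 1); // adds 1 to its argument
  auto add2 = curry(add, 2); // adds 2 to its argument
  return add2(add1(1));      // (1 + 1) + 2
}
```

The closure object add is converted implicitly to std::function<int(int,int)> at each call to curry, which is what allows a lambda to be passed where a named function type is required.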

18.6 Conclusions

In this chapter, we have looked at delay line processing, including both fixed and variable delays. Example DSP methods from the AuLib library were used to illustrate how these are usually implemented. The concept of convolution and its direct application through a tapped delay line was also explored. In the next chapter, we will re-examine the convolution algorithm to provide a very efficient implementation for it. To complement the discussion from a programming point of view, we have introduced lambda functions and their application in C++.


Problems

18.1. Create a Schroeder reverb using four comb filters and two all-pass filters (see [15] for an outline).
18.2. Design a comb filter class that includes high-frequency losses and apply it in a second version of the above program.
18.3. Create a flanging effect with the following characteristics:
(a) Sinusoidal LFO with frequency control.
(b) Depth-of-modulation control.
(c) Feedback gain control.
(d) Dry + wet mix control.

Chapter 19

Frequency-Domain Processing

Abstract This chapter is dedicated to the topic of spectral processing, in different applications. We introduce the main concepts related to frequency analysis, such as the discrete Fourier transform, and provide an implementation-based exploration of the typical fast Fourier transform algorithm. Applications in fast convolution and streaming spectral processing are discussed, with a number of programming examples.

Processing audio in the frequency domain involves the transformation of waveform data into a different representation. As we have seen, a digital waveform can be described as a discrete-time (and discrete-amplitude) encoding of continuous functions that express the audio signals we want to manipulate. These are functions of time, and therefore we are in this case working in the time domain. If, instead, we take a specific point in time and look at the audio signal for its frequency content, we need to transform our functions of time into functions of frequency, yielding the spectrum of a waveform, which is a frequency-domain representation.

What is the representation of this frequency content that we are seeking to manipulate? Generally speaking, this could be expressed in more than one way, but the most practical form is to model a waveform as a set of sinusoidal components. In this case, just as we have conceived a sound in the time domain as a sequence of samples indicating the amplitude at various time points, we can also think of it as a set of amplitudes of sinusoids of different frequencies. In the time domain, this sound is realised by samples being played one after the other at a certain rate (the sampling frequency), whereas in the frequency domain, this happens through summing (mixing) all the component sinusoids scaled by their respective amplitudes.
In other words, a digital waveform is a sequence of amplitude samples; its spectrum is also a sequence of amplitude samples, but, this time, each represents the weight of a sinusoid of a given frequency in the mix. While this is the general interpretation we should always have in mind, there are a number of important details we should note. We will look at these one by one, using a mostly non-mathematical approach. The signal processing aspects of this are detailed in [36] (Chapter 7), which can be taken as a companion text to this chapter. Nevertheless, it should be possible to gain a good understanding of the subject from a programming perspective, which is the way we will proceed here. Where a mathematical formula might be a succinct way of describing a process we are about to implement, we will use it.

© Springer Nature Switzerland AG 2019 V. Lazzarini, Computer Music Instruments II, https://doi.org/10.1007/978-3-030-13712-0_19

19.1 Fundamental Principles

The basic idea of what we will be programming in this chapter has already been outlined completely: we will take a digital waveform, comprising N samples of a function of time (i.e. the sound) and transform it into a spectrum made up of N samples of a function of frequency. These samples are the weights of a set of N sinusoids of different frequencies that, when mixed together, can recompose the original waveform. That being conceptually straightforward, we can move on to the details and principles that underpin it.

Let’s first think about how we can completely model a sinusoid of any kind. Sinusoid is the generic name we give to sine and cosine waves. This actually just refers to a particular wave shape, which can be produced by either function. Therefore there are three parameters that absolutely define such a wave, and can be used to distinguish individual instances:
1. Frequency, or how many cycles it completes per second (Hz),
2. Amplitude, which can be measured by its greatest absolute value,
3. Phase, the waveform starting point, relative to a given reference (e.g. the cosine wave phase).

Once we have these three parameters, the sinusoid is perfectly defined. The sampled spectrum, which is a function of frequency modelling a digital waveform (as a mix of sinusoids), is based on this representation, where:
• frequencies are determined by the sample index,
• amplitudes and phases are defined by the (2-dimensional) sample value.

It is possible to have an incomplete spectral representation using either the amplitude or the phase, which is useful in some applications. In particular, the amplitude spectrum is a common frequency-domain representation for the purposes of identifying the distribution of energy in a waveform (see Fig. 19.1). Just as in a digital waveform each sample index refers to a time point, in a sampled spectrum it indicates a frequency point.
If in the time domain we divide the index by the sampling rate $f_s$ to get the sample time in seconds, in the frequency domain we will divide the index by the number of samples in the waveform and scale it by the sampling rate. For example, let’s say we have a 1-second digital waveform ($N = f_s$): in this case each spectral sample defines frequency points that are 1 Hz apart: 0, 1, 2, ... up to $f_s/2$ (see footnote 1), which is half of the sampled spectrum. The other half refers to frequencies that are negative: $1 - f_s/2$, $2 - f_s/2$, ..., up to −1 Hz. Each one of these frequency points represents a single sinusoid; together, this mix of N sinusoids composes the original waveform.

The plots in Fig. 19.1 demonstrate this: we have a waveform with N = 100, and its amplitude spectrum showing sinusoidal components at 1, 5, 11, 16, and 17 Hz, as well as −17, −16, −11, −5 and −1 Hz (we are using in this case $f_s = N$). The left half of the plot refers to the positive-frequency components, whereas the right side displays the negative side of the spectrum. It is possible to notice that in this case, the negative and positive spectra are mirror-like images of each other: the positive-frequency indices are k, from 0 to N/2, and the negative-frequency ones are N − k, from N/2 to N − 1. Therefore, the five components will show up at 1, 5, 11, 16, and 17, and also at N − 1, N − 5, N − 11, N − 16, and N − 17. A single component always shows up on both sides. This will always be the case for audio waveforms (as discussed below).

Footnote 1: This point, index N/2, refers to both $f_s/2$ and $-f_s/2$ Hz.

[Figure: two panels, "waveform" (amplitude between −1.0 and 1.0 against sample index 0 to 100) and "amplitude spectrum" (values between 0.0 and 1.0 against spectral index 0 to 100).]

Fig. 19.1: A sampled waveform and its amplitude spectrum.
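The mapping from spectral index to frequency described above can be sketched as a small function. The name bin_frequency is hypothetical, and the choice to report index N/2 as the positive value $f_s/2$ is an assumption made here for definiteness.

```cpp
#include <cstdint>

// frequency in Hz of spectral sample k, for N points at sampling rate fs;
// indices above N/2 are mapped to negative frequencies
// (index N/2 itself is reported as +fs/2)
double bin_frequency(uint32_t k, uint32_t N, double fs) {
  double f = k * fs / N; // index divided by N, scaled by the sampling rate
  return k <= N / 2 ? f : f - fs;
}
```

With N = 100 and fs = 100, as in Fig. 19.1, index 17 maps to 17 Hz and index 83 (that is, N − 17) maps to −17 Hz.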

19.1.1 Complex Numbers

In order to represent both the amplitude and the phase of a sinusoid, each spectral sample must be a two-dimensional number. While time-domain samples are unidimensional, each just a single real number representing the amplitude at a time point, a frequency-domain sample has to pack two numbers together so that it can represent amplitude and phase independently. This is done through a complex number, which can be thought of as a pair of real numbers in separate dimensions. Complex numbers have two representations (Fig. 19.2):
1. Rectangular, which can be interpreted geometrically as coordinates measured on two number lines at 90 degrees to each other. The complex pair is a projection of the number on these two lines, conventionally known as the real and imaginary parts of the number. The usual mathematical notation adds a j to the imaginary part of the number to distinguish it from its real part, e.g. a + jb.
2. Polar, which can be interpreted geometrically as a line starting from the intersection of the real and imaginary lines (called the origin) to the point representing the complex number. This line makes an angle with the real axis. The two parts of the number are the line length (magnitude) and angle. For the sinusoids in question, these parts correspond directly to their amplitude and phase, respectively.

Therefore one complex spectral sample can determine the frequency (through its index), amplitude, and phase of a component of a waveform.

19.1.2 Spectral Analysis

From a digital waveform, we can determine its spectrum by performing a discrete Fourier transform (DFT) [8]. This is an operation that can be defined by the following formula [33]:

$$X(k) = \frac{1}{N} \sum_{t=0}^{N-1} x(t) \left[ \cos\left(2\pi k \frac{t}{f_s}\right) - j \sin\left(2\pi k \frac{t}{f_s}\right) \right] \tag{19.1}$$

where x(t) is the waveform, N is its size in samples, X(k) is its spectrum, t is the time index, and k is the frequency index. To obtain the full spectrum, we increment k from 0 to N − 1. In this case, we call N the transform size. As we can see, the operation involves the sample-by-sample product of the waveform with a complex sinusoid (cos(ω) − j sin(ω)), which is made up of a cosine for its real part and a sine for its imaginary part (with a negative sign). This yields N complex numbers, which are all summed together and scaled by 1/N to give the spectral sample at frequency point k. The principle behind this is that the complex sinusoid works as a detector: if a sinusoidal component exists in the vicinity of its frequency, the complex sinusoid will pick it up, yielding a non-zero sample.

The spectrum X(k) can be used to recompose the original waveform by using the inverse formula (the inverse DFT, or IDFT):

$$x(t) = \sum_{k=0}^{N-1} X(k) \left[ \cos\left(2\pi k \frac{t}{f_s}\right) + j \sin\left(2\pi k \frac{t}{f_s}\right) \right] \tag{19.2}$$

where we vary the time t from 0 to N − 1 to get all the waveform samples. Each complex amplitude sample X(k) scales a complex sinusoid at frequency $k/f_s$. Half of these frequencies are in the positive spectrum (from 0 to $f_s/2$ Hz), and half in the negative side (from $-f_s/2$ to close to 0 Hz). The waveform x(t) is a mix of these components.

[Figure: the complex number z = a + jb plotted on the real (horizontal) and imaginary (j, vertical) axes, with amp(z) as the length of the line from the origin to z and pha(z) as its angle to the real axis.]

Fig. 19.2: Geometrical interpretation of the complex number z = a + jb = 0.5 + 0.5j.

The spectrum given by Eq. 19.1 comes out in rectangular form, as real and imaginary pairs. To get the amplitude and phase, we need to convert this to polar form using the following expressions, for an arbitrary complex number z = a + jb:

$$A(z) = \sqrt{a^2 + b^2} \tag{19.3}$$

$$\phi(z) = \arctan\left(\frac{b}{a}\right) \tag{19.4}$$

The first of these is also known as the absolute value, modulus, or magnitude of the complex number, and can be calculated with the C++ abs() function, as we will see later. The second is also known as the argument, and can be computed with arg() (or also atan2()). From the amplitude A(z) and phase φ(z), we can get the rectangular form a + jb with

$$a = A(z) \cos(\phi(z)) \tag{19.5}$$

and

$$b = A(z) \sin(\phi(z)) \tag{19.6}$$
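The conversions in Eqs. 19.3 to 19.6 can be sketched with the standard std::complex facilities. The helper names amp, pha and to_rect are hypothetical, chosen here to match the labels in Fig. 19.2.

```cpp
#include <cmath>
#include <complex>

// amplitude (Eq. 19.3) and phase (Eq. 19.4) of a complex number
double amp(std::complex<double> z) { return std::abs(z); }
double pha(std::complex<double> z) { return std::arg(z); }

// back to rectangular form a + jb (Eqs. 19.5 and 19.6)
std::complex<double> to_rect(double a, double ph) {
  return {a * std::cos(ph), a * std::sin(ph)};
}
```

Note that std::arg() resolves the quadrant from the signs of both parts, in the manner of atan2(), which a plain arctangent of b/a cannot do. The standard library also offers std::polar(), which performs the same conversion as to_rect.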

Together with the DFT and IDFT, these expressions are the fundamental tools of spectral analysis and resynthesis, and we will be employing them in various situations later in this chapter. An important assumption that is built into the DFT is that the waveform that is subject to the operation is periodic, with the period T = N. In other words, the analysis breaks down the input as if it were made up of harmonics of $f_s/N$ Hz, which might be appropriate in some cases, but not ideal in others. It is always important to bear this in mind when attempting to understand the result of the DFT. Another way to interpret this is that each frequency point is in fact a band, channel, or bin, centred at a given multiple of $f_s/N$ Hz. When the input period T does not align with a multiple of N, there will be some smearing in the analysis results, and a single sinusoidal partial will be detected in more than one bin. Since this is more likely to be the case in practical applications of the DFT, this is a better way of reading the analysis results.

One final point worthy of note is that audio waveforms are unidimensional, and therefore can be represented by functions of real numbers. The spectrum of an audio waveform, as we have seen, is represented by a function of complex numbers. When we recompose the waveform from the spectrum, the result is again unidimensional, as the imaginary part in the results of Eq. 19.2 gets cancelled out. So we can go from real waveform to complex spectrum, and vice versa. An important characteristic of the complex spectrum of real functions is that it displays a certain symmetry (see footnote 2): the amplitudes are symmetric (mirrored) about 0 Hz, and the phases anti-symmetric. An example of this was shown in Fig. 19.1, for the amplitude spectrum only. This means that we can always infer the negative spectrum from its positive side in this case, and therefore we can work solely with positive frequencies.
This will allow some simplifications and efficiencies when we come to program the DFT.
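The symmetry of the spectrum of a real waveform can be checked numerically. The sketch below is not the implementation used in this book: it is a direct transcription of Eq. 19.1 under the assumption $f_s = N$, and the names naive_dft and hermitian_error are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// naive DFT of a real signal (Eq. 19.1 with fs = N), roughly N*N operations
std::vector<std::complex<double>> naive_dft(const std::vector<double> &x) {
  const double twopi = 2. * std::acos(-1.);
  std::size_t N = x.size();
  std::vector<std::complex<double>> X(N);
  for (std::size_t k = 0; k < N; k++) {
    std::complex<double> sum(0., 0.);
    for (std::size_t t = 0; t < N; t++) {
      double w = twopi * k * t / N;
      // product with the complex sinusoid cos(w) - j sin(w)
      sum += x[t] * std::complex<double>(std::cos(w), -std::sin(w));
    }
    X[k] = sum / double(N); // 1/N scaling
  }
  return X;
}

// largest deviation from X(N - k) = conj(X(k)), for 0 < k < N
double hermitian_error(const std::vector<std::complex<double>> &X) {
  double err = 0.;
  std::size_t N = X.size();
  for (std::size_t k = 1; k < N; k++)
    err = std::max(err, std::abs(X[N - k] - std::conj(X[k])));
  return err;
}
```

For any real input the error should be at the level of floating-point rounding. A unit-amplitude cosine at bin 2 of a 16-point transform shows up with amplitude 0.5 at both index 2 and index 14, its energy being shared between the positive and negative sides.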

Footnote 2: This is called a Hermitian symmetry.

19.2 The Fast Fourier Transform

The DFT in Eq. 19.1 can be written out in C++ using one loop nested inside another. The inner loop realises the sum of products, and the outer one increments the frequency index k. The following function implements a DFT of a real-valued input, excluding the 1/N scaling:

#include <cmath>
#include <complex>
using namespace std::complex_literals;
extern double TWOPI;

void dft(double *x, std::complex<double> *X, int N, float sr) {
  double w;
  for (int k = 0; k < N; k++) {
    X[k].real(0.);
    X[k].imag(0.);
    for (int t = 0; t < N; t++) {
      w = TWOPI * k * t / sr;
      // complex sinusoid with a negative sine term, as in Eq. 19.1
      X[k] += x[t] * (cos(w) - 1i * sin(w));
    }
  }
}

We should note the std::complex<T> type, which holds a complex pair of type T. This class is made up of two contiguously-stored numbers, and a number of methods to manipulate them. It is possible to cast it to an array of two; a vector of complex numbers can also be reinterpreted as a real-valued vector of twice its length. In C++14, the constant 1i is a complex type (with real part = 0), and therefore we can use it to make sin(w) the imaginary part of a complex sinusoid. It allows us to translate the complex product in Eq. 19.1 more or less directly. This constant is available in the namespace std::complex_literals.

While the DFT implementation is quite compact, it is of considerable computational complexity, especially if N is large. For each of the N output samples we have to compute N multiplications and additions. If N is a highly composite number, such as a power of two, many of the exact same operations are repeated in the process. It is possible to factor them to allow for an efficient implementation of the DFT. This is called the fast Fourier transform (FFT). We will now examine the implementation of the most common type of FFT, the one where we use power-of-two transform sizes, called the radix-2 FFT. While the mathematical details of its derivation are beyond the scope of this book (but explored in the companion text [36]), we will give an outline of the algorithm and explore a reference implementation in C++.
The first step in working out the details of the FFT is to take a very general approach, assuming that the input is a complex function, rather than considering the specific case of audio signals, where we are dealing with real-valued data. We can apply it to our case simply by placing the waveform samples in the real part of each complex number, and zeros in the imaginary part. However, in a second


stage, we will actually be able to take advantage of real-valued input to simplify the computation even further. So for a complex-to-complex DFT, we can proceed to break it down into a fundamental operation, which is applied iteratively to the input to yield the transform. The outline of the process is as follows. For a transform of size N, to implement a radix-2 FFT, we start with setting a counting variable M = 2 and
1. Divide the N input (complex) samples into one or more sets containing M samples each.
2. Mark each half of each set alternately as even and odd.
3. Apply a pair of equations to transform each data set (of size M) in place (i.e. replace the original data with the results), merging the even and odd sides to produce the result.
4. Multiply M by 2.
5. If M > N, stop; otherwise continue from 1.

The core of this process is in step 3, where we apply the following expressions [36]:

$$\begin{aligned} X_M(k) &= E(k) + \omega^{-k} O(k) \\ X_M(k + M/2) &= E(k) - \omega^{-k} O(k) \end{aligned} \tag{19.7}$$

with $0 \le k < M/2$, E(k) and O(k) are the even and odd input data, and $X_M(k)$ is where we will store the result. The $\omega^{-k}$ factor is a complex sinusoid (which stems from the complex product in the straight DFT formula), also known as the twiddle factor. These two equations arise from the factorisation of the sums and products in the DFT formula. They represent the fundamental computation needed to calculate the transform, which will be applied repeatedly to the input data in several stages. In an implementation of the FFT, this step represents the innermost loop, which iterates over the N-sample array and applies the transform to sets of M samples:

for (uint32_t m = 0; m < n; m++) {
  M = n * 2;
  for (uint32_t k = m; k < N; k += M) {
    i = k + n;
    even = s[k];
    odd = w * s[i];
    s[k] = even + odd;
    s[i] = even - odd;
  }
  w *= wp;
}

In this code, n represents M/2, and k is the index k, which is offset by M each time, so that the transform covers all of the N-sample input in blocks of M samples. The variables even and odd hold the even and odd inputs to the expression, which


are taken from the s array (the input data). The results are put back in place, ready for the next iteration. The twiddle factor represented by the variable w needs to be updated every time k is incremented by raising it to higher powers ($\omega^{-k}$). Starting from 1, we use cos(−2π/M) + j sin(−2π/M) to update it through a spectral product (w *= wp). This is done in the code for every start offset of the variable k. In the base case of M = 2, this never happens, because we are transforming pairs of numbers. For longer transforms, ω is incremented so that the sinusoid can complete one cycle over the M/2 samples. The complete three-level iteration code involved in the FFT is shown below:

for (uint32_t n = 1; n < N; n *= 2) {
  o = -pi/n;
  wp.real(cos(o)), wp.imag(sin(o));
  w = 1.;
  for (uint32_t m = 0; m < n; m++) {
    for (uint32_t k = m; k < N; k += n * 2) {
      i = k + n;
      even = s[k];
      odd = w * s[i];
      s[k] = even + odd;
      s[i] = even - odd;
    }
    w *= wp;
  }
}

This is the basic radix-2 FFT algorithm, which is almost as compact as the straight DFT, but has a lower computational complexity. A straight DFT of size N requires roughly $N^2$ operations, whereas the FFT only needs $N \log_2 N$. This becomes quite significant when N is large.

There are a few other requirements to complete a full reference FFT implementation. The first of these is to do with the order of data points at the end of the process. Because the FFT operates by merging the even and odd data points of the input, in successive iterations, we will need to reorder the input so that the output frequency bins are in the correct sequence. That is, we will need to move the samples of the input waveform around to place them in even and odd pairs, otherwise the output spectrum will be scrambled. Alternatively, we can reorder the output. A diagram of the FFT (for N = 8) including the reordering of the input is shown in Fig. 19.3. We can see how the merging performed in step 3 affects the order of the data in each successive iteration.
The required reordering can be described recursively as splitting the sequence into even and odd points. Starting with M = N,
1. Divide an M-sample block into two sets of M/2,
2. Move every odd sample to the right-hand side and every even sample to the left-hand side,
3. If all N samples have been processed, continue; otherwise start from 1. with the next block of M samples,
4. Divide M by 2,
5. If M = 2, stop; otherwise start from 1. with the first block of M samples.

[Figure: the input x(0) x(1) x(2) x(3) x(4) x(5) x(6) x(7) is re-ordered to x(0) x(4) x(2) x(6) x(1) x(5) x(3) x(7); three successive stages then merge even, E(k), and odd, O(k), sets to produce the X2, X4 and X8 results, the last in the order X8(0) to X8(7).]

Fig. 19.3: Diagram of operations for FFT with N = 8, adapted from [36].
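The recursive even/odd splitting described in the steps above can be sketched as follows. This is an illustration only, assuming a power-of-two block length; it proceeds block by block (depth first) rather than level by level, which gives the same final order, and the name even_odd_split is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// recursively move even-indexed points to the left half and odd-indexed
// points to the right half of the block [start, start + len);
// len is assumed to be a power of two
void even_odd_split(std::vector<int> &v, std::size_t start, std::size_t len) {
  if (len <= 2) return; // pairs need no further reordering
  std::vector<int> tmp(len);
  for (std::size_t i = 0; i < len / 2; i++) {
    tmp[i] = v[start + 2 * i];               // even points
    tmp[i + len / 2] = v[start + 2 * i + 1]; // odd points
  }
  for (std::size_t i = 0; i < len; i++) v[start + i] = tmp[i];
  even_odd_split(v, start, len / 2);
  even_odd_split(v, start + len / 2, len / 2);
}
```

Applied to the indices 0 to 7, this yields the input order shown in Fig. 19.3, at the cost of several passes over the data and a temporary copy at each level.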

For instance, let’s take an array of eight points {0, 1, 2, 3, 4, 5, 6, 7}. The first step is to divide it into two sets and separate the even and odd points: {0, 2, 4, 6} and {1, 3, 5, 7}. Then we take each of these and split it into two sets, separating the even and odd points again: {0, 4}, {2, 6}, {1, 5}, and {3, 7}. Since we now have pairs, we have finished the reordering: {0, 4, 2, 6, 1, 5, 3, 7}. However, this is an expensive way of reordering, since we may have to iterate several times over the data. There is a pattern in the reordering, which we can observe to simplify things. Some data points stay put, so we could ignore them, and others get swapped. If we observe the bit-pattern of each sample index, we will know which ones have to be moved. By reversing these bit-patterns we can determine the swaps, for instance for N = 8, we have the following indices (0–7): 000 001 010 011 100 101 110 111


which reversed become 000 100 010 110 001 101 011 111, allowing the data to be swapped accordingly: 1 & 4, 3 & 6, the rest staying in place. We should note that this results in the same reordering as in the worked-out example: {0, 1, 2, 3, 4, 5, 6, 7} becomes {0, 4, 2, 6, 1, 5, 3, 7}, where only two pairs of points were actually swapped. If we observe this, it is possible to write a function that only acts on the required points, in one single iteration over the data array:

    static void reorder(std::vector<std::complex<double>> &s) {
      uint32_t N = s.size();
      uint32_t j = 0, m;
      for (uint32_t i = 0; i < N; i++) {
        if (j > i) {
          std::swap(s[i], s[j]);
        }
        m = N / 2;
        while (m >= 2 && j >= m) {
          j -= m;
          m /= 2;
        }
        j += m;
      }
    }

Finally, to complete the FFT function, we should also note that the same code can be employed for the inverse transform, with a simple change of sign in the twiddle phase calculation. Therefore we can have a single function to perform the DFT and IDFT very efficiently. The code for this, which is part of AuLib, is shown in Listing 19.1. It calculates an in-place transform, that is, the output data replaces the input.

Listing 19.1: A reference C++ implementation of the radix-2 FFT in AuLib

    void transform(std::vector<std::complex<double>> &s, bool dir) {
      uint32_t N = s.size();
      std::complex<double> wp, w, even, odd;
      double o;
      uint32_t i;
      reorder(s);
      for (uint32_t n = 1; n < N; n *= 2) {
        o = dir == forward ? -pi / n : pi / n;
        wp.real(cos(o)), wp.imag(sin(o));
        w = 1.;
        for (uint32_t m = 0; m < n; m++) {
          for (uint32_t k = m; k < N; k += n * 2) {
            i = k + n;
            even = s[k];
            odd = w * s[i];
            s[k] = even + odd;
            s[i] = even - odd;
          }
          w *= wp;
        }
      }
      if (dir == forward)
        for (uint32_t n = 0; n < N; n++)
          s[n] /= N;
    }
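The bit-reversal rule described above can be checked independently of the reorder() function. The following is a minimal sketch, not part of AuLib: the helper names bit_reverse() and swap_pairs() are hypothetical and are introduced only for illustration. It reverses the bit pattern of each index and lists the pairs that need swapping for a power-of-two size.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Reverse the lowest `bits` bits of index i (e.g. 001 -> 100 for bits = 3).
// Hypothetical helper, for illustration only.
uint32_t bit_reverse(uint32_t i, uint32_t bits) {
  uint32_t r = 0;
  for (uint32_t b = 0; b < bits; b++) {
    r = (r << 1) | (i & 1);
    i >>= 1;
  }
  return r;
}

// List the (i, j) index pairs that must be swapped for a power-of-two size N.
std::vector<std::pair<uint32_t, uint32_t>> swap_pairs(uint32_t N) {
  assert(N > 0 && (N & (N - 1)) == 0); // N must be a power of two
  uint32_t bits = 0;
  while ((1u << bits) < N)
    bits++;
  std::vector<std::pair<uint32_t, uint32_t>> pairs;
  for (uint32_t i = 0; i < N; i++) {
    uint32_t j = bit_reverse(i, bits);
    if (j > i) // record each swap once; indices with j == i stay put
      pairs.push_back(std::make_pair(i, j));
  }
  return pairs;
}
```

For N = 8, swap_pairs(8) yields the two pairs (1, 4) and (3, 6), matching the worked-out example.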

19.2.1 Real-to-Complex and Complex-to-Real Transforms

Finally, as we have discussed earlier on, we will be dealing with real-valued input data, and therefore we can take advantage of this to produce spectral data representing only the non-negative frequencies. This would allow us to use a half-size FFT to compute the spectrum, thus saving some more computation in the process. The approach is as follows:

1. We will reinterpret the real input data as if it were a half-size complex array. The even samples will be taken as the real parts and the odd as the imaginary parts of these complex numbers.
2. We can then apply the half-size complex FFT from Listing 19.1.
3. We convert the output data into a form that contains only the non-negative spectrum. We treat the points referring to 0 Hz and fs/2 separately, as these will always have zero imaginary parts. They are derived from the complex data in the first point of the output array.
4. The rest of the points are converted in a loop starting at 1, covering the rest of the data. This operation combines the samples from the two halves of the spectrum to produce the resulting non-negative frequency points.

The mathematical details of this conversion can be found in [36]. The non-negative spectrum contains N/2 + 1 bins, that is, all positive frequency points including fs/2, plus 0 Hz. It means that we could potentially pack the output into the same memory space of the input, since a complex number occupies twice the size of a real sample. With N inputs there is space for N/2 complex outputs. How can we store the extra point? There are two ways:

1. The packed format: since the 0 Hz and the Nyquist frequency (fs/2) points are purely real (zero imaginary), we can place them together in the first array position in place of a single complex number.
2. The extra point: we provide an extra position at the end, and place the Nyquist sample there. This means that the imaginary parts of the first and last points of the complex array will be zero.

We can accommodate the two formats in the real-input FFT by adding an optional boolean argument that indicates whether or not a packed format is used (true by default). If an extra point is used, we will also expect the input array to contain enough memory for it. The function implementing a real-to-complex FFT is shown in Listing 19.2. It takes two required parameters: a complex vector and a double pointer. If the two refer to the same memory, the transform happens in place. Otherwise, we will expect the pointer to refer to an array of N samples that will be used for input. The output will always be placed in the complex-data vector. The line

    double *s = reinterpret_cast<double *>(c.data());

tells the compiler to reinterpret the vector data as a double array, so that if we need to, we can copy in the real input data from a different location to reinterpret it as complex and perform the FFT in place. This real-to-complex FFT function uses the complex FFT as implemented in Listing 19.1 to perform the spectral analysis.

Listing 19.2: The real-to-complex FFT from AuLib.

    void transform(std::vector<std::complex<double>> &c, double *r, bool pckd) {
      using namespace std::complex_literals;
      uint32_t N = c.size() - (pckd ? 0 : 1);
      std::complex<double> wp, w = 1., even, odd;
      double o, zro, nyq;
      double *s = reinterpret_cast<double *>(c.data());
      if (s != r)
        std::copy(r, r + 2 * N, s);
      if (!pckd)
        c.resize(N);
      transform(c, forward);
      zro = c[0].real() + c[0].imag();
      nyq = c[0].real() - c[0].imag();
      c[0].real(zro * .5), c[0].imag(nyq * .5);
      o = -pi / N;
      wp.real(cos(o)), wp.imag(sin(o));
      w *= wp;
      // the loop includes the centre bin (i == N/2), as in Listing 19.3,
      // so that this point is also converted (it is conjugated)
      for (uint32_t i = 1, j = 0; i < N / 2 + 1; i++) {
        j = N - i;
        even = .5 * (c[i] + conj(c[j]));
        odd = .5i * (conj(c[j]) - c[i]);
        c[i] = even + w * odd;
        c[j] = conj(even - w * odd);
        w *= wp;
      }
      if (!pckd) {
        c.resize(N + 1);
        c[N].real(c[0].imag());
        c[0].imag(0.);
        c[N].imag(0.);
      }
    }

The inverse operation, complex-to-real, applies the same steps, but now in reverse (Listing 19.3). The data is converted back from a single-sided spectrum into the original complex-FFT data, and then we use an inverse transform to obtain the output. This is reinterpreted again as a real-valued sequence. Similarly, the function takes in two required arguments, now in reverse order: a double pointer to the output data location, and a complex vector containing the input. In-place transforms are also possible if the two parameters refer to the same memory location.

Listing 19.3: The complex-to-real inverse FFT from AuLib.

    void transform(double *r, std::vector<std::complex<double>> &c, bool pckd) {
      using namespace std::complex_literals;
      uint32_t N = c.size() - (pckd ? 0 : 1);
      std::complex<double> wp, w = 1., even, odd;
      double o, zro, nyq;
      double *s = reinterpret_cast<double *>(c.data());
      if (pckd)
        zro = c[0].real() * 2., nyq = c[0].imag() * 2.;
      else
        zro = c[0].real() * 2., nyq = c[N].real() * 2.;
      c[0].real(zro + nyq), c[0].imag(zro - nyq);
      o = pi / N;
      wp.real(cos(o)), wp.imag(sin(o));
      w *= wp;
      int j;
      for (uint32_t i = 1; i < N / 2 + 1; i++) {
        j = N - i;
        even = .5 * (c[i] + conj(c[j]));
        odd = .5i * (c[i] - conj(c[j]));
        c[i] = even + w * odd;
        c[j] = conj(even - w * odd);
        w *= wp;
      }
      if (!pckd)
        c.resize(N);
      transform(c, inverse);
      if (s != r)
        std::copy(s, s + 2 * N, r);
      if (!pckd)
        c.resize(N + 1);
    }

An example program is shown in Listing 19.4. It generates a test waveform with three partials (1, 5, 13), then applies the real-to-complex DFT, as implemented using the radix-2 FFT algorithm, and prints the amplitude spectrum. The transform is performed in place, reusing the input memory, and we input the data as a double array, which is reinterpreted as a std::complex<double> vector in the FFT function. The format of the spectral data is packed: that is, the 0 Hz and Nyquist bins are stored as the first complex pair. We print out the amplitudes, which are computed using the abs() function with a complex input.

Listing 19.4: Real-to-complex transform example.

    #include <fft.h> // AuLib FFT header (file name assumed)
    #include <cmath>
    #include <iomanip>
    #include <iostream>
    #include <vector>

    using namespace AuLib;
    const int N = 32;

    int main() {
      // complex vector with N/2 bins (packed format)
      std::vector<std::complex<double>> cdata(N / 2);
      // reinterpret it as double array
      double *rdata = reinterpret_cast<double *>(cdata.data());
      // generate an N-sample waveform
      for (int n = 0; n < N; n++)
        rdata[n] = sin((twopi * n) / N) + 0.5 * sin(5 * (twopi * n) / N) +
                   0.25 * sin(13 * (twopi * n) / N);
      // apply the real-to-complex DFT
      fft::transform(cdata, rdata);
      // set printing precision to 2 decimal positions
      std::cout << std::fixed;
      std::cout << std::setprecision(2);
      // print amplitude spectrum
      for (auto s : cdata)
        std::cout << abs(s) << std::endl;
      return 0;
    }
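The two storage formats discussed in this section differ only in where the Nyquist point is kept. As a minimal sketch of the relationship, the two functions below convert between them. They are not part of AuLib: the names unpack() and pack() and the cvec alias are hypothetical, and the code simply moves the Nyquist value between the imaginary part of the first point and an extra point at the end.

```cpp
#include <cassert>
#include <complex>
#include <vector>

using cvec = std::vector<std::complex<double>>;

// Packed layout (N/2 points) -> extra-point layout (N/2 + 1 points).
// Hypothetical helper, for illustration only.
cvec unpack(const cvec &packed) {
  assert(!packed.empty());
  cvec out(packed);
  // the Nyquist value, held in the imaginary part of point 0, moves to the end
  out.push_back(std::complex<double>(packed[0].imag(), 0.));
  // the 0 Hz point is now purely real
  out[0] = std::complex<double>(packed[0].real(), 0.);
  return out;
}

// Extra-point layout -> packed layout (the inverse of unpack()).
cvec pack(const cvec &full) {
  assert(full.size() >= 2);
  cvec out(full.begin(), full.end() - 1);
  out[0] = std::complex<double>(full.front().real(), full.back().real());
  return out;
}
```

Applying pack() to the result of unpack() returns the original packed data, since no information is lost in either direction.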


Piping the output of this code to a plotting program yields the graph in Fig. 19.4, where we can see the amplitudes of the three components in the waveform (1, 0.5, and 0.25). The FFT functions developed here will be the basic tools for all the processes explored in the remaining sections of this chapter.

Fig. 19.4: A plot of a magnitude spectrum as produced by the program in Listing 19.4.

19.3 Fast Convolution

A very useful application of the FFT is to implement fast convolution. As we have seen in Chapter 18, computing the output of a long FIR filter is very processing-heavy, and may not be achievable in realtime. An alternative is to apply the operation in the frequency domain. This is possible because the convolution of two waveforms is equivalent to the multiplication of their spectra (Fig. 19.5). Therefore, if we have a fast method of spectral analysis, we can perform this operation in a single pass in the frequency domain.

[Diagram: signal → DFT and IR → DFT, both feeding a multiplier (×), followed by IDFT → output]

Fig. 19.5: Fast convolution: the product of two spectra is equivalent to the convolution of their corresponding waveforms.

The convolution of a sequence of L samples by another of M samples results in an output that is L + M − 1 samples long. This is because of the delay operation involved in convolution. So, if we want to produce the convolution in the spectral domain, we will need to use a transform size of at least L + M − 1 samples. Since we are using the FFT, then this also needs to be a power of two. How can we work under these restrictions? We can simply select our transform size according to these requirements (a power of two no smaller than L + M − 1) and then zero-pad the inputs to that length. That is, we pack the data with zeros to make up the correct size. At the output, the first L + M − 1 samples are the result of the convolution. With this in mind, we can test this idea in a program. This is what we will do:

1. Produce a test waveform and impulse response (IR), each of size N.
2. Create input arrays that can hold 2N − 1 samples and are set to a power-of-two size.
3. Fill these with the signal and IR, padding with zeros.
4. Take their individual DFTs.
5. Multiply their spectra, sample by sample. We can store the result in one of the existing arrays.
6. Take the inverse DFT of the result, which is the convolution output.

The two test signals that we will use are of the simplest kind: a single cycle of a sine wave, and an IR consisting of one unit sample at N/4. The convolution of these two inputs should show a sine wave delayed by N/4 samples. The full test program is shown in Listing 19.5. Since the FFT implements the 1/N normalisation in the forward transform, we need to scale the impulse response by its size (N), and therefore the unit sample is set to that value. In some implementations, the normalisation is placed in the inverse transform instead, in which case this scaling is not needed. Note that in this particular application it suits us to use a non-packed format, with the 0 Hz and Nyquist bins in separate complex array positions. This is because we want to use a compact C++ std::transform algorithm to multiply each point. The packed format would have meant a need to treat the first position differently from the rest, which is less convenient in this case. A plot of the program output is shown in Fig. 19.6, where we can see the sine wave delayed by N/4 (8) samples.
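The size requirements described above can be illustrated without any FFT code. The sketch below is not taken from AuLib: next_pow2() and convolve() are hypothetical helper names. The first picks the smallest power-of-two transform size for a given L + M − 1, and the second is a direct (time-domain) convolution that can serve as a reference when checking the output of a fast convolution.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Smallest power of two that is no smaller than n.
// Hypothetical helper, for illustration only.
std::size_t next_pow2(std::size_t n) {
  std::size_t p = 1;
  while (p < n)
    p *= 2;
  return p;
}

// Direct (time-domain) convolution: the output has L + M - 1 samples.
std::vector<double> convolve(const std::vector<double> &x,
                             const std::vector<double> &h) {
  assert(!x.empty() && !h.empty());
  std::vector<double> y(x.size() + h.size() - 1, 0.);
  for (std::size_t n = 0; n < x.size(); n++)
    for (std::size_t m = 0; m < h.size(); m++)
      y[n + m] += x[n] * h[m]; // each input sample excites a delayed copy of h
  return y;
}
```

With two 32-sample inputs, next_pow2(32 + 32 - 1) gives 64, which is the 2N transform size used in the test program.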

Listing 19.5: Fast convolution example.

    #include <fft.h> // AuLib FFT header (file name assumed)
    #include <algorithm>
    #include <cmath>
    #include <iomanip>
    #include <iostream>
    #include <vector>

    using namespace AuLib;
    const int N = 32;

    int main() {
      // FFT size: 2N (N bins plus the extra Nyquist point)
      std::vector<std::complex<double>> ir(N + 1);
      std::vector<std::complex<double>> sig(N + 1);
      // input arrays
      double *irdata = reinterpret_cast<double *>(ir.data());
      double *sigdata = reinterpret_cast<double *>(sig.data());
      // generate an N-sample sine wave
      for (int n = 0; n < N; n++)
        sigdata[n] = sin((twopi * n) / N);
      // zero-pad to 2N size
      std::fill(sigdata + N, sigdata + 2 * N, 0.);
      std::fill(irdata, irdata + 2 * N, 0.0);
      // impulse response: single unit sample at N/4 (scaled by N)
      irdata[N / 4] = N;
      // apply the real-to-complex DFT (not packed)
      fft::transform(ir, irdata, !fft::packed);
      fft::transform(sig, sigdata, !fft::packed);
      // complex multiplication (in-place)
      std::transform(sig.begin(), sig.end(), ir.begin(), sig.begin(),
                     [](std::complex<double> sig, std::complex<double> ir) {
                       return sig * ir;
                     });
      // complex-to-real IDFT
      fft::transform(sigdata, sig, !fft::packed);
      // set printing precision to 3 decimal positions
      std::cout << std::fixed;
      std::cout << std::setprecision(3);
      // print convolution result
      for (int n = 0; n < 2 * N; n++)
        std::cout << sigdata[n] << std::endl;
      return 0;
    }

Fig. 19.6: A plot of the convolution output from the test program in Listing 19.5.

While this test program is a very good proof of concept for fast convolution, it skirts around some practical problems that we are faced with when implementing it in real-life scenarios, especially in realtime applications. First of all, it is very unlikely that we will have the whole signal ready for input. In offline processing, that might be the case, but then the FFT sizes involved may be extremely large if we have minutes-long sounds to work with. In more general cases, it is not practical to use a single-batch operation such as this one. Instead, we can use partitions of the input signal and apply the convolution to these, and then recompose the signal using the output. This is known as partitioned convolution. There are two ways to go about this, both involving creating a partition of the input signal whose size is determined by the impulse response (and set to a power of two), applying the same process as outlined in the test program, and then gluing the individual output blocks together to form the final result. For this, we need to overlap the data: we can overlap the output or the input. The former is known as the overlap-add (OLA) algorithm and the latter as the overlap-save (OLS) algorithm.

19.3.1 Overlap Add

Of the two options we have, the OLA approach is the simplest conceptually. It is based on the principle that an N-size partition will result in a 2N − 1-sized output, and that the partition blocks are spaced N samples apart. So, we can just extract N samples from the input, pad them to the DFT length, apply the convolution, and then place the result at N-sample boundaries, overlapping with the final N − 1 samples of the previous partition (Fig. 19.7). The partition size N is set to the next power of two no less than the impulse response size.
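The overlap-add principle can be sketched in a few lines. To keep the example self-contained, each block is convolved directly in the time domain, which stands in for the zero-padded FFT product described in the text; this substitution is an assumption made for brevity, and the names convolve() and overlap_add() are hypothetical rather than AuLib functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using vec = std::vector<double>;

// Direct convolution of one block: stands in for the FFT-based product.
vec convolve(const vec &x, const vec &h) {
  assert(!x.empty() && !h.empty());
  vec y(x.size() + h.size() - 1, 0.);
  for (std::size_t n = 0; n < x.size(); n++)
    for (std::size_t m = 0; m < h.size(); m++)
      y[n + m] += x[n] * h[m];
  return y;
}

// Overlap-add: process x in N-sample partitions placed N samples apart,
// summing the tail of each block output with the start of the next one.
vec overlap_add(const vec &x, const vec &h, std::size_t N) {
  assert(N > 0 && !x.empty() && !h.empty());
  vec y(x.size() + h.size() - 1, 0.);
  for (std::size_t start = 0; start < x.size(); start += N) {
    std::size_t len = std::min(N, x.size() - start);
    vec block(x.begin() + start, x.begin() + start + len);
    vec out = convolve(block, h);
    for (std::size_t n = 0; n < out.size(); n++)
      y[start + n] += out[n]; // overlapping samples are added together
  }
  return y;
}
```

For any partition size, the result should match the convolution of the complete input with the impulse response, which is a convenient way to test an OLA implementation.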

Fig. 19.7: OLA method: each partition is zero-padded. The convolution is applied and the result is overlap-added to form the output.

19.3.2 Overlap Save

OLS is potentially more efficient because it avoids the need for an output overlap by taking an overlapping transform instead. This is possible because the DFT has a circular (periodic) property: it implies that its input is periodic over the transform size. So, instead of padding the input signal with zeros for the second half of the DFT block, it just fills it with input data without any padding. We normally save the final N samples of the previous block to be placed in the first half of the current 2N input frame. At the start of the process, the first N samples are zero because there is no previous input block. The convolution is applied with a zero-padded IR as before. Only the N final samples are kept, the rest of the block is discarded (because it is redundant), and the result is just made up of a sequence of these blocks with no overlaps (Fig. 19.8). The input partition boundaries are still at N samples, so what we are in effect doing is taking 2N − 1 samples from the input waveform, but only keeping N of them. The overlapping happens at the input, and it is caused by the circular property of the DFT (see footnote 3). The partition size (N) is set in the same way as in the overlap-add case.
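The overlap-save scheme can also be sketched in a self-contained way. Here the product of two DFTs is replaced by a direct circular convolution of the two 2N-sample frames, which gives the same result as the spectral multiplication but is much slower; this substitution is an assumption made to keep the example short. The function names are hypothetical and do not correspond to AuLib code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using vec = std::vector<double>;

// Circular convolution of two equal-length frames
// (equivalent to multiplying their DFTs and transforming back).
vec circular_convolve(const vec &a, const vec &b) {
  assert(a.size() == b.size());
  std::size_t S = a.size();
  vec y(S, 0.);
  for (std::size_t n = 0; n < S; n++)
    for (std::size_t m = 0; m < S; m++)
      y[(n + m) % S] += a[n] * b[m];
  return y;
}

// Overlap-save with partition size N (the IR must fit in N samples).
vec overlap_save(const vec &x, const vec &h, std::size_t N) {
  assert(N > 0 && !h.empty() && h.size() <= N);
  vec ir(2 * N, 0.); // zero-padded impulse response
  std::copy(h.begin(), h.end(), ir.begin());
  vec frame(2 * N, 0.); // first half is zero at the start
  vec y;
  for (std::size_t start = 0; start < x.size(); start += N) {
    // new input goes into the second half (zeros past the end of x)
    for (std::size_t n = 0; n < N; n++)
      frame[N + n] = start + n < x.size() ? x[start + n] : 0.;
    vec out = circular_convolve(frame, ir);
    // keep only the final N samples; the first half is discarded
    y.insert(y.end(), out.begin() + N, out.end());
    // save the current input block for the next frame
    std::copy(frame.begin() + N, frame.end(), frame.begin());
  }
  return y;
}
```

Note that this sketch outputs one N-sample block per input block, so the tail of the convolution only appears if further (zero) input blocks are fed in after the end of the signal.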

19.3.3 Multiple Partitions

One difficulty with fast convolution is that in order to perform spectral analysis, we need to wait until all data has been input into the DFT analysis frame. This effectively places a fixed latency between the input and the output in a realtime scenario. Since the transform size depends on the IR length, long filters will insert a noticeable delay in the signal path. This would be the case for convolution reverb effects.

3 This is due to the fact that the DFT models a signal as if it were a periodic waveform DFT-size samples long.

Fig. 19.8: OLS method: each partition is made up of the current N samples, preceded by the previous block, which has been saved from before. The convolution is applied and the result is just placed in the output.

For disk rendering, this is not a problem as we can read ahead and compensate for the latency, but for realtime input, it can be problematic. A practical solution to minimise this problem is to create multiple partitions of the impulse response, instead of a single one. This has the effect of reducing the transform size, on one hand, but also increasing the computational demand, on the other, as we are not able to take full advantage of the FFT. In practical terms, this involves more iterations in the multiplication of the frequency-domain data, as we will need to implement a spectral delay line to compute the convolution of the multiple partitions (Fig. 19.9). Here is an outline of what we need to do:

1. Split the impulse response into M partitions of N samples (and transform each partition into spectral data, 2N bins).
2. Take the input data, using either the OLA or the OLS method, and keep filling the convolution buffer.
3. When the buffer is full, perform the DFT and place the result in a delay line containing M spectral blocks at the current write position.
4. Move the write position one partition ahead in the delay line (circularly).
5. Multiply each IR partition by a corresponding input block and sum the products together.
6. Take the inverse DFT of this sum and place it in the output buffer.
7. Use either OLA or OLS to recompose the output.

As we can see from this, it is effectively a mix of the direct convolution and the fast convolution methods. If the partition size is set to 1, then the DFT becomes an identity operation and we have simply a delay line, sample-by-sample, process. If it is set to match the impulse response size, we have a single partition and fast convolution. Anything in between combines the two approaches, to balance out latency and computational efficiency.

Fig. 19.9: Partitioned convolution: each input block is placed in a delay line. The output is a sum of all products of the input and IR spectra, to which an IDFT is applied.

We can now examine the implementation of partitioned convolution in AuLib. This is found in the AuLib::PConv class, which implements both OLA and OLS, in single or multiple partitions (the actual algorithm used can be selected by the constructor). The IR data is provided by a function table, and kept in the spectral domain. The input data is taken in, transformed, and then the convolution with the IR is applied to it. Listing 19.6 shows the partitioned convolution code, which is common to both OLA and OLS. This code is responsible for steps 3–6 in the outline of the algorithm.

Listing 19.6: Partitioned convolution core.

    void AuLib::PConv::convolution() {
      // transform it and store in the delay line
      fft::transform(m_del[m_p], m_in.data(), !fft::packed);
      // clear the spectral mix buffer
      std::fill(m_mix.begin(), m_mix.end(), 0.);
      // increment the delay write position
      m_p = m_p == m_nparts - 1 ? 0 : m_p + 1;
      auto del = m_del.begin() + m_p;
      // do the spectral products and mix
      // m_part contains all IR partitions
      // m_del is the input delay line
      for (auto part = m_part.rbegin(); part != m_part.rend(); part++, del++) {
        if (del == m_del.end())
          del = m_del.begin();
        auto dsamp = del->begin();
        auto psamp = part->begin();
        // product & sum
        for (auto &mix : m_mix)
          mix += *dsamp++ * *psamp++;
      }
      // inverse transform into output buffer
      fft::transform(m_out.data(), m_mix, !fft::packed);
    }

With this core convolution code, we can implement the process using either OLA or OLS. Listing 19.7 shows the PConv::ola() method, which implements OLA as described in Sect. 19.3.1. The code is more or less self-explanatory, but has comments that highlight the key steps of the process. The zero-padding of the input happens as a consequence of never writing to the second half of the buffer.

Listing 19.7: OLA partitioned convolution method

    const double *AuLib::PConv::ola(const double *sig) {
      for (uint32_t n = 0; n < m_vframes; n++) {
        // data from sig feeds the conv input buffer
        m_in[m_count] = sig[n];
        // data from the conv output is
        // overlap-added into signal vector
        m_vector[n] = m_out[m_count] + m_saved[m_count];
        m_saved[m_count] = m_out[m_count + m_psize];
        // if we have enough data in input buffer
        if (++m_count == m_psize) {
          convolution();
          m_count = 0;
        }
      }
      // return the object signal vector pointer
      return vector();
    }

Likewise, the OLS implementation follows similar principles, but replaces the method of input and output (Listing 19.8). We can see that the main difference is that we will not zero-pad the input; instead we will have it filled completely with input data. In order to stream the data properly, this implementation writes new data to the second half of the buffer, and, after the process is complete, saves this back into the first half for next time (this is the save aspect of the method). The result always consists of the second half of the output buffer, with no overlaps. As we can see, the actual convolution process is the same in both approaches.

Listing 19.8: OLS partitioned convolution method

    const double *AuLib::PConv::ols(const double *sig) {
      for (uint32_t n = 0; n < m_vframes; n++) {
        // data from sig feeds the conv input buffer
        // always in the second half of the buffer
        m_in[m_psize + m_count] = sig[n];
        // we output only the second half of the
        // conv output buffer
        m_vector[n] = m_out[m_psize + m_count];
        if (++m_count == m_psize) {
          convolution();
          // save the second half of the previous input
          // back in the first half.
          std::copy(m_in.begin() + m_psize, m_in.end(), m_in.begin());
          m_count = 0;
        }
      }
      return vector();
    }
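The reason why the sum over a delay line works is that convolution with the full IR equals the sum of the convolutions with each IR partition, where the output for partition k is delayed by k × N samples. The following time-domain sketch shows this identity directly. It is an illustration under that assumption and not the AuLib algorithm itself: there are no spectra or delay-line objects here, and the names convolve() and partitioned() are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using vec = std::vector<double>;

// Direct convolution (L + M - 1 output samples).
vec convolve(const vec &x, const vec &h) {
  assert(!x.empty() && !h.empty());
  vec y(x.size() + h.size() - 1, 0.);
  for (std::size_t n = 0; n < x.size(); n++)
    for (std::size_t m = 0; m < h.size(); m++)
      y[n + m] += x[n] * h[m];
  return y;
}

// Convolve x with each N-sample IR partition and sum the results,
// delaying the output of the partition starting at `start` by that amount.
vec partitioned(const vec &x, const vec &h, std::size_t N) {
  assert(N > 0 && !x.empty() && !h.empty());
  vec y(x.size() + h.size() - 1, 0.);
  for (std::size_t start = 0; start < h.size(); start += N) {
    std::size_t len = std::min(N, h.size() - start);
    vec part(h.begin() + start, h.begin() + start + len);
    vec out = convolve(x, part);
    for (std::size_t n = 0; n < out.size(); n++)
      y[start + n] += out[n]; // delayed by the partition offset and mixed
  }
  return y;
}
```

With N = 1 this reduces to the sample-by-sample delay-line process mentioned in the text, and with N equal to the IR size there is a single partition.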

19.3.4 Convolution Reverb

One of the typical applications of fast convolution is in the implementation of reverberation. Convolution reverb uses IRs recorded in different locations to create very realistic room effects. As we have noted above, processing live input requires that we minimise the input–output latency imparted by the fast convolution process. In fact, it is possible to reduce it to zero by employing a scheme with non-uniform partition sizes [19].


The principle is to divide the IR into sections, where the front ones have small partition sizes, and those towards the end have longer sizes. For instance, we could divide an arbitrary-length IR into three sections: early, using direct convolution; mid, using a small partition; and tail, with a long partition, as illustrated in Fig. 19.10. In order for the sizes to fit together, we start with a small power of two for the early section (e.g. 32 or 64), which then becomes the partition size for the middle section. The sizes of this section and the first section determine the partition size of the final stretch (a power of two, of course).

Fig. 19.10: Non-uniform partitioned convolution, splitting the IR into three sections (early, mid, and tail).

So, for example, we can divide the impulse response into 32, 992, and L − 1024 samples, with L as the IR size. The partition sizes are then, respectively, 1, 32 and 1024. We use a different convolution for each section, feeding them with the correctly offset portions of the impulse response: the first 32 samples; the samples 32 to 1023; and the samples from 1024 to L − 1. We then run the three processors and add their outputs together. The following code shows an example of this principle. It is a reverb class made by composing three AuLib classes: Fir (direct convolution), PConv (partitioned convolution), and SampleTable (a function table used to hold an IR taken from a file). The class is designed to be able to take an AuLib AudioBase object as input (that is, an arbitrary processing object) and apply a non-uniform multi-partition convolution. The complete code for the Reverb class is shown in Listing 19.9.

Listing 19.9: Convolution reverb class.

    // AuLib headers (file names assumed to follow the class names)
    #include <Fir.h>
    #include <PConv.h>
    #include <SampleTable.h>

    using namespace AuLib;

    class Reverb {
      // direct convolution frames
      const uint32_t dfrms = 32;
      // tail partition size
      const uint32_t part = 1024;
      // impulse response
      SampleTable m_ir;
      // direct convolution
      Fir m_early;
      // middle section fast conv
      PConv m_mid;
      // tail section fast conv
      PConv m_tail;

    public:
      // parameter is impulse response filename
      Reverb(const char *impulse)
          : m_ir(impulse, 1), m_early(m_ir, 0, dfrms),
            // psize: dfrms, begin: dfrms, end: part
            m_mid(m_ir, dfrms, 0, dfrms, part),
            // psize: part, begin: part (to the end of table)
            m_tail(m_ir, part, 0, part){};

      const AudioBase &operator()(const AudioBase &in, double g) {
        m_tail(in);
        m_mid(in);
        // mix the three outputs
        m_tail += m_mid += m_early(in);
        // scale reverb
        m_tail *= g;
        // mix direct signal
        return m_tail += in;
      }
    };

This reverb class can be used with any AuLib input object. Since its processing method outputs a reference to an AudioBase object, it can be used as an ordinary AuLib object, and can also avail of the facilities in the library base class.
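The way the IR is divided among the three processors can be expressed as a small table of offsets. The sketch below computes the section boundaries and partition sizes for an IR of L samples, using the values from the example above (32 and 1024) as defaults. The Section struct and the reverb_sections() function are hypothetical names used only for illustration and are not part of AuLib.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One section of the impulse response: samples [begin, end) and the
// partition size used to convolve it. Hypothetical type, for illustration.
struct Section {
  std::size_t begin, end, psize;
};

// Split an L-sample IR into early, mid and tail sections.
std::vector<Section> reverb_sections(std::size_t L, std::size_t early = 32,
                                     std::size_t tail = 1024) {
  assert(L > 0 && early > 0 && early <= tail);
  std::vector<Section> s;
  // early section: direct convolution (partition size 1)
  s.push_back(Section{0, std::min(early, L), 1});
  // mid section: partition size equal to the early section length
  if (L > early)
    s.push_back(Section{early, std::min(tail, L), early});
  // tail section: from `tail` to the end of the IR
  if (L > tail)
    s.push_back(Section{tail, L, tail});
  return s;
}
```

For a one-second IR at 48 kHz (L = 48000), this gives the sections 0 to 31, 32 to 1023, and 1024 to 47999, with partition sizes 1, 32 and 1024. Shorter IRs simply produce fewer sections.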

19.4 Streaming Spectral Processing

We might have noticed, even though this has not been explicitly stated, that the DFT works like a snapshot of the frequency content of a waveform. It tells us something about the N samples that went in, and assumes that these are a single period of a wave that repeats forever. While we were able to put this to good use in implementing fast convolution, at some point we might want to try to do more with it. The most interesting types of audio signals have time-varying characteristics (frequency glides, changes in amplitude and brightness, etc.). Therefore we need to try to capture this in the spectral analysis so that we can manipulate waveforms in useful ways. The key to this is an extension of the DFT to take account of time: the short-time Fourier transform (STFT). At one extreme, this is represented by taking individual analyses of the signal at one-sample spacings. More commonly, however, we can take these at larger intervals of, for instance, N/4 or N/8 samples. The STFT allows us to process input signals as streams of data, producing frames of DFT bins at every analysis period. Such spectral streams are then characterised by the following parameters:

• DFT frame size (N): how many bins we are holding, which affects the width of each bin, and therefore the frequency resolution of the analysis. The analysis splits the positive spectrum into N/2 evenly-spaced bins (plus the extra bin at the Nyquist frequency), centred at multiples of fs/N Hz. The more bins, the finer the analysis will be frequency-wise. However, since we are taking more samples, this will also imply a longer time averaging of the data, which implies a less accurate analysis time-wise. Common sizes for streaming spectral analysis are 1024 and 2048 samples (with bin bandwidths of 46.9 and 23.4 Hz, respectively, at fs = 48,000 Hz).
• Hopsize: how many samples of the input are in between each frame, affecting the quality of the resolution of the analysis, and also determining the stream analysis frame rate. Smaller intervals will also be more computationally demanding.
• Window size: when we extract samples from a waveform to make up an analysis input frame, we are windowing the signal at a point. The size of this window is generally the same as N, but can be different in some cases.
• Window shape: likewise, the windowing can simply extract the samples, in which case we are using a rectangular window, or apply an envelope of some other shape. This is used to smooth the transitions at the edges of the frame, providing a better analysis and reducing the artefacts that occur when we apply the inverse analysis to produce the output waveform.
• Data format: the format of the spectral frame data (e.g. polar, rectangular).
• Frame count: an index that defines the frame time in a stream, in much the same way that a waveform frame index identifies a time point for that sample frame.

The STFT can be implemented as a sequence of DFT analyses spaced at hopsize samples, but we have to address two additional issues that arise as part of this process. The first of these is windowing. When we extract samples of an input stream, it is likely that we will cut the waveform in awkward positions, creating abrupt discontinuities at the edges of the window. This leads to a certain amount of smearing in the analysis, but, more critically, it prevents us from manipulating the spectral data more generally, since the recomposed waveform blocks will not fit together smoothly. A solution to this problem is to employ an envelope, that is, a window shape that will fade out to zero at its edges. Three examples of these are shown in Fig. 19.11, of which the third type (Hanning, also known as the von Hann window) is the most commonly used. Applying this to the data will prevent more significant problems with waveform discontinuities. This also has the side effect of broadening the analysis bandwidth to encompass not only a single bin, but also some of its neighbours. The net effect is of a certain band overlap in the spectrum; we may liken it to a band-pass filter bank whose bandwidths are wider than their spacing.

Fig. 19.11: Three window shapes (N = 64): Bartlett (triangle), Kaiser, and Hanning (inverted raised cosine).
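As a small illustration of two of the stream parameters, the sketch below generates a Hanning window as an inverted raised cosine and computes the bin spacing for a given frame size. It is not AuLib code: the function names hanning() and bin_width() are hypothetical, and the periodic form of the window (dividing the phase by N rather than N − 1) is an assumption, since the text does not specify which variant is used.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hanning (von Hann) window: an inverted raised cosine over N points.
// The periodic form (phase divided by N) is assumed here.
std::vector<double> hanning(std::size_t N) {
  assert(N > 1);
  const double twopi = 2. * std::acos(-1.);
  std::vector<double> w(N);
  for (std::size_t n = 0; n < N; n++)
    w[n] = 0.5 - 0.5 * std::cos(twopi * n / N);
  return w;
}

// Spacing between analysis bins (Hz) for frame size N at sampling rate fs.
double bin_width(double fs, std::size_t N) {
  assert(N > 0);
  return fs / N;
}
```

The window starts at zero and reaches its maximum of 1 at the centre of the frame, so the edges of each analysis frame are faded out. At fs = 48,000 Hz, bin_width() gives 46.875 Hz for N = 1024 and 23.4375 Hz for N = 2048.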

The second issue is that since the spacing between the analysis time points is less than a full frame (N), the phases in each analysis bin will contain an offset. This will be relative to the time position modulo the frame size. If we do not intend to manipulate the phases of the spectral data, we may ignore this, as this offset will not matter. However, if the process we are applying affects the phases as well, then we need to address this issue. The simplest way to fix the phase offset is to rotate the samples of the waveform relative to their time position modulo the frame size. For example, if the hopsize is set to N/4, then successive frames will have the following rotations: 0, 1/4, 2/4, 3/4, 0, 1/4, . . . . To implement this, samples are moved around circularly in the frame, before the actual DFT is applied. Alternatively, the offset can be corrected in the phase of each bin, but that is probably less efficient (and more awkward), since the rotation can be implemented as part of the data input process.

With these ideas in place, we can introduce a streaming spectral analysis/synthesis framework, which we will see implemented in AuLib by the Stft and SpecBase classes. Similar principles are implemented in Csound, as we will see when we discuss frequency-domain plugins in Chapter 20. The main pieces are an analysis (STFT) processor, which will produce the spectral-domain stream, and a synthesis (Inverse STFT) processor, which will recompose this stream into a time-domain signal. In between these two, a number of processes may be applied to the data (Fig. 19.12).

[Diagram: input → STFT → . . . → ISTFT → output, with the spectral stream flowing between the STFT and the ISTFT]

Fig. 19.12: Streaming spectral processing framework flowchart.

Since we have a well-defined format (based on the stream parameters listed above), this is an open framework for spectral-domain processing. Any process consuming and producing data in conformance with it can be slotted in between the analysis and the synthesis.

19.4.1 Spectral Analysis

As we have outlined above, a spectral stream is created by producing a sequence of DFT frames from an input signal. Assuming the hopsize H is less than the frame size N, which is the most common case, the analyses will involve a certain input overlap. That is, each frame will contain some samples in common with previous ones. The extreme case is where H = 1, and N − 1 samples overlap. Most commonly, H = N/D, where D, called the decimation, is 4, 8, or 16, which is also the number of overlapping frames. The overlap in the analysis means that as we input data, each waveform sample will feed D frames (Fig. 19.13).

In AuLib, the analysis is implemented by the Stft class in the fft::forward mode. In this class, a two-dimensional array with D rows and N columns is used to hold the overlapped input samples. The following line in the analysis code feeds the input samples into D frames:

m_framebufs[j][m_pos[j]++] = sig[i];


where j is the frame index (0 to D − 1), m_pos[j] holds the sample count for the respective frame, and sig[i] is the sample from the input signal feeding the analysis. For each sample, we loop D times, placing it in each frame.
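The feeding of each input sample into D overlapping frames can be tried out in isolation. The following is a minimal sketch, not AuLib code: the OverlapInput type, its members, and the initial write positions (frame j starting j hops into its buffer) are assumptions made for illustration, chosen so that one frame fills up every H samples.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the overlapped input stage (not AuLib code).
struct OverlapInput {
  uint32_t N, D, H;
  std::vector<std::vector<double>> framebufs; // D rows, N columns
  std::vector<uint32_t> pos;                  // write position of each frame
  std::vector<uint32_t> completed;            // frame indices, in filling order

  // d is expected to be a non-zero divisor of n
  OverlapInput(uint32_t n, uint32_t d)
      : N(n), D(d), H(n / d), framebufs(d, std::vector<double>(n, 0.)),
        pos(d) {
    assert(n % d == 0);
    // assumption: frame j starts j hops into its buffer, so that
    // the D frames fill up H samples apart from each other
    for (uint32_t j = 0; j < D; j++)
      pos[j] = j * H;
  }

  // feed one input sample into all D frames
  void feed(double s) {
    for (uint32_t j = 0; j < D; j++) {
      framebufs[j][pos[j]++] = s;
      if (pos[j] == N) {        // frame j is full:
        completed.push_back(j); // the windowing and DFT would happen here
        pos[j] = 0;
      }
    }
  }
};
```

With N = 8 and D = 4, feeding eight samples fills the frames in the order 3, 2, 1, 0, one every H = 2 samples, after which the cycle repeats.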

[Figure: the input signal s(t) feeding four overlapping frames, numbered 0 to 3, with the offsets 0, H, 2H, and 3H marked.]

Fig. 19.13: Overlapping frames in streaming spectral analysis, with D = 4.

Once the sample count for a frame reaches N, we can proceed to transform and output new spectral data. This will happen at every H samples of input. At that point, we window and copy the frame into the DFT input buffer and perform the transform in place. The Stft class takes as input a reference to a table object which will contain the window. This is set to have the same number of points as the analysis frame, and is accessed via the m_win member variable.

As noted above, in order to correct the implicit phase offset that arises from overlapping analyses, the copying of the data has to take into account the required input rotation. This is calculated according to the frame time modulo N, and is a periodic circular shift: 0, H, 2H, . . . , (D − 1)H, which can be easily set by multiplying H by the frame index j. We can then input the data into a reinterpreted double array as follows:

uint32_t offset = j * m_H;
double *r = reinterpret_cast<double *>(m_cdata.data());
for (uint32_t n = 0; n < m_N; n++)
  r[(n + offset) % m_N] = m_framebufs[j][n] * m_win[n];
fft::transform(m_cdata, r);

where m_cdata is a complex vector. Note that we are assuming a packed FFT format with the 0 Hz and the Nyquist points forming the first pair in the array.

There are two options for the format of the bin data output by the object: fft::rectang or fft::polar. If the former is chosen, then the data is simply copied into the output buffer. Otherwise, it is converted into amplitude and phase pairs before the copying.
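As an illustration of the two output formats, the sketch below converts a packed frame between the rectangular and polar layouts. The function names to_polar and to_rect are hypothetical, and the frame layout (0 Hz and Nyquist values first, then one pair per bin) is assumed from the description above.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helpers (not part of AuLib).
// Rectangular layout: [re0Hz, reNyq, re1, im1, re2, im2, ...]
// Polar layout:       [re0Hz, reNyq, a1, ph1, a2, ph2, ...]

// convert the re/im pair of each bin into amplitude and phase
std::vector<double> to_polar(const std::vector<double> &rect) {
  std::vector<double> pol(rect); // the first pair is copied unchanged
  for (std::size_t n = 2; n + 1 < rect.size(); n += 2) {
    std::complex<double> c(rect[n], rect[n + 1]);
    pol[n] = std::abs(c);     // amplitude
    pol[n + 1] = std::arg(c); // phase in radians, between -pi and pi
  }
  return pol;
}

// convert the amplitude/phase pair of each bin back into re/im
std::vector<double> to_rect(const std::vector<double> &pol) {
  std::vector<double> rect(pol);
  for (std::size_t n = 2; n + 1 < pol.size(); n += 2) {
    std::complex<double> c = std::polar(pol[n], pol[n + 1]);
    rect[n] = c.real();
    rect[n + 1] = c.imag();
  }
  return rect;
}
```

Applying to_rect() to the output of to_polar() returns the original frame, up to floating-point rounding.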

19.4.2 Resynthesis

At the resynthesis stage, we will attempt to retrace the steps performed during the analysis stage. The time-domain signal will be a mix of D overlapping frames; therefore, for each sample we must loop D times, picking up the samples from each frame and accumulating them, with the line

m_vector[i] += m_framebufs[j][m_pos[j]++];

which does the reverse of the analysis, where we extracted samples from the input into the D frames. At every H samples, we will take new data from the input, and then, depending on the numeric format, we will either convert it to real + imaginary pairs, or just copy the data into the DFT buffer. We then perform the inverse transform in place, and feed the current frame. We will have to take care to reverse-rotate it and apply a window. Finally, we overlap-add all the transformed frames. The following code takes care of all of these steps:

uint32_t offset = j * m_H;
double *r = reinterpret_cast<double *>(m_cdata.data());
if (m_repr == fft::polar) {
  m_cdata[0].real(sig[0]), m_cdata[0].imag(sig[1]);
  for (uint32_t n = 2, k = 1; n < m_N; n += 2, k++) {
    m_cdata[k] = std::polar(sig[n], sig[n + 1]);
  }
} else
  std::copy(sig, sig + m_N, r);
fft::transform(r, m_cdata);
for (uint32_t n = 0; n < m_N; n++) {
  m_framebufs[j][n] = r[(n + offset) % m_N] * m_win[n];
}

The STFT transforms in the forward and inverse direction can be combined into one single function, which will act on an input signal that is either a vector of time-domain samples or a spectral frame (with complex numbers stored as pairs). The output vector will also be of one or the other form. The transform method of the AuLib::Stft class is shown in Listing 19.10.

Listing 19.10: The short-time Fourier transform code in AuLib.

const double *
AuLib::Stft::transform(const double *sig, uint32_t vframes) {


  for (uint32_t i = 0; i < vframes; i++) {
    if (m_dir == fft::inverse)
      m_vector[i] = 0.;
    for (uint32_t j = 0; j < m_D; j++) {
      if (m_dir == fft::forward) {
        m_framebufs[j][m_pos[j]++] = sig[i];
        if (m_pos[j] == m_N) {
          uint32_t offset = j * m_H;
          double *r = reinterpret_cast<double *>(m_cdata.data());
          for (uint32_t n = 0; n < m_N; n++) {
            r[(n + offset) % m_N] = m_framebufs[j][n] * m_win[n];
          }
          fft::transform(m_cdata, r);
          if (m_repr == fft::polar) {
            m_vector[0] = m_cdata[0].real(), m_vector[1] = m_cdata[0].imag();
            for (uint32_t n = 2, k = 1; n < m_N; n += 2, k++) {
              m_vector[n] = std::abs(m_cdata[k]);
              m_vector[n + 1] = std::arg(m_cdata[k]);
            }
          } else
            std::copy(r, r + m_N, m_vector.begin());
          m_pos[j] = 0;
          m_framecount++;
        }
      } else {
        m_vector[i] += m_framebufs[j][m_pos[j]++];
        if (m_pos[j] == m_N) {
          uint32_t offset = j * m_H;
          double *r = reinterpret_cast<double *>(m_cdata.data());
          if (m_repr == fft::polar) {
            m_cdata[0].real(sig[0]), m_cdata[0].imag(sig[1]);
            for (uint32_t n = 2, k = 1; n < m_N; n += 2, k++) {
              m_cdata[k] = std::polar(sig[n], sig[n + 1]);
            }
          } else
            std::copy(sig, sig + m_N, r);
          fft::transform(r, m_cdata);
          for (uint32_t n = 0; n < m_N; n++) {


            m_framebufs[j][n] = r[(n + offset) % m_N] * m_win[n];
          }
          m_pos[j] = 0;
          m_framecount++;
        }
      }
    }
  }
  return vector();
}

An example of the typical usage of the Stft class in streaming processing is shown in Listing 19.11. This example demonstrates the analysis/synthesis operations with no processing; any spectral manipulation would be placed in between these two steps. The process() method calls transform() to do the analysis or synthesis as shown in Listing 19.10.

Listing 19.11: Streaming spectral processing example program.

#include <Oscil.h>
#include <SoundOut.h>
#include <Stft.h>
#include <Wintabs.h>

using namespace AuLib;

int main(int argc, const char **argv) {
  Oscil sig;
  Hann win;
  Stft ana(win, fft::forward), syn(win, fft::inverse);
  SoundOut output(argv[1]);
  // DSP loop
  for (int i = 0; i < def_sr * 2; i += def_vframes) {
    sig.process(0.5, 440.);
    ana.process(sig);
    syn.process(ana);
    output.write(syn);
  }
  return 0;
}

The Stft class overrides the AudioBase signal arithmetic operators (see Appendix A) so that they can be applied to spectral data. It also contains other methods that allow access to vector samples as complex numbers.
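Since the window is applied once at the analysis and once more at the resynthesis, each output sample of an unmodified stream is a sum of D contributions weighted by the squared window. The sketch below computes this sum for a Hann window. The function ola_gain is hypothetical, and the periodic window definition 0.5 − 0.5 cos(2πn/N) is an assumption that may differ from the one used by the AuLib Hann table; whether and where the library compensates for this gain is not covered here.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of AuLib): for each sample position within
// one hop, sum the squared window values of the D frames that overlap there.
// D is expected to be a non-zero divisor of N.
std::vector<double> ola_gain(uint32_t N, uint32_t D) {
  assert(N % D == 0);
  const double twopi = 2. * std::acos(-1.);
  const uint32_t H = N / D;
  std::vector<double> win(N), gain(H, 0.);
  // assumption: periodic Hann window
  for (uint32_t n = 0; n < N; n++)
    win[n] = 0.5 - 0.5 * std::cos(twopi * n / N);
  for (uint32_t n = 0; n < H; n++) {
    for (uint32_t j = 0; j < D; j++) {
      // frame j contributes its sample at position n + j*H,
      // windowed twice (analysis and resynthesis)
      const double w = win[n + j * H];
      gain[n] += w * w;
    }
  }
  return gain;
}
```

With D = 4 the sum is constant at 1.5 for all sample positions, so under these assumptions an unmodified stream is reconstructed up to a fixed scaling factor.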


19.4.3 Spectral Manipulation

As we have noted before, any process that manipulates bin data may be applied to the stream between the analysis and synthesis stages. For this, the most straightforward way to start manipulating the spectrum is to use a polar representation, which conveniently separates the amplitudes detected at each bin from the phases. This allows us to apply different types of filtering to the data by modifying only the amplitudes.

We can think of the analysis as a bank of band-pass filters, spaced linearly in fs/N-wide bins. To make a low-pass filter, we can draw an amplitude curve that cuts off the higher bins, and apply it by multiplying the amplitudes of each frame. The filter can be time-varying: each frame may have a unique curve applied to it. Another effect could be created by using two spectral streams, multiplying their amplitudes together and using the phases of one or the other. This would work as a kind of cross-synthesis where one sound would be used as a ‘filter’ to modify the other. A bin-by-bin ‘noise gate’ is another possibility. Using a spectral amplitude mask (which could be derived from an input signal), cancel out all bins whose amplitudes fall below the mask amplitudes. A ‘spectral trace’ effect can also be created by keeping only the n loudest bins and zeroing the amplitudes of the others. Many different types of manipulation can be applied to the amplitudes alone.

Classes implementing these effects can be derived from SpecBase, which has the fundamental supports in place for streaming spectral processing. For instance, it includes a ready() method, which can be used to check if the input data is ready to be processed. Called at every time-domain frame block, e.g. inside the DSP loop in Listing 19.11, it updates an internal count and returns true when it is time to produce a new spectral frame at the output. This allows seamless integration of spectral and time-domain processing.
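Two of the amplitude-only manipulations described in this section, the amplitude curve and the spectral trace, can be sketched as free functions acting on a single polar frame. This is an illustration rather than AuLib code: the names apply_curve and trace are hypothetical, the frame layout is assumed to be the packed polar format of Sect. 19.4.1, and a class derived from SpecBase would wrap similar loops in its own processing method.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helpers; assumed polar frame layout:
// [a0Hz, aNyq, a1, ph1, a2, ph2, ...], with N values in total.

// Multiply the amplitude of bin k by curve[k]. The curve needs N/2 + 1
// points (0 Hz first, Nyquist last). Phases are left untouched.
void apply_curve(std::vector<double> &frame, const std::vector<double> &curve) {
  const std::size_t nyq = frame.size() / 2; // index of the Nyquist bin
  assert(curve.size() > nyq);
  frame[0] *= curve[0];
  frame[1] *= curve[nyq];
  for (std::size_t n = 2, k = 1; k < nyq; n += 2, k++)
    frame[n] *= curve[k];
}

// Keep the `keep` loudest bins and zero the amplitudes of the others.
// The 0 Hz and Nyquist values are not considered in this sketch.
void trace(std::vector<double> &frame, std::size_t keep) {
  std::vector<std::size_t> idx; // positions of the bin amplitudes
  for (std::size_t n = 2; n + 1 < frame.size(); n += 2)
    idx.push_back(n);
  if (keep >= idx.size())
    return;
  // move the positions of the `keep` largest amplitudes to the front
  std::nth_element(idx.begin(), idx.begin() + static_cast<std::ptrdiff_t>(keep),
                   idx.end(), [&frame](std::size_t a, std::size_t b) {
                     return frame[a] > frame[b];
                   });
  for (std::size_t i = keep; i < idx.size(); i++)
    frame[idx[i]] = 0.;
}
```

A low-pass filter is obtained by passing apply_curve() a curve that is 1 up to the cutoff bin and 0 above it; a different curve can be supplied for each frame to make the filter time-varying.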
A spectral processing object spec could, for instance, be slotted in between the analysis and synthesis, as in

ana.process(sig);
spec.process(ana);
syn.process(spec);

in which case it would call SpecBase::ready() to determine if it needed to act on its input.

To process the spectral phases, we will need however to transform them into a more useful format. By taking the inter-frame phase deltas at every bin, we can calculate their instantaneous frequencies (IFs), which are more flexible to manipulate. In fact, from these deltas we can in some cases detect quite accurate actual partial frequencies. By applying some conversion to the raw phase differences, we can obtain these in Hz. This will then allow us to hold frames of amplitude–frequency pairs, opening up further possibilities for manipulation. This is known as phase vocoder [16] analysis, synthesis, and processing.

In AuLib, we derive a Pvoc phase vocoder class from Stft, as the STFT is a more general form of streaming spectral analysis. Taking a frame in polar format, we

1. Take the difference, bin by bin, of the current and the previous frame phases (Δk).
2. Store the current phases for next time.
3. Unwrap the phase difference to bring it into the −π to π range.
4. Apply a conversion based on the bin centre frequency (k fs/N, where k is the bin index), and a scaling by a constant defined by the ratio of fs and the hopsize H times 2π (this is to do with how much the phase is supposed to increment between frames):

   $$IF(k) = k \frac{f_s}{N} + \Delta_k \frac{f_s}{2\pi H} \qquad (19.8)$$

5. Store the IF, replacing the phase.

Processing only needs to happen once there is a new frame at the input, produced by the STFT every H time-domain frames. This is the norm for any streaming spectral manipulation function: data only needs to be acted on at these time intervals. For this purpose, we can keep track of the frame count and only process data if this has been incremented.

To reconstitute the bin phases, we can do the process in reverse: apply the inverse conversion equation, and then accumulate the results to produce a running phase for each bin. Since instantaneous frequencies are phase deltas, the phases are obtained by adding up all the instantaneous frequencies of successive frames. The code providing a forward and inverse phase vocoder transform is shown in Listing 19.12.

Listing 19.12: Phase vocoder code in AuLib.

const double *
AuLib::Pvoc::transform(const double *sig, uint32_t vframes) {
  // delta and conversion constants
  double delta, c = m_sr / m_N, d = m_sr / (twopi * m_H);
  uint32_t fmcnt = m_framecount;
  if (m_dir == fft::forward) {
    // STFT forward transform
    Stft::transform(sig, vframes);
    // if we need to produce a new frame,
    // m_framecount was updated in Stft::transform()
    if (m_framecount > fmcnt)
      m_done = false;
    if (!m_done) {
      // for each bin (except 0, Nyq)
      for (uint32_t i = 2, j = 1; i < m_N; i += 2, j++) {
        // take the inter-frame delta
        delta = m_vector[i + 1] - m_sbuf[j];
        // save the current phase
        m_sbuf[j] = m_vector[i + 1];


        // unwrap the delta
        if (delta >= pi)
          delta -= twopi;
        if (delta < -pi)
          delta += twopi;
        // apply the conversion into IF in Hz
        m_vector[i + 1] = j * c + delta * d;
      }
      m_done = true;
    }
  } else {
    // inverse transform
    if (!m_done) {
      // for each bin
      for (uint32_t i = 2, j = 1; i < m_N; i += 2, j++) {
        // store the current amplitude
        m_sbuf[i] = sig[i];
        // recover the delta from IF in Hz
        delta = (sig[i + 1] - c * j) / d;
        // accumulate the delta
        m_sbuf[i + 1] += delta;
      }
      m_done = true;
    }
    // apply inverse STFT
    Stft::transform(m_sbuf.data(), vframes);
    // check that we need to produce a new
    // frame next time.
    if (m_framecount > fmcnt)
      m_done = false;
  }
  return vector();
}

This process provides a new format for spectral streams, one that holds pairs of amplitudes and frequencies for each bin, in addition to rectangular and polar data frames (Fig. 19.14).

With spectral data in amplitude–frequency format, all manner of manipulations are possible. We can, for instance, shift the pitch of a stream by scaling all frequencies and then moving them (along with their amplitudes) to a new bin whose centre frequency is close to the new frequency. We can create spectral morphing effects by interpolating two streams. We can place the stream in a delay line and read it at different rates, time-scaling the stream (stretching or compressing it). Many different effects are possible, given the malleable nature of the format. To implement

these, we can derive from SpecBase, as this class has the basic infrastructure for amplitude–frequency processing.

Finally, this format allows us to resynthesise the data with an oscillator bank instead of doing the inverse STFT operation. In this way, we can treat each bin as a control stream containing the amplitudes and frequencies of a sine wave generator. The output is a sum of all sine waves, up to one per bin, in a process called additive synthesis. Furthermore, we may process frame data, finding the amplitude peaks that correspond to waveform partials. By connecting these in successive frames, we can create control tracks that will model these partials. This process would lead us from N bins to M tracks (M < N). This has the effect of reducing the number of oscillators needed, and effectively modelling the input as a mix of sinusoidal tracks [45]. It is also possible to extract, separately, the more noisy and transient sound components that resist representation as sinusoids [51, 57].

[Figure: layout of a spectral frame in each format. All three start with the 0 Hz and Nyquist values, followed by one pair of values per bin (bin 1, bin 2, . . .): re, im pairs in the rectangular format; a, ph (amplitude, phase) pairs in the polar format; and a, fr (amplitude, frequency) pairs in the phase vocoder format.]

Fig. 19.14: The different spectral frame formats and their data.
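The oscillator bank resynthesis mentioned above can be sketched as follows. This is a simplified illustration under stated assumptions, not AuLib code: the OscBank type is hypothetical, the frame layout is taken to be the amplitude–frequency format of Fig. 19.14, and amplitudes and frequencies are held constant for the duration of each block, whereas a practical implementation would normally interpolate them between frames.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical oscillator bank (not part of AuLib).
// Assumed frame layout: [a0Hz, aNyq, a1, fr1, a2, fr2, ...]
struct OscBank {
  double sr;               // sampling rate in Hz
  std::vector<double> phs; // running phase of each oscillator, in radians

  OscBank(std::size_t nbins, double srate) : sr(srate), phs(nbins, 0.) {}

  // synthesise nsmps samples from one amplitude-frequency frame
  std::vector<double> process(const std::vector<double> &frame,
                              std::size_t nsmps) {
    const double twopi = 2. * std::acos(-1.);
    std::vector<double> out(nsmps, 0.);
    // one sine wave oscillator per bin (0 Hz and Nyquist are skipped)
    for (std::size_t n = 2, k = 1; n + 1 < frame.size() && k < phs.size();
         n += 2, k++) {
      const double amp = frame[n];
      const double incr = twopi * frame[n + 1] / sr; // phase step per sample
      for (std::size_t i = 0; i < nsmps; i++) {
        out[i] += amp * std::sin(phs[k]);
        phs[k] = std::fmod(phs[k] + incr, twopi);
      }
    }
    return out;
  }
};
```

In a streaming context, process() would be called once per hop, with nsmps set to H, so that each new frame controls the next H output samples while the oscillator phases carry on from the previous block.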

19.5 Conclusions

Frequency-domain audio effects are a very important component of the toolbox for music signal processing. This chapter has introduced the fundamental aspects of spectral manipulation: the DFT, the FFT, fast convolution, and streaming spectral processing. These were explored from the perspective of object-oriented programming and their implementation in AuLib. The DSP details of these operations are described in more detail in the companion text [36], which the reader may refer to also for ideas for frequency-domain processes that can be applied to streaming data. In fact, in the next chapter, as part of discussing plugins in general, we will see how streaming spectral processing can be implemented in another context and another music computing framework.


Problems

19.1. Create a reverberation program to demonstrate the Reverb class (Listing 19.9).

19.2. Derive a class from AuLib::SpecBase to implement a low-pass spectral-processing filter of your design, with time-varying parameters. Write a test program to demonstrate it (using AuLib::Stft objects for analysis and synthesis). Make sure the processing only occurs when a new frame is available for input.

Chapter 20

Plugins

Abstract In this chapter we introduce computer music instrument components as plugins to larger systems. As a practical example, we take a look at a C++ framework designed to facilitate the implementation of plugin opcodes for Csound. The chapter explores each aspect of component development with reference to the principles already introduced in earlier chapters, but now applied to plugins. Examples of signal generation, processing, and spectral manipulation are provided.

One of the typical ways in which developers can supply new algorithms to extend working audio processing systems and digital audio workstations is through a plugin mechanism. Plugins are in most cases built as dynamically-loaded modules, which implement some key functionality through a well-defined interface. The most common languages in which these interfaces are defined are C and C++. The OOP paradigm is particularly useful in this context, and we will observe that many systems adopt it to provide a model for plugins.

In this chapter, we will study the development of plugins in C++ for Csound [39]. A basic understanding of this system is assumed, and users may refer to the companion text [36] for a general introduction to it. However, many of the principles explored here can also be adopted more widely in other systems.

20.1 Plugins in Csound

The Csound [39] sound and music computing system is composed of a set of processing units known as opcodes or unit generators [34], which are central to it. In order to allow for extensions to the system, Csound provides a plugin mechanism to load new opcodes from user-supplied libraries. Plugins can be written in C or C++, as well as using other programming languages. Csound has a very comprehensive C API for opcode development, which provides low-level access to the underlying audio engine to support it.

© Springer Nature Switzerland AG 2019 V. Lazzarini, Computer Music Instruments II, https://doi.org/10.1007/978-3-030-13712-0_20


While the C interface follows an OOP design, it does not provide significant support for some of the principles we have been developing in previous chapters, particularly in terms of code reuse and benefitting from existing object-oriented algorithms and libraries. As we have been learning, C++ in its more modern incarnations [63] supports these concepts very well, and, as an extension to C, can be made to use the underlying API in a seamless way. For this purpose, Csound now includes a lightweight C++ framework designed to facilitate the programming of Csound plugins, the Csound Plugin Opcode Framework (CPOF) [37]. This is effectively an OOP C++ layer that attempts to be thin, simple, and complete, handling internal Csound resources in a safe way. It takes advantage of the same mechanisms provided by the underlying C interface. CPOF is composed of the framework proper, that is, base classes that are specialised to make new opcodes, and a collection of support classes that wrap the underlying C API in a convenient way.

20.2 Framework Design

Csound plugins need to conform to a number of constraints, which will limit the types of C++ construct we will be able to employ. The following list outlines some of these conditions:

1. The Csound engine is written in C. It includes mechanisms to instantiate opcodes and call opcode functions, which will work with C++ opcodes as long as these obey some rules of linkage and data structures.
2. Owing to condition 1, we are not able to employ the dynamic binding that is needed for virtual methods. All binding needs to be done at compile time.
3. Certain functions will have to be defined as static so that they can be registered with Csound.
4. Three different functions for signal processing are normally implemented in plugins, which are set to be called by the engine at different action times. This makes up the required method space for a plugin.
5. An opcode is derived from a basic model, which in C is given by the data structure OPDS. In CPOF, this becomes the fundamental base class to the opcode framework.

Since we cannot make use of virtual functions but would still want to take advantage of inheritance for code reuse, we could have opted to use a design approach called the curiously recurring template pattern (CRTP) [2]. This mimics the virtual-function mechanism at compile time, at the cost of some loss in code readability and a somewhat complex class structure.

However, as it turns out, there is no need for this, the reason being that, in the narrow scope of Csound plugins, opcode classes are never user-instantiated, and therefore it is much simpler to provide the correct function override. As the engine is responsible for object instantiation and method calls, it is easy to define at compile time which functions should be invoked. We can do this just by providing methods that hide rather than override the base class ones. We should remember, from Chapter 14, that if we do not mark a member function as virtual, any appearances of the same in derived objects will hide the base class one (instead of overriding it). However, given the particular application scenario here, there is no difference between these two cases.

We can have a plugin base class from which we will inherit to create the actual opcodes. This class is derived from OPDS, providing some extra members that may be useful to all of its subclasses. It will also provide stub methods for the three required processing functions, which can then be specialised in the actual opcode classes.
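The idea of hiding base class stubs and binding the calls at compile time can be illustrated with a small self-contained sketch. All names here (Base, Counter, InitOnly, Entry, make_entry, and the entry-point templates) are hypothetical and are not taken from the CPOF source; the sketch only assumes that the engine stores plain C function pointers taking an untyped pointer to the opcode data.

```cpp
// Hypothetical names throughout; this is not the CPOF source.

// Plays the role of the plugin base class: stub methods that do nothing.
struct Base {
  int init() { return 0; }
  int kperf() { return 0; }
};

// An opcode class hides (rather than overrides) the stubs it needs.
struct Counter : Base {
  int count = 0;
  int init() { count = 10; return 0; }
  int kperf() { count++; return 0; }
};

// This one only provides init(); Base::kperf() is inherited as a stub.
struct InitOnly : Base {
  int value = 0;
  int init() { value = 1; return 0; }
};

// C-style function pointer, as an engine written in C might store it.
typedef int (*opfunc)(void *);

// The concrete type T is fixed when the template is instantiated, so
// T::init() and T::kperf() are bound at compile time, without virtual tables.
template <typename T> int init_entry(void *p) {
  return static_cast<T *>(p)->init();
}
template <typename T> int kperf_entry(void *p) {
  return static_cast<T *>(p)->kperf();
}

// What would be handed over to the engine at registration.
struct Entry {
  opfunc init, kperf;
};
template <typename T> Entry make_entry() {
  return {init_entry<T>, kperf_entry<T>};
}
```

Because the engine only ever calls through the functions instantiated for the concrete opcode class, the methods of Counter are the ones that run, even though none of them is virtual; for InitOnly, the kperf entry falls back to the base class stub.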

20.2.1 The Base Classes

The opcode framework actually consists of two template base classes, from which opcode classes derive and instantiate. CPOF classes, enumerations, and functions are declared in the namespace csnd, in the header file plugin.h. The fundamental template base class, used in the majority of cases, is called Plugin. In order to create a new opcode, we write our class as its subclass, and instantiate the template by passing the numbers of outputs and inputs needed. For example, a minimal, non-operational, opcode with one input and one output can be declared as

#include <plugin.h>
struct Opcd : csnd::Plugin<1,1> { };

This class is perfectly correct from the viewpoint of the framework, although it does not define any specialised processing code; therefore, when the opcode is used, nothing happens. In this case, the base class methods (stubs) are used, but they are simply placeholders and have no processing code. Therefore, to implement any type of action, we will need to define one or more of the following processing methods: init(), kperf(), and/or aperf(). The base class also includes the following member variables:

• outargs: a Params object holding output arguments.
• inargs: input arguments (Params).
• csound: a pointer to the Csound engine object.
• offset: the starting position of an audio vector (for audio opcodes only).
• nsmps: the size of an audio vector (also for audio opcodes only).

The second base class in the framework is FPlugin. This is derived from Plugin, adding an extra member variable:

• framecount: a member to hold a running count of spectral frames.


This is used in streaming spectral-processing opcodes (see Sect. 19.4). Figure 20.1 illustrates the CPOF base classes and opcode classes that can be derived from them.

[Figure: class hierarchy with the base classes OPDS, Plugin, and FPlugin forming an inheritance chain, and the opcode subclasses Opcode and SpecOp.]

Fig. 20.1: CPOF base classes (solid) and opcode subclasses (dashed).

Opcode inputs and outputs (inargs and outargs) are passed via the Params class,

template <uint32_t N> class Param {
  MYFLT *ptrs[N];
  ...
};

which wraps the different types used by Csound (scalars, vectors, strings, spectral, arrays). This class provides a number of convenience methods that can be used to get or set the data in the desired form, as we will see in the examples below.

20.2.2 Deriving Opcode Classes

As noted above, deriving a class from Plugin requires that we provide processing methods to be called by the engine when the opcode is run. Csound has two basic action times (or passes) for opcodes, when these methods are invoked:

1. Initialisation time (i-time): processing happens once at every new opcode instantiation, or at specified re-initialisation steps. For example, when an instrument containing an opcode starts, an init-time pass is carried out (once).
2. Performance time (perf-time): following an init-time pass, opcodes are called in a loop (called the k-cycle loop) and their processing functions are invoked periodically.

Code for the init-time pass should be placed in the init() function of the opcode class. The perf-time pass can be of two types:

1. Scalar: the opcode consumes/produces single values. This is called control or k-rate processing.
2. Vectorial: the opcode consumes/produces blocks of audio samples. This is called audio or a-rate processing.

These two modes of processing are implemented by the two other class methods, kperf() and aperf(), respectively. Opcodes can operate at i-time, k-rate, and/or a-rate processing. If their inputs and outputs are only i-time variables, then they only need to implement the init() method. If they take k- (scalar) or a- (vectorial) type variables, then they may need to implement the other functions, accordingly. In these cases, whenever an initialisation step is needed, we will have to implement the i-time method also. When registering opcodes, as discussed in Sect. 20.2.3, we define what action times the opcode class has been designed for. The following examples demonstrate the derivation of plugin classes for each of these opcode types (i, k or a).

i-time opcodes

For init-time opcodes, all we need to do is provide an implementation of the init() method:

struct Simplei : csnd::Plugin<1,1> {
  int init() {
    outargs[0] = inargs[0];
    return OK;
  }
};

In this simple example, we just copy the input arguments to the output once, at init-time. Each scalar input type can be accessed using array indexing. All numeric argument data is real, and declared as MYFLT, the internal floating-point type used by Csound.

k-rate opcodes

For opcodes running only at k-rate (with no i-time operation), all we need to do is provide an implementation of the kperf() method:

struct Simplek : csnd::Plugin<1,1> {
  int kperf() {
    outargs[0] = inargs[0];
    return OK;
  }
};

Similarly, in this simple example, we just copy the input arguments to the output in each k-period.

a-rate opcodes

For opcodes running only at a-rate (with no i-time operation), all we need to do is provide an implementation of the aperf() method:

struct Simplea : csnd::Plugin<1,1> {
  int aperf() {
    std::copy(inargs(0)+offset, inargs(0)+nsmps, outargs(0));
    return OK;
  }
};

In a Plugin-derived opcode, the number of samples in a vector is always given by the nsmps member variable, and its starting position by offset. Since audio arguments are nsmps-size vectors, we can get these as MYFLT pointers, using the overloaded operator() for the inargs and outargs objects, which takes the opcode argument number as a parameter.

Alternatively, we could use the AudioSig class that wraps these raw pointers, facilitating an OOP approach to handling audio data. This class provides iterators, as well as subscript access. Objects are constructed by passing the current plugin pointer (this; see Sect. 14.5.3) and the raw parameter pointer, for example

csnd::AudioSig in(this, inargs(0));
csnd::AudioSig out(this, outargs(0));
std::copy(in.begin(), in.end(), out.begin());

Note that this class encapsulates an audio signal vector completely. Therefore, we do not need to refer directly to attributes such as the number of samples or the offset. We will see in some of the following examples how this can be very helpful when designing an opcode to process audio.
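To make the role of such a wrapper more concrete, the sketch below shows a minimal view class over a raw audio vector that exposes iterators. It is written in the spirit of AudioSig but is not its implementation: the SigView name, its constructor arguments, and the choice of subscripting relative to the start of the active region are assumptions of this example, as is the definition of MYFLT as double.

```cpp
#include <algorithm>
#include <cstdint>

typedef double MYFLT; // assumption: a double-precision build

// Hypothetical view over an audio vector (not the CPOF AudioSig class).
// The active region runs from position offs up to, but excluding, n.
class SigView {
  MYFLT *sig;
  uint32_t offset, nsmps;

public:
  SigView(MYFLT *s, uint32_t offs, uint32_t n)
      : sig(s), offset(offs), nsmps(n) {}

  // raw pointers are valid random-access iterators
  MYFLT *begin() { return sig + offset; }
  MYFLT *end() { return sig + nsmps; }

  // subscript relative to the start of the active region (a choice made here)
  MYFLT &operator[](uint32_t n) { return sig[offset + n]; }

  // number of samples to be processed
  uint32_t size() const { return nsmps - offset; }
};

// copy the active region of one vector into the same region of another
inline void copy_sig(SigView in, SigView out) {
  std::copy(in.begin(), in.end(), out.begin());
}
```

Once the offset and vector size are captured at construction, code using the view can rely on begin() and end() alone, which is what allows standard algorithms such as std::copy or std::transform to be applied to audio vectors directly.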


20.2.3 Registering Opcodes with Csound

Once we have written our opcode classes, we need to inform Csound about their existence, so that they can be listed and employed in user code. For this we have the function template plugin():

template <typename T>
int plugin(Csound *csound, const char *name, const char *oargs,
           const char *iargs, uint32_t thrd, uint32_t flags = 0)

Its parameters are:

• csound: a pointer to the Csound object to which we want to register our opcode.
• name: the opcode name as it will be used in Csound code.
• oargs: a string containing the opcode output types, one identifier per argument.
• iargs: a string containing the opcode input types, one identifier per argument.
• thrd: a code to tell Csound when the opcode should be active.
• flags: multithread flags (generally 0 unless the opcode accesses global resources).

For opcode type identifiers, the most common types are a (audio), k (control), i (i-time), S (string), and f (fsig). For the thread argument, we have the following options, depending on the class processing methods which we want to run in a given opcode:

• thread::i: indicates init().
• thread::k: indicates kperf().
• thread::ik: indicates init() and kperf().
• thread::a: indicates aperf().
• thread::ia: indicates init() and aperf().

We instantiate and call these template functions inside the plugin dynamic library entry-point function on_load(). This function needs to be implemented only once¹ in each opcode library. For example,

#include <modload.h>
void csnd::on_load(Csound *csound) {
  csnd::plugin<Simplei>(csound, "simple", "i", "i", csnd::thread::i);
  csnd::plugin<Simplek>(csound, "simple", "k", "k", csnd::thread::k);
  csnd::plugin<Simplea>(csound, "simple", "a", "a", csnd::thread::a);
}

will register the simple polymorphic opcode, which can be used with i-time, k-rate and a-rate variables. In each instantiation of the plugin registration template, the class name is passed as an argument to it, followed by the function call. If the class defines two specific static members, otypes and itypes, to hold the types for output and input arguments, declared as

struct MyPlug : csnd::Plugin<1,2> {
  static constexpr char const *otypes = "k";
  static constexpr char const *itypes = "ki";
  ...
};

then we can use a simpler overload of the plugin registration function:

template <typename T>
int plugin(Csound *csound, const char *name, uint32_t thread,
           uint32_t flags = 0)

For some classes, this may be a very convenient way to define the argument types. For other cases, where opcode polymorphism may be involved, we may reuse the same class for different argument types, in which case it is not desirable to define these statically in a class.

¹ The header file modload.h, where on_load() is declared, contains three boilerplate calls to Csound module C functions, required for Csound to load plugins properly. For this reason, each plugin library should also include this header only once, otherwise duplicate symbols will cause linking errors.
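The relationship between the two forms of registration can be sketched with a stand-in for the engine. Everything in this example is hypothetical (Registry, OpcodeEntry, register_op, and the numeric thread code); it does not reproduce the CPOF implementation, and only shows how a simpler overload may pick up the argument types from the static members of the opcode class.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-ins for the engine's list of opcodes.
struct OpcodeEntry {
  std::string name, oargs, iargs;
  uint32_t thread;
};
struct Registry {
  std::vector<OpcodeEntry> entries;
};

// Full form: the argument types are given at registration time.
// (In a real framework, T would also select the functions to be called.)
template <typename T>
int register_op(Registry *r, const char *name, const char *oargs,
                const char *iargs, uint32_t thread) {
  r->entries.push_back({name, oargs, iargs, thread});
  return 0;
}

// Simpler overload: the types come from static members of the class T.
template <typename T>
int register_op(Registry *r, const char *name, uint32_t thread) {
  return register_op<T>(r, name, T::otypes, T::itypes, thread);
}

// An example class declaring its own argument types.
struct MyPlug {
  static constexpr char const *otypes = "k";
  static constexpr char const *itypes = "ki";
};
```

A class that is registered several times with different argument types, as in the polymorphic simple opcode above, would use the full form, while a class such as MyPlug can be registered with the shorter call register_op<MyPlug>(&registry, "myop", code).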

20.3 The Csound Engine Object

Opcodes are run by an engine that is encapsulated by the Csound class. They all hold a pointer to this, called csound, which is needed for some of the operations involving parameters, and for some utility methods (such as console messaging, MIDI data access, and FFT operations). The following are some useful public methods in the Csound class:

• init_error(): takes a string message and signals an initialisation error.
• perf_error(): takes a string message and an instrument instance, and signals a performance error.
• warning(): warning messages.
• message(): information messages.
• sr(): returns the engine sampling rate.
• _0dbfs(): returns the maximum amplitude reference.
• _A4(): returns the A4 pitch reference.
• nchnls(): returns the number of output channels for the engine.
• nchnls_i(): similarly, for input channel numbers.
• current_time_samples(): the current engine time in samples.
• current_time_seconds(): the current engine time in seconds.
• midi_channel(): the MIDI channel assigned to this instrument.
• midi_note_num(): the MIDI note number (if the instrument was instantiated with a MIDI NOTE ON).
• midi_note_vel(): similarly, for velocity.
• midi_chn_aftertouch(), midi_chn_polytouch(), midi_chn_ctl(), midi_chn_pitchbend(): the MIDI data for this channel.
• midi_chn_list(): list of active notes for this channel.
• fft_setup(), rfft(), fft(): FFT operations.

In addition to these, the Csound class also holds a deinit method registration function template for Plugin objects to use:

template <typename T> void plugin_deinit(T *p);

This is only needed if the plugin has allocated extra resources using mechanisms that require deallocation (see Sect. 20.4.6). It is not employed in most cases, as we will see below. To use it, the plugin needs to implement a deinit() method and then call the plugin_deinit() method passing itself (through its this pointer) in its own init() function:

csound->plugin_deinit(this);

20.4 Opcode Programming

In this section, we will look at key aspects of opcode programming, exploring the various supports for typical application requirements, such as memory allocation, function table access, use of external resources, and multithreading. As part of this, we will also discuss the manipulation of several different Csound variable types in a number of programming examples. In order to keep the examples meaningful, we will re-implement some basic processing components already explored in earlier chapters.

20.4.1 Delay Line

As a first example of audio signal processing, we implement here a simple delay line [31] opcode, whose delay time is set at i-time, providing a slap-back echo effect as discussed in Chapter 18. This will require us to allocate memory for the delay buffer, which will need some special attention, as the mechanism to do this needs to conform to certain conditions. In order to be efficient, and also to prevent leaks and undefined behaviour, we need to leave all memory allocation to Csound and refrain from using C++ allocators or standard library containers that use dynamic allocation behind the scenes (e.g. std::vector). If we follow these rules, our code will work as intended and cause no problems for users. This requires us to use the AuxAlloc mechanism implemented in the Csound engine to manage memory dynamically. To access it, CPOF provides a wrapper template class (which is not too dissimilar to std::vector) for us to allocate and use as much memory as we need. This functionality is given by the AuxMem class, which has the following methods and members:

• allocate(): allocates memory (if required).
• operator[]: array-subscript access to the allocated memory.
• data(): returns a pointer to the data.
• len(): returns the length of the vector.
• begin(), cbegin() and end(), cend(): return iterators to the beginning and end of data.
• iterator and const_iterator: iterator types for this class.

With this in hand, we can create a class that implements a delay effect as described in Sect. 18.2. The opcode is called by Csound with the following (functional) syntax:

asig = delayline(ain, idel)

with ain as the input signal, and idel as the init-time delay time. The code is outlined in Listing 20.1.

Listing 20.1: The delayline opcode.

struct DelayLine : csnd::Plugin<1,2> {
  csnd::AuxMem<MYFLT> delay;
  csnd::AuxMem<MYFLT>::iterator iter;

  int init() {
    delay.allocate(csound, csound->sr()*inargs[1]);
    iter = delay.begin();
    return OK;
  }

  int aperf() {
    csnd::AudioSig in(this, inargs(0));
    csnd::AudioSig out(this, outargs(0));
    std::transform(in.begin(), in.end(), out.begin(),
      [this](MYFLT s){
        MYFLT o = *iter;
        *iter = s;
        if(++iter == delay.end())
          iter = delay.begin();
        return o;});
    return OK;
  }
};

In this example, we use an AuxMem iterator to access the delay vector. It is equally possible to access each element with an array-style subscript. Since the extra memory allocated by this class is managed by Csound, we do not need to be concerned about disposing of it. To register this opcode, we use

csnd::plugin<DelayLine>(csound, "delayline", "a", "ai", csnd::thread::ia);
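The circular-buffer logic of the delay line can be tried outside Csound with a small stand-alone sketch. It is an illustration under stated assumptions only: DelaySketch is a hypothetical name, and std::vector stands in for AuxMem purely because the sketch runs outside the engine (inside an opcode, the allocation rules given above still apply). It also uses the array-subscript form of access and a plain loop in place of the iterator and std::transform.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-alone sketch of the circular-buffer delay (hypothetical class).
// std::vector is used only because this runs outside the Csound engine.
struct DelaySketch {
  std::vector<double> delay;  // delay memory, zero-initialised
  std::size_t pos = 0;        // current read/write position

  // 'samples' is the delay length in samples and must be at least 1
  explicit DelaySketch(std::size_t samples) : delay(samples, 0.0) {
    assert(samples > 0);
  }

  // Processes one block: each output sample is the input that was
  // written delay.size() samples earlier.
  std::vector<double> process(const std::vector<double> &in) {
    std::vector<double> out(in.size());
    for (std::size_t n = 0; n < in.size(); n++) {
      out[n] = delay[pos];                 // read the oldest sample
      delay[pos] = in[n];                  // overwrite it with the newest
      if (++pos == delay.size()) pos = 0;  // wrap around at the end
    }
    return out;
  }
};
```

With a two-sample delay, an input of 1, 2, 3, 4 yields 0, 0, 1, 2: the buffer contents are read before being replaced, so the first outputs are the initial zeros.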

20.4.2 Table-Lookup Oscillator

The next example explores the principles of table lookup introduced in Chapter 13 by implementing a simple truncating oscillator. Access to Csound function tables is also facilitated by a thin wrapper class that allows us to treat it as a vector object. This is provided by the Table class, which has the following members:

• init(): initialises a table object from an opcode argument pointer.
• operator[]: array-subscript access to the function table.
• data(): returns a pointer to the function table data.
• len(): returns the length of the table (excluding guard point).
• begin(), cbegin() and end(), cend(): return iterators to the beginning and end of the function table.
• iterator and const_iterator: iterator types for this class.

With the support of this class, we are able to implement any type of table lookup. A typical application example is found in oscillators, as discussed in Chapter 13. The Csound syntax for such an opcode is

asig = oscillator(kamp, kcps, itab)

where kamp and kcps are k-rate (scalar, control) signals for the amplitude and the fundamental frequency in Hz, and itab is the i-time function table number. The opcode class is laid out in Listing 20.2.

Listing 20.2: The Oscillator class.

struct Oscillator : csnd::Plugin<1,3> {
  static constexpr char const *otypes = "a";
  static constexpr char const *itypes = "kki";
  csnd::Table tab;
  double scl;
  double x;

  int init() {
    tab.init(csound,inargs(2));
    scl = tab.len()/csound->sr();
    x = 0;
    return OK;
  }

  int aperf() {
    csnd::AudioSig out(this, outargs(0));
    MYFLT amp = inargs[0];
    MYFLT si = inargs[1] * scl;
    for(auto &s : out) {
      s = amp * tab[(uint32_t)x];
      x += si;
      while (x < 0) x += tab.len();
      while (x >= tab.len()) x -= tab.len();
    }
    return OK;
  }
};

The table object is initialised by passing the relevant argument pointer to it (using its data() method). Note also that, since we need to manipulate the phase index very precisely, it is hard to use iterators in this case without making the code very awkward. Therefore we employ straightforward array subscripting. The opcode is registered by

csnd::plugin<Oscillator>(csound, "oscillator", csnd::thread::ia);
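The phase arithmetic of the truncating oscillator can be checked in isolation. The sketch below is a stand-alone illustration, not CPOF code: OscSketch is a hypothetical name and a std::vector stands in for the Table wrapper. It keeps the same steps: a sampling increment derived from the frequency, a truncated index into the table, and a wrap-around in both directions.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Stand-alone sketch of a truncating table-lookup oscillator
// (hypothetical class; std::vector stands in for the function table).
struct OscSketch {
  std::vector<double> table;  // one cycle of the waveform
  double scl;                 // table length divided by the sampling rate
  double x = 0.0;             // phase index, kept in [0, table.size())

  OscSketch(std::vector<double> tab, double sr)
      : table(std::move(tab)), scl(table.size() / sr) {
    assert(!table.empty() && sr > 0.0);
  }

  // Fills 'out' with samples at amplitude 'amp' and frequency 'cps' (Hz).
  void process(std::vector<double> &out, double amp, double cps) {
    const double len = static_cast<double>(table.size());
    const double si = cps * scl;  // sampling increment per output sample
    for (auto &s : out) {
      s = amp * table[static_cast<std::size_t>(x)];  // truncating lookup
      x += si;
      while (x < 0) x += len;     // wrap for negative increments
      while (x >= len) x -= len;  // wrap for positive increments
    }
  }
};
```

For a four-point table at a sampling rate of 4 Hz and a frequency of 1 Hz, the increment is exactly one table position per sample, so the output simply steps through the table and repeats.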

20.4.3 Text Processing

Text in Csound is manipulated via S(string)-type variables. Such objects are held in a STRINGDAT data structure,

typedef struct {
  char *data;
  int size;
} STRINGDAT;

which contains a data member that holds the actual string and a size member with the allocated memory size. There are no classes to wrap strings, but translated access to string arguments is provided through the Param object str_data() member function. This takes an argument index (similarly to data()) and returns a reference to the string variable, as demonstrated in this example:

struct Tprint : csnd::Plugin<0,1> {
  static constexpr char const *otypes = "";
  static constexpr char const *itypes = "S";

  int init() {
    char *s = inargs.str_data(0).data;
    csound->message(s);
    return OK;
  }
};

This opcode will print the string to the console. Note that we have no output arguments, and so we set the first template parameter to 0. We register it using

csnd::plugin<Tprint>(csound, "tprint", "", "S", csnd::thread::i);

20.4.4 Spectral Processing

As we have noted, for streaming spectral processing opcodes, we have a different base class with the extra facilities needed for their operation (FPlugin). Spectral data in Csound is carried by f-type variables (fsigs). These are held internally in a PVSDAT C-language data structure. To facilitate their manipulation, CPOF provides the Fsig class, derived from PVSDAT. While fsigs can carry different types of spectral data, the most common format is the phase vocoder frame, composed of amplitude–frequency bins representing equally-spaced frequency bands, as discussed in Chapter 19. The fsig type encapsulates the spectral-processing signal parameters described in Sect. 19.4. Csound also includes a special sliding mode, where the hopsize is set to 1 and frames are produced at the audio rate. To access phase vocoder bins, a container interface is provided by pv_frame (or spv_frame, for the sliding mode)². This holds a series of pv_bin (spv_bin for the sliding mode)³ objects, which have the following methods:

• amp(): returns the bin amplitude.
• freq(): returns the bin frequency.
• amp(float a): sets the bin amplitude to a.
• freq(float f): sets the bin frequency to f.
• operator*(pv_bin f): multiplies the amplitude of a pvs bin by f.amp.
• operator*(MYFLT f): multiplies the bin amplitude by f.
• operator*=(): unary versions of the above.

² pv_frame is a convenience typedef for Pvframe<pv_bin>, whereas spv_frame is Pvframe<spv_bin>.
³ pv_bin is Pvbin<float> and spv_bin is Pvbin<MYFLT>.

The pv_bin class can also be translated into an std::complex object if needed. This class is also fully compatible with the C complex type and an object obj can be cast into a float array consisting of two items (or a float pointer) using reinterpret_cast<float(&)[2]>(obj) or reinterpret_cast<float*>(&obj). The Fsig class has the following methods:

• init(): initialisation from individual parameters or from an existing fsig. Also allocates frame memory as needed.
• dft_size(), hop_size(), win_size(), win_type() and nbins(), returning the PV data parameters.
• count(): get and set the fsig framecount.
• isSliding(): checks for sliding mode.
• fsig_format(): returns the fsig data format (fsig_format::pvs, ::polar, ::complex, or ::tracks).

The pv_frame (or spv_frame) class contains the following methods:

• operator[]: array-subscript access to the spectral frame.
• data(): returns a pointer to the spectral frame data.
• len(): returns the length of the frame.
• begin(), cbegin() and end(), cend(): return iterators to the beginning and end of the data frame.
• iterator and const_iterator: iterator types for this class.

Spectral-processing opcodes run nominally at k-rate but internally use an update rate based on the analysis hopsize. For this to work, a frame count is kept and checked to make sure that we only process the input when new data is available. As an example, the class in Listing 20.3 implements a simple gain scaler for fsig variables.

Listing 20.3: The PVGain class.

struct PVGain : csnd::FPlugin<1, 2> {
  static constexpr char const *otypes = "f";
  static constexpr char const *itypes = "fk";

  int init() {
    if(inargs.fsig_data(0).isSliding()){
      const char *s = "sliding not supported";
      return csound->init_error(s);
    }
    if(inargs.fsig_data(0).fsig_format() != csnd::fsig_format::pvs &&
       inargs.fsig_data(0).fsig_format() != csnd::fsig_format::polar){
      const char *s = "format not supported";
      return csound->init_error(s);
    }
    csnd::Fsig &fout = outargs.fsig_data(0);
    fout.init(csound, inargs.fsig_data(0));
    framecount = 0;
    return OK;
  }

  int kperf() {
    csnd::pv_frame &fin = inargs.fsig_data(0);
    csnd::pv_frame &fout = outargs.fsig_data(0);
    uint32_t i;
    if(framecount < fin.count()) {
      std::transform(fin.begin(), fin.end(), fout.begin(),
        [this](csnd::pv_bin f){ return f *= inargs[1]; });
      framecount = fout.count(fin.count());
    }
    return OK;
  }
};

Note that, as with strings, there is a dedicated method in the arguments object that returns a reference to an Fsig class (which can also be assigned to a pv_frame reference). This is used to initialise the output object at i-time and then to obtain the input and output variable Csound processing data. The framecount member is provided by the base class, as well as the format check methods. This opcode is registered using

csnd::plugin<PVGain>(csound, "pvg", csnd::thread::ik);
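The frame-count check used in the performance method can be illustrated without the Csound headers. The following sketch rests on assumptions of its own: Bin, Frame and GainSketch are hypothetical stand-ins for pv_bin, the fsig and the opcode class, chosen only to show how processing is skipped until the producer has advanced its frame counter.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for a spectral bin and a streaming frame.
struct Bin {
  float amp;
  float freq;
};

struct Frame {
  std::vector<Bin> bins;
  uint32_t count = 0;  // advanced by the producer for each new frame
};

struct GainSketch {
  uint32_t framecount = 0;  // number of the last frame that was processed

  // Called once per control period; returns true only when a new input
  // frame was available and has been processed.
  bool kperf(const Frame &fin, Frame &fout, float gain) {
    if (framecount >= fin.count) return false;  // no new analysis data yet
    fout.bins.resize(fin.bins.size());
    std::transform(fin.bins.begin(), fin.bins.end(), fout.bins.begin(),
                   [gain](Bin b) {
                     b.amp *= gain;  // scale the amplitude, keep the frequency
                     return b;
                   });
    framecount = fout.count = fin.count;  // mark this frame as consumed
    return true;
  }
};
```

Calling kperf() repeatedly with an unchanged input frame does nothing after the first successful call, which is the behaviour needed when the control rate is faster than the analysis hop.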

20.4.5 Array Processing

The array type in Csound is defined by the C data structure ARRAYDAT. In order to facilitate access to arguments of this type, CPOF provides a wrapper class, Vector. This is derived from ARRAYDAT, and includes the following members:

• init(): initialises an output variable.
• operator[]: array-subscript access to the vector data.
• data(): returns a pointer to the vector data.
• len(): returns the length of the vector.
• begin(), cbegin() and end(), cend(): return iterators to the beginning and end of the vector.
• iterator and const_iterator: iterator types for this class.
• data_array(): returns a pointer to the vector data.

The inargs and outargs objects in the Plugin class have a template method that can be used to get a Vector class reference. A trivial example is shown below:

struct SimpleArray : csnd::Plugin<1, 1>{
  int init() {
    csnd::Vector<MYFLT> &out = outargs.vector_data<MYFLT>(0);
    csnd::Vector<MYFLT> &in = inargs.vector_data<MYFLT>(0);
    out.init(csound, in.len());
    return OK;
  }

  int kperf() {
    csnd::Vector<MYFLT> &out = outargs.vector_data<MYFLT>(0);
    csnd::Vector<MYFLT> &in = inargs.vector_data<MYFLT>(0);
    std::copy(in.begin(), in.end(), out.begin());
    return OK;
  }
};

Note that output arrays need to be initialised to a given length, which is done by the Vector::init() method. This opcode is registered using the following line:

csnd::plugin<SimpleArray>(csound, "simple", "k[]", "k[]", csnd::thread::ik);

Since MYFLT arrays are the most commonly-used in Csound, CPOF provides a myfltvec definition that instantiates the template for that variable type. Together with the Params::myfltvec_data() method, it simplifies access to opcode arguments:

csnd::myfltvec &out = outargs.myfltvec_data(0);

The Vector class only wraps one-dimensional arrays. For more than one dimension, the ARRAYDAT structure needs to be used directly.


20.4.6 External Resources

Opcode classes can, in general, be composed of member variables of any type, built in or user defined. However, we have to remember that opcodes are allocated and instantiated by C code, which does not know anything about classes. A member variable of a given class will not be constructed at instantiation, when the memory for it is first allocated; therefore we need to arrange for its constructor to be invoked explicitly, if required. The mechanism to do this is the C++ placement new operator:

new ( placement-parameters ) constructor ( constructor-parameters )

where the placement parameters usually consist simply of a pointer to the existing pre-allocated memory. The placement new does not allocate memory, but uses the already existing space. So, this is a practical solution for cases where the allocation happens in C, as is the situation with Csound. This allows Csound to use C++ classes that require construction, such as ones that will allocate and use external resources (e.g. memory not allocated by Csound). While in most circumstances the advice is to avoid the use of external libraries that dynamically allocate resources outside of the control of the Csound engine, this might impose too narrow constraints on opcode developers. The solution chosen was to provide a clear mechanism for the management of external resources. To facilitate this, a template function is available to construct any member objects as needed, hiding the placement new and furnishing a standard means of calling constructors:

template <typename T, typename... Types>
T *constr(T *p, Types... args) {
  return new(p) T(args...);
}

For instance, if we have in our opcode a member variable of type A called obj, we can construct it by placing the following line in the plugin init() method:

csnd::constr(&obj,10,10.f);

where the arguments are the variable address, followed by any class constructor parameters.

Equally, if a class allocates any resources (which we will assume is the case unless documented otherwise), we are required to invoke its destructor explicitly. This is done through calling csnd::destr(&obj) in a deinit() method, which is the corresponding template function to access the class destructor. It is important not to miss this step, as that could lead to memory leaks and other undefined behaviour. As an example, we will look at using standard C++ library classes in opcodes. Many such objects will require explicit constructor and destructor calls through the mechanism outlined above. The class in Listing 20.4 implements an opcode that generates signals based on a Gaussian distribution, using std::normal_distribution defined in the random header. This opcode is overloaded for audio and control signals (the actual processing function will be selected on the basis of its output type or via type annotation). Its general form is

xsig = gaussian:x(imean, idev, iseed)

where x stands for the output type (a, k), and the i-type parameters are the mean, standard deviation, and seed, in that order.

Listing 20.4: Gaussian opcode class.

struct Gaussian : csnd::Plugin<1, 3> {
  std::normal_distribution<MYFLT> norm;
  std::mt19937 gen;

  int init() {
    csnd::constr(&norm, inargs[0], inargs[1]);
    csnd::constr(&gen, inargs[2]);
    csound->plugin_deinit(this);
    return OK;
  }

  int deinit() {
    csnd::destr(&norm);
    csnd::destr(&gen);
    return OK;
  }

  int kperf() {
    outargs[0] = norm(gen);
    return OK;
  }

  int aperf() {
    csnd::AudioSig out(this, outargs(0));
    for (auto &sample : out)
      sample = norm(gen);
    return OK;
  }
};

As we can see, the use of external resources requires us to construct the object in the opcode init() method, and use an explicitly-defined deinit() callback to free them.
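The pairing of explicit construction and destruction can be exercised in a self-contained program. This is a minimal sketch under stated assumptions: constr_sketch() and destr_sketch() are local helpers written for the illustration (they follow the pattern described in this section but are not the CPOF functions), and Resource is a hypothetical class that counts its live instances so that the effect of each call is visible.

```cpp
#include <cassert>
#include <new>

// Local helpers following the pattern described in the text
// (written for this sketch; not taken from the Csound headers).
template <typename T, typename... Types>
T *constr_sketch(T *p, Types... args) {
  return new (p) T(args...);  // placement new: constructs, does not allocate
}

template <typename T>
void destr_sketch(T *p) {
  p->~T();  // explicit destructor call; the storage itself is not released
}

// Hypothetical class with a non-trivial constructor and destructor.
struct Resource {
  static int live;  // number of currently constructed instances
  int size;
  float gain;
  Resource(int n, float g) : size(n), gain(g) { ++live; }
  ~Resource() { --live; }
};
int Resource::live = 0;

// Mimics the init()/deinit() pair: constructs an object in raw storage,
// records how many instances are alive, then destroys it again.
int construct_then_destroy() {
  alignas(Resource) unsigned char storage[sizeof(Resource)];  // raw memory
  Resource *obj =
      constr_sketch(reinterpret_cast<Resource *>(storage), 10, 10.f);
  assert(obj->size == 10);
  int seen = Resource::live;
  destr_sketch(obj);
  return seen;
}
```

If the call to destr_sketch() were omitted, the instance counter would stay at one after the function returned, which is the stand-alone analogue of a resource leaking from an opcode that never registers its deinit() method.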


20.4.7 Multithreading Opcodes

A C-language interface for multithreading is provided by the Csound API. This is implemented via pthreads [22] on POSIX systems, or other native threading libraries in non-POSIX platforms. A support class is provided to allow object-oriented access to the underlying C interface: the Thread pure virtual class. This is designed to be subclassed and instantiated to encapsulate a separate thread. The entry point is given by a run() method that needs to be implemented in the derived class. Thread also provides join() and get_thread() methods for joining a thread and getting its handle. Opcodes that require an asynchronous operation can take advantage of this class to spawn a new thread to work in parallel with the main processing.

20.5 Conclusions

This chapter has described CPOF and its fundamental characteristics. We have looked at how the base classes are constructed, how to derive from them, and how to register new opcodes in the system. The framework is designed to support the modern C++ idioms discussed in this book, and adopts the C++11 standard. All of the code examples discussed in this chapter are provided in opcodes.cpp, found in the examples/plugin directory of the Csound source codebase⁴. CPOF is part of Csound and is distributed alongside its public headers. To build a plugin opcode library, we require a C++ compiler supporting the C++11 [23] standard (-std=c++11), and the Csound public headers. CPOF has no link-time dependencies. The opcodes should be built as a dynamic/shared module (.so on Linux and .dylib on macOS). For example, on macOS, the following command is used:

$ c++ -dynamiclib -o opcode.dylib opcode.cpp \
    -std=c++11 \
    -I /Library/Frameworks/CsoundLib64.framework/Headers

On other systems, a similar command line can be used, with adjustments to the header path and the dynamic/shared library link flag (-shared in some compilers).

⁴ https://github.com/csound/csound

Problems

20.1. Write a ring modulation opcode with two audio inputs and one output.
20.2. Write a version of your spectral processing low-pass filter AuLib class (Problem 19.2) as a Csound opcode.

Appendix

Appendix A

AuLib Reference

In this appendix we provide a general reference to the AuLib code. We also discuss some details of its operation that have been left out of the main body of the book.

A.1 Library-Wide Definitions

A number of library-wide constants and free functions are provided in the AuLib.h header, in the AuLib namespace.

Versioning: these constants and functions are defined in the namespace AuLib::Info and return information on version and copyright:

const uint32_t major_version
const uint32_t minor_version
static inline const std::string version()
static inline const std::string copyright()

DSP: these constants control some key signal processing attributes of the library. They include default values for vector, buffer and table sizes; sampling rates; number of channels; FFT size; decimation; π and 2π; the minimum value for double precision floats and the maximum value for 8-byte unsigned integers; and the −120 dB full-scale constant.

const uint32_t def_vframes
const uint32_t def_bframes
const uint32_t def_tframes
const double def_sr
const double def_kr
const uint32_t def_nchnls
const uint32_t def_fftsize
const uint32_t def_decim
const double pi

© Springer Nature Switzerland AG 2019 V. Lazzarini, Computer Music Instruments II, https://doi.org/10.1007/978-3-030-13712-0


const double twopi
const double db_min
const uint64_t ui64_max
const double m120dBfs

The npow2() function is also defined to return the next power of two not greater than its argument.

MIDI: defined in the namespace midi, these define message status codes:

const uint32_t note_on
const uint32_t note_off
const uint32_t ctrl_msg
const uint32_t aftouch
const uint32_t poly_aftouch
const uint32_t prg_msg
const uint32_t pitchbend

enum error_codes
const std::string aulib_error[]

In addition to these, fft.h defines a number of spectral-processing related constants and free functions in the AuLib::fft namespace:

const bool forward
const bool inverse
const bool polar
const bool rectang
const bool packed
void transform(std::vector<std::complex<double>> &data, bool dir)
void transform(std::vector<std::complex<double>> &out, double *in,
               bool pckd = packed)
void transform(double *out, std::vector<std::complex<double>> &in,
               bool pckd = packed)

A.2 AudioBase

AuLib::AudioBase is the DSP base class for AuLib. It defines a generic object with no particular processing functions (apart from basic signal arithmetic).

Protected members:

• uint32_t m_nchnls: number of interleaved audio channels.
• uint32_t m_vframes: number of frames in the audio vector.
• std::vector<double> m_vector: the audio vector, containing space for at least the number of channels times the number of frames in the audio vector.
• double m_sr: sampling rate.
• uint32_t m_error: error state indicator.

Constructor:

AudioBase(uint32_t nchnls = def_nchnls,
          uint32_t vframes = def_vframes,
          double sr = def_sr)

• nchnls: number of channels.
• vframes: number of frames in the vector. This is set to the next power of two no greater than the requested number of frames. Objects requiring arbitrary vector sizes should explicitly resize the vector using the resize_exact() method.
• sr: sampling rate.

Attributes: methods for setting or getting object attributes. The vector frame size is normally set to the next power of two not greater than the requested value, unless resize_exact() is used. The vector is always cleared on resizing:

uint32_t vframes(uint32_t frames)
uint32_t resize_exact(uint32_t frames)
uint32_t vframes() const
uint32_t nchnls() const
uint32_t sr() const
uint32_t error() const
virtual const char *error_message() const

Frame access: these methods provide individual or block read or write access to frames in the object vector:

const AudioBase &set(const AudioBase &obj)
const AudioBase &set(const double *sig)
const double *set(double v)
double set(double v, uint32_t p)
const double *vector() const
double vector(uint32_t frndx, uint32_t chn) const

Iterators: standard iterators are defined for this class, providing access to the signal vector:

typedef std::vector<double>::iterator iterator
typedef std::vector<double>::const_iterator const_iterator
iterator begin()
iterator end()
const_iterator cbegin() const
const_iterator cend() const

Overloaded operators:

• Signal arithmetic: these scale and offset the vector frames by scalars or vectors (raw pointers or object references). They can be redefined in derived classes if necessary:

virtual const AudioBase &operator*=(double scal)
virtual const AudioBase &operator*=(const double *sig)
virtual const AudioBase &operator*=(const AudioBase &obj)
virtual const AudioBase &operator+=(double offs)
virtual const AudioBase &operator+=(const double *sig)
virtual const AudioBase &operator+=(const AudioBase &obj)

• Vector access, array-like access to the object signal vector:

double &operator[](uint32_t ndx)
const double &operator[](uint32_t ndx) const

• Streams, providing stream IO for the object signal vector:

friend std::ostream &operator<<(std::ostream &os, const AudioBase &obj)
friend std::istream &operator>>(std::istream &is, AudioBase &obj)

• Conversion: these are conversion operators into a vector reference or raw pointer. They allow the object to be cast as one of these types:

operator const std::vector<double> &() const
explicit operator const double *() const
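The effect of the signal arithmetic operators can be pictured with a very small class. The sketch below is only an illustration: SigSketch is a hypothetical miniature written for this purpose, not the AuLib implementation, and it returns non-const references for brevity. It shows scaling and offsetting by scalars, and sample-by-sample combination with another signal of the same size.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical miniature of an audio vector class (not AuLib code).
class SigSketch {
  std::vector<double> m_vector;

public:
  explicit SigSketch(std::size_t frames, double v = 0.)
      : m_vector(frames, v) {}

  std::size_t vframes() const { return m_vector.size(); }

  // scales every sample by a scalar
  SigSketch &operator*=(double scal) {
    for (auto &s : m_vector) s *= scal;
    return *this;
  }

  // offsets every sample by a scalar
  SigSketch &operator+=(double offs) {
    for (auto &s : m_vector) s += offs;
    return *this;
  }

  // multiplies sample by sample with a signal of the same size
  SigSketch &operator*=(const SigSketch &obj) {
    assert(obj.vframes() == vframes());
    for (std::size_t i = 0; i < m_vector.size(); i++) m_vector[i] *= obj[i];
    return *this;
  }

  // mixes a signal of the same size into this one
  SigSketch &operator+=(const SigSketch &obj) {
    assert(obj.vframes() == vframes());
    for (std::size_t i = 0; i < m_vector.size(); i++) m_vector[i] += obj[i];
    return *this;
  }

  // array-like access to the signal vector
  double &operator[](std::size_t ndx) { return m_vector[ndx]; }
  const double &operator[](std::size_t ndx) const { return m_vector[ndx]; }
};
```

Because each operator returns a reference to the object, operations can be chained, for example (sig *= 0.5) += other.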

A.3 Deriving New Classes

The AuLib library is designed to be easily extended. There is significant freedom for developers in this process, as there are no strict prescriptions on the form or signature of processing (or other) methods. Usually we inherit from the AudioBase class to allow easy integration with existing objects, and to avail of basic audio processing facilities provided there. The majority of the library classes do this. Supplying a processing method that consumes a vector of samples and another that reads from an AudioBase reference is the minimum necessary to allow full integration with existing classes. The recommended approach is to separate interface from the implementation, thus placing the vector-processing code in a private method and providing a means of overriding it. The interface then delegates to this as appropriate. The example in Listing A.1 is a skeleton of an AudioBase-derived class that demonstrates these ideas.

Listing A.1: An AudioBase-derived class.

#include <AudioBase.h>

namespace AuLib {
class NewClass : public AudioBase {
  // this is the main DSP method
  // takes in a const pointer with the input
  // frames and returns
  // a const pointer to the vector data
  virtual const double *dsp(const double *sig);

protected:
  // a processing parameter
  double m_par;

public:
  // constructor takes a default parameter value
  // as well as the basic AudioBase attributes
  NewClass(double param = .5,
           uint32_t nchnls = def_nchnls,
           uint32_t vframes = def_vframes,
           double sr = def_sr)
      : AudioBase(nchnls, vframes, sr), m_par(param){};

  // basic interface to DSP process
  const double *process(const double *sig) {
    return dsp(sig);
  }

  // overload allowing for a parameter update
  const double *process(const double *sig, double par) {
    m_par = par;
    return dsp(sig);
  }

  // overload taking an AudioBase object reference
  const NewClass &process(const AudioBase &obj) {
    if (obj.vframes() == m_vframes &&
        obj.nchnls() == m_nchnls) {
      dsp(obj.vector());
    } else
      m_error = AULIB_ERROR;
    return *this;
  }

  // overload taking an AudioBase object reference
  // and a parameter update
  const NewClass &process(const AudioBase &obj, double par) {
    m_par = par;
    dsp(obj.vector());
    return *this;
  }

  // function-like operator
  const NewClass &operator()(const AudioBase &obj) {
    return process(obj);
  }

  // function-like operator with two parameters
  const NewClass &operator()(const AudioBase &obj, double par) {
    return process(obj, par);
  }
};
}

A.4 Audio DSP Classes

The following is a list of the existing audio DSP classes in the library, arranged in their different function categories.

Signal processors:

• Balance: balancing of input against a comparator, envelope following.
• Chn: channel extractor.
• Delay: delay line with feedback (comb filter).
• AllPass: high-order all-pass filter.
• Fir: direct convolution, finite impulse response filter.
• Iir: generic second-order infinite impulse response section.
• LowP: second-order low-pass filter.
• HighP: second-order high-pass filter.
• BandP: second-order band-pass filter.
• BandR: second-order band-reject filter.
• ResonR: resonator with added second-order feedforward section.
• ResonZ: variation on ResonR.
• Reson: standard resonator.
• Pan: panning of a mono input.
• PConv: fast partitioned convolution.
• Tap: delay line tap (truncating read).
• Tapi: delay line tap with linear interpolation.
• ToneLP: first-order low-pass filter.
• ToneHP: first-order high-pass filter.
• Rms: root mean square computation.

Signal generators:

• Envel: general-purpose multi-segment envelope.
• Adsr: attack–decay–sustain–release envelope.
• Line: line generator.
• Expon: exponential curve generator.
• Oscil: truncating oscillator.
• Oscili: linear-interpolation oscillator.
• Oscilic: cubic-interpolation oscillator.
• BlOsc: band-limited oscillator.
• SawOsc: sawtooth oscillator.
• TriOsc: triangle oscillator.
• SqOsc: square oscillator.
• SamplePlayer: general-purpose sampling oscillator.
• Phasor: phase generator.
• TableRead: table lookup.
• TableReadi: linearly interpolated lookup.
• TableReadic: cubic interpolation lookup.

Streaming spectral processing:

• Stft: short-time Fourier transform (forward or inverse).
• Pvoc: phase vocoder analysis or synthesis.
• SpecBase: base class for streaming spectral processing.

Function tables:

• FuncTable: general-purpose function table.
• EnvelTable: multi-segment envelope table.
• FourierTable: Fourier series.
• HammingTable: Hamming window.
• HannTable: Hanning window.
• SawTable: sawtooth wave.
• TriTable: triangle wave.
• SqTable: square wave.
• SampleTable: generic sampled-sound table.

Input and output:

• SoundIn: multichannel audio input.
• SoundOut: multichannel audio output.

A.5 Control Classes

A key aspect of sound and music computing software is how to manage, at a higher level, the instantiation of signal processing graphs. The simplest approach is to hardcode these in the program, which works well if we do not need to make significant modifications to the graph during operation. A more flexible way is to compose audio processing objects into containers and provide means to instantiate and control these. That is an important element of any audio processing engine which might be constructed with AuLib. We can take advantage of object-oriented programming devices in C++ to implement similar functionality. In AuLib, a basic mechanism for instrument composition and instantiation is provided by two classes: AuLib::Instrument and AuLib::Note.

Note: AuLib::Note is a base class that is specialised to contain a signal processing graph of AuLib objects.

/** Note class: \n
    Base class for modelling synthesiser notes
*/
class Note : public AudioBase {
  /** specialise this to contain your
      signal processing objects
  */
  virtual const Note &dsp() { return *this; }

  /** specialise this to handle any
      note onset processing
      (e.g. envelope resets etc)
  */
  virtual void on_note(){};

  /** specialise this to handle any
      note termination processing
      (e.g. envelope releases etc)
  */
  virtual void off_note(){};

  /** specialise this to handle any
      incoming msg
  */
  virtual void on_msg(uint32_t msg,
                      const std::vector<double> &data,
                      uint64_t tstamp){};
  ...
}

In line with the rest of the library, it contains a dsp() method that is to be overridden in derived classes. This will be responsible for executing the graph and producing an output. Since Note is derived from AudioBase, it contains the basic functionality for audio processing like all other objects in the library. Note also has three fundamental virtual methods used for controlling the signal processing graph:

1. on_note(): called when the graph is supposed to start executing.
2. off_note(): called to stop execution.
3. on_msg(): called on an arbitrary message that is sent to the graph.

The derived class needs to implement these to allow for instrument control. The Note class contains basic attributes such as num, vel, channel, and timestamp. The first two can be thought of as independent parameters, even though they are named after the MIDI protocol note number and note velocity. The channel is an instrument identifier, which can be used to filter control data sent to an instance. The Note public interface provides the basic functionality to control the processing graph:

Note(int32_t chn = -1, uint32_t nchnls = def_nchnls,
     uint32_t vframes = def_vframes, double sr = def_sr)
const Note &process()
bool is_on() const
uint64_t time_stamp() const
bool note_on(int32_t chn, double num, double vel, uint64_t tstamp)
bool note_off(int32_t chn, double num, double vel)
bool note_off()
void ctrl_msg(int32_t chn, uint32_t msg,
              const std::vector<double> &data, uint64_t tstamp)
void set_chn(uint32_t chn)

Instrument: once a Note-derived class is created, we can pass it to an Instrument object, which will be able to instantiate and control it. This is a template that takes a Note type, and any number of argument types that are defined in the Note constructor, as template parameters. The constructor will take as parameters the maximum number of note instances, and the channel they should respond to. An Instrument object can be passed to players defined by AudioBase-derived classes. Two existing types of players are:

• MidiIn: realtime MIDI input, to which one or more Instrument objects can be passed.
• ScorePlayer: a Score object player, which plays one or more Instrument objects.
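The division of labour between a note model and an instrument can be outlined in a reduced, stand-alone form. Everything in the following sketch is an assumption made for illustration: NoteSketch, InstrumentSketch and CountingNote are hypothetical classes, the rule that a negative channel accepts data from any channel is a guess suggested by the default constructor argument, and the voice allocation policy (first free note) is chosen only for simplicity. It shows a public control interface delegating to private virtual hooks, and a template that holds a fixed pool of notes.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, much-reduced note model (not the AuLib Note class).
class NoteSketch {
  // hooks to be specialised; reached only through the public interface
  virtual void on_note() {}
  virtual void off_note() {}

protected:
  int32_t m_chn;
  double m_num = 0., m_vel = 0.;
  bool m_on = false;

public:
  explicit NoteSketch(int32_t chn = -1) : m_chn(chn) {}
  virtual ~NoteSketch() {}
  bool is_on() const { return m_on; }

  // starts the note if the channel matches
  // (assumption: a negative channel accepts any channel)
  bool note_on(int32_t chn, double num, double vel) {
    if (m_chn >= 0 && chn != m_chn) return false;
    m_num = num;
    m_vel = vel;
    m_on = true;
    on_note();
    return true;
  }

  // stops the note if it is playing this number on this channel
  bool note_off(int32_t chn, double num) {
    if (!m_on || num != m_num || (m_chn >= 0 && chn != m_chn)) return false;
    m_on = false;
    off_note();
    return true;
  }
};

// Hypothetical instrument: a fixed pool of notes of type T on one channel.
template <typename T> class InstrumentSketch {
  std::vector<T> m_voices;

public:
  InstrumentSketch(uint32_t nvoices, int32_t chn)
      : m_voices(nvoices, T(chn)) {}

  // dispatches to the first free voice; false if none is available
  bool note_on(int32_t chn, double num, double vel) {
    for (auto &v : m_voices)
      if (!v.is_on()) return v.note_on(chn, num, vel);
    return false;
  }

  // releases the first voice that is playing this note number
  bool note_off(int32_t chn, double num) {
    for (auto &v : m_voices)
      if (v.note_off(chn, num)) return true;
    return false;
  }

  uint32_t active() const {
    uint32_t n = 0;
    for (const auto &v : m_voices) n += v.is_on() ? 1 : 0;
    return n;
  }
};

// A derived note that records how often its hooks were called.
struct CountingNote : NoteSketch {
  int ons = 0, offs = 0;
  explicit CountingNote(int32_t chn) : NoteSketch(chn) {}

private:
  void on_note() override { ++ons; }
  void off_note() override { ++offs; }
};
```

Keeping the hooks private means that a derived class supplies behaviour without being able to bypass the channel and state checks performed by the public methods.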

A.5.1 MIDI Synth Example

A simple example showing the use of the MidiIn player is discussed here. A similar approach can be employed with other types of instrument player. The first step is to define our note model as a Note-derived class (Listing A.2). We have quite a lot of freedom to do this; the only requirement is to provide overrides for the dsp(), on_note(), off_note(), and on_msg() methods. These will be called by instruments following controls originating from players. The dsp() method is where we place our synthesis graph, which is fairly simple: an envelope controlling the amplitude of an oscillator. We use the set() method from AudioBase to fill the output vector of this note. The other methods take in the control data and set the relevant class members.

Listing A.2: Note-derived class modelling a sine wave synth note.

class SineSyn : public Note {
  // DSP override
  virtual const SineSyn &dsp() {
    if (!m_env.is_finished())
      set(m_osc(m_env(), m_cps * m_bend));
    else
      clear();
    return *this;
  }

  // note off processing
  virtual void off_note() { m_env.release(); }

  // note on processing
  virtual void on_note() {
    m_amp = m_vel / 128.;
    m_cps = 440. * pow(2., (m_num - 69.) / 12.);
    m_env.reset(m_amp * 0.2, m_ctl[m_atn] + 0.001,
                m_ctl[m_dcn] + 0.001, m_ctl[m_ssn] * m_amp,
                m_ctl[m_rln] + 0.001);
  }

  // msg processing
  virtual void on_msg(uint32_t msg, const std::vector<double> &data,
                      uint64_t tstamp) {
    // pitchbend
    if (msg == midi::pitchbend) {
      int32_t bnd = (int32_t)data[1];
      bnd = (bnd << 7) | (int32_t)data[0];
      double amnt = (bnd - 8192.) / 16384.;
      m_bend = std::pow(2., (4. * amnt) / 12.);
    }
    // ctrls: att, dec, sus, rel
    else if (msg == midi::ctrl_msg) {
      uint32_t num = (uint32_t)data[0];
      m_ctl[num] = data[1] / 128.;
    }
  };

protected:
  // control list
  uint32_t m_atn, m_dcn, m_ssn, m_rln;
  std::map<uint32_t, double> m_ctl;
  double m_bend;
  double m_cps;
  double m_amp;
  // signal processing objects
  Adsr m_env;
  Oscili m_osc;

public:
  typedef std::array<uint32_t, 4> ctl_list;

  SineSyn(int32_t chn, SineSyn::ctl_list lst)
      : Note(chn), m_atn(lst[0]), m_dcn(lst[1]), m_ssn(lst[2]),
        m_rln(lst[3]),
        m_ctl({{m_atn, 0.01}, {m_dcn, 0.01}, {m_ssn, 0.25},
               {m_rln, 0.01}}),
        m_bend(1.),
        m_env(0., m_ctl[m_atn], m_ctl[m_dcn], m_ctl[m_ssn],
              m_ctl[m_rln]),
        m_osc() {
    m_env.release();
  };
};

With this Note-based class defined, we can derive a second one from it, which will simply vary the type of oscillator used (a sawtooth instead of a sine). It reuses most of the code, including all of the control overrides defined in its parent class.

Listing A.3: Note-derived class modelling a sawtooth wave synth note.

// sawtooth note
class SawSyn : public SineSyn {
  SawOsc m_saw;

  // DSP override
  virtual const SawSyn &dsp() {
    if (!m_env.is_finished())
      set(m_saw(m_env(), m_cps * m_bend));
    else
      clear();
    return *this;
  }

public:
  SawSyn(int chn, SineSyn::ctl_list lst) : SineSyn(chn, lst){};
};

The main program in Listing A.4 demonstrates how these Note classes are used by instruments. In lines 19–23, we see two Instrument objects created, using the two different types of note. They are set to respond to channels 0 and 1, which are going to be mapped directly from MIDI channels 0 and 1. Each has 8-note polyphony, and a list of control numbers is passed as extra parameters to the Note objects. These are the control numbers for the envelope parameters of these notes. The next few code lines, 25–28, create a Reverb object, as well as audio output and the MidiIn player. The synthesis code is a single statement:

out(reverb(midi.listen(sinsynth, sawsynth), midi.ctlval(-1, 91)));

where the listen method of midi takes two instruments, dispatches any MIDI messages to them, and collects the audio output. This is sent into the reverb object, which processes it, using the MIDI control 91 value as the effect amount. The reverb output is then sent to the audio card.

Listing A.4: Main program using the two classes in Listings A.2 and A.3, as well as the Reverb class defined in Listing 19.9.

 1  // handle ctrl-c
 2  static std::atomic_bool running(true);
 3  void signal_handler(int signal) {
 4    running = false;
 5    std::cout << "\nexiting...\n";
 6  }
 7
 8  int main(int argc, const char *argv[]) {
 9    if (argc < 2) {
10      std::cout << "usage: " << argv[0] << " " << std::endl;
11      return 1; // argv[1] is required below
12    }
13
14    int dev;
15    // control numbers used: 71 - att, 74 - dec,
16    // 84 - sus, 07 - rel
17
18    // Sinewave Synthesizer - channel 0 (MIDI 1), 8 voices
19    Instrument<SineSyn, SineSyn::ctl_list>
20        sinsynth(8, 0, {{71, 74, 84, 7}});
21    // Sawtooth Synthesizer - channel 1 (MIDI 2), 8 voices
22    Instrument<SawSyn, SineSyn::ctl_list>
23        sawsynth(8, 1, {{71, 74, 84, 7}});
24
25    Reverb reverb(argv[1]);
26    SoundOut out("dac", 1, 128);
27    MidiIn midi;
28    std::signal(SIGINT, signal_handler);
29
30    std::cout << "Available MIDI inputs:\n";
31    for (auto &devs : midi.device_list())
32      std::cout << devs << std::endl;
33
34    std::cout << "choose a device: ";
35    std::cin >> dev;
36
37    if (midi.open(dev) == AULIB_NOERROR) {
38      std::cout << "running... (use ctrl-c to close)\n";
39      // listen to midi on behalf of sinsynth & sawsynth
40      while (running)
41        out(reverb(midi.listen(sinsynth, sawsynth),
42                   midi.ctlval(-1, 91)));
43    } else
44      std::cout << "error opening device...\n";
45    std::cout << "...finished \n";
46    return 0;
47  }

A.6 Other Classes

A small number of classes are found outside the main AudioBase class hierarchy:

• Score: this class models a basic numeric score, containing a set of events, and is used by ScorePlayer.
• Event: a single score event.
• Score::Cmd: a score command.
• Segment: a curve segment for envelope generators or tables.
• TableSet: a set of wave tables used by BlOsc.

A.7 Building AuLib

Programs using the AuLib library can be built in a variety of ways. If specific individual classes are used, it is possible to add the relevant implementation source files to the build. If the library is used extensively, it is simpler to link to the pre-compiled library. For this it is necessary to first build the library binary. To get the latest sources, we use git (https://git-scm.org). With this installed, we use

$ git clone https://github.com/aulib/aulib
$ cd aulib

The AuLib sources include a CMake (https://cmake.org) script that can be used to build and install the library. It requires the cmake program to be installed. On MacOS, this comes as a graphical application, but can also be installed as a command-line program. With this in place, these are the steps to build and install the library:

1. The preferred way to build the library is to do this away from the source tree. For this we can create, at the top-level source directory, a new directory to hold the build, and change to it:

   $ mkdir build
   $ cd build

2. CMake is then run from the build directory, by passing the top-level source directory (..):

   $ cmake ..

   The cmake command allows us to define where we want to install the library. The default is in /usr/local, but we can use the option CMAKE_INSTALL_PREFIX= to change it. For instance, if we want to install in the user directory, we use

   $ cmake .. -DCMAKE_INSTALL_PREFIX=$HOME

   CMake will identify the system and installed toolchain, reporting problems if any components are not installed. Two optional dependencies are Portaudio and Portmidi: if these are installed, then support will be provided for realtime audio and MIDI in the library.

3. Once CMake has configured the build successfully, we can call make to create the library. As we have already seen, make is a software build and maintenance utility provided by the OS:

   $ make

4. To install it, we run

   $ make install

   This installs the library, the headers, and example programs in the requested location under the ./lib, ./include/AuLib, and ./bin directories. The library can then be used like any other in the system.

References

1. Abelson, H., Sussman, G.J.: Structure and Interpretation of Computer Programs, 2nd edn. MIT Press, Cambridge, MA (1996) 2. Abrahams, D., Gurtovoy, A.: C++ Template Metaprogramming: Concepts, Tools, and Techniques from Boost and Beyond (C++ in Depth Series). Addison-Wesley Professional (2004) 3. Apple Inc.: OS X server: Advanced administration (2014). URL http://help.apple.com/ advancedserveradmin/mac/4.0/ 4. Beauchamp, J.: Introduction to MUSIC 4C. School of Music, University of Illinois at UrbanaChampaign (1996) 5. Bencina, R.: Portaudio API, v.19 (2016). URL http://portaudio.com/docs/v19-doxydocs/ 6. Blaauw, G.A., Brooks Jr., F.P.: Computer Architecture: Concepts and Evolution, 1st edn. Addison-Wesley, Boston, MA (1997) 7. Boulanger, R. (ed.): The Csound Book. MIT Press, Cambridge, MA (2000) 8. Bracewell, R.: The Fourier Transform and Its Applications. Electrical Engineering Series. McGraw-Hill, New York (2000) 9. Church, A.: The Calculi of Lambda Conversion. AM-6, Annals of Mathematics Studies. Princeton University Press, Princeton, NJ (1985) 10. Cohen, D.: On holy wars and a plea for peace. IEEE Computer 14(10), 48–54 (1981) 11. Cook, P., Scavone, G.: The Synthesis Toolkit (STK). In: Proceedings of the ICMC 99, vol. III, pp. 164–166. Berlin (1999) 12. Dannenberg, R.: Portmidi API, v.2.2 (2016). URL http://portmedia.sourceforge.net/portmidi/ doxygen/ 13. Dannenberg, R.B., Thompson, N.: Real-time software synthesis on superscalar architectures. Computer Music Journal 21(3), 83–94 (1997). URL http://www.jstor.org/stable/3681016 14. Davis, P.: The JACK audio connection kit (2003). URL http://lac.linuxaudio.org/2003/zkm/ slides/paul davis-jack/title.html 15. Dodge, C., Jerse, T.A.: Computer Music: Synthesis, Composition and Performance, 2nd edn. Schirmer, New York (1997) 16. Dolson, M.: The phase vocoder: A tutorial. Computer Music Journal 10(4), 14–27 (1986). URL http://www.jstor.org/stable/3680093 17. 
Forsyth, R.: Pascal at Work and Play: An Introduction to Computer Programming in Pascal. Springer (1982) 18. Fourier, J.B.: Th´eorie analytique de la chaleur. Chez Firmin Didot, P`ere et fils, Paris (1822) 19. Gardner, W.G.: Efficient convolution without input-output delay. Journal of the Audio Engineering Society 43(3), 127–136 (1995) 20. Huovilainen, A.: Non-linear digital implementation of the Moog ladder filter. In: Proceedings of the 7th International Conference on Digital Audio Effects (DAFx-04), pp. 61–64. Naples, Italy (2004) 21. IEEE: Standard for floating-point arithmetic. IEEE Std 754-2008 pp. 1–70 (2008) © Springer Nature Switzerland AG 2019 V. Lazzarini, Computer Music Instruments II, https://doi.org/10.1007/978-3-030-13712-0

363

364

References

22. IEEE/Open Group: The Open Group base specifications, issue 7. IEEE Std 1003.1-2008 (2016). URL http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/pthread.h.html 23. ISO/IEC: International standard ISO/IEC 14882:2011: Information technology: Programming language C++. ISO Standards pp. 1–1314 (2011). URL https://www.iso.org/standard/50372. html 24. ISO/IEC: ISO international standard ISO/IEC 9899:2011: Information technology: Programming language C. ISO Standards pp. 1–683 (2011). URL https://www.iso.org/standard/57853. html 25. ISO/IEC: International standard ISO/IEC 14882:2014: Information technology: Programming language C++. ISO Standards (2014). URL https://www.iso.org/standard/64029.html 26. ISO/IEC/IEEE: International standard ISO/IEC 9945:2009: Information technology: Portable operating system interface (POSIX) base specifications, issue 7. ISO Standards pp. 1–3718 (2009). URL https://www.iso.org/standard/50516.html 27. Kernighan, B.W., Pike, R.: The UNIX Programming Environment. Prentice Hall Professional Technical Reference (1984) 28. Kernighan, B.W., Ritchie, D.M.: The C Programming Language, 2nd edn. Prentice Hall Professional Technical Reference (1988) 29. Knuth, D.: The Art of Computer Programming 1: Fundamental Algorithms, 3rd edn. AddisonWesley, Menlo Park, CA (1997) 30. Laakso, T.I., V¨alim¨aki, V., Karjalainen, M., Laine, U.K.: Splitting the unit delay — Tools for fractional delay filter design. IEEE Signal Processing Mag. 13(1), 30–60 (1996) 31. Lazzarini, V.: Time-domain signal processing. In: R. Boulanger, V. Lazzarini (eds.) The Audio Programming Book, pp. 463–512. MIT Press, Cambridge, MA 32. Lazzarini, V.: The SndObj sound object library. Organised Sound 1(5), 35–49 (2000) 33. Lazzarini, V.: Spectral audio programming basics: The DFT, the FFT, and convolution. In: R. Boulanger, V. Lazzarini (eds.) The Audio Programming Book, pp. 521–538. MIT Press, Cambridge, MA (2010) 34. 
Lazzarini, V.: The development of computer music programming systems. Journal of New Music Research 42(1), 97–110 (2013) 35. Lazzarini, V.: AuLib documentation, v.1.0 beta (2017). URL http://vlazzarini.github.io/aulib/ 36. Lazzarini, V.: Computer Music Instruments: Foundations, Design and Development. Springer (2017) 37. Lazzarini, V.: Supporting an object-oriented approach to unit generator development: The Csound plugin opcode framework. Applied Sciences 7(10) (2017) 38. Lazzarini, V., Accorsi, F.: Designing a sound object library. In: Proceedings of the XVIII Brazilian Computer Society Conference, vol. III, pp. 95–104. Belo Horizonte (1998) 39. Lazzarini, V., ffitch, J., Yi, S., Heintz, J., Brandtsegg, Ø., McCurdy, I.: Csound: A Sound and Music Computing System. Springer Verlag (2016) 40. Lopo, E.C.: Libsndfile API, v.1.0.27 (2016). URL http://www.mega-nerd.com/libsndfile/api. html 41. Mathews, M.: An acoustical compiler for music and psychological stimuli. Bell System Technical Journal 40(3), 553–557 (1961) 42. Mathews, M.: The digital computer as a musical instrument. Science 183(3592), 553–557 (1963) 43. Mathews, M., Miller, J.E.: MUSIC IV Programmer’s Manual. Bell Telephone Laboratories, Murray Hill, N.J. (1964) 44. Mathews, M., Miller, J.E., Moore, F.R., Pierce, J.R.: The Technology of Computer Music. MIT Press, Cambridge, MA 45. McAulay, R., Quatieri, T.: Speech analysis/synthesis based on a sinusoidal representation. IEEE Transactions on Acoustics, Speech and Signal Processing 34(4), 744–754 (1986) 46. McElhearn, K.: The Mac OS X Command Line: Unix Under the Hood. Wiley, New York, NY (2004) 47. MIDI Manufacturers Association: MIDI 1.0 specification (1983). URL http://www.midi.org 48. Moore, F.R.: Elements of Computer Music. Prentice-Hall, Inc., Upper Saddle River, NJ (1990)

References

365

49. Nyquist, H.: Certain topics in telegraph transmission theory. Transactions of the AIEE 47, 617–644 (1928) 50. Orlarey, Y., Fober, D., Letz., S.: Faust: An efficient functional approach to DSP programming. In: G. Assayag, A. Gerszo (eds.) New Computational Paradigms for Computer Music, pp. 1–33. Edition Delatour (2009) 51. Pampin, J.: ATS: A system for sound analysis transformation and synthesis based on a sinusoidal plus crtitical-band noise model and psychoacoustics. In: Proceedings of the International Computer Music Conference, pp. 402–405. Miami, FL (2004) 52. Park, T.: An interview with Max Mathews. Computer Music Journal 33(3), 9–22 (2009) 53. Pate, S., Bosch, F.V.D.: UNIX Filesystems: Evolution, Design and Implemenation. Wiley, New York, NY (2003) 54. Puckette, M.: The Theory and Technique of Computer Music. World Scientific Publ., New York (2007) 55. Roads, C., Mathews, M.: Interview with Tongues. Computer Music Journal 4(4), pp. 15–22 (1980) 56. Rocher, M.: Introduction to the theory of Fourier’s series. Annals of Mathematics 7(3), 81–152 (1906) 57. Serra, X., Smith, J.: Spectral modeling synthesis: A sound analysis/synthesis based on a deterministic plus stochastic decomposition. Computer Music Journal 14, 12–24 (1990) 58. Shannon, C.E.: Communication in the presence of noise. Proceedings of the Institute of Radio Engineers 37(1), 10–21 (1949) 59. Shotts Jr., W.E.: The Linux Command Line: A Complete Introduction. No Starch Press, San Francisco, CA (2012) 60. Silberschatz, A., Galvin, P.B., Gagne, G.: Operating System Concepts, 8th edn. Wiley, New York, NY (2008) 61. Steiglitz, K.: A Digital Signal Processing Primer, with Applications to Digital Audio and Computer Music. Addison-Wesley Longman, Redwood City, CA (1996) 62. Stroustrup, B.: The C++ Programming Language, 2nd edn. Addison-Wesley (1991) 63. Stroustrup, B.: The C++ Programming Language, 4th edn. Addison-Wesley (2013) 64. 
Timoney, J., Lazzarini, V., Lysaght, T.: New SndObj classes for sinusoidal modelling. In: Proceedings of the 5th Int. Conference on Digital Audio Effects (DAFx-02), pp. 217–221. University of the Federal Armed Forces, Hamburg, Germany (2002) 65. Vercoe, B.: MUSIC 11 Reference Manual. Studio for Experimental Music, MIT (1981) 66. Widrow, B., Koll´ar, I.: Quantization Noise: Roundoff Error in Digital Computation, Signal Processing, Control, and Communications. Cambridge University Press, Cambridge, UK (2008)

