A Wander through GHC’s New IO Library
Simon Marlow
The 100-mile view
• the API changes:
  – Unicode
    • putStr "A légpárnás hajóm tele van angolnákkal" works! (if your editor is set up right…)
    • locale-encoding by default, except for Handles in binary mode (openBinaryFile, hSetBinaryMode)
    • changing the encoding on the fly:

        hSetEncoding :: Handle -> TextEncoding -> IO ()
        hGetEncoding :: Handle -> IO (Maybe TextEncoding)

        data TextEncoding
        latin1, utf8, utf16, utf32, … :: TextEncoding
        mkTextEncoding :: String -> IO TextEncoding
        localeEncoding :: TextEncoding
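The encoding functions above can be exercised with a small round trip. This is a minimal sketch, not part of the talk: `roundTripUtf8` and the temporary file name are hypothetical, and it assumes the `directory` package that ships with GHC is available.

```haskell
import System.IO
import System.Directory (getTemporaryDirectory, removeFile)

-- Hypothetical helper: write a String to a temporary file as UTF-8,
-- then read it back through a Handle that is also set to UTF-8.
roundTripUtf8 :: String -> IO String
roundTripUtf8 s = do
  tmp <- getTemporaryDirectory
  (path, h) <- openTempFile tmp "encoding-demo.txt"
  hSetEncoding h utf8            -- override the locale encoding
  hPutStr h s
  hClose h
  h' <- openFile path ReadMode
  hSetEncoding h' utf8
  s' <- hGetContents h'
  length s' `seq` hClose h'      -- force the lazy read before closing
  removeFile path
  return s'

main :: IO ()
main = do
  -- Shows the encoding currently attached to stdout (Nothing for binary Handles).
  hGetEncoding stdout >>= print
  let s = "A légpárnás hajóm tele van angolnákkal"
  s' <- roundTripUtf8 s
  print (s == s')
```

Setting the encoding explicitly on both Handles keeps the round trip independent of the locale the program happens to run in.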
The 100-mile view (cont.)
• Better newline support
  – teletypes needed both CR+LF to start a new line, and we’ve been paying for it ever since.

    hSetNewlineMode :: Handle -> NewlineMode -> IO ()

    data Newline = LF   {- "\n" -}
                 | CRLF {- "\r\n" -}

    nativeNewline :: Newline

    data NewlineMode = NewlineMode { inputNL  :: Newline,
                                     outputNL :: Newline }

    noNewlineTranslation = NewlineMode { inputNL = LF,   outputNL = LF }
    universalNewlineMode = NewlineMode { inputNL = CRLF, outputNL = nativeNewline }
    nativeNewlineMode    = NewlineMode { inputNL = nativeNewline,
                                         outputNL = nativeNewline }
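To see what the newline modes do to the bytes on disk, here is a minimal sketch. The helper `crlfDemo` and the file name are hypothetical, and the `directory` package bundled with GHC is assumed. It writes through a Handle whose outputNL is CRLF, then reads the file back twice: once untranslated and once with universalNewlineMode.

```haskell
import System.IO
import System.Directory (getTemporaryDirectory, removeFile)

-- Hypothetical helper: returns (raw characters, characters after
-- universal newline translation) for text written with CRLF output.
crlfDemo :: String -> IO (String, String)
crlfDemo s = do
  tmp <- getTemporaryDirectory
  (path, h) <- openTempFile tmp "newline-demo.txt"
  hSetNewlineMode h NewlineMode { inputNL = LF, outputNL = CRLF }
  hPutStr h s
  hClose h
  -- Binary mode: no newline translation, so the "\r\n" pairs are visible.
  hRaw <- openBinaryFile path ReadMode
  raw <- hGetContents hRaw
  length raw `seq` hClose hRaw
  -- Universal mode: "\r\n" on input is translated back to "\n".
  hText <- openFile path ReadMode
  hSetNewlineMode hText universalNewlineMode
  cooked <- hGetContents hText
  length cooked `seq` hClose hText
  removeFile path
  return (raw, cooked)

main :: IO ()
main = do
  (raw, cooked) <- crlfDemo "one\ntwo\n"
  print raw
  print cooked
```

The program itself only ever writes and sees "\n"; the translation happens inside the Handle.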
The 10-mile view
• Unicode codecs:
  – built-in codecs for UTF-8, UTF-16(LE,BE), UTF-32(LE,BE).
  – Other codecs use iconv on Unix systems
  – Built-in codecs only on Windows (no code pages)
    • yet…
  – The pieces for building a codec are provided…
The 10-mile view
• Build your own codec: API in GHC.IO.Encoding

    data BufferCodec from to state = BufferCodec {
      encode   :: Buffer from -> Buffer to -> IO (Buffer from, Buffer to),
      close    :: IO (),
      getState :: IO state,
      setState :: state -> IO ()
    }

    type TextEncoder state = BufferCodec Char Word8 state
    type TextDecoder state = BufferCodec Word8 Char state

    data TextEncoding = forall dstate estate . TextEncoding {
      mkTextDecoder :: IO (TextDecoder dstate),
      mkTextEncoder :: IO (TextEncoder estate)
    }

  – getState/setState: saving and restoring state is important since Handles support buffering, random access, and changing encodings
The 1-mile view
• Make your own Handles!

    mkFileHandle :: (IODevice dev, BufferedIO dev, Typeable dev)
                 => dev -> FilePath -> IOMode
                 -> Maybe TextEncoding -> NewlineMode
                 -> IO Handle

  – IODevice: type class providing I/O device operations: close, seek, getSize, …
  – BufferedIO: type class providing buffered reading/writing
  – Typeable: in case we need to take the Handle apart again later
  – FilePath: for error messages
  – IOMode: ReadMode/WriteMode/…
  – why mkFileHandle, not mkHandle?
IODevice

    -- | I/O operations required for implementing a 'Handle'.
    class IODevice a where
      -- | closes the device. Further operations on the device should
      -- produce exceptions.
      close :: a -> IO ()

      -- | seek to the specified position in the data.
      -- (Default is for the operation to be unsupported.)
      seek :: a -> SeekMode -> Integer -> IO ()
      seek _ _ _ = ioe_unsupportedOperation

      -- | return the current position in the data.
      tell :: a -> IO Integer
      tell _ = ioe_unsupportedOperation

      -- | returns 'True' if the device is a terminal or console.
      isTerminal :: a -> IO Bool
      isTerminal _ = return False

      … etc …
BufferedIO

    class BufferedIO dev where
      newBuffer         :: dev -> BufferState -> IO (Buffer Word8)
      fillReadBuffer    :: dev -> Buffer Word8 -> IO (Int, Buffer Word8)
      fillReadBuffer0   :: dev -> Buffer Word8 -> IO (Maybe Int, Buffer Word8)
      emptyWriteBuffer  :: dev -> Buffer Word8 -> IO (Buffer Word8)
      flushWriteBuffer  :: dev -> Buffer Word8 -> IO (Buffer Word8)
      flushWriteBuffer0 :: dev -> Buffer Word8 -> IO (Int, Buffer Word8)

• Device gets to allocate the buffer. This allows the device to choose the buffer to point directly at the data in memory, for example.
• 0-versions are non-blocking; non-0 versions must read or write at least one byte (but may transfer less than the whole buffer)
RawIO

    -- | A low-level I/O provider where the data is bytes in memory.
    class RawIO a where
      read             :: a -> Ptr Word8 -> Int -> IO Int
      readNonBlocking  :: a -> Ptr Word8 -> Int -> IO (Maybe Int)
      write            :: a -> Ptr Word8 -> Int -> IO ()
      writeNonBlocking :: a -> Ptr Word8 -> Int -> IO Int

    readBuf             :: RawIO dev => dev -> Buffer Word8 -> IO (Int, Buffer Word8)
    readBufNonBlocking  :: RawIO dev => dev -> Buffer Word8 -> IO (Maybe Int, Buffer Word8)
    writeBuf            :: RawIO dev => dev -> Buffer Word8 -> IO ()
    writeBufNonBlocking :: RawIO dev => dev -> Buffer Word8 -> IO (Int, Buffer Word8)
Example: a memory-mapped Handle
• Random-access read/write doesn’t perform very well with ordinary buffered I/O.
  – Let’s implement a Handle backed by a memory-mapped file
  – We need to
    1. define our device type
    2. make it an instance of IODevice and BufferedIO
    3. provide a way to create instances
Example: memory-mapped files
1. Define our device type

    data MemoryMappedFile = MemoryMappedFile {
        mmap_fd     :: FD,             -- ordinary file descriptor, provided by GHC.IO.FD
        mmap_addr   :: !(Ptr Word8),   -- address in memory where our file is mapped,
        mmap_length :: !Int,           --   and its length
        mmap_ptr    :: !(IORef Int)    -- the current file pointer
      }
      deriving Typeable

• Handles have a built-in notion of the “current position” that we have to emulate
• Typeable is one of the requirements for making a Handle
aside: Buffers

    module GHC.IO.Buffer ( Buffer(..), .. ) where

    data Buffer e = Buffer {
        bufRaw   :: !(ForeignPtr e),
        bufState :: BufferState,  -- ReadBuffer | WriteBuffer
        bufSize  :: !Int,         -- in elements, not bytes
        bufL     :: !Int,         -- offset of first item in the buffer
        bufR     :: !Int          -- offset of last item + 1
      }

[Figure: buffer layout, with bufRaw at the start of the storage, the data lying between bufL and bufR, and bufSize spanning the whole buffer]
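The offset arithmetic implied by bufL, bufR and bufSize can be sketched with a pure toy model. Everything here (`ToyBuffer`, `elems`, `available`, `produce`, `consume`) is hypothetical and only imitates the shape of the real Buffer; in particular, resetting both offsets to 0 once the buffer drains is an assumption of this sketch.

```haskell
-- A toy model of a buffer's offsets only (no storage); not GHC.IO.Buffer.
data ToyBuffer = ToyBuffer
  { tbSize :: Int   -- capacity in elements
  , tbL    :: Int   -- offset of the first item
  , tbR    :: Int   -- offset of the last item + 1
  } deriving (Show, Eq)

emptyToy :: Int -> ToyBuffer
emptyToy n = ToyBuffer n 0 0

-- Number of items currently held: everything in [tbL, tbR).
elems :: ToyBuffer -> Int
elems b = tbR b - tbL b

-- Free space after the last item.
available :: ToyBuffer -> Int
available b = tbSize b - tbR b

-- A producer appends at the right, never past the capacity.
produce :: Int -> ToyBuffer -> ToyBuffer
produce n b = b { tbR = tbR b + min n (available b) }

-- A consumer removes from the left; once drained, both offsets return to 0.
consume :: Int -> ToyBuffer -> ToyBuffer
consume n b
  | l' == tbR b = b { tbL = 0, tbR = 0 }
  | otherwise   = b { tbL = l' }
  where l' = tbL b + min n (elems b)

main :: IO ()
main = do
  let b1 = produce 5 (emptyToy 8)
      b2 = consume 2 b1
      b3 = consume 3 b2
  mapM_ print [b1, b2, b3]
```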
Example: memory-mapped files
2. (a) make it an instance of BufferedIO

    instance BufferedIO MemoryMappedFile where
      newBuffer m state = do
        fp <- newForeignPtr_ (mmap_addr m)
        return (emptyBuffer fp (mmap_length m) state)

      -- fillReadBuffer returns the entire file!
      fillReadBuffer m buf = do
        p <- readIORef (mmap_ptr m)
        let l = mmap_length m
        if (p >= l)
           then do return (0, buf{ bufL=p, bufR=p })
           else do writeIORef (mmap_ptr m) l
                   return (l-p, buf{ bufL=p, bufR=l })

      -- flush is a no-op: just remember where to read from next
      flushWriteBuffer m buf = do
        writeIORef (mmap_ptr m) (bufR buf)
        return buf{ bufL = bufR buf }
Example: memory-mapped files
2. (b) make it an instance of IODevice

    instance IODevice MemoryMappedFile where
      close = IODevice.close . mmap_fd

      seek m mode val = do
        let sz = mmap_length m
        ptr <- readIORef (mmap_ptr m)
        let off = case mode of
                    AbsoluteSeek -> fromIntegral val
                    RelativeSeek -> ptr + fromIntegral val
                    SeekFromEnd  -> sz + fromIntegral val
        when (off < 0 || off >= sz) $ ioe_seekOutOfRange
        writeIORef (mmap_ptr m) off

      tell m = do o <- readIORef (mmap_ptr m); return (fromIntegral o)

      getSize = return . fromIntegral . mmap_length

      … etc …
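The offset calculation inside seek can be pulled out as a pure function and tried in isolation. This is a sketch under the same bounds check as the slide (offsets outside 0 .. size-1 are rejected); the name `seekTarget` is hypothetical, and it returns Nothing where the instance above raises ioe_seekOutOfRange.

```haskell
import System.IO (SeekMode(..))

-- Hypothetical pure version of the seek arithmetic:
-- file size, current position, seek mode, requested value.
seekTarget :: Int -> Int -> SeekMode -> Integer -> Maybe Int
seekTarget size pos mode val
  | off < 0 || off >= fromIntegral size = Nothing
  | otherwise                           = Just (fromIntegral off)
  where
    -- Compute in Integer so a huge request cannot wrap around.
    off :: Integer
    off = case mode of
      AbsoluteSeek -> val
      RelativeSeek -> fromIntegral pos + val
      SeekFromEnd  -> fromIntegral size + val

main :: IO ()
main = do
  print (seekTarget 100 10 RelativeSeek 5)    -- Just 15
  print (seekTarget 100 0 SeekFromEnd (-1))   -- Just 99
  print (seekTarget 100 0 AbsoluteSeek 100)   -- Nothing
```

With this check a seek to exactly the end of the file (offset equal to the size) is rejected, which matches the test program later in the talk: it only ever seeks to SeekFromEnd (-n) with n ≥ 1.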
Example: memory-mapped files
3. provide a way to create instances

    mmapFile :: FilePath -> IOMode -> Bool -> IO Handle
    mmapFile filepath iomode binary = do
      -- Open the file and mmap() it
      (fd,_devtype) <- FD.openFile filepath iomode
      sz <- IODevice.getSize fd
      addr <- c_mmap nullPtr (fromIntegral sz) prot flags (FD.fdFD fd) 0
      ptr <- newIORef 0
      let m = MemoryMappedFile { mmap_fd = fd,
                                 mmap_addr = castPtr addr,
                                 mmap_length = fromIntegral sz,
                                 mmap_ptr = ptr }

      -- Call mkFileHandle to build the Handle
      let (encoding, newline)
            | binary    = (Nothing, noNewlineTranslation)
            | otherwise = (Just localeEncoding, nativeNewlineMode)
      mkFileHandle m filepath iomode encoding newline
Demo…

    $ ./Setup configure
    Configuring mmap-handle-0.0...
    $ ./Setup build
    Preprocessing library mmap-handle-0.0...
    Building mmap-handle-0.0...
    [1 of 1] Compiling System.Posix.IO.MMap ( dist/build/System/Posix/IO/MMap.hs, dist/build/System/Posix/IO/MMap.o )
    Registering mmap-handle-0.0...
    $ ./Setup register --inplace --user
    Registering mmap-handle-0.0...
    $ ghc-pkg list --user
    /home/simonmar/.ghc/x86_64-linux-6.11.20090816/package.conf.d:
        mmap-handle-0.0
Demo…

    $ cat test.hs
    import System.IO
    import System.Posix.IO.MMap
    import System.Environment
    import Data.Char

    main = do
      [file,test] <- getArgs
      h <- if test == "mmap"
              then mmapFile file ReadWriteMode True
              else openBinaryFile file ReadWriteMode
      sequence_ [ do hSeek h SeekFromEnd (-n)
                     c <- hGetChar h
                     hSeek h AbsoluteSeek n
                     hPutChar h c
                | n <- [ 1..10000] ]
      hClose h
      putStrLn "done"

    $ ghc test.hs --make
    [1 of 1] Compiling Main             ( test.hs, test.o )
    Linking test ...
Timings…

    $ time ./test /tmp/words file
    done
    0.24s real 0.14s user 0.10s system 99% ./test /tmp/words file

    $ time ./test /tmp/words mmap
    done
    0.09s real 0.09s user 0.00s system 99% ./test /tmp/words mmap

    $ time ./test ./words file     # ./ is NFS-mounted
    done
    10.44s real 0.20s user 0.52s system 6% ./test tmp file

    $ time ./test ./words mmap     # ./ is NFS-mounted
    done
    0.10s real 0.09s user 0.00s system 93% ./test tmp mmap
More examples
• A Handle that pipes output bytes to a Chan
• Handles backed by Win32 HANDLEs
• Handle that reads from a ByteString/text
• Handle that reads from text
The -1 mile view
• Inside the IO library
  – The file-descriptor functionality is cleanly separated from the implementation of Handles:
    • GHC.IO.FD implements file descriptors, with instances of IODevice and BufferedIO
    • GHC.IO.Handle.FD defines openFile, using FDs as the underlying device
    • GHC.IO.Handle has nothing to do with FDs
Implementation of Handle

    data Handle__
      = forall dev enc_state dec_state .
          (IODevice dev, BufferedIO dev, Typeable dev) =>
        Handle__ {
          haDevice     :: !dev,
          haType       :: HandleType,   -- read/write/append etc.
          haByteBuffer :: !(IORef (Buffer Word8)),
          haCharBuffer :: !(IORef (Buffer CharBufElem)),
          haEncoder    :: Maybe (TextEncoder enc_state),
          haDecoder    :: Maybe (TextDecoder dec_state),
          haCodec      :: Maybe TextEncoding,
          haInputNL    :: Newline,
          haOutputNL   :: Newline,
          .. some other things ..
        }
        deriving Typeable

• Existential: packs up the IODevice, BufferedIO, Typeable dictionaries, and codec state is existentially quantified
• Two buffers: one for bytes, one for Chars.
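The existential packaging, and the role Typeable plays in taking a Handle apart again, can be shown at toy scale. This sketch is not the real Handle__: `ToyDevice`, `FileDev`, `MemDev`, `ToyHandle`, `describe` and `asFileDev` are all hypothetical names, and it assumes a GHC recent enough to derive Typeable instances automatically.

```haskell
{-# LANGUAGE ExistentialQuantification #-}
import Data.Typeable (Typeable, cast)

-- A stand-in for the IODevice class, with a single operation.
class ToyDevice d where
  deviceName :: d -> String

newtype FileDev = FileDev FilePath
newtype MemDev  = MemDev Int

instance ToyDevice FileDev where
  deviceName (FileDev p) = "file:" ++ p

instance ToyDevice MemDev where
  deviceName (MemDev n) = "mem:" ++ show n

-- The device type is hidden; only the class dictionaries travel with it.
data ToyHandle = forall dev . (ToyDevice dev, Typeable dev) => ToyHandle dev

-- Works for any device, using only the packed ToyDevice dictionary.
describe :: ToyHandle -> String
describe (ToyHandle d) = deviceName d

-- Typeable lets us recover the concrete device when it is the type we expect.
asFileDev :: ToyHandle -> Maybe FilePath
asFileDev (ToyHandle d) = case cast d of
  Just (FileDev p) -> Just p
  Nothing          -> Nothing

main :: IO ()
main = do
  let hs = [ToyHandle (FileDev "/tmp/words"), ToyHandle (MemDev 4096)]
  mapM_ (putStrLn . describe) hs
  print (map asFileDev hs)
```

Both handles share one type and can sit in the same list, yet the file-backed one can still be recognised and unpacked.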
Where to go from here
• This is a step in the right direction, but there is still some obvious ugliness
  – We haven’t changed the external API, only added to it
  – There should be a binary I/O layer
    • hPutBuf working on Handles is wrong: binary Handles should have a different type
    • in a sense, BufferedIO is a binary I/O layer: it is efficient, but inconvenient
  – FilePath should be an abstract type.
    • On Windows, FilePath = String, but on Unix, FilePath = [Word8].
  – Should we rethink Handles entirely?
    • OO-style layers: binary IO, buffering, encoding
    • Separate read Handles from write Handles?
      – read/write Handles are a pain