A Wander through GHC’s New IO library

Simon Marlow

The 100-mile view

• The API changes:
  – Unicode
    • putStr "A légpárnás hajóm tele van angolnákkal" works! (if your editor is set up right…)
    • locale encoding by default, except for Handles in binary mode (openBinaryFile, hSetBinaryMode)
    • changing the encoding on the fly:

    hSetEncoding :: Handle -> TextEncoding -> IO ()
    hGetEncoding :: Handle -> IO (Maybe TextEncoding)

    data TextEncoding
    latin1, utf8, utf16, utf32, … :: TextEncoding
    mkTextEncoding :: String -> IO TextEncoding
    localeEncoding :: TextEncoding
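As a rough illustration of the encoding API listed above, the sketch below pins a Handle to UTF-8 with hSetEncoding rather than relying on the locale, and checks that a binary-mode Handle reports no encoding. The helper names (writeUtf8, readUtf8, binaryHasNoEncoding) are hypothetical and not part of System.IO.

```haskell
import System.IO

-- | Write a String to a file as UTF-8, whatever the locale encoding is.
-- (writeUtf8 is an illustrative helper, not a library function.)
writeUtf8 :: FilePath -> String -> IO ()
writeUtf8 path s = withFile path WriteMode $ \h -> do
  hSetEncoding h utf8
  hPutStr h s

-- | Read a whole file back, decoding it as UTF-8.
readUtf8 :: FilePath -> IO String
readUtf8 path = withFile path ReadMode $ \h -> do
  hSetEncoding h utf8
  s <- hGetContents h
  -- Force the lazy result before withFile closes the Handle.
  length s `seq` return s

-- | Binary-mode Handles have no encoding, so hGetEncoding gives Nothing.
binaryHasNoEncoding :: FilePath -> IO Bool
binaryHasNoEncoding path = withBinaryFile path ReadMode $ \h -> do
  enc <- hGetEncoding h
  return (maybe True (const False) enc)
```

Setting the encoding explicitly on each Handle keeps the result independent of the environment in which the program runs.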

The 100-mile view (cont.)

• Better newline support
  – teletypes needed both CR+LF to start a new line, and we’ve been paying for it ever since.

    hSetNewlineMode :: Handle -> NewlineMode -> IO ()

    data Newline = LF   {- "\n" -}
                 | CRLF {- "\r\n" -}

    nativeNewline :: Newline

    data NewlineMode = NewlineMode { inputNL  :: Newline,
                                     outputNL :: Newline }

    noNewlineTranslation = NewlineMode { inputNL = LF,   outputNL = LF }
    universalNewlineMode = NewlineMode { inputNL = CRLF, outputNL = nativeNewline }
    nativeNewlineMode    = NewlineMode { inputNL = nativeNewline,
                                         outputNL = nativeNewline }
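To make the newline modes concrete, here is a small sketch that writes a file with CRLF line endings on any platform, then reads it back both raw and through universalNewlineMode. The helper names (writeCRLF, readUniversal, readRaw) are made up for this illustration.

```haskell
import System.IO

-- | Write lines with CRLF terminators, independent of the platform.
writeCRLF :: FilePath -> [String] -> IO ()
writeCRLF path ls = withFile path WriteMode $ \h -> do
  hSetNewlineMode h NewlineMode { inputNL = LF, outputNL = CRLF }
  mapM_ (hPutStrLn h) ls

-- | Read a file, accepting both "\n" and "\r\n" as line endings.
readUniversal :: FilePath -> IO String
readUniversal path = withFile path ReadMode $ \h -> do
  hSetNewlineMode h universalNewlineMode
  s <- hGetContents h
  length s `seq` return s   -- force before the Handle is closed

-- | Read the raw bytes (as Chars) with no newline translation.
readRaw :: FilePath -> IO String
readRaw path = withBinaryFile path ReadMode $ \h -> do
  s <- hGetContents h
  length s `seq` return s
```

Under these assumptions, writeCRLF "f" ["a","b"] leaves the bytes "a\r\nb\r\n" on disk, while readUniversal "f" hands the program "a\nb\n".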

The 10-mile view

• Unicode codecs:
  – Built-in codecs for UTF-8, UTF-16 (LE, BE), UTF-32 (LE, BE).
  – Other codecs use iconv on Unix systems
  – Built-in codecs only on Windows (no code pages)
    • yet…
  – The pieces for building a codec are provided…
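A brief sketch of selecting one of the built-in codecs by name with mkTextEncoding. The helpers writeWithEncoding and fileSizeBytes are hypothetical; the encoding names used in the note below ("UTF-8", "UTF-16LE", "UTF-32LE") are ones I believe GHC handles with its built-in codecs, so no iconv should be needed for them.

```haskell
import System.IO

-- | Encode a String to a file using an encoding looked up by name.
-- mkTextEncoding throws an exception if the name is not recognised.
writeWithEncoding :: String -> FilePath -> String -> IO ()
writeWithEncoding name path s = do
  enc <- mkTextEncoding name
  withFile path WriteMode $ \h -> do
    hSetEncoding h enc
    hSetNewlineMode h noNewlineTranslation   -- keep byte counts predictable
    hPutStr h s

-- | Size of a file in bytes, measured through a binary-mode Handle.
fileSizeBytes :: FilePath -> IO Integer
fileSizeBytes path = withBinaryFile path ReadMode hFileSize
```

Writing "abc" this way should take 3 bytes as UTF-8, 6 as UTF-16LE and 12 as UTF-32LE, since the LE/BE variants do not emit a byte-order mark.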

The 10-mile view

• Build your own codec: API in GHC.IO.Encoding

    data BufferCodec from to state = BufferCodec {
        encode   :: Buffer from -> Buffer to -> IO (Buffer from, Buffer to),
        close    :: IO (),
        getState :: IO state,
        setState :: state -> IO ()
      }

    type TextEncoder state = BufferCodec Char Word8 state
    type TextDecoder state = BufferCodec Word8 Char state

    data TextEncoding = forall dstate estate . TextEncoding {
        mkTextDecoder :: IO (TextDecoder dstate),
        mkTextEncoder :: IO (TextEncoder estate)
      }

• Saving and restoring state (getState, setState) is important since Handles support buffering, random access, and changing encodings.

The 1-mile view

• Make your own Handles!

    mkFileHandle :: (IODevice dev, BufferedIO dev, Typeable dev)
                 => dev
                 -> FilePath
                 -> IOMode
                 -> Maybe TextEncoding
                 -> NewlineMode
                 -> IO Handle

  – IODevice: type class providing I/O device operations: close, seek, getSize, …
  – BufferedIO: type class providing buffered reading/writing
  – Typeable: in case we need to take the Handle apart again later
  – FilePath: for error messages
  – IOMode: ReadMode/WriteMode/…
  – why mkFileHandle, not mkHandle?

IODevice

    -- | I/O operations required for implementing a 'Handle'.
    class IODevice a where
      -- | closes the device.  Further operations on the device should
      -- produce exceptions.
      close :: a -> IO ()

      -- | seek to the specified position in the data.
      seek :: a -> SeekMode -> Integer -> IO ()
      seek _ _ _ = ioe_unsupportedOperation

      -- | return the current position in the data.
      tell :: a -> IO Integer
      tell _ = ioe_unsupportedOperation

      -- | returns 'True' if the device is a terminal or console.
      isTerminal :: a -> IO Bool
      isTerminal _ = return False

      … etc …

• Default is for the operation to be unsupported.
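The "unsupported by default" pattern can be sketched with a toy class. This is deliberately not the real IODevice: ToyDevice, its dev* methods, Sink, Cursor and the unsupported helper are all invented for the illustration, and unsupported stands in for ioe_unsupportedOperation.

```haskell
import Control.Exception (IOException, try)
import Data.IORef
import System.IO (SeekMode(..))

-- | A toy stand-in for IODevice (NOT the real class): only devClose is
-- mandatory; the other methods default to "unsupported" or a safe answer.
class ToyDevice a where
  devClose :: a -> IO ()

  devSeek :: a -> SeekMode -> Integer -> IO ()
  devSeek _ _ _ = unsupported

  devTell :: a -> IO Integer
  devTell _ = unsupported

  devIsTerminal :: a -> IO Bool
  devIsTerminal _ = return False

-- | Stand-in for ioe_unsupportedOperation.
unsupported :: IO a
unsupported = ioError (userError "unsupported operation")

-- | A device with no notion of position: the defaults suffice.
newtype Sink = Sink (IORef Bool)   -- records whether it has been closed

instance ToyDevice Sink where
  devClose (Sink r) = writeIORef r True

-- | A device that tracks a position, so it overrides seek and tell.
newtype Cursor = Cursor (IORef Integer)

instance ToyDevice Cursor where
  devClose _ = return ()
  devSeek (Cursor r) AbsoluteSeek n = writeIORef r n
  devSeek (Cursor r) RelativeSeek n = modifyIORef r (+ n)
  devSeek _          SeekFromEnd  _ = unsupported  -- this toy has no size
  devTell (Cursor r) = readIORef r

-- | Run an action and report whether it raised an IOException.
failsWithIOError :: IO a -> IO Bool
failsWithIOError act = do
  r <- try act
  case r of
    Left e  -> const (return True) (e :: IOException)
    Right _ -> return False
```

A device author then writes only the operations the device can really perform; anything else fails with a clear error instead of silently misbehaving.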

BufferedIO

    class BufferedIO dev where
      newBuffer         :: dev -> BufferState -> IO (Buffer Word8)
      fillReadBuffer    :: dev -> Buffer Word8 -> IO (Int, Buffer Word8)
      fillReadBuffer0   :: dev -> Buffer Word8 -> IO (Maybe Int, Buffer Word8)
      emptyWriteBuffer  :: dev -> Buffer Word8 -> IO (Buffer Word8)
      flushWriteBuffer  :: dev -> Buffer Word8 -> IO (Buffer Word8)
      flushWriteBuffer0 :: dev -> Buffer Word8 -> IO (Int, Buffer Word8)

• Device gets to allocate the buffer. This allows the device to choose the buffer to point directly at the data in memory, for example.
• 0-versions are non-blocking; non-0 versions must read or write at least one byte (but may transfer less than the whole buffer).

RawIO

    -- | A low-level I/O provider where the data is bytes in memory.
    class RawIO a where
      read             :: a -> Ptr Word8 -> Int -> IO Int
      readNonBlocking  :: a -> Ptr Word8 -> Int -> IO (Maybe Int)
      write            :: a -> Ptr Word8 -> Int -> IO ()
      writeNonBlocking :: a -> Ptr Word8 -> Int -> IO Int

    readBuf             :: RawIO dev => dev -> Buffer Word8 -> IO (Int, Buffer Word8)
    readBufNonBlocking  :: RawIO dev => dev -> Buffer Word8 -> IO (Maybe Int, Buffer Word8)
    writeBuf            :: RawIO dev => dev -> Buffer Word8 -> IO ()
    writeBufNonBlocking :: RawIO dev => dev -> Buffer Word8 -> IO (Int, Buffer Word8)

Example: a memory-mapped Handle

• Random-access read/write doesn’t perform very well with ordinary buffered I/O.
  – Let’s implement a Handle backed by a memory-mapped file
  – We need to
    1. define our device type
    2. make it an instance of IODevice and BufferedIO
    3. provide a way to create instances

Example: memory-mapped files

1. Define our device type

    data MemoryMappedFile = MemoryMappedFile {
        mmap_fd     :: FD,            -- ordinary file descriptor, provided by GHC.IO.FD
        mmap_addr   :: !(Ptr Word8),  -- address in memory where our file is mapped,
        mmap_length :: !Int,          -- and its length
        mmap_ptr    :: !(IORef Int)   -- the current file pointer
      }
      deriving Typeable

• The current file pointer: Handles have a built-in notion of the “current position” that we have to emulate.
• Typeable is one of the requirements for making a Handle.

aside: Buffers

    module GHC.IO.Buffer ( Buffer(..), .. ) where

    data Buffer e = Buffer {
        bufRaw   :: !(ForeignPtr e),
        bufState :: BufferState,  -- ReadBuffer | WriteBuffer
        bufSize  :: !Int,         -- in elements, not bytes
        bufL     :: !Int,         -- offset of first item in the buffer
        bufR     :: !Int          -- offset of last item + 1
      }

[Figure: buffer layout, with labels bufRaw, bufL, Data, bufR, bufSize]
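To show how the bufL and bufR offsets relate to the data, here is a pure toy model of a buffer. It is only a sketch: ToyBuffer, elems, available, contents and consume are hypothetical names, the storage is a list rather than a ForeignPtr, and resetting an emptied buffer to offsets 0/0 is a choice made for this sketch.

```haskell
-- | A toy, pure model of the offsets in a Buffer (not GHC's type).
data ToyBuffer e = ToyBuffer
  { tbData :: [e]   -- the whole storage; length equals tbSize
  , tbSize :: Int   -- capacity in elements
  , tbL    :: Int   -- offset of first valid item
  , tbR    :: Int   -- offset of last valid item + 1
  } deriving (Eq, Show)

-- | Number of valid elements waiting in the buffer.
elems :: ToyBuffer e -> Int
elems b = tbR b - tbL b

-- | Free space remaining after the valid data.
available :: ToyBuffer e -> Int
available b = tbSize b - tbR b

-- | The valid elements themselves: storage positions tbL up to tbR - 1.
contents :: ToyBuffer e -> [e]
contents b = take (elems b) (drop (tbL b) (tbData b))

-- | Consume up to n elements from the front by advancing tbL.
-- An emptied buffer is reset to offsets 0/0 so its space can be reused.
consume :: Int -> ToyBuffer e -> ([e], ToyBuffer e)
consume n b =
  let k   = max 0 (min n (elems b))
      out = take k (contents b)
      l'  = tbL b + k
      b'  | l' == tbR b = b { tbL = 0, tbR = 0 }
          | otherwise   = b { tbL = l' }
  in (out, b')
```

Reading only moves tbL and writing only moves tbR, so neither side needs to copy the underlying storage.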

Example: memory-mapped files

2. (a) make it an instance of BufferedIO

    instance BufferedIO MemoryMappedFile where
      newBuffer m state = do
        fp <- newForeignPtr_ (mmap_addr m)
        return (emptyBuffer fp (mmap_length m) state)

      -- fillReadBuffer returns the entire file!
      fillReadBuffer m buf = do
        p <- readIORef (mmap_ptr m)
        let l = mmap_length m
        if (p >= l)
           then do return (0, buf{ bufL=p, bufR=p })
           else do writeIORef (mmap_ptr m) l
                   return (l-p, buf{ bufL=p, bufR=l })

      -- flush is a no-op: just remember where to read from next
      flushWriteBuffer m buf = do
        writeIORef (mmap_ptr m) (bufR buf)
        return buf{ bufL = bufR buf }

Example: memory-mapped files

2. (b) make it an instance of IODevice

    instance IODevice MemoryMappedFile where
      close = IODevice.close . mmap_fd

      seek m mode val = do
        let sz = mmap_length m
        ptr <- readIORef (mmap_ptr m)
        let off = case mode of
                    AbsoluteSeek -> fromIntegral val
                    RelativeSeek -> ptr + fromIntegral val
                    SeekFromEnd  -> sz + fromIntegral val
        when (off < 0 || off >= sz) $ ioe_seekOutOfRange
        writeIORef (mmap_ptr m) off

      tell m = do o <- readIORef (mmap_ptr m); return (fromIntegral o)

      getSize = return . fromIntegral . mmap_length

      … etc …
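The offset arithmetic inside seek can be isolated as a pure function, which makes the three SeekMode cases easy to check by hand. This is a sketch only: seekOffset is a hypothetical helper, it returns Nothing where the slide code raises ioe_seekOutOfRange, and it does the arithmetic in Integer before narrowing, which is a small departure from the slide.

```haskell
import System.IO (SeekMode(..))

-- | Compute the new offset for a seek on a fixed-size mapping, or Nothing
-- if it falls outside [0, size).  Like the slide's check, this also
-- rejects seeking to exactly the end of the file.
seekOffset :: Int        -- ^ size of the mapping in bytes
           -> Int        -- ^ current offset
           -> SeekMode
           -> Integer    -- ^ requested value
           -> Maybe Int
seekOffset sz cur mode val
  | off < 0 || off >= toInteger sz = Nothing
  | otherwise                      = Just (fromInteger off)
  where
    off = case mode of
      AbsoluteSeek -> val
      RelativeSeek -> toInteger cur + val
      SeekFromEnd  -> toInteger sz + val
```

For a 10-byte mapping with the current offset at 3, an absolute seek to 5 and a relative seek by 2 both land on offset 5, and SeekFromEnd with -1 lands on offset 9, the last byte.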

Example: memory-mapped files

3. provide a way to create instances

    mmapFile :: FilePath -> IOMode -> Bool -> IO Handle
    mmapFile filepath iomode binary = do
      -- Open the file and mmap() it
      (fd,_devtype) <- FD.openFile filepath iomode
      sz <- IODevice.getSize fd
      addr <- c_mmap nullPtr (fromIntegral sz) prot flags (FD.fdFD fd) 0
      ptr <- newIORef 0
      let m = MemoryMappedFile {
                mmap_fd     = fd,
                mmap_addr   = castPtr addr,
                mmap_length = fromIntegral sz,
                mmap_ptr    = ptr }

      -- Call mkFileHandle to build the Handle
      let (encoding, newline)
            | binary    = (Nothing, noNewlineTranslation)
            | otherwise = (Just localeEncoding, nativeNewlineMode)
      mkFileHandle m filepath iomode encoding newline

Demo…

    $ ./Setup configure
    Configuring mmap-handle-0.0...
    $ ./Setup build
    Preprocessing library mmap-handle-0.0...
    Building mmap-handle-0.0...
    [1 of 1] Compiling System.Posix.IO.MMap ( dist/build/System/Posix/IO/MMap.hs, dist/build/System/Posix/IO/MMap.o )
    Registering mmap-handle-0.0...
    $ ./Setup register --inplace --user
    Registering mmap-handle-0.0...
    $ ghc-pkg list --user
    /home/simonmar/.ghc/x86_64-linux-6.11.20090816/package.conf.d:
        mmap-handle-0.0

Demo…

    $ cat test.hs
    import System.IO
    import System.Posix.IO.MMap
    import System.Environment
    import Data.Char

    main = do
      [file,test] <- getArgs
      h <- if test == "mmap"
              then mmapFile file ReadWriteMode True
              else openBinaryFile file ReadWriteMode
      sequence_ [ do hSeek h SeekFromEnd (-n)
                     c <- hGetChar h
                     hSeek h AbsoluteSeek n
                     hPutChar h c
                | n <- [ 1..10000] ]
      hClose h
      putStrLn "done"
    $ ghc test.hs --make
    [1 of 1] Compiling Main             ( test.hs, test.o )
    Linking test ...

Timings…

    $ time ./test /tmp/words file
    done
    0.24s real 0.14s user 0.10s system 99% ./test /tmp/words file
    $ time ./test /tmp/words mmap
    done
    0.09s real 0.09s user 0.00s system 99% ./test /tmp/words mmap
    $ time ./test ./words file    # ./ is NFS-mounted
    done
    10.44s real 0.20s user 0.52s system 6% ./test tmp file
    $ time ./test ./words mmap    # ./ is NFS-mounted
    done
    0.10s real 0.09s user 0.00s system 93% ./test tmp mmap

More examples

• A Handle that pipes output bytes to a Chan
• Handles backed by Win32 HANDLEs
• A Handle that reads from a ByteString or Text

The -1 mile view

• Inside the IO library
  – The file-descriptor functionality is cleanly separated from the implementation of Handles:
    • GHC.IO.FD implements file descriptors, with instances of IODevice and BufferedIO
    • GHC.IO.Handle.FD defines openFile, using FDs as the underlying device
    • GHC.IO.Handle has nothing to do with FDs

Implementation of Handle

    data Handle__
      = forall dev enc_state dec_state .
          (IODevice dev, BufferedIO dev, Typeable dev) =>
        Handle__ {
          haDevice     :: !dev,
          haType       :: HandleType,  -- read/write/append etc.
          haByteBuffer :: !(IORef (Buffer Word8)),
          haCharBuffer :: !(IORef (Buffer CharBufElem)),
          haEncoder    :: Maybe (TextEncoder enc_state),
          haDecoder    :: Maybe (TextDecoder dec_state),
          haCodec      :: Maybe TextEncoding,
          haInputNL    :: Newline,
          haOutputNL   :: Newline,
          .. some other things ..
        }
        deriving Typeable

• Existential: packs up the IODevice, BufferedIO, Typeable dictionaries, and codec state is existentially quantified.
• Two buffers: one for bytes, one for Chars.
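The role of the existential and of Typeable can be sketched with a toy handle type. None of this is GHC's code: Describe, FileDev, MemDev, ToyHandle, showHandle and handleToFd are hypothetical names, with Describe standing in for the IODevice and BufferedIO dictionaries. The sketch assumes a GHC recent enough to provide Typeable instances automatically.

```haskell
{-# LANGUAGE ExistentialQuantification #-}
import Data.Typeable (Typeable, cast)

-- | A toy device class standing in for IODevice/BufferedIO.
class Describe dev where
  describe :: dev -> String

newtype FileDev = FileDev Int     -- pretend file descriptor
newtype MemDev  = MemDev String   -- pretend in-memory contents

instance Describe FileDev where
  describe (FileDev fd) = "fd " ++ show fd

instance Describe MemDev where
  describe (MemDev s) = "memory: " ++ s

-- | Like Handle__, this hides the device type and keeps only the class
-- dictionaries that were packed up when the value was built.
data ToyHandle = forall dev . (Describe dev, Typeable dev) => ToyHandle String dev

-- | Works for any device, via the packed-up Describe dictionary.
showHandle :: ToyHandle -> String
showHandle (ToyHandle name dev) = name ++ " (" ++ describe dev ++ ")"

-- | Take the handle apart again: succeeds only if the device is a FileDev.
handleToFd :: ToyHandle -> Maybe Int
handleToFd (ToyHandle _ dev) = case cast dev of
  Just (FileDev fd) -> Just fd
  Nothing           -> Nothing
```

Without the Typeable constraint, code holding a ToyHandle could only use the class methods; with it, a function such as handleToFd can recover the concrete device when it happens to be the expected type.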

Where to go from here

• This is a step in the right direction, but there is still some obvious ugliness
  – We haven’t changed the external API, only added to it
  – There should be a binary I/O layer
    • hPutBuf working on Handles is wrong: binary Handles should have a different type
    • in a sense, BufferedIO is a binary I/O layer: it is efficient, but inconvenient
  – FilePath should be an abstract type.
    • On Windows, FilePath = String, but on Unix, FilePath = [Word8].
  – Should we rethink Handles entirely?
    • OO-style layers: binary IO, buffering, encoding
    • Separate read Handles from write Handles?
      – read/write Handles are a pain
