Providence Salumu
A value requires a constructor, plus arguments
struct Val {
unsigned long constrno; /* constructor # */
struct Val *args[]; /* flexible array */
};
[Int], constrno might be 0 for [] and 1 for (:), where [] has 0-sized args and (:) has 2-element argsInt, constrno can be the actual integer, with no argsPoint) constrno not usedargsInts always require chasing a pointerLet's add a level of indirection to describe values
typedef struct Val {
const struct ValInfo *info;
struct Val *args[];
} Val;
struct ValInfo {
struct GCInfo gcInfo; /* for garbage collector */
enum { THUNK, CONSTRNO, FUNC, IND } tag;
union {
Exception *(*thunk) (Val *closure);
unsigned int constrno;
Val *(*func) (const Val *closure, const Val *arg);
};
};
gcInfo says how many Val *s are in args and where they aretag == CONSTRNO means constrno valid, used as on last slidetag == IND means args[0] is an indirect forwarding pointer to another Val and union is unused; useful if size of args growsA Val whose ValInfo has tag == FUNC uses the func field
Val *(*func) (const Val *closure, const Val *arg);closure is the Val whose ValInfo contains func
ValInfo/func can be re-usedarg is the function argumentTo apply function f to argument a, where both are type Val *:
f->info->func (f, a);Top-level bindings don't need closures
addOne :: Int -> Int
addOne x = x + 1
Val for function addOne can have zero-length argsLocal bindings may need environment values in closure
add :: Int -> (Int -> Int)
add n = \m -> addn m
where addn m = n + m
addn onceaddn function (with a different n) for each invocation of addaddn instance is a different Val, but all share same ValInfoargs[0] in each Val to specify value of nA Val with tag == THUNK uses the thunk field in ValInfo
Exception *(*thunk) (Val *closure);
v (turns it into non-thunk) or returns a non-NULL Exception *To evaluate a thunk:
v->info->thunk (v);args?
IND ValInfo tag--Allocate new Val, place indirect forwarding pointer in old ValA possible implementation of forcing that walks IND pointers:
Exception *force (Val **vp)
{
for (;;) {
if ((*vp)->info->tag == IND)
*vp = (*vp)->arg[0].boxed;
else if ((*vp)->info->tag == THUNK) {
Exception *e = (*vp)->info->thunk (*vp);
if (e)
return e;
}
else
return NULL;
}
}Set closure->args to head of list of previously curried args
const3 :: a -> b -> c -> a
const3 a b c = a
ValInfos and 3 functions for const3ValInfo has func = const3_1const3_1 creates Val where arg[0] is first argument (a) and info->func = const3_2const3_2 creates a Val where arg[0] is the second argument (b), arg[1] is closure, and info->func is const3_3const3_3 has access to all arguments and actually implements const3Shared arguments have common arg tails, only evaluated once
let f = const3 (superExpensive 5) -- evaluated once
in (f 1 2, f 3 4)Int has even more overhead
i->info->tag then access i->info->constrValInfo structureIdea: Have special unboxed types that don't use struct Val
union Arg {
struct Val *boxed; /* most values are boxed */
unsigned long unboxed; /* "primitive" values */
};
typedef struct Val {
const struct ValInfo *info;
union Arg *args[]; /* args can be boxed or unboxed */
} Val;
Val * argGCInfo to identify which args are and are not boxed# character--must enable with -XMagicHash optionInt#) and primitive operations on them (+#):browse GHC.Prim" in GHCI2#, 'a'#, 2## (unsigned), 2.0##Int really?
Prelude> :set -XMagicHash
Prelude> :m +GHC.Types GHC.Prim
Prelude GHC.Types GHC.Prim> :i Int
data Int = I# Int# -- Defined in GHC.Types
...
Prelude GHC.Types GHC.Prim> case 1 of I# u -> I# (u +# 2#)
3
Int contain thunk, but avoids pointer dereference once evaluatedCannot instantiate type variables with unboxed types
{-# LANGUAGE MagicHash #-}
import GHC.Prim
data FastPoint = FastPoint Double# Double# -- ok
fp = FastPoint 2.0## 2.0## -- ok
-- Error: can't pass unboxed type to polymorphic function
fp' = FastPoint 2.0## (id 2.0##)
-- Error: can't use unboxed type as type parameter
noInt :: Maybe Int#
noInt = NothingEnforced by making unboxed types a different kind of type
Prelude GHC.Types GHC.Prim> :kind Int#
Int# :: #
#seq revisitedseq :: a -> b -> b
seq a b is forced, then first a is forced, then b is forced and returnedConsider the following code:
infiniteLoop = infiniteLoop :: Char -- loops forever
seqTest1 = infiniteLoop `seq` "Hello" -- loops forever
seqTest2 = str `seq` length str -- returns 6
where str = infiniteLoop:"Hello"
seqTest1 hangs forever, while seqTest2 happily returns 6seq only forces a Val, not the arg fields of the Val
seqTest2's seq forces str's constructor (:), but not the head or tailstr in Weak Head Normal Form (WHNF)seq implementationVal *seq_2 (Val *a, Val *b)
{ /* assume seq_1 put first arg in a */
val = gc_malloc (offsetof (Val, args[2]));
val->info = &seq_info;
val->args[0] = a->args[0];
val->args[1] = b->args[0];
return val;
}
struct ValInfo seq_info = {
some_gcinfo, THUNK, .thunk = &seq_thunk
};
Exception *seq_thunk (Void *c)
{
Exception *e = force (&c->args[0]);
if (!e) {
c->info = &ind_info; /* ValInfo with tag IND */
c->args[0] = c->args[1]; /* forward to b */
}
return e;
}
Recall strictness flag on fields in data declarations
data IntWrapper = IntWrapper !Int
Int has ! before it, meaning it must be strictInt's ValInfo cannot have tag THUNK or INDInt touches only one cache line
data Int = I# Int# has only one constructortag == CONSTRNO, so know what's in ValInfoInt# is unboxedThus, once IntWrapper forced, immediately safe to access Int as
myIntWrapper.arg[0].boxed->arg[0].unboxedInt is not just a number
Int are {0, 1}64 ∪ {⊥}Note 2: !Int not a first-class type, only valid for data fields
data SMaybe a = SJust !a | SNothing -- ok, data field
strictAdd :: !Int -> !Int -> !Int -- error
type StrictMaybeInt = Maybe !Int -- errorcase statements revisitedcase statement pattern matching can force thunks
_ is irrefutablecase
undefined :: a is Prelude symbol with value ⊥, handy for testingf ('a':'b':rest) = rest
f _ = "ok"
test1 = f (undefined:[]) -- error
test2 = f ('a':undefined) -- error
test3 = f ('x':undefined) -- "ok" (didn't force tail)Adding ~ before a pattern makes it irrefutable
three = (\ ~(h:t) -> 3) undefined -- evaluates to 3newtype declarationsdata -- creates a new (boxed) type, adding overhead of a Val wrappertype -- creates an alias for an existing type, with no overheadMeters, Seconds, Grams, all implemented by Doubletype would make them all synonymous, facilitating errorsShow for each, impossible with typedata Meters = Meters Double -- but will add overheadnewtype keyword introduces new type with no overhead
data, but limited to one constructor and one fieldnewtype semanticsWhat's the semantic difference between these two declarations?
newtype NTInt = NTInt Int deriving (Show)
data SInt = SInt !Int deriving (Show)newtype semanticsWhat's the semantic difference between these two declarations?
newtype NTInt = NTInt Int deriving (Show)
data SInt = SInt !Int deriving (Show)NTInt constructor is a "fake" compile-time-only construct
newtype compiles to nothingnewtype NTInt = NTInt Int deriving (Show)
uNTInt = NTInt undefined
testNT = case uNTInt of NTInt _ -> True -- returns True
data SInt = SInt !Int deriving (Show)
uSInt = SInt undefined
testS = case uSInt of SInt _ -> True -- undefinedUNPACK pragmanewtype almost always better than data when it appliesWhat about a multi-field data type?
data TwoInts = TwoInts !Int !Int
CONSTRNO ValInfoInt#s directly into the args of a TwoInts Val?GHC provides an UNPACK pragma to do just this
data TwoInts = TwoInts {-# UNPACK #-} !Int {-# UNPACK #-} !Intnewtype, UNPACK is not always a win
-funbox-strict-fields flag unpacks all strict fields
Ptr a represents pointers to type a
Pointers are not typesafe--allow pointer arithmetic and casting
nullPtr :: Ptr a
plusPtr :: Ptr a -> Int -> Ptr b
minusPtr :: Ptr a -> Ptr b -> Int
castPtr :: Ptr a -> Ptr bClass Storable provides raw access to memory using Ptrs
class Storable a where
sizeOf :: a -> Int
alignment :: a -> Int
peek :: Ptr a -> IO a
poke :: Ptr a -> a -> IO ()
...
Bool, Int, Char, Ptr a, etc.) are StorableallocaEasiest way to get a valid Ptr is alloca:
alloca :: Storable a => (Ptr a -> IO b) -> IO b
aPtr to the spacealloca)allocaBytes :: Int -> (Ptr a -> IO b) -> IO bForeign module provides handy with utility
with :: Storable a => a -> (Ptr a -> IO b) -> IO b
with val f =
alloca $ \ptr -> do
poke ptr val
res <- f ptr
return resStorable typesForeign.C contains wrappers for C types
CInt, CUInt, CChar, CDouble, CIntPtr etc.Data.Int and Data.Word have all sizes of machine integer
Int8, Int16, Int32, Int64 -- signed integersWord8, Word16, Word32, Word64 -- unsigned integersExample: extract all the bytes from a Storable object
toBytes :: (Storable a) => a -> [Word8]
toBytes a = unsafePerformIO $
with a $ \pa -> go (castPtr pa) (pa `plusPtr` sizeOf a)
where go p e | p < e = do b <- peek p
bs <- go (p `plusPtr` 1) e
return (b:bs)
| otherwise = return []
unsafePerformIO might be okay here since toBytes pureplusPtr lets us change from Ptr a to Ptr Word8malloc and mallocForeignPtrCan also allocate longer-lived memory with malloc
malloc :: Storable a => IO (Ptr a)
mallocBytes :: Int -> IO (Ptr a)
free :: Ptr a -> IO ()
realloc :: Storable b => Ptr a -> IO (Ptr b)
reallocBytes :: Ptr a -> Int -> IO (Ptr a)
ForeignPtr lets you delegate deallocation to garbage collector
mallocForeignPtr :: Storable a => IO (ForeignPtr a)
mallocForeignPtrBytes :: Int -> IO (ForeignPtr a)ForeignPtrsForeignPtr, must convert it to Ptr
ForeignPtr in scope when using Ptr?Ptr within function that keeps reference to ForeignPtrwithForeignPtr :: ForeignPtr a -> (Ptr a -> IO b) -> IO bCan also convert Ptrs to ForeignPtrs
type FinalizerPtr a = FunPtr (Ptr a -> IO ())
newForeignPtr :: FinalizerPtr a -> Ptr a
-> IO (ForeignPtr a)
newForeignPtr_ :: Ptr a -> IO (ForeignPtr a)
addForeignPtrFinalizer :: FinalizerPtr a -> ForeignPtr a
-> IO ()
FunPtr -- this is type wrapper for C function pointer
finalizerFree symbol conveniently provides function pointer for freeByteStringsStrings obviously not very efficientStrict ByteStrings efficiently manipulate raw bytes
import qualified Data.ByteString as S
import qualified Data.ByteString.Char8 as S8
S.head, S.tail, S.length, S.foldl, S.cons (like :), S.empty (like []), S.hPut (like hPutStr), S.readFileS.pack and S.unpack translate to/from [Word8]S8 has same functions as S, but uses Char instead of Word8--means you lose upper bits of Char (use toString from utf8-string to avoid loss)Implementation
data ByteString = PS {-# UNPACK #-} !(ForeignPtr Word8)
{-# UNPACK #-} !Int -- offset
{-# UNPACK #-} !Int -- lengthByteStringsSame package implements lazy ByteStrings
import qualified Data.ByteString.Lazy as L
import qualified Data.ByteString.Lazy.Char8 as L8
ByteString modulesS.ByteString and S8.ByteString are the same type (re-exported), and similarly for L.ByteString and L8.ByteStringS.ByteString and L.ByteString not same type, but can convert:fromChunks :: [S.ByteString] -> L.ByteString
toChunks :: L.ByteString -> [S.ByteString]ByteString implementationLazy ByteStrings are implemented in terms of strict ones
data ByteString = Empty
| Chunk {-# UNPACK #-} !S.ByteString ByteString
Chunk's first argument (S.ByteString) never nullByteStringsByteStrings?
S.ByteStrings, but not copy the data they contain)ByteStrings is cheap, reverse is not (so if a library can work efficiently on lazy ByteStrings, good to expose that functionality)