Providence Salumu
A value requires a constructor, plus arguments
struct Val {
unsigned long constrno; /* constructor # */
struct Val *args[]; /* flexible array */
};
[Int]
, constrno
might be 0 for []
and 1 for (:)
, where []
has 0-sized args
and (:)
has 2-element args
Int
, constrno
can be the actual integer, with no args
Point
) constrno
not usedargs
Int
s always require chasing a pointerLet's add a level of indirection to describe values
typedef struct Val {
const struct ValInfo *info;
struct Val *args[];
} Val;
struct ValInfo {
struct GCInfo gcInfo; /* for garbage collector */
enum { THUNK, CONSTRNO, FUNC, IND } tag;
union {
Exception *(*thunk) (Val *closure);
unsigned int constrno;
Val *(*func) (const Val *closure, const Val *arg);
};
};
gcInfo
says how many Val *
s are in args
and where they aretag == CONSTRNO
means constrno
valid, used as on last slidetag == IND
means args[0]
is an indirect forwarding pointer to another Val
and union is unused; useful if size of args
growsA Val
whose ValInfo
has tag == FUNC
uses the func
field
Val *(*func) (const Val *closure, const Val *arg);
closure
is the Val
whose ValInfo
contains func
ValInfo
/func
can be re-usedarg
is the function argumentTo apply function f
to argument a
, where both are type Val *
:
f->info->func (f, a);
Top-level bindings don't need closures
addOne :: Int -> Int
addOne x = x + 1
Val
for function addOne
can have zero-length args
Local bindings may need environment values in closure
add :: Int -> (Int -> Int)
add n = \m -> addn m
where addn m = n + m
addn
onceaddn
function (with a different n
) for each invocation of add
addn
instance is a different Val
, but all share same ValInfo
args[0]
in each Val
to specify value of n
A Val
with tag == THUNK
uses the thunk
field in ValInfo
Exception *(*thunk) (Val *closure);
v
(turns it into non-thunk) or returns a non-NULL
Exception *
To evaluate a thunk:
v->info->thunk (v);
args
?
IND
ValInfo
tag--Allocate new Val
, place indirect forwarding pointer in old Val
A possible implementation of forcing that walks IND
pointers:
Exception *force (Val **vp)
{
for (;;) {
if ((*vp)->info->tag == IND)
*vp = (*vp)->arg[0].boxed;
else if ((*vp)->info->tag == THUNK) {
Exception *e = (*vp)->info->thunk (*vp);
if (e)
return e;
}
else
return NULL;
}
}
Set closure->args
to head of list of previously curried args
const3 :: a -> b -> c -> a
const3 a b c = a
ValInfo
s and 3 functions for const3
ValInfo
has func = const3_1
const3_1
creates Val
where arg[0]
is first argument (a
) and info->func = const3_2
const3_2
creates a Val
where arg[0]
is the second argument (b
), arg[1]
is closure
, and info->func
is const3_3
const3_3
has access to all arguments and actually implements const3
Shared arguments have common arg tails, only evaluated once
let f = const3 (superExpensive 5) -- evaluated once
in (f 1 2, f 3 4)
Int
has even more overhead
i->info->tag
then access i->info->constr
ValInfo
structureIdea: Have special unboxed types that don't use struct Val
union Arg {
struct Val *boxed; /* most values are boxed */
unsigned long unboxed; /* "primitive" values */
};
typedef struct Val {
const struct ValInfo *info;
union Arg *args[]; /* args can be boxed or unboxed */
} Val;
Val *
argGCInfo
to identify which args are and are not boxed#
character--must enable with -XMagicHash
optionInt#
) and primitive operations on them (+#
):browse GHC.Prim
" in GHCI2#
, 'a'#
, 2##
(unsigned), 2.0##
Int
really?
Prelude> :set -XMagicHash
Prelude> :m +GHC.Types GHC.Prim
Prelude GHC.Types GHC.Prim> :i Int
data Int = I# Int# -- Defined in GHC.Types
...
Prelude GHC.Types GHC.Prim> case 1 of I# u -> I# (u +# 2#)
3
Int
contain thunk, but avoids pointer dereference once evaluatedCannot instantiate type variables with unboxed types
{-# LANGUAGE MagicHash #-}
import GHC.Prim
data FastPoint = FastPoint Double# Double# -- ok
fp = FastPoint 2.0## 2.0## -- ok
-- Error: can't pass unboxed type to polymorphic function
fp' = FastPoint 2.0## (id 2.0##)
-- Error: can't use unboxed type as type parameter
noInt :: Maybe Int#
noInt = Nothing
Enforced by making unboxed types a different kind of type
Prelude GHC.Types GHC.Prim> :kind Int#
Int# :: #
#
seq
revisitedseq :: a -> b -> b
seq a b
is forced, then first a
is forced, then b
is forced and returnedConsider the following code:
infiniteLoop = infiniteLoop :: Char -- loops forever
seqTest1 = infiniteLoop `seq` "Hello" -- loops forever
seqTest2 = str `seq` length str -- returns 6
where str = infiniteLoop:"Hello"
seqTest1
hangs forever, while seqTest2
happily returns 6seq
only forces a Val
, not the arg
fields of the Val
seqTest2
's seq
forces str
's constructor (:)
, but not the head or tailstr
in Weak Head Normal Form (WHNF)seq
implementationVal *seq_2 (Val *a, Val *b)
{ /* assume seq_1 put first arg in a */
val = gc_malloc (offsetof (Val, args[2]));
val->info = &seq_info;
val->args[0] = a->args[0];
val->args[1] = b->args[0];
return val;
}
struct ValInfo seq_info = {
some_gcinfo, THUNK, .thunk = &seq_thunk
};
Exception *seq_thunk (Void *c)
{
Exception *e = force (&c->args[0]);
if (!e) {
c->info = &ind_info; /* ValInfo with tag IND */
c->args[0] = c->args[1]; /* forward to b */
}
return e;
}
Recall strictness flag on fields in data declarations
data IntWrapper = IntWrapper !Int
Int
has !
before it, meaning it must be strictInt
's ValInfo
cannot have tag
THUNK
or IND
Int
touches only one cache line
data Int = I# Int#
has only one constructortag == CONSTRNO
, so know what's in ValInfo
Int#
is unboxedThus, once IntWrapper
forced, immediately safe to access Int
as
myIntWrapper.arg[0].boxed->arg[0].unboxed
Int
is not just a number
Int
are {0, 1}64 ∪ {⊥}Note 2: !Int
not a first-class type, only valid for data
fields
data SMaybe a = SJust !a | SNothing -- ok, data field
strictAdd :: !Int -> !Int -> !Int -- error
type StrictMaybeInt = Maybe !Int -- error
case
statements revisitedcase
statement pattern matching can force thunks
_
is irrefutablecase
undefined :: a
is Prelude
symbol with value ⊥, handy for testingf ('a':'b':rest) = rest
f _ = "ok"
test1 = f (undefined:[]) -- error
test2 = f ('a':undefined) -- error
test3 = f ('x':undefined) -- "ok" (didn't force tail)
Adding ~
before a pattern makes it irrefutable
three = (\ ~(h:t) -> 3) undefined -- evaluates to 3
newtype
declarationsdata
-- creates a new (boxed) type, adding overhead of a Val
wrappertype
-- creates an alias for an existing type, with no overheadMeters
, Seconds
, Grams
, all implemented by Double
type
would make them all synonymous, facilitating errorsShow
for each, impossible with type
data Meters = Meters Double
-- but will add overheadnewtype
keyword introduces new type with no overhead
data
, but limited to one constructor and one fieldnewtype
semanticsWhat's the semantic difference between these two declarations?
newtype NTInt = NTInt Int deriving (Show)
data SInt = SInt !Int deriving (Show)
newtype
semanticsWhat's the semantic difference between these two declarations?
newtype NTInt = NTInt Int deriving (Show)
data SInt = SInt !Int deriving (Show)
NTInt
constructor is a "fake" compile-time-only construct
newtype
compiles to nothingnewtype NTInt = NTInt Int deriving (Show)
uNTInt = NTInt undefined
testNT = case uNTInt of NTInt _ -> True -- returns True
data SInt = SInt !Int deriving (Show)
uSInt = SInt undefined
testS = case uSInt of SInt _ -> True -- undefined
UNPACK
pragmanewtype
almost always better than data
when it appliesWhat about a multi-field data type?
data TwoInts = TwoInts !Int !Int
CONSTRNO
ValInfo
Int#
s directly into the args
of a TwoInts
Val
?GHC provides an UNPACK
pragma to do just this
data TwoInts = TwoInts {-# UNPACK #-} !Int {-# UNPACK #-} !Int
newtype
, UNPACK
is not always a win
-funbox-strict-fields
flag unpacks all strict fields
Ptr a
represents pointers to type a
Pointers are not typesafe--allow pointer arithmetic and casting
nullPtr :: Ptr a
plusPtr :: Ptr a -> Int -> Ptr b
minusPtr :: Ptr a -> Ptr b -> Int
castPtr :: Ptr a -> Ptr b
Class Storable
provides raw access to memory using Ptr
s
class Storable a where
sizeOf :: a -> Int
alignment :: a -> Int
peek :: Ptr a -> IO a
poke :: Ptr a -> a -> IO ()
...
Bool
, Int
, Char
, Ptr a
, etc.) are Storable
alloca
Easiest way to get a valid Ptr
is alloca
:
alloca :: Storable a => (Ptr a -> IO b) -> IO b
a
Ptr
to the spacealloca
)allocaBytes :: Int -> (Ptr a -> IO b) -> IO b
Foreign
module provides handy with
utility
with :: Storable a => a -> (Ptr a -> IO b) -> IO b
with val f =
alloca $ \ptr -> do
poke ptr val
res <- f ptr
return res
Storable
typesForeign.C
contains wrappers for C types
CInt
, CUInt
, CChar
, CDouble
, CIntPtr
etc.Data.Int
and Data.Word
have all sizes of machine integer
Int8
, Int16
, Int32
, Int64
-- signed integersWord8
, Word16
, Word32
, Word64
-- unsigned integersExample: extract all the bytes from a Storable
object
toBytes :: (Storable a) => a -> [Word8]
toBytes a = unsafePerformIO $
with a $ \pa -> go (castPtr pa) (pa `plusPtr` sizeOf a)
where go p e | p < e = do b <- peek p
bs <- go (p `plusPtr` 1) e
return (b:bs)
| otherwise = return []
unsafePerformIO
might be okay here since toBytes
pureplusPtr
lets us change from Ptr a
to Ptr Word8
malloc
and mallocForeignPtr
Can also allocate longer-lived memory with malloc
malloc :: Storable a => IO (Ptr a)
mallocBytes :: Int -> IO (Ptr a)
free :: Ptr a -> IO ()
realloc :: Storable b => Ptr a -> IO (Ptr b)
reallocBytes :: Ptr a -> Int -> IO (Ptr a)
ForeignPtr
lets you delegate deallocation to garbage collector
mallocForeignPtr :: Storable a => IO (ForeignPtr a)
mallocForeignPtrBytes :: Int -> IO (ForeignPtr a)
ForeignPtr
sForeignPtr
, must convert it to Ptr
ForeignPtr
in scope when using Ptr
?Ptr
within function that keeps reference to ForeignPtr
withForeignPtr :: ForeignPtr a -> (Ptr a -> IO b) -> IO b
Can also convert Ptr
s to ForeignPtr
s
type FinalizerPtr a = FunPtr (Ptr a -> IO ())
newForeignPtr :: FinalizerPtr a -> Ptr a
-> IO (ForeignPtr a)
newForeignPtr_ :: Ptr a -> IO (ForeignPtr a)
addForeignPtrFinalizer :: FinalizerPtr a -> ForeignPtr a
-> IO ()
FunPtr
-- this is type wrapper for C function pointer
finalizerFree
symbol conveniently provides function pointer for free
ByteString
sString
s obviously not very efficientStrict ByteString
s efficiently manipulate raw bytes
import qualified Data.ByteString as S
import qualified Data.ByteString.Char8 as S8
S.head
, S.tail
, S.length
, S.foldl
, S.cons
(like :
), S.empty
(like []
), S.hPut
(like hPutStr
), S.readFile
S.pack
and S.unpack
translate to/from [Word8]
S8
has same functions as S
, but uses Char
instead of Word8
--means you lose upper bits of Char
(use toString
from utf8-string to avoid loss)Implementation
data ByteString = PS {-# UNPACK #-} !(ForeignPtr Word8)
{-# UNPACK #-} !Int -- offset
{-# UNPACK #-} !Int -- length
ByteString
sSame package implements lazy ByteString
s
import qualified Data.ByteString.Lazy as L
import qualified Data.ByteString.Lazy.Char8 as L8
ByteString
modulesS.ByteString
and S8.ByteString
are the same type (re-exported), and similarly for L.ByteString
and L8.ByteString
S.ByteString
and L.ByteString
not same type, but can convert:fromChunks :: [S.ByteString] -> L.ByteString
toChunks :: L.ByteString -> [S.ByteString]
ByteString
implementationLazy ByteString
s are implemented in terms of strict ones
data ByteString = Empty
| Chunk {-# UNPACK #-} !S.ByteString ByteString
Chunk
's first argument (S.ByteString
) never null
ByteString
sByteString
s?
S.ByteString
s, but not copy the data they contain)ByteString
s is cheap, reverse is not (so if a library can work efficiently on lazy ByteString
s, good to expose that functionality)