In the spirit of Huffman codes, that is, optimal lossless encoding, here is the shortest implementation of Huffman code generation I could come up with in Haskell. The signature is
huffman :: Eq a => [(a, Int)] -> [(a, String)]
where the argument is a list of (element, frequency) pairs. It returns a list of (element, binary encoding).
v1
This requires these imports:
-- Imports
import Data.List (sortBy)
import Data.Ord (comparing)
And here is the main program
data Tree a = Leaf a Int | Node Int (Tree a) (Tree a) deriving (Show)
freq (Leaf _ f) = f
freq (Node f _ _) = f
merge t1 t2 = Node (freq t1 + freq t2) t1 t2
decode t x = (x, decode' "" t)
where
decode' acc (Leaf y _) = if x == y then reverse acc else ""
decode' acc (Node _ l r) = (decode' ('0':acc) l) ++ (decode' ('1':acc) r)
combine [x] = x
combine xs = let (t1:t2:rest) = sortBy (comparing freq) xs in combine ((merge t1 t2):rest)
huffman xs = map ((decode (combine (map (uncurry Leaf) xs))) . fst) xs
This is suboptimal since we are sorting on every iteration, when a priority queue would be sufficient. I just didn’t want to use external libraries.
If we completely forget readability, and get rid of generic type a:
huffman :: [(Char, Int)] -> [(Char, String)]
287 bytes:
data T=L Char Int|N Int T T
f(L _ w)=w;f(N w _ _)=w
m t1 t2=N(f t1+f t2)t1 t2
d t x=(x,s"" t)where s a(L y _)=if x==y then reverse a else"";s a(N _ l r)=s('0':a)l++s('1':a)r
c[x]=x;c x=let(t1:t2:r)=sortBy(\a b->f a`compare`f b)x in c(m t1 t2:r)
huffman x=map((d(c(map(uncurry L)x))).fst)x
v2
data T=L Char|N T T
huffman x=[(c,e t c"")|(c,_)<-x]where
t=f$foldr ins[][(n,L c)|(c,n)<-x]
f[(n,t)]=t
f((n1,t1):(n2,t2):l)=f$ins(n1+n2,N t1 t2)l
ins n[]=[n]
ins n l@(n':ls)|fst n<=fst n'=n:l|1>0=n':ins n ls
e(L c)c' s=if c==c' then s else""
e(N l r)c s=e l c(s++"0")++e r c(s++"1")