1: 
2: 
3: 
4: 
5: 
6: 
7: 
8: 
(*** raw ***)
---
layout: post
title:  "Finding anagrams with F# with FSharp.Literate"
description: "Finding all anagrams of a word in a dictionary."
date:   2020-06-25 06:07:00 +0100
categories: 
---

Anagrams

Recently someone asked me how to find all anagrams of a given word contained in a dictionary, i.e. a file containing a large numnber of words, one per line.

While I have an implementation in Python, I thought I'd try it in F#.

The basic way to solve this problem, is to find a means to determine a key/hash for each word and then storing it in a dictionary or map, along with the anagrams that match the given key.

One can use a fancy method to determine the key, e.g. create a prime number hash, based on position of each letter in the alphabet, but for now, it's as easy to sort each word, as all anagrams of a given word, sort to the same string.

This can be done easily in F#, as all the magic happens inside Array.groupBy.
This takes the array of words read from the file and creates a dictionary where the key is the sorted word (all anagrams sort to the same string) and the value is an array of all words with the same key (i.e. anagrams). They dictionary is then converted to an Sequence.

I don't believe F# has it's own sort string function, so you have to write your own.

 1: 
 2: 
 3: 
 4: 
 5: 
 6: 
 7: 
 8: 
 9: 
10: 
11: 
12: 
13: 
14: 
15: 
16: 
17: 
18: 
open System.IO

let rec listToString l =
    match l with
    | [] -> ""
    | head :: tail -> head.ToString() + listToString tail

let sortString s =
    s |> Seq.sort |> Seq.toList |> listToString

let fileName = @"C:\Users\ioan_\GitHub\furry-palm-tree\words.txt"

fileName
|> File.ReadAllLines
|> Seq.groupBy sortString
|> Seq.sortByDescending (snd >> Seq.length)
|> Seq.take 10
|> Seq.iter (fun (k, vs) -> vs |> String.concat "," |> printfn "%s : %s" k)
namespace System
namespace System.IO
val listToString: l: 'a list -> string
val l: 'a list
val head: 'a
val tail: 'a list
System.Object.ToString() : string
val sortString: s: seq<'a> -> string (requires comparison)
val s: seq<'a> (requires comparison)
module Seq from Microsoft.FSharp.Collections
<summary>Contains operations for working with values of type <see cref="T:Microsoft.FSharp.Collections.seq`1" />.</summary>
val sort: source: seq<'T> -> seq<'T> (requires comparison)
<summary>Yields a sequence ordered by keys.</summary>
<remarks>This function returns a sequence that digests the whole initial sequence as soon as that sequence is iterated. As a result this function should not be used with large or infinite sequences. The function makes no assumption on the ordering of the original sequence and uses a stable sort, that is the original order of equal elements is preserved.</remarks>
<param name="source">The input sequence.</param>
<returns>The result sequence.</returns>
<exception cref="T:System.ArgumentNullException">Thrown when the input sequence is null.</exception>
<example id="sort-1"><code lang="fsharp"> let input = seq { 8; 4; 3; 1; 6; 1 } Seq.sort input </code> Evaluates to a sequence yielding the same results as <c>seq { 1; 1 3; 4; 6; 8 }</c>. </example>
val toList: source: seq<'T> -> 'T list
<summary>Builds a list from the given collection.</summary>
<param name="source">The input sequence.</param>
<returns>The result list.</returns>
<exception cref="T:System.ArgumentNullException">Thrown when the input sequence is null.</exception>
<example id="tolist-1"><code lang="fsharp"> let inputs = seq { 1; 2; 5 } inputs |&gt; Seq.toList </code> Evaluates to <c>[ 1; 2; 5 ]</c>. </example>
val fileName: string
type File = static member AppendAllLines: path: string * contents: IEnumerable<string> -> unit + 1 overload static member AppendAllLinesAsync: path: string * contents: IEnumerable<string> * encoding: Encoding * ?cancellationToken: CancellationToken -> Task + 1 overload static member AppendAllText: path: string * contents: string -> unit + 1 overload static member AppendAllTextAsync: path: string * contents: string * encoding: Encoding * ?cancellationToken: CancellationToken -> Task + 1 overload static member AppendText: path: string -> StreamWriter static member Copy: sourceFileName: string * destFileName: string -> unit + 1 overload static member Create: path: string -> FileStream + 2 overloads static member CreateSymbolicLink: path: string * pathToTarget: string -> FileSystemInfo static member CreateText: path: string -> StreamWriter static member Decrypt: path: string -> unit ...
<summary>Provides static methods for the creation, copying, deletion, moving, and opening of a single file, and aids in the creation of <see cref="T:System.IO.FileStream" /> objects.</summary>
File.ReadAllLines(path: string) : string array
File.ReadAllLines(path: string, encoding: System.Text.Encoding) : string array
val groupBy: projection: ('T -> 'Key) -> source: seq<'T> -> seq<'Key * seq<'T>> (requires equality)
<summary>Applies a key-generating function to each element of a sequence and yields a sequence of unique keys. Each unique key contains a sequence of all elements that match to this key.</summary>
<remarks>This function returns a sequence that digests the whole initial sequence as soon as that sequence is iterated. As a result this function should not be used with large or infinite sequences. The function makes no assumption on the ordering of the original sequence.</remarks>
<param name="projection">A function that transforms an element of the sequence into a comparable key.</param>
<param name="source">The input sequence.</param>
<returns>The result sequence.</returns>
<example id="group-by-1"><code lang="fsharp"> let inputs = [1; 2; 3; 4; 5] inputs |&gt; Seq.groupBy (fun n -&gt; n % 2) </code> Evaluates to a sequence yielding the same results as <c>seq { (1, seq { 1; 3; 5 }); (0, seq { 2; 4 }) }</c></example>
val sortByDescending: projection: ('T -> 'Key) -> source: seq<'T> -> seq<'T> (requires comparison)
<summary>Applies a key-generating function to each element of a sequence and yield a sequence ordered descending by keys. The keys are compared using generic comparison as implemented by <see cref="M:Microsoft.FSharp.Core.Operators.compare" />.</summary>
<remarks>This function returns a sequence that digests the whole initial sequence as soon as that sequence is iterated. As a result this function should not be used with large or infinite sequences. The function makes no assumption on the ordering of the original sequence. This is a stable sort, that is the original order of equal elements is preserved.</remarks>
<param name="projection">A function to transform items of the input sequence into comparable keys.</param>
<param name="source">The input sequence.</param>
<returns>The result sequence.</returns>
<exception cref="T:System.ArgumentNullException">Thrown when the input sequence is null.</exception>
<example id="sortbydescending-1"><code lang="fsharp"> let input = ["a"; "bbb"; "cccc"; "dd"] input |&gt; Seq.sortByDescending (fun s -&gt; s.Length) </code> Evaluates to a sequence yielding the same results as <c>seq { "cccc"; "bbb"; "dd"; "a" }</c>. </example>
val snd: tuple: ('T1 * 'T2) -> 'T2
<summary>Return the second element of a tuple, <c>snd (a,b) = b</c>.</summary>
<param name="tuple">The input tuple.</param>
<returns>The second value.</returns>
<example id="snd-example"><code lang="fsharp"> snd ("first", 2) // Evaluates to 2 </code></example>
val length: source: seq<'T> -> int
<summary>Returns the length of the sequence</summary>
<param name="source">The input sequence.</param>
<returns>The length of the sequence.</returns>
<exception cref="T:System.ArgumentNullException">Thrown when the input sequence is null.</exception>
<example id="item-1"><code lang="fsharp"> let inputs = ["a"; "b"; "c"] inputs |&gt; Seq.length </code> Evaluates to <c>3</c></example>
val take: count: int -> source: seq<'T> -> seq<'T>
<summary>Returns the first N elements of the sequence.</summary>
<remarks>Throws <c>InvalidOperationException</c> if the count exceeds the number of elements in the sequence. <c>Seq.truncate</c> returns as many items as the sequence contains instead of throwing an exception.</remarks>
<param name="count">The number of items to take.</param>
<param name="source">The input sequence.</param>
<returns>The result sequence.</returns>
<exception cref="T:System.ArgumentNullException">Thrown when the input sequence is null.</exception>
<exception cref="T:System.ArgumentException">Thrown when the input sequence is empty and the count is greater than zero.</exception>
<exception cref="T:System.InvalidOperationException">Thrown when count exceeds the number of elements in the sequence.</exception>
<example id="take-1"><code lang="fsharp"> let inputs = ["a"; "b"; "c"; "d"] inputs |&gt; Seq.take 2 </code> Evaluates to a sequence yielding the same results as <c>["a"; "b"]</c></example>
<example id="take-2"><code lang="fsharp"> let inputs = ["a"; "b"; "c"; "d"] inputs |&gt; Seq.take 6 </code> Throws <c>InvalidOperationException</c>. </example>
<example id="take-3"><code lang="fsharp"> let inputs = ["a"; "b"; "c"; "d"] inputs |&gt; Seq.take 0 </code> Evaluates to a sequence yielding no results. </example>
val iter: action: ('T -> unit) -> source: seq<'T> -> unit
<summary>Applies the given function to each element of the collection.</summary>
<param name="action">A function to apply to each element of the sequence.</param>
<param name="source">The input sequence.</param>
<exception cref="T:System.ArgumentNullException">Thrown when the input sequence is null.</exception>
<example id="iter-1"><code lang="fsharp"> ["a"; "b"; "c"] |&gt; Seq.iter (printfn "%s") </code> Evaluates to <c>unit</c> and prints <code> a b c </code> in the console. </example>
val k: string
val vs: seq<string>
module String from Microsoft.FSharp.Core
<summary>Functional programming operators for string processing. Further string operations are available via the member functions on strings and other functionality in <a href="http://msdn2.microsoft.com/en-us/library/system.string.aspx">System.String</a> and <a href="http://msdn2.microsoft.com/library/system.text.regularexpressions.aspx">System.Text.RegularExpressions</a> types. </summary>
<category>Strings and Text</category>
val concat: sep: string -> strings: seq<string> -> string
<summary>Returns a new string made by concatenating the given strings with separator <c>sep</c>, that is <c>a1 + sep + ... + sep + aN</c>.</summary>
<param name="sep">The separator string to be inserted between the strings of the input sequence.</param>
<param name="strings">The sequence of strings to be concatenated.</param>
<returns>A new string consisting of the concatenated strings separated by the separation string.</returns>
<exception cref="T:System.ArgumentNullException">Thrown when <c>strings</c> is null.</exception>
<example id="concat-1"><code lang="fsharp"> let input1 = ["Stefan"; "says:"; "Hello"; "there!"] input1 |&gt; String.concat " " // evaluates "Stefan says: Hello there!" let input2 = [0..9] |&gt; List.map string input2 |&gt; String.concat "" // evaluates "0123456789" input2 |&gt; String.concat ", " // evaluates "0, 1, 2, 3, 4, 5, 6, 7, 8, 9" let input3 = ["No comma"] input3 |&gt; String.concat "," // evaluates "No comma" </code></example>
val printfn: format: Printf.TextWriterFormat<'T> -> 'T
<summary>Print to <c>stdout</c> using the given format, and add a newline.</summary>
<param name="format">The formatter.</param>
<returns>The formatted result.</returns>
<example>See <c>Printf.printfn</c> (link: <see cref="M:Microsoft.FSharp.Core.PrintfModule.PrintFormatLine``1" />) for examples.</example>