1:
2:
3:
4:
5:
6:
7:
8:
|
(*** raw ***)
---
layout: post
title: "Finding anagrams with F# with FSharp.Literate"
description: "Finding all anagrams of a word in a dictionary."
date: 2020-06-25 06:07:00 +0100
categories:
---
|
Anagrams
Recently someone asked me how to find all anagrams of a given word contained in a dictionary,
i.e. a file containing a large numnber of words, one per line.
While I have an implementation in Python, I thought I'd try it in F#.
The basic way to solve this problem, is to find a means to determine a key/hash for each
word and then storing it in a dictionary or map, along with the anagrams that match the
given key.
One can use a fancy method to determine the key, e.g. create a prime number hash, based on
position of each letter in the alphabet, but for now, it's as easy to sort each word, as all
anagrams of a given word, sort to the same string.
This can be done easily in F#, as all the magic happens inside Array.groupBy.
This takes the array of words read from the file and creates a dictionary where the key is
the sorted word (all anagrams sort to the same string) and the value is an array of all
words with the same key (i.e. anagrams). They dictionary is then converted to an Sequence.
I don't believe F# has it's own sort string function, so you have to write your own.
1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
13:
14:
15:
16:
17:
18:
|
open System.IO
let rec listToString l =
match l with
| [] -> ""
| head :: tail -> head.ToString() + listToString tail
let sortString s =
s |> Seq.sort |> Seq.toList |> listToString
let fileName = @"C:\Users\ioan_\GitHub\furry-palm-tree\words.txt"
fileName
|> File.ReadAllLines
|> Seq.groupBy sortString
|> Seq.sortByDescending (snd >> Seq.length)
|> Seq.take 10
|> Seq.iter (fun (k, vs) -> vs |> String.concat "," |> printfn "%s : %s" k)
|
namespace System
namespace System.IO
val listToString: l: 'a list -> string
val l: 'a list
val head: 'a
val tail: 'a list
System.Object.ToString() : string
val sortString: s: seq<'a> -> string (requires comparison)
val s: seq<'a> (requires comparison)
module Seq
from Microsoft.FSharp.Collections
<summary>Contains operations for working with values of type <see cref="T:Microsoft.FSharp.Collections.seq`1" />.</summary>
val sort: source: seq<'T> -> seq<'T> (requires comparison)
<summary>Yields a sequence ordered by keys.</summary>
<remarks>This function returns a sequence that digests the whole initial sequence as soon as
that sequence is iterated. As a result this function should not be used with
large or infinite sequences.
The function makes no assumption on the ordering of the original
sequence and uses a stable sort, that is the original order of equal elements is preserved.</remarks>
<param name="source">The input sequence.</param>
<returns>The result sequence.</returns>
<exception cref="T:System.ArgumentNullException">Thrown when the input sequence is null.</exception>
<example id="sort-1"><code lang="fsharp">
let input = seq { 8; 4; 3; 1; 6; 1 }
Seq.sort input
</code>
Evaluates to a sequence yielding the same results as <c>seq { 1; 1 3; 4; 6; 8 }</c>.
</example>
val toList: source: seq<'T> -> 'T list
<summary>Builds a list from the given collection.</summary>
<param name="source">The input sequence.</param>
<returns>The result list.</returns>
<exception cref="T:System.ArgumentNullException">Thrown when the input sequence is null.</exception>
<example id="tolist-1"><code lang="fsharp">
let inputs = seq { 1; 2; 5 }
inputs |> Seq.toList
</code>
Evaluates to <c>[ 1; 2; 5 ]</c>.
</example>
val fileName: string
type File =
static member AppendAllLines: path: string * contents: IEnumerable<string> -> unit + 1 overload
static member AppendAllLinesAsync: path: string * contents: IEnumerable<string> * encoding: Encoding * ?cancellationToken: CancellationToken -> Task + 1 overload
static member AppendAllText: path: string * contents: string -> unit + 1 overload
static member AppendAllTextAsync: path: string * contents: string * encoding: Encoding * ?cancellationToken: CancellationToken -> Task + 1 overload
static member AppendText: path: string -> StreamWriter
static member Copy: sourceFileName: string * destFileName: string -> unit + 1 overload
static member Create: path: string -> FileStream + 2 overloads
static member CreateSymbolicLink: path: string * pathToTarget: string -> FileSystemInfo
static member CreateText: path: string -> StreamWriter
static member Decrypt: path: string -> unit
...
<summary>Provides static methods for the creation, copying, deletion, moving, and opening of a single file, and aids in the creation of <see cref="T:System.IO.FileStream" /> objects.</summary>
File.ReadAllLines(path: string) : string array
File.ReadAllLines(path: string, encoding: System.Text.Encoding) : string array
val groupBy: projection: ('T -> 'Key) -> source: seq<'T> -> seq<'Key * seq<'T>> (requires equality)
<summary>Applies a key-generating function to each element of a sequence and yields a sequence of
unique keys. Each unique key contains a sequence of all elements that match
to this key.</summary>
<remarks>This function returns a sequence that digests the whole initial sequence as soon as
that sequence is iterated. As a result this function should not be used with
large or infinite sequences. The function makes no assumption on the ordering of the original
sequence.</remarks>
<param name="projection">A function that transforms an element of the sequence into a comparable key.</param>
<param name="source">The input sequence.</param>
<returns>The result sequence.</returns>
<example id="group-by-1"><code lang="fsharp">
let inputs = [1; 2; 3; 4; 5]
inputs |> Seq.groupBy (fun n -> n % 2)
</code>
Evaluates to a sequence yielding the same results as <c>seq { (1, seq { 1; 3; 5 }); (0, seq { 2; 4 }) }</c></example>
val sortByDescending: projection: ('T -> 'Key) -> source: seq<'T> -> seq<'T> (requires comparison)
<summary>Applies a key-generating function to each element of a sequence and yield a sequence ordered
descending by keys. The keys are compared using generic comparison as implemented by <see cref="M:Microsoft.FSharp.Core.Operators.compare" />.</summary>
<remarks>This function returns a sequence that digests the whole initial sequence as soon as
that sequence is iterated. As a result this function should not be used with
large or infinite sequences. The function makes no assumption on the ordering of the original
sequence.
This is a stable sort, that is the original order of equal elements is preserved.</remarks>
<param name="projection">A function to transform items of the input sequence into comparable keys.</param>
<param name="source">The input sequence.</param>
<returns>The result sequence.</returns>
<exception cref="T:System.ArgumentNullException">Thrown when the input sequence is null.</exception>
<example id="sortbydescending-1"><code lang="fsharp">
let input = ["a"; "bbb"; "cccc"; "dd"]
input |> Seq.sortByDescending (fun s -> s.Length)
</code>
Evaluates to a sequence yielding the same results as <c>seq { "cccc"; "bbb"; "dd"; "a" }</c>.
</example>
val snd: tuple: ('T1 * 'T2) -> 'T2
<summary>Return the second element of a tuple, <c>snd (a,b) = b</c>.</summary>
<param name="tuple">The input tuple.</param>
<returns>The second value.</returns>
<example id="snd-example"><code lang="fsharp">
snd ("first", 2) // Evaluates to 2
</code></example>
val length: source: seq<'T> -> int
<summary>Returns the length of the sequence</summary>
<param name="source">The input sequence.</param>
<returns>The length of the sequence.</returns>
<exception cref="T:System.ArgumentNullException">Thrown when the input sequence is null.</exception>
<example id="item-1"><code lang="fsharp">
let inputs = ["a"; "b"; "c"]
inputs |> Seq.length
</code>
Evaluates to <c>3</c></example>
val take: count: int -> source: seq<'T> -> seq<'T>
<summary>Returns the first N elements of the sequence.</summary>
<remarks>Throws <c>InvalidOperationException</c>
if the count exceeds the number of elements in the sequence. <c>Seq.truncate</c>
returns as many items as the sequence contains instead of throwing an exception.</remarks>
<param name="count">The number of items to take.</param>
<param name="source">The input sequence.</param>
<returns>The result sequence.</returns>
<exception cref="T:System.ArgumentNullException">Thrown when the input sequence is null.</exception>
<exception cref="T:System.ArgumentException">Thrown when the input sequence is empty and the count is greater than zero.</exception>
<exception cref="T:System.InvalidOperationException">Thrown when count exceeds the number of elements
in the sequence.</exception>
<example id="take-1"><code lang="fsharp">
let inputs = ["a"; "b"; "c"; "d"]
inputs |> Seq.take 2
</code>
Evaluates to a sequence yielding the same results as <c>["a"; "b"]</c></example>
<example id="take-2"><code lang="fsharp">
let inputs = ["a"; "b"; "c"; "d"]
inputs |> Seq.take 6
</code>
Throws <c>InvalidOperationException</c>.
</example>
<example id="take-3"><code lang="fsharp">
let inputs = ["a"; "b"; "c"; "d"]
inputs |> Seq.take 0
</code>
Evaluates to a sequence yielding no results.
</example>
val iter: action: ('T -> unit) -> source: seq<'T> -> unit
<summary>Applies the given function to each element of the collection.</summary>
<param name="action">A function to apply to each element of the sequence.</param>
<param name="source">The input sequence.</param>
<exception cref="T:System.ArgumentNullException">Thrown when the input sequence is null.</exception>
<example id="iter-1"><code lang="fsharp">
["a"; "b"; "c"] |> Seq.iter (printfn "%s")
</code>
Evaluates to <c>unit</c> and prints
<code>
a
b
c
</code>
in the console.
</example>
val k: string
val vs: seq<string>
module String
from Microsoft.FSharp.Core
<summary>Functional programming operators for string processing. Further string operations
are available via the member functions on strings and other functionality in
<a href="http://msdn2.microsoft.com/en-us/library/system.string.aspx">System.String</a>
and <a href="http://msdn2.microsoft.com/library/system.text.regularexpressions.aspx">System.Text.RegularExpressions</a> types.
</summary>
<category>Strings and Text</category>
val concat: sep: string -> strings: seq<string> -> string
<summary>Returns a new string made by concatenating the given strings
with separator <c>sep</c>, that is <c>a1 + sep + ... + sep + aN</c>.</summary>
<param name="sep">The separator string to be inserted between the strings
of the input sequence.</param>
<param name="strings">The sequence of strings to be concatenated.</param>
<returns>A new string consisting of the concatenated strings separated by
the separation string.</returns>
<exception cref="T:System.ArgumentNullException">Thrown when <c>strings</c> is null.</exception>
<example id="concat-1"><code lang="fsharp">
let input1 = ["Stefan"; "says:"; "Hello"; "there!"]
input1 |> String.concat " " // evaluates "Stefan says: Hello there!"
let input2 = [0..9] |> List.map string
input2 |> String.concat "" // evaluates "0123456789"
input2 |> String.concat ", " // evaluates "0, 1, 2, 3, 4, 5, 6, 7, 8, 9"
let input3 = ["No comma"]
input3 |> String.concat "," // evaluates "No comma"
</code></example>
val printfn: format: Printf.TextWriterFormat<'T> -> 'T
<summary>Print to <c>stdout</c> using the given format, and add a newline.</summary>
<param name="format">The formatter.</param>
<returns>The formatted result.</returns>
<example>See <c>Printf.printfn</c> (link: <see cref="M:Microsoft.FSharp.Core.PrintfModule.PrintFormatLine``1" />) for examples.</example>