# String Matching: Naive Algorithm

Some programs that you use everyday need to find occurrences of a string in a text.

Think about your preferred IDE or simply any text-editing program.

When you misspelling a variable name and you want to fix this, you need to find all the part of the text where the name is present and substitute it.

String Matching algorithms are widely used in the field of Bioinformatics for detect patterns in DNA or RNA sequences since they are represented digitally as strings over a well-defined alphabet.

The string matching (or pattern matching) problem is define as follows.

Given a text T and a pattern P, determine all the occurrences of the pattern in the text.

We assume that the text T is an array of length n and the pattern P is an array of m elements.

We say that **pattern P occurs with shifts s in text T** if

**0 ≤ s ≤ n-m,**and**T[s+1 : s+m] = P[1 : m]**

A shift is valid only if pattern P occurs with shift s in text T.

We can say that the string matching problem is the problem of finding all valid shifts with which the pattern P occurs in a text T.

**The Naive Algorithm**

Let’s see now a very simple (and inefficient) algorithm for string matching, the so called **naive string-matching algorithm**.

We use a for loop from 0 to n-m (respectively, the length of the text T and the length of the pattern P) to checks if P is equal to the substring of S taken from s+1 and s+m.

# Pseudocode

**NaiveStringMatcher(T,P)**:

n = length(T)

m = length(P)

**FOR s = 0 to n-m** do:

**IF P[1 : m] == T[s+1 : s+m]**:

print "Pattern occurs with shift " s

The pattern slides over the text and if the pattern is found in the text starting from s (the if statement is true), we print out the shift s.

This is an inefficient algorithm for long text. Why?

Because for every possible n-m+1 shift we have to compare all the character of P with the m character of T[s+1 : s+m].

The **worst case running time** is **Θ((n-m+1)*m)**.

If m is equal to n/2, the running time is **Θ(n²).**

# Golang implementation

Now we see how this naive algorithm can be implemented in golang.

func NaiveStringMatching(text string, pattern string) { n := len(text)

m := len(pattern) for s := 0; s < (n-m)+1; s++ {

if pattern == text[s:s+m] {

fmt.Println("Pattern occurs with shift", s)

}

}

}

**EXAMPLE OF EXECUTION**

**With Text**

Nel mezzo del cammin di nostra vita

mi ritrovai per una selva oscura,

ché la diritta via era smarrita.

Ahi quanto a dir qual era è cosa dura

esta selva selvaggia e aspra e forte

che nel pensier rinova la paura!- Incipit of

Dante Alighieri’s Divina Commedia

And Pattern *“selva”,* the output of the above algorithm is:

`Pattern occurs with shift 56`

Pattern occurs with shift 148

Pattern occurs with shift 154

This means that we can found the string *“selva”* in the text T three time:

- From position 56 to 56+m (where m in this case is 5),
*T[56 : 56 + 5]* - From position 148 to 148+5,
*T[148 : 148 + 5]* - From 154 to 154 + 5,
*T[154 : 154 + 5]*

func main() {

text := "Nel mezzo del cammin di nostra vita mi ritrovai per una selva oscura, ché la diritta via era smarrita. Ahi quanto a dir qual era è cosa dura esta selva selvaggia e aspra e forte che nel pensier rinova la paura!"

pattern := "selva" NaiveStringMatching(text, pattern) fmt.Println("T[56:56+5] -> ", text[56:56+5])

fmt.Println("T[146:148+5] -> ", text[148:148+5])

fmt.Println("T[154:154+5] -> ", text[154:154+5])

}// OUTPUT: go run main.go

Pattern occurs with shift 56

Pattern occurs with shift 148

Pattern occurs with shift 154

T[56:56+5] -> selva

T[146:148+5] -> selva

T[154:154+5] -> selva

# And now?

There are other types of string matching algorithm that works better.

The main defect of the Naive Algorithm is that it ignores the information gained on a value of s when it consider other values of s.

In the next posts, we will see other algorithms, like the Knuth-Morris-Pratt algorithm and another one that uses finite automata.