Developing a flat file search in PHP

Building a search engine from scratch sounds awful. It's not that hard, actually.

Currently, I'm working on a website that uses simple file-to-URI routing: /public/foo/bar.php will resolve to example.org/foo/bar. When it became clear, though, that the number of pages would grow significantly beyond the "I'll just skim through it" mark, the need for some kind of search function arose.
Now usually, search algorithms are a topic designers try to avoid, since they seem complex and scary at first.

This time, I thought, to hell with it. Let's do this.

Below you'll find my attempt to make flat files searchable without hassle, keeping these goals in mind:



Note: This is not a copy-paste tutorial but rather a walk through developing a strategy, so you can go ahead and implement your own.

Searching requires two things: a query, and a pool of data to search it in. The query is, rather obviously, provided by user input; more on that later. To build our pool of searchable material, though, we'll first have to create an index of all files to search, ideally by recursively iterating over a target directory. A great moment to use PHP's RecursiveDirectoryIterator:

$di = new RecursiveDirectoryIterator(
    $searchPath, // the file system path to the directory to search
    RecursiveDirectoryIterator::SKIP_DOTS // skip current directory ( . ) and parent directory ( .. )
);

$files = new RecursiveIteratorIterator($di);

That way, $files will give us an SplFileInfo object for each file in our target directory, including those in subdirectories, so we can neatly iterate over them:

foreach ($files as $file) {
    // actions to perform on each file
}
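Put together, the two snippets above could be wrapped up roughly like this. It is only a minimal sketch: collectFiles() is a made-up helper name, and storing plain path strings in an array is just one way of keeping the index around.

```php
<?php
// Sketch: collect every file below $searchPath into a plain array of paths.
// collectFiles() is a hypothetical helper, not part of any library.
function collectFiles(string $searchPath): array
{
    $di = new RecursiveDirectoryIterator(
        $searchPath,
        RecursiveDirectoryIterator::SKIP_DOTS
    );

    // The default mode (LEAVES_ONLY) descends into directories
    // but only yields the files inside them.
    $files = new RecursiveIteratorIterator($di);

    $index = [];
    foreach ($files as $file) {
        // $file is an SplFileInfo instance
        $index[] = $file->getPathname();
    }

    // The iteration order depends on the file system, so sort for stable output.
    sort($index);

    return $index;
}
```

Calling collectFiles('content') on the example tree further down would then return the paths of all Markdown files, including the ones nested in products/wood/.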

Let's assume the directory we search is full of Markdown files, like this:

content/
 ├── footer.md
 ├── header.md
 ├── home.md
 ├── legal.md
 ├── manufaturing-process.md
 ├── products/
 │   ├── wood/
 │   │    ├── 23411-wooden-table.md
 │   │    └── 23412-wooden-chair-inlays.md
 │   ├── 23484-solid-marble-garden-bench.md
 │   └── 23599-cotton-pillow.md
 ├── sales.md
 └── team.md

As you may have noticed, there are also files named footer and header, which are partial views, so normally you wouldn't want them to turn up as a search result. Therefore, we should introduce a list of files to exclude from the search:

if (in_array($file->getFilename(), $excludes)) continue;
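To make that check a bit more concrete, here is a small sketch. The contents of $excludes and the helper name filterExcluded() are assumptions for illustration. Note that getFilename() returns the name including the extension, so the list has to contain "header.md" rather than "header".

```php
<?php
// Assumption: partials are identified by their full file name.
$excludes = ['header.md', 'footer.md'];

// filterExcluded() is a hypothetical helper used for illustration only.
function filterExcluded(iterable $files, array $excludes): array
{
    $kept = [];
    foreach ($files as $file) {
        // getFilename() drops the directory part, so an excluded name
        // is skipped no matter which subdirectory it lives in.
        if (in_array($file->getFilename(), $excludes, true)) {
            continue;
        }
        $kept[] = $file;
    }

    return $kept;
}
```

Since only the bare file name is compared, a products/header.md would be skipped as well. If that is not what you want, comparing against paths relative to the search directory would be one alternative.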

Now, let's move on to the actual search. We'll want to look for a term in the page's name as well as the page's content: [Work in Progress]
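Until this part is written up properly, here is one way it might look. This is a rough sketch under several assumptions: searchFiles() is a made-up name, matching is a plain case-insensitive substring test via stripos(), and the result is simply a list of matching paths without any ranking.

```php
<?php
// Sketch: find files whose name or content contains $query.
// searchFiles() is a hypothetical helper; substring matching is an assumption.
function searchFiles(string $searchPath, string $query, array $excludes = []): array
{
    // An empty query would match everything (or trigger a warning on old PHP versions).
    if ($query === '') {
        return [];
    }

    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($searchPath, RecursiveDirectoryIterator::SKIP_DOTS)
    );

    $results = [];
    foreach ($files as $file) {
        if (in_array($file->getFilename(), $excludes, true)) {
            continue;
        }

        // Match against the file name without its extension ...
        $name = $file->getBasename('.' . $file->getExtension());
        $inName = stripos($name, $query) !== false;

        // ... and against the raw file content.
        $content = file_get_contents($file->getPathname());
        $inContent = $content !== false && stripos($content, $query) !== false;

        if ($inName || $inContent) {
            $results[] = $file->getPathname();
        }
    }

    // The iteration order depends on the file system, so sort for stable output.
    sort($results);

    return $results;
}
```

Reading every file on every request is fine for a handful of pages, but it will get slow as the content grows. Caching the index would be an obvious next step.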
