
Guide to search terms

Gabriel Vîjială edited this page Sep 4, 2020 · 5 revisions

The Hoover search engine supports a rich query syntax, borrowed from Elasticsearch, for narrowing down search results. Here are some examples.

Words

For the simplest search, just enter a few words:

offshore company lawyer

Phrase

Several words that must appear together, in order:

"lawyer contract"

Will find "sign the lawyer contract with George" but not "get a contract for the lawyer George".
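To make the phrase rule concrete, here is a minimal Python sketch of exact-phrase matching over naively tokenized text. The `tokenize` and `contains_phrase` helpers are hypothetical; the real engine relies on Elasticsearch's analyzers, which treat punctuation, case and word forms in their own way.

```python
import re

def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase, keep runs of letters/digits only.
    return re.findall(r"\w+", text.lower())

def contains_phrase(text: str, phrase: str) -> bool:
    """True if the phrase's words appear consecutively and in order."""
    words = tokenize(text)
    target = tokenize(phrase)
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))

print(contains_phrase("sign the lawyer contract with George", "lawyer contract"))  # True
print(contains_phrase("get a contract for the lawyer George", "lawyer contract"))  # False
```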

Fuzzy word matching

Search on similar words, e.g. names that are spelled in several ways:

Gillian~

Will find both Gillian and Jillian.
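In Elasticsearch's query string syntax, the `~` operator measures similarity with the Damerau-Levenshtein edit distance (insertions, deletions, substitutions and swaps of two adjacent characters) and allows up to two edits by default; a trailing number such as `Gillian~1` lowers the limit. Assuming Hoover passes this through unchanged, the Python sketch below computes the restricted variant of that distance (optimal string alignment) to show why Gillian and Jillian match: they differ by one substitution. It illustrates the idea only and is not how Hoover or Elasticsearch look up matching terms.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein)."""
    a, b = a.lower(), b.lower()
    # d[i][j] = number of edits needed to turn a[:i] into b[:j]
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
            # swap of two adjacent characters
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]

def fuzzy_match(term: str, candidate: str, max_edits: int = 2) -> bool:
    # max_edits=2 mirrors Elasticsearch's default for a bare "~"
    return osa_distance(term, candidate) <= max_edits

print(osa_distance("Gillian", "Jillian"))  # 1
print(fuzzy_match("Gillian", "Jillian"))   # True
```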

Proximity search

Several words that must appear close together, in any order:

"George Costanza"~3

Will find "talk to Costanza P George about the video". The number 3 sets how far apart the search words may be; putting them in reverse order also counts toward this distance.
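Elasticsearch calls this number the slop: roughly, how many position moves it takes to line the words up as the exact phrase. The Python sketch below computes that value for a two-word phrase only, as a simplification of Lucene's sloppy phrase matching (which handles longer phrases and scoring differently); the helper names are hypothetical. Swapping two adjacent words costs 2, which is why the reversed order still matches once the number is large enough.

```python
import re
from typing import Optional

def positions(tokens: list[str], word: str) -> list[int]:
    """Indexes at which `word` occurs in `tokens`."""
    return [i for i, t in enumerate(tokens) if t == word]

def two_word_slop(text: str, first: str, second: str) -> Optional[int]:
    """Smallest N for which "first second"~N would match `text` (simplified).

    In the phrase, `first` has offset 0 and `second` offset 1, so an exact
    match needs pos(second) == pos(first) + 1. The slop is how far the
    actual positions are from that arrangement.
    """
    tokens = re.findall(r"\w+", text.lower())
    firsts = positions(tokens, first.lower())
    seconds = positions(tokens, second.lower())
    if not firsts or not seconds:
        return None  # one of the words is missing entirely
    return min(abs((s - 1) - f) for f in firsts for s in seconds)

print(two_word_slop("talk to Costanza P George about the video", "George", "Costanza"))  # 3
print(two_word_slop("George Costanza", "George", "Costanza"))  # 0
print(two_word_slop("Costanza George", "George", "Costanza"))  # 2
```

Under this model, a query ending in `~3` matches whenever the computed value is 3 or less.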

Exclude terms

To filter out documents that contain a term, prefix it with - or !:

George -Costanza
George !Costanza

Will find George Bush but not George Costanza.
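The exclusion rule boils down to: keep documents that contain the search word and none of the excluded words. A minimal Python sketch over an in-memory list of strings; the `exclude_search` helper and sample documents are hypothetical, and real matching goes through Elasticsearch's analyzers rather than a regex split.

```python
import re

def words(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))

def exclude_search(docs: list[str], term: str, excluded: list[str]) -> list[str]:
    """Docs that contain `term` but none of the `excluded` words."""
    term = term.lower()
    banned = {w.lower() for w in excluded}
    return [d for d in docs if term in words(d) and not (banned & words(d))]

docs = ["George Bush gave a speech", "George Costanza lost his job", "Elaine called Jerry"]
print(exclude_search(docs, "George", ["Costanza"]))  # ['George Bush gave a speech']
```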

Phone numbers

There are many ways to search for phone numbers, but a nice trick is to filter by country code:

Joe Smith 31*

Will turn up phone numbers starting with 31, the country code for the Netherlands.
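The trailing `*` is a wildcard, so `31*` matches any indexed term that starts with 31. That covers numbers written as +31 followed by the rest of the number, assuming the analyzer treats the + and spaces as separators, but it also catches unrelated numbers such as 3150. A Python sketch of the same prefix test on tokens (the tokenizer and sample text are hypothetical):

```python
import re

def prefix_matches(text: str, prefix: str) -> list[str]:
    # Split on anything that is not a letter or digit, as a simple analyzer might.
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t.startswith(prefix.lower())]

text = "Call Joe Smith on +31 20 555 0123 or at extension 3150"
print(prefix_matches(text, "31"))  # ['31', '3150']
```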

Metadata fields

During indexing, several fields are extracted so that they can be searched directly, using the form field:value.

For example, a search on the from field will return emails sent by [email protected].

  • md5 - md5 checksum
  • sha1 - sha1 checksum
  • path-parts:"/path/to/folder" - the path must be exact; this finds the documents under that path, including folders and containers. Copy the "Path" value from the folder's Meta field and paste it between the quotes.
  • path-text:"*some part/of_the path//you remember*" - finds documents whose path contains some part/of_the path//you remember. Partial words are not matched, so use full folder or file names.
  • filename - filename of the document, e.g. invoice.pdf
  • lang - language (not yet available)
  • text - all the text found in the file
  • subject - email subject
  • from - email sender
  • to - email recipients (to, cc, bcc)
  • message-id - unique email identifier
  • in-reply-to - for reply emails, this is the message-id of the original email
  • thread-index - unique identifier for all emails in a thread (not used consistently)
  • references - similar to in-reply-to but contains a chain of message-id values
  • date - modification date for documents and emails
  • date-created - creation date for documents
  • ocr:true - will only select documents with OCR data present
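To look up a file you already have by its md5 or sha1 field, you first need its checksum. A minimal Python sketch using the standard hashlib module; it assumes the index stores checksums as lowercase hexadecimal strings and that the field:value form works for these fields, so verify against one known document first. The helper names are hypothetical.

```python
import hashlib
from pathlib import Path

def file_checksums(path: str, chunk_size: int = 1 << 20) -> dict[str, str]:
    """Return the md5 and sha1 hex digests of a file, read in chunks."""
    md5, sha1 = hashlib.md5(), hashlib.sha1()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return {"md5": md5.hexdigest(), "sha1": sha1.hexdigest()}

def checksum_query(path: str, algorithm: str = "md5") -> str:
    # Assumed query form: field:value, with a lowercase hex digest as the value
    return f"{algorithm}:{file_checksums(path)[algorithm]}"
```

For example, `checksum_query("invoice.pdf", "sha1")` returns a string of the form `sha1:` followed by the file's 40-character digest, ready to paste into the search box.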

Example: this searches for Johnny Cash in the OCR text of .jpg files larger than 200 KB (size is given in bytes):

filename:*.jpg size:>200000 ocr:true Johnny Cash
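When queries are generated by a script, it helps to assemble them from field/value pairs. The hypothetical Python helper below reproduces the example above; it wraps values containing spaces in quotes (as the path-parts and path-text examples do) and leaves operators such as `>` and wildcards untouched.

```python
def build_query(filters: dict[str, str], text: str = "") -> str:
    """Join field:value filters and free text into one query string."""
    parts = []
    for field, value in filters.items():
        # Quote values with spaces, e.g. path-parts:"/path/to/my folder"
        if " " in value and not value.startswith('"'):
            value = f'"{value}"'
        parts.append(f"{field}:{value}")
    if text:
        parts.append(text)
    return " ".join(parts)

print(build_query({"filename": "*.jpg", "size": ">200000", "ocr": "true"}, "Johnny Cash"))
# filename:*.jpg size:>200000 ocr:true Johnny Cash
```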

Date range

Find documents (e.g. emails or PDFs) from a certain time period:

date:[2016-03-01 TO 2016-04-01}

Will return only documents dated March 2016. Square brackets ([ ]) make a bound inclusive and curly brackets ({ }) make it exclusive, so the query above includes documents from March 1 but not from April 1. Note that date holds the modification date; use date-created to filter documents by creation date instead.
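Covering a whole month is easiest with a half-open range: include the first day, exclude the first day of the next month. A minimal Python sketch that builds such a query and mirrors the bracket semantics; the helper names are hypothetical, and passing `field="date-created"` assumes that field accepts the same range syntax.

```python
from datetime import date

def month_range_query(year: int, month: int, field: str = "date") -> str:
    """Half-open range for one calendar month: [first day TO next month's first day}."""
    start = date(year, month, 1)
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return f"{field}:[{start.isoformat()} TO {end.isoformat()}}}"

def in_range(d: date, start: date, end: date) -> bool:
    # "[" = inclusive lower bound, "}" = exclusive upper bound
    return start <= d < end

print(month_range_query(2016, 3))  # date:[2016-03-01 TO 2016-04-01}
print(in_range(date(2016, 4, 1), date(2016, 3, 1), date(2016, 4, 1)))  # False
```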

Filter by file type

Restrict results to a single file type:

filetype:email

Will search in emails only. Available options are email, pdf, doc, xls, ppt (these include the OpenOffice versions), text, html and folder.
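A hypothetical Python helper that checks a file type against the options listed above before building the filter. Combining several types as `filetype:(pdf OR doc)` is Elasticsearch query string grouping; that Hoover accepts it unchanged is an assumption.

```python
FILETYPES = {"email", "pdf", "doc", "xls", "ppt", "text", "html", "folder"}

def filetype_query(*types: str) -> str:
    """Build a filetype filter; several types are OR-ed together (assumed syntax)."""
    if not types:
        raise ValueError("at least one filetype is required")
    unknown = [t for t in types if t not in FILETYPES]
    if unknown:
        raise ValueError(f"unknown filetype(s): {', '.join(unknown)}")
    if len(types) == 1:
        return f"filetype:{types[0]}"
    return "filetype:(" + " OR ".join(types) + ")"

print(filetype_query("email"))       # filetype:email
print(filetype_query("pdf", "doc"))  # filetype:(pdf OR doc)
```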
