Context: my father is a lawyer and therefore has a bajillion PDF files that were digitised and stored on a server. I already have an idea of how to OCR all of them.
But after that, how can I make them easily searchable? (Keep in mind that, unfortunately, the directory structure carries important information for classifying the files, e.g. you may have a path like clientABC/caseAV1/d.pdf.)
Just searching them for words? Try pdfgrep with its recursive option (-r); it is very easy to set up and try. If that turns out to be too slow, you will probably need to accept some simplifications or build helper structures, such as a pre-built text index.
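As a rough illustration of such a helper structure, here is a minimal in-memory inverted index sketch. It assumes the text of each PDF has already been extracted after OCR (e.g. with poppler's pdftotext) and keeps the client/case folders as filterable metadata. The layout `<client>/<case>/<file>.pdf` and all names (`PdfIndex`, `path_metadata`, `search`) are hypothetical, made up for this example:

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath

WORD_RE = re.compile(r"\w+")


def tokenize(text):
    """Split text into lowercase word tokens."""
    return [w.lower() for w in WORD_RE.findall(text)]


def path_metadata(rel_path):
    """Hypothetical layout assumption: <client>/<case>/<file>.pdf."""
    parts = PurePosixPath(rel_path).parts
    nested = len(parts) >= 3
    return {
        "client": parts[0] if nested else None,
        "case": parts[1] if nested else None,
        "file": parts[-1],
    }


class PdfIndex:
    """Toy inverted index: word -> set of relative PDF paths."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.meta = {}

    def add(self, rel_path, text):
        # Store the directory-derived classification next to the words.
        self.meta[rel_path] = path_metadata(rel_path)
        for word in tokenize(text):
            self.postings[word].add(rel_path)

    def search(self, query, client=None, case=None):
        """Return sorted paths containing ALL query words,
        optionally restricted to one client and/or case folder."""
        words = tokenize(query)
        if not words:
            return []
        # .get() avoids creating empty entries in the defaultdict.
        hits = set.intersection(*(self.postings.get(w, set()) for w in words))
        if client is not None:
            hits = {p for p in hits if self.meta[p]["client"] == client}
        if case is not None:
            hits = {p for p in hits if self.meta[p]["case"] == case}
        return sorted(hits)
```

For example, after `idx.add("clientABC/caseAV1/d.pdf", "Lease agreement signed by the tenant")`, the call `idx.search("tenant", client="clientABC")` finds that file. The index is built once, so each query becomes a few dictionary lookups instead of re-reading every PDF the way a recursive pdfgrep does. For a real archive you would persist the index rather than rebuild it in memory each time.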