Context: my father is a lawyer and therefore has a bajillion PDF files that were digitised and stored on a server. I have an idea of how to run OCR on all of them.
But after that, how can I make them easily searchable? (Keep in mind that, unfortunately, the directory structure is important information for classifying the files, i.e. you may have a path like clientABC/caseAV1/d.pdf.)
I’m a fucking dolt that dabbles and picks up the gist of things pretty quick, but I’m no authority on anything, so “grain of salt”:
You’re already familiar with OCR, so my naive approach (assuming the documents have consistent fields where you can nab the name, case no., form type, blah blah) would be to populate a simple SQLite db with that data plus the full path to each file. I can write very basic SQL queries myself, but for your pops you’d probably need to cobble together some sort of search form on top. Something for people that didn’t learn
SELECT filepath FROM casedata WHERE name LIKE '%Luigi%';
because they had to manually repair their Jellyfin DB one time when a plugin made a bunch of erroneous entries >:|
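For what it's worth, here is a rough Python sketch of the idea above, not a finished tool. It walks the directory tree, records each PDF's full path together with the first two folder names (so the clientABC/caseAV1 part stays searchable), and wraps the LIKE query in a function a search form could call. The casedata table and the filepath and name columns come from the query above; the client and case_id columns, the function names, and the extract_name callback (a stand-in for whatever your OCR step pulls out of each document) are all my own assumptions.

```python
import sqlite3
from pathlib import Path


def build_index(root, db_path=":memory:", extract_name=None):
    """Walk `root` and record every PDF in a `casedata` table.

    `extract_name` is a hypothetical hook for the OCR step: a callable that
    takes a Path and returns the name found in the document. If it is not
    given, the file name (without extension) is used as a placeholder.
    """
    root = Path(root)
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS casedata ("
        "filepath TEXT PRIMARY KEY, client TEXT, case_id TEXT, name TEXT)"
    )
    for pdf in sorted(root.rglob("*.pdf")):
        parts = pdf.relative_to(root).parts
        # Assumed layout: <client>/<case>/<file>.pdf; shallower paths get blanks.
        client = parts[0] if len(parts) > 1 else ""
        case_id = parts[1] if len(parts) > 2 else ""
        name = extract_name(pdf) if extract_name else pdf.stem
        conn.execute(
            "INSERT OR REPLACE INTO casedata VALUES (?, ?, ?, ?)",
            (str(pdf), client, case_id, name),
        )
    conn.commit()
    return conn


def search(conn, term):
    """Return file paths whose name, client or case folder contains `term`."""
    # Note: % and _ inside `term` still act as LIKE wildcards in this sketch.
    pattern = f"%{term}%"
    rows = conn.execute(
        "SELECT filepath FROM casedata "
        "WHERE name LIKE ? OR client LIKE ? OR case_id LIKE ? "
        "ORDER BY filepath",
        (pattern, pattern, pattern),
    )
    return [row[0] for row in rows]
```

You'd call something like build_index("/path/to/scans", "cases.db") once after the OCR run, then search(conn, "Luigi") from whatever form you put in front of it. The query uses ? placeholders so whatever gets typed into the search box can't break the SQL. If you end up storing the full OCR text too, SQLite's full-text search extension is probably worth a look, but I haven't tried it on anything this size.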