Context: my father is a lawyer and therefore has a bajillion PDF files that were digitised and stored on a server. I have an idea of how to run OCR on all of them.

But after that, how can I make them easily searchable? (Keep in mind that, unfortunately, the directory structure carries important information for classifying the files, e.g. you may have a path like clientABC/caseAV1/d.pdf.)
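One common approach, sketched here under stated assumptions: store the folder path as its own searchable field next to the OCR text, so that "clientABC/caseAV1" survives as metadata instead of being lost. The sketch uses SQLite's FTS5 full-text index through Python's built-in sqlite3 module. It assumes (hypothetically) that the OCR step left a plain-text .txt file next to each PDF with the same name; build_index and search are illustrative names, not an existing tool.

```python
import sqlite3
from pathlib import Path


def build_index(root: Path, db_path: str = ":memory:") -> sqlite3.Connection:
    """Index OCR text files under `root`, keeping each file's folder path.

    Hypothetical assumption: OCR output was saved as a .txt file next to
    each PDF, e.g. clientABC/caseAV1/d.txt for clientABC/caseAV1/d.pdf.
    """
    conn = sqlite3.connect(db_path)
    # FTS5 full-text table: folder names are indexed alongside the text,
    # so a query can match "caseAV1" as well as the document contents.
    conn.execute("CREATE VIRTUAL TABLE docs USING fts5(path, folders, body)")
    for txt in sorted(root.rglob("*.txt")):
        rel = txt.relative_to(root)
        pdf_path = rel.with_suffix(".pdf").as_posix()  # the file the user opens
        folders = " ".join(rel.parent.parts)           # e.g. "clientABC caseAV1"
        body = txt.read_text(encoding="utf-8")
        conn.execute(
            "INSERT INTO docs (path, folders, body) VALUES (?, ?, ?)",
            (pdf_path, folders, body),
        )
    conn.commit()
    return conn


def search(conn: sqlite3.Connection, query: str, under: str = "") -> list[str]:
    """Return PDF paths matching an FTS5 query, optionally limited to one folder."""
    rows = conn.execute(
        "SELECT path FROM docs WHERE docs MATCH ? ORDER BY rank", (query,)
    ).fetchall()
    # Exact prefix check in Python, so "_" or "%" in folder names are not wildcards.
    return [path for (path,) in rows if path.startswith(under)]
```

For example, search(conn, "lease", under="clientABC/") restricts results to one client's folder, and an FTS5 column filter such as "folders:caseAV1 AND lease" matches on folder names. Passing a file path instead of the default ":memory:" keeps the index on disk; the sketch expects a fresh database file, since it creates the table unconditionally.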

    • First_Thunder@lemmy.zip (OP)
      1 day ago

      My problem with Paperless is that it doesn't preserve the directory structure, so essential info is lost.

      • paaviloinen@sopuli.xyz
        1 day ago

        If the end user can't live with tag/classification-based, automated sorting, then Paperless-ngx isn't the solution. However, if you have Nextcloud and add both the directory structure you want to preserve and Paperless-ngx's consume directory as external storage, you can have both with a little manual labour.
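A minimal sketch of the copying half of that setup, assuming a Python script can read the preserved tree and write to the consume directory (the paths are hypothetical, and copy_into_consume is an illustrative name). It copies each PDF into the consume directory under the same client/case subfolders and leaves the original tree untouched. If Paperless-ngx's PAPERLESS_CONSUMER_RECURSIVE and PAPERLESS_CONSUMER_SUBDIRS_AS_TAGS options are enabled (worth double-checking in its configuration docs), those subfolder names should come through as tags.

```python
import shutil
from pathlib import Path


def copy_into_consume(archive: Path, consume: Path) -> list[Path]:
    """Copy every PDF under `archive` into `consume`, keeping its subfolders.

    The archive itself is only read, never modified, so the original
    directory structure stays intact.
    """
    copied = []
    for src in sorted(archive.rglob("*")):
        # Match .pdf and .PDF alike; skip directories and other file types.
        if not src.is_file() or src.suffix.lower() != ".pdf":
            continue
        target = consume / src.relative_to(archive)
        if target.exists():
            continue  # already waiting in the consume directory
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, target)  # copy2 also preserves timestamps
        copied.append(target)
    return copied
```

Paperless-ngx normally removes files from the consume directory once it has imported them, so running this again later would copy already-imported PDFs a second time; a real setup would need to remember what has already been sent.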