Context: my father is a lawyer and therefore has a bajillion PDF files that were digitised and are stored on a server. I’ve got an idea of how to run OCR on all of them.

But after that, how can I make them easily searchable? (Keep in mind that, unfortunately, the directory structure carries important information for classifying the files, e.g. you may have a path like clientABC/caseAV1/d.pdf.)
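One way to keep the folder layout searchable is to store it next to the OCR text in a full-text index. Here is a rough sketch using SQLite's FTS5 extension through Python's built-in `sqlite3` module. It assumes your SQLite build includes FTS5 (most do). It also assumes the OCR step has already left a plain-text file next to each PDF (d.pdf → d.txt), and that the first two folder levels are client and case, as in the example path. `iter_sidecars`, `build_index` and `search` are hypothetical helper names, not an existing tool.

```python
import sqlite3
from pathlib import Path, PurePosixPath


def iter_sidecars(root):
    """Yield (relative PDF path, OCR text) for every .txt sidecar under root.

    Assumption: the OCR step wrote d.txt next to d.pdf; adjust this to
    wherever your OCR output actually lives."""
    root = Path(root)
    for txt in sorted(root.rglob("*.txt")):
        rel_pdf = txt.with_suffix(".pdf").relative_to(root).as_posix()
        yield rel_pdf, txt.read_text(encoding="utf-8", errors="replace")


def build_index(docs, db_path=":memory:"):
    """Create an FTS5 table holding the OCR text plus path-derived columns."""
    conn = sqlite3.connect(db_path)
    # Only `body` is tokenized for full-text search; the UNINDEXED columns are
    # stored as-is so they can be returned and filtered with plain SQL.
    conn.execute(
        "CREATE VIRTUAL TABLE docs USING fts5("
        "path UNINDEXED, client UNINDEXED, case_id UNINDEXED, body)"
    )
    for rel_path, text in docs:
        folders = PurePosixPath(rel_path).parts[:-1]
        # Assumed layout: client/case/file.pdf; missing levels become "".
        client = folders[0] if len(folders) > 0 else ""
        case_id = folders[1] if len(folders) > 1 else ""
        conn.execute(
            "INSERT INTO docs (path, client, case_id, body) VALUES (?, ?, ?, ?)",
            (rel_path, client, case_id, text),
        )
    conn.commit()
    return conn


def search(conn, query, client=None, case_id=None):
    """Return matching paths, best match first (FTS5's built-in bm25 rank)."""
    sql = "SELECT path FROM docs WHERE docs MATCH ?"
    params = [query]
    if client is not None:
        sql += " AND client = ?"
        params.append(client)
    if case_id is not None:
        sql += " AND case_id = ?"
        params.append(case_id)
    sql += " ORDER BY rank"
    return [row[0] for row in conn.execute(sql, params)]


if __name__ == "__main__":
    # Toy data; for the real archive use build_index(iter_sidecars(root), "index.db").
    sample = [
        ("clientABC/caseAV1/d.pdf", "Lease agreement signed by both parties"),
        ("clientXYZ/caseB7/f.pdf", "Invoice for legal services"),
    ]
    conn = build_index(sample)
    print(search(conn, "lease"))                        # ['clientABC/caseAV1/d.pdf']
    print(search(conn, "invoice", client="clientABC"))  # []
```

Because client and case are separate columns, a full-text query can be combined with a folder filter, e.g. `search(conn, "lease", client="clientABC")`, and the full path is always returned so nothing from the folder structure is lost. Queries follow FTS5 query syntax, so wrap phrases or terms containing punctuation in double quotes. For the real archive, pass a file name such as `index.db` instead of `:memory:` so the index survives restarts.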

  • lsjw96kxs@sh.itjust.works
    14 hours ago

    Maybe take a look at paperless-ngx: it will take care of the OCR for you and make the files searchable. I’m just not sure whether it will preserve the directory path.
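On the path question: the paperless-ngx configuration docs describe `PAPERLESS_CONSUMER_RECURSIVE` (pick up files in subfolders of the consumption directory) and `PAPERLESS_CONSUMER_SUBDIRS_AS_TAGS` (turn those subfolder names into tags). With both enabled, `<consume>/clientABC/caseAV1/d.pdf` should end up tagged `clientABC` and `caseAV1`. Tags are flat, though, so the hierarchy itself is not kept. Below is a small Python sketch for previewing which tags a folder tree would produce and which folder names would end up shared across clients. It only mimics the documented behaviour, the helper names are hypothetical, and it is worth checking against the docs for your paperless-ngx version.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def preview_tags(rel_paths):
    """Map each path (relative to the consume folder) to the folder names
    above the file, i.e. the tags subdirs-as-tags would be expected to add.
    This mimics the documented behaviour; it is not paperless-ngx code."""
    return {p: list(PurePosixPath(p).parts[:-1]) for p in rel_paths}


def shared_tags(rel_paths):
    """Folder names that occur under different parents. Tags are flat, so
    clientABC/caseAV1 and clientXYZ/caseAV1 would share one 'caseAV1' tag."""
    parents = defaultdict(set)
    for p in rel_paths:
        folders = PurePosixPath(p).parts[:-1]
        for i, name in enumerate(folders):
            # Record the chain of folders leading to this name.
            parents[name].add(folders[:i])
    return sorted(name for name, seen in parents.items() if len(seen) > 1)


if __name__ == "__main__":
    paths = ["clientABC/caseAV1/d.pdf", "clientXYZ/caseAV1/f.pdf"]
    print(preview_tags(paths)["clientABC/caseAV1/d.pdf"])  # ['clientABC', 'caseAV1']
    print(shared_tags(paths))  # ['caseAV1']
```

A shared tag is not necessarily a problem, since filtering on both `clientXYZ` and `caseAV1` still narrows things down, but it is worth knowing before importing thousands of files.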