• 1 Post
  • 476 Comments
Joined 2 years ago
Cake day: June 15th, 2023

  • Thanks for posting the solution!

    If you happen to be using a BTRFS or XFS file system, you might want to try duperemove. It can reclaim usable disk space without deleting any files by using those filesystems’ built-in support for data deduplication and copy-on-write: duplicate files end up pointing at the same data on disk while still behaving as independent files. They appear and function exactly the same, and editing one copy will not change another (unlike with hard links, for example). That means it won’t interfere with cases like Flatpak or Python virtual environments, where tools expect each copy to act as its own separate file.
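    Roughly, the workflow looks something like this (a sketch only; check man duperemove for the exact flags in your version, and the paths and hashfile location here are placeholders):

    ```
    # Dry run: recursively scan and report duplicate extents without changing anything
    duperemove -r /path/to/data

    # Actually deduplicate (-d), keeping a hash database so later runs can be incremental
    duperemove -dr --hashfile=/var/tmp/dedupe.hash /path/to/data
    ```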




  • Generally speaking, xz provides higher compression.

    None of these are well optimized for images. Depending on your image format, you might be better off leaving those files alone or converting them to a more modern format like JPEG-XL. Supposedly JPEG-XL can further compress JPEG files with no additional loss of quality, and it also has an efficient lossless mode.
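    For example, something along these lines (a sketch; cjxl is the reference JPEG-XL encoder from libjxl, and the file names are placeholders):

    ```
    # Pack a directory with tar and compress it with xz at a high preset
    tar -cf - documents/ | xz -9 > documents.tar.xz

    # Losslessly recompress an existing JPEG as JPEG-XL (the transcode is reversible)
    cjxl photo.jpg photo.jxl
    ```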

    > Do any of them have the ability to recover from a bit flip or at the very least detect with certainty whether the data is corrupted or not when extracting?

    As far as I know, no common compression format has built-in error correction, and neither does tar. That is something you handle with external tools instead.

    For validation, you can save a hash of the compressed output. MD5 is broken as a cryptographic hash, but it’s still generally fine (and widely used) for catching accidental corruption. SHA-256 is much more robust if you are worried about deliberate tampering rather than just random corruption.

    Usually, you’d just put hash files alongside your archive files with appropriate names, so you can manually check them later. Note that this will not provide you with information about which parts of the archive are corrupt, only that it is corrupt.
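    With the standard coreutils tools, that looks something like this (file names are placeholders):

    ```
    # Create a checksum file next to the archive
    sha256sum backup.tar.xz > backup.tar.xz.sha256

    # Later, verify the archive against it
    sha256sum -c backup.tar.xz.sha256
    ```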

    For error correction, consider par2. Same idea: you give it a file, and it creates recovery files that can be used alongside the original to detect damage and repair it later.
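    With the par2cmdline tool, the workflow is roughly this (the redundancy level and file names are just examples):

    ```
    # Create recovery files with roughly 10% redundancy
    par2 create -r10 backup.tar.xz

    # Later: check for damage, and repair it if recovery data is available
    par2 verify backup.tar.xz.par2
    par2 repair backup.tar.xz.par2
    ```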

    > I also want the files to be extractable with just the Linux/Unix standard binutils

    That is a key advantage of this approach. Adding a hash file or par2 file does not change the archive itself, so you don’t need any special tools just to extract it.

    You should also consider your file system and media. Some file systems (BTRFS and ZFS, for example) checksum your data and, given redundant copies, can detect and repair corruption on their own. And some media types are less susceptible to corruption than others, either due to physical durability or to baked-in error correction.
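    On BTRFS, for instance, a periodic scrub verifies those checksums and repairs from a good copy when the data is stored redundantly (the mount point here is a placeholder):

    ```
    # Verify all data and metadata checksums; repairs automatically if a redundant copy exists
    sudo btrfs scrub start /srv/archive

    # Check progress and results
    sudo btrfs scrub status /srv/archive
    ```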






  • I use Koreader on Android (available on F-Droid or Google Play).

    It works. Configuring fonts is a bit confusing — every time I start a new book that uses custom fonts, I need to remind myself how to override it so it uses my prefs. But aside from that, it does what I need. Displaying text is not rocket science, after all.

    I used to like Librera, but I had to ditch it because its memory usage was out of control with very large files. Some of my epubs are hundreds of megabytes (insane, yes, but that’s reality) and Librera would lag for several seconds with every page turn. Android would kill it if I ever switched apps because it used so much memory. I had a great experience with it for “normal” ebooks, though. It was just the big 'uns that caused issues.


  • That can’t be good. But I guess it was inevitable. It never seemed like Arc had a sustainable business model.

    It was obvious from the get-go that their ChatGPT integration was a money pit that would eventually need to be monetized, and…I just don’t see end users paying money for it. They’ve been giving it away for free hoping to get people hooked, I guess, but I know what the ChatGPT API costs and it’s never going to be viable. If they built a local-only backend then maybe. I mean, at least then they wouldn’t have costs that scale with usage.

    For Atlassian, though? Maybe. Their enterprise customers are already paying out the nose. Usage-based pricing is a much easier sell. And they’re entrenched deeply enough to enshittify successfully.