There is a webcomic called Strong Female Protagonist that I want to preserve (in case the website is ever lost), but I'm not sure how.
The image you see above is not a page of the site but rather a drop-down menu. There is a web crawler called WFDownloader (I'm running the Windows exe inside Bottles) that can grab images and follow links, grabbing images "N" pages deep, but since this is a drop-down menu I'm not sure it will work.
There is also the issue of organizing the images: WFDownloader doesn't have options for that.
What I'm thinking is to somehow translate the HTML for the drop-down menu into separate XML files based on issues/titles, run a script to download the images, have each image named after its own hyperlink, and put each issue in its own folder. Later on I can create a stitched-up version of each issue.
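A minimal sketch of that plan in Python, using only the standard library. It assumes the drop-down links follow the `issue-N/page-M` URL pattern visible in the site's HTML, and it only builds the folder/filename/URL plan; the actual downloading (fetching each page and pulling out the comic image) would come after. The `.png` extension is a guess.

```python
# Sketch: turn drop-down links into (folder, filename, page URL) triples,
# so each issue gets its own folder and each image is named after its page slug.
import re
from urllib.parse import urlparse

def plan_downloads(dropdown_html):
    """Return (issue folder, filename, page URL) triples from the drop-down HTML."""
    plan = []
    for url in re.findall(r'href="(https://strongfemaleprotagonist\.com/[^"]+)"',
                          dropdown_html):
        parts = urlparse(url).path.strip("/").split("/")
        if len(parts) == 2:                        # e.g. ["issue-1", "page-0"]
            issue, page = parts
            plan.append((issue, page + ".png", url))   # extension is an assumption
    return plan

sample = '<li><a href="https://strongfemaleprotagonist.com/issue-1/page-0/">Cover</a></li>'
print(plan_downloads(sample))
# [('issue-1', 'page-0.png', 'https://strongfemaleprotagonist.com/issue-1/page-0/')]
```

Note the third element of each triple is the comic *page*, not the image itself; each page still has to be fetched to find the embedded image URL.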
https://archive.org/details/Strong-Female-Protagonist-webcomic-online
You can download it as a 320MB zip, but maybe best to torrent it and seed for others.
The Internet Archive is a treasure.
It's going to hurt when they annoy the wrong person and get sued out of existence.
Thank you ❤️, but my country doesn't allow any outgoing connections.
What country is this, if you're safe to say? I haven't heard of anything like that before.
If you're just after the comics themselves, then look at dosage. It appears to support this webcomic.
Seems like you have a good alternative that doesn't require a script, but for tasks like this I like to recommend the book Automate the Boring Stuff with Python. It's free to read online and, if I recall correctly, has essentially this exact task as one of its exercises.
If you view the source of the homepage, you’ll see some HTML that starts with this:
<div class="archive-dropdown-wrap"> <ul class="archive-dropdown"> <li><span class="chapter-label">Issue 1</span><ul><li><a href="https://strongfemaleprotagonist.com/issue-1/page-0/">Cover</a> …
That's the HTML for the drop-down. Although if I were you, I'd look into taking advantage of WordPress' JSON API, since that website uses WordPress.
For example, here’s a list of the images uploaded to the site in JSON format: https://strongfemaleprotagonist.com/wp-json/wp/v2/media?per_page=100&page=1 (Limited to 100 entries per page)
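Since the endpoint caps results at 100 per page, collecting everything means looping over `page=1,2,…`. A hedged sketch: the pagination logic below relies on the `X-WP-TotalPages` header that the WordPress REST API sends with list responses; `fetch_all` needs network access, so the demonstration at the end parses a sample shaped like the API's response instead.

```python
# Page through the WordPress media API and collect every source_url.
import json
import urllib.request

API = "https://strongfemaleprotagonist.com/wp-json/wp/v2/media?per_page=100&page={}"

def media_urls(pages_json):
    """Pull source_url out of a list of already-decoded media pages."""
    return [item["source_url"] for page in pages_json for item in page]

def fetch_all():  # not run here; needs network access
    urls, page, total = [], 1, 1
    while page <= total:
        with urllib.request.urlopen(API.format(page)) as r:
            # WordPress reports how many pages exist in a response header
            total = int(r.headers.get("X-WP-TotalPages", 1))
            urls += media_urls([json.loads(r.read())])
        page += 1
    return urls

# offline demonstration with a sample shaped like one page of the API response
sample = [[{"source_url": "https://example.com/a.png"},
           {"source_url": "https://example.com/b.png"}]]
print(media_urls(sample))  # ['https://example.com/a.png', 'https://example.com/b.png']
```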
I can see the URL for each image when opening the JSON link (source_url:), and each image is labeled in the correct order as far as I can see, but how do I grab the URLs?
Maybe look into the awk language to compile a list of URLs, then pass them through curl?
jq is sort of like awk but specifically meant for JSON so it should be a lot easier.
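The jq version of this would be something like `curl -s '<the media URL>' | jq -r '.[].source_url'`. If you'd rather stay in Python, a dependency-free equivalent of that filter (shown here on a made-up two-item sample, since the real call needs network access):

```python
# Pull every source_url out of one page of the media API response,
# the same extraction as `jq -r '.[].source_url'`.
import json

page = json.loads("""[
  {"id": 1, "source_url": "https://example.com/page-0.png"},
  {"id": 2, "source_url": "https://example.com/page-1.png"}
]""")
urls = [item["source_url"] for item in page]
print("\n".join(urls))
```

The resulting list can then be fed straight to curl, wget, or `urllib.request.urlretrieve`.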
I'm presuming that you've tried something like curl or wget wrapped in a for loop to iterate through each page, and that it didn't work somehow.
robots.txt would probably put a stop to that
I was asking OP if they used a manual approach, which wouldn't be impacted by something like robots.txt.
Haven't used curl or wget; I've yet to start using the command line (outside of solving the odd Linux issue or organizing family photos), but I'm open to learning.