There is a webcomic called Strong Female Protagonist that I want to preserve (in case the website is ever lost), but I'm not sure how.
The image you see above is not a page of the site but rather a drop-down menu. There is a web crawler called WFDownloader (I'm running the Windows exe inside Bottles) that can grab images and follow links, grabbing images "N" pages deep, but since this is a drop-down menu I'm not sure it will work.
There is also the issue of organizing the images: WFDownloader doesn't have options for that.
What I'm thinking is to somehow translate the HTML for the drop-down menu into separate XML files based on issues/titles, run a script to download the images, have each image named after its own hyperlink, and put each issue in its own folder. Later on I can create a stitched-up version of each issue.
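A minimal sketch of that plan in Python, using only the standard library. It assumes the drop-down links follow the `issue-N/page-M` URL pattern visible in the site's HTML, and it only builds the folder/filename/URL plan; the actual downloading (fetching each page and pulling out the comic image) would come after. The `.png` extension is a guess.

```python
# Sketch: turn drop-down links into (folder, filename, page URL) triples,
# so each issue gets its own folder and each image is named after its page slug.
import re
from urllib.parse import urlparse

def plan_downloads(dropdown_html):
    """Return (issue folder, filename, page URL) triples from the drop-down HTML."""
    plan = []
    for url in re.findall(r'href="(https://strongfemaleprotagonist\.com/[^"]+)"',
                          dropdown_html):
        parts = urlparse(url).path.strip("/").split("/")
        if len(parts) == 2:                        # e.g. ["issue-1", "page-0"]
            issue, page = parts
            plan.append((issue, page + ".png", url))   # extension is an assumption
    return plan

sample = '<li><a href="https://strongfemaleprotagonist.com/issue-1/page-0/">Cover</a></li>'
print(plan_downloads(sample))
# [('issue-1', 'page-0.png', 'https://strongfemaleprotagonist.com/issue-1/page-0/')]
```

Note the third element of each triple is the comic *page*, not the image itself; each page still has to be fetched to find the embedded image URL.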
https://archive.org/details/Strong-Female-Protagonist-webcomic-online
You can download it as a 320MB zip, but maybe best to torrent it and seed for others.
The Internet Archive is a treasure.
It's going to hurt when they annoy the wrong person and get sued out of existence.
Thank you ❤️, but my country doesn't allow any outgoing connections.
What country is this, if you're safe to say? I haven't heard of anything like that before.
If you're just after the comics themselves, then look at dosage. It appears to support this webcomic.
Seems like you have a good alternative that doesn't require a script, but for tasks like this I like to recommend the book Automate the Boring Stuff with Python. It's free to read online and, if I recall correctly, has essentially this exact task as one of its exercises.
If you view the source of the homepage, you’ll see some HTML that starts with this:
<div class="archive-dropdown-wrap"> <ul class="archive-dropdown"> <li><span class="chapter-label">Issue 1</span><ul><li><a href="https://strongfemaleprotagonist.com/issue-1/page-0/">Cover</a> …
That's the HTML for the drop-down. Although if I were you, I'd look into taking advantage of WordPress' JSON API, since that website uses WordPress.
For example, here’s a list of the images uploaded to the site in JSON format: https://strongfemaleprotagonist.com/wp-json/wp/v2/media?per_page=100&page=1 (Limited to 100 entries per page)
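Since the endpoint caps results at 100 per page, collecting everything means looping over `page=1,2,…`. A hedged sketch: the pagination logic below relies on the `X-WP-TotalPages` header that the WordPress REST API sends with list responses; `fetch_all` needs network access, so the demonstration at the end parses a sample shaped like the API's response instead.

```python
# Page through the WordPress media API and collect every source_url.
import json
import urllib.request

API = "https://strongfemaleprotagonist.com/wp-json/wp/v2/media?per_page=100&page={}"

def media_urls(pages_json):
    """Pull source_url out of a list of already-decoded media pages."""
    return [item["source_url"] for page in pages_json for item in page]

def fetch_all():  # not run here; needs network access
    urls, page, total = [], 1, 1
    while page <= total:
        with urllib.request.urlopen(API.format(page)) as r:
            # WordPress reports how many pages exist in a response header
            total = int(r.headers.get("X-WP-TotalPages", 1))
            urls += media_urls([json.loads(r.read())])
        page += 1
    return urls

# offline demonstration with a sample shaped like one page of the API response
sample = [[{"source_url": "https://example.com/a.png"},
           {"source_url": "https://example.com/b.png"}]]
print(media_urls(sample))  # ['https://example.com/a.png', 'https://example.com/b.png']
```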
I can see the URL for each image when opening the JSON link (source_url:), and each image is labeled in the correct order as far as I can see, but how do I grab the URLs?
Maybe look into the awk language to compile a list of URLs, then pass them through curl?
jq is sort of like awk but specifically meant for JSON so it should be a lot easier.
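The jq version of this would be something like `curl -s '<the media URL>' | jq -r '.[].source_url'`. If you'd rather stay in Python, a dependency-free equivalent of that filter (shown here on a made-up two-item sample, since the real call needs network access):

```python
# Pull every source_url out of one page of the media API response,
# the same extraction as `jq -r '.[].source_url'`.
import json

page = json.loads("""[
  {"id": 1, "source_url": "https://example.com/page-0.png"},
  {"id": 2, "source_url": "https://example.com/page-1.png"}
]""")
urls = [item["source_url"] for item in page]
print("\n".join(urls))
```

The resulting list can then be fed straight to curl, wget, or `urllib.request.urlretrieve`.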
I'm presuming that you've tried something like curl or wget wrapped in a for loop to iterate through each page, and that it didn't work somehow.
robots.txt would probably put a stop to that
I was asking OP if they used a manual approach, which wouldn't be impacted by something like robots.txt.
Haven't used curl or wget; I've yet to start using the command line (outside of solving the odd Linux issue or organizing family photos), but I'm open to learning.