Wednesday, May 29th, 2013 09:56 pm
this is a surprising development
So let's say that, due to reasons, you let your web host expire and kind of lost your site. Sure, you could go to the Wayback Machine, grab every goddamn page one by one, and strip the archive's code out, but seriously, who does that?
If you need to do that, however, there's an easier way.
Warrick is a Linux-based Perl command-line program that seems to do exactly that: it pulls the site, strips out the Wayback code, and fixes the relative and absolute links. From what I can see, it gets HTML, CSS, all images so far, and PDFs. Since that's pretty much all I had on my site, it may well pull other file types too. I'm still validating (four thousand files), but so far it seems to have gotten at least the bulk of the site, and in the HTML I've checked, the Wayback archive's code has been stripped out.
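To give a rough idea of what "stripping the Wayback code" involves, here is a minimal shell sketch. It is not Warrick's code. It assumes that archived pages rewrite each link as `http://web.archive.org/web/<timestamp>/<original URL>`, sometimes as a site-relative `/web/...` path or with a suffix such as `im_` on the timestamp. It also assumes the injected toolbar is wrapped in `BEGIN/END WAYBACK TOOLBAR INSERT` comments. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch, NOT Warrick's implementation. Assumes the Wayback markup
# described above; both function names are hypothetical.

# Delete every line from the toolbar's BEGIN marker through its END marker.
strip_wayback_toolbar() {
  sed '/BEGIN WAYBACK TOOLBAR INSERT/,/END WAYBACK TOOLBAR INSERT/d'
}

# Turn archived links back into the original absolute URLs. For example:
#   http://web.archive.org/web/20130529000000/http://www.mywebsite.com/a.html
#   /web/20130529000000im_/http://www.mywebsite.com/logo.png
# Both lose everything up to and including the timestamp segment.
strip_wayback_prefix() {
  sed -E 's#(https?://web\.archive\.org)?/web/[0-9]+([a-z]{2}_)?/##g'
}

# Usage: clean one saved page (reads stdin, writes stdout).
#   strip_wayback_toolbar < saved.html | strip_wayback_prefix > clean.html
```

This only restores absolute links to the original domain. Warrick also rewrites links so the recovered copy works locally, which this sketch does not attempt.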
Warrick is available in iOS in beta, but it's almost painfully easy to use at its most basic level, so if you have access to a Linux system, I recommend this route.
The page above has some good instructions, but here's a quick breakdown if you use a Linux desktop and don't usually work on the command line.
1.) cd into the directory you download files to. Any directory works, but for convenience, use that one.
2.) sudo wget http://warrick.googlecode.com/files/warrick_v2-2.tar.gz
3.) Decide which directory to extract this to. I used /usr for lazy reasons.
4.) sudo tar -xvzf warrick_v2-2.tar.gz -C /desired/path
5.) cd /desired/path/warrick2
6.) sudo sh ./INSTALL
7.) Wait a few minutes.
8.) sudo sh ./TEST
9.) Screen does things.
10.) Create a directory to store your website, e.g. sudo mkdir /media/mylostsite (I foolishly did not do this.)
11.) IN THE WARRICK2 DIRECTORY:
sudo warrick2.pl --target-directory=/media/mylostsite http://www.mywebsite.com
For step 11, use the original domain, not the longer web.archive.org Wayback URL.
Notes: This takes a while. Start it before you go to bed.
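For reference, here are steps 1 through 11 collected into one shell function. It's a sketch under assumptions. The download directory (`$HOME/Downloads`), the variable names, and the two argument checks are added for illustration and are not part of Warrick. The Warrick commands themselves are copied from the steps above.

```shell
#!/usr/bin/env bash
# Sketch of steps 1-11 as one function. Assumed (not from Warrick's docs):
# DOWNLOAD_DIR, INSTALL_PARENT, TARGET_DIR, and the argument checks.
DOWNLOAD_DIR="$HOME/Downloads"     # step 1: any directory works
INSTALL_PARENT="/usr"              # step 3: the post used /usr
TARGET_DIR="/media/mylostsite"     # step 10: where the recovered site goes
# The post's link. Google Code has since shut down, so it may no longer resolve.
TARBALL_URL="http://warrick.googlecode.com/files/warrick_v2-2.tar.gz"

# The body runs in a subshell ( ... ) so `set -e` and `cd` don't leak out.
recover_site() (
  set -e
  site_url="${1:-}"
  if [ -z "$site_url" ]; then
    echo "usage: recover_site http://www.mywebsite.com" >&2
    exit 1
  fi
  # Step 11 note: pass the original domain, not the Wayback URL.
  case "$site_url" in
    *web.archive.org*)
      echo "use the original domain, not the Wayback URL" >&2
      exit 1
      ;;
  esac

  cd "$DOWNLOAD_DIR"                                         # step 1
  sudo wget "$TARBALL_URL"                                   # step 2
  sudo tar -xvzf warrick_v2-2.tar.gz -C "$INSTALL_PARENT"    # step 4
  cd "$INSTALL_PARENT/warrick2"                              # step 5
  sudo sh ./INSTALL                                          # steps 6-7
  sudo sh ./TEST                                             # steps 8-9
  sudo mkdir -p "$TARGET_DIR"                                # step 10
  sudo warrick2.pl --target-directory="$TARGET_DIR" "$site_url"  # step 11, as written in the post
)
```

Usage: `recover_site http://www.mywebsite.com`. Creating the target directory inside the function means you can't forget step 10.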
I'm still validating the files--seriously, four thousand--but this seems to be working pretty well so far.
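For the validation part, one quick way to spot-check a recovered directory is to count files by extension and then list any pages that still mention web.archive.org. This is a minimal sketch. The function names are hypothetical, and the directory in the usage line is the example path from step 10.

```shell
#!/usr/bin/env bash
# Minimal spot-check sketch for a recovered site; function names are hypothetical.

# Print "count extension" for every file whose name has an extension,
# most common extension first.
count_by_extension() {
  find "$1" -type f -name '*.*' | sed 's/.*\.//' | sort | uniq -c | sort -rn
}

# List files that still mention web.archive.org.
# Empty output suggests the Wayback code was stripped.
find_wayback_leftovers() {
  grep -rl 'web\.archive\.org' "$1" || true
}

# Usage:
#   count_by_extension /media/mylostsite
#   find_wayback_leftovers /media/mylostsite
```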
If anyone tries, tell me how it worked for you!