Program to automatically save info from the web

NV30
Is there one? I'd like it to save a web page/image every so often, say once an hour, or whenever it changes if that's possible. Would this be feasible?
 
Under a _real_ OS like Linux/Unix, of course it's possible!

Windows... You could use the Scheduling Agent with wget or something like that.
 
Tools -> Internet Options -> Temporary Internet Files -> Settings -> Check for newer versions of stored pages

Is this what you're talking about?
 
Thanks for the replies. Vehementi: No, not what I meant. Nodsu: The Scheduling Agent in XP does not have this option; it can only run programs at certain times. Is there something you can download?
 
You can run wget, say, every 10 minutes; it is smart enough not to re-download things it already has.

If you think the Windows scheduler is bad, there are other schedulers out there too.
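For example, something like this (just a sketch, the URL is made up) only fetches the page again if the copy on the server is newer than the one already on disk:

wget --timestamping http://www.example.com/index.html

The --timestamping (or -N) switch is what does the comparing.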
 
Got what?

Wget? I can sure help you with that..

Oh, and wget does not schedule itself; you still need another program for that.
 
GNU Wget is a free software package for retrieving files using HTTP, HTTPS and FTP, the most widely-used Internet protocols. It is a non-interactive command-line tool, so it may easily be called from scripts, cron jobs, terminals without X support, etc.

You download it, maybe fiddle with a few settings, open up a console and type in the command line. The program will do the rest.

http://www.gnu.org/software/wget/wget.html
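Once it's set up, fetching a single page from a console is just this (example URL, use whatever page you actually want):

wget http://www.gnu.org/software/wget/wget.html

That drops the page (wget.html) into whatever directory you ran the command from.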
 
Well, it's installed and everything, but when I open it, it appears as a DOS window for two seconds before disappearing. It's then not available by Alt-Tabbing or any other method. I am using WinXP Home.
 
Correct, wget is a console program. If you run it without parameters, it will complain and exit immediately.

If you want to use it with a scheduling program, you enter the correct parameters in the scheduler, or, even better, write a .bat file.
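For instance, a .bat for the scheduler could look something like this (just a sketch; the paths and the URL are placeholders, put in your own):

@echo off
rem Save pages under C:\Downloads and only re-fetch when the server copy is newer
cd /d C:\Downloads
C:\wget\wget.exe --timestamping http://www.example.com/index.html

Point the scheduler at the .bat and you don't have to type any parameters into the scheduler at all.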

To run this thing interactively, you will need to run "cmd" or "command" to get a console window first.


A simple example:

You want to check www.fluffykitties.com for a new version and download it every 5 minutes.

(Assuming you downloaded the Windows Scheduler from the link I posted above):
Make a new event, name it Wget
Set Application by browsing to wget.exe
Set parameters as "--recursive --level=0 --timestamping www.fluffykitties.com"
Set the working dir to wherever you want the files saved
Set the schedule to Every hour/selected minutes, every 5 minutes
Save the event and exit the dialog, then close the program window

You will see Windows Scheduler in your systray.

WS will run wget with the given parameters every 5 minutes.

By default wget will now recursively follow _all_ links on www.fluffykitties.com and download _everything_ into a folder called "www.fluffykitties.com" in the working dir.

Because --level=0 means unlimited recursion depth, I don't recommend using this example carelessly or you may end up with the whole site (and, if you add --span-hosts, half the internet) on your HD :D
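If you only want the front page and the pages one click away from it, a tamer set of parameters would be something like this (same placeholder site as above):

wget --recursive --level=1 --no-parent --timestamping www.fluffykitties.com

--level=1 limits the recursion depth and --no-parent keeps wget from climbing above the starting directory; it stays on the starting host anyway unless you add --span-hosts.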

I posted the documentation link for wget before, you can also run "wget --help" in command prompt to get a short listing of parameters. And of course you can post back and ask anything.

Edited to not automatically parse URLs. Web page www.fluffykitties.com does not exist. Don't make any assumptions on my sexual alignment :p
 
Thanks for the help. Is there a way to just download a certain file? If you put in the URL pointing to a certain file, would it download just that file? Also, can it save information from a form/CGI script?
 
Of course you can get a single file.

I don't quite understand what you mean by downloading info from a form.

Wget just takes the URL you give it and saves whatever the server serves, just like any browser but without displaying the thing.
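So a single file is just this (made-up URL):

wget http://www.example.com/pics/kitty.jpg

That saves kitty.jpg in the current directory; add --output-document=whatever.jpg if you want a different name.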
 
Ah I get it now.

No such thing, sorry. I don't think any automated program can simulate click events or such for scripts.

You can complain to the makers of the website and ask them to allow URL-based queries.
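If the site already takes its input from the URL (a query string after a ?), you can feed the whole thing to wget; just quote it so the & doesn't confuse the command prompt (made-up example):

wget --output-document=results.html "http://www.example.com/search.cgi?q=kitties&page=2"

Wget will save whatever the script returns, same as with any other URL.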
 