Problem with number of files in a folder

lataak

Hi everyone,
I am working on a simple crawler that downloads web pages and stores them on my computer's local drive. The crawler works fine for the simple task I need. My problem is that it stops creating files and saving pages once the number of files in the folder reaches about 630. The crawler keeps running normally, but no new files are created. I suspect this has more to do with the operating system than with the crawler; disk quotas are off on the computer. I run Windows 7 Professional 64-bit on a laptop. Do you see any reason why this happens?
 
More likely the script's processing limit (CPU time) is being reached.
Unless you have quotas enabled, the file count is not limited to such a smallish number.
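If you want to rule the folder itself out, create a few thousand dummy files in it and print the first error you hit. A minimal sketch, assuming Python (the folder path is a placeholder):

import os

folder = r"C:\crawl\pages"  # placeholder: point this at your download folder
os.makedirs(folder, exist_ok=True)

try:
    for i in range(5000):
        # plain sequential names rule out illegal characters coming from URLs
        with open(os.path.join(folder, "test_%05d.txt" % i), "w") as f:
            f.write("x")
except OSError as e:
    # surfaces the real reason instead of failing silently
    print("failed at file %d: %s" % (i, e))
else:
    print("created 5000 files with no error; the folder is not the limit")

If this stops around the same count, the limit is on the OS side; if it sails past 630, the problem is in the crawler's save routine, so wrap that routine in a try/except and log the exception it raises.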
a simple crawler that downloads web pages
I find this objectionable, and it may easily violate DRM and copyright laws.
 
More likely the script's processing limit (CPU time) is being reached.
Not really; the script runs continuously unless I stop it or it has downloaded all the pages.
I find this objectionable, and it may easily violate DRM and copyright laws.
It downloads only pages that the robots protocol allows to be crawled. How does that violate DRM and copyright laws?
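The check is essentially what Python's standard library robots.txt parser does; a rough sketch, not my exact code (example.com and the helper are placeholders):

import urllib.request
import urllib.robotparser

def fetch_and_save(url, path):
    # hypothetical helper: download the page body to a local file
    with urllib.request.urlopen(url) as resp, open(path, "wb") as f:
        f.write(resp.read())

rp = urllib.robotparser.RobotFileParser()
rp.set_url("http://example.com/robots.txt")
rp.read()  # fetch and parse the site's robots.txt

url = "http://example.com/some/page.html"
if rp.can_fetch("MyCrawler", url):  # "MyCrawler" stands in for the crawler's user-agent
    fetch_and_save(url, "page.html")
else:
    print("disallowed by robots.txt, skipping:", url)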
 
Robots.txt is the owner's control for crawlers like Google, Yahoo, and so on,
but such a crawler acquires the meta information from pages and does not download them to a storage device, potentially to be kept forever.
You violate copyright for every page or file you download that is marked

(C) Copyright yyyy

Look at the lower right corner of this window:

© 2013 TechSpot, Inc. All Rights Reserved
 
but such a crawler acquires the meta information from pages and does not download them to a storage device, potentially to be kept forever.
Did I say anywhere that I will store it forever?

For info: the pages are downloaded to create a corpus for natural language processing of an East African language.
 