Problem with number of files in a folder

lataak

Hi everyone,
I am working on a simple crawler that downloads web pages and stores them on my computer's local drive. The crawler works fine for the simple task I need. My problem is that it stops creating files and saving pages once the number of files in the folder reaches about 630. The crawler keeps running normally, but no new files are created. I suspect this has more to do with the operating system than with the crawler; disk quotas are off on the computer. I run Windows 7 Professional 64-bit on a laptop. Do you see any reason why this happens?
 
More likely the script's processing limit (CPU time) is being reached.
Unless you have quotas enabled, the file count is not limited to such a smallish number.
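If you want to rule the folder itself out, create a few thousand dummy files in it and print the first error you hit. A minimal sketch, assuming Python (the folder path is a placeholder):

import os

folder = r"C:\crawl\pages"  # placeholder: point this at your download folder
os.makedirs(folder, exist_ok=True)

try:
    for i in range(5000):
        # plain sequential names rule out illegal characters coming from URLs
        with open(os.path.join(folder, "test_%05d.txt" % i), "w") as f:
            f.write("x")
except OSError as e:
    # surfaces the real reason instead of failing silently
    print("failed at file %d: %s" % (i, e))
else:
    print("created 5000 files with no error; the folder is not the limit")

If this stops around the same count, the limit is on the OS side; if it sails past 630, the problem is in the crawler's save routine, so wrap that routine in a try/except and log the exception it raises.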
a simple crawler that downloads web pages
I find this objectionable, and it may easily violate DRM and copyright laws.
 
More likely the script's processing limit (CPU time) is being reached.
Not really; the script runs continuously unless I stop it or it has downloaded all the pages.
I find this objectionable, and it may easily violate DRM and copyright laws.
It downloads only pages that the robots protocol allows to be crawled. How does that violate DRM and copyright laws?
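The check is essentially what Python's standard library robots.txt parser does; a rough sketch, not my exact code (example.com and the helper are placeholders):

import urllib.request
import urllib.robotparser

def fetch_and_save(url, path):
    # hypothetical helper: download the page body to a local file
    with urllib.request.urlopen(url) as resp, open(path, "wb") as f:
        f.write(resp.read())

rp = urllib.robotparser.RobotFileParser()
rp.set_url("http://example.com/robots.txt")
rp.read()  # fetch and parse the site's robots.txt

url = "http://example.com/some/page.html"
if rp.can_fetch("MyCrawler", url):  # "MyCrawler" stands in for the crawler's user-agent
    fetch_and_save(url, "page.html")
else:
    print("disallowed by robots.txt, skipping:", url)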
 
Robots.txt is the owner's control for crawlers like Google, Yahoo, and so on,
but such a crawler acquires the meta information from pages and does not download them to a storage device, potentially to be kept forever.
You violate copyright for every page or file you download that is marked

(C) Copyright yyyy

Look at the lower right corner of this window:

© 2013 TechSpot, Inc. All Rights Reserved
 
but such a crawler acquires the meta information from pages and does not download them to a storage device, potentially to be kept forever.
Did I say anywhere that I will store it forever?

For info: the pages are downloaded to create a corpus for natural language processing of an East African language.
 