Problem with number of files in a folder

By lataak
Sep 23, 2013
  1. Hi everyone,
    I am working on a simple crawler that downloads web pages and stores them on a local drive on my computer. The crawler works fine for the simple task I need. My problem is that it fails to create files and save the web pages once the number of files in the folder reaches around 630. The crawler keeps running normally, but the files are not created. I think this has more to do with the operating system than with the crawler. Disk quota is off on the computer. I run Windows 7 Professional 64-bit on a laptop. Do you see any reason why this happens?
  2. jobeard

    jobeard TS Ambassador Posts: 9,311   +617

    It is more likely that the script's processing limit (CPU time) is being reached.
    Unless you have quotas enabled, the file count is not limited to a number this small.
    I find this kind of downloading objectionable, and it may easily violate DRM and copyright laws.
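The thread never pins down the cause. One possibility worth ruling out, since the file system itself does not stop at a few hundred files per folder, is that a file name derived from a URL is invalid or too long on Windows and the resulting error is being swallowed. The following is a minimal diagnostic sketch, assuming the crawler is written in Python (the thread does not say); `safe_filename`, `save_page`, and `save_all` are hypothetical helper names, not part of the original crawler.

```python
import hashlib
import re
from pathlib import Path

# Characters that Windows does not allow in file names.
_INVALID = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_filename(url: str, max_len: int = 100) -> str:
    """Map a URL to a short file name that is valid on Windows.

    A hash of the full URL is appended so that two URLs which look the
    same after sanitizing or truncation still get different names.
    """
    cleaned = _INVALID.sub("_", url)[:max_len]
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()[:10]
    return f"{cleaned}_{digest}.html"


def save_page(folder: Path, url: str, html: str) -> Path:
    """Write one page to disk; any OSError propagates to the caller."""
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / safe_filename(url)
    target.write_text(html, encoding="utf-8")
    return target


def save_all(folder: Path, pages):
    """Save every (url, html) pair and return a list of (url, error) failures."""
    failures = []
    for url, html in pages:
        try:
            save_page(folder, url, html)
        except OSError as exc:
            # Record the failure instead of silently skipping the page.
            failures.append((url, exc))
    return failures
```

If `save_all` returns a non-empty list, the recorded exceptions show which URLs failed and why, which is more informative than counting the files in the folder.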
  3. lataak

    lataak TS Rookie Topic Starter Posts: 44

    Not really; the script runs continuously until I stop it or it has downloaded all the pages.
    It downloads only pages that the robots protocol allows to be crawled. How does it violate DRM and copyright laws?
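For readers unfamiliar with the robots protocol mentioned here, this is a rough sketch of how a crawler might check a URL against a site's robots.txt rules, using Python's standard `urllib.robotparser`. The robots.txt content, the `corpus-crawler` user agent, and the example.com URLs are made up for illustration; the poster's actual crawler and its rules are not shown in the thread.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text that has already been fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Hypothetical rules: everything is allowed except paths under /private/.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

parser = build_parser(EXAMPLE_ROBOTS)

# can_fetch(useragent, url) reports whether the rules permit fetching the URL.
allowed_news = parser.can_fetch("corpus-crawler", "http://example.com/news/1.html")
allowed_private = parser.can_fetch("corpus-crawler", "http://example.com/private/x.html")
```

A check like this only reflects what the site owner permits crawlers to fetch; it does not by itself settle the copyright question raised in this thread.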
  4. jobeard

    jobeard TS Ambassador Posts: 9,311   +617

    Robots.txt is the owner's control for crawlers like Google, Yahoo, ...
    But such a crawler acquires the meta info from pages and does not download them to a storage device, potentially to be kept forever.
    You violate copyright for every page or file you download that is marked
    (C) Copyright yyyy
    Look at the lower right corner of this window:

    © 2013 TechSpot, Inc. All Rights Reserved
  5. lataak

    lataak TS Rookie Topic Starter Posts: 44

    Did I say anywhere that I would store it forever?

    For info: the pages are downloaded to create a corpus for Natural Language Processing for an East African language.
  6. jobeard

    jobeard TS Ambassador Posts: 9,311   +617

    Well that certainly is interesting :)
  7. lataak

    lataak TS Rookie Topic Starter Posts: 44

    Thank you!
