TechSpot

Removing data from text file except specified URL (data scraping help)

By H3llion
Apr 16, 2014
  1. Hi guys, I have a slight problem after my first data scraping session.

    Basically, I have a text file of around 500k lines containing around 50k "www.domain.com/username" URLs. The only part I need is the URL; everything else is redundant. The URL is dynamic, so the username is different every time, and it turns up on random lines rather than at a regular interval such as every 5 lines.

    Is there any software out there, or maybe a Notepad++/Sublime Text 2 algorithm, that strips out everything else or simply extracts the data I want?

    Here is what it looks like:

    Code:
    http://www.youtube.com/watch?v=RiVKDn5kyfo,,,Landon Austin - Armor - Official Music Video Download on iTunes!! - <a href="http://goo.gl/aaIY8E" target="_blank" title="http://goo.gl/aaIY8E" rel="nofollow" dir="ltr" class="yt-uix-redirect-link">http://goo.gl/aaIY8E</a> Download on Amazon!,,
    http://www.youtube.com/watch?v=z91KJ2I7j2s,,,Download on iTunes ♪ For more information ☞ smtown hompage : <a href="http://www.smtown.com" target="_blank" title="http://www.smtown.com" rel="nofollow" dir="ltr" class="yt-uix-redirect-link">http://www.smtown.com</a> ☞ smtown EXO-K...,,
    http://www.youtube.com/watch?v=52oJrdLhXlE,,,Watch the official music video by Kerbera for their hit single &quot;Counterpoints For more information on Kerbera check them out on...,,
    http://www.youtube.com/watch?v=B4hGd7EfSwM,,,Just a little video I put together cuz I was bored as hell. Trying to brush up my skills. Video is scenes from the movie 300 and music...,,
    http://www.youtube.com/watch?v=qOaqiCBum2w,,,*** it All (Honest Final Exam Version) Music Video Like the video=) Facebook:<a href="https://www.domain.com/Leendadproductions" target="_blank" title="https://www.domain.com/Leendadproductions" rel="nofollow" dir="ltr" class="yt-uix-redirect-link">https://www.domain.com/Leendadproductions</a>...,,
    http://www.youtube.com/watch?v=AT_WU-6Py1I,,,Music video for the song &quot;Pink Print&quot; by Antillectual taken from the album &quot;Perspectives & Objectives&quot;. Order the album at...,,
    http://www.youtube.com/watch?v=tmDMiUDm4rY,,,From &quot;International&quot; available June 10th 2014 on Sacred Bones Records Directed by Cali Thornhill Dewitt Shot and Edited by...,,
    ____

    I am using a YouTube scraper to get the descriptions of specific videos (music covers), and the descriptions contain URLs that I need (for instance, let's say, SoundCloud). The scraper pulls in the WHOLE description, but all I need is the "SoundCloud" link.
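
    For illustration, one possible way to do this kind of extraction is a short Python script built around a regular expression. This is only a sketch under a few assumptions: the wanted links always start with http:// or https://, they sit on the placeholder domain www.domain.com (swap in soundcloud.com or whatever site is needed), and each URL ends at whitespace, a quote, an angle bracket, or a comma, as in the sample above. The names TARGET_DOMAIN, extract_profile_urls, and extract_file are made up for this example.

```python
import re

# Assumed placeholder domain taken from the sample; replace with the real site.
TARGET_DOMAIN = "www.domain.com"


def extract_profile_urls(lines, domain=TARGET_DOMAIN):
    """Return unique http(s)://<domain>/<username> URLs in first-seen order."""
    # A URL is assumed to end at whitespace, a quote, an angle bracket, or a comma.
    pattern = re.compile(r"https?://" + re.escape(domain) + r"/[^\s\"'<>,]+")
    seen = set()
    urls = []
    for line in lines:
        for url in pattern.findall(line):
            # The same link appears in href, title, and the link text, so dedupe.
            if url not in seen:
                seen.add(url)
                urls.append(url)
    return urls


def extract_file(in_path, out_path, domain=TARGET_DOMAIN):
    """Read the scraped file line by line and write one matching URL per line."""
    with open(in_path, encoding="utf-8", errors="replace") as src:
        urls = extract_profile_urls(src, domain)
    with open(out_path, "w", encoding="utf-8") as dst:
        dst.write("\n".join(urls) + "\n")
    return len(urls)
```

    Calling extract_file("scrape.txt", "urls.txt") (both file names hypothetical) would leave only the matching links in urls.txt, one per line. The file is read line by line, so 500k lines should not be a problem for memory.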
     
    Last edited: Apr 16, 2014
