Removing data from text file except specified URL (data scraping help)

Mister_K

Hi guys, I have a slight problem after my first data scraping session.

Basically I have around 500k lines and around 50k "www.domain.com/username" URLs in this text file. However, the only part I need is the URL; everything else is redundant. The URL is dynamic, so the username is different every time, and it appears on a random line rather than at a regular interval such as every 5 lines.

Is there software out there, or maybe a Notepad++/Sublime Text 2 algorithm, that strips out all the other data or simply extracts the data I want?

Here is what it looks like:

Code:
http://www.youtube.com/watch?v=RiVKDn5kyfo,,,Landon Austin - Armor - Official Music Video Download on iTunes!! - <a href="http://goo.gl/aaIY8E" target="_blank" title="http://goo.gl/aaIY8E" rel="nofollow" dir="ltr" class="yt-uix-redirect-link">http://goo.gl/aaIY8E</a> Download on Amazon!,,
http://www.youtube.com/watch?v=z91KJ2I7j2s,,,Download on iTunes ♪ For more information ☞ smtown hompage : <a href="http://www.smtown.com" target="_blank" title="http://www.smtown.com" rel="nofollow" dir="ltr" class="yt-uix-redirect-link">http://www.smtown.com</a> ☞ smtown EXO-KÂ*...,,
http://www.youtube.com/watch?v=52oJrdLhXlE,,,Watch the official music video by Kerbera for their hit single &quot;Counterpoints For more information on Kerbera check them out onÂ*...,,
http://www.youtube.com/watch?v=B4hGd7EfSwM,,,Just a little video I put together cuz I was bored as hell. Trying to brush up my skills. Video is scenes from the movie 300 and musicÂ*...,,
http://www.youtube.com/watch?v=qOaqiCBum2w,,,*** it All (Honest Final Exam Version) Music Video Like the video=) Facebook:<a href="https://www.domain.com/Leendadproductions" target="_blank" title="https://www.domain.com/Leendadproductions" rel="nofollow" dir="ltr" class="yt-uix-redirect-link">https://www.domain.com/Leendadproductions</a>Â*...,,
http://www.youtube.com/watch?v=AT_WU-6Py1I,,,Music video for the song &quot;Pink Print&quot; by Antillectual taken from the album &quot;Perspectives & Objectives&quot;. Order the album atÂ*...,,
http://www.youtube.com/watch?v=tmDMiUDm4rY,,,From &quot;International&quot; available June 10th 2014 on Sacred Bones Records Directed by Cali Thornhill Dewitt Shot and Edited byÂ*...,,

____

I am using a YouTube scraper to get the descriptions of specific videos (music covers), and the descriptions contain URLs that I need (for instance, let's say, SoundCloud). The scraper pulls in the WHOLE description, but all I need is the "SoundCloud" link.
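
To make the goal concrete, here is a rough sketch of the kind of extraction I mean, written in Python. It is only an illustration under assumptions: `DOMAIN` is the placeholder domain from the sample above, the file names `scrape.txt` and `urls.txt` are made up, and the function names are hypothetical. It assumes a username ends at the first whitespace, quote, angle bracket, or comma.

```python
import re

# Placeholder domain from the sample; swap in the real site (e.g. soundcloud.com).
DOMAIN = "www.domain.com"

# Optional scheme, then the domain, a slash, and a run of characters that
# are not whitespace, quotes, angle brackets, or commas (assumed delimiters).
URL_RE = re.compile(r"(?:https?://)?" + re.escape(DOMAIN) + r"/[^\s\"'<>,]+")


def extract_urls(lines):
    """Return matching URLs in first-seen order, without duplicates."""
    seen = {}  # dicts preserve insertion order, so this doubles as an ordered set
    for line in lines:
        for url in URL_RE.findall(line):
            seen.setdefault(url, None)
    return list(seen)


def extract_file(src_path, dst_path):
    """Read the scraped text file and write one matching URL per line."""
    # errors="replace" keeps going if the file contains badly encoded bytes.
    with open(src_path, encoding="utf-8", errors="replace") as src:
        urls = extract_urls(src)
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write("\n".join(urls) + "\n")
    return len(urls)
```

Calling `extract_file("scrape.txt", "urls.txt")` would leave only the matching links in `urls.txt`. The same link appears three times per anchor tag (in `href`, in `title`, and as the link text), which is why the sketch drops duplicates. A URL followed directly by a full stop would keep that trailing dot, so the pattern may need tightening for real data.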
 