Monthly Archives: September 2009
Using grep to scrape web pages
In preparation for scraping a number of web pages, I used grep to build a list of the URLs I needed to scrape. The URLs were stored in an RSS file.
grep -oP '<link><!\[CDATA\[\K.*?(?=\]\]>)' hawkeye_stories.xml > hawkeye_stories_links.txt
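A quick way to sanity-check a pattern like this is to run it against a small hand-made feed before pointing it at the real file. The sketch below is only an illustration: the file names, the example.com URLs, and the extract_links helper are all made up, and it assumes GNU grep built with PCRE support (the -P flag is not available in BSD grep). It uses -o so that only the matched text is printed, \K to drop the CDATA prefix from the match, and a lookahead to stop before the closing ]]>.

```shell
# Build a tiny stand-in feed. The file name and URLs are hypothetical,
# chosen only to mimic <link><![CDATA[...]]></link> entries in an RSS file.
cat > sample_feed.xml <<'EOF'
<rss version="2.0"><channel>
<item><title>Story one</title><link><![CDATA[http://example.com/story-1]]></link></item>
<item><title>Story two</title><link><![CDATA[http://example.com/story-2]]></link></item>
</channel></rss>
EOF

# Hypothetical helper: print one URL per line from the given feed file.
#   -o           print only the matched part of each line
#   \K           discard everything matched so far (the <link><![CDATA[ prefix)
#   (?=\]\]>)    stop just before the closing ]]> without including it
extract_links() {
    grep -oP '<link><!\[CDATA\[\K.*?(?=\]\]>)' "$1"
}

extract_links sample_feed.xml > sample_links.txt
cat sample_links.txt
```

Single quotes around the pattern keep the shell from interpreting characters such as ! and \ before grep sees them. Feeds that wrap a link across several lines would not match a line-oriented pattern like this one.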