Using grep to scrape web pages

In preparation for scraping a number of web pages, I used grep to build a list of the URLs I needed. The URLs came from an RSS file, hawkeye_stories.xml, where each one sits inside a <link><![CDATA[...]]> element.

grep -oP '<link><!\[CDATA\[\K.*?(?=\]\]>)' hawkeye_stories.xml > hawkeye_stories_links.txt
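To check what the extraction produces before pointing it at the real feed, here is a small throwaway test. It is a sketch that assumes GNU grep built with PCRE support (-P); the file names sample_feed.xml and sample_links.txt and the example.com URLs are made up for illustration. It uses -o together with \K so that only the URL is written out rather than the whole matching line. That matters because many feeds put everything on one line, and plain grep would then dump the entire file.

```shell
#!/bin/sh
# Build a tiny sample feed (hypothetical content, for illustration only).
# Both items sit on separate lines here, but -o would also handle them on one line.
cat > sample_feed.xml <<'EOF'
<rss><channel>
<item><title>One</title><link><![CDATA[http://example.com/story-1]]></link></item>
<item><title>Two</title><link><![CDATA[http://example.com/story-2]]></link></item>
</channel></rss>
EOF

# -o prints only the matched text, once per match.
# \K discards the "<link><![CDATA[" prefix from the reported match,
# and the lookahead (?=\]\]>) stops just before the closing "]]>".
# Single quotes keep the shell from touching "!" and "\".
grep -oP '<link><!\[CDATA\[\K.*?(?=\]\]>)' sample_feed.xml > sample_links.txt

# One bare URL per line.
cat sample_links.txt
```

If the two example.com URLs come out on their own lines, the same pattern can be run against the real RSS file.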

This entry was posted in Linux.