Nokogiri is a hugely powerful XML and HTML parser for Ruby. I use it for consuming pretty much anything with HTML in it, and, via the excellent Feedzirra, even for processing feeds.
It also has great XPath support and makes partial attribute matching a breeze.
For example, to find all links to Amazon.com in an HTML document:
You can then iterate through aLinks to do any additional filtering.
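Nokogiri also supports starts-with in addition to the contains above. Note that ends-with is an XPath 2.0 function, and the XPath 1.0 engine Nokogiri relies on (libxml2) does not provide it; for suffix matches, use a CSS attribute selector such as a[href$=".pdf"] instead.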
There is also an excellent snippet on Stack Overflow about partially matching node content values, so you could just as easily do a partial match on the link anchor text in the above example.