A utility class that processes the HTML of a web page for Image Resources. It
extracts the text (which will appear in a note initially set up
on the image page) and encodes it in a wiki-like markup, and it
locates and identifies images suitable as Image Resources.
This constructor takes an InputStream that points to the HTML
page specified by the given URL and uses org.w3c.tidy
to create a DOM of the page, which can subsequently be harvested
for either its text or a list of its images.
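
The harvesting step described above can be sketched as follows. This is only an illustration, not the class's actual code: the class name ImageHarvestSketch and the methods parse, imageSources, and bodyText are hypothetical. To stay runnable with the JDK alone, the sketch builds the DOM with the built-in javax.xml.parsers parser, which only accepts well-formed XHTML. With JTidy on the classpath, the Document would presumably come from Tidy.parseDOM(in, null) instead, and the traversal would be unchanged because both produce an org.w3c.dom.Document.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch; names are illustrative, not taken from the original class.
class ImageHarvestSketch {

    // Builds a DOM from the stream. Assumption: the input is well-formed XHTML,
    // because the JDK parser stands in here for org.w3c.tidy, which would also
    // repair malformed HTML before building the DOM.
    static Document parse(InputStream in) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
    }

    // Collects the src attribute of every <img> element that has one.
    static List<String> imageSources(Document dom) {
        List<String> sources = new ArrayList<>();
        NodeList imgs = dom.getElementsByTagName("img");
        for (int i = 0; i < imgs.getLength(); i++) {
            String src = ((Element) imgs.item(i)).getAttribute("src");
            if (!src.isEmpty()) {
                sources.add(src);
            }
        }
        return sources;
    }

    // Returns the text under <body> with runs of whitespace collapsed.
    // Conversion to wiki-like markup is not attempted in this sketch.
    static String bodyText(Document dom) {
        NodeList bodies = dom.getElementsByTagName("body");
        if (bodies.getLength() == 0) {
            return "";
        }
        return bodies.item(0).getTextContent().trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) throws Exception {
        String html = "<html><body><p>A red barn.</p>"
                + "<img src=\"barn.jpg\"/><img alt=\"no source\"/></body></html>";
        Document dom = parse(new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8)));
        System.out.println(imageSources(dom)); // prints [barn.jpg]
        System.out.println(bodyText(dom));     // prints A red barn.
    }
}
```

Keeping the parse step separate from the two harvesting methods mirrors the description: the DOM is created once in the constructor and can then be queried for either text or images.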