java.lang.Object
  uk.ac.kcl.cch.jb.pliny.imageRes.dnd.HtmlConverter
A utility class that processes HTML from a web page for Image Resources. It extracts the page's text (which will appear in a note initially set up on the image page) and encodes it in a wiki-like markup, and it locates and identifies the images suitable for use as Image Resources.
This class uses methods provided by org.w3c.tidy, for which thanks are hereby given.
Nested Class Summary
class | HtmlConverter.ImageData
Constructor Summary
HtmlConverter(InputStream in, URL theURL)
This constructor takes an InputStream that points to the HTML page specified by the given URL, and uses org.w3c.tidy to create a DOM of the text, which can subsequently be harvested for either the text or a list of images.
Method Summary
HtmlConverter.ImageData[] | getImageData() | Fetches information about images that were found on the given HTML page.
String | getTextualContents() | Takes the DOM representation of the HTML page and converts the text found therein into a wiki-markup-like text string.
String | getTitle() | Gets the text of the HTML title element.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Constructor Detail
public HtmlConverter(InputStream in, URL theURL)
This constructor takes an InputStream that points to the HTML page specified by the given URL, and uses org.w3c.tidy to create a DOM of the text, which can subsequently be harvested for either the text or a list of images.
Parameters:
in - an InputStream for the HTML page.
theURL - the URL to the HTML page.
Method Detail
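public HtmlConverter.ImageData[] getImageData()
Fetches information about images that were found on the given HTML page.
public String getTitle()
Gets the text of the HTML title element.
public String getTextualContents()
Takes the DOM representation of the HTML page and converts the text found therein into a wiki-markup-like text string.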
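The pattern described above (build a DOM from the page, then harvest it for the title and the images) can be illustrated with a minimal sketch. This is not the HtmlConverter source. The class DomHarvestSketch and its methods getImageUrls and fromString are hypothetical names introduced here, and the sketch assumes the input is already well-formed XHTML so that the JDK's own XML parser can stand in for the org.w3c.tidy step, which in the real class is what turns arbitrary HTML into a DOM.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical stand-in for illustration only; not the Pliny HtmlConverter.
class DomHarvestSketch {
    private final Document dom;
    private final URI base;

    // Mirrors the documented constructor shape: a stream plus the page URL.
    DomHarvestSketch(InputStream in, URL theURL) {
        Document parsed;
        URI pageUri;
        try {
            // Assumption: the stream holds well-formed XHTML. Real-world HTML
            // would first need a tidying parser to produce a DOM.
            parsed = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
            pageUri = theURL.toURI();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse page", e);
        }
        this.dom = parsed;
        this.base = pageUri;
    }

    // Returns the trimmed text of the first title element, or "" if none.
    String getTitle() {
        NodeList titles = dom.getElementsByTagName("title");
        return titles.getLength() == 0 ? "" : titles.item(0).getTextContent().trim();
    }

    // Collects the src of every img element, resolved against the page URL.
    List<String> getImageUrls() {
        List<String> found = new ArrayList<>();
        NodeList imgs = dom.getElementsByTagName("img");
        for (int i = 0; i < imgs.getLength(); i++) {
            String src = ((Element) imgs.item(i)).getAttribute("src");
            if (src.isEmpty()) {
                continue; // img without a src cannot be fetched
            }
            try {
                found.add(base.resolve(src).toString());
            } catch (IllegalArgumentException e) {
                // Skip a src that is not a valid URI reference.
            }
        }
        return found;
    }

    // Convenience factory so callers can pass plain strings.
    static DomHarvestSketch fromString(String xhtml, String pageUrl) {
        try {
            InputStream in = new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8));
            return new DomHarvestSketch(in, URI.create(pageUrl).toURL());
        } catch (java.net.MalformedURLException e) {
            throw new IllegalArgumentException("bad page URL", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><head><title>Demo</title></head>"
                + "<body><img src=\"img/a.png\"/></body></html>";
        DomHarvestSketch s = fromString(page, "http://example.org/gallery/index.html");
        System.out.println(s.getTitle());
        System.out.println(s.getImageUrls());
    }
}
```

Resolving each src against the page URL is one plausible reason the documented constructor takes theURL alongside the stream, since relative image paths are otherwise unusable; the source does not state this, so treat it as an assumption.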