HtmlConverter

uk.ac.kcl.cch.jb.pliny.imageRes.dnd
Class HtmlConverter

java.lang.Object
  uk.ac.kcl.cch.jb.pliny.imageRes.dnd.HtmlConverter

public class HtmlConverter
extends Object

A utility class that processes the HTML of a web page for Image Resources. It extracts the page text (which will appear in a note initially set up on the image page) and encodes it in a wiki-like markup, and it locates and identifies the images on the page that are suitable as Image Resources.

This class uses methods provided by org.w3c.tidy, for which thanks are hereby given.

Author:: John Bradley

Nested Class Summary
`class`	`HtmlConverter.ImageData`

Constructor Summary
`HtmlConverter(InputStream in, URL theURL)` Takes an InputStream that reads the HTML page at the given URL and uses `org.w3c.tidy` to build a DOM of the page, which can subsequently be harvested for either its text or a list of its images.

Method Summary
`HtmlConverter.ImageData[]`	`getImageData()` Fetches information about the images that were found on the given HTML page.
`String`	`getTextualContents()` Takes the DOM representation of the HTML page and converts the text found in it into a wiki-markup-like text string.
`String`	`getTitle()` Gets the text of the HTML title element.

Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Detail

HtmlConverter

public HtmlConverter(InputStream in,
                     URL theURL)

Takes an InputStream that reads the HTML page at the given URL and uses org.w3c.tidy to build a DOM of the page, which can subsequently be harvested for either its text or a list of its images.
Parameters:: in - an InputStream for the HTML page; theURL - the URL of the HTML page.

Method Detail

getImageData

public HtmlConverter.ImageData[] getImageData()

Fetches information about the images that were found on the given HTML page.

Returns:: an array of ImageData elements.
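
The general idea of harvesting images from a DOM can be sketched as follows. This is only an illustration under stated assumptions, not the code of this class: it uses the JDK's built-in XML parser on well-formed XHTML rather than org.w3c.tidy, and `ImageHarvestSketch`, `ImageInfo`, and `findImages` are hypothetical names, since the fields of `HtmlConverter.ImageData` are not documented on this page.

```java
import java.io.ByteArrayInputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ImageHarvestSketch {

    /** Hypothetical stand-in for HtmlConverter.ImageData (real fields unknown). */
    static final class ImageInfo {
        final URI location;   // image address, resolved against the page address
        final String altText; // value of the alt attribute, "" if absent

        ImageInfo(URI location, String altText) {
            this.location = location;
            this.altText = altText;
        }
    }

    /** Collects every img element that has a src attribute (assumes well-formed XHTML). */
    static List<ImageInfo> findImages(String xhtml, URI pageUri) {
        try {
            Document dom = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
            NodeList imgs = dom.getElementsByTagName("img");
            List<ImageInfo> found = new ArrayList<>();
            for (int i = 0; i < imgs.getLength(); i++) {
                Element img = (Element) imgs.item(i);
                String src = img.getAttribute("src");
                if (src.isEmpty()) {
                    continue; // nothing to fetch, so skip this element
                }
                // Relative addresses are resolved against the page address.
                found.add(new ImageInfo(pageUri.resolve(src), img.getAttribute("alt")));
            }
            return found;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse page", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body>"
                + "<img src=\"pics/a.png\" alt=\"A\"/><img alt=\"no source\"/>"
                + "</body></html>";
        URI pageUri = URI.create("http://example.org/dir/page.html");
        for (ImageInfo info : findImages(page, pageUri)) {
            System.out.println(info.location + " (" + info.altText + ")");
        }
    }
}
```

Resolving each `src` against the page address is the reason a harvester of this kind needs the page URL in addition to the stream, which is consistent with the two constructor parameters documented above.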

getTitle

public String getTitle()

Gets the text of the HTML title element. If the page does not provide a title, returns an empty string.

getTextualContents

public String getTextualContents()

Takes the DOM representation of the HTML page and converts the text found in it into a wiki-markup-like text string.
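
A DOM-to-markup conversion of this kind is typically a recursive walk over the tree. The sketch below shows the technique only. The markup rules in it (`= heading =`, `'''bold'''`, `''italic''`) are assumptions chosen for illustration, because the markup actually produced by `getTextualContents()` is not specified on this page. It also uses the JDK's XML parser on well-formed XHTML instead of org.w3c.tidy, and `WikiMarkupSketch` and `toWikiText` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class WikiMarkupSketch {

    /** Converts the body of a well-formed XHTML page to an assumed wiki-like markup. */
    static String toWikiText(String xhtml) {
        try {
            Document dom = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
            NodeList bodies = dom.getElementsByTagName("body");
            if (bodies.getLength() == 0) {
                return "";
            }
            StringBuilder out = new StringBuilder();
            render(bodies.item(0), out);
            return out.toString().trim();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse page", e);
        }
    }

    /** Depth-first walk: emit opening markup, then the children, then closing markup. */
    private static void render(Node node, StringBuilder out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            // Collapse runs of whitespace, as a browser would when displaying the page.
            out.append(node.getNodeValue().replaceAll("\\s+", " "));
            return;
        }
        if (node.getNodeType() != Node.ELEMENT_NODE) {
            return; // comments, processing instructions, and so on
        }
        String open = "";
        String close = "";
        switch (node.getNodeName().toLowerCase(Locale.ROOT)) {
            case "h1": open = "= "; close = " =\n"; break;
            case "h2": open = "== "; close = " ==\n"; break;
            case "b": case "strong": open = "'''"; close = "'''"; break;
            case "i": case "em": open = "''"; close = "''"; break;
            case "p": close = "\n\n"; break;
            case "script": case "style": return; // not readable text
            default: break; // unknown elements contribute only their text
        }
        out.append(open);
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            render(child, out);
        }
        out.append(close);
    }

    public static void main(String[] args) {
        String page = "<html><body><h1>Title</h1>"
                + "<p>Some <b>bold</b> and <i>italic</i> text.</p></body></html>";
        System.out.println(toWikiText(page));
    }
}
```

Elements without a rule fall through to the default case and contribute only their text, so unfamiliar tags degrade to plain text rather than causing a failure.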
