Henry James
Concordance to
etexts on this web-site
Introduction
by Adrian Dover
This concordance indexes
all the etexts currently available
on my web-site. It shows, for each word, the total number of occurrences in
these texts and then the texts involved (by letter codes) with the separate
number of occurrences in each. The codes are arranged in chronological order
of publication by James, to help track words appearing and disappearing over
time.
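The layout of an entry described above (total count, then per-text counts by letter code in order of publication) can be sketched in a few lines of Python. This is only an illustration and not the script that generates these pages: the letter codes, years, sample sentences and the function names `build_concordance` and `format_entry` are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical letter codes mapped to (publication year, text).
# These are NOT the codes or texts used on the site.
TEXTS = {
    "PL": (1881, "The girl had a pretty wit."),
    "DM": (1878, "She was a pretty girl. A pretty American girl."),
}

# A word is a run of letters, optionally with one internal apostrophe.
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def tokenize(text):
    """Lower-case the text and split it into words."""
    return WORD_RE.findall(text.lower())


def build_concordance(texts):
    """Return {word: Counter({code: occurrences})} for all texts."""
    index = {}
    for code, (_year, body) in texts.items():
        for word in tokenize(body):
            index.setdefault(word, Counter())[code] += 1
    return index


def format_entry(word, index, texts):
    """Show the total, then per-text counts with codes in order of publication year."""
    per_text = index[word]
    codes = sorted(per_text, key=lambda code: texts[code][0])
    parts = " ".join(f"{code}:{per_text[code]}" for code in codes)
    return f"{word} {sum(per_text.values())} ({parts})"


if __name__ == "__main__":
    index = build_concordance(TEXTS)
    print(format_entry("pretty", index, TEXTS))
```

Sorting the codes by year rather than alphabetically is what lets a reader see at a glance when a word enters or drops out of use.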
So that you don't have to wait for ever on a slow Internet connection,
the concordance is divided into pages by the words' initial letters
(as shown on the menu, left) and all except the most infrequent letters are
further subdivided into pages for chunks of about 100-150 words. Even so, some
of the pages are up to 300K in size so you may have to be patient on a slow
link or when the server is busy!
Some of the most frequent English words are excluded from the concordance
through use of a
stop-word list,
as even their numerical distribution is deemed to be of little practical
interest.
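As a rough illustration of how a stop-word list works, the following Python sketch drops listed words from a table of counts. The words in `STOP_WORDS` are placeholders chosen for the example, not the actual list used for this concordance, and `drop_stop_words` is a hypothetical name.

```python
# Illustrative stop-word list only; the real list is given on its own page.
STOP_WORDS = frozenset({"the", "a", "and", "of", "to", "was", "had"})


def drop_stop_words(counts, stop_words=STOP_WORDS):
    """Return a copy of a {word: count} mapping without the stop words."""
    return {word: n for word, n in counts.items() if word not in stop_words}


if __name__ == "__main__":
    print(drop_stop_words({"the": 10, "ambassador": 2, "of": 7}))
```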
There is a link to each text from its code letters. Please note however
that, because it would make the text files themselves so huge, it
has not been possible to link directly to an occurrence of a word in the
text. So, when you have followed a link to a text page, you should use
your browser's text search facility to find the actual word
(probably "Find" or "Find in frame" on the
Edit menu, or try Ctrl-F).
Make sure your cursor is in the text frame before you search and remember that
this search will probably not be able to match special characters
such as accented letters and curved apostrophes.
I have also to point out now (November 2003) that two of the texts follow their
source New York edition's quirk of inserting a space into
many contractions, such as "would n't". For indexing in
the concordance these spaces are stripped out so that, for
example, the text's "would n't" will be counted under
the concordance heading "wouldn't".
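The stripping of these spaces might be done along the following lines. This is a sketch under stated assumptions, not the rule actually used for the concordance: it handles only the "n't" pattern (with a straight or a curved apostrophe), and the names `SPACED_NT` and `close_contractions` are invented for the example.

```python
import re

# A word character, one or more spaces, then "n't" with either a
# straight (') or a curved (U+2019) apostrophe. Assumption: only the
# "n't" family of contractions is covered here.
SPACED_NT = re.compile(r"(\w)[ ]+(n['\u2019]t)\b")


def close_contractions(text):
    """Remove the space before "n't" so that "would n't" becomes "wouldn't"."""
    return SPACED_NT.sub(r"\1\2", text)


if __name__ == "__main__":
    print(close_contractions("She would n't say why he did n't come."))
```

Applying a step like this before counting is what allows both spellings to fall under a single heading.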
To get back to the concordance after you have checked out an occurrence, click
the back button on your browser: some browsers require you to do so
twice (once for the text and once for the menu) but sensible
ones, like Opera, realise that you loaded three frames with one click and
back them all with one click. Use the concordance option
on the texts menu to re-load the concordance menu and this introduction.
As an alternative to checking the contexts with the links here, you
could, having identified the text(s) you are interested in,
download the relevant ASCII file(s) and work with
it/them (one file per title). Another benefit of that
method will be that you can search for the special characters by using
their ASCII character representations explained on my
editorial page
covering downloadable texts.
It will also enable you to compile a concordance to a single text or a small set
of texts, and to do more complex text analysis, such as proximity searching. A
suitable program
for these tasks has been written by a colleague of mine here at the University
of Birmingham, Alan Reed, who also has an excellent set of
links about text processing
on his site. Other software is available: try a web search for
"concordance software".
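For readers unfamiliar with the term, proximity searching means finding places where two words occur within a few words of each other. The Python sketch below shows the idea on a downloaded plain-text file's contents; it is not taken from the program mentioned above, and the function name `proximity_hits` and its `max_gap` parameter are assumptions for illustration.

```python
import re


def proximity_hits(text, first, second, max_gap=5):
    """Return (i, j) word positions where `first` and `second` occur
    within `max_gap` words of each other, in either order."""
    words = re.findall(r"[a-z']+", text.lower())
    second_positions = [j for j, w in enumerate(words) if w == second]
    hits = []
    for i, w in enumerate(words):
        if w != first:
            continue
        for j in second_positions:
            if i != j and abs(i - j) <= max_gap:
                hits.append((i, j))
    return hits


if __name__ == "__main__":
    sample = "The old lady looked at the young lady with pity"
    print(proximity_hits(sample, "lady", "pity", max_gap=2))
```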
Within my concordance, there are separate pages for numbers (in figures)
appearing in the text
and also for James's fictional names. If you are interested in the latter
particularly, don't forget that you can find further information
in published encyclopedias of James's fiction or characters, such
as A Henry James encyclopedia / by Robert L. Gale. - Westport,
Conn.; London : Greenwood, 1989. - ISBN 0-313-25846-5;
or, Who's who in Henry James / Glenda Leeming. - London : Elm
Tree Books, 1976. - ISBN 0-241-89425-5.
Please note that real proper names, for example
Byron or Paris, and fictional names not
invented by James, for example Hamlet, are included in the main
sequence of the concordance.
Note also that for the time being, words in the stage-directions and
character directs in the play-texts are included in the concordance.
Another problem you may encounter is that foreign words are not identified
separately. I have projects in hand to implement improvements in both these
areas once all the texts have been upgraded to a common standard which is
amenable to automatic processing.
If you want to look up a lot of words and are not worried about being able
to link to the texts from them, you can download a plain-text file,
either with or without the text-codes, from the
separate page
of links, then browse that on your own PC (or even, whisper it not, print
it out on reams of paper!)
The pages which make up this concordance are generated by another of my natty,
home-grown, Perl scripts and are updated each time an addition is made to the
set of texts available here. I'd like to thank Casey Abell and Richard
Hathaway for their valuable comments on draft versions of these pages and Alan
Reed (software consultant for the
Concordance of Medieval Occitan)
for useful discussions about online concordances.
Anyone requiring details of the processing to support academic use of this
concordance is welcome to email me
with specific questions or for the algorithm used.
November 2003
this introduction
© 2003