Text Databases
A text database is a collection of related documents assembled into a single searchable
unit. The individual documents can be massive or minuscule, but they should bear some
relation to each other.
A database is composed of smaller units called records. In a text database, a record can
be an entire document, a section within a document, a single page or a fragment of text
within a page. A search retrieves the records that contain information satisfying
the query.
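As a rough illustration of how record size affects what a search returns, the sketch below stores the same document first as one record and then as one record per paragraph. The sample text and the helper names split_into_records and retrieve are hypothetical, and the substring match stands in for whatever matching a real retrieval system uses.

```python
# Hypothetical sample document, used only for illustration.
DOCUMENT = """Quarterly Report

Sales rose in the north region.

Costs fell after the supplier change."""


def split_into_records(document):
    """Treat each blank-line-separated block of text as one record."""
    blocks = [block.strip() for block in document.split("\n\n")]
    return [block for block in blocks if block]


def retrieve(records, query):
    """Return every record that contains the query (case-insensitive)."""
    needle = query.lower()
    return [record for record in records if needle in record.lower()]


# Granularity 1: the entire document is a single record.
whole = [DOCUMENT]
# Granularity 2: each paragraph is its own record.
paragraphs = split_into_records(DOCUMENT)

print(len(retrieve(whole, "supplier")))   # the whole document comes back
print(retrieve(paragraphs, "supplier"))   # only the matching paragraph comes back
```

With smaller records, the same query returns a narrower, more specific result.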
A record can contain smaller regions of data called fields. A field usually defines a
particular type of data common to several or all records within a database. For instance,
in a database of corporate memos, wherein each memo makes up a record, the
following fields might be used: TO, FROM, DATE, SUBJECT and TEXT. The scope of
a search can be narrowed by restricting it to one or more fields. For example, to find
memos from a particular sender, limit the search to the FROM field. Only those records
with the specified name in that field are retrieved.
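The memo example can be sketched in a few lines of Python. This is a minimal illustration, not how any particular retrieval product works: the sample memos, the names in them, and the search function are all made up, and each record is modeled as a dictionary keyed by field name.

```python
# Hypothetical memo records; the field names follow the memo example in the text.
MEMOS = [
    {"TO": "Dana Reyes", "FROM": "Lee Chen", "DATE": "2004-03-01",
     "SUBJECT": "Budget review",
     "TEXT": "Please send the figures to Morgan Hill."},
    {"TO": "Lee Chen", "FROM": "Morgan Hill", "DATE": "2004-03-02",
     "SUBJECT": "Re: Budget review",
     "TEXT": "Figures attached."},
]


def search(records, term, fields=None):
    """Return records where `term` occurs (case-insensitive) in any of `fields`.

    If `fields` is None, every field of each record is searched.
    """
    needle = term.lower()
    hits = []
    for record in records:
        names = fields if fields is not None else record.keys()
        if any(needle in record.get(name, "").lower() for name in names):
            hits.append(record)
    return hits


# An unrestricted search also matches the memo that only mentions the name.
everywhere = search(MEMOS, "Morgan Hill")
# Restricting the search to FROM keeps only the memo actually sent by that person.
from_only = search(MEMOS, "Morgan Hill", fields=["FROM"])

print(len(everywhere))
print(len(from_only))
```

The unrestricted search returns both memos, while the FROM-only search returns just the one whose sender matches.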
Stopwords
Full-text retrieval software indexes every word in a document, with the exception
of stopwords. Stopwords are terms that the software is programmed to ignore during
the indexing and retrieval processes, in order to prevent the retrieval of extraneous
records. Generally, a stopword list includes the articles, pronouns, adjectives, adverbs
and prepositions (the, they, very, not, of, etc.) that are most common in the English
language.
Example:
PsycCrawler
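To make the role of stopwords concrete, here is a small sketch of an index that skips them both when indexing and when processing a query. The stopword list is a short illustrative assumption (real lists are longer), the sample records are invented, and the function names are hypothetical. It is not taken from any specific retrieval package.

```python
import re
from collections import defaultdict

# Illustrative stopword list only; it includes the examples given in the text.
STOPWORDS = {"the", "they", "very", "not", "of", "a", "an", "and", "to", "is"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(records):
    """Map each non-stopword term to the set of record ids containing it."""
    index = defaultdict(set)
    for record_id, text in records.items():
        for word in tokenize(text):
            if word not in STOPWORDS:
                index[word].add(record_id)
    return index


def lookup(index, query):
    """Return ids of records containing every non-stopword term in the query."""
    terms = [word for word in tokenize(query) if word not in STOPWORDS]
    if not terms:
        return set()  # the query consisted only of stopwords
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical records keyed by record id.
RECORDS = {
    1: "The history of the stopword list",
    2: "They did not index the memo",
    3: "A very short history of indexing",
}

index = build_index(RECORDS)
print(sorted(index))                     # indexed terms, stopwords excluded
print(lookup(index, "the history of"))   # only "history" is actually searched
print(lookup(index, "the of"))           # nothing left to search for
```

Because "the" and "of" never enter the index, a query for them cannot pull back every record in the database, which is the extraneous retrieval that stopword lists are meant to prevent.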