Abstract:
Document engineering is emerging as a new discipline for specifying, designing,
and implementing the electronic documents that request or provide interfaces to
business processes via Web-based services. At its most basic level, document engineering
applies analysis and design methods that yield formal models describing the information
that business processes or services require. Because enormous numbers of
documents enter a business enterprise every day, there is a demand
to understand the relationships, or links, between information in those documents, which
are often distributed, in order to support better document management
systems and information retrieval processes. Existing technologies for linking
documents are limited in what they can process and cannot cope with large
data volumes.
This research proposes an alternative model, DEFCA, that automatically generates
information links among relevant documents. The DEFCA input structure is defined
in XML, which makes the model more open and able to work across
various domains. DEFCA applies Formal Concept Analysis (FCA), a
data analysis technique, to analyze and extract relationships from a set of
documents. These relationships are then used as rules for creating document links.
Because FCA is applied, rules and links are generated automatically, without requiring an
expert to predefine a set of document relationships. Therefore, DEFCA is suitable for
any document management system, including one with a large data volume and
frequent updates of document sets.
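
To illustrate the general idea of using FCA to relate documents, the following is a minimal sketch and not the DEFCA implementation. It assumes a hypothetical formal context in which each document is described by a set of attributes (for example, terms taken from XML elements), enumerates the formal concepts by brute force, and then, as a further assumption, links any two documents that appear together in a concept with a non-empty set of shared attributes. The document names, attributes, and function names are all invented for illustration.

```python
from itertools import combinations

# Hypothetical formal context: document id -> attribute set.
# These documents and attributes are illustrative assumptions only.
context = {
    "doc1": {"invoice", "customer"},
    "doc2": {"invoice", "order"},
    "doc3": {"order", "shipping"},
    "doc4": {"contract"},
}


def extent(attrs, ctx):
    """Return the documents that have every attribute in attrs."""
    return frozenset(d for d, a in ctx.items() if attrs <= a)


def intent(docs, ctx):
    """Return the attributes shared by every document in docs."""
    if not docs:
        # By convention, the empty set of documents shares all attributes.
        return frozenset(set().union(*ctx.values()))
    return frozenset(set.intersection(*(set(ctx[d]) for d in docs)))


def formal_concepts(ctx):
    """Enumerate all (extent, intent) pairs by closing each attribute subset.

    This brute-force approach is exponential in the number of attributes
    and is suitable only for tiny examples.
    """
    all_attrs = sorted(set().union(*ctx.values()))
    concepts = set()
    for size in range(len(all_attrs) + 1):
        for subset in combinations(all_attrs, size):
            docs = extent(set(subset), ctx)
            concepts.add((docs, intent(docs, ctx)))
    return concepts


def links_from_concepts(concepts):
    """Assumed linking rule: link document pairs that share a non-empty intent."""
    links = set()
    for docs, attrs in concepts:
        if attrs:
            links.update(combinations(sorted(docs), 2))
    return links


concepts = formal_concepts(context)
links = links_from_concepts(concepts)
print(sorted(links))
```

In this sketch no relationship is predefined by hand: the pairs that end up linked follow entirely from which attributes the documents share. A practical system would replace the brute-force enumeration with a dedicated concept-lattice algorithm.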
We have also implemented a document management prototype using DEFCA.
We conducted experiments aimed at verifying the correctness of the
rules and links generated by DEFCA. When sample document sets were
passed through the DEFCA prototype, it produced a linkbase that
enabled us to retrieve information links among relevant documents easily and
effectively.