This writeup details BAsCET's application on bibliographic references recognition. See BAsCET for an overview.

This agent looks for the separators lying at the borders of the field corresponding to the Father node given as a parameter in order to build an instance of it.

It begins by looking in the Blackboard for an instance of the field containing the Father node. If there is none, it reactivates the separator seekers surrounding this containing field. For example, if the agent looks for the a field, contained in the author field, and if the author field has no instance in the Blackboard, it reactivates the separator seekers preceding and following the author field.

Next, it looks for every separator that is a possible candidate for delimiting the field. This includes the separators of the containing field. For example, for the a field, it looks for the a-a separators, but also for the whole -author and author- families, since author is the field containing a (see Table 2 of Building the logical part of a Concept Network representing bibliographic references).
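The candidate search described above can be sketched as follows. This is only an illustration: the Separator class, the CONTAINING_FIELD table and the function name are hypothetical and do not come from BAsCET itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Separator:
    left: str   # field on the left side ("..." stands for any field)
    right: str  # field on the right side


# Hypothetical excerpt of the hierarchy: sub-field -> containing field.
CONTAINING_FIELD = {"a": "author"}


def candidate_separators(blackboard, field):
    """Return the separators that touch `field` or its containing field."""
    wanted = {field}
    parent = CONTAINING_FIELD.get(field)
    if parent is not None:
        wanted.add(parent)
    return [s for s in blackboard if s.left in wanted or s.right in wanted]


blackboard = [
    Separator("a", "a"),         # a-a
    Separator("...", "author"),  # ...-author
    Separator("author", "..."),  # author-...
    Separator("title", "year"),  # unrelated to a or author
]
candidates = candidate_separators(blackboard, "a")
```

For the a field, the first three separators are kept and the title-year separator is ignored.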

When there are not enough separators, the Father node is deactivated by 4/5, which slows down its action while waiting for other separators to be found.

Afterwards, it looks for the separator with the best happiness (the one that has the best recognition score and that is, statistically, best located; see Separator seeker), and also retains the best complementary separators found.

Figure 1: Choice of separators

                     |    +-+     +-+   +-+     +-+| field:doc |
                     | ...|(|ICDAR|)|...|(|75-80|)|+-----------+
                     |    +-+     +-+   +-+     +-+|
                           v       v     v       v
                       +--+-+  +--+-+  +--+-+  +--+-+
                       |70|(|  |75|)|  |80|(|  |80|)|
                       +--+-+  +--+-+  +--+-+  +--+-+

For example, in Figure 1, there are two left separators (sep:...-volume:() and two right separators (volume-...:)). The happiness of the right ones is higher than that of the left ones, because the volume is generally located after the conference name (booktitle). Thus, the agent first chooses the rightmost left separator (with a happiness of 80), and then the best of the right separators to its right (there is only one). It shapes the volume field with a happiness that is the average of both separators' happiness: (80 + 75) / 2 = 77.5, truncated to 77. (Note: the real field is pages, but this can be discovered later by the ad hoc field seeker.)
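A minimal sketch of this choice, under stated assumptions: the Sep class, the character offsets and the happiness of the first right separator are hypothetical, and the values 80 and 75 are taken from the arithmetic in the text. The rule implemented is: take the happiest separator, pair it with the happiest complementary separator on the correct side, and average the two.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sep:
    side: str         # "left" opens the field, "right" closes it
    pos: int          # character offset in the reference (illustrative)
    happiness: float


def choose_separators(seps):
    """Return (left, right, field_happiness), or None if no pair exists."""
    if not seps:
        return None
    best = max(seps, key=lambda s: s.happiness)
    if best.side == "left":
        # Complementary separators must close the field to the right.
        partners = [s for s in seps if s.side == "right" and s.pos > best.pos]
    else:
        # Complementary separators must open the field to the left.
        partners = [s for s in seps if s.side == "left" and s.pos < best.pos]
    if not partners:
        return None
    partner = max(partners, key=lambda s: s.happiness)
    left, right = (best, partner) if best.side == "left" else (partner, best)
    return left, right, (left.happiness + right.happiness) / 2


seps = [
    Sep("left", 10, 70),
    Sep("right", 16, 65),  # hypothetical value
    Sep("left", 30, 80),
    Sep("right", 36, 75),
]
left, right, field_happiness = choose_separators(seps)
```

With these values, the pair at offsets 30 and 36 is chosen and the field happiness is 77.5 before any rounding.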

Figure 2: Initial status before the discovery of the author field, and the hierarchical descent of the a sub-fields.

               |                                            |
                  v       v
               +-------+  +-------+
               |field:a|  |field:a|
               +-------+  +-------+

Figure 2 shows a case in which, if the field seeker discovered the author field around its two a sub-fields, it would be profitable to "descend" these descriptions by one level in order to use this knowledge.

Figure 3: Status of the Blackboard after the hierarchical descent of the sub-fields.

             |                                            |
                v               v               v
    +--------------+  +--------------------+  +--------------+
    |sep:...-author|  |    field:author    |  |sep:author-...|
    +--------------+  +----|----------|----+  +--------------+
                           v          v
                       +-------+  +-------+
                       |field:a|  |field:a|
                       +-------+  +-------+

Figure 3 shows the Blackboard's status after this "descent". Instead of deleting the objects that conflict with the new object (field:author), the agent sees that they are sub-fields of that object and simply moves them. It thus keeps the knowledge already extracted (and compatible with its hypothesis) and takes advantage of it, adding a happiness proportional to their length to the already computed happiness of the field.

More generally, when objects conflict (overlap, inclusion) with the object that the agent is about to build, the object to build enters a score fight with each of them (cf. Building Hierarchical Structures in the Blackboard). If it wins all these fights, it deletes all the objects that are not hierarchically inferior to the Father node in the Concept Network. Then, it "descends" all the descriptions of the remaining Blackboard objects in the hierarchy, taking care to modify their relative positions if necessary. Those descriptions can be sub-fields, but also separators appearing only in this field, like the a-a separators, which separate two authors inside the author field.
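The conflict handling and the descent can be sketched as below. Several points are assumptions made for the sake of the example and are not taken from BAsCET: the fight rule (the new object must be strictly happier than every rival; the real rule is described in Building Hierarchical Structures in the Blackboard), the bonus_per_char factor used for the length-proportional bonus, the reading of "relative positions" as offsets from the start of the new field, and all class and function names.

```python
from dataclasses import dataclass, field, replace
from typing import List


@dataclass
class BBObject:
    name: str
    start: int        # inclusive offset
    end: int          # exclusive offset
    happiness: float
    children: List["BBObject"] = field(default_factory=list)


def conflicts(a, b):
    """Overlap or inclusion of two half-open spans."""
    return a.start < b.end and b.start < a.end


def try_build(blackboard, new, inferior, bonus_per_char=0.5):
    """Insert `new` if it wins every fight; descend its sub-objects."""
    rivals = [o for o in blackboard if conflicts(o, new)]
    # Assumed fight rule: strictly higher happiness wins.
    if any(o.happiness >= new.happiness for o in rivals):
        return False
    for o in rivals:
        inside = new.start <= o.start and o.end <= new.end
        if o.name in inferior and inside:
            # Descend: keep the object as a child, repositioned relative
            # to the new field, and reward the field for its length.
            child = replace(o, start=o.start - new.start, end=o.end - new.start)
            new.children.append(child)
            new.happiness += bonus_per_char * (o.end - o.start)
    # Every rival leaves the top level: deleted, or now held as a child.
    blackboard[:] = [o for o in blackboard if not conflicts(o, new)] + [new]
    return True


blackboard = [
    BBObject("sep:...-author", 0, 2, 90),
    BBObject("field:a", 5, 10, 60),
    BBObject("field:a", 12, 18, 65),
    BBObject("field:title", 8, 22, 50),
]
author = BBObject("field:author", 2, 20, 70)
built = try_build(blackboard, author, inferior={"field:a", "sep:a-a"})
```

Here the adjacent separator does not conflict and stays in place, the overlapping field:title is deleted, and the two field:a objects become children of field:author.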

Finally, the agent inhibits itself for three cycles and reactivates the agents of the separators contained in the discovered field, so that these separators can be discovered more quickly. If the agent did not create an object, it inhibits itself and deactivates its Father node, to avoid being re-run in the same cycle (it waits one cycle for the knowledge in the Blackboard to evolve).
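The inhibition rule can be sketched as a small scheduling helper. The class and its cycle counting are hypothetical: the sketch assumes that "three cycles" means the three cycles following a successful run, and that a failed run only blocks the rest of the current cycle.

```python
class FieldSeekerSchedule:
    """Tracks when a field seeker is allowed to run again (illustrative)."""

    SUCCESS_INHIBITION = 3  # cycles skipped after an object was created

    def __init__(self):
        self.next_allowed_cycle = 0

    def can_run(self, cycle):
        return cycle >= self.next_allowed_cycle

    def record_run(self, cycle, created_object):
        if created_object:
            # Skip the next three cycles.
            self.next_allowed_cycle = cycle + 1 + self.SUCCESS_INHIBITION
        else:
            # No re-run in the same cycle; try again in the next one.
            self.next_allowed_cycle = cycle + 1


schedule = FieldSeekerSchedule()
schedule.record_run(cycle=5, created_object=True)
runnable = [c for c in range(5, 11) if schedule.can_run(c)]
```

After a successful run at cycle 5, the agent stays inhibited during cycles 6 to 8 and may run again from cycle 9.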

See also: Building Hierarchical Structures in the Blackboard, Building the logical part of a Concept Network representing bibliographic references, Building a Concept Network to represent bibliographic references, BAsCET's application on bibliographic references recognition, BAsCET.
