The current form for posting a message looks something like:
<FORM METHOD="POST"
  ENCTYPE="application/x-www-form-urlencoded"
  NAME="formcbox">
(stuff)
<a name='chatbox'></a>
<small><input type="hidden" name="op" value="message"><br /></small>
<INPUT TYPE="text" NAME="message"  SIZE=15 MAXLENGTH=255>
<INPUT TYPE="submit" NAME="message_send" VALUE="talk">
</FORM>
(Please note: double quotes should be used for <a name='chatbox'></a>.)

Insert a random (or sequential) value in a hidden form field. How the value is chosen is not overly important, as long as it is regenerated on each page load. Thus, the form becomes:

<FORM METHOD="POST"
  ENCTYPE="application/x-www-form-urlencoded"
  NAME="formcbox">
(stuff)
<a name='chatbox'></a>
<small><input type="hidden" name="op" value="message"><br /></small>
<INPUT TYPE="hidden" NAME="magic" value="42">
<INPUT TYPE="text" NAME="message"  SIZE=15 MAXLENGTH=255>
<INPUT TYPE="submit" NAME="message_send" VALUE="talk">
</FORM>

The magic number would be inserted into the message table along with the message, and it must be unique within the table. If someone reloads the page, the same magic number is submitted again, and the second submission is rejected.
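
A minimal sketch of that check, assuming an SQLite table standing in for the real message table. The column names, `new_magic` and `post_message` are made up for illustration; the actual code would live in Perl against the site's own database.

```python
import secrets
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE chatbox_tab ("
    " author INTEGER NOT NULL,"
    " message TEXT NOT NULL,"
    " magic TEXT NOT NULL UNIQUE)"
)

def new_magic():
    """Value to embed in the hidden 'magic' form field on each page load."""
    return secrets.token_hex(8)

def post_message(author, message, magic):
    """Insert the message; return False if this magic value was already used."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO chatbox_tab (author, message, magic) VALUES (?, ?, ?)",
                (author, message, magic),
            )
        return True
    except sqlite3.IntegrityError:
        # Same magic value submitted again (for example, a page reload): reject.
        return False
```

A reload resubmits the same `magic` and is refused, while a freshly loaded form carries a new value and goes through.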

This does pose a couple of problems:

  • Other clients would need to support submitting this magic number.
  • People can't hit the 'back' button and send a different message (say, to correct a typo), since the form they go back to carries the same magic number.

A better solution is to put a constraint on the message table itself, so that the person/message combination must be unique.

Assume:

Name         Null?    Type
------------ -------- ----------------------------
USER         NOT NULL number
MESSAGE      NOT NULL varchar(255)
TIMESTAMP    NOT NULL datetime

In this case, a unique index (it's a small enough table, and it gets cleaned regularly) is placed on the user/message columns:

CREATE UNIQUE INDEX no_chatbox_dups ON chatbox_tab
    ( USER, MESSAGE );

Thus, any insertion of the same user/message information will come back with the error "cannot insert duplicate value into table", which would then need to be handled properly by Perl (in this case, by failing silently).

Note: this takes advantage of the regular cleansing of the chatterbox by other processes that roll messages out of the table.
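
Here is a rough sketch of the unique-index approach, using Python and an in-memory SQLite database as stand-ins for Perl and the real database. The functions `say` and `roll_out` are hypothetical names, and `roll_out` only imitates the regular cleanup done by other processes.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE chatbox_tab (
        "user"      INTEGER NOT NULL,
        message     VARCHAR(255) NOT NULL,
        "timestamp" INTEGER NOT NULL
    );
    CREATE UNIQUE INDEX no_chatbox_dups ON chatbox_tab ("user", message);
    """
)

def say(user, message, now=None):
    """Insert a message; a duplicate user/message pair fails silently."""
    now = int(time.time()) if now is None else now
    try:
        with conn:
            conn.execute(
                'INSERT INTO chatbox_tab ("user", message, "timestamp") '
                "VALUES (?, ?, ?)",
                (user, message, now),
            )
        return True
    except sqlite3.IntegrityError:
        # The unique index rejected the row: treat it as a repeat and drop it.
        return False

def roll_out(older_than, now=None):
    """Stand-in for the regular cleanup that removes old messages."""
    now = int(time.time()) if now is None else now
    with conn:
        conn.execute(
            'DELETE FROM chatbox_tab WHERE "timestamp" < ?', (now - older_than,)
        )
```

Once the old row has been rolled out of the table, the same user can say the same thing again, which is the behaviour the note above relies on.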


Session cookies and a hash work in almost every case, since the vast majority of people chatting on edev use browsers. However, there are a few distinct cases where this would not work.

With the current wave of E2 clients (and the ones that already exist, such as the Java chatterbox), not every client that can send a cookie can properly receive one. Thus, a "hash of last message" id sent to the Java chatterbox (or a Perl command-line chatterbox poster, or the #everything chatterbot relay) will not be picked up. The next message would send out the same cookie as before, without any such hash.

Requiring this part of the cookie would meet with some resistance for much the same reason the "magic" value did: far too much work on the client side to properly pick up and interpret the information.

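Furthermore, there is no 'time-limit' on this session cookie. Consider a person trying to say something when the chatterbox is dead silent. The second attempt is perfectly legitimate, but it would be rejected because its hash matches the one still held in the cookie: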

<m_turner> Hello? Anyone out there?
(5 min pass, the chatterbox is clear again)
<m_turner> Hello? Anyone out there?

In my (probably not so humble) opinion, the best solution is the unique constraint on the chatterbox table itself. It is handled entirely on the database side, without any requirements on how the information got there.

m_turner's idea is a good start, but the magic-number handling can all be done server-side instead of client-side. Enter the hash function. Basically, a hash function is a function of the form f(x) whose output has a constant length, no matter what x is. You pass a value to the hash function, and it returns another value, the hash digest, which depends on what you passed in. The digest is always the same length, and is usually shorter than the input. So it is possible for two different messages to have the same hash digest, but it is very hard to construct a message that matches a given digest.
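
To make the fixed-length property concrete, here is a small sketch using Python's `hashlib`. SHA-256 is just an assumption for the example, since no particular hash function has been picked here, and `digest` is a made-up helper name.

```python
import hashlib

def digest(message):
    """Hex digest of a chat message (SHA-256 chosen only for illustration)."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest()

short = digest("hi")
long_ = digest("x" * 255)  # a message at the form's MAXLENGTH

# Both digests have the same length, whatever the input length was.
print(len(short), len(long_))
```

The same message always gives the same digest, and a one-character change gives a different one, which is what the duplicate check depends on.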

So what? Well, the hash digest of each message is stored with the message in the database (or in a session cookie?). If two messages have the same hash digest and are sent within an hour of each other, it is almost certain that they are identical messages. A message deemed identical to an earlier one is rejected. To summarize:

  1. Bob sends a message
  2. The hash function gives us the hash digest of Bob's message
  3. The hash digest of the message is stored with the message
  4. Bob sends another message with the same hash digest, within an hour of the previous message
  5. Bob's second message is rejected

  1. Joe sends a message
  2. The hash function gives us the hash digest of Joe's message
  3. The hash digest of the message is stored with the message
  4. Joe hits the Back button and sends another, different message
  5. The hash function gives us the hash digest of Joe's second message
  6. The hash digest of the second message is stored with the message
  7. The two hash digests do not match
  8. Joe's second message is allowed
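
The Bob and Joe walkthroughs could be sketched roughly like this. The dictionary `seen` is a hypothetical in-memory stand-in for a digest column in the message table, keying on the sender is my assumption, and SHA-256 is again only an example choice.

```python
import hashlib

WINDOW_SECONDS = 3600  # "within an hour", per the steps above

# Hypothetical stand-in for digests stored alongside messages in the database.
# Maps (user, digest) -> time the message was last accepted.
seen = {}

def accept(user, message, now):
    """Return True if the message should be posted, False if it is a repeat."""
    key = (user, hashlib.sha256(message.encode("utf-8")).hexdigest())
    last = seen.get(key)
    if last is not None and now - last < WINDOW_SECONDS:
        return False  # same digest from the same user within the window
    seen[key] = now
    return True
```

Bob's reload is refused, Joe's corrected message has a different digest and is accepted, and Bob can repeat himself once the hour is up.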
Or, like I said, we could use session cookies:
  1. Bob sends a message
  2. The hash function gives us the hash digest of Bob's message
  3. The hash digest of the message is stored in a cookie which will expire at the end of the session
  4. Bob sends another message before the session expires
  5. The hash digest of the second message is the same as the one in the session cookie
  6. Bob's message is rejected
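
A sketch of the cookie variant, with a plain dictionary playing the part of the browser's cookie jar. The cookie name and `handle_post` are hypothetical, and a real server would read and set the cookie through its web framework.

```python
import hashlib

COOKIE_NAME = "last_message_digest"  # hypothetical cookie name

def handle_post(cookies, message):
    """Handle one request; return (accepted, cookies_to_set)."""
    d = hashlib.sha256(message.encode("utf-8")).hexdigest()
    if cookies.get(COOKIE_NAME) == d:
        # Digest matches the one the client sent back: treat as a repeat.
        return False, {}
    # New message: accept it and ask the client to remember its digest.
    return True, {COOKIE_NAME: d}

# Usage: the jar is whatever cookies the client sends with each request.
jar = {}
ok, to_set = handle_post(jar, "hello")
jar.update(to_set)
```

Nothing is stored server-side here, so a client that never sends the cookie back is simply never rejected.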
Personally, I like session cookies, since they won't take up space in the database.
m_turner points out that not all clients that can send cookies can receive them (for example, the Java chatterbox). However, most clients that cannot receive cookies (like the Java catbox) cannot refresh or reload or anything like that, so they are unlikely to resend a message by accident. m_turner also points out that a session cookie doesn't necessarily have a time limit. In that case, we can set the cookie to expire, say, half an hour after it is saved. m_turner's final idea is to put a constraint in the database itself that doesn't allow the user and message to be the same for two or more messages. This works fine, but we should see which is faster: computing and then checking hash digests, or just checking the message body itself.
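
For the half-hour expiry, something like the following would do, sketched with Python's standard `http.cookies` module. The cookie name and `digest_cookie_header` are made up for the example, and SHA-256 is still only an assumed choice of hash.

```python
import hashlib
from http.cookies import SimpleCookie

def digest_cookie_header(message, max_age_seconds=1800):
    """Build a Set-Cookie header carrying the message digest, expiring in 30 min."""
    cookie = SimpleCookie()
    cookie["last_message_digest"] = hashlib.sha256(
        message.encode("utf-8")
    ).hexdigest()
    # Max-Age tells the client to drop the cookie after this many seconds.
    cookie["last_message_digest"]["max-age"] = max_age_seconds
    return cookie.output()

print(digest_cookie_header("Hello? Anyone out there?"))
```

With the expiry in place, the "Hello? Anyone out there?" case works again after half an hour, since the client no longer sends the old digest.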
