In the relational model you can ask questions like “how many screws were ordered in the last month?” and get the correct answer, even though the base relations (tables) do *not* contain any duplicate tuples (rows), i.e., each tuple is uniquely identified by a set of its attributes. The attributes involved here (say, order_number, order_date, product_number, quantity_ordered, product_type) are most likely spread over several base relations. The answer can be obtained with the relational operator SUMMARIZE, which is more or less equivalent to SQL’s GROUP BY. In case you’re interested, SUMMARIZE is derivable from other, more primitive relational operators, as discussed in Appendix A of *Databases, Types, and the Relational Model*.
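To make that concrete, here is a minimal Python sketch of a SUMMARIZE-style aggregation over a relation held as a plain set of tuples. The `OrderLine` schema, the sample data, and the helper `summarize_quantity` are all hypothetical, made up for illustration; this is not how any particular DBMS implements it.

```python
from collections import defaultdict
from datetime import date
from typing import NamedTuple


class OrderLine(NamedTuple):
    # Hypothetical schema; (order_number, product_number) acts as the key.
    order_number: int
    order_date: date
    product_number: str
    product_type: str
    quantity_ordered: int


# A relation is a set of tuples, so it cannot contain duplicates.
order_lines = {
    OrderLine(1, date(2024, 5, 2), "P-10", "screw", 100),
    OrderLine(1, date(2024, 5, 2), "P-20", "bolt", 40),
    OrderLine(2, date(2024, 5, 9), "P-10", "screw", 100),
    OrderLine(3, date(2024, 5, 20), "P-11", "screw", 250),
    OrderLine(4, date(2024, 3, 1), "P-10", "screw", 999),  # outside the window
}


def summarize_quantity(relation, since):
    """Total quantity_ordered per product_type for orders on or after `since`."""
    totals = defaultdict(int)
    for row in relation:
        if row.order_date >= since:
            totals[row.product_type] += row.quantity_ordered
    # The result is again a set of distinct (product_type, total) tuples.
    return {(ptype, total) for ptype, total in totals.items()}


result = summarize_quantity(order_lines, since=date(2024, 5, 1))
```

Orders 1 and 2 both contain 100 screws of product P-10, yet they stay distinct tuples because their order numbers differ, so both are counted and no multiset is needed.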

Group-by and multisets are relevant because they let you ask questions like, ‘in the last month, how many screws were ordered?’, where Screw is a Category.

My point about there being a bijection between multisets and sets referred to the bijection between the set multiset[T] of multisets over T and the set set[T x N > 0] of sets of (element, count) pairs, where each element of T appears in at most one pair. It is probably clearer in the following example:

If T is the set of the first three letters of the alphabet, {A, B, C}, then each line below shows a pair of equivalent elements, first in multiset[T] and then in set[T x N > 0]. The bijection is the relation that pairs them.

{A, B} ↔ {(A, 1), (B, 1)}

{A, A, A, B} ↔ {(A, 3), (B, 1)}

{A, A, A, A, B, B, B, C} ↔ {(A, 4), (B, 3), (C, 1)}
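The same correspondence can be sketched in Python, using `collections.Counter` as a stand-in for a multiset. The helper names `to_pairs` and `from_pairs` are my own, and the inverse assumes each element appears in at most one pair.

```python
from collections import Counter


def to_pairs(multiset):
    """Map a multiset (Counter) to a set of (element, count) pairs with count > 0."""
    return frozenset((elem, n) for elem, n in multiset.items() if n > 0)


def from_pairs(pairs):
    """Inverse mapping; assumes each element appears in at most one pair."""
    return Counter({elem: n for elem, n in pairs})


examples = ["AB", "AAAB", "AAAABBBC"]
for letters in examples:
    m = Counter(letters)  # the multiset of letters
    pairs = to_pairs(m)   # an ordinary set of (letter, count) pairs
    assert from_pairs(pairs) == m  # the round trip recovers the multiset
```

Going from multiset to pairs and back loses nothing, which is all the bijection claims.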

Does that make more sense?

The relational model and all the theory stemming from it are founded on “Relations are sets of tuples, which in turn are sets of attributes”. The relational model is *not* founded on multisets. I don’t know how else to explain it, but if you need more, maybe the free book *An Introduction to Relational Database Theory* by Hugh Darwen can help.

As an aside, how can you talk about a bijection, which is a one-to-one correspondence between elements of two *sets*, when talking about multisets?

Kat Prevost seems to have a better grasp of things.
