If you’re going to work with databases, you probably ought to know something about data. In particular, we don’t put data directly into a database; we have to encode it and represent it in a format which a machine can handle. Encoding data existed long before we had computers. Even today, coding medical data under the various classification systems is a separate job within medical records, and a profession in and of itself.
Yet, even though this is a fundamental concept in our trade, almost nobody gets a class on the various types of scales, measurements, encoding schemes, and so on; nobody learns how to design their data. Many times, people are lucky to get even a good introduction to some basic measurements and scales.
The inability of Americans to handle “the metric system” (more properly called SI, for “Système international d’unités”) is a running joke. Comedian Wanda Sykes has a whole routine on this, and it was a throwaway gag on the television series “The Big Bang Theory,” among others.
Types Of Scales
When you are setting up identifiers in your database systems (and really in any system!), one thing you need to understand is the different types of scales that are available.
In this section I will cover the different types of scales that are in use (and there are more than you might expect!).
Nominal Scales
A nominal scale simply names something. This name can be made of characters, digits, or even symbols of some sort. If you are going to get this data into a computer, there are a few things you would like to do with these names. The first thing is that the Unicode standards have a set of Latin letters, digits, and a few basic punctuation marks that are supposed to be available to use in any of the other character sets defined in the standard. The reason for this common subset of characters is that they can be used for the names of all of the SI units. The digits and punctuation marks allow you to express measurements in those units. It doesn’t matter if you’re working in Chinese, English, German, Aramaic, Greek, or whatever; you can still express data in international standard units.
I would hope it’s obvious that a nominal scale has to use characters and not numbers for its values. A number by its nature has to represent a quantity or magnitude so that it makes sense to compare them or do arithmetic with them. But it makes no sense to compare “John” the name to “Marsha” the name.
If you’re going to use digits for a name, then it’s really handy if you can keep that encoding to a fixed length. The fixed length will let you design forms and display screens with it. A fixed length also gives you a validation check.
For example, you know that five digits might be a valid ZIP Code; you know what a valid US phone number looks like from the count of the digits and the placement of punctuation marks; and so forth. This is why we like regular expressions. The first versions of SQL had a very simple “<string expression> LIKE <pattern>” predicate, whose patterns offer only the % and _ wildcards rather than full regular expressions. Later, we added the “<string expression> SIMILAR TO <regular expression> [ESCAPE <character>]” predicate.
I’m of the opinion that this is underused in DDL. Every time you don’t use it, you should be writing some sort of validation on that column wherever it is used in procedural code. In a commercial system, how often do you think something like the ZIP Code appears in a form? Are you starting to see that there might be an advantage to doing it one way, in one place, one time?
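To illustrate the “one way, one place, one time” idea, here is a minimal sketch in Python rather than SQL. It gives each fixed-length nominal encoding a single shared validation function instead of scattering checks through the application. In SQL DDL the same rule would live in a CHECK constraint using SIMILAR TO. The exact patterns are assumptions for illustration: a plain five-digit ZIP Code (ZIP+4 not accepted) and a US phone number written as `(555) 555-1234`. The names `is_valid_zip` and `is_valid_us_phone` are hypothetical.

```python
import re

# Hypothetical patterns for two fixed-length nominal encodings.
# [0-9] is used instead of \d because \d also matches non-ASCII digits.
ZIP5 = re.compile(r"[0-9]{5}")
US_PHONE = re.compile(r"\([0-9]{3}\) [0-9]{3}-[0-9]{4}")


def is_valid_zip(code: str) -> bool:
    """Exactly five ASCII digits; fullmatch rejects partial matches,
    much as SIMILAR TO must match the entire string."""
    return ZIP5.fullmatch(code) is not None


def is_valid_us_phone(number: str) -> bool:
    """Checks digit count and punctuation placement, e.g. '(555) 555-1234'."""
    return US_PHONE.fullmatch(number) is not None
```

With these definitions, `is_valid_zip("78701")` is true, while `"7870"` (too short) and `"787O1"` (a letter O in place of a zero) are rejected.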
In theory, a nominal scale could be done with pictures. In fact, you see this in some devices and displays. Right now, I’m looking at a weather forecasting app on my phone. There are cartoon pictures of clouds, sunshine, a rising sun, a setting sun, and so forth. They make getting the forecast information that’s relevant to me very fast, but not very precise.
Categorical Scales
Categorical scales put something into a category. They follow virtually the same naming rules that a nominal scale does, but a category should be more general than a name. By that I mean we know that “Rover” is our dog, but his category is “dogs”. There’s nothing that says categories have to be neatly organized. It’s obviously better if they are, so you have some idea what you’re looking at. One of my favorite quotes is from the essay “The Analytical Language of John Wilkins” (El idioma analítico de John Wilkins), published in 1942 by Jorge Luis Borges:
“These ambiguities, redundancies, and deficiencies recall those attributed by Dr. Franz Kuhn to a certain Chinese encyclopedia called the Heavenly Emporium of Benevolent Knowledge. In its distant pages it is written that animals are divided into (a) those that belong to the emperor; (b) embalmed ones; (c) those that are trained; (d) suckling pigs; (e) mermaids; (f) fabulous ones; (g) stray dogs; (h) those that are included in this classification; (i) those that tremble as if they were mad; (j) innumerable ones; (k) those drawn with a very fine camel’s-hair brush; (l) etcetera; (m) those that have just broken the flower vase; (n) those that at a distance resemble flies.”
Probably the most common categorical scale that people have seen is the Dewey Decimal Classification used in libraries. Specifically, this is a hierarchical encoding on a categorical scale. The advantage is that it’s based on a numerical hierarchy for the category names. By adding more decimal places, you can expand the categories in an orderly fashion.
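Because the hierarchy is numerical, “is this item in that category?” reduces to a prefix test on the digits. Here is a minimal sketch, assuming Dewey-style codes: a three-digit class padded with trailing zeros (500, 590) plus optional decimal places. The codes in the usage line are illustrative and not checked against the actual Dewey schedules. The function names are hypothetical.

```python
def significant_digits(code: str) -> str:
    """'599.2' -> '5992', '590' -> '59', '000' -> '0'.

    Assumption: trailing zeros are placeholders that pad the three-digit
    class (500, 590), not refinements, so they are dropped before comparing.
    """
    return code.replace(".", "").rstrip("0") or "0"


def in_category(code: str, category: str) -> bool:
    """True if `code` sits at or below `category` in the hierarchy.

    Each extra digit refines its parent, so membership is a prefix test.
    """
    return significant_digits(code).startswith(significant_digits(category))
```

For example, `in_category("599.2", "590")` holds, but `in_category("500", "590")` does not, because 500 is the broader class. Adding decimal places never moves an item out of its parent category, which is the orderly expansion described above.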
It’s not always obvious which category an attribute goes into. In fact, the Dewey Decimal people get into a lot of political discussions whenever anything changes. Flat-earthers, cryptozoologists, astrologers, and other pseudo-scientists are always trying to get their subjects put into a “mainstream, recognized” category in the field they claim, to become respectable among librarians.
Sometimes you run into something that just doesn’t fall into a category. Think about when we started exploring Australia and ran into the platypus and the echidna. The zoologists’ solution was to create another category for them (monotremes, or egg-laying mammals). But creating a very general “miscellaneous” category destroys information by putting too many dissimilar things into one category.
Absolute Scales
An absolute scale is a count of individual, distinct items in the set. Think about a dozen eggs, a ream of paper and other “traditional packaging” used in various trades. The important property is that the set is made up of interchangeable elements. You don’t really care which large white egg or which sheet of A4 paper you get from your package.
Decades ago, when the UK was converting to metric, a dairy did a promotion where they sold eggs in packs of 10. The actual cost per egg was less than it had been when they were selling the eggs by the dozen. But it was a failure because people could not conceive of getting less than a dozen eggs in a package.
People also have problems with fractions. For years, McDonald’s advertised its Quarter Pounder burger. The A&W fast food chain introduced a 1/3 pound burger in the 1980s as competition. It failed because many customers thought 1/4 was bigger than 1/3, since 4 is bigger than 3.
A ream used to be a quantity of paper consisting of 20 quires: 500 sheets (formerly 480 sheets), or 516 sheets for a printer’s ream (perfect ream). Although I have worked in the print trades, it has been decades since I saw anybody use a quire, and back then it was only for mimeograph stencils.
The absolute scale that was the retail standard in the beverage industry for beer and soft drinks for most of my life was the six-pack. It is almost defunct today; the packages now hold 4, 8, 12, or 24 bottles or cans.
Ordinal Scales
An ordinal scale puts the attribute in a linear order. You cannot do arithmetic with its values; you can only compare them. If you were paying attention, you noticed that I said linear order. You get into problems when a relationship is not linear.
I’m going to assume by now that everybody has played a game of rock-paper-scissors (or, if you’re a nerd and a fan of the TV show “The Big Bang Theory,” a game of rock-paper-scissors-lizard-Spock). Rock beats scissors, scissors beats paper, and paper beats rock, so “beats” is not transitive and there is no way to line the three choices up in order. More subtle versions of this non-transitive problem show up in voting paradoxes and choices involving three or more options. If you have some time, you might want to Google “Arrow’s impossibility theorem” for a look at some of the problems of trying to pick a winner when you have three or more candidates.
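The following sketch makes the rock-paper-scissors problem concrete. It brute-forces every arrangement of the items and looks for one in which each item beats everything after it. That is the linear order an ordinal scale needs. The helper names `beats` and `has_linear_order` are hypothetical, and brute force is only reasonable for a handful of items.

```python
from itertools import permutations

# "x beats y" pairs for rock-paper-scissors.
BEATS = {("rock", "scissors"), ("scissors", "paper"), ("paper", "rock")}
MOVES = ["rock", "paper", "scissors"]


def beats(x: str, y: str) -> bool:
    return (x, y) in BEATS


def has_linear_order(items, better) -> bool:
    """True if some arrangement puts every item ahead of all items after it.

    Tries every permutation, so it either finds such an order or proves
    that none exists.
    """
    for order in permutations(items):
        if all(better(order[i], order[j])
               for i in range(len(order))
               for j in range(i + 1, len(order))):
            return True
    return False
```

`has_linear_order(MOVES, beats)` is false because each move beats exactly one other move. Ordinary numbers compared with `>` pass, because the arrangement from largest to smallest works.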
Rank Scales
Rank scales are sometimes grouped with ordinal scales. They have a linear ordering and an origin. Having an origin prevents some of the transitivity problems.
One example would be military ranks. You cannot take three privates, add them together, and get a sergeant as a result. Other examples are the Mohs scale for the hardness of minerals (geology) and the Scoville scale for the heat of peppers.
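One way to keep a rank scale honest in code is a type that supports comparison but deliberately has no arithmetic. Here is a minimal sketch, assuming a simplified, hypothetical list of ranks that does not follow any particular army’s actual structure. The first entry acts as the origin.

```python
from functools import total_ordering

# Hypothetical, simplified rank list, lowest first; "private" is the origin.
RANKS = ["private", "corporal", "sergeant", "lieutenant", "captain"]


@total_ordering
class Rank:
    """A value on a rank scale: it can be compared, but not added."""

    def __init__(self, name: str):
        if name not in RANKS:
            raise ValueError(f"unknown rank: {name!r}")
        self.name = name
        self._position = RANKS.index(name)

    def __eq__(self, other):
        if not isinstance(other, Rank):
            return NotImplemented
        return self._position == other._position

    def __lt__(self, other):
        if not isinstance(other, Rank):
            return NotImplemented
        return self._position < other._position

    def __repr__(self):
        return f"Rank({self.name!r})"
```

Comparisons such as `Rank("private") < Rank("sergeant")` work, and `max()` picks the senior rank. Because the class defines no `__add__`, an attempt to add three privates raises a `TypeError` instead of producing a sergeant.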
Interval Scales
An interval scale is based on a standardized unit or interval. Another property of these scales is that they have no natural origin; temperature in degrees Celsius or Fahrenheit is the classic example, since the zero point is a convention. By contrast, an absolute scale has the natural origin of zero items in the count, and ordinal scales have an origin where the ordering starts, a first element, as it were. Other scales have a metric function. This has nothing to do with the metric system; it means that you can do certain kinds of simple arithmetic on them. In particular, a metric function has the following properties:
$$M(a, a) = 0$$
$$M(a, b) = M(b, a)$$
$$M(a, b) + M(b, c) \geq M(a, c)$$
This is essentially a distance function over the attribute being measured. In the case of an absolute scale, the distance from a dozen eggs to a dozen eggs is nothing, no matter how you count them. The distance between one dozen and two dozen is the same in either direction. Going from zero eggs to a dozen and then on to two dozen covers at least as much distance as going straight from zero to two dozen (here, exactly the same).
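As a quick check of those three properties, this sketch tests them over every triple drawn from a few egg counts. It assumes M(a, b) = |a − b| as the distance on a count scale. A signed difference fails the symmetry property, which is why it is not a metric. The helper names are hypothetical.

```python
from itertools import product


def count_distance(a: int, b: int) -> int:
    """Assumed distance between two counts on an absolute scale: |a - b|."""
    return abs(a - b)


def satisfies_metric_axioms(values, m) -> bool:
    """Check the three properties from the text over every triple of values."""
    for a, b, c in product(values, repeat=3):
        if m(a, a) != 0:                    # M(a, a) = 0
            return False
        if m(a, b) != m(b, a):              # M(a, b) = M(b, a)
            return False
        if m(a, b) + m(b, c) < m(a, c):     # triangle inequality
            return False
    return True
```

`satisfies_metric_axioms([0, 12, 24, 36], count_distance)` is true. Passing `lambda a, b: a - b` makes it false, because 0 − 12 and 12 − 0 differ in sign.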
A metric function doesn’t have to be simple arithmetic. There are log interval scales, such as the Richter scale for earthquakes. Each whole number on the Richter scale represents a tenfold increase in the measured wave amplitude, not a uniform additive interval.
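On a log interval scale, equal steps in the number correspond to equal ratios in the underlying quantity. Here is a minimal sketch, assuming only the factor-of-ten-per-step amplitude relationship stated above. It ignores the instrument and distance corrections that real magnitude calculations involve. The function name is hypothetical.

```python
def amplitude_ratio(m1: float, m2: float) -> float:
    """How many times larger the measured wave amplitude is at magnitude m2
    than at m1, assuming each whole step is a factor of ten."""
    return 10 ** (m2 - m1)
```

`amplitude_ratio(5, 7)` is 100. Steps add on the scale while ratios multiply underneath, so `amplitude_ratio(5, 6) * amplitude_ratio(6, 7)` gives the same 100.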
Ratio Scales
These scales have a base unit, a natural origin, and meaningful operations. We also allow the creation of compound units from ratio and interval scales. They are called ratio scales because the units of measure are expressed as multiples or fractions of some base unit. The most obvious example of this is SI units. The prefixes that come from Latin, like milli- in “millimeter (mm)”, represent fractions of the base meter, and the prefixes that come from Greek, like kilo- in “kilometer (km)”, are multiples of that unit.
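Because every prefixed unit is an exact multiple or fraction of the base unit, conversion is one multiplication. Ratios between quantities come out the same whichever prefix you started with. The sketch below covers only a small subset of SI prefixes and uses `Fraction` to keep the arithmetic exact. The `PREFIX` table and `to_base` are hypothetical names.

```python
from fractions import Fraction

# A small subset of SI prefixes; Fraction keeps conversions exact.
PREFIX = {
    "k": Fraction(1000),     # kilo (Greek origin): multiple
    "": Fraction(1),         # no prefix: the base unit itself
    "c": Fraction(1, 100),   # centi (Latin origin): fraction
    "m": Fraction(1, 1000),  # milli (Latin origin): fraction
}


def to_base(value, prefix: str) -> Fraction:
    """Express a prefixed quantity in the base unit, e.g. 3 km -> 3000 m."""
    return Fraction(value) * PREFIX[prefix]
```

`to_base(250, "m")` is exactly 1/4 of a meter. `to_base(2, "k") / to_base(500, "")` is 4, that is, 2 km is four times 500 m, which is the kind of statement only a ratio scale supports.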
Why Is Scale Important
Somewhere in my closet, I have a T-shirt with my favorite slogan on it. It reads, “On a scale from 1 to 10, what color is your favorite letter of the alphabet?” It’s absurd on the face of it, but when people read that T-shirt, they try to answer the question. The joke works because of the false underlying assumption that scales of different types can actually be mixed that way.
You can only convert from one scale to another if they are of the same type. For example, to convert from yards to meters, use 1 yd = 0.9144 m.
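Here is a minimal sketch of the “same type only” rule. It tags each unit with its dimension and refuses to convert across dimensions. The yard (0.9144 m) and foot (0.3048 m) factors are exact by definition. The table contents, the dimension labels, and the `convert` function are illustrative assumptions, not a complete unit system.

```python
from fractions import Fraction

# Each unit maps to (dimension, exact factor to the SI base unit).
UNITS = {
    "m": ("length", Fraction(1)),
    "yd": ("length", Fraction(9144, 10000)),  # 1 yd = 0.9144 m
    "ft": ("length", Fraction(3048, 10000)),  # 1 ft = 0.3048 m
    "kg": ("mass", Fraction(1)),
}


def convert(value, from_unit: str, to_unit: str) -> Fraction:
    """Convert between units of the same dimension; refuse anything else."""
    from_dim, from_factor = UNITS[from_unit]
    to_dim, to_factor = UNITS[to_unit]
    if from_dim != to_dim:
        raise ValueError(f"cannot convert {from_dim} to {to_dim}")
    return Fraction(value) * from_factor / to_factor
```

`convert(100, "yd", "m")` gives exactly 91.44 (as a `Fraction`), and `convert(1, "yd", "ft")` gives 3. `convert(1, "kg", "m")` raises `ValueError` instead of returning a meaningless number.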
We also have derived units, which have to be built out of ratio or interval scales. Look at the speedometer in your car and you see that it’s in miles or kilometers per hour (distance and time). Your house is measured in square meters (distance and distance). Your tire pressure is measured in pascals, defined as 1 Pa = 1 N/m². The N is for newton, a unit of force defined as 1 N = 1 kg·m/s² (mass, distance, and time).
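A common way to track derived units is to record each quantity’s dimension as exponents over the base units. Multiplying quantities then adds exponents, and dividing subtracts them. This sketch models only three of the seven SI base units (meter, kilogram, second) for brevity and needs Python 3.9+ for the `tuple[...]` annotation. `Dim`, `mul`, and `div` are hypothetical names.

```python
# Dimensions as exponent tuples over three SI base units: (m, kg, s).
Dim = tuple[int, int, int]

METRE: Dim = (1, 0, 0)
KILOGRAM: Dim = (0, 1, 0)
SECOND: Dim = (0, 0, 1)


def mul(a: Dim, b: Dim) -> Dim:
    """Multiplying quantities adds their exponents."""
    return tuple(x + y for x, y in zip(a, b))


def div(a: Dim, b: Dim) -> Dim:
    """Dividing quantities subtracts their exponents."""
    return tuple(x - y for x, y in zip(a, b))


SPEED = div(METRE, SECOND)                                # m/s
AREA = mul(METRE, METRE)                                  # m²
NEWTON = div(mul(KILOGRAM, METRE), mul(SECOND, SECOND))   # kg·m/s²
PASCAL = div(NEWTON, AREA)                                # N/m²
```

Working through the definitions, `PASCAL` comes out as `(-1, 1, -2)`, that is, kg·m⁻¹·s⁻². The pressure on your tire gauge reduces to mass, distance, and time, as the text says.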
In the SI system you have seven base units that are used to define the derived units. In the database, the closer you can store your data to these fundamental units, the more flexible it will be. However, it would be a little silly to break every derived unit down into its base units. And sometimes it’s just impossible; think about two very different-sized tires that have the same pressure.
The post Scales & Measurements appeared first on Simple Talk.