Linking, Equating, and Scaling: What's the Difference?
When talking about student test scores, it is common to hear the words equating, linking, and scaling. Although these processes play an important role in determining the score a student receives, the meaning of the terms is not always clear.
Linking
Linking refers to a relationship between test scores on two different tests that are not necessarily built to have the same content or level of difficulty. When tests are purposefully built to differ in content or difficulty, linking must be conducted in order to establish a relationship between the test scores.
One common example of linking is relating ACT composite scores to SAT Critical Reading (formerly Verbal) and Math scores. Although these two assessments are both college entrance exams, they are purposefully created to be different. The SAT contains two general areas (excluding writing): Critical Reading and Math. The ACT contains four general areas (excluding writing): English, Mathematics, Reading, and Science. Although there are similarities between these two tests, they are obviously different.
So, why do we need to establish a relationship between tests when they are different? Continuing with the ACT/SAT example, one reason is that colleges and universities frequently require a certain minimum test score in order for a student to be considered for admission. If a university requires an ACT score of 22 in order for a student to be considered for admission, the university must also know the comparable SAT score in order to establish a minimum requirement for students taking only the SAT. In this example, the relationship is established by using scores from students who took both the ACT and SAT. One cautionary note is that when tests contain very different content, linking will not be adequate for all purposes.
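To make the idea concrete, the sketch below links two tests by matching percentile ranks among students who took both. It is a minimal, unsmoothed illustration of equipercentile-style linking, not the procedure behind any published ACT/SAT concordance; the function names and all score values are invented for the example.

```python
def percentile_rank(scores, x):
    """Share of scores below x, plus half the share exactly equal to x."""
    below = sum(s < x for s in scores)
    at = sum(s == x for s in scores)
    return (below + 0.5 * at) / len(scores)


def linked_score(x, scores_a, scores_b):
    """Lowest observed test B score whose percentile rank reaches that of x on test A."""
    target = percentile_rank(scores_a, x)
    for candidate in sorted(set(scores_b)):
        if percentile_rank(scores_b, candidate) >= target:
            return candidate
    # x ranks above every observed test B score, so fall back to the maximum.
    return max(scores_b)


# Invented scores for eight students who took both tests (not real data).
act_scores = [18, 20, 22, 22, 24, 26, 28, 30]
sat_scores = [900, 960, 1020, 1030, 1100, 1180, 1250, 1330]

print(linked_score(22, act_scores, sat_scores))  # 1030
```

With so few students the result is coarse; an operational linking study would rely on a much larger sample and would usually smooth the score distributions first.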
Equating
Equating can be considered a more specific form of linking, or the strongest type of linking relationship. As discussed in a previous learning tool, equating is a process used to make test scores across different forms of the same test interchangeable. Even when test forms are built to be similar in content and difficulty, small differences in difficulty remain, and equating adjusts for them. Because the forms are considered to be essentially the same, scores on the two forms can be used interchangeably once that adjustment has been made.
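One simple way to carry out this adjustment is linear equating, which maps a score on one form to the score on the other form that lies the same number of standard deviations from the mean. The sketch below assumes the two forms were taken by randomly equivalent groups; that design, the function name, and the scores are assumptions made for illustration, and operational programs often use more elaborate methods.

```python
from statistics import mean, pstdev


def linear_equate(x, form_x_scores, form_y_scores):
    """Express a Form X raw score on the Form Y raw-score scale."""
    mean_x, mean_y = mean(form_x_scores), mean(form_y_scores)
    sd_x, sd_y = pstdev(form_x_scores), pstdev(form_y_scores)
    return mean_y + (sd_y / sd_x) * (x - mean_x)


# Invented raw scores: Form X is harder, so the group taking it scored lower.
form_x = [10, 20, 30]
form_y = [12, 24, 36]

equated = linear_equate(30, form_x, form_y)
print(round(equated, 2))  # 36.0
```

Here a raw score of 30 on the harder Form X is treated as equivalent to a raw score of about 36 on Form Y.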
Scaling
Scaling is a process for transforming student raw scores (i.e., number correct) onto a different score scale. Reported scores on the SAT, which range from 400 to 1600, are scale scores rather than raw scores. What is the point of transforming these scores? The main purpose of scaling is to facilitate interpretation and understanding of test scores. Raw scores are difficult to interpret without additional information. If a student receives a raw score of 45, it is impossible to know how well the student did without knowing the total number of points possible and the performance of other students. Scale scores are established to provide an easy way of interpreting test scores. Once a scale has been created and test users have become familiar with it, they can tell how well an examinee did from the scale score alone.
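As a toy illustration of the transformation itself, the sketch below maps raw scores onto a reporting scale with a straight-line rule. The 60-point and 100-point tests, the 200 to 800 scale, and the function name are all hypothetical; real testing programs typically use conversion tables rather than a single formula.

```python
def raw_to_scale(raw, max_raw, scale_min=200, scale_max=800):
    """Linearly map a raw score in [0, max_raw] onto [scale_min, scale_max]."""
    if not 0 <= raw <= max_raw:
        raise ValueError("raw score outside the possible range")
    fraction = raw / max_raw
    return round(scale_min + fraction * (scale_max - scale_min))


# The same raw score of 45 means different things on different tests.
print(raw_to_scale(45, max_raw=60))   # 650
print(raw_to_scale(45, max_raw=100))  # 470
```

The two calls show why a raw score of 45 cannot be interpreted on its own, while the scale scores can be compared directly.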
Another example of scaling arises in the Item Response Theory (IRT) framework. Examinee responses are used to estimate ability on a scale referred to as theta. Typically, each time a test is calibrated, the theta scale is set so that the group of examinees has a mean of 0 and a standard deviation of 1. However, it is common for groups of examinees to differ in ability across years. If examinees in Year 2 did better on a test than those in Year 1, this difference would not be evident on the theta scale, because each year's group would be centered at 0 on its own scale. This creates the need to transform the new theta scale onto the previous year's scale. A linear equation can be used to transform IRT parameter estimates from the Year 2 scale to the Year 1 scale. When IRT is the underlying model for equating a test, this scaling procedure is an integral step in the overall IRT equating procedure.
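One common way to find the coefficients of that linear equation is the mean/sigma method, shown here as a minimal sketch. It assumes some items were administered in both years and uses their difficulty estimates to find a slope A and an intercept B that bring the Year 2 values in line with the Year 1 values. The choice of method, the function names, and the parameter values are assumptions made for illustration.

```python
from statistics import mean, pstdev


def mean_sigma_constants(b_year1, b_year2):
    """Slope A and intercept B from difficulty estimates of the common items."""
    A = pstdev(b_year1) / pstdev(b_year2)
    B = mean(b_year1) - A * mean(b_year2)
    return A, B


def rescale_ability(theta_year2, A, B):
    """Place a Year 2 ability (or item difficulty) estimate on the Year 1 scale."""
    return A * theta_year2 + B


def rescale_discrimination(a_year2, A):
    """Item discrimination estimates are divided by the slope."""
    return a_year2 / A


# Invented difficulty estimates for three items given in both years.
# The items look easier in Year 2 because the Year 2 group is more able.
b_year1 = [-0.7, 0.5, 1.7]
b_year2 = [-1.0, 0.0, 1.0]

A, B = mean_sigma_constants(b_year1, b_year2)
# An average Year 2 examinee (theta = 0) lands above the Year 1 average.
print(round(rescale_ability(0.0, A, B), 2))  # 0.5
```

After the transformation, the higher ability of the Year 2 group becomes visible: its average sits at about 0.5 on the Year 1 scale instead of at 0.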
Vertical Scaling or Vertical Equating
The purpose of vertical scaling or vertical equating is to place test scores from multiple levels of a test onto a single score scale. Testing programs commonly have a battery of tests that are administered to students at different grade levels. Typically, the tests measure the same constructs (e.g., geometry, algebra, numerical operations), although the levels of the tests differ to some degree in content and difficulty. For example, the content and difficulty of a 3rd grade test would be different from those of a 4th grade test, even though both tests measure math-related constructs. When there are multiple test levels across grade levels, it is typically desirable to compare student progress from year to year. One way of doing this is by creating a single score scale across all levels of the test. Relating performance on each test level to the single score scale is referred to as vertical scaling.
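One way to build such a scale, sketched below under stated assumptions, is to link each grade's scale to the grade beneath it and then chain those links down to a base grade. The base grade, the slope and intercept values, and the function name are hypothetical; operational vertical scales are often built with other designs and estimation methods.

```python
def to_base_scale(score, grade, links, base_grade=3):
    """Carry a score from its own grade's scale down to the base grade's scale.

    links[g] holds the (slope, intercept) that maps grade g onto grade g - 1.
    """
    if grade < base_grade:
        raise ValueError("grade is below the base grade")
    current = grade
    while current > base_grade:
        slope, intercept = links[current]
        score = slope * score + intercept
        current -= 1
    return score


# Invented linking constants between adjacent grades.
links = {4: (1.1, 0.8), 5: (0.9, 0.6)}

# An average 5th grader (0.0 on the grade 5 scale) expressed on the grade 3 scale.
print(round(to_base_scale(0.0, 5, links), 2))  # 1.46
```

Once every grade is expressed on the base scale, a student's scores from successive years can be compared directly to track growth.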
So, What is the Difference, Really?
The actual procedures for conducting linking and equating are very similar. The major distinction between these two terms is conceptual in nature. Tests are linked to establish a relationship when tests have purposefully been created to be different. Tests are equated to adjust for differences in difficulty when two test forms of the same test have been built to be as similar as possible. Rather than relating scores on two tests, scaling transforms scores on one test to a different metric that is more easily interpreted and understood.

