Chinese Remainder Theorem

We can represent numbers in a variety of different ways. For instance, the number 1925 can be written in any of the following formats:

Each of these representations has strengths and weaknesses - there is no "best" way to represent a number. Indeed, if that were the case we would have long ago settled on that one representation and any others would now be little more than archeological curiosities (and there are countless representations that have become just that). That so many different representations exist is a testament to the fact that most of them still serve useful purposes. How useful? Useful enough to justify the inevitable confusion that frequently results by having so many.

So which representation should we use? That depends on what we are trying to do at the moment. The more representations we know about, know how to work with, and understand the strengths and weaknesses of, the more options available to us to accomplish our goal more efficiently - or perhaps even at all.

A cursory examination of the above representations reveals that most of them tend to reveal something about the internal structure of the number being represented. The last method tells us what the prime factors are, which is useful in a wide number of settings but wouldn't be helpful at all in determining the bit pattern stored in a computer's memory. The middle representation reveals that pattern directly but is pretty worthless for quickly determining if the number is divisible by, say, 77.

The Chinese Remainder Theorem, among other things, simply provides us with one more tool for our toolbox. One more way of representing integers that reveals certain types of information about the internal structure of the number that lends itself to making some tasks easier and of gaining a deeper insight into how and why some algorithms work.

Introduction by Way of Example

You have a box containing marbles and the only thing you know about the number of marbles in the box is that it can't be more than 1000. How many marbles are in the box?

You don't know. There are 1001 possible answers ranging from 0 to 1000 and any one of them is, in theory, as likely to be correct as any other (depending on what you know about how the marbles got in the box).

Now imagine that you have a scoop that can hold exactly 7 marbles - we'll call this a 7-scoop (clearly we are not too big on clever or imaginative names). You can transfer marbles from one box to another box but, for whatever reason, you can't count how many full scoops you transferred - perhaps it is just a matter of not being sure that you can accurately keep count since it is likely to become a pretty mind-numbing process. The only information you are able to gather is how many marbles ended up in the scoop immediately after the last full scoop (the first scoop if there were no full scoops). To enforce this point, imagine that someone else is doing the actual transfer and all they are going to tell you is how many marbles ended up in that final scoop.

In the case of a 7-scoop, you only know that the number you will be given will be one of seven numbers ranging from 0 to 6. Let's say that the number in the last scoop turned out to be 3. How many marbles are in the box? You still don't know - but you have considerably narrowed down the possibilities. Instead of 1001 possible correct answers you have eliminated essentially 6/7 of them and are left with only 143. You could easily list them and the list would look something like:

Now imagine that you take a different scoop - a 33-scoop this time. You repeat this process and end up with 14 marbles in the final scoop. From this you could make a second list of the possible correct answers based on just this piece of information and that list would contain 30 entries:

By using both lists, you can drastically narrow the range even further since you know that the correct answer has to be on both such lists. You could simply examine both lists and identify the common numbers:

It's pretty amazing that just two relatively small scoop sizes allowed us to narrow the field of possible numbers so much. We've actually eliminated 99.6% of the possible values!

Let's see if we can use just one more scoop to figure out the final answer. How big should that scoop be? Would a 2-scoop be enough? No, because we would only have two possible outcomes and that would not be enough to tell us which of the four possible answers is correct. By that line of reasoning, we need a scoop that holds at least four marbles. So let's use a 4-scoop and imagine that the result is that the last one had 1 marble in it. Based on that information alone, we could construct a list of possible answers just like we did before:

After comparing it with the prior results, however, we are left with the following possible answers:

What might not be clear is that the number 773 is equivalent to the following set of numbers:

Where the numbers are remainders written in order from the smallest scoop to the largest.

At this point you are probably thinking something like, "Okay. Kinda interesting. What's the point?" Fair enough. This admittedly contrived exampled is unlikely to result in any Eureka! moments - or even give much of a glimpse into any useful application for the concepts introduced. The point has been simply to expose you to a different way of thinking about how to count and represent things and gain a bit of experience in some of the mechanics of doing so. Nothing more.

A Bit (Hopefully Quite a Bit) More Insight

Let's now turn our attention to why we needed three scoops and how that relates to the size of the scoop. To frame this discussion, let's consider some extreme alternatives and then a number of leading questions.

How is this related to modular arithmetic?

That connection is probably pretty obvious by now. The remainder values are simply the residues after reducing the number using the scoop sizes as the moduli. In other words:

How many different scoops would we have needed had our first scoop held 1200 marbles?

Clearly we wouldn't have needed any more scoops at all. Our first scoop would have been enough - the value of R₁₂₀₀ would have been 773 and we would have been done.

So why didn't we just do that? What might have been downsides to doing so? The obvious answer is that it wouldn't have gained us anything if the whole premise was that we weren't confident that we could count up to 773 without making a mistake.

What's the largest initial scoop size that might require us to use a second scoop size?

From the previous example, it should be clear that if our initial scoop is capable of holding more than all the marbles, then we do not need a second scoop. The conclusion, therefore, is that as long as our first scoop might just barely hold all the marbles, then we might need additional scoops. So the largest initial scoop size that satisfies this requirement would a 1000-scoop. If you are thinking that the answer should be 999-scoop, that's a reasonable (but wrong) conclusion - see the next question.

Under what conditions would a 1000-scoop not have been sufficient?

What if our last scoop had no marbles in it? There are two ways that this might happen - if the box originally had no marbles in it or if the box originally had 1000 marbles in it. Remember, we are not being told how many full scoops were bailed out, just how many marbles were in the very next scoop. We would be unable to distinguish between these two cases had that occurred.

Given the situation in the previous question, would a additional 2-scoop have been sufficient to resolve puzzle?

No. In either possible case (0 marbles or 1000 marbles) the 2-scoop would have held zero marbles. In fact, no matter how many marbles the 1000-scoop ended up holding, the 2-scoop would have provided no additional information.

How can the problem encountered above be avoided?

Go back and review the "aside" section of the example. Given the two scoops we were using initially, the equation we came up with was:

Notice that the fraction 33/7 can't be reduced. What about the scoop sizes 1000 and 2? That would result in the fraction (1000/2) which can be simplified. In essence, we are losing information (or more accurately, not getting as much information as we potentially could) if that fraction permits any simplification whatsoever.

Don't worry if you don't see why this is so - the physics is in the hand waving. We're not claiming to have proven anything here, just trying to present a way of looking at things that gives at least a hint of what's going on.

Assuming we accept this explanation, then to prevent this from happening we must merely choose scoop sizes that have no factors in common. In other words, scoop sizes that are relatively prime.

How can we see that we aren't getting the most information we think we should be able to if the relatively prime constraint is violated?

If we use an N-scoop, then we expect there to be one of N possible outcomes and that our knowledge of the number being represented improves accordingly. But if the scoop sizes have any factors in common, then knowing how many marbles are left over in one scoop permits us to determine beforehand some of the constraints on how many marbles might be left over in the second.

For instance, given the scoop sizes 15 and 6, which have 3 as a common factor, if the 15-scoop yields a remainder of 4, then we know that the 6-scoop must end up with either 1 marble or 4 marbles. We don't have six viable outcomes, just two. In fact, the 6-scoop provides no more information that a 2-scoop would have.

What if we had used the 6-scoop first and it had yielded 4 marbles? Now the only possible values for the 15-scoop would have been 1, 4, 7, 10, or 13. It now provides no more information than a 5-scoop would have. In either case, the redundant factor of three is effectively lost.

The conclusion once again is that our scoop sizes should not have any redundant factors among their sizes - each should be relatively prime to all of the others.

Would a 3-scoop have been an adequate choice for a second scoop?

Since 1000 and 3 are relatively prime, we at least expect to gain additional information. It may or may not be enough to avoid needing a third scoop, but it will tell us something.

If the box held 0 marbles, then both the first scoop and the second scoop would have yielded zero marbles left over. But if the box held 1000 marbles, then the first scoop would have yielded a result of zero but the second scoop would have yielded 1 marble, permitting us to distinguish between the two cases.

What's the maximum number of marbles that the box could contain such that a 1000-scoop and a 3-scoop would have been sufficient?

There are a number of ways to get at the answer to this. One way is to imagine that, indirectly, the second scoop permits us to determine how many full scoops were transferred using the first scoop with three possible outcomes - 0, 1, or 2. That means that the largest number of marbles we could have dealt with would be two full scoops of 1000 marbles plus the largest number of marbles that would have avoided filling it a third time. The works out to 2,999 marbles. A more useful way to state this is that we could have worked with any number of marbles less than 3000.

The conclusion is that, if all of the scoop sizes are relatively prime, we can represent any number less than the product of all the scoop sizes.

Were the scoop sizes used in the solution of the puzzle truly adequate?

Scoop sizes of {4, 7, 33}, which are all relatively prime, only permit us to represent integers less than 924. While our actual number of marbles turned out to be less than that, we weren't guaranteed that would be the case. The 77 values that we couldn't have represented properly would have given us problems under two situations: had the actual number of marbles been between in the range from 924 to 1000 (inclusive) or if the number of marbles had been less than 77. A little reflection on what has already been discussed should make the reason why this latter range is troublesome apparent. Hint: If there had been 76 marbles in the box, how would we have distinguished that from 1000 marbles? In either case, our next to last list (the one we used to determine the final scoop size) would have had five values instead of four.

Assuming that we use scoops that are all relatively prime, is there a one to one mapping between the set of remainders (marbles in the last scoop of each size) and the number represented?

Given a set of scoop sizes that are all relatively prime to each other (known as being "pair-wise relatively prime"), then each possible set of remainders provides a unique representation for each non-negative integer less than the product of all the scoop sizes.

One way to at least show that it is reasonable to expect this to be the case is to consider how many unique sets of remainders exist for a given set of scoops. If we have, say, four scoops of size S₁, S₂, S₃, and S₄, (here the subscripts aren't the size of the scoops, they're just index numbers) then we should be able to represent a total of

different values (presumably ranging from 0 to T-1). The claim is that each number can be represented as a list of four values, namely the four remainders (again, the subscripts are just indices in this example):

The number of unique representations for N is simply the product of the number of unique representations for each of the remainders. For each remainder, this is equal to the scoop size and hence the number of unique representations for N is exactly equal to the total number of values that can, presumably, be represented.

However, this by itself isn't enough to state that all of the numbers from 0 to T-1 can be uniquely represented this way. What if some numbers can be represented by more than one set of remainder values? What if some numbers can't be represented at all?

We can rule out these two possibilities directly. For any given non-negative integer (regardless of whether it is smaller than, equal to, or larger than T), we can some up with a set of remainder values for it. Hence every value has at least one representation. Furthermore, any given non-negative integer has exactly one set of remainder values (for a given set of scoops) that can be associated with it. That this is the case becomes obvious when you consider how the remainder values are obtained. If there were multiple representations, then we would have to be able to take the same number and, somehow, come up with different remainder values for the same size scoop, which is not possible.

But what about the possibility that some some sets of remainder values might be able to represent more than one number? Or that some sets of remainder values might not represent numbers at all?

These can't be discounted as easily because, in fact, this is precisely the situation we would have if our scoop sizes fail to conform to the "relatively prime" rule. For instance, what if we have two scoops, a 2-scoop and a 4-scoop? We might think that we can represent eight values (0 through 7). But what we would find is that, in doing so, only four of the possible representations get used and that each of those gets used twice. What about the unused representations? Do they even represent numbers? No. For instance, one of the possible sets of remainders would indicate that the remainder of the 2-scoop is 1 while the remainder of the 4-scoop is 2. But the first requires that the number be odd while the second requires it be even. Since a number can't be both even and odd, this is not a valid representation.

So let's ask the question more precisely and in mathematical terms: What conditions permit the same set of remainder values to represent multiple numbers? Given two different numbers, N and N', that have the same set of remainder values (for simplicity, we will limit ourselves to two scoops, S₁ and S₂), R₁ and R₂, what are the resulting constraints?

If the two numbers are the same, then the pairs (K₁, K₂) and (K₁', K₂') must be the same and the above equations are all zero. But that is the uninteresting case. If the difference is non-zero, then we still know that it must be an integer and can therefore be factored into it's prime constituents as can the two scoop sizes. For simplicity, we'll assume that each scoop size is the product of two numbers - a factor, C, shared by both scoops and a factor, F_i, that is unique to that scoop.

In order to satisfy the equality, the quantity (K₁ -K₁') must be a multiple, M₂, of F₂ and the quantity (K₂ -K₂') must likewise be some multiple, M₁, of F₁:

At this point, it is clear that the two multiples M₁ and M₂ must be the same and hence can be written as just M.

The product of the scoop sizes is the nominal range of values that we hope to be able to represent. Earlier we called this quantity T. Also, remember that D represents the possible differences between two numbers that share the same representation in terms of a set of remainder values. The smallest nonzero value it can be is when M=1. This leaves us with

While this was developed for the case of just two scoops, it is a straightforward exercise to extend it to an arbitrary number of scoops with a similar result - the numerator will be T (the product of all the scoop sizes) and the denominator, C, will be the product of all the common factors between any and all pairs of scoops. For our purposes, it is sufficient to note that C will be 1 if and only if all of the scoops are relatively prime to each other. Otherwise, it will be greater than 1.

Although it has taken a while to get here, we are now at a point to make some significant observations. If the scoop sizes are all relatively prime then C=1 and, given a number, N, that is with the range 0 to (T-1), no other number, N', that is within this range can be represented by the same set of remainder values. This, finally, permits us to assert that if the scoops are relatively prime that each of the T possible representations maps to exactly one non-negative integer less than T. We've already established that each integer maps to exactly one possible representation. The combination of these two results therefore establishes that the method of using a set of remainders to represent integers is unique for the first T non-negative integers.

If we want to keep the numbers that we need to deal with to an absolute minimum, how many scoops would have been required to work with 1000 marbles.

To understand the motivation for this question, consider working with the 7-scoop and the 33-scoop. When you got to the last scoop in the first case, you could probably take a quick glance at the contents and know how many marbles were left over. That probably wouldn't have been the case with the 33-scoop. Hence it is not unreasonable to suspect that there will be advantages to keeping the scoop sizes as small as possible, even if the penalty is having to work with more scoops.

So, to answer the question, we simply need to start with the smallest prime number and keep multiplying by the next smallest until we achieve a product greater than the number we want to represent. Since 2*3*5*7 is 210 and 210*11 is 2310, we could have worked this problem using no more than five scoops with the largest of them holding only 11 marbles.

But this may or may not have been the optimal solution - in fact the goal of keeping the numbers to an absolute minimum is actually somewhat ambiguous.

Notice that the scoop sizes only have to be relatively prime. Let's say that we are comfortable working with scoop sizes of ten or less. Then, instead of a 2-scoop, we could have used an 8-scoop. Likewise, we could have used a 9-scoop instead of a 3-scoop. The addition of the 5-scoop and the 7-scoop (a total of only four scoops, all of which hold fewer than ten marbles) would have allowed us to work with any number of marbles less than 2520.

Is this method really practical if we are wanting to work with really large integers?

Notice that, like the factorial function, the range of numbers that we can represent grows very rapidly with the number of scoops. If we wanted to use this method to count the number of people on the planet, the first ten (prime numbered sized) scoops would let us deal with any number less than 6,469,693,230. This may or may not be big enough, but since the eleventh scoop (the 31-scoop) would extend that range to over 200 billion, we know that eleven scoops is more than adequate.

In cryptography, working with hundred or even thousand digit integers is not uncommon. If we use the first 53 prime numbers, then the largest individual number we have to deal with is 251 (less than one byte) but can uniquely represent integers having up to 98 decimal digits. While this requires 53 bytes to represent such a number, representing it as a pure binary number would still require 41 bytes, so the penalty isn't too severe.

Furthermore, we can remove some of that penalty by again using scoops that are only relatively prime. For instance, instead of a 2-scoop we could use a 256-scoop. That change alone extends our range to any integer with 100 or fewer decimal digits while working with values that are each less than one byte.

What should I have gotten from this so far?

One thing you shouldn't have gotten yet is any inkling of what the Chinese Remainder Theorem actually is or how you use it. Furthermore, while it has been claimed that this means of representation reveals some useful things about the structure of the number or makes certain tasks much simpler to perform, there is no reason for you to believe that you have actually seen significant evidence that those claims are justified. Instead, the claims should be sitting in the back of your mind as a set of expectations you hope to have fulfilled as we proceed further.

Development of the Chinese Remainder Theorem

The goal now is, given a set of residues, how can we easily determine the value of N?

We are looking for a single expression that incorporates all of the residues and, when evaluated, produces N.

How will we know when we have been successful? That's actually pretty simply, we know that if we take the residues of the expression we must generate the correct values.

Clearly this doesn't work, but let's see if we can't force it to satisfy the first of the above constraint relations by multiplying the last two terms by M₁.

Now, since the last two terms are multiples of M₁, we are guaranteed that their residues modulo-M₁ vanish. In essence, we have "turned off" the residues we aren't interested in.

One problem that probably isn't obvious yet (but this is as good a time as any to deal with it) is that, in general, our set of residues map to an infinite number of integers. However, all but one of them are outside our range of interest. Since we want our function to return the one, single value of N that we are interested in, we need to strip out all of the others out by reducing the final result modolu-T before returning it.

That's fine for the first constraint, but what about the second and third? What happens if we attempt the same approach to turn off those?

While we succeed in turning off of ones we don't want, the ones that remains are very unlikely to yield the correct residues.

While none of the residues are correct, notice that we have successfully isolated them from each other. The first residue is only dependent on R₁ and the others cannot affect it. The same is true for the others.

To force the first one to yield the correct answer, we simply have to remove the affect the product of M₂ and M₃. We can do this my multiplying that term by the modulo-M₁ multiplicative inverse of this product, which we'll designate I₁. To be more explicit, I_i is the modulo-M_i multiplicative inverse of T/M_i (the product of all of the moduli except M_i). We'll do the same for the other two terms using the appropriate moduli:

N = f(R₁,R₂,R₃) = (I₁M₂M₃R₁ + M₁I₂M₃R₂ + M₁M₂I₃R₃) mod T

One point that is easy to overlook - or to misinterpret - is that in order to multiply something by a multiplicative inverse in a modulo-n world, the multiplicative inverse must actually exist! If any of our moduli are not prime numbers, then there will be values that won't have multiplicative inverses in that world. This might lead us to assert that the moduli used with this theorem must be prime numbers. However, notice that the only values that we need to compute multiplicative inverses for in any given modulo-n world are the other moduli. Since we are assured that a multiplicative inverse for a number exists as long as that number is relatively prime to the modulus, we are assured that the needed values will exist provided all of our moduli are pair-wise relatively prime, which we have long since established that we need to adhere to for a variety other reasons. Hence, this has no additional impact on us.

Converting From CRT Format Back to Integer

Recall that in our puzzle at the beginning of this page we figured out the value of N by making lists and looking for common elements or by generating sets of simultaneous equations. Doing so wasn't terribly difficult, but certainly it was a bit tedious. More to the point, the amount of effort to resolve the puzzle for a different number of marbles in the box wouldn't have been significantly easier on subsequent attempts.

Let's see if we can use the Chinese Remainder Theorem to greatly simplify this - at least to the degree that we can do most of the grunt work once up front and then have a very efficient expression available for subsequent trials.

The Chinese Remainder Theorem says that we can relate the set of residues to N as follows.

N = f(R₁,R₂,R₃) = (I₁M₂M₃R₁ + M₁I₂M₃R₂ + M₁M₂I₃R₃) mod T

We already know the three moduli, now we just need to determine the three inverses.

By definition, the modulo-n multiplicative inverse of k is that value of I such that

While something like the Extended Euclidean Algorithm could be used to find the inverses, the moduli are small enough that they can be found by inspection and/or exhaustive search quite easily

Mathematics Using CRT

For a number represented using the CRT with two moduli the equation for the number is as follows

Where T is the product of the moduli associated with each of the remainders. Clearly this format can be extended to include as many moduli as desired, but two is sufficient for our purposes right now.

Since the K_i values are constant, we only need to track the R_i remainder values and can do so quite economically as a sequence of two values (known as a 2-tuple) such as {R₁, R₂}, just as we did in a few places earlier.

Addition

Thus we have a very simple way of adding n-tuples using CRT - simply add the corresponding elements of each n-tuple and reduce the result by the corresponding modulus.

Multiplication

N*N' = (K₁²(R₁R'₁) + K₁K₂(R₁R'₂+ R₂R'₁) + K₂²(R₂R'₂) ) mod T

Since any given K_i value is congruent to zero for every modulus except one, any product of two different such values will be zero for all moduli. Thus the only surviving terms are:

However, note that the additional K_i values can be reduced by the corresponding modulus and then recall that each K_i value was specifically constructed to be congruent to 1 under those conditions, then the additional K_i values vanish. Thus our multiplication rule for CRT n-tuples is

Exponentiation

As with multiplication, any terms that involve more than one remainder will involve more than on K_i value and hence be congruent to zero when reduced by any modulus. Similarly, the only ones that survive will only be multiplied by additional K_i values that are all congruent to one when reduced by the only modulus that matters for that value. Hence, without formal proof, the result is:

Limitations of CRT

Like all representations, the Chinese Remainder Theorem is not the end-all and be-all way to represent integers. It has real advantages for some tasks and serious shortcomings for other tasks. Frequently, algorithms need to perform operations for which CRT representations are good intermixed with operations for which they are poor. When this is the case, the application may need to convert back and forth between CRT and other representations with the inevitable increase in complexity and computational overhead. The algorithm designer must decide if performing such conversions is an acceptable cost for any gains achieved.

Inability to perform magnitude comparisons.

There is no simple way to determine if the integer represented by one CRT tuplet is larger than that represented by another tuplet.

Inability to represent arbitrary modulo worlds with arbitrary remainder moduli.

It is one thing to be able to represent integers up to a certain limit using CRT representations by choosing small integers for the remainder moduli that, when multiplied together, give an overall moduli larger than the needed limit. As long as all computations stay within this range, all is well. This is not unlike normal fixed-width integer representations where computations that violate the limits of the representation can produce erroneous results.

However, if the goal is to represent the residues in a particular modulo world then the remainder moduli chosen must be appropriate factors of that modulus. If the overall modulus is a prime number (or an integer power of a prime), then the CRT representation will end up being the same as the integer representation. If the overall modulus is a product of only two very large prime numbers - as is common in cryptography - then the remainder moduli will be very large numbers and much of the benefit of using CRT is lost. But, even if this is the case, the computational expense of working with enormous moduli is so high that even the relatively marginal gains achieved by exploiting CRT representations based on two semi-enormous moduli are often worth the additional overhead.

CRT Examples

Let's assume that we want to work with numbers modulo something in the vicinity of 25000. Since 2^3*3^3*5*7*11 is 27,720, we decide that this is a good enough modulus. Hence, each number in our world can be written as a sequence of five numbers (called a 5-tuple) where each number is, respectively, the number being represented modulo 5, 7, 8, 9, and 11. No single number in our system will exceed ten.

21,054

3,678

9,452

19,035

254

932

24,069

We could certainly do so directly, but let's see how we might use the Chinese Remainder Theorem to simply the majority of the work:

mod 27720	mod 5	mod 7	mod 8	mod 9	mod 11
21,054	4	5	6	3	0
3,678	3	3	6	6	4
9,452	2	2	4	2	3
19,035	0	2	3	0	5
254	4	2	6	2	1
9,32	2	1	4	5	8
24,069	4	3	5	3	1

Which would be easier to do, add up just the left hand column, or add up each of the other five columns? Keep in mind that each column must be reduced by the appropriate modulus. Without even resorting to a pencil and paper, let along a calculator, you can probably work out in your head that the result of adding the five right-most columns is {4, 4, 2, 3, 0}.

Of course, we do have to deal with the overhead of determining our coefficients in order to convert our 5-tuple back into an integer, but we only have to do this once. The result is:

Plugging the 5-tuple into the above equation yields N=23,034, which matches the result of directly adding the raw numbers.

Notice that only two of the moduli, 7 and 8, have all nonzero values for the remainders of all the numbers. These are therefore the only two remainders that will survive the multiplication process. This makes sense since the product of any list of numbers that includes one that is divisible by a particular modulus will also be divisible by that modulus.

In addition, even for a list that doesn't contain any zero remainder, as soon as the product of the remainders of any particular modulus become divisible by that modulus, the result will be zero from that point on. This can't happen for moduli that are prime, but for composite moduli, such as 8 in our example, it happens almost immediately. Thus we know before doing any significant computation at all that we will only have one modulus that has a non-zero remainder and we can calculate that one in our head by grouping remainders together in convenient sets.

So, by inspection, we can see that the result of multiplying all the numbers together is {0, 3, 0, 0, 0} and our result of 7920 is found quite easily.