Alphabet letters not used in Microsoft product keys

LAMP90 · Sep 16, 2007

On Microsoft and most other 25 character-based product keys, I know not all
26 letters of the alphabet are used, but cannot find documentation as to
which ones are suppressed. Suspect that I and O are out, as they look much
like numbers 1 and 0, for instance.
Any pointers?

Paul Adare · Sep 16, 2007

On Sun, 16 Sep 2007 02:02:00 -0700, LAMP90 wrote:

> On Microsoft and most other 25 character-based product keys, I know not all
> 26 letters of the alphabet are used, but cannot find documentation as to
> which ones are suppressed. Suspect that I and O are out, as they look much
> like numbers 1 and 0, for instance.
> Any pointers

What possible use would this information be and how does it relate to
security?
I highly doubt that you're going to find any detailed documentation on
Microsoft product keys for what should be fairly obvious reasons.
--
Paul Adare
MVP - Virtual Machines
http://www.identit.ca
Any given program will expand to fill available memory.

Frank Saunders, MS-MVP OE/WM · Sep 16, 2007

"Paul Adare" . wrote in message
news:1pgg22gkvv1aa.1qfq6go3mrxq5.dlg@40tude.net...
> On Sun, 16 Sep 2007 02:02:00 -0700, LAMP90 wrote:
>
>> On Microsoft and most other 25 character-based product keys, I know not
>> all
>> 26 letters of the alphabet are used, but cannot find documentation as to
>> which ones are suppressed. Suspect that I and O are out, as they look
>> much
>> like numbers 1 and 0, for instance.
>> Any pointers
>
> What possible use would this information be and how does it relate to
> security?
> I highly doubt that you're going to find any detailed documentation on
> Microsoft product keys for what should be fairly obvious reasons.

It would be useful because the Product Keys are often printed in a
hard-to-read font.

--
Frank Saunders, MS-MVP OE/WM
Do not send mail.

Paul Adare · Sep 16, 2007

On Sun, 16 Sep 2007 06:11:01 -0500, Frank Saunders, MS-MVP OE/WM wrote:

> "Paul Adare" . wrote in message
> news:1pgg22gkvv1aa.1qfq6go3mrxq5.dlg@40tude.net...
>> On Sun, 16 Sep 2007 02:02:00 -0700, LAMP90 wrote:
>>
>>> On Microsoft and most other 25 character-based product keys, I know not
>>> all
>>> 26 letters of the alphabet are used, but cannot find documentation as to
>>> which ones are suppressed. Suspect that I and O are out, as they look
>>> much
>>> like numbers 1 and 0, for instance.
>>> Any pointers
>>
>> What possible use would this information be and how does it relate to
>> security?
>> I highly doubt that you're going to find any detailed documentation on
>> Microsoft product keys for what should be fairly obvious reasons.
>
>
> It would be useful because the Product Keys are often printed in a
> hard-to-read font.

I've done thousands of installs with PIDs and have never been blocked from
completing an install because I couldn't decipher a PID.

--
Paul Adare
MVP - Virtual Machines
http://www.identit.ca
Netnews is like yelling, "Anyone want to buy a used car?" in a crowded
theater.

Alex K. Angelopoulos (MVP) · Sep 16, 2007

Of the 36 Latin letters and Arabic numerals used in most Western European
languages, 24 appear to be used for current product codes and 12 are not.

The 24 used are:
2346789BCDFGHJKMPQRTVWXY

The 12 unused are:
015AEILNOSUZ

Depending on the font and letter case used, this reduced set is generally
unambiguous, but knowing which members it contains definitely can be useful.
Microsoft generally uses uppercase for printing/display of the keycodes, and
it does look like these were specifically selected to minimize chance of
confusion and also, I believe, to avoid accidentally producing
offensive-looking character sequences. The pattern is easiest if you look at
the unused characters like this:

015 AEIOU LNSZ

The numerals 015 could all be confused with letters easily. By eliminating
AEIOU, the chances of producing something that looks like a word in almost
any language using the Latin alphabet are minimal. The 4 unused consonants,
LNSZ, further eliminate possible confusion if you aren't aware of the
character set, have written them in lowercase, or have printed them in a
fairly blocky typeface - without S you won't try to write 5, and without Z
you always know that a similar shape is really 2.
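The grouping above can be checked with a few lines of set arithmetic. This is a minimal Python sketch that assumes the 24-character set reported in this post is accurate; it is taken from observation in the thread, not from any Microsoft documentation.

```python
import string

# Character set reported in the post above (observed, not officially documented).
USED = "2346789BCDFGHJKMPQRTVWXY"

# All 36 candidates: digits 0-9 plus uppercase A-Z.
ALL_CHARS = string.digits + string.ascii_uppercase

# Anything not in USED is unused; sorted() puts digits before letters.
unused = "".join(sorted(set(ALL_CHARS) - set(USED)))

# Regroup the unused characters the way the post does.
digits = "".join(c for c in unused if c.isdigit())
vowels = "".join(c for c in unused if c in "AEIOU")
consonants = "".join(c for c in unused if c.isalpha() and c not in "AEIOU")

print(len(USED), len(unused))      # 24 12
print(unused)                      # 015AEILNOSUZ
print(digits, vowels, consonants)  # 015 AEIOU LNSZ
```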

By the way, being told this set really doesn't help anyone attempting a
brute-force crack to generate key codes. It isn't just that the space of
possible codes is large (24^25, or about 3.2 * 10^34). Individual key codes
are long enough that anyone with access to at least half a dozen Microsoft
keycodes who can do simple character sorting can tell not only what the base
character set is, but also that the distribution is approximately random.
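Both claims are easy to sanity-check. The sketch below computes the size of the key space and then pools six sample strings to see which characters turn up. The samples are uniformly random strings generated for illustration (an assumption); they are not real product keys, and nothing here reflects how real keys are produced.

```python
import random
from collections import Counter

ALPHABET = "2346789BCDFGHJKMPQRTVWXY"  # set reported in the post above
KEY_LENGTH = 25

# Raw key space: 24 choices at each of 25 positions.
keyspace = len(ALPHABET) ** KEY_LENGTH
print(f"{keyspace:.1e}")  # 3.2e+34

# Hypothetical sample: six uniformly random strings, NOT real product keys.
rng = random.Random(2007)
samples = ["".join(rng.choice(ALPHABET) for _ in range(KEY_LENGTH))
           for _ in range(6)]

# "Simple character sorting": pool the characters and look at what turns up.
pooled = "".join(samples)
seen = "".join(sorted(set(pooled)))
counts = Counter(pooled)

# Chance that one particular character never appears in 150 random positions.
p_missing = (1 - 1 / len(ALPHABET)) ** len(pooled)

print(seen, len(seen))
print(counts.most_common(3))
print(f"{p_missing:.4f}")
```

With 150 pooled characters, any given character has well under a 1% chance of never appearing, which is why a handful of keys is enough to reveal the set.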

Joan Archer · Sep 16, 2007

I've never had any problems reading them, but the 24-character set still
includes the number 8 and the letter B, which people are warned not to
confuse because they look similar.
I would have thought they could have been left out so as not to cause
problems.
Joan

LAMP90 · Sep 16, 2007

Thanks, Frank. You saved me the trouble of answering this post myself.

LAMP90 · Sep 16, 2007

Thanks so much for your response, Alex. I do concur with your idea of
readability and also avoiding "offensive" words in any Latin-alphabet
language. As for its random distribution, cryptographic theory says that a
good cipher's output is supposed to have such a random distribution.

By the way, that is the reason it is a very bad idea to encrypt a stream
first and then try to compress it: good ciphertext looks random, and random
data does not compress. It is a very good idea to compress first and then
encrypt, as compression also removes redundancy from the plaintext prior to
encryption.
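The effect is easy to see with Python's standard zlib module. In this sketch, os.urandom output stands in for ciphertext (an assumption made to keep the example dependency-free; no actual encryption is performed).

```python
import os
import zlib

# Repetitive plaintext compresses well.
plaintext = b"product key alphabet " * 500

# Stand-in for ciphertext: random bytes are statistically similar to what a
# good cipher produces. No real encryption happens here.
pseudo_ciphertext = os.urandom(len(plaintext))

compressed_plain = zlib.compress(plaintext)
compressed_cipher = zlib.compress(pseudo_ciphertext)

print(len(plaintext), "->", len(compressed_plain))           # shrinks a lot
print(len(pseudo_ciphertext), "->", len(compressed_cipher))  # does not shrink
```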

To clarify for other participants, and as you yourself put it, my question
is actually related to security, but not in an obvious way.

Product keys are a form of access control: only those who are entitled are
supposed to get them, and they must otherwise be hard to "hack". And access
control is part of security, so out of all the themes in the existing
Microsoft newsgroups, this is the closest one I found where it would be
appropriate to ask my question.

Access control is not just about restricting those who are not entitled, but
also about ensuring access to those who are.

It would be nice if there were already some written guidelines on this
subject. I still cannot find anything about it.

In the engineering firm where I worked, we also had rules about not using
certain characters of the alphabet because of the possible confusion you so
aptly pointed out. Those rules were pretty well established back in the
'50s; it is just that nobody was able to tell me where they came from.

So, suppressing certain characters seems to come from guidelines or best
practices codified long before cryptography-based product keys started to be
used by Microsoft.

As to the purpose of my question, it is mostly general knowledge.

I am building my own product key generator for my personal internal use, to
have some fun researching public-key cryptography, and to allow multiple
(more than 2) recovery agents for the key pairs I generate (public-key
cryptography theory says it is possible). I don't care to find valid MS
product keys for any product (unlike the hacker who recently made news with
Windows Vista, whom you seem to implicitly refer to).

And, as you rightly pointed out, the letters and digits being suppressed
allow for better readability. I wanted my keys to also be at least as
user-friendly as Microsoft's, and did not feel like reinventing the wheel and
reinventing it square!
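A minimal sketch of the user-friendliness side of such a generator, assuming the 24-character set reported earlier in the thread and the familiar five-groups-of-five layout. The function names are hypothetical, and the keys are plain random characters with no signature or checksum, so this shows formatting and input checking only, not a licensing scheme.

```python
import secrets

ALPHABET = "2346789BCDFGHJKMPQRTVWXY"  # set reported earlier in the thread

def random_key(length=25, group=5, alphabet=ALPHABET):
    """Return random characters from `alphabet`, dash-separated in groups."""
    chars = [secrets.choice(alphabet) for _ in range(length)]
    groups = ["".join(chars[i:i + group]) for i in range(0, length, group)]
    return "-".join(groups)

def normalize(typed, alphabet=ALPHABET):
    """Uppercase typed input, strip separators, reject characters outside the set."""
    cleaned = typed.upper().replace("-", "").replace(" ", "")
    bad = sorted(set(cleaned) - set(alphabet))
    if bad:
        raise ValueError("characters not in key alphabet: " + "".join(bad))
    return cleaned

key = random_key()
print(key)
print(normalize(key.lower()))
```

Because the reduced set is known in advance, normalize() can reject a mistyped O, 0, or S immediately instead of letting the whole key fail later.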

On the other hand, neither Microsoft nor anyone else can claim patents or
any other intellectual property on this, because I do have access to prior
art (in the form of engineering specifications and best practices) that is
already in the public domain and that implicitly shows this readability
guideline.

Finally, I think I will stick to all digits and prune look-alike characters.
Will leave the vowels alone for now.

Again, thanks so much for your response.

P.S.: I thought MS would use a power-of-two character set for their product
keys, like 32 characters, since it maps much more easily into binary than 24
characters. I guess readability trumped convenience!
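To put rough numbers on this: a 24-character alphabet carries log2(24), or about 4.58, bits per character, so 25 characters hold about 114.6 bits, against exactly 125 bits for a 32-character alphabet. Mapping binary data onto base 24 only takes repeated division. The sketch below is a generic base conversion with hypothetical function names, assuming the alphabet reported in the thread; it makes no claim about how Microsoft actually encodes its keys.

```python
import math

ALPHABET_24 = "2346789BCDFGHJKMPQRTVWXY"  # set reported earlier in the thread
KEY_LENGTH = 25

bits_per_char = math.log2(len(ALPHABET_24))
total_bits = KEY_LENGTH * bits_per_char
print(f"{bits_per_char:.2f} bits/char, {total_bits:.1f} bits per key")

def encode(value, alphabet=ALPHABET_24, length=KEY_LENGTH):
    """Encode a non-negative integer as a fixed-length base-len(alphabet) string."""
    base = len(alphabet)
    if not 0 <= value < base ** length:
        raise ValueError("value does not fit in the key length")
    chars = []
    for _ in range(length):
        value, digit = divmod(value, base)
        chars.append(alphabet[digit])
    return "".join(reversed(chars))

def decode(text, alphabet=ALPHABET_24):
    """Inverse of encode()."""
    value = 0
    for ch in text:
        value = value * len(alphabet) + alphabet.index(ch)
    return value

# 14 bytes = 112 bits, which fits inside the roughly 114.6 bits available.
payload = int.from_bytes(b"fourteen bytes", "big")
key = encode(payload)
print(key)
print(decode(key) == payload)  # True
```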

Frank Saunders, MS-MVP OE/WM · Sep 16, 2007

"Paul Adare" . wrote in message
news:t8dwvnzgxmj1$.1rkavhn0awpjq.dlg@40tude.net...
> On Sun, 16 Sep 2007 06:11:01 -0500, Frank Saunders, MS-MVP OE/WM wrote:
>
>> "Paul Adare" . wrote in message
>> news:1pgg22gkvv1aa.1qfq6go3mrxq5.dlg@40tude.net...
>>> On Sun, 16 Sep 2007 02:02:00 -0700, LAMP90 wrote:
>>>
>>>> On Microsoft and most other 25 character-based product keys, I know not
>>>> all
>>>> 26 letters of the alphabet are used, but cannot find documentation as
>>>> to
>>>> which ones are suppressed. Suspect that I and O are out, as they look
>>>> much
>>>> like numbers 1 and 0, for instance.
>>>> Any pointers
>>>
>>> What possible use would this information be and how does it relate to
>>> security?
>>> I highly doubt that you're going to find any detailed documentation on
>>> Microsoft product keys for what should be fairly obvious reasons.
>>
>>
>> It would be useful because the Product Keys are often printed in a
>> hard-to-read font.
>
> I've done thousands of installs with PIDs and have never been blocked from
> completing an install because I couldn't decipher a PID.

Never had to go back and retype the ID? I have several times.

--
Frank Saunders, MS-MVP OE/WM
Do not send mail.

Ken Zhao [MSFT] · Sep 16, 2007

Hello,

Thanks, everyone, for sharing such great information.

Thanks & Regards,

Ken Zhao

Microsoft Online Support
Microsoft Global Technical Support Center

Get Secure! - www.microsoft.com/security <http://www.microsoft.com/security>
====================================================
When responding to posts, please "Reply to Group" via your newsreader so
that others may learn and benefit from your issue.
====================================================
This posting is provided "AS IS" with no warranties, and confers no rights.

Alex K. Angelopoulos (MVP) · Sep 17, 2007

"Joan Archer" <archer_joan@NOSPAM.com> wrote in message
news:OnShoEK%23HHA.4568@TK2MSFTNGP02.phx.gbl...
> I've never had any problems reading them but they leave in the 24 used set
> the number 8 and letter B which people are told not to confuse as they are
> similar.
> I would have thought they could have been left out so as not to cause
> problems.
> Joan
>

That's the one pair that bothered me the most - B/8 really are easy to
confuse if your eyes or the font are less than perfect!
I have noticed that they seem to use a font where the 8 looks like two
stacked squares, with the top one smaller than the bottom one, possibly to
help distinguish this pair.

Alex K. Angelopoulos (MVP) · Sep 17, 2007

"LAMP90" <lamp90@news.postalias> wrote in message
news:BBE84F69-152B-4B29-8230-2989E0DB3C81@microsoft.com...
> Thanks so much for your response, Alex. I do concur with your idea of
> readability and also avoiding "offensive" words in any Latin-alphabet
> language. As for its random distribution, cryptographic theory says that a
> good cipher's output is supposed to have such a random distribution.

Yes. My reason for explicitly mentioning that was to drive home the fact
that posting this information is going to have a negligible benefit for
someone trying to crack keys.

> It would be nice if there were already some written guidelines on this
> subject. I still cannot find anything about it.

I'm not too surprised about this, but that's because I spent some time
talking about this issue with a friend who went to school in human/computer
interactions. The ability to distinguish characters is a function of the
font used and of how it is rendered - on screen or in print. On top of that,
people from different regions also have certain expectations that influence
legibility, even if using the same character set. That makes it hard to come
up with any solid rules that don't change.

> In the engineering firm where I worked, we also had rules about not using
> certain characters of the alphabet because of the possible confusion you
> so aptly pointed out. Those rules were pretty well established back in the
> '50s; it is just that nobody was able to tell me where they came from.

I suspect they were 'common knowledge' developed over time and specifically
oriented towards the style of handwritten block characters used by
draftsmen. You've probably figured that out yourself. :)

> So, suppressing certain characters seems to come from guidelines or best
> practices codified long before cryptography-based product keys started to be
> used by Microsoft.

I hadn't thought about the historicity, but that makes sense as far as
choosing the particular sets to avoid (other than the vowels). The kicker is
the different typefaces issue.

> Finally, I think I will stick to all digits and prune look-alike
> characters.
> Will leave the vowels alone for now.

That theoretically means you could restore 1, 5, and maybe 0, and kill off
the letter B.

> P.S.: I thought MS would use a power-of-two character set for their
> product keys, like 32 characters, since it maps much more easily into
> binary than 24 characters. I guess readability trumped convenience!

I found that odd as well, but then I realized that it probably doesn't
matter much. We're not talking about a lot of data transmission and storage
here.

LAMP90 · Oct 7, 2007

You know, it occurs to me that both the font and the alphabet that MS
(Microsoft) uses in its product keys are optimized for OCR (Optical Character
Recognition). Each font has its own "error clusters": sets of characters that
tend to be confused with one another (more exactly, that have a higher
probability of OCR recognition error), like "O" and "0". Suppressing all but
one member of each error cluster from the alphabet would then lower the OCR
error rate.

Since these error clusters vary from one font to another, the optimal
alphabet for each font would vary.
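The pruning idea can be sketched in a few lines of Python. The clusters below are invented for one imaginary font (an assumption); real clusters would have to be measured for a specific typeface and OCR engine, and the function name is hypothetical.

```python
import string

# Hypothetical confusion clusters for one imaginary font.
ERROR_CLUSTERS = [
    {"0", "O", "Q", "D"},
    {"1", "I", "L"},
    {"5", "S"},
    {"2", "Z"},
    {"8", "B"},
]

def prune(candidates, clusters, prefer=string.digits):
    """Keep one member per cluster, preferring characters listed in `prefer`."""
    dropped = set()
    for cluster in clusters:
        # Preferred characters sort first; ties are broken alphabetically.
        members = sorted(cluster, key=lambda c: (c not in prefer, c))
        dropped.update(members[1:])  # keep members[0], drop the rest
    return "".join(c for c in candidates if c not in dropped)

alphabet = prune(string.digits + string.ascii_uppercase, ERROR_CLUSTERS)
print(alphabet, len(alphabet))  # 0123456789ACEFGHJKMNPRTUVWXY 28
```

Swapping in a different cluster list for a different font yields a different alphabet, which is the point made above. With these particular clusters the result keeps every digit and drops B, in line with the suggestion earlier in the thread.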
