The Kibitzer, Tim Harding. ChessCafe.com

This month, I want to talk about the PGN standard, which is widely used (and sometimes misused) on Internet chess sites. I apologise in advance to those readers who already know it all. I imagine that most readers are familiar with the basics of PGN, but for those who are not, I begin with a quick overview.

PGN stands for Portable Game Notation. As the name suggests, it is designed to facilitate moving chess game-scores between different computer programs, and between computer programs and human beings. How it works will be explained later in the article.

The acronym PGN may refer either to the particular way in which a chess game-score is formatted to be readable by computer, or to a computer file which conforms to that standard. The suffix “.pgn” is used to indicate to computer software that the file it is reading is (or should be) a file conforming to the standard.

Any .pgn file should be a standard text-file, containing one or more PGN-conformant game-scores, and must have the suffix .pgn instead of .txt. The contents of the file are just algebraic notation game-scores, but with both the moves and header information (players’ names etc.) specified in a particular way. As most players beyond the beginner stage are familiar with algebraic notation, this makes PGN fairly easy to learn.

When a file’s contents conform to the standard but its name does not, it may be sufficient to rename the file. However, a PGN file cannot contain any font formatting so a file of PGN games that is currently in .DOC or .RTF format should be resaved as plain text with the .pgn extension.

Computer software that is programmed to handle PGN files (and that is most chess software nowadays) will then be able to process the file, i.e. display the chess game graphically or process it in some other way. For example, the excellent Palview3 program from http://www.palamede.com converts PGN files to JavaScript-driven HTML pages that display the games graphically. Most, if not all, Java Game Viewers found on the Internet take as their input a PGN file and display the game(s) in it on a dynamically-created web page.

The situation nowadays is much better than it was in the early days of chess software for desktop PCs (i.e. less than 20 years ago). The first ChessBase came out in the mid-1980s, soon followed by NicBase, TascBase and Chess Assistant. There were also various “engine” programs that played chess, e.g. Fritz, Chess Genius, Hiarcs, Chessmaster and so on. However, in the early days it was very hard, if not impossible, to exchange game data and analysis between the various programs. Each software house used its own proprietary binary format. This kept file-sizes small, so that search operations (for example) would work quickly, but did not facilitate interchange of data.

PGN was created as a response to a demand among chessplayers and software-writers for a simple method, not tied to any commercial company, of exchanging chess games. As I understand it, the discussions evolved on the newsgroup rec.games.chess and the definition of the PGN standard was arrived at, but I am not sure exactly when.

If you do an Internet search for “PGN standard”, you will find various URLs where you can read and download the 1994 revision of the standard, which I believe is still currently valid. It is quite a long document, parts of which are now out of date, but the early sections defining the essentials of PGN are the important ones. You will probably also come across one or more pages making proposals to extend the format, a topic which I will come back to later in the article.

To save you searching, here is one URL where you can find the standard document: http://www.nmt.edu/~chess/doc/PGNStandard

However, a lot of this is technical and unless you intend to program PGN software, you don’t need to study it.

PGN has certainly been very successful. I see that attempts have been made to adapt it for other games, e.g. draughts and bridge, although I don’t know if these have been widely adopted. Bridge certainly presents more problems than chess as there are four players and each hand has a different starting position, but the proposed PBN (Portable Bridge Notation) standard seems to cope. There seems no reason in principle why any game that proceeds by discrete moves (as opposed to continuous play) could not be described by some kind of PGN formatting system.

A Simple Example

A PGN computer file consists of one or more game-scores conforming to the PGN standard. Each game is separated by a blank line. Within the file, each game is formatted as:

Header | Blank line | Game-score (in algebraic notation), and the game ends with the result specification: 1-0, 0-1, 1/2-1/2, or an asterisk (*) in the case of an incomplete or unfinished game or a game where the result is unknown.

The result given in the headers should of course match the result at the end of the moves, which also serves as a delimiter for the game. If the two results do not match, the processing of the file will depend on the software: it may crash, not crash but refuse to process the game, or it may take one result or the other as correct, or report no result.
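For readers who do program, the consistency check just described can be sketched in a few lines of Python. This is only an illustration under simple assumptions (one game per string, no text comments after the final result); the function names are invented for the example and do not come from the standard or from any chess program.

```python
import re

# The four result strings that may close a game score.
RESULTS = ("1-0", "0-1", "1/2-1/2", "*")

def header_result(game_text):
    """Return the value of the Result tag, or None if the tag is absent."""
    match = re.search(r'^\[Result\s+"([^"]*)"\]', game_text, re.MULTILINE)
    return match.group(1) if match else None

def terminator(game_text):
    """Return the last whitespace-separated token if it is a valid result."""
    tokens = game_text.split()
    return tokens[-1] if tokens and tokens[-1] in RESULTS else None

def results_agree(game_text):
    """True only when the header result exists and equals the closing result."""
    head, tail = header_result(game_text), terminator(game_text)
    return head is not None and head == tail
```

A program built this way would at least be able to warn the user about a mismatch rather than silently picking one of the two results.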

Here is the PGN for a short recent game, showing the minimum information required to conform to the strict (“export”) standard of the PGN format.

[Event "Corus"]
[Site "Wijk aan Zee"]
[Date "2003.01.18"]
[Round "6"]
[White "Ivanchuk, Vassily"]
[Black "Anand, Viswanathan"]
[Result "1/2-1/2"]

1. c4 Nf6 2. d4 e6 3. Nf3 b6 4. g3 Ba6 5. Nbd2 Bb4
6. a3 Bxd2+ 7. Nxd2 Bb7 8. Nf3 d5 9. cxd5 Bxd5 10. Bg2 O-O 11. O-O Nbd7
12. Bf4 c5 13. dxc5 1/2-1/2

The game score part of this is easy enough to grasp. Long algebraic notation specifying the start square of the moved piece (e.g. 1. c2-c4) is allowable but not required. However, note White’s 5th move. To avoid ambiguity, where two pieces of the same type can move to the same square, the initial file of the piece moved is given (Nbd2 not N1d2). If the file were the same then the rank would be given, i.e. for knights on b1 and b3, then it would be N1d2 or N3d2 as the case may be.

Note also the way that captures and checks are indicated. Capture moves are denoted by the lower case letter "x" immediately prior to the destination square; pawn captures include the file letter of the originating square of the capturing pawn immediately prior to the "x" character. Pawn promotions are indicated by the equal sign "=" immediately following the destination square with a promoted piece letter (indicating one of knight, bishop, rook, or queen) immediately following the equal sign.
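The rules in the last two paragraphs are regular enough to be written as a pattern. The Python sketch below checks only the shape of a single move in short algebraic form; it knows nothing about the position, so it cannot tell whether a move is legal or whether the disambiguation is actually needed. The names SAN_MOVE and is_san are invented for the illustration, and long algebraic moves such as c2-c4 are deliberately rejected.

```python
import re

SAN_MOVE = re.compile(
    r"""(?:
        O-O(?:-O)?                            # castling, written with capital letter O
      | [KQRBN][a-h]?[1-8]?x?[a-h][1-8]       # piece move, optional file/rank disambiguation
      | (?:[a-h]x)?[a-h][1-8](?:=[QRBN])?     # pawn move, pawn capture, promotion
    )[+\#]?                                   # optional check or mate sign
    """,
    re.VERBOSE,
)

def is_san(token):
    """True if the token has the shape of a short algebraic move."""
    return SAN_MOVE.fullmatch(token) is not None
```

With this pattern Nbd2, N1d2, Bxd2+, cxd5 and e8=Q are accepted, while cd5, 0-0 and Bb4xNc3 are not.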

The capital letter O, not the numeric digit zero (0) is used for castling — for reasons explained in the definition document — but most software accepts 0 also. This is because the PGN standard is specified in two levels of strictness: the PGN export standard (which programs are expected to meet) and the more lax import standard (to allow for human error).
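The difference between the two levels can be illustrated with castling: a tolerant reader accepts zeros on import and writes the capital letter O on export. The following Python sketch shows one way this might be done; the function name is hypothetical, and the pattern is written so that the results 1-0 and 0-1 are left untouched.

```python
import re

# "0-0" or "0-0-0" standing on its own, not as part of a longer token
ZERO_CASTLING = re.compile(r"(?<![\w/-])0-0(-0)?(?![\w/-])")

def normalise_castling(movetext):
    """Rewrite castling typed with zeros so that it uses capital letter O."""
    return ZERO_CASTLING.sub(
        lambda m: "O-O-O" if m.group(1) else "O-O",
        movetext,
    )
```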

This means that PGN files whose content fails in various minor ways to meet the export standard can still be correctly read by much software, although this does vary from program to program. It is obviously desirable, when creating PGN files, to meet the stricter criteria of the export format if possible in order to achieve complete portability without loss of data. Later in this article, I shall look at some common mistakes that can cause a PGN file to fail altogether or (worse) lead to incorrect data getting into databases.

The PGN Headers

The header section looks complicated at first glance. It consists of seven or more “Tags” (pieces of data about the game-score that follows), and these should be on separate lines.

The Ivanchuk-Anand example above shows the Seven Tag Roster, i.e. the seven header items that every PGN game should have; supplemental (optional) tags are also quite common; some defined by the standard and others introduced as extensions in recent years.

A common cause of a PGN game score failing is the absence of one or more tags of the Seven Tag Roster, usually the Round tag. Palview3, for example, cannot process a game where any of these is absent. If there is no data for a tag, a question mark should be placed between the quote marks.
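A program that writes PGN can guard against this by filling in the roster before saving. The Python sketch below assumes the tags are already held in a dictionary; the function names are invented, and the placeholder values for Date and Result follow the conventions used elsewhere in this article (question marks in the date, an asterisk for an unknown result).

```python
SEVEN_TAG_ROSTER = ("Event", "Site", "Date", "Round", "White", "Black", "Result")

# Placeholders for tags whose data must have a particular form.
PLACEHOLDERS = {"Date": "????.??.??", "Result": "*"}

def missing_roster_tags(tags):
    """List the roster tags that are absent, in roster order."""
    return [name for name in SEVEN_TAG_ROSTER if name not in tags]

def complete_roster(tags):
    """Return a copy with all seven roster tags first, then any optional tags."""
    completed = {
        name: tags.get(name, PLACEHOLDERS.get(name, "?"))
        for name in SEVEN_TAG_ROSTER
    }
    for name, value in tags.items():
        if name not in completed:
            completed[name] = value
    return completed
```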

The format for these header tags is:

Open square bracket| Tag name| space| Open double quote | data| End double quote| Close square bracket| New line.

A frequent cause of error in human-created PGN files is to begin a tag line with a space or indent. If the first character on the line is not an open square bracket, software will not read that line as a tag. Other common mistakes are to use the wrong kind of bracket, or to have no bracket at all, or to miss out the quote marks and use a colon (:) instead.
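The format given above translates almost directly into a pattern, which makes it clear why these mistakes are fatal to a strict reader. The Python sketch below is an illustration only; TAG_LINE and parse_tag_line are invented names, and real programs are often more forgiving than this.

```python
import re

# [ Tag name, one space, "data", ]  with nothing before or after
TAG_LINE = re.compile(r'\[([A-Za-z0-9_]+) "((?:[^"\\]|\\.)*)"\]')

def parse_tag_line(line):
    """Return (name, value) for a strictly formatted tag line, otherwise None."""
    match = TAG_LINE.fullmatch(line.rstrip("\r\n"))
    return (match.group(1), match.group(2)) if match else None
```

A line that starts with a space, uses round parentheses, or replaces the quote marks with a colon simply fails to match and returns None.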

The most commonly used optional tags are ECO (the opening code), Annotator, WhiteElo and BlackElo (for the ratings of the players) and EventDate (the start date of a tournament, which may differ from the compulsory Date information). There are many other possible ones, many of which are ignored by some software.

Some of the tags accept free-form data, i.e. any text which will not be read as a delimiter. Avoid bracket characters within the quotes, for example.

Other tags such as Result, Date and EventDate require appropriate data. Dates should be specified as in the Ivanchuk-Anand example (year, month and day separated by full stops), using question marks if the appropriate dates are unknown.
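The date format seen in the examples (four digits for the year, two each for month and day, with question marks for anything unknown) can be checked mechanically. The Python sketch below is a simple shape-and-range test with invented names; it does not check that the day exists in the given month.

```python
import re

PGN_DATE = re.compile(r"(\d{4}|\?{4})\.(\d{2}|\?{2})\.(\d{2}|\?{2})")

def is_pgn_date(value):
    """True for dates such as 2003.01.18, 1924.??.?? or ????.??.??"""
    match = PGN_DATE.fullmatch(value)
    if not match:
        return False
    _, month, day = match.groups()
    if month != "??" and not 1 <= int(month) <= 12:
        return False
    if day != "??" and not 1 <= int(day) <= 31:
        return False
    return True
```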

How It Works

Imagine for a moment that you are a computer program trying to make sense of a PGN file.

You have to perform a grammatical operation, called parsing, to make sense of what is essentially a string of unformatted characters.

Once you have done that, you “understand” what is in the PGN file and can do processing (e.g. look for games by Karpov in the file and generate a list of them). Or you can create output, which may be a graphic display of the game, a new PGN file that you can send to somebody else by email or on diskette, a printout of the game or (if you are Palview or a Java viewer program) an HTML page, or (if you are a program like ChessBase) a record of the game in your own binary format.

PGN files may contain text that is not part of a chess game. For example, somebody may send me an email that includes one or more PGN scores. In my email program, I save it as a text file with a .pgn suffix. Then I open that PGN file in ChessBase.

ChessBase ignores all the non-chess headers and text like “Dear Tim, Here is my latest brilliant loss...”. Eventually it either encounters the end of file, in which case it has no games to display, or it finds a line beginning with [Event "

If it finds such a line, then it reads all the header tags until the last square bracket is followed by moves instead of a new open bracket. Then it proceeds to read the moves and see if they are legal. If the game score is complete, with no ambiguities or illegal moves, then it displays the game correctly. If there are mistakes in the game score, it does its best. (See below for examples of what can go wrong.) Then it moves on to see if there are any more games in the file and repeats the process until it reaches the end of file.
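The scanning process just described can be imitated in a few lines of Python. This is a rough sketch of the idea and certainly not how ChessBase itself is written: it skips everything before the first line that begins with [Event ", then starts a new game each time another such line appears. The function name is invented, and any chatter that follows the last move of a game would be swept up with that game.

```python
def extract_games(text):
    """Split raw text (e.g. a saved email) into chunks, one per PGN game."""
    games = []
    current = None  # lines of the game being collected, or None before the first game
    for line in text.splitlines():
        if line.startswith('[Event "'):
            if current is not None:
                games.append("\n".join(current).strip())
            current = [line]
        elif current is not None:
            current.append(line)
        # lines seen before the first [Event " line are ignored
    if current is not None:
        games.append("\n".join(current).strip())
    return games
```

Checking the moves for legality, which is the hard part, would be the next step and needs a full chess move generator.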

The software program cannot exhibit any intelligence. It just follows the rules specified by its programmer who (we hope) has studied and implemented the PGN standard correctly. A good programmer will have made the program as fault-tolerant as possible, e.g. ChessBase can sometimes cope even if only a few tags of the Seven Tag Roster are present, and if there are mistakes in the algebraic it will usually display the wrong moves as text notes so you can see that something has gone wrong.

If you look at the game-list of a PGN file in ChessBase 8, it won’t tell you if the games have notes or not. So if you import a PGN file into your database, look afterwards in the ChessBase (or equivalent) game list to see if there are notes where you don’t expect them, or if games are shown with no result that should have a result, or whether there are games with fewer moves indicated than you would expect. These are all symptoms that some truncation of the game has occurred, due to the database program encountering an illegal or ambiguous move or some other kind of formatting error.

Advantages and Limitations of PGN

A consequence of the fact that PGN files are basically text is that they can be opened and worked on not just in chess programs but also in word processor and text editor programs. This is a major advantage of PGN, especially for game archivists. This possibility is particularly useful when you want to do some “search-and-replace” operations to standardize header information, or perhaps you receive a file in Spanish or German algebraic and need to change the piece letters to K, Q, B, R and N.

However, it must be done with great care because you can easily mess up Tag names and Player/Event names when doing this kind of operation! Since it is not hard to make a fatal error when doing over-ambitious PGN editing, always make a back-up first!
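A small script can make this kind of replacement safer than a blind search-and-replace in a word processor, because it can be told to leave the tag lines alone. The Python sketch below converts German piece letters (K, D, T, L, S) to English; it assumes bare game scores without text comments, since words inside a comment would be altered too. All names in it are invented for the illustration.

```python
import re

# German Dame, Turm, Laeufer, Springer -> English Q, R, B, N (K is the same in both)
GERMAN_TO_ENGLISH = str.maketrans({"D": "Q", "T": "R", "L": "B", "S": "N"})

# A piece letter that opens a move, or one that follows "=" in a promotion.
PIECE_LETTER = re.compile(r"(?<![A-Za-z])[KDTLS](?=[a-h1-8x])|(?<==)[DTLS]")

def translate_movetext(movetext):
    """Replace German piece letters in a line of moves."""
    return PIECE_LETTER.sub(
        lambda m: m.group(0).translate(GERMAN_TO_ENGLISH), movetext
    )

def translate_game(game_text):
    """Translate the move lines only; lines that start with '[' are kept as they are."""
    out = []
    for line in game_text.splitlines():
        if line.lstrip().startswith("["):
            out.append(line)
        else:
            out.append(translate_movetext(line))
    return "\n".join(out)
```

Because the header lines are skipped, a player called Schmidt or a site called Dresden comes through unchanged, which is exactly the accident a global replace of S or D would cause.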

An algebraic game score can be converted to PGN by cutting-and-pasting in a pro-forma set of headers above the moves, filling in the fields and then saving as PGN. This is fairly quick to do if there are no notes. ChessBase 8 users however can just cut-and-paste the moves from an email or website into an empty game window and then save, entering the name information; this is usually quicker than converting to PGN.

Another advantage of PGN is that there is no limit to the length of names. The International Correspondence Chess Federation (ICCF) uses PGN for its Online Game Archive partly for this reason. Long names, including country designations, can be stored in full. Thus the 12th CC World Champion Sanakoev has the first name and patronymic Grigory Konstantinovich. But in ChessBase 8 (and earlier versions) there is no room to write Grigory Konstantinovich in full. If you import one of his games from PGN into ChessBase, his second name gets truncated. A removal of this limitation is badly needed: ChessBase programmers, please take note!

One of the limitations of PGN was just referred to. It expects English algebraic, which may not suit people in other countries. It is possible that some people have created, for example, Spanish-, German-, Italian- or French-language PGN reader software that expects the appropriate piece initials instead, but the standard specifies English, and portability (the essence of PGN) is lost if people use systems other than K, Q, R, B, N.

If we replace the names of Anand and Ivanchuk in the file above by hyphenated names, what happens?

[Event "Fictitious game"]
[Site "Moscow"]
[Date "1924.??.??"]
[Round "?"]
[White "Ilin-Genevsky, Alexander"]
[Black "Znosko-Borovsky"]
[Result "1/2-1/2"]

1. c4 Nf6 2. d4 e6 3. Nf3 b6 4. g3 Ba6 5. Nbd2 Bb4 6. a3
Bxd2+ 7. Nxd2 Bb7 8. Nf3 d5 9. cxd5 Bxd5 10. Bg2 O-O
11. O-O Nbd7 12. Bf4 c5 13. dxc5 1/2-1/2

This works OK in ChessBase, but when I receive PGN files that have been created by other programs (e.g. NicBase) and converted by non-PGN utilities, I have found cases where names have been corrupted, e.g. the hyphen in White’s surname has been treated as the separator between the names of the players and you end up with a game Ilin versus Genevsky. Doing format conversion through PGN instead of directly should avoid this problem, but I still try to avoid hyphens in my databases (e.g. my new MegaCorr3) to reduce the risk of this kind of accident. Conversion can also lead to corruption of names involving umlauts and other accents, if the creator of the file and the recipient have different languages and use different code pages on their computers.
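The corruption of accented names is easy to reproduce. The Python sketch below writes a name under one code page and reads the same bytes back under another; the two code pages chosen (cp1252, common on Western European Windows, and cp437, the original IBM PC set) are assumptions picked for the demonstration, and the function name is invented.

```python
def reread(text, written_as, read_as):
    """Encode text under one code page and decode the bytes under another."""
    return text.encode(written_as).decode(read_as, errors="replace")

name = "Müller"
same = reread(name, "cp1252", "cp1252")     # sender and recipient agree
garbled = reread(name, "cp1252", "cp437")   # recipient assumes a different code page
plain = reread("Ilin-Genevsky", "cp1252", "cp437")  # unaccented letters survive
```

Only the characters outside the basic English alphabet are affected, which is why the problem often goes unnoticed until a name with an umlaut or accent turns up in the database.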

The final main limitation of PGN is the file sizes. Because they are basically text files, they are unwieldy when they contain large numbers of games. A PGN file with 8,000 games (the largest size that the free ChessBase Light program will accept) can easily weigh in at 5 Megabytes, and more if a lot of the games include notes. So large PGN files are less portable and they are also slow to work with. ChessBase, for example, allows some operations with PGN files but the full range of operations is only possible when they are converted to the program’s own binary format. Some software may not be able to handle large PGN files at all. So while PGN is all right for a relatively small archive such as ICCF’s, it is impracticable for databases with many tens of thousands, or hundreds of thousands, of games.

Annotations in PGN

Annotations in PGN are tricky to implement “by hand”.

The standard permits three types of note to be added to chess moves: symbolic notes, text notes and recursive variations.

You cannot have a question mark or exclamation mark after a move in strict PGN. Symbolic notes are handled by Numeric Annotation Glyphs (NAGs) which consist of the dollar sign followed by a number which refers to the symbol in question. So 1. e2-e4 $1 gives an exclamation mark, 1 ... b7-b5 $2 awards Black’s move a question mark. All the common annotation symbols for moves and positions, and a lot of uncommon ones, have a corresponding NAG. See the definition document for 139 specified NAGs!
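Converting the traditional marks to NAGs is a mechanical job. The Python sketch below handles the six most common move symbols, which the standard numbers $1 to $6; it assumes a score without text comments, because an exclamation mark at the end of a word inside a comment would be converted as well. The names are invented for the illustration.

```python
import re

SUFFIX_TO_NAG = {
    "!": "$1", "?": "$2", "!!": "$3", "??": "$4", "!?": "$5", "?!": "$6",
}

# a move token followed directly by one or two ! or ? characters
ANNOTATED = re.compile(r"(\S+?)([!?]{1,2})(?=\s|$)")

def suffixes_to_nags(movetext):
    """Replace !, ?, !!, ??, !? and ?! after a move by the matching NAG."""
    def replace(match):
        nag = SUFFIX_TO_NAG.get(match.group(2))
        return f"{match.group(1)} {nag}" if nag else match.group(0)
    return ANNOTATED.sub(replace, movetext)
```

Note that the NAG is written after a space, since it must be separated from the move it refers to.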

The second type of annotation is text comments. Text notes are placed after a move within curly brackets, thus:

1 ... b5 $2 {John mixed this game up with another one where his opponent had opened 1 d4.} 2 Bxb5 {Naturally I grabbed the pawn.}

Finally, move sequences (variations) must be enclosed in rounded parentheses, which can themselves contain the other types of note, thus:

1. e4 b5 (1 ... e6 $1 {is of course better.})

It is important to nest notes correctly. The round parenthesis for the variation begins, then after the NAG comes the text comment. Then the text comment is closed before the variation is closed.
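Whether the brackets in a hand-made score are correctly nested can be checked before the file ever reaches a chess program. The following Python sketch counts open variations and tracks whether a text comment is open; it treats everything inside curly brackets as plain text, on the understanding that comments do not nest. The function name is invented, and the check says nothing about whether the moves themselves are legal.

```python
def check_nesting(movetext):
    """Return None if comments and variations are well formed, else a message."""
    depth = 0           # number of variations currently open
    in_comment = False  # True while inside a { ... } text comment
    for pos, ch in enumerate(movetext):
        if in_comment:
            if ch == "}":
                in_comment = False
        elif ch == "{":
            in_comment = True
        elif ch == "}":
            return f"unexpected closing curly bracket at position {pos}"
        elif ch == "(":
            depth += 1
        elif ch == ")":
            if depth == 0:
                return f"unexpected closing parenthesis at position {pos}"
            depth -= 1
    if in_comment:
        return "text comment not closed"
    if depth:
        return f"{depth} variation(s) not closed"
    return None
```

If the last two characters of the example above are swapped, so that the variation appears to close before the comment does, the check reports that a variation has been left open.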

We had better read what the definition document has to say about this important part of PGN:

An RAV (Recursive Annotation Variation) is a sequence of movetext containing one or more moves enclosed in parentheses. An RAV is used to represent an alternative variation. The alternate move sequence given by an RAV is one that may be legally played by first unplaying the move that appears immediately prior to the RAV. Because the RAV is a recursive construct, it may be nested.

Apart from the fact that it should be “A RAV” not “An RAV”, that explains it fairly well. Simple notes like the one above are easy to handle, because there is only one level of variation. However, when the nesting becomes complex, and/or variations as well as main moves also have text comments and symbolic glyphs, it can be hard to put all the right types of bracket and parenthesis in the right place.

Some software may find it hard to handle nested variations too. Palview3 was the first freely-available PGN software, that I am aware of, that could create HTML pages with variations, and it is restricted to two levels of notes (three levels counting the actual moves of the game). For most purposes, this is sufficient. Most Java Viewers can only show the actually played moves, except as text notes, but playing through a Palview3 web page is close to the experience of playing it through in a program like ChessBase: a major advance for chess webmasters.

Unfortunately it is the sad lot of a chess writer/editor/archivist to have to turn into PGN a game that is found as text. Not infrequently does the harassed chess author receive from a player (or find on a website) an annotated game that he would like to have in his database/publication/website. Alas, it is not supplied in ChessBase or PGN format, but rather is a long text file with complex notes, which are too detailed just to re-key into a database program. Unfortunately it can take almost as long to turn such a file into valid PGN that ChessBase can read correctly as it would to enter the game ‘by hand’.

There is only one worse thing: that is to receive in the post a game with complex notes on paper, and see the dreaded words “ChessBase printout” or “Fritz printout” on top. If only the kind contributor had thought to send the Fritz or ChessBase electronic version by email or diskette, I lament as I reach for the whiskey bottle...

Creating a complex PGN game record with many recursive variations is really a task for a computer. ChessBase 8 can turn the most complex labyrinth of variations and comments into a correct PGN output in milliseconds. A human being should normally not attempt to do it.

It is usually simpler to make a copy of the file, strip out all the notes and then cut-and-paste the bare game score into ChessBase. Then add the most important notes by hand.

Typical Errors in PGN

There follows an example of practically everything that can — and does — go wrong with PGN files, especially when players who do not understand what is required make the attempt to create PGN. The following is based on a game report sent in by an ICCF Email Champions League player to his team captain recently, and forwarded to me.

Names have been changed because I have introduced a few extra errors for demonstration purposes, but I can assure you that most of the mistakes were present in the original! The others that I have added are also extremely common.

So this is a particularly good example of pseudo-PGN. See how many mistakes you can find!

 [Event "EM/CL/Q19-2"]
[White "Silva, ABC (BRA)"]
[Black "Player, Riccardo (ITA)"]
(Result "1/2 - ½)
1.d2-d4 ; Ng8-f6 2.c2-c4; e7-e6 3.Nb1c3; Bf8-b4 4.e2-e3; 0-0
5.B1-d3;d7-d5 6.cd5, ed 7.Ng1-f3; Tf8-e8.0-0 ; Nb8-d7
9.Qd1-b3; Bb4xNc3 10.b2xc3;Cd7-b6 11.a2-a4; a7-a5
12.Nf3-e5; Nf6-g4 13.Ne5xg4;Bc8xg4 14.f2f3; Bg4-e6
15.Qb3-c2; Qd8-h4 16.e3-e4; Be6-d7 17.Bc1-e3; Te8-e7
18.Rf1-b1; Bd7-c6$1 9.Be3-f2;Qh4-g5 20.e4xd5; Nb6xd5
21.Bd3xh7+;Kg8-h8 22.h2-h4; Qg5-f4 23.Rb1-e1;Ra8-e8
24.Re1xe7; Re8xRe7 25.Bh7-e4;Nd5-e3 26.Qc2-e2;Ne3-g4
27.g2-g3; Qf4-d6 28.Qe2-d3 1-0

The following things are wrong with this score, which was evidently created “by hand” and not by software:

  • There is a space at the start of the Event tag.
  • The Site, Round and Date tags are all missing.
  • The Result tag is in round parentheses instead of square brackets.
  • The Result tag is missing its closing quote marks.
  • For draws, half characters are not allowed; 1/2 must be used.
  • There should not be any spaces in 1/2 - 1/2 (or other results), either in the Result field or at the end of the score.
  • The result of the game differs in the header and the game score.
  • There should be a blank line between the headers and the game score.
  • English initials should be used and certainly other language initials should not be mixed in (e.g. Black’s 7th, 10th and 17th moves).
  • All White moves must be numbered. Thus in “Tf8-e8.0-0”, the number 8 is doing double duty as the destination square (e8) for the rook move and the move number for White’s reply. No software can cope with this.
  • Semicolons should not be used.
  • Other punctuation should not be used in the gamescore section either (just full stops after White move numbers, though most software tolerates the omission of these.)
  • All moves should be separated by spaces.
  • Where long algebraic is used, the hyphen is needed: 14.f2f3 should have been 14. f2-f3.
  • Redundant or incorrect characters cause problems; in this case 5.B1-d3 instead of 5. Bd3 causes ChessBase to go wrong after Black’s 4th move.
  • In the case of a capture, the captured piece should not be identified, e.g. Re8xRe7 at Black’s 24th. This should be just Re8xe7 or Rxe7.
  • The short form of pawn captures (see White and Black’s 6th) cannot be read by software either. These should read 6. cxd5 exd5.
  • NAGs must be separated by spaces from the move they refer to, so Black’s 18th move “Bd7-c6$1” should read “Bd7-c6 $1”.

I think that is a very impressive collection of mistakes — especially when there are no real notes in the game at all! You can imagine what chaos would result if the creator of that PGN had tried to include variations and text comments too!
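Several of the mistakes in the list above can be detected automatically before a file is sent to anybody. The Python sketch below is a very rough “lint” for a single game without text comments; it covers only a handful of the points listed, all its names and messages are invented for the illustration, and it makes no attempt to check that the moves are legal.

```python
import re

ROSTER = ("Event", "Site", "Date", "Round", "White", "Black", "Result")
RESULTS = {"1-0", "0-1", "1/2-1/2", "*"}
STRICT_TAG = re.compile(r'\[(\w+) "([^"]*)"\]')

def lint_pgn(text):
    """Return a list of complaints about one would-be PGN game."""
    problems, tags = [], {}
    lines = text.splitlines()
    # the moves are taken to start at the first line that opens with a move number
    start = next(
        (i for i, ln in enumerate(lines) if re.match(r"\s*\d+\s*\.", ln)),
        len(lines),
    )
    for line in lines[:start]:
        if not line.strip():
            continue
        match = STRICT_TAG.fullmatch(line)
        if match:
            tags[match.group(1)] = match.group(2)
        else:
            problems.append(f"malformed tag line: {line!r}")
    for name in ROSTER:
        if name not in tags:
            problems.append(f"missing or unreadable tag: {name}")
    if 0 < start < len(lines) and lines[start - 1].strip():
        problems.append("no blank line between headers and moves")
    movetext = " ".join(lines[start:])
    if ";" in movetext or "," in movetext:
        problems.append("stray punctuation in the moves")
    if re.search(r"(?<![\w/-])0-0", movetext):
        problems.append("castling written with zeros")
    if re.search(r"x[KQRBN][a-h][1-8]", movetext):
        problems.append("captured piece named in a capture")
    if re.search(r"\S\$\d", movetext):
        problems.append("NAG not separated from its move")
    tokens = movetext.split()
    final = tokens[-1] if tokens else None
    if final not in RESULTS:
        problems.append("moves do not end with a valid result")
    elif "Result" in tags and tags["Result"] != final:
        problems.append("result in header and after moves disagree")
    return problems
```

Run on the score above, a checker of this kind would complain about the two malformed tag lines, the missing tags, the missing blank line, the semicolons, the zeros used for castling, the named captured piece and the NAG glued to its move. An empty list means only that none of these particular faults was found.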

Can We Extend/Improve/Replace PGN?

Different types of user require different things from their chess software. They may require different optional PGN tags or use them differently. Some commercial programs support more of the optional tags than others, or have different implementations.

For example, game archivists want to be able to store as much information as possible about a game. Especially, they would like to have the full names of the players and to have all games played in the one event together when the database is sorted.

People who work, as I do, primarily with correspondence games may see different issues from people who work primarily with games played face-to-face in OTB tournaments, or Internet events. Some PGN users are involved in the presentation of games on the Internet in real time. They naturally want different things from what I want. This use of PGN was not foreseen in 1994 when the standard was last revised.

Some people think it is important to keep time control information, or want a separate field for the country or team of the player (though much software doesn’t support this yet). When a game gets converted from ChessBase (or similar program) to PGN, some of this background information gets lost, and when a very detailed set of nonstandard PGN headers comes into ChessBase, some of that information might not be read.

One example of what I mean is the EventDate optional tag where one can specify the date that play begins in a match or tournament which consists of more than one game per player. I have noted that some advocates of changing the PGN standard do not see the need for a separate EventDate field, so I shall explain the point of it in some detail.

This can be useful for both correspondence and other types of event. ChessBase 8, for example, when sorting by date looks at this field first. You can enter it directly in ChessBase when saving (or replacing) a game by opening the Details window; the date set here corresponds to EventDate in PGN, whereas the date in the main “save” window corresponds to the normal “Date” field, so the two dates can be different.

Sets of games with the same event title and same event date are kept together, but if the EventDate is absent then the sorting is done by the main date field. My normal practice with correspondence games (which can take months or years to complete) is to set the event date (in full or at least the year) to when the game started and the main date to the same year. Other archivists prefer to put the date the game is finished (or result reported) into the main date field, which works OK for ChessBase but in other software can lead to tournaments becoming fragmented in a database.

Even with OTB tournaments, typically played over a weekend, week or fortnight, the separate EventDate is valuable. You can understand that on any particular weekend, numerous tournaments are being started around the world and thousands of games are being played. If there were no EventDate to keep the games of an event together, but the actual playing date of a game was entered in full, then large databases would soon become badly disorganized. Under Saturday February 1, 2003 all the games played at Wijk aan Zee would be together with all games played on the same date in other events in Germany, USA, England etc. etc. Then the games of the next round in Wijk aan Zee would follow, separated from the previous round by hundreds of unrelated games.

So I say, please keep the EventDate field and use it as intended!
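The effect of sorting on EventDate can be shown with a small Python sketch. It assumes each game’s tags are held in a dictionary and builds a sort key from the event date (falling back on the main date when EventDate is absent), the event name, the round and the playing date. This is an illustration of the principle, not a description of how ChessBase or any other program actually sorts; the function name is invented.

```python
UNKNOWN_DATE = "????.??.??"

def event_sort_key(tags):
    """Key that keeps the games of one event together, in round order."""
    date = tags.get("Date", UNKNOWN_DATE)
    event_date = tags.get("EventDate") or date
    round_text = tags.get("Round", "?")
    round_number = int(round_text) if round_text.isdigit() else 0
    return (event_date, tags.get("Event", "?"), round_number, date)
```

Sorting a list of games with sorted(games, key=event_sort_key) keeps each tournament in one block, whereas sorting on the Date tag alone interleaves all the events that happened to be in progress on the same day. Because the question mark sorts after the digits, games with wholly unknown dates end up at the bottom of the list.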

If you are interested in possible extensions/improvements to the PGN standard (which, after all, has not been revised for almost a decade), you may like to go to the following URL.

There you can read a document from 2001 entitled “An Extension to the Pgn Standard: Portable Game Notation Specification and Implementation Guide (Supplement...Final draft)”.

The authors of this document are named as: Alan Cowderoy (Palamede), Ben Bulsink (DGT Projects), Andrew Templeton (Palamede/Palview), Eric Bentzen (Enpassant.dk, Palamede), Mathias Feist (Chessbase), Victor Zakharov (Chess Assistant). To make comments, please contact alan@cowderoy.com with a copy to me c/o ChessCafe.com.

The above document, and Mr Cowderoy’s website, also mention that various attempts have been made with XML to extend or replace the PGN standard, but so far nobody has shown much interest in those. Extending/improving PGN does seem to be the way to go, but it has to be done carefully in order not to “break” current software that relies on implementing the 1994 standard. Since Steven Edwards and the other people responsible for drafting the standard originally may no longer be able to help with this work, I thought it would be a good idea to mention this debate here. However, I am not a programmer so won’t comment on technical issues.

Footnote

This column, the 84th edition of The Kibitzer, means that the series has now been running for seven years. In that time I have dealt with a wide variety of topics: opening theory (especially 1 e4 e5 openings), chess history and politics, correspondence chess and chess on the Net, some chess personalities and a few other matters besides.

In the beginning, GM Hans Ree and I were the only monthly columnists at ChessCafe.com but the site has grown and succeeded so well that there are now many columnists, covering a wide variety of subjects, and it is getting harder and harder to find a subject that has not been covered by somebody else already! I also detect a tendency for the columns to get longer and longer, which is perhaps not altogether desirable.

Nevertheless, The Kibitzer is not signing off just yet. I am always willing to listen to suggestions for future columns from readers, so if you have an idea, by all means please send it in. I actually do have in hand one good suggestion from Gerald Grimsley PhD to write on the Sokolsky Opening (1 b4) and I do intend to cover that later in the year. So if you have played some interesting 1 b4 games, please send them in soon.

However, I don’t want The Kibitzer to be just an openings column, so I am looking for other kinds of topic too.

Copyright 2003 Tim Harding. All rights reserved.