This page is no longer maintained: I no longer have access to real DOS or Win9x.

The information here is obsolete - don't even try to apply it to Win9x or NT series batch files.

Condensed Code

This is not a "How To" - I don't have enough use for condensed code to have developed a personal style, and after all, this book is really about my personal approach to batch programming. Rather than a tutorial, this is a bit of history and comment.

In the old days - the days of floppies, small slow hard drives, enhancement rather than loss of functionality with each new OS version, and slow CPU chips - techniques were devised to improve the extremely slow execution speed of batch programs. Most of the techniques focused on combining commands on the same line and on other ways of reducing disk activity. Modern equipment and (multiple, incompatible) operating systems that use disk caching have removed most of the incentive for compressing code beyond removing inactive lines. In fact, a high percentage of the tricks once used are now liabilities, because code that relies on them is needlessly broken under one or more of the current operating systems.

Condensing code was made desirable and possible because of the way COMMAND.COM reads batch files and parses what it has read. The read process is an open-read-close cycle repeated for each line in the file. Without disk caching, this can be very slow - with disk caching, it can be quite fast. The parsing routines had/have so many bugs and oddities that most versions allow using the warts and bugs to put multiple commands on the same line. Most of these are undocumented and are therefore unreliable across versions.

Recently, I responded to a question about condensing code with this message (note that there are a few Win9x- and NT-only features, particularly the recommendation to use EXIT - the user was asking about techniques in Win95):

First, a few thoughts about condensing code - they are almost certain to spark some sort of religious objections from some people, but you did ask for theory.

There are several styles used to compress the most functionality into the fewest lines - few of them work in all current "DOS" versions. In the old days of floppy-based PCs and small but slow HDDs, condensing code bought considerable improvements in speed and was often well worth the effort. Now, in the days of fast HDDs and automatic disk caching, there is little or no gain in performance, but the cost in difficulty of writing the code (and of understanding it once it gets cold) remains the same. On modern computers, condensed code is often written more as a hobby than as a productivity enhancement (the milliseconds saved are often overwhelmed by the minutes or hours required to make highly condensed code work right - if it takes you an hour to write a condensed program that takes ten minutes to write in simple form, and the condensation reduces the execution time by one second, the program has to be used 3000 times to break even). In my work, I simply can't spare the time to condense much beyond the point where the condensation reflects the logical structure of the program. Performance seldom matters to me because I use multitasking operating systems and run big tasks on a machine other than my primary desktop system, or because the performance hit is spread over many different users.
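Spelled out, the 3000-run break-even figure is the extra writing time divided by the time saved on each run. The symbols t (writing time) and Δt (time saved per run) are introduced here only for illustration:

$$
n_{\text{break-even}} = \frac{t_{\text{condensed}} - t_{\text{simple}}}{\Delta t} = \frac{(60 - 10)\,\text{min} \times 60\,\text{s/min}}{1\,\text{s per run}} = \frac{3000\,\text{s}}{1\,\text{s per run}} = 3000\ \text{runs}
$$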

The approaches to condensation usually involve one or more of these techniques: using FOR to implement repetitive commands, combining multiple commands on the same line, using recursion to reduce the number of files required by placing subroutines in the same file as the main program (usually combined with FOR). Less often used techniques involve what amounts to self modifying code and using all the possible values of a variable as labels. There are many other techniques that can be used to tweak the program to remove a few more bytes.

The usual object of the exercise is to reduce the number of disk accesses: COMMAND.COM reads the file one line at a time (in most cases) and closes the file between reads. If the program and its data files all fit in the cache, little is gained by reducing the number of lines, and next to nothing by small changes in the number of bytes. If they do not all fit, then condensing the program enough to make it fit in the cache does save at least one disk access. Beyond that, assuming a modern, fast processor, what remains to be saved is microseconds, which seldom matter very much; they can be squeezed out by taking advantage of details in the way the command interpreter parses lines, but only at the expense of portability to other command interpreters and other "DOS" versions.

Using my preferred style, your example would be something along the lines of this:

 @echo off
 if %1!==}{! goto %2
 for %%a in (10 11 12 13 14 15 16) do call %0 }{ pass2 %%a
 goto end
 :pass2
 find /c "h%3" }a{ > nul
 if errorlevel 1 goto end
   rem code to process the case where the string is found
 exit
 :end
Explanation:

The program is recursive: for each item in the FOR list, the program (the command that launched it is %0) is reinvoked with a recursion marker argument ( "}{" ), the name of the subroutine ( "pass2" ) that does the processing, and the value to be processed. There is a performance hit with recursion that can be significant if the file is large since the entire file will have to be read for each pass - if it is in cache, this is usually minor, but it can be avoided by placing the subroutine in a separate file, providing that the second file will also remain in cache.

FOR is used to manage the list of items - just the variable part is used in the list. The program calls itself once for each item in the list so in effect, it calls a subroutine to process the variable.

In the subroutine, the only things I have compressed are the amount of output written to STDOUT, and therefore the number of bytes redirected to the bit bucket (the "/c" switch makes FIND print only a count of matching lines), and the ERRORLEVEL test (IF ERRORLEVEL n is true whenever the errorlevel is n or higher, so IF ERRORLEVEL 0 is always true and there is no point in testing for it).

The EXIT command in the subroutine terminates the batch program after processing the first match. It doesn't have to be in exactly that place; it just has to be executed before the :end label is reached, that is, at the end of the code that runs when a match is found but before the end of the file. If it's not present, the program will continue to process the remaining items until the FOR command completes and the following GOTO end is encountered.

That approach to condensation is the one that emphasizes logical structure, portability, clarity, and small file size rather than maximum execution speed. Condensation for flat-out speed would necessarily sacrifice all of those, because it would require unrolling the loop into what amounts to a copy of the subroutine code for each test, and it would rely on version-specific parsing behavior. What you had, after removal of the unnecessary test for ERRORLEVEL 0, would be pretty close to the fastest code if we assume that disk caching is in effect, but it's OS specific and somewhat difficult to read.
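For contrast, an unrolled version of the example might look something like the following sketch. It is an illustration, not code from the original exchange: it repeats the FIND test once for each value (h10 through h16, against the same }a{ argument) instead of looping, and it shows only the unrolling, not any version-specific parsing tricks. Because control simply falls through to :end after the found-case code, it needs no EXIT and no recursion, at the cost of a longer file.

 @echo off
 rem Unrolled: one FIND test per value, no FOR and no CALL
 find /c "h10" }a{ > nul
 if not errorlevel 1 goto found
 find /c "h11" }a{ > nul
 if not errorlevel 1 goto found
 find /c "h12" }a{ > nul
 if not errorlevel 1 goto found
 find /c "h13" }a{ > nul
 if not errorlevel 1 goto found
 find /c "h14" }a{ > nul
 if not errorlevel 1 goto found
 find /c "h15" }a{ > nul
 if not errorlevel 1 goto found
 find /c "h16" }a{ > nul
 if not errorlevel 1 goto found
 goto end
 :found
   rem code to process the case where the string is found
 :end

Each "if not errorlevel 1" is true only when FIND returned errorlevel 0, that is, when the string was found, so the first match jumps straight to the processing code.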

Extending that FOR to the complete list would also likely make it OS specific, but that can be handled by making that FOR (using numbers 0-9) part of a higher level subroutine that works on a list of first digits, and combining them as %3%%a in the second FOR.
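A minimal sketch of that two-level arrangement follows. The first-digit list (1 2) is hypothetical, since the complete list was not given, and the label pass1 is likewise a name chosen for illustration; pass2 is unchanged from the example. The pass1 subroutine receives a first digit as its third argument, and because %3 is expanded when the FOR line is read, each pass of its FOR calls pass2 with that digit followed by 0 through 9 (10, 11, and so on up to 29 here). Like the original, it relies on EXIT to stop after the first match.

 @echo off
 if %1!==}{! goto %2
 rem Hypothetical list of first digits, substitute the real range
 for %%a in (1 2) do call %0 }{ pass1 %%a
 goto end
 :pass1
 rem The third argument is the first digit, each second digit is appended to it
 for %%a in (0 1 2 3 4 5 6 7 8 9) do call %0 }{ pass2 %3%%a
 goto end
 :pass2
 find /c "h%3" }a{ > nul
 if errorlevel 1 goto end
   rem code to process the case where the string is found
 exit
 :end

Keeping each FOR list short also keeps each line well within the length that COMMAND.COM will accept, which a single FOR listing every value might not.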

  ** Copyright 1995, 1996, 1997, 1998, 1999, 2000, 2001 Ted Davis - see License, included by reference. ** 
