Deleting Files by Date

This page is a supplement to AWK Scripts: Date and Time. It deals with the very large problem of deleting files based on elapsed time or by specified date. These examples are built around free clones of the AWK language - see Multilingual Batch Programs for information on versions and availability. All of the AWK scripts here work with the GAWK version, and most also work with MAWK. Where the command in a batch file uses AWK, either should work; where GAWK is used, MAWK may not (or definitely doesn't) work. Deleting files by date can be done in pure batch language only with extreme difficulty, because it involves arithmetic on dates or magnitude comparison of dates, and batch language does not lend itself to that sort of computation, though some higher level languages do.

One way is to convert the dates into Julian date format and operate on that. This is non-trivial, but a couple of simplifications can be made if certain assumptions are valid:
Since we are dealing with PCs, no files have dates earlier than 1 Jan 1980
The user has a Y2K compatible operating system
Date stamps are in two digit year format
Nobody will use this code after 31 Dec 2079
The simplifications are possible if we limit the range of dates to those between 1 Jan 1980 and 31 Dec 2079 - this removes the need to deal with century years that are not leap years and allows the assumption that years between 00 and 79 are actually 2000 through 2079. It's still a lot of work. If we have the target date in date format (rather than as a relative date) we can simplify it even more.

Julian dates are defined differently in different contexts. In the computer world the term is often used to refer to the elapsed time in seconds since some reference date (epoch) such as 1 Jan 1970 or 1 Jan 1980. Since for our immediate purposes a resolution of anything smaller than a day is overkill, and we are free to redefine the calendar to make things easier for comparisons, fake Julian dates in yyyymmdd format can be compared directly, as if all months had 99 days and all years had 99 months - the missing days and months cause no trouble; their values are just wasted entries in a sparse one dimensional ordered list.
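The direct comparison of fake Julian dates can be sketched as follows. This is only an illustration under the assumptions listed earlier (mm-dd-yy date stamps, years 80 through 99 meaning 19xx and 00 through 79 meaning 20xx); the helper name fake_julian() is hypothetical and does not appear in the scripts on this page.

```awk
# Sketch only: fake_julian() is a hypothetical helper, not one of the page's scripts.
# It turns a mm-dd-yy date stamp into a number of the form yyyymmdd.
function fake_julian( d,    part, yyyy ) {
    gsub( /[^0-9]/, "+", d )          # normalize whatever delimiter is in use
    split( d, part, "+" )             # part[1]=mm, part[2]=dd, part[3]=yy
    yyyy = part[ 3 ] + 1900
    if( yyyy < 1980 ) yyyy += 100     # 00..79 are taken to mean 2000..2079
    return yyyy * 10000 + part[ 1 ] * 100 + part[ 2 ]
}

BEGIN {
    a = fake_julian( "12-31-99" )
    b = fake_julian( "01/01/00" )
    print a, b
    if( a < b ) print "12-31-99 is older than 01/01/00"
}
```

Because the function returns a number built by arithmetic, the < test is a plain numeric comparison, and the gaps between valid dates never matter.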

Another approach is to sort directory entries on the basis of date and, depending on the direction of ordering, start or stop deleting when the specified date, or an older or newer date appears. If it could be guaranteed that a file with every single date would be present, it would not be necessary to use the difficult greater-than or less-than comparison. Because of differences in DIR format listings, this has problems dealing with all the target operating systems with the same code.

XCOPY can be used to make copies of files with dates newer than a given one, and everything else in the original directory erased (or everything erased and the saved files copied back). The latter variant increases the risk of file corruption, and both methods can consume more disk space than may be available. Both of these can be accomplished in pure batch language with little difficulty, but considerable risk if the files take up much room and the drive is approaching full capacity. If XCOPY would copy to the null device, we could just do that and redirect the listing into a list of files not to copy. Unfortunately, it won't.

By now, you should have a feel for the scale of the problem when you don't have the right tools available. The best we can do with the tools used in these essays is a combination of the fake Julian date and sorted DIR listing approach, and that will break in the next section when we consider dealing with relative dates instead of absolute dates (so many days old instead of older than a specific date).

The first part of this essay dealt with formatting today's date, but that is hardly ever what is wanted for file deletion. We must now get into what is really text manipulation, but is included here because of the date aspect of the strings to be manipulated.

We can let an AWK script split a directory listing into fields using the standard field separator, extract the filename and date fields, split up the date using whatever separator is used in the directory listing to separate yy, mm, and dd, then put the pieces back together with some other text to generate a batch file that contains a delete command for each file we want to delete based on its date. We do have to deal with the 00 rollover and DIR listing format differences though.

Given a string that is a date, it is easy enough to split it up into yy, mm, and dd fragments if we know the format it's in. I've dealt with this in "Web Pages and Other Internet Services", but will use a slightly different approach here: the script will automatically locate the date field in the directory listing.

Once you have the date, converting it to yyyymmdd format is a bit complicated because of the 00 rollover, so I'll deal with that separately here. Given a date (FileDate) in mm-dd-yy format (where '-' is any character that is not a number) this AWK code
    gsub( /[^0-9]/, "+", FileDate )
    Fields = split( FileDate, Array, "+" )
    yy = Array[ 3 ]
    if( yy < 80 ) yy += 100
    FileDate = ( yy Array[ 1 ] Array[ 2 ] ) + 0
will reformat it to yyymmdd with the leading zero suppressed. Concatenation by itself produces a string, and AWK compares two strings character by character, so the "+ 0" is there to force the result to be a number. Since the usable lifetime of the code is less than one century, there is no need to use more than three digits for the year - that code adds 100 to any two-digit year less than 80 (any such yy in a PC file date must be after 1980, because that's the starting epoch for PC clocks, so it must really be a larger number than 80 - adding 100 to it makes it larger, which is all that is required here). The intermediate variable yy is used for convenience and to reduce the probability of mistyping "Array[ 3 ]" since it would otherwise occur three times.

That code replaces everything in the date string that is not a number with "+", then splits the string into fields using '+' as the field delimiter - this neatly does away with the problem of different delimiters in different OSs and human language OS versions by normalizing all possible delimiters to a known, but arbitrary one.
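One detail is worth checking whenever dates are rebuilt this way: in AWK, concatenation yields a string, and two strings are compared character by character rather than by value. The sketch below shows the difference for a two-digit year next to a three-digit one; the helper names and the two sample dates are made up for the illustration.

```awk
# Sketch only: older_as_text() and older_as_number() are hypothetical helpers,
# and the two date values are made-up examples.
function older_as_text( a, b )   { return ( ( a "" ) < ( b "" ) ) }
function older_as_number( a, b ) { return ( ( a + 0 ) < ( b + 0 ) ) }

BEGIN {
    d1999 = "99" "10" "16"          # 16 Oct 1999 -> "991016"
    d2000 = ( 0 + 100 ) "01" "01"   # 1 Jan 2000 with 100 added to the year -> "1000101"
    print older_as_text( d1999, d2000 )     # 0: as text, "9..." sorts after "1..."
    print older_as_number( d1999, d2000 )   # 1: as numbers, 991016 < 1000101
}
```

Adding 0 to a rebuilt date, or to both sides of the comparison, is enough to get the numeric behavior.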

The next complication to address is finding the date field in the DIR listing - it is the one that comes just ahead of the time, and the time field contains a colon which is something that can't occur in any other field. This AWK fragment searches through the fields until it finds one with a colon, subtracts 1 from that field's index number and saves the result as the index to the date field.
    for( i = 1; i <= NF; i++ ){
        if( $i ~ /:/ ) {
            DateField = i - 1
            break
        }
    }
The break statement terminates the search at the first colon - it's not really necessary, but it is good practice.

One other complication of multiple DIR listing formats is finding the file name: it will be either the first or the first two fields, or will be a long name at the end of the line. Since for file deletion purposes either a short name or a long name can be used, and since the short name is in the same place everywhere but in NT, we can just detect which format is in use and act accordingly. Since the future relationship between format and the OS environment variable is unknowable, we will use the position of the date field as the format indicator: it is the first field in NT format, and something else in the other formats. In the NT format, the file name is everything following the third field; in the other formats the short name is everything preceding the field ahead of the date field, with the second field being the extension if the date field is the fourth field (there is no extension if the date is the third field). This AWK fragment identifies and extracts the file name.
    if( DateField == 1 ) {
        Fname = ""
        for( i = 4; i <= NF; i++ ) Fname = Fname " " $i
        sub( /^ /, "", Fname )
        Fname = "\"" Fname "\""
    }
    else {
        Fname = $1
        if( DateField == 4 ) Fname = Fname "." $2
    }
The NT code (the first block of the IF statement) splices the various fields of the filename (space delimited by definition) together with spaces in such a way that there is an unwanted leading space - the sub() function removes that, and the next line puts the name in quotes, which may or may not be needed but are always acceptable in NT. Note that it is necessary to ensure that Fname is null before beginning the building process to avoid splicing the current name onto the remains from the previous line in the directory listing.

The "else" block deals with short names in the first or first and second fields - if the date is in the fourth field, there is an extension, otherwise there isn't. Since this is a short name, no quotes are needed, and if present would prevent the name from being used in Real DOS, so none are added.
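To see the date field search and the name extraction working together, here is a small sketch that applies the same logic to two sample lines. The function name file_name() is hypothetical, and the two listing lines are made-up samples of the older and the NT layouts, not captured DIR output.

```awk
# Sketch only: file_name() is a hypothetical helper and the listing lines are made-up samples.
function file_name( line,    f, n, i, datefield, name ) {
    n = split( line, f, " " )            # splits on runs of blanks, like the default FS
    datefield = 0
    for( i = 1; i <= n; i++ ) {
        if( f[ i ] ~ /:/ ) { datefield = i - 1; break }
    }
    if( datefield < 1 ) return ""        # no time field found: not a file line
    if( datefield == 1 ) {               # NT layout: date time size name...
        name = f[ 4 ]
        for( i = 5; i <= n; i++ ) name = name " " f[ i ]
        return "\"" name "\""
    }
    name = f[ 1 ]                        # older layout: name [ext] size date time
    if( datefield == 4 ) name = name "." f[ 2 ]
    return name
}

BEGIN {
    print file_name( "AUTOEXEC BAT         1,234  10-16-99   3:45p" )
    print file_name( "10/16/99  03:45p                 1,234 my file.txt" )
}
```

The first call prints AUTOEXEC.BAT and the second prints "my file.txt" with the quotes included. As in the fragment above, a run of several spaces inside a long name is collapsed to one.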

The last major part of the AWK script is to generate the batch file that will actually delete the files. This is pretty trivial, just a print command with the delete command and the filename.
    print "del " Fname
Putting it all together with a few other bits we get this complete script. The target date is assumed to be in the environment variable TDATE in the same format as the file date stamp. Code has been added to ignore all lines in the DIR listing that are not file listings - directories are to be eliminated with the /a:-d switch to DIR, file listings will contain a colon but not a backslash, while no other line passes that test.

DATEDEL.AWK

BEGIN{
    TargetDate = ENVIRON[ "TDATE" ]
    gsub( /[^0-9]/, "+", TargetDate )
    Fields = split( TargetDate, Array, "+" )
    yy = Array[ 3 ]
    if( yy < 80 ) yy += 100
# Adding 0 makes the value a number so that the later comparison is numeric.
    TargetDate = ( yy Array[ 1 ] Array[ 2 ] ) + 0
    delete Array[ 1 ]
    delete Array[ 2 ]
    delete Array[ 3 ]
}

{
    if( $0 ~ /:/ ) if( $0 !~ /\\/ ) {    
        for( i = 1; i <= NF; i++ ){
            if( $i ~ /:/ ) {
                DateField = i - 1
                break
            }
        }
        FileDate = $DateField
        gsub( /[^0-9]/, "+", FileDate )
        Fields = split( FileDate, Array, "+" )
        yy = Array[ 3 ]
        if( yy < 80 ) yy += 100
# Adding 0 makes the value a number so that the comparison below is numeric.
        FileDate = ( yy Array[ 1 ] Array[ 2 ] ) + 0
        if( DateField == 1 ) {
            Fname = ""
            for( i = 4; i <= NF; i++ ) Fname = Fname " " $i
            sub( /^ /, "", Fname )
            Fname = "\"" Fname "\""
        }
        else {
            Fname = $1
            if( DateField == 4 ) Fname = Fname "." $2
        }
        if( FileDate < TargetDate ) print "del " Fname
    }
}
Note that the < test in the last line causes the script to generate a delete command for each file that is older than the target date, but not for those that have the same date as the target - change < to <= to make the program delete files that have the same date or are older.

Now we need a batch program to manage all this. The script is too long and too unlikely to be used just once to warrant creating it with the batch file, so just copy/paste the above script into a file named DATEDEL.AWK. The batch file expects the target date to be given to it as a command line argument in the same format as the date stamp on the files, except that the delimiter need not be the same: 10/16/99 and 10-16-99 are equivalent, as are such oddities as 10x16$99 - anything except a number, a command line argument delimiter, or a redirection character should work to separate the three elements. Note that years are in two-digit format to match the ones in the file date stamp and that the order of the three elements must match the order in the date stamp. The script must be modified to change the order of the elements from the default mm dd yy order - there is no way to determine from an arbitrary date what order is used, though this can be determined in some special cases (day numbers above 12).

Deleting files that are more than n days old is quite difficult and is best done with operating system specific third-party utilities. None the less, there are several ways to do it using a batch program and a secondary language to process directory listings. If you want a pure batch solution in a large number of lines, don't need it for NT, can work within its other limits, can get it to work, and don't need to understand it, Tom Lavedas has a piece of Real Magic. The first problem is to define n: sometimes it means a certain number of calendar days, but other times it means a specified number of business days, with holidays, weekends, and other closed days disregarded. Sometimes it means weekdays. Solutions to all but the first of those meanings simply cannot be provided without knowledge of exactly what is meant, perhaps in the form of a lookup table. None the less, there are solutions to the real problem that are fairly simple but don't even address the task of date computation - these will be dealt with first. For a fairly simple implementation of a script that returns an ERRORLEVEL telling whether the file is more than n days old, relative to the current second, you can skip the more theoretical (and more exact) solutions below and jump directly to OLDER.BAT.

In the case where the program is to be run once each day that matters, a set of programs can be used: the one that runs today builds or modifies the rest of the set so that on each day, the one that runs is the one that was constructed on the day that matches the files that are to be deleted, and just deletes files with its own date. This can be done in pure batch, but it is easier to do with some other language to get the date into the file. It is, however, necessary to adjust the format of the date used to match the format of file date stamps as reported by DIR.

The general approach requires that today's file create a new one for the end of the series, and arrange for the other files to move up one in the list while deleting the last. There are some tricks using files named with the date that suggest themselves, but all fail to work in all the operating systems without special effort. One approach that does work in all involves a starter batch file that contains the name of the file to be launched next and that manages the series of files, combined with separate daily files to do the actual work. This is rather complicated, and it works well only if the number of days files are to live is fairly small, because there must be a separate command to rename each file. This example code fragment doesn't actually do anything useful; it just illustrates cycling a series of four trivial files.
 del 1.bat
 ren 2.bat 1.bat
 ren 3.bat 2.bat
 ren 4.bat 3.bat
 echo. > 4.bat
Obviously the file would become unwieldy if the number of files (days) were large, but it does have the virtue of being simple.

There are several problems with this approach, especially the chance that the program will be run more than once on a given day and the near certainty that it will be omitted at least once. The second of those can be dealt with by running the program a second time when the error is noted, but the protection needed against accidental second runs gets in the way. We can protect against multiple runs by adding code to abort if a file with today's date already exists, and we can bypass that protection with a switch. There is one other gotcha that really hurts: the program must be run after the last target file with a given date has been created. The solution to that is also the solution to many other problems: make it so that it deletes not just files with the target date, but also those that are older, though this does introduce additional complications - major complications.

Neglecting some of the problems, the next step is to incorporate some code to acquire the date of a file and the target date beyond which we want to keep files, that is, to delete files older than the target date. This takes us back to the running series of files mentioned earlier, but not developed very far. This will involve a script for extracting dates from file listings. This can be done in pure batch, but why bother when it's so much easier and cleaner to do it with a piece of an AWK script we already have.

GETFDATE.AWK
{
    if( $0 ~ /:/ ) if( $0 !~ /\\/ ) {    
        for( i = 1; i <= NF; i++ ){
            if( $i ~ /:/ ) {
                DateField = i - 1
                break
            }
        }
        FileDate = $DateField
        gsub( /[^0-9]/, "+", FileDate )
        Fields = split( FileDate, Array, "+" )
        yy = Array[ 3 ]
        mm = Array[ 1 ]
        dd = Array[ 2 ]
        print "set FDATE=" mm "-" dd "-" yy
    }
}
The batch program GETFDATE.BAT
 @echo off
 dir %1 | gawk -fgetfdate.awk > }{.bat
 call }{
 del }{.bat 
sets the environment variable FDATE to the date stamp of whatever filename is given to the batch program as an argument.

Having a method of putting the date stamp of a file into the environment gives us a method of managing the running list of files so that extra runs don't happen: if the newest file in the sequence has today's date, don't do anything. This program manages deletion of files in a specified directory to leave four days' worth in place and delete everything else.

CYCLE.BAT
 @echo off
 set TDATE=
 set target=foo
 REM Get today's date
 dir }{.bat | gawk -fgetfdate.awk > }{.bat
 call }{
 del }{.bat 
 set thisdate=%FDATE%
 if not exist f1.bat goto cont0
 REM Get date of newest file
 dir f1.bat | gawk -fgetfdate.awk > }{.bat
 call }{
 del }{.bat
 REM Test for already having run today
 if %thisdate%!==%FDATE%! goto end
 :cont0
 REM File series management
 if not exist f3.bat goto cont1
 REM First delete the too old files
 call f3.bat
 REM Remove the file just used from the series
 del f3.bat
 :cont1
 REM rename the remaining files up to larger numbers
 if exist f2.bat ren f2.bat f3.bat
 if exist f1.bat ren f1.bat f2.bat
 REM Create a new lowest number file
 echo set TDATE=%thisdate%> f1.bat
 REM If there was no f3.bat, skip the deletion process
 if %TDATE%!==! goto end
 REM Invoke the script to create a batch file to delete the older files
 dir /a:-d %target% | gawk -fdatedel.awk > }{.bat
 cd %target%
 REM Call the deletion file
 call }{
 REM Delete it - it's no longer needed
 del }{.bat
 REM Done
 :end
Obviously that can be expanded to older cut off dates by adding more files and changing the number of the file to actually call, but the program becomes unwieldy if the series of files is too large: each file means an additional line of code.

The most obvious drawbacks are that the program is limited to a single directory and that its own directory cannot be the target (otherwise it would delete itself after three days, and the AWK script after no more than that). There are a couple of ways around this - perhaps the sneakiest is to TOUCH all the files used by the program every time it runs: get a command line TOUCH utility (http://www.simtel.net/pub/simtelnet/msdos/dirutl/touch.zip, for example), unzip it and put it in the path, and use this version of CYCLE.

CYCLE.BAT
 @echo off
 set TDATE=
 REM Get today's date
 dir }{.bat | gawk -fgetfdate.awk > }{.bat
 call }{
 del }{.bat 
 set thisdate=%FDATE%
 if not exist f1.bat goto cont0
 REM Get date of newest file
 dir f1.bat | gawk -fgetfdate.awk > }{.bat
 call }{
 del }{.bat
 REM Test for already having run today
 if %thisdate%!==%FDATE%! goto end
 :cont0
 REM File series management
 if not exist f3.bat goto cont1
 REM First delete the too old files
 call f3.bat
 REM Remove the file just used from the series
 del f3.bat
 :cont1
 REM rename the remaining files up to larger numbers
 if exist f2.bat ren f2.bat f3.bat
 if exist f1.bat ren f1.bat f2.bat
 REM Create a new lowest number file
 echo set TDATE=%thisdate%> f1.bat
 REM If there was no f3.bat, skip the deletion process
 if %TDATE%!==! goto end
 touch -d %thisdate% cycle.bat
 touch -d %thisdate% datedel.awk
 REM Invoke the script to create a batch file to delete the older files
 dir /a:-d | gawk -fdatedel.awk > }{.bat
 REM Call the deletion file
 call }{
 REM Delete it - it's no longer needed
 del }{.bat
 REM Done
 :end
to keep the date stamps of the two critical files always current.

The other main gotcha (aside from the program devouring itself and multiple runs getting ahead of themselves) is the loss of synchrony when, for whatever reason, the program is not run as often as it should be. Note that any gaps propagate through the series of files as the list is updated - the above four business day program will clear any gaps on the fourth run: a two week vacation and a weekend will both clear out when the program deletes Monday and earlier on Friday.


The conceptually correct approach to n day old deletion (where n is in calendar days) involves converting the dates into a kind of Julian date: elapsed time since some starting point, the epoch, earlier than any date of interest - days when the program didn't run have to be counted. This criterion can easily lead to loss of files before they are even looked at: if you have an older than two day test and take a week's vacation, then run the program first thing Monday morning, you will lose everything up to Friday.

A form of fake Julian date was used in a preceding section on deleting files older than a given date - this can be used with the rotating files method for deleting files after so many invocations of the program (presumably once per day that should be counted), but not for the arbitrary "n calendar days old" case where the program would be invoked with an arbitrary number in the expectation that files that many days old and newer would be kept, or that those that many days old and older would be deleted. For the latter case, we need a real Julian day number program - one that takes into account the number of days in the various months and also leap years. The sheer amount of data that must be incorporated into the program makes this very much non-trivial, and rather bulky. Fortunately, we don't need to convert back, just to Julian dates. For convenience (it's a leap year and also the oldest date ever given to a DOS file), we will use 1 Jan 1980 as the epoch.

The algorithm is
subtract 1980 from the four-digit year
integer divide by 4 - the result is the number of leap years
multiply each completed month by its length in days, accounting for leap years
add the leap years, years completed * 365, days in completed months, and days in specified month
The third step is the hard one.
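One way to sketch those steps, assuming only the years 1980 through 2079 matter (so every fourth year is a leap year). The helper name day_number() and the ( years + 3 ) / 4 form of the leap year count are assumptions of this sketch, not taken from the scripts on this page; the count is written that way so that it covers only leap years that have been completed.

```awk
# Sketch only: day_number() is a hypothetical helper.
# Returns the day number in the 1980 epoch, with 1 Jan 1980 = 1.
function day_number( mm, dd, yyyy,    len, years, days, m ) {
    split( "31 28 31 30 31 30 31 31 30 31 30 31", len, " " )
    if( yyyy % 4 == 0 ) len[ 2 ] = 29               # valid for 1980..2079 only
    years = yyyy - 1980                             # completed years
    days = years * 365 + int( ( years + 3 ) / 4 )   # plus completed leap years
    for( m = 1; m < mm; m++ ) days += len[ m ]      # completed months
    return days + dd
}

BEGIN {
    print day_number( 1, 1, 1980 )      # 1
    print day_number( 3, 1, 1980 )      # 61: 31 + 29 days completed, plus 1
    print day_number( 1, 1, 1981 )      # 367: all 366 days of 1980, plus 1
}
```

The month table is rebuilt on every call, so a leap year adjustment to February cannot leak from one date to the next.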

A simpler method of obtaining the date n days ago, where it can be used, is to obtain the current systime() number (number of seconds since 1 Jan 1970), subtract 86400 (the number of seconds in an ordinary day) times the desired number of days from it, then convert that into a date string with strftime():
 @echo off
 gawk-w32 "BEGIN{print strftime(\"set yesterday=%%m%%d%%y\", systime() - (86400 * %1));exit}" > }{.bat
 for %%a in (call del) do %%a }{.bat
which sets yesterday to the current date minus the number of days given in the batch file's command line argument (in mmddyy format).
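The same arithmetic can be written out as a small GAWK function, which may be easier to read than the one-line version. This is only a sketch: date_n_days_ago() is a hypothetical name, and the third argument to strftime() (a flag requesting UTC) is a GAWK feature that other AWK versions may lack. It is used here so that the second example does not depend on the local time zone.

```awk
# Sketch only: date_n_days_ago() is a hypothetical helper; needs GAWK's time functions.
# now is a systime() style number of seconds, n is the number of days to go back,
# utc is 0 for local time or 1 for UTC.
function date_n_days_ago( now, n, utc ) {
    return strftime( "%m-%d-%y", now - 86400 * n, utc )
}

BEGIN {
    print date_n_days_ago( systime(), 3, 0 )    # three days ago, local time
    print date_n_days_ago( 946684800, 1, 1 )    # 946684800 is 1 Jan 2000 00:00:00 UTC
}
```

The second line prints 12-31-99. Because 86400 is the length of an ordinary day, a result computed in local time can be off by a day when it is taken close to midnight across a daylight saving change.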

This AWK script appears to work. It reports the day number in the 1980 epoch (1 Jan 1980 = 1). The date is passed as a command line argument. Remember that we are still working with two digit year numbers. You can play with it by pasting it into a file named JDATE.AWK and issuing the command
gawk -fjdate.awk mm-dd-yy
where mm-dd-yy is the date to test.

JDATE.AWK
BEGIN{
# Fill in the look-up table array with days per month values.  Note that Feb
# is given its normal year value.
    MonthArray[ 1 ]  = 31
    MonthArray[ 2 ]  = 28
    MonthArray[ 3 ]  = 31
    MonthArray[ 4 ]  = 30
    MonthArray[ 5 ]  = 31
    MonthArray[ 6 ]  = 30
    MonthArray[ 7 ]  = 31
    MonthArray[ 8 ]  = 31
    MonthArray[ 9 ]  = 30
    MonthArray[ 10 ] = 31
    MonthArray[ 11 ] = 30
    MonthArray[ 12 ] = 31
# Define the epoch.
    Epoch = 1980
    DateInWork = ARGV[ 1 ]
    gsub(/[^0-9]/, "+", DateInWork )
    split( DateInWork, DateArray, "+" )
    Years  = DateArray[ 3 ]
# Subtract one from the month to allow for the fractional month in the date.
    Months = DateArray[ 1 ] - 1
    Days   = DateArray[ 2 ]
# Assume first two century digits are 19.
    Years += 1900
# Test the assumption - if the value is less than the Epoch, then the year
# number is less than 80 and the century digits are 20.
    if( Years < Epoch )  Years += 100
# Normalize year 0 to the Epoch year.  This has the same effect on years as
# subtracting 1 from the month has on months: it discounts the 
# year being tested
    Years -= Epoch
# Before correction for leap years, each year has 365 days.  
# Multiplying gives the number of days between the epoch and 
# the year of our test date.
    Days += Years * 365
# Correct by adding one day for each leap year.
    LeapDays = int( Years / 4 )
    Days += LeapDays
# If the year of our test date is a leap year, the number is 1 too large
# because that year has not been completed.
    if( ( LeapDays * 4 ) == Years ) {
        MonthArray[ 2 ]++
        Days--
    }
# Add up all the days in the months that have been completed and add
# them to the running total
    for( i = 0; i <= Months; i++ ) Days += MonthArray[ i ]
# The number we have is the days that have elapsed since 1 Jan of the 
# Epoch year, not the year number: 1 Jan 1980 has the value 0.  Add 1
# to make the numbers into day numbers.
    DayNumber = Days += 1
    print DayNumber
    exit    
}
There are several approaches to using that sort of code - the two most interesting (to me) are a single script that reads a directory listing and creates a deletion batch file based on the relationship between the file date and the date of a specified file, and invoking a script, once for each file, that returns an ERRORLEVEL indicating the relationship. The former is interesting because it allows us to delete files that are n calendar days older than some other file (occasionally useful), and the latter because of its simplicity and versatility. Both will be developed here.

The first example is much like the specific date deletion program above, except that it compares file dates in pseudo-Julian form. The program is given two files as arguments - the first containing the target date, the second containing directory entries of files to test and, if needed, generate delete commands for. There are several ways to tell whether the script is processing the first or second file - the one used here is to compare the total number of records processed (NR) with the number processed in the current file (FNR). Obviously, those two values are the same for the first file and different for any except the first.

I said in the last paragraph that the first file contained the date, and that the second contained directory listings. That is exactly what I mean: the first file must contain at least one date (the first one encountered will be used), but the date need not be in a directory listing - the first field encountered that contains exactly three delimited sections composed of exactly two numbers each, but not containing a colon will be taken as the date to use. The thing to watch out for here is that it is possible in a directory listing containing the date, that the name of the file might mimic an acceptable date format without looking like a date: "12a34b56" does not look like a date, but passes the test for one. If no date is found in the first file, the program will abort with an ERRORLEVEL of 254, i.e., not 0.

The number of days between the given date and the target date (the "n" in "n days") is passed as a command argument of the form days=n. Warning: this is case sensitive and position in the argument list sensitive - an error message and termination with an ERRORLEVEL of 253 will result if the variable does not get a valid value.

This script introduces user functions and exit codes (ERRORLEVELs): the former because a large part of the code is used in two places and because moving the error handling code out of the main program reduces clutter; the latter to allow a batch program to determine easily that the program failed for some reason.

A bit of additional utility is introduced by making the command to be printed a variable: the script simply replaces each instance of a marker in it with the file name. This allows using the program entirely from a remote directory and also allows using whatever command (MOVE, for example) is actually wanted. Note that you have to provide any needed quotes in the command string, but you can include the file name more than once if you wish, and you can include a directory name within the quotes. The marker to be replaced is defined in the MARKER environment variable, and the command line in the OSTRING variable, both of which you must set. If either is undefined, you get an error message and an ERRORLEVEL of 252 - the variable names may be case sensitive in some OS versions.
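The marker replacement itself can be sketched on its own. The script below uses gsub() for this, which treats the marker as a regular expression and gives "&" a special meaning in the replacement; the sketch here deliberately swaps in index() and substr() so that both the marker and the file name are taken literally. The function name fill_in() and the sample command string are made up for the illustration.

```awk
# Sketch only: fill_in() is a hypothetical helper, not the method used in NDAYSDEL.AWK.
# Replaces every literal occurrence of marker in template with name.
function fill_in( template, marker, name,    out, pos ) {
    if( marker == "" ) return template       # nothing to look for
    out = ""
    while( ( pos = index( template, marker ) ) > 0 ) {
        out = out substr( template, 1, pos - 1 ) name
        template = substr( template, pos + length( marker ) )
    }
    return out template
}

BEGIN {
    # Made-up OSTRING value with "#" as the MARKER; the quotes are part of the string.
    print fill_in( "move \"c:\\logs\\#\" \"d:\\old\\#\"", "#", "my file.txt" )
}
```

That prints move "c:\logs\my file.txt" "d:\old\my file.txt", with the name inserted at both markers.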

That's a lot of new stuff for a single example, but this file is already too large and it's still not done.

NDAYSDEL.AWK
BEGIN{
# Fill in the look-up table array with days per month values.  Note that Feb
# is given its normal year value.
    MonthArray[ 1 ]  = 31
    MonthArray[ 2 ]  = 28
    MonthArray[ 3 ]  = 31
    MonthArray[ 4 ]  = 30
    MonthArray[ 5 ]  = 31
    MonthArray[ 6 ]  = 30
    MonthArray[ 7 ]  = 31
    MonthArray[ 8 ]  = 31
    MonthArray[ 9 ]  = 30
    MonthArray[ 10 ] = 31
    MonthArray[ 11 ] = 30
    MonthArray[ 12 ] = 31
# Define the epoch.
    Epoch = 1980
    TargetDate = -1
    OutputString = ENVIRON[ "OSTRING" ]
    Marker = ENVIRON[ "MARKER" ]
    if( OutputString == "" ) VarCrash()
    if(Marker == "" ) VarCrash()
    Ecode = 1
}



{
    if( NR == FNR ) {
# First file: the one with the pattern date
        if( days == "" ) dayCrash() 
        for( i = 1; i <= NF; i++ ) {
# For each field in each line, look for a field that consists of exactly three
# two-digit numbers separated by single characters that are not numbers.
            if( $i ~ /^[0-9][0-9][^0-9][0-9][0-9][^0-9][0-9][0-9]$/ ) {
# Then make sure it does not contain a colon (and so cannot be a time).
                if( $i !~ /:/ ) {
# Obtain the pseudo-Julian date for that field and reduce it by the desired
# number of days older.
                    TargetDate = Jdate( $i ) - days
# Closing the current file terminates processing of it and begins processing 
# of the next file. This prevents processing additional lines in that file.
# nextfile closes the current file and proceeds to the next.
                    nextfile
# Break terminates the for loop to prevent looking at additional fields on 
# the current line.  Between nextfile and break, nothing in the first file
# following the first field that looks like a date will be evaluated.
                    break
                }
            }
        }

    }
    else {
# Second file: the directory listing
# First, check to see if a reasonable date value exists for the target date.
        if( TargetDate < 0 ) DateCrash()
        if( $0 ~ /:/ ) if( $0 !~ /\\/ ) {    
            for( i = 1; i <= NF; i++ ){
                if( $i ~ /:/ ) {
                    DateField = i - 1
                    break
                }
            }
            FileDate = Jdate( $DateField )
            if( DateField == 1 ) {
                Fname = ""
                for( i = 4; i <= NF; i++ ) Fname = Fname " " $i
                sub( /^ /, "", Fname )
            }
            else {
                Fname = $1
                if( DateField == 4 ) Fname = Fname "." $2
            }
            if( FileDate < TargetDate ) {
# Note: it is necessary to work on a copy of OutputString because whatever
# string is used is changed permanently and must be recreated.  If quotes
# are needed around the file name, they must be provided in the OSTRING
# environment variable in the appropriate places.
                os = OutputString ""
                gsub( Marker, Fname, os )
                print os
                Ecode = 0
            }
        }
    }
}

END{ exit( Ecode ) }

function Jdate( DateInWork,   Days, Years, Months, LeapDays, i ) {
# The variables (above) following the extra space are local to this function.
    gsub(/[^0-9]/, "+", DateInWork )
    split( DateInWork, DateArray, "+" )
    Years  = DateArray[ 3 ]
# Subtract one from the month to allow for the fractional month in the date.
    Months = DateArray[ 1 ] - 1
    Days   = DateArray[ 2 ]
# Assume first two century digits are 19.
    Years += 1900
# Test the assumption - if the value is less than the Epoch, then the year
# number is less than 80 and the century digits are 20.
    if( Years < Epoch )  Years += 100
# Normalize year 0 to the Epoch year.  This has the same effect on years as
# subtracting 1 from the month has on months: it discounts the 
# year being tested
    Years -= Epoch
# Before correction for leap years, each year has 365 days.  
# Multiplying gives the number of days between the epoch and 
# the year of our test date.
    Days += Years * 365
# Correct by adding one day for each leap year.
    LeapDays = int( Years / 4 )
    Days += LeapDays
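# If the year of our test date is a leap year, the number is 1 too large
# because that year has not been completed.  Once February is past, that
# year's extra day is added back.  This is done on Days rather than by
# incrementing MonthArray[ 2 ], which is global and would stay changed
# for every later call.
    if( ( LeapDays * 4 ) == Years ) {
        Days--
        if( Months >= 2 ) Days++
    }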
# Add up all the days in the months that have been completed and add
# them to the running total
    for( i = 0; i <= Months; i++ ) Days += MonthArray[ i ]
    return( Days )
}

function dayCrash(){
    print "Variable \"days\" invalid or empty" > "CON"
    print "The name must be lower case and \"days=n\" must " > "CON"
    print "be between the script and the files" > "CON"
    exit(253)
}

function DateCrash(){
    print "No valid value was found for the target date" > "CON"
    exit(254)
}

function VarCrash() {
    print "The MARKER or OSTRING environment variable, or both," > "CON"
    print "is/are undefined" > "CON"
    exit(252)    
}
One advantage of the method chosen for obtaining the target date is that, if the directory listing used as the second file is sorted in date order (newest first), the same file can also be used as the first file. The action is then to delete files more than n days older than the newest file in the list. If everything is done in the target directory, the newest file will normally be the file into which the directory listing was redirected. In that case it is necessary to TOUCH the script and batch files, as in a previous example, to keep them from eventually deleting themselves. A separate directory is used here instead.
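
The "n days older" arithmetic depends only on differences between the day counts that Jdate() returns. The sketch below restates that counting in a self-contained form so it can be checked on its own. DayNumber() and its local month table are illustrative names, not part of NDAYSDEL.AWK, and the function numbers 1 Jan 1980 as day 1, so its absolute values may differ from Jdate() by a constant. It is valid only for 1980 through 2079, the range assumed throughout this page.

```awk
# Sketch only: day number of a date given as month, day, two digit year.
# The extra names after the gap in the parameter list are locals.
function DayNumber( mm, dd, yy,    years, days, m, len ) {
    split( "31 28 31 30 31 30 31 31 30 31 30 31", len, " " )
    # Years 00-79 are taken as 2000-2079, years 80-99 as 1980-1999.
    years = ( yy < 80 ? yy + 2000 : yy + 1900 ) - 1980
    # Whole years since 1980, plus one day per completed leap year.
    days = years * 365 + int( ( years + 3 ) / 4 )
    # Whole months already completed in the year of the date.
    for( m = 1; m < mm; m++ ) days += len[ m ]
    # In a leap year, add 29 Feb once February is past.
    if( years % 4 == 0 && mm > 2 ) days++
    return days + dd
}
```

With such a function, a target of "three days before 15 Jun 2001" would be DayNumber( 6, 15, 1 ) - 3, and any file whose day number is smaller than that is older than the target.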

The program returns ERRORLEVEL 0 if there was at least one file older than the target date minus the "days older than" value. This makes it possible to use the same script for both approaches to "n days old" mentioned above. It also lets us avoid CALLing an empty batch file to delete no files.
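
The exit code convention can be sketched apart from the directory parsing. In the fragment below, Expand() and EmitOld() are hypothetical helpers, not functions from the script above. Expand() substitutes the file name with index() and substr() instead of gsub(), which is a deliberate swap: it replaces only the first occurrence of the marker, but it copies characters such as "&" in a file name literally.

```awk
# Sketch only: replace the first occurrence of marker in template
# with fname, copying fname literally.
function Expand( template, marker, fname,    pos ) {
    pos = index( template, marker )
    if( pos == 0 ) return template
    return substr( template, 1, pos - 1 ) fname substr( template, pos + length( marker ) )
}

# Print one command for each file older than cutoff.  age[] maps a file
# name to its day number.  Returns 0 if at least one command was printed
# and 1 if there was nothing to do.
function EmitOld( age, cutoff, template, marker,    name, status ) {
    status = 1
    for( name in age ) {
        if( age[ name ] < cutoff ) {
            print Expand( template, marker, name )
            status = 0
        }
    }
    return status
}
```

An END block could pass the returned value to exit, so that a calling batch file can test ERRORLEVEL before CALLing the generated file.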

This example batch file uses the one pass method to delete all the files in the test subdirectory of the current directory that are more than three days older than the current date.

NDAYSDEL.BAT
 @echo off
 set MARKER=:file:
 set OSTRING=del "test\:file:"
 dir test\*.* /a:-d /o:-d > }{.dat
 set s=Nothing to do
 gawk -fndaysdel.awk days=3 }{.dat }{.dat > }{.bat
 if errorlevel 1 goto done
 set s=Found files to delete
 call }{
 :done
 echo %s%
 set s=
The following version, by contrast, calls the script once for each file in the directory. It is much slower, but it allows applying an entire subroutine instead of a single command (though the previous program can do that too, by making the command a call to a subroutine batch file with the file name as its argument).

NDAYSDEL.BAT
 @echo off
 if %1!==}{! goto pass2
 set MARKER=0
 set OSTRING=.
 dir }{.dat > }{.dat
 for %%a in (test\*.*) do call %0 }{ %%a
 goto end
 :pass2
 if %MARKER%==2 goto end
 dir %2 > }{2.dat
 gawk -fndaysdel.awk days=3 }{.dat }{2.dat > nul
 if errorlevel 1 echo NOT deleting %2
 if errorlevel 1 goto end
 if %MARKER%==1 goto delit
 echo Ready to delete %2 - Press Y to continue, A for all, or 
 echo any other key to skip this file.  Press ^C to abort
 getch
 if errorlevel 3 if not errorlevel 4 set MARKER=2
 if errorlevel 65 if not errorlevel 66 set MARKER=1
 if errorlevel 65 if not errorlevel 66 goto delit
 if errorlevel 97 if not errorlevel 98 set MARKER=1
 if errorlevel 97 if not errorlevel 98 goto delit
 if errorlevel 89 if not errorlevel 90 goto delit
 if errorlevel 121 if not errorlevel 122 goto delit
 echo Skipping %2
 goto end
 :delit
 echo Deleting %2
 del %2
 :end
Note that the program has prompting capability, but is much slower than the previous version.

GETCH.COM (used in the above program for keyboard input) is a utility from DEBUG Scripts, another essay in this set.

This is a fairly simple approach that computes file age relative to the current second. The batch program is invoked with two arguments: the filespec of the file to test and the number of days to be tested for. The AWK script is GAWK specific:

OLDER.AWK
# OLDER.AWK - a script to determine if a file is more than n days old.
# n is a command line argument in the form -v n=number (must be ahead of
# the filename of the file in work).
# The file in work must contain the system date from the DATE command as the
# last field on the first line, and the DIR listing of the file whose date
# is to be tested as the second line. Normally FIND would be used (in a batch
# file) to isolate the line. There must be no other lines.
# This program is intended to be insensitive to both the operating system
# being used (and therefore the location of the file date in the line), and
# of the language version (and therefore of the order of the date elements
# and of the delimiters used to separate the date elements).
# This program returns an exit code (ERRORLEVEL) of 1 if it is true that
# the file being tested is more than n days old, and 0 if it is not. Note
# that the number of days is measured from the current *second*, not
# midnight of the current day.

BEGIN{
# Generate the number to be used to create the date string for the cutoff
# date of the test. It is the current systime in seconds from the system's
# epoch minus the number of seconds in the number of days given on the
# command line as the variable n.
    t = systime() - (n * 24 * 60 * 60)
}
# Process the input file
{
    if( NR == 1 ){
# The first line contains the current date, which is used as a pattern to
# determine where the year is in the date strings used by the OS version,
# and to determine what the delimiter is. The last field is split into
# individual characters, and the first one not a numeral is taken as
# the date delimiter.
        split( $NF, Array, "" )
        for( x in Array ) {
            y = Array[ x ]
            if( y !~ /[0-9]/ ) {
                delimiter = y
                break
            }
        }
# Now that the date delimiter is known, the program can split the date
# into elements - since the system date contains a four-digit year,
# the four character element is the year, and the index is the position
# of the year in the dates used by the OS version. Save the index.
        split( $NF, Array, delimiter )
        for( x in Array ) {
            y = Array[ x ]
            if( length( y ) == 4 ) year = x
        }
    }
    else {
# Since there are only two lines in the file, this is the second one, the
# one containing the file's directory listing.
        for( x = 1; x <= NF; x++ ) {
# It would be tempting to select the first field containing the delimiter
# as the date field, but some possible delimiters are also allowed in
# file names, so we have to make sure the field is not part of the file name.
# Well, reasonably sure: for some delimiters, it is not possible to tell the
# difference between a date and a file having a name in exactly the same
# format without knowing what the DIR listing format is, and that is not
# reasonably knowable from a single example of an unknown file. The test
# used is that the first field containing no letters and the known
# delimiter is the file's date.

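# index() matches the delimiter literally; with ~ a delimiter such as "."
# would be treated as a regular expression and match any field.
            if( index( $x, delimiter ) ) {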
                if( $x !~ /[a-zA-Z]/ ) {
                    d = $x
                    break
                }
            }
        }
# d is the file's date field. Split it up into its elements.
        split( d, Array, delimiter )
# If the year's position, as determined from the system date (above), is
# the first element (yy-mm-dd format), then the month is the second element
# and the day is the third. This program assumes that the date format
# is either yy-mm-dd or mm-dd-yy; it does not work if the format is
# something else.
        if( year == 1 ) {
            year = Array[ 1 ]
            month = Array[ 2 ]
            day = Array[ 3 ]
        }
        else {
# This assumes that if the year is not the first element, it is the third.
            year = Array[ 3 ]
            month = Array[ 1 ]
            day = Array[ 2 ]
        }
# DIR listings use two digit years, but the comparison requires the century
# as well. The assumption used here is that no files have dates earlier than
# 1980 or later than 2079 (the program breaks on many systems in 2038 anyway
# because the systime() number goes negative (whose idea was it to make it
# a *signed* long anyway?)).
        century = 2000
        if( year >= 80 ) century = 1900
        year += century
# One of the features of awk is that strings that are numbers can be used
# either way. The %02d conversions in sprintf() pad the month and day with
# a leading zero if they are single digits.
        d = sprintf( "%04d%02d%02d", year, month, day )
        ExitCode = 0
# This is the actual comparison between the normalized date of the file
# and the same format date derived from the number representing the
# cutoff time in seconds: strftime() generates the string for that. Since
# both strings consist of digits in year, month, day order, they can be
# compared for less than. If the file's date is less than the derived
# cutoff date, it is older than the cutoff date.
        if( d < strftime( "%Y%m%d", t ) ) ExitCode = 1
# ExitCode is 0 if the file date is not older than the cutoff date, otherwise
# it is 1. This is the program's exit code and becomes ERRORLEVEL when the
# program terminates.
        exit( ExitCode )
    }    
}



The script is invoked from the batch file below. The batch program requires two arguments: the filespec (quoted if it contains spaces) and the number of days older than now that the file is to be tested for. It assumes that the current date line in the DATE response does not contain "(", that there are two non-blank lines in the report, and that the other one does contain "(". Otherwise, the program and script assume almost nothing about which operating system is in use or what the various DIR and DATE formats are. The DIR command in the batch file does assume that DIR without any switches produces the default format for the OS, that is, that no switches in the DIRCMD environment variable break the DIR report for the script.

OLDER.BAT
@echo off
echo. | date | find /v "(" > %temp%\}{.dat
dir %1 | find ":" | find /v "\" >> %temp%\}{.dat
gawk -folder.awk -v n=%2 %temp%\}{.dat
del %temp%\}{.dat
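
The date normalization at the heart of OLDER.AWK can be sketched in isolation. FindDelimiter() and DateKey() below are illustrative names, not functions from the script. The sketch takes the year position as a flag instead of deriving it from the system date, and it keeps the same assumptions as the script: the format is either yy-mm-dd or mm-dd-yy, and two digit years fall between 1980 and 2079.

```awk
# Sketch only: return the first character of s that is not a digit.
function FindDelimiter( s,    i, c ) {
    for( i = 1; i <= length( s ); i++ ) {
        c = substr( s, i, 1 )
        if( c !~ /[0-9]/ ) return c
    }
    return ""
}

# Turn a date into a yyyymmdd string that sorts in date order.
# yearFirst is 1 for yy-mm-dd and 0 for mm-dd-yy.  Returns "" if the
# string does not look like three numbers separated by a delimiter.
function DateKey( date, yearFirst,    part, y, m, d ) {
    if( date !~ /^[0-9]+[^0-9][0-9]+[^0-9][0-9]+$/ ) return ""
    if( split( date, part, FindDelimiter( date ) ) != 3 ) return ""
    if( yearFirst ) { y = part[ 1 ]; m = part[ 2 ]; d = part[ 3 ] }
    else            { m = part[ 1 ]; d = part[ 2 ]; y = part[ 3 ] }
    # Two digit years: 80-99 are 19xx, 00-79 are 20xx.
    if( y < 100 ) y += ( y >= 80 ) ? 1900 : 2000
    return sprintf( "%04d%02d%02d", y, m, d )
}
```

A key built this way can be compared with the cutoff string that the script obtains from strftime( "%Y%m%d", t ); that call is GAWK specific, while the two functions above use only standard AWK features.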





********* more later ********




This stuff has been only partially tested at the time of its initial release, but it is known that the versions of GAWK and MAWK used here do work in Real DOS, Win9x, and NT4. The complete programs have not all been tested under Real DOS.




  ** Copyright 1995, 1996, 1997, 1998, 1999, 2000, 2001 Ted Davis - see License, included by reference. ** 

Input and feedback from readers are welcome. NOTE: the subject of the message must contain the word "batch" for the message to get past the spam filter.


