@echo off
set AWKLINES=4
set AWKVAR01=ABC
set AWKVAR0G=CDE
set AWKVAR03=GHI
set AWKDATE=on
Note that the entries are out of their final order and are nothing more than SET commands. This code could easily be incorporated in the main batch file, but it would lose its identity as a configuration file.
The test Web page used for demonstration contains several more fake share entries than are to be processed. To avoid copyright problems, the "share names" used are blocks of three consecutive letters and are not intended to represent real stock market symbols, though some of them probably duplicate real symbols or share names used somewhere in the world; that is unavoidable. Several of the share prices were made up to show various features of the way printf() can display numbers. The prices on the real page that was given to me as an example lack a decimal point, but the report must show one, with three digits after it. I have made some prices contain decimal points, and others so small that they have nothing to the left of the decimal point.
The line that carries the date is
<H3 ALIGN="center">on 9/12/1999</H3>
and a typical data block is
<TD>ABC</TD>
<TD>Abend Computers</TD>
<TD align="right">123456</TD>
<TD align="right">123000</TD>
<TD align="right">123412</TD>
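After our two-stage splitting process (replacing ">" with "<", then splitting on "<"), these lines are broken into fields that the AWK program can examine. The program is run with a command such as
AWK -fgetshars.awk > result.txt
The RESULT.TXT file will contain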
ABC,123.000,09/12/1999
CDE,76.543,09/12/1999
GHI,0.100,09/12/1999
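As a check on the listing above, here is a minimal sketch that rebuilds the first output line from the sample lines shown earlier. The function names tag_content, fmt_price and pad_date are my own inventions for illustration and do not appear in the real program. The sketch assumes that the raw price is in thousandths, that the date keeps its existing order, and that the day and month are zero-padded as in the listing; the full program that follows reorders the date instead.

```awk
# Sketch only: these function names are illustrative, not from the real program.

# Return the first non-empty piece of text that is outside a tag, using the
# same trick as the main program: turn ">" into "<", then split on "<".
# Assumes "<" and ">" alternate properly on the line.
function tag_content(line,    parts, n, i) {
    gsub(/>/, "<", line)
    n = split(line, parts, "<")
    # Pieces alternate text, tag, text, tag ... so odd indexes hold text.
    for (i = 1; i <= n; i += 2)
        if (parts[i] != "") return parts[i]
    return ""
}

# Scale a price given in thousandths and show three decimal places.
function fmt_price(raw) {
    return sprintf("%.3f", raw * 0.001)
}

# Strip the date marker and zero-pad the day and month (an assumption
# made to match the listing above).
function pad_date(text, mark,    d) {
    sub(mark, "", text)
    split(text, d, "/")
    return sprintf("%02d/%02d/%04d", d[1], d[2], d[3])
}

BEGIN {
    code  = tag_content("<TD>ABC</TD>")
    price = tag_content("<TD align=\"right\">123000</TD>")
    date  = tag_content("<H3 ALIGN=\"center\">on 9/12/1999</H3>")
    printf("%s,%s,%s\n", code, fmt_price(price), pad_date(date, "on "))
}
```

Saved under any name and run with AWK -f, this should print ABC,123.000,09/12/1999 without reading any input, because it consists only of a BEGIN block and functions.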
If your version of AWK is called MAWK or GAWK, or whatever, either copy it with the name AWK.EXE or change the command to match whatever you have.
BEGIN{
# Explore the entire environment (this copy) for entries having names
# beginning with "AWKVAR", the marker for variables of interest to
# this program.
for( i in ENVIRON ){
if( i ~ /AWKVAR/ ){
# When one is found make its *value* an index into an array of null strings.
# Some versions of AWK don't like array indexes that are themselves array
# data, so an intermediate variable will be used.
s = ENVIRON[ i ]
Array[ s ] = ""
}
# There are other variables of interest: the one that specifies the location
# of the desired price field relative to the name code and the one that
# defines both the date field itself and the string to be removed.
if( i ~ /AWKLINES/ ) MaxLcount = ENVIRON[ i ]
if( i ~ /AWKDATE/ ) DateMark = ENVIRON[ i ] " "
# The space was added to DateMark so there would be no invisible characters
# in the batch file that sets the variables. This might have to be changed
# if the date string format doesn't have a space just before the date.
}
# Initialize the variable to be used as a line counter.
Lcount = 0
}
{
# We need to split up the line into fields with either "<" or ">"
# as the field separator to isolate the contents of containers, and
# also contents of the tags themselves as an unwanted byproduct. The
# easiest way to do this in this language is to replace ">" with "<"
# and split() the line into an array.
gsub(/>/, "<" )
# Blank lines are ignored by the "if()" - zero fields means an empty line
if(split( $0, Fields, "<" ) != 0 ) {
# Increment the line counter if it needs to be.
if( Lcount != 0 ) Lcount++
# Look for the line with the date - it contains a date in nn/nn/nnnn format,
# where the 'n's are numerals. The contents of the AWKDATE variable followed
# by a numeral identifies the line.
s1 = DateMark "[0-9]"
if( $0 ~ s1 ) {
# Note that a variable can be used as a regular expression.
for( i in Fields ) {
# Test against s1 rather than any numeral, because tag names such as "H3"
# also contain numerals.
if( Fields[ i ] ~ s1 ) {
DateField = Fields[ i ]
# Remove the value of DateMark from the field. DateMark contains the value
# of the AWKDATE environment variable followed by a space.
sub( DateMark, "", DateField )
# Sometimes it is desirable to change the format of the date. Since this
# is probably to be done only once, we can hard code it. This assumes you
# want the date elements in the yyyymmdd order, that the existing order is
# ddmmyyyy, that the new separator is '-', and that the existing separator
# is '/' - change it to suit your needs.
split( DateField, DateArray, "/" )
DateField = DateArray[3] "-" DateArray[2] "-" DateArray[1]
}
}
}
else {
# All other non-blank lines.
# We have to decide whether to look for a name code or a price. Since
# Lcount is initially 0 and will be reset to 0 when we find a price, if it
# is 0, look for a code, if it isn't, look for a price.
# If we are looking for a price, we wait for Lcount to reach the required
# value, 4 in the demonstration case, but the value passed in AWKLINES in
# any case (stored in MaxLcount).
if( Lcount == 0 ) {
# Looking for a code.
for( i in Fields ) {
for( j in Array ) {
# Note that the following match is equality, not a regular expression match.
# This prevents partial matches from causing trouble
if( Fields[ i ] == j ) {
Lcount = 1
ShareCode = j
}
}
}
}
if( Lcount == MaxLcount ) {
# The line with the closing price in it.
for( i in Fields ) {
if( Fields[ i ] ~ /[0-9]/ ) {
# When we have found the field with the numbers in it we can output the line,
# and we must reset Lcount.
Lcount = 0
printf( "%s,%.3f,%s\n", ShareCode, (Fields[ i ] * 0.001), DateField )
delete Array[ ShareCode ]
# That line is an additional insurance against false matches, and it speeds up
# later matches because there are fewer comparisons to make.
}
}
}
}
}
}
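The environment scan at the top of the program can be tried on its own. The sketch below is only an illustration: collect_codes() is a name I made up, and the "^" anchor is a variation of mine, not something the program above uses. Without the anchor, /AWKVAR/ matches any variable whose name merely contains that string. The fake array stands in for ENVIRON so that the sketch runs without any SET commands.

```awk
# Sketch only: collect_codes() and the anchored pattern are illustrative
# assumptions, not part of the program above.

# Store the value of every entry whose name starts with "AWKVAR" as an
# index of codes[], and return how many distinct codes were stored.
function collect_codes(env, codes,    name, value, n) {
    n = 0
    for (name in env) {
        if (name ~ /^AWKVAR/) {
            value = env[name]
            if (!(value in codes)) n++
            codes[value] = ""
        }
    }
    return n
}

BEGIN {
    # Make both names arrays before use (portable across AWK versions).
    split("", fake)
    split("", Codes)
    # Stand-in for ENVIRON; pass ENVIRON itself to scan the real environment.
    fake["AWKLINES"] = 4
    fake["AWKVAR01"] = "ABC"
    fake["AWKVAR0G"] = "CDE"
    fake["AWKVAR03"] = "GHI"
    fake["MYAWKVARS"] = "XYZ"   # hypothetical unrelated variable, skipped by "^"
    printf("%d codes found\n", collect_codes(fake, Codes))
}
```

With the stand-in values this should report 3 codes found. The intermediate variable value follows the same caution as the main program about using array data directly as an array index.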
This brings us to the batch program that manages what is really the conversion of a Web page into a Quicken input file (the original case that prompted all of this). Wouldn't it be nice if the batch program could get the page as well as process it? It can, but that requires another supplemental language, one that can deal with the Internet. Such a language is available free for 36 different platforms: Rebol. Unfortunately, DOS is not one of the platforms, but Win9x and NT4 are. The original task was to run on Win9x, so it's fair to use Rebol to get the page.
REBOL [
Title: "GetWebPage"
Date: 19-Sept-1999
]
write %shares.htm read /batch/shares.htm
quit
Obviously you can change the URL to whatever you need, and the file name too; just make sure to prefix the file name with '%'. The header is unimportant to us but essential to the language interpreter, so you might as well leave it as-is or change only the title and date. Don't do anything else unless you are familiar with the language.
@echo off
call setnames
set oldpath=%path%
path c:\progra~1\mawk;c:\progra~1\rebol;%path%
start /w rebol -s GetWebPage.r
awk -fgetshrs.awk shares.htm > report.txt
set path=%oldpath%
type report.txt
pause
The pause command allows the program's output to stay on the screen when it's launched by double clicking on the file or from an icon. If this is the method always used to launch it, then the third line from the end, the one that restores the path, is not really needed.
** Copyright 1995, 1996, 1997, 1998, 1999, 2000, 2001 Ted Davis - see License, included by reference. **
Input and feedback from readers are welcome. NOTE: the subject of the message must contain the word "batch" for the message to get past the spam filter.