WrpCsv Abstract Data Type (ADT)

Introduction

This document describes the WrpCsv Abstract Data Type (ADT), which handles Comma Separated Values (CSV) files. The package provides an Application Programming Interface (API) that simplifies working with CSV files.

Included Files

The files that are included in this package are:

csvfmt
This program uses the wrpcsv ADT to let users write format files that, much like mail merge templates, reformat the contents of CSV files.
csvxml
This program uses the wrpcsv ADT to convert CSV files into XML formatted files.
wrpcsv.h
This is the header file that user programs accessing this ADT should include. It defines the API that any program using this package must call to handle CSV files.
wrpcsv.c
This is the actual implementation of the ADT. You probably don't need to look at this unless you intend to modify the ADT and/or need a better understanding of how it works.
driver.c
This is a sample program that was used to develop and test the ADT. For those who are too impatient to read the documentation, this program will illustrate the basic use of the ADT.

Using the Programs

CsvFmt

Using CsvFmt

This program takes a format file and a CSV file and produces a new file that contains the content of the format file, while inserting the fields from the CSV file. This is similar to what you can do with mail merge programs. You define the content of the file you want to create, while indicating what fields should be inserted where in the file.

For example, suppose we have a CSV file (addr.csv) that contains a name, phone number, and e-mail address:

John Jones,555-1234,jjones@whoever.com
"Sam Smith, Jr.",555-2345,"ssmith@here.com"

And we create a format file (fmt.txt):

Name: $[0], Phone: $[1], E-Mail: $[2]

In the format file, a field of the form $[X] indicates that field "X" should be inserted (where zero is the first field, one is the next, etc.).

If we then run this program using the following command:

csvfmt -f fmt.txt addr.csv

The output of this program would then be:

Name: John Jones, Phone: 555-1234, E-Mail: jjones@whoever.com
Name: Sam Smith, Jr., Phone: 555-2345, E-Mail: ssmith@here.com
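
The substitution step shown above can be sketched in C. This is only an illustration of the idea, not csvfmt's actual code: the function name expand_format, its parameters, and the decision to copy malformed placeholders through unchanged are all assumptions made for the example. The missing argument is the text to use when a record has no field with the requested number.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Append up to n bytes of s to out, truncating so out stays NUL-terminated. */
static void append(char *out, size_t outsz, const char *s, size_t n)
{
    size_t used = strlen(out);
    if (used + 1 >= outsz)
        return;                          /* buffer already full */
    size_t room = outsz - 1 - used;
    if (n > room)
        n = room;
    memcpy(out + used, s, n);
    out[used + n] = '\0';
}

/* Hypothetical helper: copy fmt to out, replacing each $[N] with fields[N].
   When N is not below nfields, the string `missing` is used instead. */
void expand_format(const char *fmt, const char *const *fields, int nfields,
                   const char *missing, char *out, size_t outsz)
{
    assert(fmt != NULL && out != NULL && outsz > 0);
    out[0] = '\0';
    while (*fmt != '\0') {
        if (fmt[0] == '$' && fmt[1] == '[' && isdigit((unsigned char)fmt[2])) {
            char *end;
            long n = strtol(fmt + 2, &end, 10);
            if (*end == ']') {           /* a complete $[N] placeholder */
                const char *val = (n < nfields) ? fields[n] : missing;
                append(out, outsz, val, strlen(val));
                fmt = end + 1;
                continue;
            }
        }
        append(out, outsz, fmt, 1);      /* ordinary character */
        fmt++;
    }
}
```

Calling expand_format once per CSV record with the text of fmt.txt produces one output line per record. Passing "[nodata]" as missing corresponds to the default behaviour described for the -n argument, and passing "" corresponds to using -n.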

We probably want to add some explanation before the list, and maybe some note afterwards. So, we can create text files (beg.txt and end.txt) to include what we want to see before and after the list.

Here is beg.txt:

Here is the list of people:

And here is end.txt:

Please don't pester these people

Now, when we run this program with the command:

csvfmt -f fmt.txt -b beg.txt -eend.txt addr.csv

The output of this program would then be:

Here is the list of people:
Name: John Jones, Phone: 555-1234, E-Mail: jjones@whoever.com
Name: Sam Smith, Jr., Phone: 555-2345, E-Mail: ssmith@here.com
Please don't pester these people

Command Line Arguments

The following arguments are supported by the csvfmt program:

-b file-name
Use the file "file-name" as the beginning text file. The contents of this file will be included in the output file before the CSV list appears.
-e file-name
Use the file "file-name" as the ending text file. The contents of this file will be included in the output file immediately after the CSV list.
-f file-name
Use the file "file-name" as the format file. The contents of this file will be included for each line in the CSV file, with the corresponding fields from the CSV file being inserted in place of "$[0]", "$[1]", etc. Notice that this file can be more than one line, in which case, each line from the CSV file would take as many lines as are in the format file.
-n
By default, if a field is specified but there is no field in a given record, then the string "[nodata]" is substituted. By including this option, that is suppressed, and nothing will be printed in its place.
-o file-name
Write the output to the specified file ("file-name"). If this option isn't specified, then the result is written to the screen.
-s n
Causes the program to skip the first n lines of the input file. This is useful if the file starts with one or more header lines.
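
The effect of the -s option can be pictured with a short C sketch. It is not taken from csvfmt; the name skip_lines and its return convention are assumptions made for illustration. It discards lines one character at a time, so the length of a header line does not matter.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: discard up to n complete lines from fp.
   Returns how many newline-terminated lines were actually skipped. */
int skip_lines(FILE *fp, int n)
{
    assert(fp != NULL);
    int skipped = 0;
    while (skipped < n) {
        int c;
        while ((c = fgetc(fp)) != EOF && c != '\n')
            ;                       /* consume the rest of the current line */
        if (c == EOF)
            break;                  /* ran out of input before n lines */
        skipped++;
    }
    return skipped;
}
```

After skip_lines(fp, 1) returns, the next read from fp starts at the first data record, which is the position a program would want before it begins processing the CSV lines.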

CsvXml

Using CsvXml

This program will convert a CSV file into a simple XML file. XML is a popular format that is used to pass information between programs. If you have a program that creates CSV files, and another one that expects XML files, this program can be used to convert the file.

An XML file contains a collection of records. Each record contains attributes, which are similar to field names. You may specify the names of the collection and records, while the field (or attribute) names are derived from the input file, if the first line of the CSV file contains them, or from a separate file that you create. When exporting files from Excel or Access, you can usually have the first line contain the field names. If that isn't possible, then you can create a text file with the names of the fields and use that instead.

For example, suppose we have a CSV file (addr.csv) that contains a name, phone number, and e-mail address:

John Jones,555-1234,jjones@whoever.com
"Sam Smith, Jr.",555-2345,"ssmith@here.com"

Since this file doesn't contain the field names, we will have to create another text file (flds.txt) with the field names:

Name,Phone,EMail


Now, if we run this program, it will ask us for some information. Enter the following information:

Prompt                  Response
Input File              addr.csv
Record Name             Person
Collection Name         People
Output File             addr.xml
File with Field Names   flds.txt


The output file (addr.xml) would contain the following:

<People>

<Person>
<Name>Name</Name>
<Phone>Phone</Phone>
<EMail>EMail</EMail>
</Person>

<Person>
<Name>John Jones</Name>
<Phone>555-1234</Phone>
<EMail>jjones@whoever.com</EMail>
</Person>

<Person>
<Name>Sam Smith, Jr.</Name>
<Phone>555-2345</Phone>
<EMail>ssmith@here.com</EMail>
</Person>

</People>


We could have entered all the above information on the command line by entering the following command:

csvxml -f flds.txt -c People -r Person -at -oaddr.xml addr.csv

This command tells the program to read addr.csv and create addr.xml, calling the collection People, and each record Person. XML can be written in two ways: as a tree structure, or by specifying attributes within the XML tag. This command will create a tree formatted XML file. The output file should be the same as above when we were prompted for the information.

If we use the command requesting fields be specified as attributes:

csvxml -f flds.txt -c People -r Person -aa -oaddr.xml addr.csv

Then the output file would look like this:

<People>

<Person
  Name="Name"
  Phone="Phone"
  EMail="EMail"
/>

<Person
  Name="John Jones"
  Phone="555-1234"
  EMail="jjones@whoever.com"
/>

<Person
  Name="Sam Smith, Jr."
  Phone="555-2345"
  EMail="ssmith@here.com"
/>

</People>


Both files contain the same information, and any program expecting an XML file should handle both files the same way. Which you use is more a matter of preference than anything else.
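
The two layouts differ only in how each record is written. The following C sketch shows one way to produce either layout for a single record. It is an illustration and not csvxml's source: the name format_record, its parameters, and the use of 't' and 'a' as a style selector (borrowed from the -a argument) are assumptions. The sketch does not escape characters such as & or <, which real data may require.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Append formatted text to out, truncating if the buffer is too small. */
static void addf(char *out, size_t outsz, const char *fmt, ...)
{
    size_t used = strlen(out);
    va_list ap;
    va_start(ap, fmt);
    vsnprintf(out + used, outsz - used, fmt, ap);
    va_end(ap);
}

/* Hypothetical helper: write one record into out.
   style 'a' writes the fields as attributes; anything else writes a tree. */
void format_record(char *out, size_t outsz, const char *record,
                   const char *const *names, const char *const *values,
                   int n, char style)
{
    assert(out != NULL && outsz > 0);
    out[0] = '\0';
    if (style == 'a') {
        addf(out, outsz, "<%s\n", record);
        for (int i = 0; i < n; i++)
            addf(out, outsz, "  %s=\"%s\"\n", names[i], values[i]);
        addf(out, outsz, "/>\n");
    } else {
        addf(out, outsz, "<%s>\n", record);
        for (int i = 0; i < n; i++)
            addf(out, outsz, "<%s>%s</%s>\n", names[i], values[i], names[i]);
        addf(out, outsz, "</%s>\n", record);
    }
}
```

Wrapping the records in an opening and closing collection tag (People in the example above) completes the file in either style.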

Command Line Arguments

As mentioned above, you can run the csvxml program by just typing the name of the program (or double-clicking it). You will then be prompted for all the needed values. While this makes it easy to run, if you are planning on doing this often, you will probably want to set up a batch job to run it with all the arguments you normally want to use.

The csvxml program accepts the following arguments:

-a attribute-type
This argument lets you specify either a tree-formatted XML file or attribute-value formatted XML file by specifying either t or a for attribute-type.
-r record-name
Specify what name you want to use for each record in the output file.
-c collection-name
Specify the name you want applied to the entire collection.
-f file-name
If the first line of the input file contains field names, then don't use this option; otherwise, specify the name of the file that contains the field names for the file.
-o file-name
This argument specifies the name of the output file.
-d
This flag will turn on debugging, which you will probably never want to do.

The last argument should be the name of the input file. There is no prefix for this argument.

Application Programming Interface

This section describes the functions that you can (or must) call when using the WrpCsv package to process CSV files. This section will only be of interest to a programmer who is intending to write a program to use this package to manipulate CSV files. The programmer should refer to the header file wrpcsv.h for a complete list of the symbolic constants and prototypes of all public functions.

Error Detection and Handling

Return Codes

In general, a negative return code indicates some sort of error, although sometimes the error is minor and expected (e.g., EOF). The following symbolic constants are defined in the header file and should be used within client programs when appropriate:

WRPCSV_INVALID
An invalid argument was passed to the routine returning this value.
WRPCSV_MEMORY
The program was unable to allocate sufficient memory to satisfy the request.
WRPCSV_EOF
An end-of-file was encountered when reading from the input CSV file. This is a "normal" error code: the value is negative, like any other error, but every program expects to reach the end of the input file eventually.
WRPCSV_OK
This is the normal return code, which indicates that the request was completed normally.


Error Analysis Routines

When an error code is returned, the program can obtain a more detailed error message by calling one or both of the following routines:

char *wrpCsvGetErrMsg(void)
This routine returns a more detailed message for the most recent error that was encountered. It expects no arguments, and the return value is the error message.
char *wrpCsvGetLine(int id)
This routine can be called at any time (in other words, not just after an error). It returns the most recent line that was read. The return value is the actual input line as it was read from the input file.

Common Routines

This section contains the most common functions that will be used when processing CSV files. Less common routines will be listed later.

int wrpCsvCreate(void)
This routine creates the actual CSV structure. It must be called before any requests to read data from the input file. The value returned is a handle, which is a unique value that must be passed to subsequent wrpcsv routines that take an id argument.
int wrpCsvDestroy(int id)
This should be the last routine that is called. It causes the control structures created when wrpCsvCreate was called to be released to the system. If your program is going to terminate, then this routine isn't technically necessary, since the system will free up any allocated memory when a program terminates, but it is a good idea to include it anyway. Once this routine is called, the handle is no longer valid and cannot be used again. If you want to process more CSV files, you must call wrpCsvCreate again.
int wrpCsvReadLine(int id,FILE *fp)
This routine causes the next line in the specified input file to be read into the CSV structure defined by id and parsed into individual fields. If this routine returns normally, you can extract individual fields. This is the routine you would normally call when you are reading a CSV file.
int wrpCsvParseString(int id,char *p)
This routine parses the string that is passed into separate fields and stores them in the CSV structure defined by the id value. If this routine returns normally, you can extract individual fields. This is the routine you would normally call if you have already read the string to be parsed, or if your program builds a string which is then to be parsed.
int wrpCsvGetNumFields(int id)
This function returns the number of fields that were contained in the most recent line parsed by the id CSV structure.
char *wrpCsvGetField(int id, int ndx)
This function will return the requested field from the specified CSV structure. The field numbers start with zero, so an ndx value of one returns the second field.
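
A typical call sequence for these routines might look like the sketch below. The function last_field and the macro HAVE_WRPCSV are hypothetical names invented for this example. So that the sketch compiles without the package, it supplies small stand-in versions of the documented routines. These stand-ins are assumptions that copy only the signatures listed above, and they are not the package's implementation. The error checks rely on the rule that a negative return value indicates an error. Whether wrpCsvCreate itself follows that rule should be confirmed in wrpcsv.h.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#ifdef HAVE_WRPCSV
#include "wrpcsv.h"            /* the real package */
#else
/* Stand-ins: assumptions, NOT the package's code. They copy only the
   documented signatures and split on plain commas (no quote handling). */
static char sbuf[256];
static char *sfld[16];
static int snum;
int wrpCsvCreate(void) { return 0; }
int wrpCsvDestroy(int id) { (void)id; return 0; }
int wrpCsvGetNumFields(int id) { (void)id; return snum; }
char *wrpCsvGetField(int id, int ndx)
{
    (void)id;
    return (ndx >= 0 && ndx < snum) ? sfld[ndx] : NULL;
}
int wrpCsvParseString(int id, char *p)
{
    (void)id;
    if (p == NULL || strlen(p) >= sizeof sbuf)
        return -1;
    strcpy(sbuf, p);
    snum = 0;
    for (char *s = sbuf; s != NULL && snum < 16; ) {
        sfld[snum++] = s;
        s = strchr(s, ',');
        if (s != NULL)
            *s++ = '\0';
    }
    return 0;
}
#endif

/* Hypothetical example: parse one line and copy its last field into out.
   Returns 0 on success and -1 on any error. */
int last_field(char *line, char *out, size_t outsz)
{
    assert(line != NULL && out != NULL && outsz > 0);
    int id = wrpCsvCreate();               /* handle for all later calls */
    if (id < 0)
        return -1;                         /* negative is treated as an error */
    int rc = -1;
    if (wrpCsvParseString(id, line) >= 0) {
        int n = wrpCsvGetNumFields(id);
        char *f = (n > 0) ? wrpCsvGetField(id, n - 1) : NULL;
        if (f != NULL) {
            snprintf(out, outsz, "%s", f); /* copy before the handle goes away */
            rc = 0;
        }
    }
    wrpCsvDestroy(id);                     /* the handle is invalid after this */
    return rc;
}
```

The field is copied out before wrpCsvDestroy is called because the handle, and presumably any storage behind it, is no longer usable afterwards. A program reading from a file would follow the same pattern with wrpCsvReadLine in a loop, stopping when a negative value such as WRPCSV_EOF is returned.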

Routines Affecting Global Parameters

The following routines will affect all CSV structures. They should be used with caution, and called before creating any CSV structures. There is another set of functions (described later) that affect similar values for a single CSV structure. If you only need a value changed for a single CSV structure, then you should consider using one of those functions.

int wrpCsvSetDefaultMaxLine(int val)
This function will change the default maximum line size for all subsequent CSV structures. The default is BUFSIZ, the standard I/O buffer size defined in <stdio.h>.
int wrpCsvSetDefaultMaxFields(int val)
This function resets the default maximum number of fields that a single line can have in all subsequent CSV structures. The default is 100.
int wrpCsvSetDefaultDelims(char *delim)
This function resets the default field delimiters for all subsequent CSV structures. Once set, any of the characters contained in this string will be interpreted as a delimiter for the fields within the CSV. If any of these characters appear within a field, then the entire field must be quoted in order to prevent the character from causing the field to be split. The default delimiter is the comma.
int wrpCsvSetMaxEnt(int val)
This function sets the maximum number of CSV structures that can be created. The default is 100.
int wrpCsvGetDefaultMaxLine(void)
This function returns the current value for maximum line size.
int wrpCsvGetDefaultMaxFields(void)
This function returns the current value for the maximum number of fields.
char *wrpCsvGetDefaultDelims(void)
This function returns the default field delimiters. If a NULL value is returned, then the default is the comma.
int wrpCsvGetMaxEnt(void)
This function returns the current maximum number of CSV structures that can be created at any one time. If a CSV is created and then destroyed, the CSV does not figure in this count.
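
The interaction between delimiters and quoting described for wrpCsvSetDefaultDelims can be illustrated with a small stand-alone splitter. This is a sketch of the general technique and not the code in wrpcsv.c: the name split_quoted and its interface are assumptions. It treats every character in delims as a separator unless the character appears between double quotes, and it removes the quotes themselves. It does not handle a doubled quote as an escaped quote inside a quoted field.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: split line in place on any character in delims,
   keeping delimiters that appear inside double quotes.
   Returns the number of fields stored in fields (at most max). */
int split_quoted(char *line, const char *delims, char **fields, int max)
{
    assert(line != NULL && delims != NULL && fields != NULL);
    int n = 0;
    char *src = line;
    while (n < max) {
        char *dst = src;              /* field text is compacted over quotes */
        fields[n++] = dst;
        int quoted = 0;
        while (*src != '\0' && (quoted || strchr(delims, *src) == NULL)) {
            if (*src == '"')
                quoted = !quoted;     /* toggle state and drop the quote */
            else
                *dst++ = *src;
            src++;
        }
        int more = (*src != '\0');    /* stopped on a delimiter, not the end */
        *dst = '\0';
        if (!more)
            break;
        src++;                        /* step past the delimiter */
    }
    return n;
}
```

With "," as the delimiter string, the second line of addr.csv splits into three fields and the comma in "Sam Smith, Jr." is preserved. With ",;" as the delimiter string, both characters separate fields.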
