
Introduction

This document describes the WrpCsv Abstract Data Type (ADT), which handles Comma-Separated Values (CSV) files. The package provides an Application Programming Interface (API) that simplifies working with CSV files.

Included Files

The files that are included in this package are:

wrpcsv.c
This is the implementation of the ADT.
wrpcsv.h
This is the header file that user programs must include to access the ADT. It defines the API that your programs use to handle CSV files.
driver.c
This is a sample program that was used to develop and test the ADT.
wrpfmt
This program uses the wrpcsv ADT to let users create format files that work much like mail-merge templates, for tasks such as reformatting CSV files.
wrpxml
This program uses the wrpcsv ADT to convert CSV files into XML files.


Using the Programs

WrpFmt

Quick Start

This program takes a format file and a CSV file and produces a new file that contains the content of the format file, while inserting the fields from the CSV file. This is similar to what you can do with mail merge programs. You define the content of the file you want to create, while indicating what fields should be inserted where in the file.

For example, suppose we have a CSV file (addr.csv) that contains a name, phone number, and e-mail address:

Next, we create a format file (fmt.txt). In a format file, a reference of the form $[X] indicates that field "X" should be inserted. Field 0 is the first field, field 1 is the second, and so on:

If we then run this program using the following command:

The output of this program would then be:

We probably want to add some explanation before the list, and perhaps a note after it. To do this, we can create text files (beg.txt and end.txt) containing what we want to see before and after the list.

Here is beg.txt:

And here is end.txt:

Now, when we run this program with the command:

The output of this program would then be:


Detailed Documentation

The format file can contain as many lines as necessary. It is repeated once for each line of the CSV file, with the fields inserted where the format file indicates. A field reference has the form:

$[X:s]

where "X" is the field number (starting with zero) and "s" is an optional field size, which sets the minimum width of the inserted field. This option is useful when you want the output to line up in columns. Include the colon only when you specify a width; without a width, the reference is simply $[X].

The following arguments are supported by the wrpfmt program:

-b file-name
Use the file "file-name" as the beginning text file. The contents of this file will be included in the output file before the CSV list appears.
-e file-name
Use the file "file-name" as the ending text file. The contents of this file will be included in the output file immediately after the CSV list.
-f file-name
Use the file "file-name" as the format file. Its contents are written once for each line in the CSV file, with the corresponding fields from that line inserted in place of "$[0]", "$[1]", etc. The format file may span several lines; in that case, each line of the CSV file produces as many output lines as the format file contains.
-o file-name
Write the output to the specified file ("file-name"). If this option isn't specified, then the result is written to the screen.
-n
By default, if the format file refers to a field that a given record doesn't contain, the string "[nodata]" is substituted. This option suppresses that substitution, so nothing is printed in its place.
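The substitution rules above can be illustrated with a short C function. This is not the wrpfmt source. It is a minimal sketch that assumes three things: references are written $[X] or $[X:s]; padded fields are left-justified; and, under -n, padding is still applied so that columns stay aligned. The name expand_format is hypothetical, and its suppress_nodata argument plays the role of the -n option.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper, not part of wrpfmt: expand one format string for
 * one record. References are assumed to look like $[X] or $[X:s].
 * A reference to a field the record lacks prints "[nodata]" unless
 * suppress_nodata is nonzero (the -n behavior). Padding to width s is
 * assumed to be left-justified and is applied even to suppressed fields. */
void expand_format(FILE *out, const char *fmt,
                   const char *const fields[], int nfields,
                   int suppress_nodata)
{
    const char *p = fmt;
    while (*p != '\0') {
        if (p[0] == '$' && p[1] == '[' && isdigit((unsigned char)p[2])) {
            char *end;
            long ndx = strtol(p + 2, &end, 10);   /* field number X */
            long width = 0;
            if (*end == ':' && isdigit((unsigned char)end[1]))
                width = strtol(end + 1, &end, 10); /* optional width s */
            if (*end == ']') {
                const char *val = (ndx < nfields) ? fields[ndx]
                                  : (suppress_nodata ? "" : "[nodata]");
                fprintf(out, "%-*s", (int)width, val);
                p = end + 1;                      /* skip past the ']' */
                continue;
            }
        }
        /* Not a well-formed reference: copy the character unchanged. */
        fputc(*p, out);
        p++;
    }
}
```

For example, with fields "Ann" and "555-1234", the format "$[0:5]|$[1]|$[2]" expands to "Ann  |555-1234|[nodata]". With suppress_nodata set, the last reference produces nothing.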


WrpXml

Quick Start

This program will convert a CSV file into a simple XML file. XML is a popular format that is used to pass information between programs. If you have a program that creates CSV files, and another one that expects XML files, this program can be used to convert the file.

An XML file contains a collection of records. Each record contains attributes, which correspond to the fields in a line of the CSV file. You choose the names of the collection and of each record. The field (or attribute) names come either from the first line of the CSV file, if it contains them, or from a separate file that you create. When exporting files from Excel or Access, you can usually have the first line contain the field names. If that isn't possible, create a text file with the names of the fields and use that instead.

For example, suppose we have a CSV file (addr.csv) that contains a name, phone number, and e-mail address:

Since this file doesn't contain the field names, we will have to create another text file (flds.txt) with the field names:

Now, if we run this program without any arguments, it will prompt us for the information it needs. Enter the following responses:

Prompt                   Response
Input File               addr.csv
Record Name              Person
Collection Name          People
Output File              addr.xml
File with Field Names    flds.txt

The output file (addr.xml) would contain the following:

We could have entered all the above information on the command line by entering the following command:

This command tells the program to read addr.csv and create addr.xml, naming the collection People and each record Person. The program can write XML in either of two styles: as a tree structure, or with the fields specified as attributes within each record's XML tag. This command creates a tree-formatted XML file, so the output file should be identical to the one produced above when we were prompted for the information.

If we instead run the command requesting that fields be written as attributes:

Then the output file would look like this:

Both files contain the same information. Which style you use is largely a matter of preference, although a program that reads the XML file may expect one style or the other, so check what it requires.
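To make the difference between the two styles concrete, here is a minimal C sketch that writes one record both ways. The function names, the indentation, and the choice of one child element per field in the tree style are assumptions for illustration, not wrpxml's actual output. The sketch also escapes &, <, > and ", which XML requires in text and attribute values. This document does not say whether wrpxml does the same.

```c
#include <stdio.h>

/* Write s, replacing characters that XML treats specially with entities. */
static void xml_escape(FILE *out, const char *s)
{
    for (; *s != '\0'; s++) {
        switch (*s) {
        case '&': fputs("&amp;", out);  break;
        case '<': fputs("&lt;", out);   break;
        case '>': fputs("&gt;", out);   break;
        case '"': fputs("&quot;", out); break;
        default:  fputc(*s, out);       break;
        }
    }
}

/* Tree style (hypothetical layout): each field becomes a child element.
 * Field and record names are assumed to be valid XML names already. */
void write_record_tree(FILE *out, const char *rec, const char *const names[],
                       const char *const vals[], int n)
{
    fprintf(out, "<%s>\n", rec);
    for (int i = 0; i < n; i++) {
        fprintf(out, "  <%s>", names[i]);
        xml_escape(out, vals[i]);
        fprintf(out, "</%s>\n", names[i]);
    }
    fprintf(out, "</%s>\n", rec);
}

/* Attribute style (hypothetical layout): fields become attributes of
 * a single empty element. */
void write_record_attr(FILE *out, const char *rec, const char *const names[],
                       const char *const vals[], int n)
{
    fprintf(out, "<%s", rec);
    for (int i = 0; i < n; i++) {
        fprintf(out, " %s=\"", names[i]);
        xml_escape(out, vals[i]);
        fputc('"', out);
    }
    fputs("/>\n", out);
}
```

For a record with Name "A&B" and Email "a@b.com", the tree style produces a Person element with two child elements. The attribute style produces the single line <Person Name="A&amp;B" Email="a@b.com"/>.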


Detailed Description

As mentioned above, you can run the wrpxml program by just typing its name (or double-clicking it). It will then prompt you for all the values it needs. This makes it easy to run, but if you plan to run it often, you will probably want to set up a batch file or script that runs it with the arguments you normally use.

The wrpxml program accepts the following arguments:

-a attribute-type
Selects the output style: specify t for a tree-formatted XML file, or a for an attribute-formatted XML file.
-r record-name
Specify what name you want to use for each record in the output file.
-c collection-name
Specify the name you want applied to the entire collection.
-f file-name
If the first line of the input file contains field names, then don't use this option; otherwise, specify the name of the file that contains the field names for the file.
-o file-name
Specifies the name of the output file.
-d
Turns on debugging output. You will rarely, if ever, need this.
The last argument must be the name of the input file. This argument takes no option prefix.
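For anyone writing a similar front end, here is a sketch of how the options listed above could be parsed with POSIX getopt(). It is an illustration, not wrpxml's code. The struct and function names are hypothetical. Unlike wrpxml, which prompts for missing values, this sketch simply reports an error when the input file is missing or an option is invalid.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Hypothetical container for wrpxml-style options. */
struct xml_opts {
    char style;             /* -a: 't' (tree), 'a' (attributes), 0 if absent */
    const char *record;     /* -r record-name */
    const char *collection; /* -c collection-name */
    const char *fieldfile;  /* -f; NULL means names are on the first line */
    const char *output;     /* -o file-name */
    int debug;              /* -d */
    const char *input;      /* last argument, no prefix */
};

/* Returns 0 on success, or -1 on a bad option or missing input file. */
int parse_xml_opts(int argc, char *argv[], struct xml_opts *o)
{
    int c;
    *o = (struct xml_opts){0};          /* everything unset */
    while ((c = getopt(argc, argv, "a:r:c:f:o:d")) != -1) {
        switch (c) {
        case 'a':
            /* Accept exactly "t" or "a". */
            if ((optarg[0] != 't' && optarg[0] != 'a') || optarg[1] != '\0')
                return -1;
            o->style = optarg[0];
            break;
        case 'r': o->record = optarg;     break;
        case 'c': o->collection = optarg; break;
        case 'f': o->fieldfile = optarg;  break;
        case 'o': o->output = optarg;     break;
        case 'd': o->debug = 1;           break;
        default:  return -1;            /* getopt has printed a message */
        }
    }
    if (optind != argc - 1)             /* exactly one input file remains */
        return -1;
    o->input = argv[optind];
    return 0;
}
```

Because getopt() stops at the first non-option argument, placing the input file last, as the text requires, makes it the single argument left at argv[optind].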


Application Programming Interface

This section describes the functions that you can (or must) call when using the WrpCsv package to process CSV files. This section will only be of interest to a programmer who is intending to write a program to use this package to manipulate CSV files. The programmer should refer to the header file wrpcsv.h for a complete list of the symbolic constants and prototypes of all public functions.

Return Codes

A negative return code indicates some sort of error, although sometimes the error is minor and expected (e.g., end of file). The following symbolic constants are defined in the header file and should be used in client programs when appropriate:

WRPCSV_INVALID
An invalid argument was passed to the routine returning this value.
WRPCSV_MEMORY
The program was unable to allocate enough memory to satisfy the request.
WRPCSV_EOF
An end-of-file was encountered when reading from the input CSV file. Although this value is negative, it is a "normal" return code, since a program reading a CSV file expects to reach the end of the file eventually.
WRPCSV_OK
The request completed normally.

When an error code is returned, the program can obtain a more detailed error message by calling one or both of the following routines:

char *wrpCsvGetErrMsg(void)
Returns a more detailed message describing the most recent error. The routine takes no arguments.
char *wrpCsvGetLine(int id)
This routine can be called at any time, not just after an error. It returns the most recent line read by the CSV structure id, exactly as it was read from the input file.


Common Routines

This section contains the most common functions that will be used when processing CSV files. Less common routines will be listed later.

int wrpCsvCreate(void)
Creates a CSV structure. It must be called before any data is read from the input file. The value returned is a handle: a unique value that must be passed as the id argument to subsequent wrpcsv routines.
int wrpCsvDestroy(int id)
This should be the last routine called. It releases the control structures that were allocated when wrpCsvCreate was called. If your program is about to terminate, this call isn't strictly necessary, since the system frees a program's memory when it exits, but it is good practice to include it anyway. Once this routine is called, the handle is no longer valid and cannot be used again. To process more CSV files, call wrpCsvCreate again.
int wrpCsvReadLine(int id, FILE *fp)
Reads the next line from the input file fp into the CSV structure id and parses it into individual fields. If this routine returns normally, you can extract the individual fields. This is the routine you would normally call when reading a CSV file.
int wrpCsvParseString(int id, char *p)
Parses the string p into separate fields and stores them in the CSV structure id. If this routine returns normally, you can extract the individual fields. Use this routine when you have already read the string to be parsed, or when your program builds the string itself.
int wrpCsvGetNumFields(int id)
Returns the number of fields in the most recent line parsed by the CSV structure id.
char *wrpCsvGetField(int id, int ndx)
Returns field ndx from the specified CSV structure. Field numbers start at zero, so an ndx of 1 returns the second field.
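These routines are typically combined as in the following sketch, which reads a CSV file and prints every field. Only print_csv is meant as client code following this document's API. The stand-in definitions above it exist only so that the sketch compiles on its own, and they are not wrpcsv.c. They ignore quoting, support a single structure, and use placeholder values for the WRPCSV_ constants. In a real program, delete them, include wrpcsv.h, and link with wrpcsv.c. Following the Return Codes section, the sketch treats any negative return other than WRPCSV_EOF as an error. It also assumes that a negative value from wrpCsvCreate signals failure.

```c
#include <stdio.h>
#include <string.h>

/* ---- Stand-ins ONLY so this sketch compiles alone. In a real program,
 * replace this block with #include "wrpcsv.h" and link with wrpcsv.c.
 * The constant values here are placeholders, not the real ones. ---- */
#define WRPCSV_OK   0
#define WRPCSV_EOF  (-1)
static char line[BUFSIZ];
static char *flds[100];
static int nflds;
int wrpCsvCreate(void) { return 0; }
int wrpCsvDestroy(int id) { (void)id; return WRPCSV_OK; }
char *wrpCsvGetErrMsg(void) { return "stand-in error"; }
int wrpCsvReadLine(int id, FILE *fp)
{
    (void)id;
    if (fgets(line, sizeof line, fp) == NULL)
        return WRPCSV_EOF;
    line[strcspn(line, "\r\n")] = '\0';
    nflds = 0;                        /* naive split: no quote handling */
    for (char *p = line; nflds < 100; ) {
        flds[nflds++] = p;
        p = strchr(p, ',');
        if (p == NULL) break;
        *p++ = '\0';
    }
    return WRPCSV_OK;
}
int wrpCsvGetNumFields(int id) { (void)id; return nflds; }
char *wrpCsvGetField(int id, int ndx) { (void)id; return flds[ndx]; }
/* ---- End of stand-ins. ---- */

/* Print every field of every line in fp to out.
 * Returns 0 on success, -1 on error. */
int print_csv(FILE *fp, FILE *out)
{
    int id = wrpCsvCreate();
    if (id < 0) {                     /* assumed: negative means failure */
        fprintf(stderr, "create: %s\n", wrpCsvGetErrMsg());
        return -1;
    }
    int rc, lineno = 0;
    while ((rc = wrpCsvReadLine(id, fp)) >= 0) {
        lineno++;
        int n = wrpCsvGetNumFields(id);
        for (int i = 0; i < n; i++)
            fprintf(out, "line %d field %d: %s\n",
                    lineno, i, wrpCsvGetField(id, i));
    }
    if (rc != WRPCSV_EOF)             /* negative, but not the expected EOF */
        fprintf(stderr, "read: %s\n", wrpCsvGetErrMsg());
    wrpCsvDestroy(id);                /* handle is invalid after this */
    return rc == WRPCSV_EOF ? 0 : -1;
}
```

The loop relies only on the documented convention that a negative return code ends reading. The check against WRPCSV_EOF then separates normal termination from a real error.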


Routines Affecting Global Parameters

The following routines will affect all CSV structures. They should be used with caution, and called before creating any CSV structures. There is another set of functions (described later) that affect similar values for a single CSV structure. If you only need a value changed for a single CSV structure, then you should consider using one of those functions.

int wrpCsvSetDefaultMaxLine(int val)
Changes the default maximum line size for all subsequently created CSV structures. The default is BUFSIZ, the buffer-size constant defined in <stdio.h>.
int wrpCsvSetDefaultMaxFields(int val)
Changes the default maximum number of fields that a single line may contain, for all subsequently created CSV structures. The default is 100.
int wrpCsvSetDefaultDelims(char *delim)
Changes the default field delimiters for all subsequently created CSV structures. Once set, each character in this string is treated as a field delimiter. If any of these characters appear within a field, the entire field must be quoted so that the character does not split the field. The default delimiter is the comma.
int wrpCsvSetMaxEnt(int val)
Sets the maximum number of CSV structures that can exist at one time. The default is 100.
int wrpCsvGetDefaultMaxLine(void)
Returns the current default maximum line size.
int wrpCsvGetDefaultMaxFields(void)
Returns the current default maximum number of fields.
char *wrpCsvGetDefaultDelims(void)
Returns the default field delimiters. A NULL return value means that the default is the comma.
int wrpCsvGetMaxEnt(void)
Returns the current maximum number of CSV structures that can exist at any one time. A structure that has been created and then destroyed no longer counts toward this limit.
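The routines above interact in three ways: the global defaults, the handle returned by wrpCsvCreate, and the wrpCsvSetMaxEnt limit. A small handle table can model these interactions. This is a hypothetical model, not the wrpcsv.c implementation, and every name in it (tbl_create, tbl_destroy, and so on) is made up. It assumes the behavior the text implies: each structure receives a copy of the defaults when it is created, and destroying a structure frees its slot. Refusing to change the limit after the first creation is a design choice of this sketch, mirroring the advice to set globals before creating any structures.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical model of a handle-based ADT; not wrpcsv.c. */
struct entry { int max_line; int max_fields; };

static int default_max_line = BUFSIZ;   /* like the documented default */
static int default_max_fields = 100;
static int max_ent = 100;               /* like wrpCsvSetMaxEnt's limit */
static struct entry **table = NULL;     /* slot i holds handle i, or NULL */

void tbl_set_default_max_line(int v) { default_max_line = v; }

/* Allowed only before the first tbl_create (a choice of this sketch). */
int tbl_set_max_ent(int v)
{
    if (table != NULL || v <= 0) return -1;
    max_ent = v;
    return 0;
}

/* Returns a handle >= 0, or -1 if every slot is used or memory runs out. */
int tbl_create(void)
{
    if (table == NULL &&
        (table = calloc((size_t)max_ent, sizeof *table)) == NULL)
        return -1;
    for (int id = 0; id < max_ent; id++) {
        if (table[id] == NULL) {
            struct entry *e = malloc(sizeof *e);
            if (e == NULL) return -1;
            e->max_line = default_max_line;     /* defaults copied now, */
            e->max_fields = default_max_fields; /* later changes skip e */
            table[id] = e;
            return id;
        }
    }
    return -1;
}

/* Frees the slot so it counts no longer toward max_ent. */
int tbl_destroy(int id)
{
    if (table == NULL || id < 0 || id >= max_ent || table[id] == NULL)
        return -1;
    free(table[id]);
    table[id] = NULL;
    return 0;
}

int tbl_max_line(int id)
{
    if (table == NULL || id < 0 || id >= max_ent || table[id] == NULL)
        return -1;
    return table[id]->max_line;
}
```

In this model, slot numbers are reused, so a stale handle may later refer to a newer structure. This is one reason not to use a handle again after destroying it.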


Routines Affecting Specific CSV

The following routines change the parameters of a single CSV structure. In general, use these routines rather than the global ones. Because they take the handle returned by wrpCsvCreate, they must be called after the structure has been created, normally before any lines are read or parsed with it.

int wrpCsvSetMaxLine(int id, int val)
Resets the maximum line length for the specified CSV structure. The argument id should be the value returned by wrpCsvCreate, and val is the new value for this parameter.
int wrpCsvSetMaxFields(int id, int val)
Resets the maximum number of fields allowed for the specified CSV structure. The argument id should be the value returned by wrpCsvCreate, and val is the new value for this parameter.
int wrpCsvSetDelims(int id, char *delim)
Resets the set of delimiter characters for the specified CSV structure. The argument id should be the value returned by wrpCsvCreate, and delim is the new set of delimiters. If delim is NULL, the delimiters are reset to the default value of a comma.
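The rule described under wrpCsvSetDefaultDelims, that a field containing a delimiter must be quoted, is the heart of CSV parsing. The sketch below splits a line in place using an arbitrary delimiter set, as wrpCsvSetDelims allows. It is an independent illustration, not the wrpcsv.c parser, and the name split_fields is hypothetical. This document does not say how wrpcsv handles quotes embedded in a field. The sketch assumes the common RFC 4180 convention, in which a doubled quote ("") inside a quoted field stands for a literal quote.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: split line in place into at most maxf fields,
 * storing pointers in f[]. Any character in delims ends a field unless
 * it appears inside double quotes. Quote marks are removed, and ""
 * inside quotes becomes one literal quote (assumed RFC 4180 style).
 * Returns the number of fields, or -1 if the line has more than maxf. */
int split_fields(char *line, const char *delims, char *f[], int maxf)
{
    int n = 0;
    char *src = line, *dst = line;   /* dst never passes src */
    for (;;) {
        if (n == maxf) return -1;
        f[n++] = dst;                /* next field starts at dst */
        int quoted = 0;
        while (*src != '\0') {
            if (*src == '"') {
                if (quoted && src[1] == '"') {   /* "" inside quotes */
                    *dst++ = '"';
                    src += 2;
                } else {
                    quoted = !quoted;            /* open or close quotes */
                    src++;
                }
            } else if (!quoted && strchr(delims, *src) != NULL) {
                break;                           /* unquoted delimiter */
            } else {
                *dst++ = *src++;
            }
        }
        if (*src == '\0') {
            *dst = '\0';
            return n;
        }
        src++;            /* skip the delimiter */
        *dst++ = '\0';    /* terminate the current field */
    }
}
```

With delims set to ";|", the line x;y|"a;b" yields three fields: x, y, and a;b. The quotes protect the semicolon inside the last field.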
