This document describes the WrpCsv Abstract Data Type (ADT), which handles Comma Separated Values (CSV) files. This program provides an Application Programming Interface (API) to simplify working with CSV files.
The files that are included in this package are:
This program takes a format file and a CSV file and produces a new file that contains the content of the format file, while inserting the fields from the CSV file. This is similar to what you can do with mail merge programs. You define the content of the file you want to create, while indicating what fields should be inserted where in the file.
For example, suppose we have a CSV file (addr.csv) that contains a name, phone number, and e-mail address:

    John Jones,555-1234,[email protected]
    "Sam Smith, Jr.",555-2345,"[email protected]"

And we create a format file (fmt.txt):

    Name: $[0], Phone: $[1], E-Mail: $[2]

In the format file, a marker of the form $[X] indicates that field "X" should be inserted (where zero is the first field, one is the next, and so on).
If we then run this program using the following command:

    csvfmt -f fmt.txt addr.csv

The output of this program would then be:

    Name: John Jones, Phone: 555-1234, E-Mail: [email protected]
    Name: Sam Smith, Jr., Phone: 555-2345, E-Mail: [email protected]

We probably want to add some explanation before the list, and maybe some note afterwards. So, we can create text files (beg.txt and end.txt) to include what we want to see before and after the list.
Here is beg.txt:

    Here is the list of people:

And here is end.txt:

    Please don't pester these people

Now, when we run this program with the command:

    csvfmt -f fmt.txt -b beg.txt -eend.txt addr.csv

The output of this program would then be:

    Here is the list of people:
    Name: John Jones, Phone: 555-1234, E-Mail: [email protected]
    Name: Sam Smith, Jr., Phone: 555-2345, E-Mail: [email protected]
    Please don't pester these people

The format file can be as many lines as necessary. The format file will be duplicated for each line within the CSV file, with the fields being inserted where indicated in the format file. The field name is of the form:

    $[X:s]

Where "X" is the field number (starting with zero), and "s" is an optional field size, which sets the minimum width of the field. This option is useful if you want the output to be in columns. The colon should only appear if you are specifying a width.
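The substitution rule above can be sketched in C. This is only an illustration of the $[X] and $[X:s] markers, not the csvfmt source: the function name `expand_format` is hypothetical, and the choices to left-justify padded fields, to expand out-of-range field numbers to an empty string, and to copy malformed markers through unchanged are all assumptions.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Expand one format line into out (capacity outsz), replacing each $[X] or
 * $[X:s] with fields[X], padded on the right to at least s characters.
 * Returns the length written, or -1 if the result would not fit.
 * Assumptions (not taken from csvfmt): out-of-range field numbers expand to
 * an empty string, and malformed markers are copied through unchanged.
 */
int expand_format(const char *fmt, const char *const *fields, int nfields,
                  char *out, size_t outsz)
{
    size_t len = 0;

    assert(fmt != NULL && out != NULL && outsz > 0);

    while (*fmt != '\0') {
        long ndx = 0, width = 0;
        int is_marker = 0;

        /* Try to recognize "$[" digits [ ":" digits ] "]" at this position. */
        if (fmt[0] == '$' && fmt[1] == '[' && isdigit((unsigned char)fmt[2])) {
            char *end;

            ndx = strtol(fmt + 2, &end, 10);
            if (*end == ':' && isdigit((unsigned char)end[1]))
                width = strtol(end + 1, &end, 10);
            if (*end == ']') {
                is_marker = 1;
                fmt = end + 1;          /* continue after the marker */
            }
        }

        if (is_marker) {
            const char *val = (ndx < nfields) ? fields[ndx] : "";
            int n;

            if ((unsigned long)width >= outsz - len)
                return -1;              /* padding alone would not fit */
            n = snprintf(out + len, outsz - len, "%-*s", (int)width, val);
            if (n < 0 || (size_t)n >= outsz - len)
                return -1;
            len += (size_t)n;
        } else {
            if (len + 1 >= outsz)
                return -1;
            out[len++] = *fmt++;        /* ordinary character: copy it */
        }
    }
    out[len] = '\0';
    return (int)len;
}
```

Calling this once per CSV line, with the fields of that line, reproduces the "format file duplicated for each line" behavior described above. With fields "John Jones" and "555-1234", the format `Name: $[0], Phone: $[1]` expands to `Name: John Jones, Phone: 555-1234`.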
The following arguments are supported by the wrpfmt program:
An XML file contains a collection of records. Each record contains attributes, which are similar to field names. You may specify the names of the collection and records, while the field (or attribute) names are derived from the input file, if the first line of the CSV file contains them, or from a separate file that you create. When exporting files from Excel or Access, you can usually have the first line contain the field names. If that isn't possible, then you can create a text file with the names of the fields and use that instead.
For example, suppose we have a CSV file (addr.csv) that contains a name, phone number, and e-mail address:

    John Jones,555-1234,[email protected]
    "Sam Smith, Jr.",555-2345,"[email protected]"

Since this file doesn't contain the field names, we will have to create another text file (flds.txt) with the field names:

    Name,Phone,EMail

Now, if we run this program, it will ask us for some information. Enter the following information:
Prompt | Response |
---|---|
Input File | addr.csv |
Record Name | Person |
Collection Name | People |
Output File | addr.xml |
File with Field Names | flds.txt |
The output file (addr.xml) would contain the following:

    <People>
      <Person>
        <Name>Name</Name>
        <Phone>Phone</Phone>
        <EMail>EMail</EMail>
      </Person>
      <Person>
        <Name>John Jones</Name>
        <Phone>555-1234</Phone>
        <EMail>[email protected]</EMail>
      </Person>
      <Person>
        <Name>Sam Smith, Jr.</Name>
        <Phone>555-2345</Phone>
        <EMail>[email protected]</EMail>
      </Person>
    </People>

We could have entered all the above information on the command line by entering the following command:

    csvxml -f flds.txt -c People -r Person -at -oaddr.xml addr.csv

This command tells the program to read addr.csv and create addr.xml, calling the collection People and each record Person. XML can be written in two ways: as a tree structure, or by specifying attributes within the XML tag. This command will create a tree-formatted XML file. The output file should be the same as the one produced above, when we were prompted for the information.
If we use the command requesting fields be specified as attributes:

    csvxml -f flds.txt -c People -r Person -aa -oaddr.xml addr.csv

Then the output file would look like this:

    <People>
      <Person Name="Name" Phone="Phone" EMail="EMail" />
      <Person Name="John Jones" Phone="555-1234" EMail="[email protected]" />
      <Person Name="Sam Smith, Jr." Phone="555-2345" EMail="[email protected]" />
    </People>

Both files contain the same information, and any program expecting an XML file should handle both files the same way. Which you use is more a matter of preference than anything else.
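To make the difference between the two layouts concrete, here is a minimal C sketch that writes one record in each style. It is not the csvxml implementation: the function names (`append_text`, `append_escaped`, `record_as_tree`, `record_as_attrs`) are hypothetical, each record is written on a single line, and the escaping of `&`, `<`, `>`, and `"` is an assumption about what a well-formed writer should do rather than documented csvxml behavior.

```c
#include <assert.h>
#include <string.h>

/* Append src to the string in dst (capacity dstsz).
 * Returns 0 on success, -1 if the result would not fit. */
int append_text(char *dst, size_t dstsz, const char *src)
{
    size_t len = strlen(dst);

    if (len + strlen(src) >= dstsz)
        return -1;
    strcpy(dst + len, src);
    return 0;
}

/* Like append_text, but replaces & < > " with XML entities. */
int append_escaped(char *dst, size_t dstsz, const char *src)
{
    for (; *src != '\0'; src++) {
        char one[2];
        const char *rep = one;

        one[0] = *src;
        one[1] = '\0';
        if (*src == '&')      rep = "&amp;";
        else if (*src == '<') rep = "&lt;";
        else if (*src == '>') rep = "&gt;";
        else if (*src == '"') rep = "&quot;";
        if (append_text(dst, dstsz, rep) != 0)
            return -1;
    }
    return 0;
}

/* Tree style: <rec><name>value</name>...</rec> */
int record_as_tree(char *out, size_t outsz, const char *rec,
                   const char *const *names, const char *const *vals, int n)
{
    int i, bad;

    assert(out != NULL && outsz > 0);
    out[0] = '\0';
    bad = append_text(out, outsz, "<") || append_text(out, outsz, rec) ||
          append_text(out, outsz, ">");
    for (i = 0; i < n && !bad; i++) {
        bad = append_text(out, outsz, "<") ||
              append_text(out, outsz, names[i]) ||
              append_text(out, outsz, ">") ||
              append_escaped(out, outsz, vals[i]) ||
              append_text(out, outsz, "</") ||
              append_text(out, outsz, names[i]) ||
              append_text(out, outsz, ">");
    }
    bad = bad || append_text(out, outsz, "</") ||
          append_text(out, outsz, rec) || append_text(out, outsz, ">");
    return bad ? -1 : 0;
}

/* Attribute style: <rec name="value" ... /> */
int record_as_attrs(char *out, size_t outsz, const char *rec,
                    const char *const *names, const char *const *vals, int n)
{
    int i, bad;

    assert(out != NULL && outsz > 0);
    out[0] = '\0';
    bad = append_text(out, outsz, "<") || append_text(out, outsz, rec);
    for (i = 0; i < n && !bad; i++) {
        bad = append_text(out, outsz, " ") ||
              append_text(out, outsz, names[i]) ||
              append_text(out, outsz, "=\"") ||
              append_escaped(out, outsz, vals[i]) ||
              append_text(out, outsz, "\"");
    }
    bad = bad || append_text(out, outsz, " />");
    return bad ? -1 : 0;
}
```

Both functions take the same record name, field names, and field values. Only the placement of the values differs, which is why a consumer of the XML can treat the two outputs as equivalent.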
As mentioned above, you can run the csvxml program by just typing the name of the program (or double-clicking it). You will then be prompted for all the needed values. While this makes it easy to run, if you are planning on doing this often, you will probably want to set up a batch job to run it with all the arguments you normally want to use.
The csvxml program accepts the following arguments:
This section describes the functions that you can (or must) call when using the WrpCsv package to process CSV files. This section will only be of interest to a programmer who is intending to write a program to use this package to manipulate CSV files. The programmer should refer to the header file wrpcsv.h for a complete list of the symbolic constants and prototypes of all public functions.
In general, a negative return code indicates some sort of error, although sometimes the error is minor and expected (e.g., EOF). The following symbolic constants are defined in the header file, and should be used within client programs when appropriate:
Symbolic Constant | Explanation |
---|---|
WRPCSV_INVALID | An invalid argument was passed to the routine returning this value |
WRPCSV_MEMORY | The program was unable to allocate sufficient memory to satisfy the request. |
WRPCSV_EOF | An end-of-file was encountered when reading from the input CSV file. Although the value is negative, this is a "normal" return code, since the program expects to reach the end of the input file eventually. |
WRPCSV_OK | This is the normal return code that indicates that the request was completed normally. |
When an error code is returned, the program can obtain a more detailed error message by calling one or both of the following routines:
Routine | Explanation |
---|---|
char *wrpCsvGetErrMsg(void) | This routine will return a more detailed message for the most recent error that was encountered. The return value is the error message, and the routine expects no arguments. |
char *wrpCsvGetLine(int id) | This routine can be called at any time (in other words, not just after an error). It will return the most recent line that was read. The return value is the actual input line as it was read from the input file. |
This section contains the most common functions that will be used when processing CSV files. Less common routines will be listed later.
Routine | Explanation |
---|---|
int wrpCsvCreate(void) | This routine creates the actual CSV structure. It must be called before any requests to read data from the input file. The value returned is a handle, which is a unique value that must be passed to subsequent wrpcsv routines that take an id argument. |
int wrpCsvDestroy(int id) | This should be the last routine that is called. It causes the control structures created when wrpCsvCreate was called to be released to the system. If your program is going to terminate, then this routine isn't technically necessary, since the system will free up any allocated memory when a program terminates, but it is a good idea to include it anyway. Once this routine is called, the handle is no longer valid and cannot be used again. If you want to process more CSV files, you must call wrpCsvCreate again. |
int wrpCsvReadLine(int id, FILE *fp) | This routine causes the next line in the specified input file to be read into the CSV structure defined by id and parsed into individual fields. If this routine returns normally, you can extract individual fields. This is the routine you would normally call when you are reading a CSV file. |
int wrpCsvParseString(int id, char *p) | This routine parses the string that is passed into separate fields and stores them in the CSV structure identified by the id value. If this routine returns normally, you can extract individual fields. This is the routine you would normally call if you have already read the string to be parsed, or if your program builds a string which is then to be parsed. |
int wrpCsvGetNumFields(int id) | This function returns the number of fields that were contained in the most recent line parsed by the id CSV structure. |
char *wrpCsvGetField(int id, int ndx) | This function will return the requested field from the specified CSV structure. The field numbers start with zero, so an ndx value of one returns the second field. |
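
To illustrate what "parsing a line into fields" involves, here is a small self-contained C sketch of quote-aware splitting. It is not the WrpCsv implementation and does not use the WrpCsv API: the function name `split_csv` is hypothetical, it splits the line in place rather than storing fields in a CSV structure, and treating a doubled quote inside a quoted field as a literal quote is an assumption that this document does not confirm.

```c
#include <assert.h>
#include <string.h>

/*
 * Split line in place into at most maxfields fields. Any character in
 * delims separates fields unless it appears inside double quotes. The
 * quotes themselves are removed, and "" inside a quoted field becomes a
 * single " (an assumption, see the text above). A newline ends the line.
 * Returns the number of fields, or -1 if there are more than maxfields.
 */
int split_csv(char *line, const char *delims, char **fields, int maxfields)
{
    char *src = line;
    int n = 0;

    assert(line != NULL && delims != NULL && fields != NULL);

    for (;;) {
        char *dst = src;    /* the unquoted field is compacted in place */
        int quoted = 0;

        if (n >= maxfields)
            return -1;
        fields[n++] = dst;

        while (*src != '\0' && *src != '\n' &&
               (quoted || strchr(delims, *src) == NULL)) {
            if (*src == '"') {
                if (quoted && src[1] == '"') {
                    *dst++ = '"';       /* "" inside quotes: literal quote */
                    src += 2;
                } else {
                    quoted = !quoted;   /* opening or closing quote */
                    src++;
                }
            } else {
                *dst++ = *src++;
            }
        }

        if (*src == '\0' || *src == '\n') {
            *dst = '\0';
            return n;                   /* end of line: last field done */
        }
        src++;                          /* step over the delimiter */
        *dst = '\0';                    /* terminate the finished field */
    }
}
```

Given the second line of addr.csv from the earlier example, this yields `Sam Smith, Jr.` as a single first field, because the comma inside the quotes is not treated as a delimiter. Passing a longer delimiter string such as `";|"` mirrors the idea that any character in the delimiter set can separate fields.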
The following routines will affect all CSV structures. They should be used with caution, and called before creating any CSV structures. There is another set of functions (described later) that affect similar values for a single CSV structure. If you only need a value changed for a single CSV structure, then you should consider using one of those functions.
Routine | Explanation |
---|---|
int wrpCsvSetDefaultMaxLine(int val) | This function will change the default maximum line size for all subsequent CSV structures. The default is BUFSIZ, the stdio buffer size defined in stdio.h. |
int wrpCsvSetDefaultMaxFields(int val) | This function resets the default maximum number of fields that all subsequent CSV structures can have in a single line. The default is 100. |
int wrpCsvSetDefaultDelims(char *delim) | This function resets the default field delimiters for all subsequent CSV structures. Once set, any of the characters contained in this string will be interpreted as a delimiter for the fields within the CSV. If any of these characters appear within a field, then the entire field must be quoted in order to prevent the character from causing the field to be split. The default delimiter is the comma. |
int wrpCsvSetMaxEnt(int val) | This function sets the maximum number of CSV structures that can be created. The default is 100. |
int wrpCsvGetDefaultMaxLine(void) | This function returns the current value for maximum line size. |
int wrpCsvGetDefaultMaxFields(void) | This function returns the current value for the maximum number of fields. |
char *wrpCsvGetDefaultDelims(void) | This function returns the default field delimiters. If a NULL value is returned, then the default is the comma. |
int wrpCsvGetMaxEnt(void) | This function returns the current maximum number of CSV structures that can be created at any one time. A CSV structure that has been created and then destroyed no longer counts toward this limit. |
The following routines can be used to affect the parameters of a specific CSV structure. In general, these routines should be used rather than the global routines. Because each one takes the handle returned by wrpCsvCreate, it must be called after the CSV structure has been created.
Routine | Explanation |
---|---|
int wrpCsvSetMaxLine(int id, int val) | This function will reset the maximum length of the line for the specified CSV structure. The argument id should be the value returned by wrpCsvCreate, and the argument val should be the new value for this parameter. |
int wrpCsvSetMaxFields(int id, int val) | This function will reset the maximum number of fields allowed for the specified CSV structure. The argument id should be the value returned by wrpCsvCreate, and the argument val should be the new value for this parameter. |
int wrpCsvSetDelims(int id, char *delim) | This function will reset the set of delimiter characters for the specified CSV structure. The argument id should be the value returned by wrpCsvCreate, and the argument delim should be the new value for this parameter. If the new value is NULL, then it will be reset to the default value of a comma. |