Cell processors

Cell processors are an integral part of reading and writing with Super CSV - they automate data type conversions and enforce constraints. They implement the chain of responsibility design pattern - each processor has a single, well-defined purpose and can be chained together with other processors to fully automate all of the required conversions and constraint validation for a single CSV column.

A typical CellProcessor configuration for reading the following CSV file

name,birthDate,weight
John,25/12/1946,83.5
Alice,06/08/1958,
Bob,01/03/1984,65.0

might look like the following:

public static final CellProcessor[] PROCESSORS = new CellProcessor[] { 
    null, 
    new ParseDate("dd/MM/yyyy"), 
    new Optional(new ParseDouble()) };

The number of elements in the CellProcessor array must match up with the number of columns to be processed - the file has 3 columns, so the CellProcessor array has 3 elements.

  1. The first processor (for the name column) is null, which indicates that no processing is required (the String is used unchanged). Semantically, it might have been better to replace that with new Optional(), which means the same thing. If we wanted to guarantee that name was supplied (i.e. it's mandatory), then we could have used new NotNull() instead (which works because empty String ("") is converted to null when reading).
  2. The second processor (for the birthDate column) is new ParseDate("dd/MM/yyyy"), which indicates that the column is mandatory, and should be parsed as a Date using the supplied format.
  3. The third processor (for the weight column) is new Optional(new ParseDouble()), which indicates that the column is optional (the value will be null if the column is empty), but if it's supplied then parse it as a Double.
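The per-column behaviour described in the list above can be sketched without the library. The following is only an illustration under stated assumptions: ColumnProcessingSketch, parseDate, optionalDouble, processRow and processors are hypothetical names rather than Super CSV classes or methods, standard Java exceptions stand in for the library's own, and the naive comma split ignores quoting and escaping, which a real CSV reader handles.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for per-column processing; NOT Super CSV's classes.
final class ColumnProcessingSketch {

    // Mimics a mandatory date column: null is rejected, text must match the pattern.
    static Function<String, Object> parseDate(String pattern) {
        return value -> {
            if (value == null) {
                throw new IllegalArgumentException("mandatory column was empty");
            }
            try {
                SimpleDateFormat format = new SimpleDateFormat(pattern);
                format.setLenient(false); // reject impossible dates such as 32/13/2000
                return format.parse(value);
            } catch (ParseException e) {
                throw new IllegalArgumentException("not a date: " + value, e);
            }
        };
    }

    // Mimics an optional numeric column: null stays null, anything else is parsed.
    static Function<String, Object> optionalDouble() {
        return value -> value == null ? null : Double.valueOf(value);
    }

    // One entry per column, mirroring the three-element array in the text.
    static List<Function<String, Object>> processors() {
        List<Function<String, Object>> list = new ArrayList<>();
        list.add(null);                    // name: no processing
        list.add(parseDate("dd/MM/yyyy")); // birthDate: mandatory date
        list.add(optionalDouble());        // weight: optional double
        return list;
    }

    // Applies one processor per column; a null processor leaves the value unchanged.
    static List<Object> processRow(String line, List<Function<String, Object>> processors) {
        String[] columns = line.split(",", -1); // naive split; -1 keeps trailing empty columns
        if (columns.length != processors.size()) {
            throw new IllegalArgumentException(
                "expected " + processors.size() + " columns but found " + columns.length);
        }
        List<Object> result = new ArrayList<>();
        for (int i = 0; i < columns.length; i++) {
            String raw = columns[i].isEmpty() ? null : columns[i]; // empty column is read as null
            Function<String, Object> processor = processors.get(i);
            result.add(processor == null ? raw : processor.apply(raw));
        }
        return result;
    }

    public static void main(String[] args) {
        String[] lines = {"John,25/12/1946,83.5", "Alice,06/08/1958,", "Bob,01/03/1984,65.0"};
        for (String line : lines) {
            System.out.println(processRow(line, processors()));
        }
    }
}
```

In this sketch, Alice's empty weight comes back as null, while a row with a different number of columns than processors is rejected before any processor runs.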

Cell processor overview

  • processors are similar to servlet filters in JEE - they can be chained together, and they can modify the data that's passed along the chain
  • processors are executed from left to right (but yes, the processors' constructors are invoked from right to left!)
  • the number of elements in the CellProcessor array must match up with the number of columns to be processed
  • a null processor means no processing is required
  • most processors expect input to be non-null - if it's an optional column then chain an Optional() processor before it, e.g. new Optional(new ParseDouble()). Further processing (processors chained after Optional) will be skipped if the value to be read/written is null.
  • all processors throw SuperCsvCellProcessorException if they encounter data they cannot process (this shouldn't normally happen if your processor configuration is correct)
  • constraint-validating processors throw SuperCsvConstraintViolationException if the value does not satisfy the constraint
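The ordering and short-circuit rules above can be illustrated with a small model of a processor chain. This is a minimal sketch, not Super CSV's implementation: ChainOrderSketch, Processor, Chained, OptionalStep, ParseDoubleStep and MinStep are hypothetical names, and IllegalArgumentException and IllegalStateException stand in for SuperCsvCellProcessorException and SuperCsvConstraintViolationException.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a processor chain; NOT Super CSV's real classes.
final class ChainOrderSketch {

    // Records construction and execution order for illustration.
    static final List<String> LOG = new ArrayList<>();

    interface Processor {
        Object execute(Object value);
    }

    // Holds the next processor in the chain (null marks the end of the chain).
    abstract static class Chained implements Processor {
        private final Processor next;

        Chained(String name, Processor next) {
            this.next = next;
            LOG.add("construct " + name);
        }

        Object passOn(Object value) {
            return next == null ? value : next.execute(value);
        }
    }

    // Skips the rest of the chain when the value is null.
    static final class OptionalStep extends Chained {
        OptionalStep(Processor next) { super("Optional", next); }

        public Object execute(Object value) {
            LOG.add("execute Optional");
            return value == null ? null : passOn(value);
        }
    }

    // Conversion step: expects non-null input and fails on unparseable text.
    static final class ParseDoubleStep extends Chained {
        ParseDoubleStep(Processor next) { super("ParseDouble", next); }

        public Object execute(Object value) {
            LOG.add("execute ParseDouble");
            if (value == null) {
                throw new IllegalArgumentException("input must not be null");
            }
            Double parsed;
            try {
                parsed = Double.valueOf(value.toString());
            } catch (NumberFormatException e) {
                throw new IllegalArgumentException("cannot parse as a double: " + value, e);
            }
            return passOn(parsed);
        }
    }

    // Constraint step: rejects values below a minimum, otherwise passes them on.
    static final class MinStep extends Chained {
        private final double min;

        MinStep(double min) {
            super("Min", null);
            this.min = min;
        }

        public Object execute(Object value) {
            LOG.add("execute Min");
            if ((Double) value < min) {
                throw new IllegalStateException(value + " is below the minimum " + min);
            }
            return passOn(value);
        }
    }

    public static void main(String[] args) {
        // The innermost (rightmost) argument is constructed first.
        Processor chain = new OptionalStep(new ParseDoubleStep(new MinStep(0.0)));
        System.out.println(LOG);
        LOG.clear();
        // Execution starts at the outermost (leftmost) processor.
        System.out.println(chain.execute("65.0") + " via " + LOG);
        LOG.clear();
        // A null value stops at OptionalStep; later steps never run.
        System.out.println(chain.execute(null) + " via " + LOG);
    }
}
```

Constructors run right to left simply because Java evaluates a constructor's arguments before the constructor itself, so the innermost new expression completes first.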

Available cell processors

The examples above just touch the surface of what's possible with cell processors. The following table shows all of the processors available for reading, writing, and constraint validation.

Reading          Writing    Reading / Writing  Constraints
ParseBigDecimal  FmtBool    Collector          DMinMax
ParseBool        FmtDate    ConvertNullTo      Equals
ParseChar        FmtNumber  HashMapper         ForbidSubStr
ParseDate                   Optional           IsElementOf
ParseDouble                 StrReplace         IsIncludedIn
ParseInt                    Token              LMinMax
ParseLong                   Trim               NotNull
                            Truncate           RequireHashCode
                                               RequireSubStr
                                               Strlen
                                               StrMinMax
                                               StrNotNullOrEmpty
                                               StrRegEx
                                               Unique
                                               UniqueHashCode