1. Introduction
In this tutorial, we'll take a quick look at Univocity Parsers, a library for parsing CSV, TSV, and fixed-width files in Java.
We'll start with the basics of reading and writing files, then move on to row processors and to mapping files to and from Java beans. Finally, we'll take a quick look at the configuration options before wrapping up.
2. Setup
To use the parsers, we need to add the latest Maven dependency to our project pom.xml file:
<dependency>
    <groupId>com.univocity</groupId>
    <artifactId>univocity-parsers</artifactId>
    <version>2.8.4</version>
</dependency>
3. Basic Usage
3.1. Reading
In Univocity, we can quickly parse an entire file into a list of String arrays, where each array represents one line of the file.
First, let's parse a CSV file by passing a Reader for the file to a CsvParser with default settings:
try (Reader inputReader = new InputStreamReader(new FileInputStream(
  new File("src/test/resources/productList.csv")), "UTF-8")) {
    CsvParser parser = new CsvParser(new CsvParserSettings());
    List<String[]> parsedRows = parser.parseAll(inputReader);
    return parsedRows;
} catch (IOException e) {
    // handle exception
}
We can easily switch this logic to parse a TSV file by switching to TsvParser and providing it with a TSV file.
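As a minimal sketch of that switch, here's the same parsing logic with TsvParser, reading from an in-memory string instead of a file so that it stands on its own. The class name TsvReadSketch, the parseTsv helper, and the sample data are made up for illustration; the line separator is pinned to "\n" only so the result doesn't depend on the operating system:

```java
import com.univocity.parsers.tsv.TsvParser;
import com.univocity.parsers.tsv.TsvParserSettings;

import java.io.StringReader;
import java.util.List;

final class TsvReadSketch {

    // Parses tab-separated text into one String[] per line.
    static List<String[]> parseTsv(String tsv) {
        TsvParserSettings settings = new TsvParserSettings();
        // Pin the line separator so the sketch behaves the same on every OS.
        settings.getFormat().setLineSeparator("\n");
        TsvParser parser = new TsvParser(settings);
        return parser.parseAll(new StringReader(tsv));
    }

    public static void main(String[] args) {
        // Hypothetical sample data, not the tutorial's productList file.
        String sample = "product_no\tdescription\tunit_price\n"
          + "A1\tWidget\t9.99\n";
        for (String[] row : parseTsv(sample)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

Since header extraction isn't enabled here, the heading line comes back as an ordinary row.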
It's only slightly more complicated to process a fixed-width file. The primary difference is that we need to provide our field widths in the parser settings.
Let's read a fixed-width file by providing a FixedWidthFields object to our FixedWidthParserSettings:
try (Reader inputReader = new InputStreamReader(new FileInputStream(
  new File("src/test/resources/productList.txt")), "UTF-8")) {
    FixedWidthFields fieldLengths = new FixedWidthFields(8, 30, 10);
    FixedWidthParserSettings settings = new FixedWidthParserSettings(fieldLengths);

    FixedWidthParser parser = new FixedWidthParser(settings);
    List<String[]> parsedRows = parser.parseAll(inputReader);
    return parsedRows;
} catch (IOException e) {
    // handle exception
}
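To make the role of the field widths concrete, here's a plain-Java sketch of what widths of 8, 30, and 10 mean for a single line. This is not Univocity code and makes no claim about how the library works internally; the FixedWidthSketch class and its slice helper are hypothetical and only illustrate the idea of cutting a line at fixed offsets and trimming the padding:

```java
import java.util.ArrayList;
import java.util.List;

final class FixedWidthSketch {

    // Cuts one line into fields of the given widths and trims the padding.
    static List<String> slice(String line, int... widths) {
        List<String> fields = new ArrayList<>();
        int start = 0;
        for (int width : widths) {
            // Guard against lines that are shorter than the declared widths.
            int end = Math.min(start + width, line.length());
            fields.add(start < end ? line.substring(start, end).trim() : "");
            start += width;
        }
        return fields;
    }

    public static void main(String[] args) {
        // Build a 48-character line padded to widths 8, 30 and 10.
        String line = String.format("%-8s%-30s%-10s", "A1", "Widget", "9.99");
        System.out.println(slice(line, 8, 30, 10)); // prints [A1, Widget, 9.99]
    }
}
```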
3.2. Writing
Now that we've covered reading files with the parsers, let's learn how to write them.
Writing files is very similar to reading them: we provide a Writer, along with our desired settings, to the writer class that matches our file type.
Let's create a method to write files in all three possible formats:
public boolean writeData(List<Object[]> products, OutputType outputType, String outputPath) {
    try (Writer outputWriter = new OutputStreamWriter(
      new FileOutputStream(new File(outputPath)), "UTF-8")) {
        switch (outputType) {
            // Each case has its own block so the writer variables don't clash.
            case CSV: {
                CsvWriter writer = new CsvWriter(outputWriter, new CsvWriterSettings());
                writer.writeRowsAndClose(products);
                break;
            }
            case TSV: {
                TsvWriter writer = new TsvWriter(outputWriter, new TsvWriterSettings());
                writer.writeRowsAndClose(products);
                break;
            }
            case FIXED_WIDTH: {
                FixedWidthFields fieldLengths = new FixedWidthFields(8, 30, 10);
                FixedWidthWriterSettings settings = new FixedWidthWriterSettings(fieldLengths);
                FixedWidthWriter writer = new FixedWidthWriter(outputWriter, settings);
                writer.writeRowsAndClose(products);
                break;
            }
            default:
                logger.warn("Invalid OutputType: " + outputType);
                return false;
        }
        return true;
    } catch (IOException e) {
        // handle exception
        return false;
    }
}
As with reading files, writing CSV and TSV files is nearly identical. For fixed-width files, we have to provide the field widths in our settings.
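To see what the CSV branch produces without touching the file system, we can point a CsvWriter at a StringWriter. This is a small sketch under a few assumptions: the CsvWriteSketch class, the toCsv helper, and the sample rows are hypothetical, and the line separator is set to "\n" only to keep the output independent of the platform:

```java
import com.univocity.parsers.csv.CsvWriter;
import com.univocity.parsers.csv.CsvWriterSettings;

import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;

final class CsvWriteSketch {

    // Writes all rows as CSV into memory and returns the resulting text.
    static String toCsv(List<Object[]> rows) {
        StringWriter out = new StringWriter();
        CsvWriterSettings settings = new CsvWriterSettings();
        // Pin the line separator so the output is the same on every OS.
        settings.getFormat().setLineSeparator("\n");
        CsvWriter writer = new CsvWriter(out, settings);
        writer.writeRowsAndClose(rows);
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample rows; the first one acts as a heading line.
        List<Object[]> rows = new ArrayList<>();
        rows.add(new Object[] {"product_no", "description", "unit_price"});
        rows.add(new Object[] {"A1", "Widget", "9.99"});
        System.out.print(toCsv(rows));
    }
}
```

Swapping in TsvWriter and TsvWriterSettings should be the only change needed for tab-separated output.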
3.3. Using Row Processors
Univocity provides a number of row processors we can use and also provides the ability for us to create our own.
To get a feel for using row processors, let's use the BatchedColumnProcessor to process a larger CSV file in batches of five rows:
try (Reader inputReader = new InputStreamReader(
  new FileInputStream(new File(relativePath)), "UTF-8")) {
    CsvParserSettings settings = new CsvParserSettings();
    settings.setProcessor(new BatchedColumnProcessor(5) {
        @Override
        public void batchProcessed(int rowsInThisBatch) {}
    });
    CsvParser parser = new CsvParser(settings);
    List<String[]> parsedRows = parser.parseAll(inputReader);
    return parsedRows;
} catch (IOException e) {
    // handle exception
}
To use this row processor, we define it in our CsvParserSettings and then all we have to do is call parseAll.
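The batchProcessed callback above is left empty, so here's a hedged sketch of what we might do inside it. It collects the first column's values for each batch, using getColumnValuesAsList, which the column processors expose. The BatchSketch class, the firstColumnPerBatch helper, and the in-memory input are assumptions for illustration; we also copy the values because, as far as we understand the library, the column lists are cleared between batches:

```java
import com.univocity.parsers.common.processor.BatchedColumnProcessor;
import com.univocity.parsers.csv.CsvParser;
import com.univocity.parsers.csv.CsvParserSettings;

import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

final class BatchSketch {

    // Returns the first column's values, grouped by the batch they arrived in.
    static List<List<String>> firstColumnPerBatch(String csv, int rowsPerBatch) {
        List<List<String>> batches = new ArrayList<>();

        CsvParserSettings settings = new CsvParserSettings();
        settings.getFormat().setLineSeparator("\n");
        settings.setProcessor(new BatchedColumnProcessor(rowsPerBatch) {
            @Override
            public void batchProcessed(int rowsInThisBatch) {
                // One inner list per column, holding this batch's values only.
                List<List<String>> columns = getColumnValuesAsList();
                // Copy the values, assuming the processor reuses its lists.
                batches.add(new ArrayList<>(columns.get(0)));
            }
        });

        new CsvParser(settings).parse(new StringReader(csv));
        return batches;
    }
}
```

With twelve input rows and a batch size of five, we'd expect three calls: two full batches and a final partial one.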
3.4. Reading and Writing into Java Beans
The list of String arrays is alright, but we're often working with data in Java beans. Univocity also allows for reading and writing into specially annotated Java beans.
Let's define a Product bean with the Univocity annotations:
public class Product {

    @Parsed(field = "product_no")
    private String productNumber;

    @Parsed
    private String description;

    @Parsed(field = "unit_price")
    private float unitPrice;

    // getters and setters
}
The main annotation is the @Parsed annotation.
If our column heading matches the field name, we can use @Parsed without any values specified. If our column heading differs from the field name, we can specify the column heading using the field property.
Now that we've defined our Product bean, let's read our CSV file into it:
try (Reader inputReader = new InputStreamReader(new FileInputStream(
  new File("src/test/resources/productList.csv")), "UTF-8")) {
    BeanListProcessor<Product> rowProcessor = new BeanListProcessor<Product>(Product.class);
    CsvParserSettings settings = new CsvParserSettings();
    settings.setHeaderExtractionEnabled(true);
    settings.setProcessor(rowProcessor);
    CsvParser parser = new CsvParser(settings);
    parser.parse(inputReader);
    return rowProcessor.getBeans();
} catch (IOException e) {
    // handle exception
}
We first constructed a special row processor, BeanListProcessor, with our annotated class. Then, we provided that to the CsvParserSettings and used it to read in a list of Products.
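To check how the headings map onto the annotated fields, we can run the same steps against a small in-memory CSV. This is only a sketch: BeanReadSketch, its nested Product copy (with package-visible fields instead of getters and setters, to keep it short), and the sample rows are all hypothetical stand-ins for the tutorial's real classes and files:

```java
import com.univocity.parsers.annotations.Parsed;
import com.univocity.parsers.common.processor.BeanListProcessor;
import com.univocity.parsers.csv.CsvParser;
import com.univocity.parsers.csv.CsvParserSettings;

import java.io.StringReader;
import java.util.List;

final class BeanReadSketch {

    // Trimmed-down stand-in for the tutorial's Product bean.
    public static class Product {
        @Parsed(field = "product_no")
        String productNumber;

        @Parsed
        String description;   // heading matches the field name

        @Parsed(field = "unit_price")
        float unitPrice;
    }

    // Reads CSV text with a heading line into a list of Product beans.
    static List<Product> readProducts(String csv) {
        BeanListProcessor<Product> processor = new BeanListProcessor<>(Product.class);
        CsvParserSettings settings = new CsvParserSettings();
        settings.getFormat().setLineSeparator("\n");
        settings.setHeaderExtractionEnabled(true);
        settings.setProcessor(processor);
        new CsvParser(settings).parse(new StringReader(csv));
        return processor.getBeans();
    }

    public static void main(String[] args) {
        String csv = "product_no,description,unit_price\n"
          + "A1,Widget,9.99\n"
          + "B2,Gadget,19.50\n";
        for (Product p : readProducts(csv)) {
            System.out.println(p.productNumber + " / " + p.description + " / " + p.unitPrice);
        }
    }
}
```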
Next, let's write our list of Products out to a fixed-width file:
try (Writer outputWriter = new OutputStreamWriter(
  new FileOutputStream(new File(outputPath)), "UTF-8")) {
    BeanWriterProcessor<Product> rowProcessor = new BeanWriterProcessor<Product>(Product.class);
    FixedWidthFields fieldLengths = new FixedWidthFields(8, 30, 10);
    FixedWidthWriterSettings settings = new FixedWidthWriterSettings(fieldLengths);
    settings.setHeaders("product_no", "description", "unit_price");
    settings.setRowWriterProcessor(rowProcessor);
    FixedWidthWriter writer = new FixedWidthWriter(outputWriter, settings);
    writer.writeHeaders();
    for (Product product : products) {
        writer.processRecord(product);
    }
    writer.close();
    return true;
} catch (IOException e) {
    // handle exception
}
The notable difference is that we're specifying our column headers in our settings.
4. Settings
Univocity has a number of settings we can apply to the parsers. As we saw earlier, we can use settings to apply a row processor to the parsers.
There are many other settings that can be changed to suit our needs. Although many of the configurations are common across the three file types, each parser also has format-specific settings.
Let's adjust our CSV parser settings to put some limits on the data we're reading:
CsvParserSettings settings = new CsvParserSettings();
settings.setMaxCharsPerColumn(100);
settings.setMaxColumns(50);
// Pass the configured settings; a fresh CsvParserSettings would ignore the limits.
CsvParser parser = new CsvParser(settings);
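To get a feel for what such a limit does, here's a small sketch that parses in-memory input with a configurable column-length limit. It assumes the parser reports an oversized value by throwing a TextParsingException, which matches our understanding of the library but is worth verifying against the version in use. The LimitSketch class and the fitsWithin helper are hypothetical:

```java
import com.univocity.parsers.common.TextParsingException;
import com.univocity.parsers.csv.CsvParser;
import com.univocity.parsers.csv.CsvParserSettings;

import java.io.StringReader;

final class LimitSketch {

    // Returns true if the input parses within the limit, false if the parser rejects it.
    static boolean fitsWithin(String csv, int maxCharsPerColumn) {
        CsvParserSettings settings = new CsvParserSettings();
        settings.getFormat().setLineSeparator("\n");
        settings.setMaxCharsPerColumn(maxCharsPerColumn);
        try {
            new CsvParser(settings).parseAll(new StringReader(csv));
            return true;
        } catch (TextParsingException e) {
            // Assumption: an exceeded limit surfaces as this exception type.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(fitsWithin("A1,Widget\n", 100));
    }
}
```

Limits like this are mainly a safeguard against malformed or unexpectedly large input.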
5. Conclusion
In this quick tutorial, we learned the basics of parsing files using the Univocity library.
We learned how to read and write files, both as lists of String arrays and as Java beans. Before we got into Java beans, we took a quick look at using different row processors. Finally, we briefly touched on how to customize the settings.
As always, the source code is available over on GitHub.