1. Introduction
In this article, we’re going to focus on a practical, code-focused intro to Spring Batch. Spring Batch is a processing framework designed for the robust execution of jobs.
Its current version is 3.0, which supports Spring 4 and Java 8. It also accommodates JSR-352, the new Java specification for batch processing.
Here are a few interesting and practical use-cases of the framework.
2. Workflow Basics
Spring Batch follows the traditional batch architecture where a job repository does the work of scheduling and interacting with the job.
A job can have more than one step, and every step typically follows the sequence of reading data, processing it and writing it.
And of course, the framework will do most of the heavy lifting for us here – especially when it comes to the low-level persistence work of dealing with the jobs – using SQLite for the job repository.
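Before we dive into the example, it helps to see the three core contracts we’ll be working with. They’re shown here slightly simplified compared to the actual Spring Batch declarations (each lives in its own file under org.springframework.batch.item, and the real ItemReader declares a few more specific exceptions):

import java.util.List;

// Slightly simplified versions of the core Spring Batch contracts

public interface ItemReader<T> {
    T read() throws Exception; // return null once the input is exhausted
}

public interface ItemProcessor<I, O> {
    O process(I item) throws Exception; // return null to filter an item out
}

public interface ItemWriter<T> {
    void write(List<? extends T> items) throws Exception; // receives a whole chunk at once
}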
2.1. Our Example Use Case
The simple use case we’re going to tackle here is migrating some financial transaction data from CSV to XML.
The input file has a very simple structure – it contains one transaction per line, made up of a username, a user id, the date of the transaction and the amount:
username, userid, transaction_date, transaction_amount
devendra, 1234, 31/10/2015, 10000
john, 2134, 3/12/2015, 12321
robin, 2134, 2/02/2015, 23411
3. The Maven pom
The dependencies required for this project are Spring Core, Spring Batch and the SQLite JDBC connector:
<project xmlns="http://maven.apache.org/POM/4.0.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
    http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>org.baeldung</groupId>
    <artifactId>spring-batch-intro</artifactId>
    <version>0.1-SNAPSHOT</version>
    <packaging>jar</packaging>
    <name>spring-batch-intro</name>
    <url>http://maven.apache.org</url>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <spring.version>4.2.0.RELEASE</spring.version>
        <spring.batch.version>3.0.5.RELEASE</spring.batch.version>
        <sqlite.version>3.8.11.2</sqlite.version>
    </properties>

    <dependencies>
        <!-- SQLite database driver -->
        <dependency>
            <groupId>org.xerial</groupId>
            <artifactId>sqlite-jdbc</artifactId>
            <version>${sqlite.version}</version>
        </dependency>
        <dependency>
            <groupId>org.springframework</groupId>
            <artifactId>spring-oxm</artifactId>
            <version>${spring.version}</version>
        </dependency>
        <dependency>
            <groupId>org.springframework</groupId>
            <artifactId>spring-jdbc</artifactId>
            <version>${spring.version}</version>
        </dependency>
        <dependency>
            <groupId>org.springframework.batch</groupId>
            <artifactId>spring-batch-core</artifactId>
            <version>${spring.batch.version}</version>
        </dependency>
    </dependencies>
</project>
4. Spring Batch Config
The first thing we’ll do is configure Spring Batch with XML:
<beans xmlns="http://www.springframework.org/schema/beans"
    xmlns:jdbc="http://www.springframework.org/schema/jdbc"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="
        http://www.springframework.org/schema/beans 
        http://www.springframework.org/schema/beans/spring-beans-4.2.xsd
        http://www.springframework.org/schema/jdbc 
        http://www.springframework.org/schema/jdbc/spring-jdbc-4.2.xsd">

    <!-- connect to SQLite database -->
    <bean id="dataSource"
        class="org.springframework.jdbc.datasource.DriverManagerDataSource">
        <property name="driverClassName" value="org.sqlite.JDBC" />
        <property name="url" value="jdbc:sqlite:repository.sqlite" />
        <property name="username" value="" />
        <property name="password" value="" />
    </bean>

    <!-- create job-meta tables automatically -->
    <jdbc:initialize-database data-source="dataSource">
        <jdbc:script
            location="org/springframework/batch/core/schema-drop-sqlite.sql" />
        <jdbc:script location="org/springframework/batch/core/schema-sqlite.sql" />
    </jdbc:initialize-database>

    <!-- stored job-meta in memory -->
    <!--
    <bean id="jobRepository"
        class="org.springframework.batch.core.repository.support.MapJobRepositoryFactoryBean">
        <property name="transactionManager" ref="transactionManager" />
    </bean>
    -->

    <!-- stored job-meta in database -->
    <bean id="jobRepository"
        class="org.springframework.batch.core.repository.support.JobRepositoryFactoryBean">
        <property name="dataSource" ref="dataSource" />
        <property name="transactionManager" ref="transactionManager" />
        <property name="databaseType" value="sqlite" />
    </bean>

    <bean id="transactionManager"
        class="org.springframework.batch.support.transaction.ResourcelessTransactionManager" />

    <bean id="jobLauncher"
        class="org.springframework.batch.core.launch.support.SimpleJobLauncher">
        <property name="jobRepository" ref="jobRepository" />
    </bean>
</beans>
Of course, a Java configuration is also available:
@Configuration
@EnableBatchProcessing
public class SpringConfig {

    @Value("org/springframework/batch/core/schema-drop-sqlite.sql")
    private Resource dropRepositoryTables;

    @Value("org/springframework/batch/core/schema-sqlite.sql")
    private Resource dataRepositorySchema;

    @Bean
    public DataSource dataSource() {
        DriverManagerDataSource dataSource = new DriverManagerDataSource();
        dataSource.setDriverClassName("org.sqlite.JDBC");
        dataSource.setUrl("jdbc:sqlite:repository.sqlite");
        return dataSource;
    }

    @Bean
    public DataSourceInitializer dataSourceInitializer(DataSource dataSource) 
      throws MalformedURLException {
        ResourceDatabasePopulator databasePopulator = new ResourceDatabasePopulator();

        databasePopulator.addScript(dropRepositoryTables);
        databasePopulator.addScript(dataRepositorySchema);
        databasePopulator.setIgnoreFailedDrops(true);

        DataSourceInitializer initializer = new DataSourceInitializer();
        initializer.setDataSource(dataSource);
        initializer.setDatabasePopulator(databasePopulator);

        return initializer;
    }

    private JobRepository getJobRepository() throws Exception {
        JobRepositoryFactoryBean factory = new JobRepositoryFactoryBean();
        factory.setDataSource(dataSource());
        factory.setTransactionManager(getTransactionManager());
        factory.afterPropertiesSet();
        return (JobRepository) factory.getObject();
    }

    private PlatformTransactionManager getTransactionManager() {
        return new ResourcelessTransactionManager();
    }

    public JobLauncher getJobLauncher() throws Exception {
        SimpleJobLauncher jobLauncher = new SimpleJobLauncher();
        jobLauncher.setJobRepository(getJobRepository());
        jobLauncher.afterPropertiesSet();
        return jobLauncher;
    }
}
5. Spring Batch Job Config
Let’s now write our job description for the CSV to XML work:
<beans xmlns="http://www.springframework.org/schema/beans"
    xmlns:batch="http://www.springframework.org/schema/batch"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="
        http://www.springframework.org/schema/batch 
        http://www.springframework.org/schema/batch/spring-batch-3.0.xsd
        http://www.springframework.org/schema/beans 
        http://www.springframework.org/schema/beans/spring-beans-4.2.xsd">

    <import resource="spring.xml" />

    <bean id="record" class="org.baeldung.spring_batch_intro.model.Transaction" />

    <bean id="itemReader"
        class="org.springframework.batch.item.file.FlatFileItemReader">
        <property name="resource" value="input/record.csv" />
        <property name="lineMapper">
            <bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
                <property name="lineTokenizer">
                    <bean class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer">
                        <property name="names" value="username,userid,transactiondate,amount" />
                    </bean>
                </property>
                <property name="fieldSetMapper">
                    <bean class="org.baeldung.spring_batch_intro.service.RecordFieldSetMapper" />
                </property>
            </bean>
        </property>
    </bean>

    <bean id="itemProcessor"
        class="org.baeldung.spring_batch_intro.service.CustomItemProcessor" />

    <bean id="itemWriter"
        class="org.springframework.batch.item.xml.StaxEventItemWriter">
        <property name="resource" value="file:xml/output.xml" />
        <property name="marshaller" ref="recordMarshaller" />
        <property name="rootTagName" value="transactionRecord" />
    </bean>

    <bean id="recordMarshaller" class="org.springframework.oxm.jaxb.Jaxb2Marshaller">
        <property name="classesToBeBound">
            <list>
                <value>org.baeldung.spring_batch_intro.model.Transaction</value>
            </list>
        </property>
    </bean>

    <batch:job id="firstBatchJob">
        <batch:step id="step1">
            <batch:tasklet>
                <batch:chunk reader="itemReader" writer="itemWriter"
                    processor="itemProcessor" commit-interval="10" />
            </batch:tasklet>
        </batch:step>
    </batch:job>
</beans>
And of course, the similar Java-based job config:
public class SpringBatchConfig {

    @Autowired
    private JobBuilderFactory jobs;

    @Autowired
    private StepBuilderFactory steps;

    @Value("input/record.csv")
    private Resource inputCsv;

    @Value("file:xml/output.xml")
    private Resource outputXml;

    @Bean
    public ItemReader<Transaction> itemReader() 
      throws UnexpectedInputException, ParseException {
        FlatFileItemReader<Transaction> reader = new FlatFileItemReader<Transaction>();
        DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer();
        String[] tokens = { "username", "userid", "transactiondate", "amount" };
        tokenizer.setNames(tokens);
        reader.setResource(inputCsv);
        DefaultLineMapper<Transaction> lineMapper = new DefaultLineMapper<Transaction>();
        lineMapper.setLineTokenizer(tokenizer);
        lineMapper.setFieldSetMapper(new RecordFieldSetMapper());
        reader.setLineMapper(lineMapper);
        return reader;
    }

    @Bean
    public ItemProcessor<Transaction, Transaction> itemProcessor() {
        return new CustomItemProcessor();
    }

    @Bean
    public ItemWriter<Transaction> itemWriter(Marshaller marshaller) 
      throws MalformedURLException {
        StaxEventItemWriter<Transaction> itemWriter = new StaxEventItemWriter<Transaction>();
        itemWriter.setMarshaller(marshaller);
        itemWriter.setRootTagName("transactionRecord");
        itemWriter.setResource(outputXml);
        return itemWriter;
    }

    @Bean
    public Marshaller marshaller() {
        Jaxb2Marshaller marshaller = new Jaxb2Marshaller();
        marshaller.setClassesToBeBound(new Class[] { Transaction.class });
        return marshaller;
    }

    @Bean
    protected Step step1(ItemReader<Transaction> reader,
      ItemProcessor<Transaction, Transaction> processor, ItemWriter<Transaction> writer) {
        return steps.get("step1").<Transaction, Transaction> chunk(10)
          .reader(reader).processor(processor).writer(writer).build();
    }

    @Bean(name = "firstBatchJob")
    public Job job(@Qualifier("step1") Step step1) {
        return jobs.get("firstBatchJob").start(step1).build();
    }
}
OK, so now that we have the whole config, let’s break it down and start discussing it.
5.1. Read Data and Create Objects with ItemReader
First, we configured our itemReader, which will read the data from record.csv and convert it into a Transaction object:
@SuppressWarnings("restriction")
@XmlRootElement(name = "transactionRecord")
public class Transaction {
    private String username;
    private int userId;
    private Date transactionDate;
    private double amount;

    /* getters and setters for the attributes */

    @Override
    public String toString() {
        return "Transaction [username=" + username + ", userId=" + userId
          + ", transactionDate=" + transactionDate + ", amount=" + amount + "]";
    }
}
To do so, it uses a custom mapper:
public class RecordFieldSetMapper implements FieldSetMapper<Transaction> {

    public Transaction mapFieldSet(FieldSet fieldSet) throws BindException {
        SimpleDateFormat dateFormat = new SimpleDateFormat("dd/MM/yyyy");

        Transaction transaction = new Transaction();
        transaction.setUsername(fieldSet.readString("username"));
        transaction.setUserId(fieldSet.readInt(1));
        transaction.setAmount(fieldSet.readDouble(3));
        String dateString = fieldSet.readString(2);
        try {
            transaction.setTransactionDate(dateFormat.parse(dateString));
        } catch (ParseException e) {
            e.printStackTrace();
        }

        return transaction;
    }
}
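As a quick, purely illustrative sanity check (not part of the project itself), we could feed the mapper a single line through the same tokenizer setup used in the config – the field names and the sample line are simply the ones from our input file:

// Hypothetical one-off check, assuming we're inside a method that declares throws BindException
DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer();
tokenizer.setNames(new String[] { "username", "userid", "transactiondate", "amount" });

FieldSet fieldSet = tokenizer.tokenize("devendra, 1234, 31/10/2015, 10000");
Transaction transaction = new RecordFieldSetMapper().mapFieldSet(fieldSet);

System.out.println(transaction);
// e.g. Transaction [username=devendra, userId=1234, transactionDate=..., amount=10000.0]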
5.2. Processing Data with ItemProcessor
We have created our own item processor, CustomItemProcessor. This doesn’t process anything related to the transaction object – all it does is pass the original object coming from the reader on to the writer:
public class CustomItemProcessor implements ItemProcessor<Transaction, Transaction> {

    public Transaction process(Transaction item) {
        return item;
    }
}
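If we did need real business logic at this stage, the processor is where it would go. As a purely hypothetical example (not part of this project, and assuming the usual getAmount() getter on Transaction), a processor could drop small transactions by returning null, which tells Spring Batch to filter the item out of the chunk:

// Hypothetical processor – filters items instead of passing them through unchanged
public class LargeTransactionFilterProcessor 
  implements ItemProcessor<Transaction, Transaction> {

    // assumed cutoff, purely for illustration
    private static final double THRESHOLD = 1000.0;

    @Override
    public Transaction process(Transaction item) {
        // returning null removes the item from the chunk handed to the writer
        return item.getAmount() >= THRESHOLD ? item : null;
    }
}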
5.3. Writing Objects to the FS with ItemWriter
Finally, we’re going to store the transactions in an XML file located at xml/output.xml:
<bean id="itemWriter" class="org.springframework.batch.item.xml.StaxEventItemWriter"> <property name="resource" value="file:xml/output.xml" /> <property name="marshaller" ref="recordMarshaller" /> <property name="rootTagName" value="transactionRecord" /> </bean>
5.4. Configuring the Batch Job
So all we have to do is connect the dots with a job – using the batch:job syntax.
Note the commit-interval – that’s the number of transactions to be kept in memory before committing the batch to the itemWriter; it will hold the transactions in memory until that point (or until the end of the input data is encountered):
<batch:job id="firstBatchJob">
    <batch:step id="step1">
        <batch:tasklet>
            <batch:chunk reader="itemReader" writer="itemWriter"
                processor="itemProcessor" commit-interval="10" />
        </batch:tasklet>
    </batch:step>
</batch:job>
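To make the commit-interval semantics more concrete, here is a rough conceptual sketch of what a chunk-oriented step does with the reader, processor and writer. This is a simplification for illustration only – the real Spring Batch implementation also handles transactions, restartability and skip/retry logic:

import java.util.ArrayList;
import java.util.List;

import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;

// Conceptual sketch only – not the actual Spring Batch chunk processing code
public class ChunkLoopSketch {

    static <T> void process(ItemReader<T> reader, ItemProcessor<T, T> processor,
      ItemWriter<T> writer, int commitInterval) throws Exception {
        List<T> chunk = new ArrayList<T>();
        T item;
        while ((item = reader.read()) != null) {   // read items one at a time
            T processed = processor.process(item); // a null result filters the item out
            if (processed != null) {
                chunk.add(processed);
            }
            if (chunk.size() == commitInterval) {  // commit-interval items buffered
                writer.write(chunk);               // write (and commit) the whole chunk
                chunk.clear();
            }
        }
        if (!chunk.isEmpty()) {
            writer.write(chunk);                   // flush the final, partial chunk
        }
    }
}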
5.5. Running the Batch Job
That’s it – let’s now set up and run everything:
public class App {
    public static void main(String[] args) {
        // Spring Java config
        AnnotationConfigApplicationContext context = new AnnotationConfigApplicationContext();
        context.register(SpringConfig.class);
        context.register(SpringBatchConfig.class);
        context.refresh();

        JobLauncher jobLauncher = (JobLauncher) context.getBean("jobLauncher");
        Job job = (Job) context.getBean("firstBatchJob");

        System.out.println("Starting the batch job");
        try {
            JobExecution execution = jobLauncher.run(job, new JobParameters());
            System.out.println("Job Status : " + execution.getStatus());
            System.out.println("Job completed");
        } catch (Exception e) {
            e.printStackTrace();
            System.out.println("Job failed");
        }
    }
}
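One detail worth knowing before running this repeatedly: with an empty JobParameters object, every run maps to the same JobInstance, and Spring Batch will refuse to launch an instance that has already completed. A common way around this, shown here as a small variation on the try block above, is to add a unique parameter such as a timestamp:

// Variation on the launch above – a unique parameter makes each run a new JobInstance
JobParameters params = new JobParametersBuilder()
  .addLong("time", System.currentTimeMillis())
  .toJobParameters();
JobExecution execution = jobLauncher.run(job, params);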
6. Conclusion
This tutorial gives you a basic idea of how to work with Spring Batch and how to use it in a simple use case.
It shows how you can easily develop your batch processing pipeline and how you can customize the different stages of reading, processing and writing.
The full implementation of this tutorial can be found in the GitHub project – this is an Eclipse-based project, so it should be easy to import and run as it is.