CosmosDB and H2 dual update.

Trying to get into the habit of blogging more often this year. It is a New Year's resolution, and I am hoping it sticks.

One of the interesting things I have been wanting to try out is a dual update to two data sources, driven by a workflow that Spring Batch kicks off. Azure has always been a good place to try out the Spring stack, given its native support.

So here is the problem statement for this post: take an open-data CSV file, parse it using Spring Batch, update an H2 database, and replicate the same data into Cosmos DB.

Cosmos DB is a managed, serverless offering from Azure and a very robust NoSQL database. Here is a high-level overview of the solution.

The CSV file for the solution can be downloaded from the open data portal. The file has a lot more information than what is needed here. Spring Batch provides an API for reading CSV files; here is the code snippet for the reader.

    @Bean
    public FlatFileItemReader<RoadInput> reader() {
        // Reads the CSV from the classpath and maps the named columns onto RoadInput
        return new FlatFileItemReaderBuilder<RoadInput>()
                .name("MatchItemReader")
                .resource(new ClassPathResource("Rustic_Roads.csv"))
                .delimited()
                .names(DATA_STRINGS)
                .fieldSetMapper(new BeanWrapperFieldSetMapper<RoadInput>() {
                    {
                        setTargetType(RoadInput.class);
                    }
                })
                .build();
    }
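
The DATA_STRINGS constant holds the CSV column names, and RoadInput is a plain bean that mirrors them; neither appears in the post, so here is a rough sketch. The exact column list is an assumption and has to match the header of the downloaded file.

    // Lives in the batch configuration class: column names fed to the reader.
    // They must match the bean property names on RoadInput.
    private static final String[] DATA_STRINGS = {
            "roadName", "surfaceType", "miles", "totalMiles", "rusticRoadNumber",
            "community", "county", "lineToWebpage", "shapeLen"
    };

public class RoadInput {
    // Plain String fields are enough; BeanWrapperFieldSetMapper sets them by name
    private String roadName;
    private String surfaceType;
    private String miles;
    private String totalMiles;
    private String rusticRoadNumber;
    private String community;
    private String county;
    private String lineToWebpage;
    private String shapeLen;
    // getters and setters omitted for brevity
}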

The processor that converts each input record into an entity object goes something like this:

import me.sathish.myroadbatch.data.Road;
import me.sathish.myroadbatch.data.RoadInput;
import org.springframework.batch.item.ItemProcessor;
public class RoadDataProcessor implements ItemProcessor<RoadInput, Road> {
    @Override
    public Road process(RoadInput roadInput) throws Exception {
        Road road = new Road();
        road.setRoad_name(roadInput.getRoadName());
        road.setSurface_type(roadInput.getSurfaceType());
        road.setMiles(roadInput.getMiles());
        road.setTotal_miles(roadInput.getTotalMiles());
        road.setRustic_road_number(roadInput.getRusticRoadNumber());
        road.setCommunity(roadInput.getCommunity());
        road.setCounty(roadInput.getCounty());
        road.setLine_to_web_page(roadInput.getLineToWebpage());
        road.setShape_len(roadInput.getShapeLen());
        return road;
    }
}
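
The Road entity the processor produces is also not shown. Since the properties file below leans on Hibernate's ddl-auto to create the table, a sketch of it, with property names matching the writer's insert statement, could look like this (the id mapping is my assumption):

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;

@Entity
public class Road {
    @Id
    @GeneratedValue
    private Long id; // id type is a guess; the listener later copies it into the Cosmos document
    private String road_name;
    private String surface_type;
    private String miles;
    private String total_miles;
    private String rustic_road_number;
    private String community;
    private String county;
    private String line_to_web_page;
    private String shape_len;
    // getters and setters omitted for brevity
}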

The writer is a simple JDBC batch item writer:

    @Bean
    public JdbcBatchItemWriter<Road> writer(DataSource dataSource) {
        return new JdbcBatchItemWriterBuilder<Road>()
                .itemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<>())
                .sql("INSERT INTO Road(road_name, surface_type, miles, total_miles, rustic_road_number,"
                        + " community, county, line_to_web_page, shape_len)"
                        + " VALUES(:road_name, :surface_type, :miles, :total_miles, :rustic_road_number,"
                        + " :community, :county, :line_to_web_page, :shape_len)")
                .dataSource(dataSource)
                .build();
    }

Finally, the step and the job for the remainder of the code:


    @Bean
    public Job importRoadUserJob(RoadsJobCompletionNotificationListener listener, Step step1) {
        return jobBuilderFactory.get("importRoadUserJob")
                .incrementer(new RunIdIncrementer())
                .listener(listener)
                .flow(step1)
                .end()
                .build();
    }

    @Bean
    public Step step1(JdbcBatchItemWriter<Road> writer) {
        return stepBuilderFactory.get("step1")
                .<RoadInput, Road>chunk(10)
                .reader(reader())
                .processor(processor())
                .writer(writer)
                .build();
    }
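
The step refers to a processor() bean that is not shown in the post; assuming RoadDataProcessor needs no injected dependencies, a minimal sketch is simply:

    @Bean
    public RoadDataProcessor processor() {
        // Wraps the ItemProcessor shown earlier so the step can reference it
        return new RoadDataProcessor();
    }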

The Cosmos DB write goes through a reactive repository built on Reactor's Flux and Mono types; here is the repository code:

package me.sathish.myroadbatch.repository;
import com.azure.spring.data.cosmos.repository.ReactiveCosmosRepository;
import me.sathish.myroadbatch.data.RoadForCosmos;
import org.springframework.stereotype.Repository;
import reactor.core.publisher.Flux;
@Repository
public interface RoadForCosmosRepo extends ReactiveCosmosRepository<RoadForCosmos, String> {
    Flux<RoadForCosmos> findByMiles(String miles);
}
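
The RoadForCosmos document class is not shown either; a sketch using the azure-spring-data-cosmos mapping annotations might look like this. The container name and the partition key choice are my guesses, and the String id matches the repository's key type.

import com.azure.spring.data.cosmos.core.mapping.Container;
import com.azure.spring.data.cosmos.core.mapping.PartitionKey;
import org.springframework.data.annotation.Id;

@Container(containerName = "roads")
public class RoadForCosmos {
    @Id
    private String id;      // String, to match ReactiveCosmosRepository<RoadForCosmos, String>
    @PartitionKey
    private String county;  // partition key choice is an assumption
    private String community;
    private String miles;
    private String road_name;
    private String line_to_web_page;
    private String rustic_road_number;
    // getters and setters omitted for brevity
}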

The properties file for the application

spring.h2.console.enabled=true
spring.datasource.url=jdbc:h2:mem:mytestdb
spring.datasource.driverClassName=org.h2.Driver
spring.datasource.username=sa
spring.datasource.password=
spring.jpa.database-platform=org.hibernate.dialect.H2Dialect
spring.jpa.show-sql=true
spring.jpa.hibernate.ddl-auto=update
# Specify the DNS URI of your Azure Cosmos DB.
azure.cosmos.uri=https://cosmosdbv7.documents.azure.com:443/
# Specify the access key for your database.
azure.cosmos.key=
# Specify the name of your database.
azure.cosmos.database=roadwayscsv
azure.cosmos.populateQueryMetrics=true
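
Depending on which Azure Spring artifacts are on the classpath, the azure.cosmos.* properties above may be picked up automatically by the Azure Spring Boot Cosmos starter. If not, a configuration class along these lines enables the reactive repository; this is a sketch against the azure-spring-data-cosmos 3.x API, so treat the details as assumptions.

import com.azure.cosmos.CosmosClientBuilder;
import com.azure.spring.data.cosmos.config.AbstractCosmosConfiguration;
import com.azure.spring.data.cosmos.repository.config.EnableReactiveCosmosRepositories;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
@EnableReactiveCosmosRepositories(basePackages = "me.sathish.myroadbatch.repository")
public class CosmosConfig extends AbstractCosmosConfiguration {

    @Value("${azure.cosmos.uri}")
    private String uri;

    @Value("${azure.cosmos.key}")
    private String key;

    @Value("${azure.cosmos.database}")
    private String database;

    @Bean
    public CosmosClientBuilder cosmosClientBuilder() {
        // Consumed by the Cosmos configuration support to build the clients
        return new CosmosClientBuilder().endpoint(uri).key(key);
    }

    @Override
    protected String getDatabaseName() {
        return database;
    }
}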

Finally, the listener that ties the dual update together:

package me.sathish.myroadbatch.listener;
import me.sathish.myroadbatch.data.Road;
import me.sathish.myroadbatch.data.RoadForCosmos;
import me.sathish.myroadbatch.repository.RoadForCosmosRepo;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.batch.core.BatchStatus;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.listener.JobExecutionListenerSupport;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;
import javax.persistence.EntityManager;
import java.util.List;
@Component
public class RoadsJobCompletionNotificationListener extends JobExecutionListenerSupport {
    private static final Logger LOGGER = LoggerFactory.getLogger(RoadsJobCompletionNotificationListener.class);
    private final EntityManager entityManager;
    @Autowired
    private RoadForCosmosRepo roadForCosmosRepo;
    @Autowired
    public RoadsJobCompletionNotificationListener(EntityManager entityManager) {
        this.entityManager = entityManager;
    }
    @Override
    public void afterJob(JobExecution jobExecution) {
        LOGGER.debug("Finishing up H2 insert. Going to Cosmos DB");
        if (jobExecution.getStatus() == BatchStatus.COMPLETED) {
            List<Road> roadList = entityManager.createQuery("from Road").getResultList();
            processForCosmosDB(roadList);
            LOGGER.info("The count is {}", roadList.size());
        }
    }
    private void processForCosmosDB(List<Road> roadList) {
        this.roadForCosmosRepo.deleteAll().block();
        roadList.forEach(road -> {
            RoadForCosmos roadForCosmos = new RoadForCosmos();
            roadForCosmos.setId(road.getId());
            roadForCosmos.setCounty(road.getCounty());
            roadForCosmos.setCommunity(road.getCommunity());
            roadForCosmos.setMiles(road.getMiles());
            roadForCosmos.setRoad_name(road.getRoad_name());
            roadForCosmos.setLine_to_web_page(road.getLine_to_web_page());
            roadForCosmos.setRustic_road_number(road.getRustic_road_number());
            roadForCosmosRepo.save(roadForCosmos).block();
        });
    }
}

The complete code base is at the GitHub URL. I hope this is useful when you are looking for a similar solution. Remember that Cosmos DB has a free tier; I had a tough time getting an instance in a US region, so I ended up provisioning mine in a Canada region.

Java 16 logging

One of the things I have enjoyed revisiting on Java 16 is the logging API. The java.util.logging API works with handlers; there are different handlers such as FileHandler, MemoryHandler, and ConsoleHandler, and depending on the need they can be configured at runtime.

Here is a scenario: say you want to debug, or pipe some of the logging information to a different stream, based on a condition. You can configure this at runtime.

Here is a rundown of code where a second log file gets created at runtime based on a condition.

package me.sathish;

import java.io.IOException;
import java.util.logging.FileHandler;
import java.util.logging.Level;
import java.util.logging.Logger;

public class LoggingSample {
    private static Logger logger = Logger.getLogger("me.sathish.package");
    private static FileHandler fh;
    private static FileHandler ch;

    static {
        try {
            // Two file handlers: one attached up front, one attached conditionally later
            fh = new FileHandler("mylog.txt");
            ch = new FileHandler("Secondfile.txt");
        } catch (IOException e) {
            logger.log(Level.SEVERE, "Logging file is not found");
        }
    }

    public static void main(String[] args) {
        int start = 5;
        int counter = 0;
        logger.addHandler(fh);
        logger.setLevel(Level.ALL);
        for (int i = 0; i < start; i++) {
            if (i == 3) {
                // From this point on, records also go to Secondfile.txt
                logger.addHandler(ch);
                logger.setLevel(Level.ALL);
            }
            logger.log(Level.SEVERE, "This is the counter " + counter++);
        }
    }
}
Logging Gist

In the code block there is a condition where a second file handler is added when the counter reaches 3. Based on this condition, two log files get generated. The first one, mylog.txt, has all the entries.
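
As an aside, the records below come out in java.util.logging's default XML format because no formatter was set on the handlers. If you prefer plain text, you can attach a java.util.logging.SimpleFormatter to each handler right after creating it:

fh.setFormatter(new java.util.logging.SimpleFormatter()); // plain-text records instead of XML
ch.setFormatter(new java.util.logging.SimpleFormatter());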

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE log SYSTEM "logger.dtd">
<log>
<record>
<date>2021-08-28T04:37:27.363135Z</date>
<millis>1630125447363</millis>
<nanos>135000</nanos>
<sequence>0</sequence>
<logger>me.sathish.package</logger>
<level>SEVERE</level>
<class>me.sathish.LoggingSample</class>
<method>main</method>
<thread>1</thread>
<message>This is the counter0</message>
</record>
<record>
<date>2021-08-28T04:37:27.412285Z</date>
<millis>1630125447412</millis>
<nanos>285000</nanos>
<sequence>1</sequence>
<logger>me.sathish.package</logger>
<level>SEVERE</level>
<class>me.sathish.LoggingSample</class>
<method>main</method>
<thread>1</thread>
<message>This is the counter1</message>
</record>
<record>
<date>2021-08-28T04:37:27.413234Z</date>
<millis>1630125447413</millis>
<nanos>234000</nanos>
<sequence>2</sequence>
<logger>me.sathish.package</logger>
<level>SEVERE</level>
<class>me.sathish.LoggingSample</class>
<method>main</method>
<thread>1</thread>
<message>This is the counter2</message>
</record>
<record>
<date>2021-08-28T04:37:27.414026Z</date>
<millis>1630125447414</millis>
<nanos>26000</nanos>
<sequence>3</sequence>
<logger>me.sathish.package</logger>
<level>SEVERE</level>
<class>me.sathish.LoggingSample</class>
<method>main</method>
<thread>1</thread>
<message>This is the counter3</message>
</record>
<record>
<date>2021-08-28T04:37:27.414967Z</date>
<millis>1630125447414</millis>
<nanos>967000</nanos>
<sequence>4</sequence>
<logger>me.sathish.package</logger>
<level>SEVERE</level>
<class>me.sathish.LoggingSample</class>
<method>main</method>
<thread>1</thread>
<message>This is the counter4</message>
</record>
</log>

The second file, Secondfile.txt, has only the entries that satisfy the condition:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE log SYSTEM "logger.dtd">
<log>
<record>
<date>2021-08-28T04:37:53.555130Z</date>
<millis>1630125473555</millis>
<nanos>130000</nanos>
<sequence>3</sequence>
<logger>me.sathish.package</logger>
<level>SEVERE</level>
<class>me.sathish.LoggingSample</class>
<method>main</method>
<thread>1</thread>
<message>This is the counter 3</message>
</record>
<record>
<date>2021-08-28T04:37:53.555951Z</date>
<millis>1630125473555</millis>
<nanos>951000</nanos>
<sequence>4</sequence>
<logger>me.sathish.package</logger>
<level>SEVERE</level>
<class>me.sathish.LoggingSample</class>
<method>main</method>
<thread>1</thread>
<message>This is the counter 4</message>
</record>
</log>

The new COBOL program – Cloud formation

The new COBOL program is the CloudFormation template; it does not matter whether it is written in YAML or JSON.

A few years ago, the main driver for bringing Java and the .NET framework into the shop was to decouple the giant, multi-thousand-line COBOL programs. These programs had a library of keywords that could be reused, and the main logic was driven by specific Boolean flags; at least the COBOL programs I migrated were of this kind.

Fast forward to my new world of writing CloudFormation templates: these are the next set of monoliths, running a few thousand lines each. The small bits of logic you can write are called intrinsic functions.

There are different sections in a COBOL program: divisions, sections, and so on. The same parallel applies to a CloudFormation template, which has sections for declaring parameters, conditions, resources, and outputs.

The next item in a CloudFormation template is the conditions. This is the section where the parameters passed to the template are evaluated, so that the right resources can be provisioned. If you have seen flag logic in a COBOL program, this section is the same idea.

The resources section of the template uses the conditions defined earlier to provision the resources that are needed.

The outputs section stitches all of the above together so that there is a user-friendly output.

All these sections are closely tied together; if one fails or errors out, the entire stack is a dud.

Thinking about writing a CloudFormation template, as an engineer it is a conflicted feeling to support this development practice. Is this the correct thing to do for the modern world? Cheers.

Java 18 – starter kit

Started doing some gists around Java 18. Text blocks make it easy to build a JSON string from a String template; here is a quick gist:

package me.sathish;

public class Main {

    public static void main(String[] args) {
        String simpleJSONData = """
                {
                    "fullName": "%s",
                    "aadress": "%s",
                    "title": "%s"
                }
                """;
        System.out.println(simpleJSONData.formatted("Sathish Jayapal", "18 Java version lane", "awesome"));
    }
}

Resume Builder- Full stack Engineer

It is getting tricky being a full-stack engineer in the freelance world. If you have a wide range of experience and want to tailor your resume to your skills, each engagement demands a new version of the resume based on the technical stack that is needed. So I thought of doing a mini-series on a resume builder that produces a Word/PDF version of your resume.

Here is a very simple use-case list:

  • Build a resume based on the technical tool stack that you will need
  • Build a data-bank of technical tools you have worked on
  • Build a data-bank of technical projects you have worked on
  • Request a Word/PDF format of your resume
  • Keep this portal free of cost, using only a serverless-based stack

Here is a high level flow

Here are the high-level needs:

  • S3 bucket for the user to submit new skills and employment information
  • Amplify to host the resume portal as a monorepo
  • Complete pipeline stack with CI/CD from AWS
  • Fargate to run the various micro-services
    • View Resume API
    • Save Resume API
  • DynamoDB as the default serverless database
  • AWS Lambda to enforce tags on all the assets, so that I don’t suddenly get dinged with a high bill
  • Keep the entire stack free of cost

For this week, I am going to push the stack that returns the View Resume API information exchange.

A quick look at the tables that are going to be involved in the read API. No, I am not going the NoSQL route here; I will explain more as we go further in the series.
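
Here is a rough sketch of the model behind those tables, with field names taken from the JSON payload below; the class names and the JPA mapping are my assumptions, not the actual project code.

import javax.persistence.CascadeType;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.OneToMany;
import java.util.List;

@Entity
public class ResumeUser {
    @Id
    @GeneratedValue
    private Long id;
    private String userName;
    private String theme;
    private String summary;
    private String name;
    private String phone;
    private String twitterHandle;
    private String gitHubRepo;

    // ResumeUserJob and ResumeUserEducation would follow the same pattern
    @OneToMany(cascade = CascadeType.ALL)
    private List<ResumeUserJob> resumeUserJobList;

    @OneToMany(cascade = CascadeType.ALL)
    private List<ResumeUserEducation> resumeUserEducationList;

    // getters and setters omitted for brevity
}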

The read API is a Spring Boot based reactive stack that will change whenever an entry comes through the “save service” stack. I implemented Spring Security for this iteration, and we will change it as more design tenets are added to the code base.
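
As a sketch of what the read endpoint could look like, assuming a reactive Spring Data repository over the same model (the actual project may wire this up differently; the class names, URL path, and repository here are hypothetical):

import org.springframework.data.repository.reactive.ReactiveCrudRepository;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Mono;

// Hypothetical reactive repository; the real project may use a different persistence approach
interface ResumeUserRepository extends ReactiveCrudRepository<ResumeUser, Long> {
    Mono<ResumeUser> findByUserName(String userName);
}

@RestController
class ResumeReadController {

    private final ResumeUserRepository resumeUserRepository;

    ResumeReadController(ResumeUserRepository resumeUserRepository) {
        this.resumeUserRepository = resumeUserRepository;
    }

    @GetMapping("/resumes/{userName}")
    Mono<ResumeUser> viewResume(@PathVariable String userName) {
        // Returns the resume payload shown below for the requested user
        return resumeUserRepository.findByUserName(userName);
    }
}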

	
{
  "id": 2,
  "userName": "bar",
  "theme": "2",
  "summary": "Foo Bar Summary",
  "name": "Bar Name",
  "phone": "2344563456",
  "twitterHandle": "mybarteitter",
  "gitHubRepo": "http://github.com",
  "resumeUserJobList": [
    {
      "id": 1,
      "company": "American Express",
      "designation": "Application Development Engineer",
      "startDate": "1999-05-01",
      "endDate": "2000-06-01",
      "responsibilities": ["First Job", "Second Job"],
      "currentJob": false
    },
    {
      "id": 2,
      "company": "Novell Express",
      "designation": "Application Development Engineer",
      "startDate": "2000-06-01",
      "endDate": "2002-10-01",
      "responsibilities": ["First Job", "Second Job"],
      "currentJob": false
    }
  ],
  "resumeUserEducationList": [
    {
      "id": 4,
      "college": "MIT",
      "qualification": "Bachelors Degree",
      "startDate": "2000-06-01",
      "endDate": "2002-10-01",
      "summary": "This is the ultimate Degree",
      "keySkills": ["JAVA", "iOS"]
    }
  ]
}

The code base for generating this JSON string is on GitHub. As I get more of the information exchange working on this code base, I will share it and discuss more here.

ZGC options with Java 16

Java 16 ships with a production-ready ZGC garbage collector.

For example, here is sample code that will run out of memory with a regular JVM configuration; I compared the run on Java 8 and Java 16. There is an option to log the GC cycles with a VM flag (for example -Xlog:gc*), but I felt that output was too overwhelming to digest, so I went with simpler heap statistics instead.

package me.sathish;

import java.time.Duration;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

public class MemorySample {
    public static void main(String[] args) {
        LocalDateTime startTime = LocalDateTime.now();
        System.out.println("Starting time--" + startTime);
        getMemoryStats();
        List<Integer> someList = new ArrayList<>();
        List<Integer> someList1 = new ArrayList<>();
        try {
            for (int i = 0; i < 100000000; i++) {
                someList.add(i);
                someList1.add(i);
            }
        } finally {
            // Note: the arguments are in (now, startTime) order, so the duration comes out
            // negative; that is why the outputs below show the seconds with an extra '-'.
            Duration duration = Duration.between(LocalDateTime.now(), startTime);
            getMemoryStats();
            System.out.println("The time diff is --" + duration.getSeconds());
        }
    }

    /**
     * Code borrowed from
     * https://www.viralpatel.net/getting-jvm-heap-size-used-memory-total-memory-using-java-runtime/
     */
    private static void getMemoryStats() {
        Runtime runtime = Runtime.getRuntime();
        int mb = 1024 * 1024;
        System.out.println("##### Heap utilization statistics [MB] #####");
        // Print used memory
        System.out.println("Used Memory:"
                + (runtime.totalMemory() - runtime.freeMemory()) / mb);
        System.out.println("Free Memory:"
                + runtime.freeMemory() / mb);
        System.out.println("Total Memory:" + runtime.totalMemory() / mb);
        System.out.println("Max Memory:" + runtime.maxMemory() / mb);
    }
}

Now with Java 8, after increasing the heap with the VM option -Xmx4096m, the program above took a total of 18 seconds to complete.

##### Start Heap utilization statistics [MB] #####
Used Memory:2
Free Memory:189
Total Memory:192
Max Memory:4096


##### End Heap utilization statistics [MB] #####
Used Memory:3857
Free Memory:238
Total Memory:4096
Max Memory:4096
The time diff is ---18

Enter the new Java 16 world, with VM options -Xmx4096m -XX:+UseZGC. This run actually ended with an OutOfMemoryError:

Total Memory:192
Max Memory:4096
##### Heap utilization statistics [MB] #####
Used Memory:3492
Free Memory:604
Total Memory:4096
Max Memory:4096
The time diff is ---29
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
	at java.base/java.util.Arrays.copyOf(Arrays.java:3511)
	at java.base/java.util.Arrays.copyOf(Arrays.java:3480)
	at java.base/java.util.ArrayList.grow(ArrayList.java:237)
	at java.base/java.util.ArrayList.grow(ArrayList.java:244)
	at java.base/java.util.ArrayList.add(ArrayList.java:454)
	at java.base/java.util.ArrayList.add(ArrayList.java:467)

Here is another set of options that Java 16 gives. The new VM options for this run were

-XX:+UseZGC -XX:ZUncommitDelay=10 -XX:MaxHeapSize=10000000024

A slick 10 seconds to complete everything

##### Start Heap utilization statistics [MB] #####
Used Memory:8
Free Memory:184
Total Memory:192
Max Memory:9538

##### End Heap utilization statistics [MB] #####
Used Memory:7548
Free Memory:0
Total Memory:7548
Max Memory:9538
The time diff is ---10

With Java 16 the uncommit delay can be reduced, so with the following VM options

-XX:+UseZGC -Xmx9538m -XX:ZUncommitDelay=1, the code finished with a 44-second runtime.

Take note of the free memory.

##### Start Heap utilization statistics [MB] #####
Used Memory:8
Free Memory:184
Total Memory:192
Max Memory:9538
##### End Heap utilization statistics [MB] #####
Used Memory:6938
Free Memory:0
Total Memory:6938
Max Memory:9538
The time diff is ---44

So I tried one other option, with a concurrent GC thread setting; the new VM options were

-XX:MaxHeapSize=10000000024 -XX:+UseParallelGC -XX:ConcGCThreads=4 -XX:ZUncommitDelay=1

Here is the output for those VM options. The code took much longer to complete, but the amount of free memory at the end was a full 1038 MB.

##### Start Heap utilization statistics [MB] #####
Used Memory:4
Free Memory:179
Total Memory:184
Max Memory:8478
##### End Heap utilization statistics [MB] #####
Used Memory:3862
Free Memory:1038
Total Memory:4901
Max Memory:8478
The time diff is ---115

Finally, one last set of VM options and the output, with a better balance of memory usage:

-XX:+UseZGC -XX:MaxHeapSize=5126M -XX:ZUncommitDelay=1

##### Start Heap utilization statistics [MB] #####
Used Memory:8
Free Memory:184
Total Memory:192
Max Memory:5126
##### End Heap utilization statistics [MB] #####
Used Memory:4674
Free Memory:452
Total Memory:5126
Max Memory:5126
The time diff is ---52

So, nice production-ready options with Java 16 for working with the new ZGC model.

Finally, always a great watch.