Wednesday, March 17, 2021

Compile-time constants

This seemingly simple concept has some quirks attached to it.

At the outset, it seemed (or at least I understood) that any variable declared static and final is a compile-time constant.

I stumbled upon this when experimenting with inner classes. I will use the same example here to illustrate.

// Main.java
class OuterClass {

    class InnerClass {

        static final String CONSTANT = new String("A CONSTANT VALUE");

    }

}

So we have a static final String variable CONSTANT in the above code, and it is initialized by creating a new String object. We know that the value of CONSTANT cannot be changed after this initialization.

But this does not make CONSTANT a compile-time constant. Compiling the above code gives this verdict:

>javac Main.java
Main.java:5: error: Illegal static declaration in inner class OuterClass.InnerClass
        static final String CONSTANT = new String("A CONSTANT VALUE");
                            ^
  modifier 'static' is only allowed in constant variable declarations
1 error

A little exploration and we find that the issue is in the way CONSTANT is initialized: using new String(). If we instead initialize it with a String literal, it becomes a compile-time constant. The code below compiles without complaint

// Main1.java
class OuterClass {

    class InnerClass {

        static final String CONSTANT = "A CONSTANT VALUE";

    }

}

as does the below code, in which an expression concatenating two strings is assigned to CONSTANT 

// Main2.java
class OuterClass {

    class InnerClass {

        static final String CONSTANT = "A CONSTANT VALUE" + "THAT CANNOT CHANGE";

    }

}

But the code below, which assigns null to CONSTANT, is rejected because null is not a compile-time constant.

// Main3.java
class OuterClass {

    class InnerClass {

        static final String CONSTANT = null;

    }

}

Definition of a constant is given here 

A constant variable is a final variable of primitive type or type String that is initialized with a constant expression (§15.28). 

From this definition, a constant variable must be

  • declared final
  • of primitive type or of type String
  • initialized where it is declared
  • initialized with a constant expression

The definition of constant expression is given here

A key takeaway from this definition is a constant variable need not be static.

Borrowing the examples from the definition page, all of the below are valid compile-time constants

// Main4.java
class OuterClass {

    class InnerClass {

        final boolean CONSTANT1 = true;
        static final short CONSTANT2 = (short)(1*2*3*4*5*6);
        final int CONSTANT3 = Integer.MAX_VALUE / 2;
        static final double CONSTANT4 = 2.0 * Math.PI;
        final String CONSTANT5 = "The integer " + Long.MAX_VALUE + " is mighty big.";
        
    }
}
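
One way to see the difference at runtime is that the compiler folds constant expressions and interns the resulting strings, and that it accepts constant variables as switch labels. The sketch below is only an illustration; the class name ConstantDemo and the method describe() are made up for this example and are not part of the original sample code.

```java
public class ConstantDemo {

    static final String LITERAL = "AB";
    // Constant expression: javac folds this to "AB" at compile time
    static final String FOLDED = "A" + "B";
    // Not a constant expression: a new String object is created at class initialization
    static final String CREATED = new String("AB");

    static String describe(int code) {
        // A constant variable need not be static: final, primitive, constant initializer
        final int OK = 200;
        switch (code) {
            case OK: // case labels must be constant expressions
                return "ok";
            default:
                return "other";
        }
    }

    public static void main(String[] args) {
        System.out.println(LITERAL == FOLDED);  // true: same interned string
        System.out.println(LITERAL == CREATED); // false: CREATED is a distinct object
        System.out.println(describe(200));      // ok
    }
}
```

The reference comparisons with == are deliberate here; they are used to check object identity, not string equality.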

  

Sealed classes enhancement in Java 16

Sealed classes remain a preview feature in Java 16, which was released yesterday (16-Mar-2021).

The only enhancement to sealed classes is that the Java compiler now identifies disjoint types by walking through the sealed class hierarchy. Compilation fails with an error when there is a cast between disjoint types.

Let's explore this through an example.

Consider the below code fragment:

//Main.java
interface Pet {

}

interface Robot {

}

public class Main {
    
    public static void check(Robot r) {
        Pet p = (Pet) r;
    }
    
    public static void main(String[] args) {
        
		check(new Robot() {
		}); // Throws ClassCastException at runtime

		class RoboDog implements Robot, Pet {
		} // Class that implements both Robot and Pet

		check(new RoboDog()); // Valid

    }
}

Here we have two interface types Pet & Robot which are unrelated. 

The compiler does not flag any error, and casting a Robot to Pet is allowed.

This is because we could have a class that implements both these interfaces, and it is valid to cast an object of that class to Pet at runtime.

The local class RoboDog in the above code does exactly that: it implements both the interfaces Robot & Pet, and it is perfectly valid to cast an instance of RoboDog to the Pet type.
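
When the types are open like this, the cast can only be validated at runtime. A minimal sketch of guarding the cast with instanceof is given below; the Vacuum class and the CastDemo class are hypothetical names added for illustration.

```java
interface Pet {
}

interface Robot {
}

// Implements both interfaces, so a Robot reference to it can be cast to Pet
class RoboDog implements Robot, Pet {
}

// Hypothetical class for this sketch: a Robot that is not a Pet
class Vacuum implements Robot {
}

public class CastDemo {

    static String check(Robot r) {
        // Test before casting, so that no ClassCastException is thrown
        if (r instanceof Pet) {
            Pet p = (Pet) r;
            return p.getClass().getSimpleName() + " is a Pet";
        }
        return r.getClass().getSimpleName() + " is not a Pet";
    }

    public static void main(String[] args) {
        System.out.println(check(new RoboDog())); // RoboDog is a Pet
        System.out.println(check(new Vacuum()));  // Vacuum is not a Pet
    }
}
```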

But there are instances where the compiler can statically establish that two types are disjoint (can never be related) and report a compilation error when a cast is performed between such types.

To understand how sealed classes help the compiler establish that two types are disjoint, consider the code below:

//Main1.java
interface Pet {

}

interface Robot {

}

final class RoboDog implements Robot {

}

public class Main {

    public static void check(Robot r) {
        Pet p = (Pet) r;
    }

    public static void main(String[] args) {

        check(new RoboDog()); // Throws ClassCastException at runtime

    }
}

Here, even though RoboDog

  • implements only Robot and not Pet
  • and is final and cannot be extended any further

the compiler still allows the cast. Robot itself is open, so it is very much possible to have some other type at runtime that implements both Pet and Robot, and casting Robot to Pet must be allowed for such scenarios.

The above code compiles without error, but throws a ClassCastException at runtime because RoboDog is not a Pet.

If we make Robot sealed and permit only the final class RoboDog, then no other types are allowed in the hierarchy. The compiler can establish in this case that Pet & Robot are disjoint (since no type other than RoboDog can be a Robot, and RoboDog is not a Pet), and it reports an incompatible types error at the point where we cast Robot to Pet.

Below code shows this implementation with sealed classes:

//Main2.java
interface Pet {

}

sealed interface Robot permits RoboDog {

}

final class RoboDog implements Robot {

}

public class Main {

    public static void check(Robot r) {
        Pet p = (Pet) r;
    }

    public static void main(String[] args) {

        check(new RoboDog());

    }
}

Here Robot is sealed, permitting only the RoboDog subtype, and RoboDog is final, allowing no further subtypes.

We get a compilation error when this code is compiled on Java 16.

>javac --enable-preview --release 16 Main2.java
Main.java:16: error: incompatible types: Robot cannot be converted to Pet
                Pet p = (Pet) r;
                              ^
Note: Main.java uses preview language features.
Note: Recompile with -Xlint:preview for details.
1 error

This is the only enhancement for sealed classes in Java 16 as compared to its behavior in Java 15. 
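
The closed hierarchy that the compiler relies on can also be inspected through reflection. The sketch below assumes a JDK where sealed classes are available without preview flags (Java 17 or later); the class name SealedInspection is made up for this example.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

interface Pet {
}

sealed interface Robot permits RoboDog {
}

final class RoboDog implements Robot {
}

public class SealedInspection {

    // True because Robot is declared sealed
    static boolean robotIsSealed() {
        return Robot.class.isSealed();
    }

    // Names of the only types allowed to implement Robot directly
    static String permitted() {
        return Arrays.stream(Robot.class.getPermittedSubclasses())
                .map(Class::getSimpleName)
                .collect(Collectors.joining(","));
    }

    // False: the only permitted Robot does not implement Pet
    static boolean roboDogIsPet() {
        return Pet.class.isAssignableFrom(RoboDog.class);
    }

    public static void main(String[] args) {
        System.out.println(robotIsSealed());
        System.out.println(permitted());
        System.out.println(roboDogIsPet());
    }
}
```

Since RoboDog is the only permitted subtype, is final, and is not a Pet, no Robot can ever be a Pet, which is the fact the compiler uses to reject the cast.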


Behavior in Java 15:

Compiling the same code on Java 15 does not give a compile time error. 

And since the Robot type hierarchy is fully sealed (no further subtypes of Robot are possible), this code is bound to throw a ClassCastException every time.

Output of compiling the code in Java 15: 

>javac --enable-preview --release 15 Main2.java
Note: Main.java uses preview language features.
Note: Recompile with -Xlint:preview for details.

Code gets compiled successfully on Java 15.


Tuesday, March 16, 2021

C1 & C2 Compilers

The JVM implementation contains two JIT compilers, commonly referred to as the C1 and C2 compilers.

JIT compilation itself takes processing time and memory. The more aggressive the optimization it applies, the more CPU and memory the compilation is going to take.

The C1 compiler runs quickly but produces less optimized code.

The C2 compiler takes more time and resources, but produces well optimized code.


C1 compiler

This is more suitable for client side applications, which typically have fewer resources (CPU, memory capacity) at their disposal. They require faster startup and should be more responsive to the end user. Client applications also run for a shorter period of time, typically a few hours before they are shut down.

The C1 compiler does simple optimizations that are not resource heavy, so that it has no noticeable impact on the startup time or the responsiveness of the application.

The C1 compiler is traditionally referred to as the client compiler, and until the early releases of Java 7 it was enabled with the -client option when starting the JVM.


C2 compiler

This is most suitable for server side applications. Applications running on a server typically have more resources (CPU, memory) at their disposal, and they typically run for a longer period of time. It is not uncommon to see server applications kept running for months without being restarted.

The C2 compiler does aggressive optimization of bytecodes. This might bring in an additional lag while the compiler is doing its job, but the long run efficiency of the aggressively compiled code outweighs the overhead incurred.

The C2 compiler is traditionally referred to as the server compiler, and until the early releases of Java 7 it was enabled with the -server option when starting the JVM.


Choosing C1 vs. C2

Often, tuning the JVM for JIT compilation is about making a choice between C1 and C2. There is no fixed rule stating which one is best for which application.

For GUI applications, where responsiveness is an important measure of performance, C1 might be better at startup and initial load. C2 might bring a slight improvement in responsiveness over time, but whether this improvement makes a noticeable difference depends on the nature and complexity of the application.

Server side applications and batch applications typically benefit from the use of the C2 compiler. The longer the program runs and the more times a section of code is executed, the more the cumulative benefit outweighs the overhead caused by compilation.


JIT compiler options

In Java versions 7 & prior, there are 3 options to choose from, to select a compiler when starting JVM

  1. -client
  2. -server
  3. -d64

We have seen the -client and -server options in this post. -d64 can be thought of as a synonym for the -server option. The only difference is that -server can be specified on both 32 bit and 64 bit OS, whereas -d64 can be specified only on a 64 bit OS. The JVM will throw an error if we specify -d64 on a 32 bit system.

Even when we specify an option, there is no guarantee that the JVM uses it. This is because, for some hardware architectures, the JVM might have an implementation for only one of the compilers. In this case, the option we specify is ignored and the only compiler available is used.


Default compiler

When we do not specify an option at startup, the JVM determines the default compiler to use. It makes this decision based on

  1. The OS used
  2. Whether it is 32 bit or 64 bit
  3. The number of CPUs on the system

The logic used to determine this is a bit complex. But as a rule of thumb, we can take it that if the system has a 64 bit OS or has 2 or more CPUs, the -server option is used. Be aware that this rule of thumb does not always hold good.
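
The inputs to this decision can be read from within a Java program. The sketch below only prints them; it does not reproduce the JVM's selection logic, and the class name MachineFacts is made up for this example. Note that the os.arch property may reflect the architecture of the running JVM rather than that of the OS.

```java
public class MachineFacts {

    static String osName() {
        return System.getProperty("os.name");
    }

    static String arch() {
        // Standard system property; may describe the JVM build rather than the OS
        return System.getProperty("os.arch");
    }

    static int cpuCount() {
        // Number of processors available to this JVM
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println("OS   : " + osName());
        System.out.println("Arch : " + arch());
        System.out.println("CPUs : " + cpuCount());
    }
}
```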

 

Determining the compiler used

We can determine which JIT compiler is used, by running the java -version command.  

>java -version

openjdk version "1.8.0_41"
OpenJDK Runtime Environment (build 1.8.0_41-b04)
OpenJDK Client VM (build 25.40-b25, mixed mode)

The last line indicates that the -client option is used. This is on my Windows laptop running OpenJDK 1.8. Output on your system might vary.

When I run the command with the -server option, I see that the server VM is used, as is evident from the last line in the output below

>java -server -version

openjdk version "1.8.0_41"
OpenJDK Runtime Environment (build 1.8.0_41-b04)
OpenJDK Server VM (build 25.40-b25, mixed mode)


But since Java 8, tiered compilation is the default. This internally uses C1 and C2 for compilation and uses a tiered approach based on the hotness of the code that is getting executed. We will look at tiered compilation in depth in our next post.
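
The same information is available from inside a running program. The sketch below uses the standard java.vm.name system property and the CompilationMXBean from java.lang.management; the class name JitInfo is made up for this example, and the exact strings returned depend on the JVM in use.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

public class JitInfo {

    // Name of the running VM, as also shown in the last line of java -version
    static String vmName() {
        return System.getProperty("java.vm.name");
    }

    // Name of the JIT compiler, or "none" if the JVM has no compilation system
    static String jitName() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        return jit == null ? "none" : jit.getName();
    }

    public static void main(String[] args) {
        System.out.println("VM  : " + vmName());
        System.out.println("JIT : " + jitName());
    }
}
```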

 


Wednesday, March 10, 2021

When does JVM use JIT compilation

We saw in our previous post three different scenarios of JIT compilation applied to our method.

> JIT compilation was not applied at all when the loop count was low

> We saw one JIT compile event log entry when the loop count was increased to 10K and

> three entries (with two compilation id's) when the loop count was increased to 1M.


Why does it vary depending on the number of times our method is invoked?

Why isn't JIT compilation happening the same way all the time?

Ideally it may be good to have JIT compilation done the same way all the time, but JIT compilation itself consumes resources, taking CPU time and memory to do its job.

If the method is not invoked frequently, the overhead incurred by JIT compilation is not justified by the benefit gained from executing the compiled code. It is better to just interpret and execute that code.

But if the method is invoked more frequently, it is beneficial to compile it and convert it to native code.


Hotspot JVM

For this reason, the JVM keeps track of the sections of code that are invoked frequently. Such code sections are called hotspots.

The JVM hands over only the code sections that become hotspots for JIT compilation. The overhead incurred in JIT compilation of these hotspot code sections is justified by the performance gained in executing them many times over.

The name Hotspot JVM comes from this approach that the JVM takes to invoke JIT compilation.


Optimization of compiled code:

It is one thing to compile the byte code to native assembly code. But the compiled code can also be optimized, and various levels of optimization are possible. Also, the more times the code is executed by the JVM, the more information it has about the code, which can be used for further optimization.

The optimization done on the code during compilation in itself adds an overhead, which can be justified only when the frequency of invocation is even higher.


Tiered JIT compilation

The JVM adopts tiered compilation of hotspot sections of the code.

Initially, code is executed through interpretation.

When a code section is identified as hot enough by the JVM, it is compiled through JIT compilation, but less optimization is applied at this stage so that the overhead of optimization itself does not outweigh the benefit gained. Also, some optimizations are possible only after the code section has been invoked many more times.

When that code section has become even hotter and the JVM has gained enough information for further optimization, aggressive optimization is applied to create the most optimized compiled code.


This explains why when our method calculate() from the previous post is

  • Invoked 100 times - JIT compilation is not applied
  • Invoked 10K times - Single JIT compilation is applied
  • Invoked 1M times - multiple JIT compilations are applied
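
The effect of this warm-up can sometimes be observed by timing successive batches of calls. The sketch below is only an illustration and not a proper benchmark: the class name WarmUpSketch and the batch sizes are arbitrary choices for this example, and the timings vary from run to run and from machine to machine.

```java
public class WarmUpSketch {

    // Accumulates results so that the work inside calculate() is not trivially discarded
    static double sink;

    static void calculate() {
        sink += Math.random() * Math.random();
    }

    // Runs the given number of batches and returns the elapsed nanoseconds of each batch
    static long[] timeBatches(int batches, int callsPerBatch) {
        long[] nanos = new long[batches];
        for (int b = 0; b < batches; b++) {
            long start = System.nanoTime();
            for (int i = 0; i < callsPerBatch; i++) {
                calculate();
            }
            nanos[b] = System.nanoTime() - start;
        }
        return nanos;
    }

    public static void main(String[] args) {
        long[] nanos = timeBatches(10, 100_000);
        for (int b = 0; b < nanos.length; b++) {
            System.out.println("batch " + b + ": " + nanos[b] + " ns");
        }
    }
}
```

On a typical run the first batches tend to be slower than the later ones, but no particular numbers should be expected.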


Next we will move on to understand C1 & C2 JIT compilers and Tiered compilation options



Tuesday, March 09, 2021

Understanding the output generated by PrintCompilation flag

The last example from our previous post produced the following output

> java -XX:+PrintCompilation Main2 | Select-String -Pattern calculate
     76   68       3       Main::calculate (9 bytes)
     79   72       4       Main::calculate (9 bytes)
     81   68       3       Main::calculate (9 bytes)   made not entrant

>

Let's try to understand what this output means.

The first column is the number of milliseconds elapsed since the start of the program. This indicates the time at which our method calculate() is JIT compiled.

The second column here is the compilation id. Each compilation unit gets a unique id. 68 on the 1st and 3rd lines in the above output indicates they refer to the same compilation unit.

The third column is blank in our output. It is a five character string, representing the characteristics of the code compiled

% - OSR compilation.
s - synchronized method.
! - Method has an exception handler.
b - Blocking mode.
n - Wrapper to a native method.

Fourth column is a number from 0 to 4 indicating the tier at which the compilation is done. If tiered compilation is turned off, this column will be blank.

Fifth column is the fully qualified method name

Sixth column is the size in bytes - size of the byte code that is getting compiled 

Last column contains the message of the deoptimization done - made not entrant in our sample output
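
The column layout described above can be captured in a small parser. The sketch below is a rough illustration that handles only the simple line shape seen in our output (tiered compilation on, attribute characters written without gaps, no OSR "@" marker); the class name PrintCompilationLine and its field names are made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PrintCompilationLine {

    // timestamp, compile id, optional attribute characters, tier, method, size, optional message
    private static final Pattern LINE = Pattern.compile(
            "^\\s*(\\d+)\\s+(\\d+)\\s+([%s!bn]*)\\s*(\\d)\\s+(\\S+)\\s+\\((\\d+) bytes\\)\\s*(.*)$");

    final long millis;
    final int compileId;
    final String attributes;
    final int tier;
    final String method;
    final int bytes;
    final String message;

    private PrintCompilationLine(Matcher m) {
        millis = Long.parseLong(m.group(1));
        compileId = Integer.parseInt(m.group(2));
        attributes = m.group(3);
        tier = Integer.parseInt(m.group(4));
        method = m.group(5);
        bytes = Integer.parseInt(m.group(6));
        message = m.group(7).trim();
    }

    static PrintCompilationLine parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized line: " + line);
        }
        return new PrintCompilationLine(m);
    }

    public static void main(String[] args) {
        PrintCompilationLine p = parse(
                "     81   68       3       Main::calculate (9 bytes)   made not entrant");
        System.out.println(p.method + " tier " + p.tier + " id " + p.compileId + " " + p.message);
    }
}
```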


 


Watching JIT in Action

How can we find what JIT is doing to our code at runtime? And how can we figure out which of our methods are getting compiled at runtime and when?

We have a java command line flag -XX:+PrintCompilation which when included, logs all the JIT compile events to standard output.

Lets see this in action. We will start with the below code

public class Main {
    
    static final int LOOP_COUNT = 10 * 10; //100

    public static void main(String[] args) {

        for (int i = 0; i < LOOP_COUNT; i++) {
            calculate();
        }
    }
    
    static void calculate() {
        double value = Math.random() * Math.random();
    }
}

We have the calculate() method, which creates two random numbers and multiplies them. This method is called in a loop from the main() method. We start with a loop count of 100. 

Execute this program with the PrintCompilation flag, and watch for JIT compilation of the calculate method in the output, using the command below

> java -XX:+PrintCompilation Main | Select-String -Pattern calculate

>

Note: The above command is run on Windows, using Select-String, the PowerShell equivalent of grep.

Without the filter, we can see a lot of lines in the output: the logs from JIT compilation of Java library methods.

Here we search for "calculate" in the generated output. We do not see any compile event log for our method, indicating that our method calculate() is not JIT compiled this time.

We will now increase the loop count to 10K and watch out for JIT compilation event for our calculate() method. The code now is as shown below

public class Main1 {
    
    static final int LOOP_COUNT = 100 * 100; //10K

    public static void main(String[] args) {

        for (int i = 0; i < LOOP_COUNT; i++) {
            calculate();
        }
    }
    
    static void calculate() {
        double value = Math.random() * Math.random();
    }
}

Executing this code with the PrintCompilation flag, and watching for JIT compilation event for calculate() method, we see the compilation event log in the output as shown below 

> java -XX:+PrintCompilation Main1 | Select-String -Pattern calculate

     71   67       3       Main1::calculate (9 bytes)
     
>

This indicates that our method calculate() is JIT compiled this time. 

What happens if we increase the loop count still further? We will try to increase the loop count to 1M this time. Code now is as shown below

public class Main2 {
    
    static final int LOOP_COUNT = 1000 * 1000; //1M

    public static void main(String[] args) {

        for (int i = 0; i < LOOP_COUNT; i++) {
            calculate();
        }
    }

    static void calculate() {
        double value = Math.random() * Math.random();
    }
}
 

Executing this code again with the PrintCompilation flag, we now see multiple JIT compilation event logs for our calculate() method

> java -XX:+PrintCompilation Main2 | Select-String -Pattern calculate
     76   68       3       Main::calculate (9 bytes)
     79   72       4       Main::calculate (9 bytes)
     81   68       3       Main::calculate (9 bytes)   made not entrant

>

Why does JIT compilation kick in only when the loop count is high, and why are we seeing multiple JIT compilation events when the loop count is very high? We will explore that in a subsequent post.

Before that we will see how to read the output generated by PrintCompilation flag in our next post.


JIT Compiler

Lets start by talking a bit about JIT compiler...

This is an age old topic that has been widely discussed. There are a lot of materials out there explaining JIT compilation in much greater detail.

But I am bent on telling it one more time... the way I have understood it... put in as simply as I can!!!

So lets start with this universal Hello World! code

public class Main {

    public static void main(String[] args) {
        System.out.println("Hello World!");
    }
}

To execute this program, we execute two commands

  1. javac command to compile this source code into a class file
  2. java command to execute the class file

javac Main.java

java Main

The first command converts the source code into set of JVM instructions, commonly referred to as Java byte codes. These byte codes are stored in a .class file.

The second command starts a JVM instance, reads the byte codes from the class file and executes them on the JVM to produce the desired output.  

Now the JVM itself is a virtual layer on top of the hardware on which we execute the java command. What the JVM actually does is interpret the byte codes and convert them (compile them) into the assembly language instruction set that can be executed on the specific hardware.

So the JVM, at a high level, performs the following steps for each byte code instruction

  1. Read that byte code instruction
  2. Interpret that byte code and compile it to generate the equivalent assembly language instructions for the specific hardware on which it is getting executed
  3. And finally get these generated assembly language instructions executed on that hardware

This might just be fine for a simple program like hello world, but the real world programs are much more complex. 

Consider for instance, the below example where we print the String "Hello World" from within the method hello(). This method is called 10,000 times over from the main method

public class Main1 {

    public static void main(String[] args) {

        for (int i = 0; i < 10000; i++) {
            hello();
        }
    }
    
    static void hello() {
        System.out.println("Hello World!");
    }
}

The JVM performing the cycle of read -> interpret -> execute 10,000 times would surely not be an efficient approach.

The compiled instruction set that gets generated is going to be the same for each of the 10,000 cycles. The Java byte code need not be interpreted each time the JVM loops through; it can be compiled once and the compiled code reused.

The component that does this compilation is the Just-In-Time compiler, or JIT compiler, and it is executed as part of the JVM process.

So, 

  • javac is the static compiler that converts java source files into byte code instruction set
  • JIT compiler is part of the JVM process that is started by the java command and it performs dynamic compilation of byte code instruction set to native assembly language instructions


Monday, March 08, 2021

Variable declaration inside if statement

Took a while for me to figure this out... 

Consider the below piece of code

public class Main {

    public static void main(String[] args) {

        boolean flag = true;
        
        if(flag)
            String message = "I will not compile";
        
        if(flag) {
            String message = "Here I am ok...";
        }
    }
}

In this code, the first if statement does not compile. The second if statement is all fine.

The only difference here is

- in the first one, there are no curly braces surrounding the body of the if statement.

- in the second one, we have curly braces surrounding the body of the if statement.

Compiling this code throws the error message "variable declaration not allowed here"

Main.java:7: error: variable declaration not allowed here
                        String message = "I will not compile";

So a variable declaration is not allowed as the body of the if statement when it is not surrounded by curly braces.

But why is the variable declaration not allowed when we do not have curly braces?

Is it not perfectly fine to avoid curly braces when we have only one statement inside the if block?

The first reason I could think of is:

A variable declaration and the value assignment to that variable are considered two statements by the compiler, even though they are written in a single line. Since only one statement is allowed inside an if statement when it is not surrounded by braces, the compiler rejects it.

This argument made some sense, but then why does the compiler not just take the 1st statement as contained within the if block and the 2nd as outside of the if block, if the variable declaration and the value assignment are considered two separate statements?

That doesn't make sense either, as then we would have the variable declaration inside the if block and the assignment of a value to that variable outside the if block, where that variable is not visible.

So the compiler rejecting the code for the reason that a single line of code like 

String message = "I will not compile";

actually represents two statements as shown below is on expected lines.

String message;
message = "I will not compile";

Except for one catch. What if we only do the variable declaration inside an if block that is not surrounded by braces?

Of course that variable will not be of any use. But the same is true of our original example: declaring a variable and assigning a value to it is of no use either, unless it is used somewhere, isn't it?

So let's try that out

public class Main1 {

    public static void main(String[] args) {

        boolean flag = true;
        
        if(flag)
            String message; // Compilation error here
        
        if(flag) {
            String message; // Allowed
        }

    }
}

We have made the statement inside the if block as atomic as possible: only a variable declaration, without assigning any value to it. But here again, the one without curly braces throws a compilation error while the one with curly braces compiles fine.

OK... Time for some searching around...

And it turns out that the answer lies in the language grammar and in the scope of the variable. The body of an if statement must be a statement, and in Java a local variable declaration statement is not allowed in that position: it may appear only directly inside a block, that is, within a pair of curly braces.

A variable declared as the unbraced body of an if statement would be declared conditionally (if and only if the if condition evaluates to true... too many if's :)) and would go out of scope as soon as the if statement ends, so nothing could ever use it. With curly braces, the declaration sits inside a block of its own, which gives it a well-defined scope. Hence the compiler does not allow the unbraced form and throws an error.

And for the same reason, variable declaration is not allowed in looping statements as well, when the body of the statement is not surrounded by curly braces.

Below code shows all the error scenarios

public class Main2 {

    public static void main(String[] args) {

        boolean flag = true;
        
        if(flag)
            String message = "I will not compile";
        
        for(; flag ;)
            String message = "I will not compile";
            
        while(flag)
            String message = "I will not compile";
    }
}
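
For contrast, here is a minimal sketch of the forms that do compile: declaring the variable in the enclosing block and only assigning it in the unbraced if, or declaring it inside a braced block. The class name DeclarationScopeDemo and the method describe() are made up for this example.

```java
public class DeclarationScopeDemo {

    static String describe(boolean flag) {
        // Declared in the enclosing block, so it is visible after the if statements
        String message = "default";

        if (flag)
            message = "assigned without braces"; // an assignment is a statement, so this is allowed

        if (flag) {
            // A declaration directly inside a block is allowed; local is visible only in this block
            String local = "declared inside a block";
            message = message + " / " + local;
        }

        return message;
    }

    public static void main(String[] args) {
        System.out.println(describe(true));  // assigned without braces / declared inside a block
        System.out.println(describe(false)); // default
    }
}
```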


Sample code used in this post can be downloaded from https://github.com/ashokkumarta/awesomely-java/tree/main/2021/03/Curious-Cases/Variable-declaration-inside-if-statement

Saturday, March 06, 2021

Using -Xlint:text-blocks compiler option

Consider this piece of code:

public class Main {

    public static void main(String[] args) {

        String poemTextBlock = """
                The woods are lovely, dark and deep,        
                But I have promises to keep,      
                And miles to go before I sleep, 
                And miles to go before I sleep.    
                """;
        System.out.println(poemTextBlock);
    
    }
}

This seemingly perfect code when executed produces the following output




The indentation and white spaces included within the string produced by this text block are probably not what was intended.

To help identify this not so obviously visible issue, the -Xlint:text-blocks compiler option was introduced.

When the code is compiled with this option, the compiler emits warning messages highlighting issues with the white spaces used within the text block.

It specifically shows these two warning messages

  • inconsistent white space indentation - shown if there is inconsistency in the incidental white space characters across the lines within the text block
  • trailing white space will be removed - shown if a trailing space is present in any of the lines within the text block, which would be stripped off

Try compiling the above program with -Xlint:text-blocks flag included as in the command below

javac -Xlint:text-blocks Main.java

This gives the two warning messages, as shown below 

Main.java:5: warning: [text-blocks] inconsistent white space indentation
                String poemTextBlock = """
                                       ^
Main.java:5: warning: [text-blocks] trailing white space will be removed
                String poemTextBlock = """
                                       ^
2 warnings



Sample code used in this post can be downloaded from https://github.com/ashokkumarta/awesomely-java/tree/main/2021/03/Language-Features/Text-blocks/Using--Xlinttext-blocks-compiler-option

New escape sequence - \

This new escape sequence "\<line terminator>" can be used when we do not want to include a new line character at the end of a line within a text block. 

When used, this escape sequence effectively suppresses the new line character that gets implicitly included at the end of that line.

Below code shows the usage of this escape sequence

public class Main {

    public static void main(String[] args) {

        String poemTextBlock = """
                The woods are lovely, dark and deep, \
                But I have promises to keep,
                And miles to go before I sleep, \
                And miles to go before I sleep.
                """;
        System.out.println(poemTextBlock);
    
    }
}

In this code, we have used the "\<line terminator>" escape sequence on 1st and 3rd lines within the text block. 

This suppresses the new line character on the 1st and 3rd lines and produces the below output


When using this, take care to ensure that the "\" at the end of the line is immediately followed by the line terminator without leaving any blank spaces after the "\". 

Leaving a blank space at the end accidentally will throw a compilation error stating "illegal escape character"
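
As a compact check of this behaviour, the sketch below compares a text block that uses the escape with the equivalent regular string. It assumes a JDK with text blocks available (Java 15 or later); the class name LineContinuationDemo is made up for this example.

```java
public class LineContinuationDemo {

    static String joined() {
        // The backslash at the end of the first line suppresses its new line character
        return """
                The woods are lovely, dark and deep, \
                But I have promises to keep,
                """;
    }

    public static void main(String[] args) {
        String expected = "The woods are lovely, dark and deep, But I have promises to keep,\n";
        System.out.println(joined().equals(expected)); // true
    }
}
```

Note that the space before the "\" is kept, since it is no longer a trailing space once the line ends with the escape sequence.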

 

Sample code used in this post can be downloaded from https://github.com/ashokkumarta/awesomely-java/tree/main/2021/03/Language-Features/Text-blocks/New-escape-sequence-2

New escape sequence - \s

Two new escape sequences got introduced with text blocks

  • \s
  • \<line terminator> 

First let's see how to use "\s"

"\s" is the escape sequence for adding a space character within the string. It can be used in both regular strings and in text blocks. 

Below code shows the usage of "\s" when used within a text block and a regular string. 

public class Main {

    public static void main(String[] args) {

        String poemTextBlock = """
                The\swoods\sare\slovely,\sdark\sand\sdeep\s""";
        String poemString = "The\swoods\sare\slovely,\sdark\sand\sdeep\s";

        System.out.println(poemTextBlock);
        System.out.println(poemString);
    
    }
}


Strings produced by both this text block and the regular string expression are the same. The output is shown below 




We can see that each "\s" is replaced by a single space, including the one at the end of the line. (Remember that trailing spaces at the end of a line get stripped off in text blocks, but not when the escape sequence equivalent is used to provide the space.)
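
The same point can be checked in code. The sketch below assumes Java 15 or later; the class name EscapeSDemo is made up for this example.

```java
public class EscapeSDemo {

    // Text block whose lines end with \s, so the trailing spaces survive
    static String block() {
        return """
                deep,\s\s
                keep,\s
                """;
    }

    // The same content written as a regular string
    static String regular() {
        return "deep,\s\s\nkeep,\s\n";
    }

    public static void main(String[] args) {
        System.out.println(block().equals(regular()));           // true
        System.out.println(block().equals("deep,  \nkeep, \n")); // true
    }
}
```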

"\s" can be used to include trailing spaces within a text block. The escape sequence approach or the fencing approach can be used with "\s" to include the needed spaces as explained here. All the techniques given in this post for the usage of the octal escape sequence "\040" can be applied with "\s" as well.

We will see the other new escape sequence introduced with text blocks, \<line terminator>, in the next post


Sample code used in this post can be downloaded from https://github.com/ashokkumarta/awesomely-java/tree/main/2021/03/Language-Features/Text-blocks/New-escape-sequence-1

Techniques for including trailing whitespaces into text blocks

Trailing whitespaces can be included in a text block using one of the following approaches:

Character substitution 

Here we put a special character in the text block wherever trailing whitespace is needed, and replace it with a space after the text block has been processed by the compiler. The code for this is shown below.

public class CharacterSubstitution {

    public static void main(String[] args) {

        String poemTextBlock = """
                <html>
                    <body>
                        <pre>
                            The woods are lovely, dark and deep,###
                            But I have promises to keep,###
                            And miles to go before I sleep,###
                            And miles to go before I sleep.###
                        </pre>
                    </body>
                </html>""".replace('#',' ');
    
        System.out.println(poemTextBlock);
    }
}

Character fencing

Here we include the needed trailing spaces directly. But instead of ending the line with a space, we add a special fence character at the end, so that the spaces are no longer trailing spaces and hence are not stripped away.

We remove this fence character after the text block is processed, using the replace method as shown in the code below.

public class CharacterFencing {

    public static void main(String[] args) {

        String poemTextBlock = """
                <html>
                    <body>
                        <pre>
                            The woods are lovely, dark and deep,   #
                            But I have promises to keep,   #
                            And miles to go before I sleep,   #
                            And miles to go before I sleep.   #
                        </pre>
                    </body>
                </html>""".replace("#\n","\n");
    
        System.out.println(poemTextBlock);
    }
}

Escape sequence for space

Here we use the octal escape sequence for space in the text block wherever we need trailing spaces. Sample code for this is shown below.

public class EscapeSequence {

    public static void main(String[] args) {

        String poemTextBlock = """
                <html>
                    <body>
                        <pre>
                            The woods are lovely, dark and deep,\040\040\040
                            But I have promises to keep,\040\040\040
                            And miles to go before I sleep,\040\040\040
                            And miles to go before I sleep.\040\040\040
                        </pre>
                    </body>
                </html>""";
    
        System.out.println(poemTextBlock);
    }
}

Note that the Unicode escape sequence for space (\u0020) cannot be used for this. Unicode escapes are translated prior to lexical analysis, so the result is an ordinary trailing space that gets stripped, whereas octal escape sequences are processed only after the incidental whitespace has been stripped.

What exactly happens if we use a Unicode escape sequence inside a text block? That's a topic to explore in a separate post.

Since escape sequences are processed late, the octal escape sequence for space can also be used as a fencing character to include trailing blank spaces. Here we do not have to replace the fencing character, as it is itself a space character that we want to include.

The code below shows this. Here we use two regular whitespace characters followed by an octal escape sequence for space, which adds one more whitespace.
public class EscapeSequence1 {

    public static void main(String[] args) {

        String poemTextBlock = """
                <html>
                    <body>
                        <pre>
                            The woods are lovely, dark and deep,  \040
                            But I have promises to keep,  \040
                            And miles to go before I sleep,  \040
                            And miles to go before I sleep.  \040
                        </pre>
                    </body>
                </html>""";
    
        System.out.println(poemTextBlock);
    }
}

The output here includes three whitespaces at the end of each of lines 4 to 7.
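
To compare the techniques side by side, here is a minimal sketch using a shortened two-line text (the class and method names are illustrative and are not part of the sample code linked below). Each method is expected to yield the same string, with three trailing spaces on the first line.

```java
public class TrailingSpaceComparison {

    // Character substitution: '#' stands in for each trailing space
    static String substitution() {
        return """
                dark and deep,###
                promises to keep.""".replace('#', ' ');
    }

    // Character fencing: '#' shields the spaces and is removed afterwards
    static String fencing() {
        return """
                dark and deep,   #
                promises to keep.""".replace("#\n", "\n");
    }

    // Octal escape sequence for every trailing space
    static String octalEscape() {
        return """
                dark and deep,\040\040\040
                promises to keep.""";
    }

    // Two literal spaces fenced by one octal escape sequence
    static String octalFence() {
        return """
                dark and deep,  \040
                promises to keep.""";
    }

    public static void main(String[] args) {
        String expected = "dark and deep,   \npromises to keep.";
        System.out.println(expected.equals(substitution()));
        System.out.println(expected.equals(fencing()));
        System.out.println(expected.equals(octalEscape()));
        System.out.println(expected.equals(octalFence()));
    }
}
```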







So far, we have used only the space character in all our examples. But the tab character also represents whitespace, and tabs are widely used for code indentation and formatting. How does the tab character behave when used within a text block? We will explore that in our next post.


Sample code used in this post can be downloaded from https://github.com/ashokkumarta/awesomely-java/tree/main/2021/03/Language-Features/Text-blocks/Techniques-for-including-trailing-whitespaces

System.out.println() vs. System.out.print("\n")

There is a subtle difference between these two statements

System.out.println()

and

System.out.print("\n")

Though on the surface they both seem to do the same thing, and in fact they both print a new line to the console, there is a subtle difference between the two that is worth noting.

System.out.print("\n"): Always prints "\n" (a line feed) to the console, regardless of the platform. This is the platform-neutral way of printing a new line character to the console.

System.out.println(): Prints the platform-specific line separator to the console, which differs between operating systems. On Windows it prints "\r\n", on Linux it prints "\n", and so on.

The following code demonstrates this:

public class Main {

    public static void main(String[] args) {

        System.out.println();
        System.out.print("\n");
    }
}

It produces the below output on my Windows laptop



System.out.println() is equivalent to System.out.print(System.lineSeparator()). Both produce the same output, printing the platform-specific line separator to the console.

public class Main {

    public static void main(String[] args) {

        System.out.println();
        System.out.print(System.lineSeparator());
    }
}

This code produces the same output for both print statements
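
Since the difference is invisible on the console, one way to see it is to capture what each call writes and print the characters in escaped form. The sketch below does that with an in-memory PrintStream instead of System.out. The class name and the capture() and visible() helpers are made up for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

public class NewLineCapture {

    // Illustrative helper: runs the action against an in-memory PrintStream
    // and returns whatever was written to it.
    static String capture(Consumer<PrintStream> action) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (PrintStream out = new PrintStream(buffer, true, StandardCharsets.UTF_8)) {
            action.accept(out);
        }
        return buffer.toString(StandardCharsets.UTF_8);
    }

    // Makes carriage return and line feed visible when printed
    static String visible(String s) {
        return s.replace("\r", "\\r").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        System.out.println("print(\"\\n\") wrote: " + visible(capture(out -> out.print("\n"))));
        System.out.println("println()   wrote: " + visible(capture(out -> out.println())));
    }
}
```

The first line of output is the same on every platform, while the second shows whatever System.lineSeparator() returns on the machine running the code.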




Sample code used in this post can be downloaded from https://github.com/ashokkumarta/awesomely-java/tree/main/2021/03/Curious-Cases/println()-vs.-print()

Escape sequence in text blocks - \"

Suppose we want to include three double quotes in the string contained within a text block.

Say we want the processed string to be the one shown here:

The woods are """lovely, dark and deep,
But I have """promises to keep,
And miles to go """before I sleep,
And miles to go """before I sleep.

We can write the three double quotes with an escape sequence, like \""". Here \""" is not a new escape sequence. In fact, the only escape sequence here is \", the escape sequence for a double quote. The next two double quotes are actual characters included in the string.

The escaped double quote can be used for any of the three double quotes. The code below shows this. It produces the same output string as shown above.

public class Main {

    public static void main(String[] args) {

        String poemTextBlock = """
                The woods are \"""lovely, dark and deep,
                But I have "\""promises to keep,
                And miles to go ""\"before I sleep,
                And miles to go \"\"\"before I sleep.
                """;
        System.out.println("Text block: \n"+poemTextBlock);

    }
}

In this program, we escape the double quote at a different position in each line, and in the last line we use the escape sequence for each of the three double quote characters.

Wherever we need three or more consecutive double quotes within a text block, we have to use the escape sequence to avoid three consecutive unescaped double quotes, which would end the text block.

The code below includes five consecutive double quotes before the word 'lovely'.

        String poemTextBlock = """
                The woods are \"""\""lovely, dark and deep,
                """;



Sample code used in this post can be downloaded from https://github.com/ashokkumarta/awesomely-java/tree/main/2021/03/Language-Features/Text-blocks/Escape-sequences-in-text-blocks-2

Friday, March 05, 2021

Escape sequences in text blocks - '\n'

All the escape sequences that can be used in a String literal can be used in text blocks as well.

But some escape sequences are not needed within text blocks, as the actual character can be used directly instead. The most common escape sequence that can, and where possible should, be avoided in text blocks is the new line escape '\n'.

At the end of each line of the multi-line string literal represented by the text block, the Java compiler includes a new line character '\n' when processing the text block.

We saw many examples of this in the previous posts.

There are a few subtleties that we need to be aware of. First, let's see what happens if we explicitly include '\n' at the end of the lines inside the text block.

The code below shows a regular text block and the same text block with '\n' included at the end of every line but the last.

public class Main {

    public static void main(String[] args) {

        String poemTextBlock = """
                The woods are lovely, dark and deep,
                But I have promises to keep,
                And miles to go before I sleep,
                And miles to go before I sleep.
                """;

        String poemTextBlockWithNewLine = """
                The woods are lovely, dark and deep,\n
                But I have promises to keep,\n
                And miles to go before I sleep,\n
                And miles to go before I sleep.
                """;

        System.out.println("Text block: \n"+poemTextBlock);
        System.out.println("Text block with new line: \n"+poemTextBlockWithNewLine);

    }
}

The '\n' at the end of these lines introduces an additional new line after each of them and produces the output shown below


The escape sequence '\n' is not whitespace at the time trailing whitespace is stripped (at that point it is still the two characters '\' and 'n'), so the Java compiler does not strip it away when processing the text block.

Now consider the below snippet of code.

Here we have two '\n' on the first line of the text block, with a tab included in between.

How does this get processed? There are four leading tabs on each line of the text block, but between the two new line characters there is just one tab. Does this impact the indentation of the lines making up the text block? More specifically, will this modify the position of the left margin for incidental whitespace stripping?

The above text block when printed has a value shown below


As you can see, the left margin is not affected by the whitespace included between the '\n' characters. Also note the presence of the tab character at the beginning of the 2nd line, indicating that it is preserved and has not been stripped away in the processing.

This is because escape sequence translation happens as the last step in the processing of a text block.

The left margin is identified and incidental whitespace is stripped away before the '\n' escape sequences are processed in our example. Escape sequence processing happens as the last step, making the '\n' characters add additional line feeds within the string.
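
The same ordering can be checked with a small sketch that is comparable to, but not the same as, the snippet discussed above: it uses two spaces between the '\n' escape sequences instead of a tab, so that the whitespace survives copy/pasting. The class and method names are illustrative.

```java
public class EscapedNewLineMargin {

    // The first source line contains two \n escapes with two spaces in between.
    // The margin is fixed by the 16 leading spaces of the source lines,
    // before the escapes are turned into line feeds.
    static String block() {
        return """
                The woods are lovely,\n  \ndark and deep,
                But I have promises to keep,
                """;
    }

    public static void main(String[] args) {
        // Brackets make the whitespace-only line visible
        block().lines().forEach(line -> System.out.println("[" + line + "]"));
    }
}
```

The resulting string has four lines. The second one consists of exactly the two spaces written between the escape sequences, and the other lines start at the left margin.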

This is also the reason why we may have to use '\n' explicitly: to include an empty line with a specific number of whitespace characters without impacting the margin of the other lines within a text block.

There is no other way of doing this when defining a text block.   


Sample code used in this post can be downloaded from https://github.com/ashokkumarta/awesomely-java/tree/main/2021/03/Language-Features/Text-blocks/Escape-sequences-in-text-blocks-1

Thursday, March 04, 2021

Normalization of platform specific line terminator characters

The line terminator is platform specific.

You can find the line terminator for your platform using the System.lineSeparator() API call.

On my Windows machine, I get "\r\n" as the line terminator.

jshell> System.lineSeparator()
$1 ==> "\r\n"

Unix and Linux use "\n" as the line terminator, and some older versions of Mac OS used "\r".

This poses a few issues when handling multi-line string literals represented by text blocks:

  • Some editors may automatically change the line terminators
  • When the source file is edited on different platforms, different line terminators may end up being used within the same text block.

To avoid these issues, the Java compiler normalizes the line terminators inside the multi-line string literal in a text block to '\n' while processing. So all the different line terminators "\r", "\r\n" and "\n" become "\n" after a text block is processed.

Let us check this with the below program:


The code is shown as an image to make the line terminators visible. Here the line terminator is "\r\n", represented by CR|LF.

This produces the following output

Comparing with \n:true
Comparing with \r\n:false

indicating that the \r\n line terminator in the source code is converted to \n after the text block is processed.

Below is a version of the code for copy/pasting if needed.

public class Main {

    public static void main(String[] args) {

        String poemTextBlock = """
                 And miles to go before I sleep.
                 """;
 
        System.out.println("Comparing with \\n:" + "And miles to go before I sleep.\n".equals(poemTextBlock));
        System.out.println("Comparing with \\r\\n:" + "And miles to go before I sleep.\r\n".equals(poemTextBlock));
    }
}
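
Normalization applies only to the actual line terminators in the source file. An escape sequence such as \r written inside the text block is interpreted after normalization and is therefore left alone. The sketch below (class and method names are illustrative) relies on that to produce a "\r\n" ending regardless of how the source file itself is saved.

```java
public class EscapedCarriageReturn {

    // The \r escape is translated after line terminators are normalized,
    // so it is kept, followed by the normalized \n of the source line.
    static String block() {
        return """
                And miles to go before I sleep.\r
                """;
    }

    public static void main(String[] args) {
        System.out.println("Comparing with \\r\\n:"
                + "And miles to go before I sleep.\r\n".equals(block()));
    }
}
```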

In the next post, we will see whether and how escape sequences can be used within text blocks.


Sample code used in this post can be downloaded from https://github.com/ashokkumarta/awesomely-java/tree/main/2021/03/Language-Features/Text-blocks/Normalization-of-platform-specific-line-terminator-characters

Wednesday, March 03, 2021

Tab character in text blocks

The problem with the tab character is that there is no standard definition of how many spaces it represents. Different editors use a different number of spaces to display a tab, and most editors let users configure this as per their preference.

For this reason, Java treats each tab character as a single whitespace character when processing a text block. Irrespective of the number of spaces a tab occupies when displayed in your editor, when the text block is processed by the Java compiler, it is counted as one whitespace character.

Consider the code snippet shown below 


Tab character here is represented by a long arrow ---->

In the editor I use, a tab character occupies 4 spaces. When looking at this code in the editor, it shows a well-indented text block, as shown below:

public class Main {

    public static void main(String[] args) {

        String poemTextBlock = """
                <html>
                    <body>
                        <pre>
                            The woods are lovely, dark and deep,
                            But I have promises to keep,
                            And miles to go before I sleep,
                            And miles to go before I sleep.
                        </pre>
                    </body>
                </html>""";

        System.out.println(poemTextBlock);
    }
}

But when executed, the output of the above program is


Since the Java compiler counts each tab as a single character, lines 4-7 contain only 7 leading whitespace characters (7 tab characters).

The visibly least indented line 1 contains 16 whitespace characters (16 space characters).

To the Java compiler, lines 4-7 are the least indented, and hence the start of these lines is taken as the left margin.

Note that the Java compiler does not convert the tab character into a space character when processing. It only counts each tab as one whitespace character for the purpose of fixing the left margin and stripping away the incidental whitespace from the text block.
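
Because tabs are easily lost when copy/pasting, here is a sketch that avoids literal tabs altogether. It writes them as "\t" in ordinary string literals and calls String.stripIndent(), which, as far as its documentation describes, removes incidental whitespace in the same way as text block processing. The class and method names are illustrative, and the strings deliberately do not end with a line terminator.

```java
public class TabCountsAsOne {

    // 8 spaces before <html>, 2 tabs before <body>:
    // the tab-indented line counts as having only 2 whitespace characters,
    // so it sets the margin and <html> keeps 6 of its 8 spaces.
    static String mixed() {
        String raw = "        <html>\n" + "\t\t<body>";
        return raw.stripIndent();
    }

    // 1 tab before <a>, 2 tabs before <b>:
    // one tab is stripped from each line, and the remaining tab is
    // preserved as a tab, not converted into spaces.
    static String tabsOnly() {
        String raw = "\t<a>\n" + "\t\t<b>";
        return raw.stripIndent();
    }

    public static void main(String[] args) {
        System.out.println(mixed().replace("\t", "\\t"));
        System.out.println(tabsOnly().replace("\t", "\\t"));
    }
}
```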

Below code demonstrates this


This code has a different number of tabs on each of the lines 5, 6, 7 & 8.

Line 5 has the least number of tabs and is taken as the left margin by the Java compiler.

When this text block is printed, it produces the below output, indicating that the tab characters are preserved. When printed on the console, each tab is shown with as many spaces as the console configuration dictates.


The full code for the above example, from my editor, for your reference:

public class Main1 {

    public static void main(String[] args) {

        String poemTextBlock = """
                <html>
                    <body>
                        <pre>
                            The woods are lovely, dark and deep,
                                But I have promises to keep,
                                    And miles to go before I sleep,
                                        And miles to go before I sleep.
                        </pre>
                    </body>
                </html>""";

        System.out.println(poemTextBlock);
    }
}


Note: While copy/pasting the code samples from this post, you might have to edit the code to make sure that tab and space characters are correctly pasted for you to see it working as explained here. 


Sample code used in this post can be downloaded from https://github.com/ashokkumarta/awesomely-java/tree/main/2021/03/Language-Features/Text-blocks/Tab-character-in-text-blocks


Tuesday, March 02, 2021

Techniques for including leading white spaces into text blocks

We saw one approach for controlling leading indentation: moving the position of the ending three double quotes as required.

But this approach does not work when we do not want a new line after the last line of the text block. The code then becomes

        String poemTextBlock = """
                <html>
                    <body>
                        <pre>
                            The woods are lovely, dark and deep,
                            But I have promises to keep,
                            And miles to go before I sleep,
                            And miles to go before I sleep.
                        </pre>
                    </body>
                </html>""";

Here we cannot use the position of """ to dictate the leading indentation required.

We have to use the indent() method on the string to provide the necessary indentation. The code for this is shown in the sample below.

public class Main {

    public static void main(String[] args) {

        String poemTextBlock = """
                <html>
                    <body>
                        <pre>
                            The woods are lovely, dark and deep,
                            But I have promises to keep,
                            And miles to go before I sleep,
                            And miles to go before I sleep.
                        </pre>
                    </body>
                </html>""".indent(8);
    
        System.out.println(poemTextBlock);
    }
}
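
One detail worth checking when using this approach: according to the String.indent() documentation, every line of the result, including the last one, is suffixed with "\n". The sketch below (a shortened text block, with illustrative class and method names) shows the string that comes back.

```java
public class IndentTrailingNewLine {

    // The text block itself has no trailing new line,
    // but indent(4) adds one after the last line.
    static String indented() {
        return """
                <html>
                </html>""".indent(4);
    }

    public static void main(String[] args) {
        // Escaping the line feeds makes the trailing one visible
        System.out.println(indented().replace("\n", "\\n"));
    }
}
```

If that trailing new line is not wanted, it has to be removed afterwards, for example with stripTrailing() when no other trailing whitespace needs to be kept.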


Sample code used in this post can be downloaded from https://github.com/ashokkumarta/awesomely-java/tree/main/2021/03/Language-Features/Text-blocks/Techniques-for-including-leading-white-spaces


Incidental and essential white spaces in text blocks

Consider the below piece of code containing a text block declaration

String poemTextBlock = """               
    <html>
        <body>
            <pre>
                The woods are lovely, dark and deep,   
                But I have promises to keep,   
                And miles to go before I sleep,   
                And miles to go before I sleep.
            </pre>
        </body>
    </html>
    """;

From this, can we infer how the string in the text block gets formatted with respect to spaces?

Would all leading spaces get into it? 

And what about trailing spaces if there are any, included at the end of some of the lines?

Let's first see which spaces are contained within the above block of code, and where.

For that, we will refer to the below screen grab from the editor, with whitespace visibility set to 'Yes'


Spaces are indicated by dot (.) and end of line by CRLF 

Note that there are some trailing spaces on lines 5, 6 & 7.

So with all these leading and trailing spaces, how does the text block format the string contained in it?

We do not want it to retain all the spaces as-is:

  • Trailing spaces may not be intentional, and they are not even visible in editors to check and correct.
  • Part of the leading spaces were introduced just to align with the indentation of the surrounding code. If they were retained, changing the indentation of the code would change the content of the text block.

And we do not want it to simply remove all leading and trailing spaces either.

This would leave the final string without indentation, as shown below, which is definitely not what we want.



To understand how Java handles leading and trailing whitespace contained within a text block, let's first check what incidental and essential whitespaces are.

Incidental whitespace
These are the whitespaces that are
  • To the left of the start of the least indented line within the text block
  • Trailing at the end of each line

Essential whitespace
Leading whitespaces on each line that are not incidental are essential whitespaces. They are essential for providing the indentation of the text contained within the text block.

Incidental whitespaces are stripped away and essential whitespaces are retained by the Java compiler when processing a text block.

This makes the text block represented in the above code, formatted as shown below after processing


Note that all the incidental whitespaces are removed but essential white spaces are retained in the processed text. 
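
As a compact check of this rule, the sketch below uses a shortened text block (illustrative class and method names) and compares it with the equivalent regular string. Only the 4 spaces of essential whitespace before <body> are expected to survive.

```java
public class IncidentalWhitespaceCheck {

    // All lines share 16 leading spaces of incidental whitespace.
    // <body> has 4 more, which are essential and are retained.
    static String block() {
        return """
                <html>
                    <body>
                </html>
                """;
    }

    public static void main(String[] args) {
        System.out.println("<html>\n    <body>\n</html>\n".equals(block()));
    }
}
```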

The full code to test this sample is shown below:
public class Main {

    public static void main(String[] args) {

        String poemTextBlock = """
                <html>
                    <body>
                        <pre>
                            The woods are lovely, dark and deep,
                            But I have promises to keep,
                            And miles to go before I sleep,
                            And miles to go before I sleep.
                        </pre>
                    </body>
                </html>
                """;
        System.out.println(poemTextBlock);
    }
}

Note that the ending three double quotes are also considered when establishing the left margin for incidental whitespace.

The code fragment below produces a string in which the leading whitespace to the right of the ending three double quotes is retained, because the ending three double quotes are now the least indented line and set the left margin.

        String poemTextBlock = """
                <html>
                    <body>
                        <pre>
                            The woods are lovely, dark and deep,
                            But I have promises to keep,
                            And miles to go before I sleep,
                            And miles to go before I sleep.
                        </pre>
                    </body>
                </html>
        """;

The above code produces a string formatted as shown below, with the leading spaces included
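
The effect of the closing delimiter can also be verified with a shortened sketch (illustrative class and method names). Here the ending three double quotes sit 4 columns to the left of the content, so each content line is expected to keep 4 leading spaces.

```java
public class ClosingDelimiterMargin {

    // Content lines start at column 16, the closing delimiter at column 12.
    // The delimiter line sets the margin, leaving 4 spaces on every line.
    static String block() {
        return """
                <html>
                </html>
            """;
    }

    public static void main(String[] args) {
        System.out.println("    <html>\n    </html>\n".equals(block()));
    }
}
```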


We have seen in this post, how Java handles leading and trailing whitespaces by stripping away the incidental whitespaces. But what if we want to include leading or trailing whitespaces into text blocks? We will see how to do that in the next post.

 

Sample code used in this post can be downloaded from https://github.com/ashokkumarta/awesomely-java/tree/main/2021/03/Language-Features/Text-blocks/Incidental-and-essential-white-spaces

Monday, March 01, 2021

Text Block Syntax - Deep Dive

We mentioned in the previous post the subtle difference between the string formed through text block syntax and the one formed through regular String syntax.

Let's examine that through this sample code

public class Main {

    public static void main(String[] args) {
        
        String poemTextBlock = """
                The woods are lovely, dark and deep,   
                But I have promises to keep,   
                And miles to go before I sleep,   
                And miles to go before I sleep.
                """;
        
        String poemString = "The woods are lovely, dark and deep,\n"
                + "But I have promises to keep,\n"
                + "And miles to go before I sleep,\n"
                + "And miles to go before I sleep.";
        
        System.out.println("Text Block: "+poemTextBlock);
        System.out.println("String: "+poemString);
        
        if (poemString.equals(poemTextBlock)) {
            System.out.println("Textblock and String are equal");
        } else {
            System.out.println("Textblock and String are NOT equal");
        }

    }
}

Here we are comparing the two strings and printing if they are equal or not. 

Output of this code is 

Text Block: The woods are lovely, dark and deep,
But I have promises to keep,
And miles to go before I sleep,
And miles to go before I sleep.

String: The woods are lovely, dark and deep,
But I have promises to keep,
And miles to go before I sleep,
And miles to go before I sleep.
Textblock and String are NOT equal

Yes. They are not equal. This is due to the new line at the end of the string formed by the text block.

The strings will be equal when we form the text block with the ending three double quotes on the same line as the last line of content, as shown below

        String poemTextBlock = """
                The woods are lovely, dark and deep,   
                But I have promises to keep,   
                And miles to go before I sleep,   
                And miles to go before I sleep.""";

This avoids adding a line terminator (\n) to the last line of the string.

But then, what about the first line? Wouldn't the code above introduce a \n before the first line?

It turns out that three double quotes followed by a line terminator mark the beginning of a text block. We cannot start the text block content on the same line as the beginning three double quotes.

The code below causes a compile-time error

        String poemTextBlock = """The woods are lovely, dark and deep,   
                But I have promises to keep,   
                And miles to go before I sleep,   
                And miles to go before I sleep.""";

So the content of a text block begins on the line after the beginning three double quotes. And placing the ending three double quotes on the same line as the last line of the text block avoids adding a line terminator to the last line.

What about the spaces used for indentation of the text block content? Would those spaces become part of the text block content?

We see from the output above that they have not become part of the content in this example. We will explore how indentation behaves in text blocks in our next post.

 

Sample code used in this post can be downloaded from https://github.com/ashokkumarta/awesomely-java/tree/main/2021/03/Language-Features/Text-blocks/Text-Block-Syntax