Wednesday, March 03, 2021

Tab character in text blocks

The problem with the tab character is that there is no standard definition of how many spaces it represents. Different editors display a tab with a different number of spaces, and most editors let users configure this width to their own preference.

For this reason, Java treats a tab character as having size 1 when processing a text block. Regardless of how many spaces a tab occupies when displayed in your editor, the Java compiler counts it as a single whitespace character when it processes the text block.
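A quick way to see this rule in action is String.stripIndent() (Java 15 and later), which its documentation describes as using the same incidental-whitespace algorithm that the compiler applies to text blocks. The sketch below is only illustrative: the class name TabWidthDemo and the sample strings are made up for this post, and the tabs are written as \t escapes in an ordinary string so that copy/paste cannot alter them.

```java
public class TabWidthDemo {

    // Builds a two-line string: the first line is indented with 8 spaces,
    // the second with 2 tabs (which look like 8 columns in a 4-wide editor).
    static String sample() {
        return " ".repeat(8) + "spaces" + "\n" + "\t".repeat(2) + "tabs";
    }

    public static void main(String[] args) {
        // Each tab counts as 1, so the smallest indentation is 2, not 8.
        // Two whitespace characters are removed from the start of every line.
        String stripped = sample().stripIndent();
        System.out.println(stripped);
    }
}
```

With these inputs, the space-indented line keeps 6 of its 8 spaces, and the tab-indented line ends up flush left.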

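Consider a code snippet in which the poem lines of a text block are indented with tab characters, each represented here by a long arrow (---->), while the HTML tag lines are indented with spaces.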

In the editor I use, a tab character occupies 4 spaces. Viewed in the editor, the code appears to define a well-indented text block, as shown below:

public class Main {

    public static void main(String[] args) {

        String poemTextBlock = """
                <html>
                    <body>
                        <pre>
                            The woods are lovely, dark and deep,
                            But I have promises to keep,
                            And miles to go before I sleep,
                            And miles to go before I sleep.
                        </pre>
                    </body>
                </html>""";

        System.out.println(poemTextBlock);
    }
}

But when executed, the program prints the four poem lines flush with the left margin, while the HTML tag lines keep part of their indentation.


Since the Java compiler counts each tab as a single character, lines 4-7 contain only 7 leading whitespace characters (7 tab characters).

Line 1, which looks the least indented, contains 16 leading whitespace characters (16 space characters).

To the Java compiler, lines 4-7 are the least indented, so the start of these lines is taken as the left margin.
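The margin calculation described above can be reproduced with a small sketch. It rebuilds the text block content with explicit spaces and \t escapes, counts the leading whitespace characters of each line, and then calls String.stripIndent() (Java 15+) as a stand-in for the compiler's processing. MarginDemo, content(), leadingWhitespace() and margin() are hypothetical names used only for illustration, and margin() is a simplification that ignores the special handling of blank lines (this sample has none).

```java
public class MarginDemo {

    // The text block content as the compiler sees it: tag lines are
    // indented with spaces, the poem lines with 7 tabs each.
    static String content() {
        String s16 = " ".repeat(16), s20 = " ".repeat(20), s24 = " ".repeat(24);
        String t7 = "\t".repeat(7);
        return String.join("\n",
                s16 + "<html>",
                s20 + "<body>",
                s24 + "<pre>",
                t7 + "The woods are lovely, dark and deep,",
                t7 + "But I have promises to keep,",
                t7 + "And miles to go before I sleep,",
                t7 + "And miles to go before I sleep.",
                s24 + "</pre>",
                s20 + "</body>",
                s16 + "</html>");
    }

    // Counts leading whitespace characters; a tab counts as 1, like a space.
    static int leadingWhitespace(String line) {
        int i = 0;
        while (i < line.length() && Character.isWhitespace(line.charAt(i))) {
            i++;
        }
        return i;
    }

    // Simplified left margin: the smallest leading-whitespace count of all lines.
    static int margin(String text) {
        return text.lines().mapToInt(MarginDemo::leadingWhitespace).min().orElse(0);
    }

    public static void main(String[] args) {
        String text = content();
        System.out.println("Left margin: " + margin(text));
        // 7 characters are removed from every line, so the tag lines
        // keep 9, 13 and 17 spaces and the poem lines keep none.
        System.out.println(text.stripIndent());
    }
}
```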

Note that the Java compiler does not convert tab characters into spaces when processing. It only counts each tab as one whitespace character for the purpose of fixing the left margin and stripping the incidental whitespace from the text block.

The code below demonstrates this.


This code has a different number of tabs on each of lines 5, 6, 7 and 8.

Line 5 has the fewest tabs and is taken as the left margin by the Java compiler.

When this text block is printed, the output shows that the tab characters are preserved. On the console, each remaining tab is rendered with as many spaces as the console is configured to use.
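To check that the leftover characters really are tabs and not spaces, the following sketch strips a four-line sample indented with 7, 8, 9 and 10 tabs and makes the survivors visible using the long-arrow notation. It assumes String.stripIndent() (Java 15+) as a stand-in for the compiler's text block processing. PreservedTabsDemo and its methods are illustrative names only.

```java
public class PreservedTabsDemo {

    // Four poem lines indented with 7, 8, 9 and 10 tabs respectively.
    static String content() {
        return String.join("\n",
                "\t".repeat(7) + "The woods are lovely, dark and deep,",
                "\t".repeat(8) + "But I have promises to keep,",
                "\t".repeat(9) + "And miles to go before I sleep,",
                "\t".repeat(10) + "And miles to go before I sleep.");
    }

    // Replaces each tab with a visible arrow so it can be told apart from spaces.
    static String showTabs(String text) {
        return text.replace("\t", "---->");
    }

    public static void main(String[] args) {
        // 7 whitespace characters are stripped from every line;
        // the extra 1, 2 and 3 tabs stay in the string unchanged.
        System.out.println(showTabs(content().stripIndent()));
    }
}
```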


The full code for the above example, as it appears in my editor, is given below for your reference.

public class Main1 {

    public static void main(String[] args) {

        String poemTextBlock = """
                <html>
                    <body>
                        <pre>
                            The woods are lovely, dark and deep,
                                But I have promises to keep,
                                    And miles to go before I sleep,
                                        And miles to go before I sleep.
                        </pre>
                    </body>
                </html>""";

        System.out.println(poemTextBlock);
    }
}


Note: When copying and pasting the code samples from this post, you might have to edit the code so that the tab and space characters are pasted correctly and it works as explained here.
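If you are unsure what your editor actually pasted, a small helper like the one sketched below can report the leading tabs and spaces of every line in a source file. IndentInspector and describeIndent() are hypothetical names, and the file path is assumed to be passed as the first command-line argument.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class IndentInspector {

    // Summarises the leading whitespace of one line as tab and space counts.
    static String describeIndent(String line) {
        int tabs = 0;
        int spaces = 0;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '\t') {
                tabs++;
            } else if (c == ' ') {
                spaces++;
            } else {
                break; // first non-indentation character reached
            }
        }
        return "tabs=" + tabs + " spaces=" + spaces;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("Usage: java IndentInspector <source-file>");
            return;
        }
        // args[0] is assumed to be the pasted source file, for example Main.java
        for (String line : Files.readAllLines(Path.of(args[0]))) {
            System.out.println(describeIndent(line) + " | " + line.strip());
        }
    }
}
```

For the first example in this post, the poem lines should report tabs=7 spaces=0 and the opening html line should report tabs=0 spaces=16.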


Sample code used in this post can be downloaded from https://github.com/ashokkumarta/awesomely-java/tree/main/2021/03/Language-Features/Text-blocks/Tab-character-in-text-blocks

