HTTPS hostname wrong issue

2016-12-24
troubleshooting

Sometimes, we work on interesting problems. Recently, My team was developing our
service to service auth architecture at Financial Engines. We completed initial
development and checked in changes in our legacy service which was
invoking one of the microservices behind our apigateway.

Post deployment, I tested the changes in our development environment and everything
appeared okay. However, we started receiving below errors in our test environment

1
2
3
4
5
6
7
8
9
Caused by: java.io.IOException: HTTPS hostname wrong:  should be <api.feitest.com>
at sun.net.www.protocol.https.HttpsClient.checkURLSpoofing(HttpsClient.java:649)
at sun.net.www.protocol.https.HttpsClient.afterConnect(HttpsClient.java:573)
at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:185)
at com.sun.net.ssl.internal.www.protocol.https.HttpsURLConnectionOldImpl.connect(HttpsURLConnectionOldImpl.java:142)
at org.springframework.http.client.SimpleBufferingClientHttpRequest.executeInternal(SimpleBufferingClientHttpRequest.java:76)
at org.springframework.http.client.AbstractBufferingClientHttpRequest.executeInternal(AbstractBufferingClientHttpRequest.java:48)
at org.springframework.http.client.AbstractClientHttpRequest.execute(AbstractClientHttpRequest.java:49)
at org.springframework.web.client.RestTemplate.doExecute(RestTemplate.java:519)

I started looking into the issue and from the first looks, it appeared that SSL
handshake was failing since server’s certificate CN name was not matching with the
host we were trying to connect to.

It was interesting to see that we never faced this issue in other environments
including our local environment which has pretty much same configuration as of dev,
test environments except few things!

As per best practices, all environments should have exact same configuration but
this being a legacy service we have some inconsistencies in place.

I also noticed that this issue was intermittent and server bounce sometimes appeared
to resolve the issue. However, other times, it caused the issue. So being a
intermittent issue, best things we could have done was to compile all the information
which might give us some leads.

Initial Analysis


  • In short, issue appears intermittently. See below data for more information.
    • 79 failure events from 12/4/16 12:00:00.000 AM to 12/8/16 9:43:59.000 AM
    • 31 success events (12/4/16 12:00:00.000 AM to 12/8/16 9:45:08.000 AM)
  • When issue appears, SSL handshake failed as it was evident from exception.
  • Looking at the logs in past few days, I saw that this issue is happening in all
    development and test environments intermittently but not consistently.
  • We have other internal, external clients connecting to same endpoint and they never
    faced the issue.
  • We were using Spring Template library to invoke other Microservice. As you might
    know, Spring Rest library provides means for synchronous client side HTTP access. It
    handles HTTP connections and helps in writing client code in much more
    cleaner way.

In nutshell, it was pretty clear that the issue was very much related to legacy
service client side code. So I just looked at the versions we had in our dev,test
environments. Production environment was to be upgraded to JRE8 in couple of weeks.

  • Dev / Test
    • Tomcat 6.0.45
    • JRE 1.8.0_101
  • Production
    • Tomcat 6.x
    • JRE 1.7.x

When HTTPS hostname issue appears?


Java by default verifies that the certificate CN (Common Name) is the same as hostname
in the URL. If the CN in the certificate is not the same as the host name, your web
service client fails with the following exception:
java.io.IOException: HTTPS hostname wrong: should be hostname as in the certificates.

Quick search on google indicated following scenarios when this issue could appear

  • As per this stackoverflow link, It appeared that certain misconfigured servers send
    an “Unrecognized Name” warning in the SSL handshake which is ignored by most
    clients except for Java which might cause this.
  • Other scenario was that it could happen if you are using non-CA issued cert.
  • Another Stackoverflow link also suggested that this issue might also come when you
    have some code for the separate JSSE that shipped prior to JDK 1.4. Delete the java
    .protocol.handler.pkgs setting, delete anything that adds the JSSE provider, delete
    anything that refers to com.sun.net.ssl.anything. If you have any other relices of
    the separate JSSE installation delete them too. The jre/lib/jsse.jar should be dated
    no earlier than December 2004.

However, I validated that first 2 didn’t apply in our case.

Initial troubleshooting


I could have taken many directions from here:

  • I tested the https endpoint via openssl client and browser and SSL handshake was
    perfectly fine.
  • I was not able to reproduce issue programmatically as well even after having exact
    same versions for jdk/web server locally.
  • I tried running the api under load just to avoid heavy load theory with 500, 1 K,
    10 K invocations but we were not able to replicate as expected.
  • We enabled ssl debugging on the server by adding -Djavax.net.debug=ssl, handshake
    property to get additional information.
  • As per ssl debug logs, we noticed that server_name extension is not getting sent
    by the client.
  • See the screenshot below. Right side shows failure scenario and its clear that
    extension_server name header is not getting sent in failure scenario and due to that
    https handshake fails.

logs

  • So to fix this issue, we needed to find out why sni headers are not getting sent
    by default?
  • As a company, we were in transition to move to JRE8. We were following a phase
    wise approach where we were upgrading the run time to JRE8 followed by JDK. Our
    techops team upgraded one of the servers in production to JRE8 around same time.
    Luckily, We had similar invocation and alerting in place and we started seeing this
    issue. Once, we rolled back the application to run on JRE7, everything worked fine.
    Technically, it was our first clue that there is something in jre8 which is causing this.

Overall, I had to answer one question, why sni headers are getting sent in JRE8
environments intermittently.

What’s SNI?

SNI stands for Server Name Indication and is an extension of the TLS protocol.
It indicates which hostname is being contacted by the client at the beginning of the
‘handshake’-process. This allows a server to connect multiple SSL Certificates to one
IP address and present relevant certificate to client based on SNI header.

Why SNI header is getting sent in JRE8 intermittently?


So I wrote down couple of questions which i wanted to get answers:

Does SNI implementation changed between JRE7 and JRE8?

  • The JDK comes standard with automatically installed and configured providers such
    as “SUN” and “SunJCE”. Oracle’s implementation of Java SE includes a JSSE provider
    named SunJSSE, which comes preinstalled and preregistered with
    the JCA.
  • SNI is handled by JSSE implementation library which comes with JRE.
  • SunJSSE has enabled SNI extension for client applications by default in JDK 7.
  • It appears that nothing has changed as per java release documents.

is it classpath issue?

  • As per one of stackoverflow link, One possibility could have been that jsse 1.0.x
    might be getting loaded which didn’t support SNI.
  • So first task was to find out which JSSE implementation is getting loaded.
  • I printed all security providers list and their versions and noticed that SunJSSE
    version was 1.8. SunJSSE 1.8. is the latest version which comes along with JRE8 and
    hence classpath theory seems to fall apart.

is it due to Java 8144567

  • As per the bug 8144567 description,
    SNI does not work with HTTPSUrlConnection if you have a custom HostnameVerifier
    in the code
  • Quickly checking in the code confirmed that we are definitely adding custom host
    name verifier although it was not applied globally.
  • I was able to reproduce the bug locally in sample application. I still saw some
    ray of hope if something is causing custom hostname verifier to be applied globally
  • I pushed in code to print custom hostname verifier and it does print default
    custom hostname verified indicating that it has not been overridden. In same
    checkin, I also added fix suggested by the bug by adding custom SSL socket factory
    like - HttpsURLConnection.setDefaultSSLSocketFactory(new FeCustomSSLSocketFactory()
    ); and it did not resolve the issue.
  • Hence, it proved that it’s not the outcome of above-mentioned bug.
  • Just FYI, The bug was Resolved via 8159569 and will
    be released in 8u122 release which will be resolved to public in January 2017.
    Refer here to look at the changeset.

is it due to Rest Template library incompatibility with JRE8?

  • Although chances of this were less but I still wanted to get to root of it.
  • One theory is if spring code is explicitly using sun.net packages which might
    be causing java library to search HTTPS url handler in older url package
  • I looked at the Rest Template library source code and it appeared that Spring is
    just invoking underlying url connection api to get handle to Https url
    implementation.
  • Switching the implementation to Jersey Client instead of Spring Rest template did
    not help either but proved clearly that issue is at the JRE level.

Understanding Java Url Handler Implementation

So It appeared that I have to go one level down to Java’s URL implementation. Next
step was to print what URL handler we are getting for our success versus failure
scenarios.

  • I added custom clientHttpRequestFactory in RestTemplate so that we can get the
    handle to HttpsUrlConnection objects manually. After adding custom
    clientHttpRequestFactory and further logging/debugging I noticed that application is
    getting 2 instances of HTTPSURLConnection
  • I noticed that whenever request fails, we were getting HttpsURLConnectionOldImpl

    1
    2
    3
    4
    5
    20161216 09:26:19.958 INF: tst120a1iwr20ij0hl|do-tst-10|46857|http-8080-10|domain.
    auth.ssl.FeCustomClientHttpRequestFactory: [INFO ]
    domain.auth.ssl.FeCustomClientHttpRequestFactory - CustomSSLSocketFactory - Set
    connection specific properties for connection=class com.sun.net.ssl.
    internal.www.protocol.https.HttpsURLConnectionOldImpl
  • Whenever request succeeds, we are getting HttpsURLConnectionImpl.

    1
    2
    3
    4
    5
    20161215 15:55:24.081 INF: tst120b1iwpl6au02e|do-tst-10|191|http-8080-16|
    domain.auth.ssl.FeCustomClientHttpRequestFactory: [INFO ]
    domain.auth.ssl.FeCustomClientHttpRequestFactory - CustomSSLSocketFactory - Set
    connection specific properties for connection=class
    sun.net.www.protocol.https.HttpsURLConnectionImpl
  • Probably, this was biggest hint. So main question was when java returns
    HttpsURLConnectionImpl and when does it return HttpsURLConnectionOldImpl? and why
    its not consistent?

When does Java returns HttpsURLConnectionImpl and when does it return HttpsURLConnectionOldImpl?

The following is the default hierarchy, and is found by default when a “https” URL is used.

1
2
3
4
5
 java.lang.Object ->
java.net.URLConnection ->
java.net.HttpURLConnection ->
javax.net.ssl.HttpsURLConnection ->
sun.net.www.protocol.https.HttpsURLConnectionImpl - Sun impl

The following is the hierarchy used for backwards compatibility with JSSE 1.0.x.

1
2
3
4
5
java.lang.Object ->
java.net.URLConnection ->
java.net.HttpURLConnection ->
com.sun.net.ssl.HttpsURLConnection ->
com.sun.net.ssl.internal.www.protocol.https.HttpsURLConnectionOldImpl - impl

In order to get an instance of com.sun.net.ssl.HttpsURLConnection, you have to
override the URL search path via java.protocol.handler.pkgs property. So it appeared
that somewhere this property was getting overridden causing the issue.

I noticed in logs that whenever server started java.protocol.handler.pkgs was null
and whenever request failed, this property was getting set to
com.sun.net.ssl.internal.www.protocol. It also explains why we were getting old
HttpsURLConnection implementation which does not appear to support SNI.

java.protocol.handler.pkgs property sets your URL resolution to override the normal
javax.net.ssl.HttpsURLConnecction and points at the old HttpsURLConnection
implementation, which is an instance of com.sun.net.ssl.HttpsURLConnection.

So next logical steps were validating the hypothesis locally and find the root cause.

Reproducing the problem locally


In my local project, I set this property and tried to connect to https endpoint
(which required SNI) and was able to reproduce the issue when run against JRE8. Next
thing was to run the same program against JRE7 and i noticed that
ssl handshake and request was successful.

This proved the actual behaviour we were facing where everything worked fine in JRE7
but failed intermittently in JRE8.

Finding the culprit


I did quick research in the project but could not find the location where we were
setting the property. So it appeared that this could be functionality added by tomcat,
JRE or 3rd party jars.

I followed below steps on high level

  • extracted all classes from all 3rd party jars
  • ran the jd-gui java decompiler
  • saved all of that source to single directory
  • ran a find command to find the reference of this property

    1
    find . -type f -name '*java'  -exec grep -l  java.protocol.handler.pkgs {} \;
  • Below were list of files which I found

    1
    2
    3
    4
    5
    6
    7
    ./com/sun/xml/messaging/saaj/client/p2p/HttpSOAPConnection.java
    ./org/apache/axis/client/Call$1.java
    ./org/apache/axis/client/Call.java
    ./org/apache/axis/wsdl/toJava/JavaServiceImplWriter.java
    ./org/apache/commons/ssl/Java13.java
    ./org/codehaus/classworlds/uberjar/boot/Bootstrapper.java
    ./weblogic/utils/zip/Handler.java
  • I validated via tomcat logs that out of above only HttpSOAPConnection and Java13
    was getting loaded in JVM.

  • Finally, It appeared that HttpSOAPConnection is the culprit which sets the property
    if it’s null. Sample code below.

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    static
    {
    if (isIBMVM)
    {
    SSL_PKG = "com.ibm.net.ssl.internal.www.protocol";
    SSL_PROVIDER = "com.ibm.net.ssl.internal.ssl.Provider";
    }
    else
    {
    SSL_PKG = "com.sun.net.ssl.internal.www.protocol";
    SSL_PROVIDER = "com.sun.net.ssl.internal.ssl.Provider";
    }
    }

    private void initHttps()
    {
    String pkgs = System.getProperty("java.protocol.handler.pkgs");
    log.log(Level.FINE, "SAAJ0053.p2p.providers", new String[] { pkgs });
    if ((pkgs == null) || (pkgs.indexOf(SSL_PKG) < 0))
    {
    if (pkgs == null) {
    pkgs = SSL_PKG;
    } else {
    pkgs = pkgs + "|" + SSL_PKG;
    }
    System.setProperty("java.protocol.handler.pkgs", pkgs);
    ...
    }
    }

Solution / Final Comments


Interim solution, we had in place was to set the property java.protocol.handler
.pkgs=sun.net.www.protocol so that the culprit class does not overwrite the property.

It also allows url handler to handover the newer implementation of HttpsUrlConnection which supports SNI.

Final solution would be to follow up with Oracle and understand what could have been
done better to not break other clients.

Special thanks to many folks at Financial Engines for being sounding board and in
helping to take this issue to closer!

References


Version History


2016-12-24    Initial Version
2017-1-28    Updated the blog with correct handler.pkgs value

Comments: