Keep in mind that communication with other hosts takes considerably more time than anything you are doing in your threads. I wouldn't worry about atomic operations in this case.
Let's say we have the threads t1 and t2. t1 sends a request to hostA and waits for a response. When the timeout is reached, a RestClientException will be thrown. Now there is a very tiny timespan between throwing the exception and adding that host to the list of blocked hosts. It could happen that t2 tries to send a request to hostA in this moment, before the host is blocked, but it is way more likely that t2 already sent it during the long time t1 was waiting for a response, which you can't prevent.
You can try to set reasonable timeouts. Of course there are other types of errors that don't wait for a timeout, but even those take way more time than handling the exception.
Using a ConcurrentHashMap is thread safe and should be enough to keep track of blocked hosts.
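As a rough illustration of that idea, here is a minimal sketch of a blocked-host tracker backed by a ConcurrentHashMap. The class name BlockedHosts and its methods are made up for this example and are not from the question's code; it assumes hosts are identified by plain strings.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper for illustration; not part of the original code.
final class BlockedHosts {
    // Thread-safe set view backed by a ConcurrentHashMap.
    private final Set<String> hosts = ConcurrentHashMap.newKeySet();

    /** Marks a host as blocked. Returns true only for the first caller. */
    boolean block(String host) {
        return hosts.add(host);
    }

    /** Makes the host eligible for requests again. */
    void unblock(String host) {
        hosts.remove(host);
    }

    boolean isBlocked(String host) {
        return hosts.contains(host);
    }

    public static void main(String[] args) throws InterruptedException {
        BlockedHosts blocked = new BlockedHosts();
        // Two threads report the same failing host at the same time.
        Thread t1 = new Thread(() -> blocked.block("hostA"));
        Thread t2 = new Thread(() -> blocked.block("hostA"));
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println("hostA blocked: " + blocked.isBlocked("hostA"));
        System.out.println("hostB blocked: " + blocked.isBlocked("hostB"));
    }
}
```

Each individual add or contains call is safe without extra locking. A separate contains-then-send sequence is still not atomic, which is the small window described above.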
An AtomicReference by itself doesn't do much unless you use methods like compareAndSet, so the call is not atomic (but as explained above doesn't need to be in my opinion). If you really want to block a host immediately after you got an exception, you should use some kind of synchronization. You could use a synchronized set to store blocked hosts. This still wouldn't solve the problem that it takes some time until any connection error is actually detected.
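To show what using compareAndSet would actually look like, here is a minimal sketch that keeps an immutable set inside an AtomicReference and swaps in a new copy on each change. The class AtomicBlockedHosts is hypothetical and only meant to illustrate the retry loop; it is not taken from the question.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical class for illustration only.
final class AtomicBlockedHosts {
    // The reference always points at an unmodifiable snapshot.
    private final AtomicReference<Set<String>> ref =
            new AtomicReference<>(Collections.emptySet());

    /** Adds the host atomically. Returns false if it was already blocked. */
    boolean block(String host) {
        while (true) {
            Set<String> current = ref.get();
            if (current.contains(host)) {
                return false;
            }
            Set<String> updated = new HashSet<>(current);
            updated.add(host);
            // Succeeds only if no other thread replaced the snapshot meanwhile;
            // otherwise loop and retry against the newer snapshot.
            if (ref.compareAndSet(current, Collections.unmodifiableSet(updated))) {
                return true;
            }
        }
    }

    boolean isBlocked(String host) {
        return ref.get().contains(host);
    }

    public static void main(String[] args) {
        AtomicBlockedHosts blocked = new AtomicBlockedHosts();
        System.out.println(blocked.block("hostA"));
        System.out.println(blocked.isBlocked("hostA"));
    }
}
```

Simply calling get() and then mutating the set it returns gives none of these guarantees. For a plain blocked-host list, Collections.synchronizedSet or a concurrent set is usually the simpler choice.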
Regarding the update: As said in the comments the Future timeout should be larger than the request timeout. Otherwise the Callable might be canceled and the host won't be added to the list. You probably don't even need a timeout when using Future.get because the request will eventually succeed or fail.
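The effect can be reproduced without any HTTP at all. The following is only a simulation under stated assumptions: Thread.sleep stands in for a request that runs into its timeout, REQUEST_TIMEOUT_MS is an invented value, and the caller is assumed to cancel the Future when get times out. None of the names come from the question's code.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical simulation; the sleep stands in for a request that times out.
final class FutureTimeoutDemo {
    static final long REQUEST_TIMEOUT_MS = 200; // assumed request timeout

    /** Returns true if the task got far enough to block the host. */
    static boolean hostGetsBlocked(long futureTimeoutMs) throws InterruptedException {
        Set<String> blockedHosts = ConcurrentHashMap.newKeySet();
        ExecutorService service = Executors.newSingleThreadExecutor();
        Future<Void> future = service.submit(() -> {
            Thread.sleep(REQUEST_TIMEOUT_MS); // request waits until its timeout
            blockedHosts.add("hostA");        // error handling blocks the host
            return null;
        });
        try {
            future.get(futureTimeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Assumed caller behaviour: give up and cancel. The interrupt
            // aborts the task before it reaches blockedHosts.add.
            future.cancel(true);
        } catch (ExecutionException e) {
            // Not expected in this simulation.
        }
        service.shutdown();
        service.awaitTermination(5, TimeUnit.SECONDS);
        return blockedHosts.contains("hostA");
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Future timeout 1000 ms: blocked = " + hostGetsBlocked(1000));
        System.out.println("Future timeout 50 ms:   blocked = " + hostGetsBlocked(50));
    }
}
```

With the Future timeout shorter than the request timeout, the task is cancelled first and the host never ends up in the set.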
The actual reason you see many exceptions when host A goes down could simply be that many threads are still waiting for a response from host A. You only check for blocked hosts before starting a request, not during a request. Any thread still waiting for a response from that host will continue to do so until the timeout is reached.
If you want to prevent this, you could periodically check whether the current host has been blocked in the meantime. This is a very naive solution and kind of defeats the purpose of futures since it's basically polling. It should help in understanding the general problem though.
    // bad pseudo code
    DataTask dataTask = new DataTask(dataKeys, restTemplate);
    future = service.submit(dataTask);
    while (!future.isDone()) {
        if (blockedHosts.contains(currentHost)) {
            // host unreachable, don't wait for http timeout
            future.cancel(true); // true = interrupt the thread running the task
        }
        Thread.sleep(/* */);
    }
A better way would be to send an interrupt to all DataTask threads that are waiting for the same host when it goes down, so they can abort the request and try the next host.
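One possible way to do that, sketched very roughly: keep a registry of the futures that are currently waiting on each host, and cancel them all when the host gets blocked. HostTaskRegistry and its methods are hypothetical names invented for this sketch, and the sleeping tasks in main only stand in for pending requests.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical registry for illustration only.
final class HostTaskRegistry {
    private final Map<String, Set<Future<?>>> pending = new ConcurrentHashMap<>();
    private final Set<String> blockedHosts = ConcurrentHashMap.newKeySet();

    /** Remembers that the given future is waiting on the given host. */
    void register(String host, Future<?> future) {
        pending.computeIfAbsent(host, h -> ConcurrentHashMap.newKeySet()).add(future);
        // The host may have been blocked while we were registering.
        if (blockedHosts.contains(host)) {
            future.cancel(true);
        }
    }

    /** Blocks the host and interrupts every task still waiting on it. */
    void blockHost(String host) {
        blockedHosts.add(host);
        Set<Future<?>> futures = pending.remove(host);
        if (futures != null) {
            for (Future<?> f : futures) {
                f.cancel(true); // true = interrupt the running thread
            }
        }
    }

    boolean isBlocked(String host) {
        return blockedHosts.contains(host);
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        HostTaskRegistry registry = new HostTaskRegistry();
        // Sleeping tasks stand in for requests waiting on a response.
        Future<?> toA = pool.submit(() -> { Thread.sleep(10_000); return null; });
        Future<?> toB = pool.submit(() -> { Thread.sleep(10_000); return null; });
        registry.register("hostA", toA);
        registry.register("hostB", toB);
        registry.blockHost("hostA");
        System.out.println("task for hostA cancelled: " + toA.isCancelled());
        System.out.println("task for hostB cancelled: " + toB.isCancelled());
        pool.shutdownNow();
    }
}
```

Whether an in-flight HTTP request really aborts on interrupt depends on the HTTP client in use, so treat this as a starting point. The task itself would also have to catch the cancellation and move on to the next host.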