Hector state when a Cassandra node goes down
Hi,

I am using hector-0.7.0-26 in my application, which persists data over the network to a Cassandra cluster of 3 nodes. Everything works well until I manually bring down one of the Cassandra nodes, which I need to do as part of negative testing on my application. After the node goes down, I see Hector exceptions on the client side, and it seems that Hector is trying to connect to the downed host. The exception trace is below; it appears repeatedly in the logs.

My question is: when Hector sees that the node is down, doesn't it close the connections to that node and stop trying until it detects that the node is up again? What should be done on the client side (while using Hector) to ensure that Hector cleans up the connections to a dead node and stops trying to reuse it?

[pool-1-thread-1] ERROR (HThriftClient.java:88) - Unable to open transport to asp.corp.apple.com(17.108.122.70):9162
org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused
    at org.apache.thrift.transport.TSocket.open(TSocket.java:185)
    at org.apache.thrift.transport.TFramedTransport.open(TFramedTransport.java:81)
    at me.prettyprint.cassandra.connection.HThriftClient.open(HThriftClient.java:84)
    at me.prettyprint.cassandra.connection.CassandraHostRetryService$RetryRunner.verifyConnection(CassandraHostRetryService.java:114)
    at me.prettyprint.cassandra.connection.CassandraHostRetryService$RetryRunner.run(CassandraHostRetryService.java:94)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
Caused by: java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
    at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
    at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
    at java.net.Socket.connect(Socket.java:529)
    at org.apache.thrift.transport.TSocket.open(TSocket.java:180)
    ... 13 more

Actually this is working as anticipated. The following line from the stack trace:

me.prettyprint.cassandra.connection.HThriftClient.open(HThriftClient.java:

together with the CassandraHostRetryService$RetryRunner frames directly beneath it in the trace, indicates this is the host retry service (running in a background thread) attempting to connect to the downed host every 10 seconds (by default).

What was just fixed in master and the tip of 0.7.0 was an issue with incorrect handling of UnavailableException on a consistency level failure. This will be released at some point today (marked as 0.7.0-28).

On looking at the above trace again, that error could probably be clearer about what is going on. I'll clean that up today as well.
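For readers unfamiliar with the pattern the reply describes, here is a minimal sketch of a background retry loop for downed hosts, built only on the JDK. It is not Hector's CassandraHostRetryService; the class name DownedHostRetrier, its methods, and the "host:port" string format are all hypothetical and chosen for illustration. The idea is that a host marked down is taken out of the live set, and a scheduled task periodically probes it and restores it once a connection succeeds, which is why a "Connection refused" log entry can recur at a fixed interval while the node stays down.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

// Hypothetical illustration only; this is NOT Hector's implementation.
class DownedHostRetrier {
    private final Set<String> liveHosts = ConcurrentHashMap.newKeySet();
    private final Set<String> downedHosts = ConcurrentHashMap.newKeySet();
    private final Predicate<String> probe; // returns true if the host accepts connections
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "downed-host-retry");
                t.setDaemon(true); // do not keep the JVM alive for retries
                return t;
            });

    DownedHostRetrier(Predicate<String> probe) {
        this.probe = probe;
    }

    // Called when a request to a host fails: stop using it for requests, queue it for retry.
    void markDown(String hostAndPort) {
        liveHosts.remove(hostAndPort);
        downedHosts.add(hostAndPort);
    }

    // One retry pass: probe every downed host and restore those that answer.
    void retryOnce() {
        for (String host : downedHosts) {
            boolean up;
            try {
                up = probe.test(host);
            } catch (RuntimeException e) {
                up = false; // treat a failing probe as "still down" so the schedule keeps running
            }
            if (up) {
                downedHosts.remove(host);
                liveHosts.add(host);
            } else {
                System.err.println("Downed host " + host + " still unreachable; will retry");
            }
        }
    }

    // Run retryOnce() in the background every delaySeconds seconds.
    void start(long delaySeconds) {
        scheduler.scheduleWithFixedDelay(this::retryOnce, delaySeconds, delaySeconds, TimeUnit.SECONDS);
    }

    void stop() {
        scheduler.shutdownNow();
    }

    Set<String> liveHosts() { return liveHosts; }
    Set<String> downedHosts() { return downedHosts; }

    // A plain TCP probe for "host:port" strings, with a 2 second connect timeout.
    static boolean tcpProbe(String hostAndPort) {
        int sep = hostAndPort.lastIndexOf(':');
        String host = hostAndPort.substring(0, sep);
        int port = Integer.parseInt(hostAndPort.substring(sep + 1));
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), 2000);
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```

In this sketch, a client would create `new DownedHostRetrier(DownedHostRetrier::tcpProbe)`, call `markDown("somehost:9162")` when a request to that host fails, and call `start(10)` to mirror the 10-second interval mentioned in the reply. The point is that the periodic error comes from the probe, not from application requests being sent to the dead node.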