Discussion:
[jcifs] jcifs.util.transport.TransportException when connection is killed on purpose.
M. D.
2015-03-11 07:24:54 UTC
Hello,

I have found a potential issue that is reproducible every time. Please share your thoughts on how it can or should be improved.

Scenario is:
1. JCIFS client starts to write a file.
2. In the middle of writing, you kill the connection on the client or server side (I do it using TCP View on the client side, but it shouldn't matter).
3. At that point, the jcifs writing thread is probably stuck in the wait(timeout) call in the jcifs.util.transport.Transport#sendrecv() method, waiting for a response to arrive.
4. The thread that reads the responses is now holding the transport lock and executing jcifs.util.transport.Transport#loop(), but the doRecv( response ) invocation fails with the exception "Connection reset by peer" since the connection is closed.
5. The exception is caught in the loop method and the looping thread then calls disconnect. The disconnect procedure calls logoff on the SmbSession and the connection state is changed. The disconnect succeeds (as it should), and all threads waiting on the transport are then notified by the looping thread.
6. The JCIFS client thread that was writing the file exits the wait(timeout) because it was notified by the loop thread. But since no response was received, another wait(timeout) is issued. After 30 seconds (the default), the writing thread throws a jcifs.util.transport.TransportException.
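
The stall in steps 3 to 6 can be reproduced with a small toy model. This is only a sketch under assumptions, not the jcifs source: the class StallDemo, its fields, and runScenario are hypothetical names, and a 500 ms timeout stands in for the 30 second default.

```java
import java.util.concurrent.TimeUnit;

/** Toy model of the wait loop in steps 3-6; an illustration, not the jcifs source. */
class StallDemo {
    private final Object lock = new Object();
    private boolean responseReceived = false; // never set: the response is lost
    private boolean connected = true;         // flipped by disconnect(), ignored by the waiter

    /** Waits until a response arrives or the timeout expires; returns elapsed milliseconds. */
    long waitForResponse(long timeoutMillis) throws InterruptedException {
        long start = System.nanoTime();
        long deadline = start + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        synchronized (lock) {
            while (!responseReceived) {
                long remaining = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                if (remaining <= 0) {
                    break; // timeout: real code would throw a TransportException here
                }
                lock.wait(remaining); // a notifyAll() wakes us, but the loop just waits again
            }
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    /** Stands in for the reader thread's disconnect: change state, wake all waiters. */
    void disconnect() {
        synchronized (lock) {
            connected = false;
            lock.notifyAll();
        }
    }

    /** Kills the "connection" after killAfterMillis and reports how long the writer stalled. */
    static long runScenario(long timeoutMillis, long killAfterMillis) {
        StallDemo transport = new StallDemo();
        Thread reader = new Thread(() -> {
            try {
                Thread.sleep(killAfterMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            transport.disconnect();
        });
        reader.setDaemon(true);
        reader.start();
        try {
            return transport.waitForResponse(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    public static void main(String[] args) {
        // 500 ms stands in for the 30 s default; the connection dies after 50 ms.
        System.out.println("writer stalled for ~" + runScenario(500, 50) + " ms");
    }
}
```

In this model the reported stall should be close to the full timeout rather than the 50 ms at which the connection died, because the waiter never looks at the connection state after being woken.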

Could that be improved somehow? Since the session state is already disconnected, there is no need for the writing thread to wait another 30 seconds and then fail with a timeout exception. Additionally, the "Connection reset by peer" exception is swallowed by the loop thread when jcifs.util.loglevel < 3, which is the case by default. So the jcifs user has no way to learn from the logs what caused the timeout.

Please share your thoughts on the topic.

Thank you in advance!


Best regards,
Marin
Vella, Shon
2015-03-11 12:49:59 UTC
Marin,

copyTo() is a mess, and I've previously submitted two patches that address
a lot of the issues with it, though I'm not sure whether they address
everything or what you are seeing. See
https://lists.samba.org/archive/jcifs/2014-June/010165.html and
https://lists.samba.org/archive/jcifs/2013-October/010115.html. Also note
that there is a jcifs property, jcifs.smb.client.ignoreCopyToException, that
defaults to true and is part of the equation; I always set it to
false.
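
One way to set that property is sketched below. It assumes that jcifs 1.x copies the JVM system properties into its configuration when its classes are first loaded, so the value has to be in place before any jcifs class is touched; passing -Djcifs.smb.client.ignoreCopyToException=false on the command line should be equivalent under the same assumption. The class name JcifsProps and the method configure are hypothetical.

```java
/** Sets the jcifs property before any jcifs class is used (assumption: jcifs reads system properties at load time). */
class JcifsProps {
    static final String IGNORE_COPY_TO_EXCEPTION = "jcifs.smb.client.ignoreCopyToException";

    /** Call this first thing in main(), before touching SmbFile or any other jcifs class. */
    static void configure() {
        // "false" makes copyTo() failures surface as exceptions instead of being ignored.
        System.setProperty(IGNORE_COPY_TO_EXCEPTION, "false");
    }

    public static void main(String[] args) {
        configure();
        System.out.println(IGNORE_COPY_TO_EXCEPTION + "=" + System.getProperty(IGNORE_COPY_TO_EXCEPTION));
    }
}
```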

*Shon Vella*
*Identity Automation*
Product Engineer
281-220-0021 x2030 office
281-817-5579 fax
www.identityautomation.com
M. D.
2015-03-12 09:19:55 UTC
Hello Shon,

Thank you for the tip!
I had no idea how nasty this copyTo() method was!

Unfortunately, this isn't the cause of our issues.

I think it would be a good idea to add an if statement that checks the state of the transport after the sendrecv timeout check. Something like this:

diff --git a/src/jcifs/util/transport/Transport.java b/src/jcifs/util/transport/Transport.java
index fd77a19..9143d00 100644
--- a/src/jcifs/util/transport/Transport.java
+++ b/src/jcifs/util/transport/Transport.java
@@ -1,8 +1,9 @@
 package jcifs.util.transport;
 
-import java.io.*;
-import java.net.*;
-import java.util.*;
+import java.io.IOException;
+import java.io.InputStream;
+import java.util.HashMap;
+
 import jcifs.util.LogStream;
 
 /**
@@ -74,6 +75,10 @@
 " timedout waiting for response to " +
 request );
 }
+ if (state != 3 && te != null)
+ {
+ throw new TransportException("Exception occurred while waiting for response to " + request, te);
+ }
 }
 } catch( IOException ioe ) {
 if (log.level > 2)

I will test this for a couple of days and report back if this fixes the problem.
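
The idea behind the patch can be illustrated with a toy model in which the waiter checks the connection state after every wakeup and fails immediately, carrying the reader's exception as the cause. This is a sketch under assumptions rather than the jcifs code: FailFastDemo and its members are hypothetical names, a boolean flag stands in for the state and te fields used in the diff, and IOException stands in for TransportException.

```java
import java.io.IOException;

/** Toy model of a fail-fast wait loop; an illustration, not the jcifs source. */
class FailFastDemo {
    private final Object lock = new Object();
    private boolean responseReceived = false; // never set: the response is lost
    private boolean connected = true;
    private IOException failure;              // cause recorded by the reader thread

    /** Waits for a response, but gives up as soon as the transport is disconnected. */
    void waitForResponse(long timeoutMillis) throws IOException, InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        synchronized (lock) {
            while (!responseReceived) {
                if (!connected) {
                    // Fail fast and keep the original error visible to the caller.
                    throw new IOException("transport disconnected while waiting for response", failure);
                }
                long remaining = (deadline - System.nanoTime()) / 1_000_000L;
                if (remaining <= 0) {
                    throw new IOException("timed out waiting for response");
                }
                lock.wait(remaining);
            }
        }
    }

    /** Stands in for the reader thread's disconnect: record the cause, wake all waiters. */
    void disconnect(IOException cause) {
        synchronized (lock) {
            connected = false;
            failure = cause;
            lock.notifyAll();
        }
    }

    /** Kills the "connection" after killAfterMillis; returns the exception the writer saw. */
    static IOException runScenario(long timeoutMillis, long killAfterMillis) {
        FailFastDemo transport = new FailFastDemo();
        Thread reader = new Thread(() -> {
            try {
                Thread.sleep(killAfterMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            transport.disconnect(new IOException("Connection reset by peer"));
        });
        reader.setDaemon(true);
        reader.start();
        try {
            transport.waitForResponse(timeoutMillis);
            return null; // a response arrived (cannot happen in this toy model)
        } catch (IOException e) {
            return e;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        IOException e = runScenario(5000, 50);
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000L;
        System.out.println("failed after ~" + elapsedMillis + " ms: " + e.getMessage()
                + " (cause: " + e.getCause().getMessage() + ")");
    }
}
```

Attaching the reader's exception as the cause also addresses the logging concern, since the caller sees "Connection reset by peer" in the stack trace regardless of the log level.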

Best regards,
M.D.