Discussion:
[jcifs] NullPointerException in Dfs.resolve
Conrad Herrmann
2015-06-14 09:00:35 UTC
Permalink
Martin,

I have recently run into the same problem.

I think the problem is that SmbFile.resolveDfs() uses the currently
connected transport as the DFS resolver/domain server
(tree.session.transport.tconHostName), but it is very possible that there is
no currently connected transport. In your case, that happens when the file
server forces the TCP connection to close, and the transport tears itself
down.
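
To show the failure mode in isolation, here is a small stand-alone sketch. The classes (StubTransport, StubResolver, NullHostDemo) are made up for illustration and are not jcifs code, and it assumes the resolver dereferences the host name it is handed, which has not been checked against the jcifs source:

```java
import java.util.HashMap;
import java.util.Map;

// Made-up classes that only mimic the shape of the problem; not jcifs code.
class StubTransport {
    String tconHostName; // stays null once the connection has been torn down
}

class StubResolver {
    private final Map<String, String> referrals = new HashMap<>();

    StubResolver() {
        referrals.put("fs01.example.com", "\\\\fs02\\projects");
    }

    /** Looks up a referral; assumes a non-null host name (hypothetical behaviour). */
    String resolve(String hostName) {
        // Dereferencing a null hostName here is what raises the NPE.
        return referrals.get(hostName.toLowerCase());
    }
}

class NullHostDemo {

    /** Returns true if resolving through the given transport throws an NPE. */
    static boolean failsWithNpe(StubTransport transport) {
        try {
            new StubResolver().resolve(transport.tconHostName);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        StubTransport connected = new StubTransport();
        connected.tconHostName = "FS01.example.com";
        System.out.println(failsWithNpe(connected));           // connected transport
        System.out.println(failsWithNpe(new StubTransport())); // torn-down transport
    }
}
```

The only point is that a transport which has lost its host name hands null to the resolver; where exactly the real NPE is raised inside Dfs.resolve is not shown here.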

And, although the resolveDfs() method calls connect0(), in fact this does
nothing because doConnect() doesn't force creation of a new connection if we
are talking about a DFS resolved path.

It seems to me that in that case, we have to start over again at the top of
the referral tree, with the Domain.

So my solution is this: change the code in SmbFile.resolveDfs() (around
line 671) so that it reads:

    connect0();
    String hostName = tree.session.transport.tconHostName;
    String domainDfsServerName = getServerWithDfs();
    // fall back to the domain DFS server if the transport was torn down
    if (hostName == null)
        hostName = domainDfsServerName;
    DfsReferral dr = dfs.resolve(
        hostName,
        tree.share,
        unc,
        auth);

The code comes from the other use of tconHostName, in SmbFile.doConnect():

    String hostName = getServerWithDfs();
    tree.inDomainDfs = dfs.resolve(hostName, tree.share, null, auth) != null;

In this code, we are getting the DFS resolver (which might be the domain
server) as the hostName, and asking it to resolve our share.

Basically, what this new code says is:

- in the case where the transport has closed (i.e., because of a timeout or
TCP close on the DFS server side) reconnect to the DFS domain server in
order to resolve a share's DFS server.
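
A minimal sketch of just that decision, pulled out of SmbFile so it can be run on its own. DfsHostFallback and pickResolverHost are hypothetical names, not jcifs API; the two arguments stand in for tree.session.transport.tconHostName and the result of getServerWithDfs():

```java
import java.util.Objects;

// Hypothetical stand-in for the host selection in the patched resolveDfs();
// none of these names are jcifs API.
class DfsHostFallback {

    /**
     * Returns the host to ask for a DFS referral: the connected transport's
     * host name if there is one, otherwise the domain DFS server.
     */
    static String pickResolverHost(String tconHostName, String domainDfsServer) {
        if (tconHostName != null) {
            return tconHostName;
        }
        // Transport was torn down: start again at the top of the referral tree.
        return Objects.requireNonNull(domainDfsServer, "no DFS server to fall back to");
    }

    public static void main(String[] args) {
        System.out.println(pickResolverHost("fs01.example.com", "dc01.example.com"));
        System.out.println(pickResolverHost(null, "dc01.example.com"));
    }
}
```

The explicit requireNonNull is an addition in this sketch: if both names are missing, it fails with a clear message instead of passing null further down.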

I can imagine a case where this doesn't work: if we have multiple levels of
DFS redirection, where the domain server cannot redirect the client to a
deep subdirectory. But I don't even know if this is possible in DFS. If
it is, then at least this solution removes the top-level case and identifies
the problem, which would require walking down the DFS resolution path to
resolve the actual file server.
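
If such multi-level referrals do exist, "walking down the resolution path" could look roughly like the sketch below. It is a toy model only: the referral table is a plain map from UNC prefix to target, ReferralWalk and its methods are hypothetical names, matching is case-sensitive for simplicity, and nothing here talks to a server or uses jcifs:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of multi-level DFS referrals; not jcifs API.
class ReferralWalk {

    /** Follows referrals for a path until no prefix of it is redirected any more. */
    static String resolve(Map<String, String> referrals, String path, int maxHops) {
        String current = path;
        for (int hop = 0; hop < maxHops; hop++) {
            String prefix = longestMatchingPrefix(referrals, current);
            if (prefix == null) {
                return current; // no further referral: this is the real location
            }
            // Swap the matched prefix for its target and keep the remainder.
            current = referrals.get(prefix) + current.substring(prefix.length());
        }
        throw new IllegalStateException("referral loop or chain longer than " + maxHops);
    }

    /** Returns the longest table key that is a whole-component prefix of path, or null. */
    private static String longestMatchingPrefix(Map<String, String> referrals, String path) {
        String best = null;
        for (String prefix : referrals.keySet()) {
            boolean matches = path.equals(prefix) || path.startsWith(prefix + "\\");
            if (matches && (best == null || prefix.length() > best.length())) {
                best = prefix;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, String> referrals = new HashMap<>();
        referrals.put("\\\\example.com\\dfs", "\\\\root01\\dfs");
        referrals.put("\\\\root01\\dfs\\projects", "\\\\fs02\\projects");
        System.out.println(resolve(referrals, "\\\\example.com\\dfs\\projects\\a.txt", 8));
    }
}
```

The hop limit is there so that a circular referral table ends in an error rather than an endless loop.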

Conrad Herrmann
Primdaesk, Inc.
Post by Martin Kutter
Hi,
I encountered a NullPointerException similar to
https://lists.samba.org/archive/jcifs/2012-January/009856.html - at
least the stack traces are similar.
- jcifs 1.3.18
- IBM JDK 7
- AIX 7.1
The NPE occurred in an (in-house) plugin for the Jenkins build server. In
this system, JCIFS is used to recursively copy files from a Windows
share to an AIX machine.
Re-running a build shortly after it finished triggered the NPE.
After some debugging, it seems to me like the SmbFile's underlying
transport is closed (by timeout), and when SmbFile.resolveDfs is called,
the transport is not reconnected (unlike, for example, later in
SmbFile.resolve, or in SmbSession.getChallenge).
I was able to reproduce the NPE during debugging using the following
steps:
- Trigger a build (recursively copying from a CIFS DFS tree)
- Wait until the transport objects disconnect by timeout (tracked by
breakpoint)
- Retrigger the build (recursively copying the same directory structure)
The Jenkins plugin usually runs the second JCIFS copy operation in the
same thread as the first (though that's not guaranteed).
Each run uses a new SmbFile object.
Am I missing something (like some close operation on SmbFile)?
Is this a known error?
Can I do something to fix it?
Best regards,
Martin
Michael B Allen
2015-06-14 15:37:48 UTC
Permalink
I have added this post to the TODO list's record of people who have
reported this problem, so that it can be considered when I get around to
looking at this.

Note that the 1.3.18b mentioned in the link cited is here:

http://jcifs.samba.org/old/jcifs-1.3.18b.jar

Although I cannot recall what it actually does anymore, it might be
worth a try. We never received feedback about it.

Mike
--
Michael B Allen
Java Active Directory Integration
http://www.ioplex.com/
Martin Kutter
2015-06-15 11:45:22 UTC
Permalink
Thanks for your reply.
I've tested 1.3.18b - it does not fix this specific error.

Regarding the fix suggested by Conrad: In the meantime, I've also
implemented a fix for the issue - I changed SmbFile.resolveDfs()
(lines 671 and following) to

    DfsReferral dr = null;
    // disconnect is synchronized to transport, too.
    // make sure our transport doesn't get disconnected while
    // we're inside the synchronized block
    synchronized (tree.session.transport) {
        if (tree.session.transport.tconHostName == null) {
            // disconnect properly if connection is lost
            tree.treeDisconnect(false);
        }
        tree.session.transport.connect();
        dr = dfs.resolve(tree.session.transport.tconHostName,
                         tree.share,
                         unc,
                         auth);
    }
    if (dr != null) {

As my knowledge of CIFS is quite limited, I have no idea whether this
is right (or has bad side effects).

The idea behind it is similar (but not identical) to Conrad's fix: if the
transport has disconnected, disconnect properly and reconnect.
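
A stand-alone sketch of that check-and-reconnect pattern, using a made-up FakeTransport class instead of the jcifs transport (all names are hypothetical, and the tree disconnect step is left out). It only illustrates why holding the transport's lock keeps the host name from being cleared between the reconnect and the read:

```java
// Hypothetical stand-in for a transport that can be torn down; not jcifs code.
class FakeTransport {
    private final String server;
    String tconHostName; // null while disconnected, as in the NPE case

    FakeTransport(String server) {
        this.server = server;
    }

    synchronized void connect() {
        if (tconHostName == null) {
            tconHostName = server;
        }
    }

    synchronized void disconnect() {
        tconHostName = null;
    }
}

class ReconnectBeforeResolve {

    /** Reconnects if needed and reads the host name under the transport's lock. */
    static String hostForReferral(FakeTransport transport) {
        synchronized (transport) {
            // disconnect() takes the same lock, so the host name cannot be
            // cleared between connect() and the read below.
            transport.connect();
            return transport.tconHostName;
        }
    }

    public static void main(String[] args) {
        FakeTransport t = new FakeTransport("fs01.example.com");
        t.disconnect(); // simulate the idle timeout
        System.out.println(hostForReferral(t));
    }
}
```

Whether the real transport's connect() and disconnect() lock on the same object is taken from the comment in the patch above and not verified separately here.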

Best regards,

Martin
Vella, Shon
2015-06-15 14:46:51 UTC
Permalink
One of the patches from Google submitted to the list last year:

http://article.gmane.org/gmane.network.samba.java/9410

has an attempted fix of this.

https://code.google.com/p/google-enterprise-connector-file-system/source/detail?r=563

It seems to work for me - in any case it shouldn't ever throw that NPE out
of resolveDfs.


*Shon Vella*
*Identity Automation*
Product Engineer
281-220-0021 x2030 office
281-817-5579 fax
www.identityautomation.com