[jcifs] Failed to access files on Samba server with path in Chinese
joller
2004-12-20 06:19:09 UTC
Hi all,

I've encountered a problem concerning non-ASCII file names.
When I use jcifs as the client to access a file
with Chinese (Big5) characters in its path,
it works well if the server is a Win32 PC.
But if the server is a Samba server on Linux,
I fail to read its content, even though SmbFile.exists()
returns true on the URL:

SmbFile f =
new SmbFile("smb://hostname/share/SOME_CHINESE_FILE_NAME");

System.out.println(f.exists()); // returns true!
f.getInputStream(); // throws a SmbException as follows

The last statement throws an SmbException:

Exception in thread "main" jcifs.smb.SmbException: The system cannot find the file specified.
    at jcifs.smb.SmbTransport.send(SmbTransport.java:704)
    at jcifs.smb.SmbSession.send(SmbSession.java:232)
    at jcifs.smb.SmbTree.send(SmbTree.java:103)
    at jcifs.smb.SmbFile.send(SmbFile.java:724)
    at jcifs.smb.SmbFile.open0(SmbFile.java:862)
    at jcifs.smb.SmbFile.open(SmbFile.java:880)
    at jcifs.smb.SmbFileInputStream.<init>(SmbFileInputStream.java:70)
    at jcifs.smb.SmbFileInputStream.<init>(SmbFileInputStream.java:64)
    at jcifs.smb.SmbFile.getInputStream(SmbFile.java:2469)

If the URL is actually a directory, I cannot get its file list.

There is no problem if the client is the Win32 file manager.
Only when the client is jCIFS and the server is Samba does this happen.

Is there any solution or workaround to this problem?

joller
Michael B Allen
2004-12-20 08:40:33 UTC
On Mon, 20 Dec 2004 06:19:09 +0000 (UTC)
Post by joller
Hi all,
I've encountered a problem concerning NO-ASCII file name.
When I use jcifs as the client to access a file
with Chinese (Big5) characters in its path,
it works well if the server is a Win32 PC.
But if the server is a Samba server on Linux,
I fail to read the content of it, though a SmbFile.exists()
return true
The Samba server is probably configured to use an 8 bit encoding as
opposed to Unicode. By default jCIFS will use the 'file.encoding' System
property to decode filenames. Therefore jCIFS will only behave properly
with default settings if both the Samba server codepage and the Java VM
file.encoding are compatible. If they are not compatible, you can override
the encoding using the 'jcifs.encoding' property. Such as:

jcifs.encoding=Big5
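
A quick way to see what the JVM will use by default is to print file.encoding and the default charset before touching jCIFS. This is only a sketch using standard JDK calls; the class name EncodingCheck and the helper defaultMatches are made up for illustration, and Big5 is used only because it is the encoding discussed in this thread.

```java
import java.nio.charset.Charset;

public class EncodingCheck {

    // True when the named charset is available in this JVM and is
    // also the JVM's default charset.
    static boolean defaultMatches(String name) {
        return Charset.isSupported(name)
                && Charset.defaultCharset().equals(Charset.forName(name));
    }

    public static void main(String[] args) {
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());
        System.out.println("Big5 supported  = " + Charset.isSupported("Big5"));
        System.out.println("default is Big5 = " + defaultMatches("Big5"));
    }
}
```

If the last line prints false while the Samba server stores names in Big5, that is the case where overriding jcifs.encoding is worth trying.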

Mike
--
Greedo shoots first? Not in my Star Wars.
joller
2004-12-20 09:46:48 UTC
Post by Michael B Allen
The Samba server is probably configured to use an 8 bit encoding as
opposed to Unicode. By default jCIFS will use the 'file.encoding' System
property to decode filenames. Therefore jCIFS will only behave properly
with default settings if both the Samba server codepage and the Java VM
file.encoding are compatible. If they are not compatible, you can override
jcifs.encoding=Big5
Mike
Hi Mike,

In fact we have tried to set the property `jcifs.encoding',
and the result is as follows:
With this property set to `big5', SmbFile.exists() returns true,
while SmbFile.getInputStream() fails with the same exception
as mentioned;
With `utf8', SmbFile.exists() returns *false*,
and SmbFile.getInputStream() fails with the same exception.
It seems that big5 is the correct encoding.

Besides, I wonder why there is no such problem when the client
is a Windows file manager.
If the server is not a Samba, but is a Windows PC instead,
there is no such problem either.
Does this mean that the way jcifs talks to the server is different from
that of a Windows file manager?

And why does SmbFile.exists() correctly return true in this case?
SmbFile.lastModified() also returns the correct information.
It seems that querying information about a file works fine with a Chinese path.

Thank you very much for replying!

joller
Michael B Allen
2004-12-20 19:42:11 UTC
On Mon, 20 Dec 2004 09:46:48 +0000 (UTC)
Post by joller
Post by Michael B Allen
The Samba server is probably configured to use an 8 bit encoding as
opposed to Unicode. By default jCIFS will use the 'file.encoding' System
property to decode filenames. Therefore jCIFS will only behave properly
with default settings if both the Samba server codepage and the Java VM
file.encoding are compatible. If they are not compatible, you can
jcifs.encoding=Big5
Hi Mike,
Hi joller,

What versions of Samba and JCIFS are you using?
Post by joller
In fact we have tried to set the property `jcifs.encoding',
With this property set to `big5', SmbFile.exists() returns true,
while SmbFile.getInputStream() fails with the same exception
as mentioned;
Ok, so the file.encoding is probably already set to Big5.
Post by joller
With `utf8', SmbFile.exists() returns *false*,
and SmbFile.getInputStream() fails with the same exception.
It seems that big5 is the correct encoding.
Besides, I wonder why there is no such problem when the client
is a Windows file manager.
If the Java VM was not configured correctly you would need to set the
encoding. But it seems it is probably correct.
Post by joller
If the server is not a Samba, but is a Windows PC instead,
there is no such problem either.
Does this mean that the way jcifs talks to the server is different from
that of a Windows file manager?
Potentially yes. If Windows is negotiating Unicode and Samba is
negotiating 8-bit mode then the way strings are decoded and encoded is a
little different. We haven't had any reports of problems and I know we
have some CJK users, but there certainly could be a bug in 8-bit
mode communication. Or it's a bug in Samba.
Post by joller
And why SmbFile.exists() correctly returns true in this case?
SmbFile.lastModified() also returns the correct information.
It seems that querying information of a file works fine with Chinese path.
It just means that there is a bug in the NT_CREATE_ANDX command in either
jCIFS or Samba. To determine which we need three packet captures [1];
one of jcifs failing against samba, one of Windows succeeding against
samba, and one of jcifs succeeding against windows.

Mike

[1] http://jcifs.samba.org/capture.html
--
Greedo shoots first? Not in my Star Wars.
joller
2004-12-21 08:49:55 UTC
Post by Michael B Allen
Hi joller,
What versions of Samba and JCIFS are you using?
I've tried both Samba 2.2.1-a4 and 2.2.7a-7.9.0,
and they show no difference.
Jcifs is version 1.1.5.
Post by Michael B Allen
Post by joller
And why SmbFile.exists() correctly returns true in this case?
SmbFile.lastModified() also returns the correct information.
It seems that querying information of a file works fine with Chinese path.
It just means that there is a bug in the NT_CREATE_ANDX command in either
jCIFS or Samba. To determine which we need three packet captures [1];
one of jcifs failing against samba, one of Windows succeeding against
samba, and one of jcifs succeeding against windows.
Mike
I'm working on the captures now.
When they are done, I'll send them to your mailbox.
Thanks a lot!

joller
Michael B Allen
2004-12-22 04:33:51 UTC
On Wed, 22 Dec 2004 11:57:52 +0800
Dear Mike,
It works!
I can get the content of the file, and list files in a directory with
Chinese name as well.
Is this a bug in jCIFS?
Yes. There was a line in the NT_CREATE_ANDX encoder that assumed the path
was encoded in a fixed-width encoding. Big5 is a variable-width encoding, and
the filename you were using contained characters whose encoded length did not
match that assumption.
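
The pitfall is easy to reproduce with the JDK alone. The sketch below is not the jCIFS code; it only shows that, for a Big5 name, the number of Java chars and the number of encoded bytes differ, so any length computed from one cannot stand in for the other. The class name Big5Width, the helper encodedLength, and the sample file name are made up for illustration.

```java
import java.nio.charset.Charset;

public class Big5Width {

    // Number of bytes the name occupies once encoded in the given charset.
    static int encodedLength(String name, Charset cs) {
        return name.getBytes(cs).length;
    }

    public static void main(String[] args) {
        Charset big5 = Charset.forName("Big5");
        // One ASCII letter, two Chinese characters (U+4E2D, U+6587), ".txt"
        String name = "a\u4e2d\u6587.txt";
        System.out.println("chars = " + name.length());
        System.out.println("bytes = " + encodedLength(name, big5));
    }
}
```

For this name the program reports 7 chars but 9 bytes, because each ASCII character takes one byte in Big5 and each of the two Chinese characters takes two.
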
Is it just fixed? That is, do you suggest we use the new jar in place?
Yes. The fix is solid.
Or will there be a new stable release in near future that will fix the
problem?
Yes. I will release the same code as 1.1.6 (with some other very minor
changes that are in the queue).
In addition, if the client must specify encoding compatible with the
server,
then it has to know the encoding.
Is there any programmatic method to determine the encoding of the
server?
The System property file.encoding is used to encode path names. I believe we
determined that that was already set to Big5. So you do not need to set the
encoding. Try it.

Mike
On Wed, 22 Dec 2004 10:32:26 +0800
Dear Mike,
Enclosed are the captures.
http://jcifs.samba.org/src/jcifs-1.1.5c.jar
--
Greedo shoots first? Not in my Star Wars.
joller
2004-12-22 05:49:05 UTC
Post by Michael B Allen
Yes. The fix is solid.
That's really good news for me!
Thanks for your help and patience. :D
Post by Michael B Allen
In addition, if the client must specify encoding compatible with the
server,
then it has to know the encoding.
Is there any programmatic method to determine the encoding of the
server?
The System property file.encoding is used to encode path names. I believe we
determined that that was already set to Big5. So you do not need to set the
encoding. Try it.
Mike
I meant that the user of the client program will ask to connect to some server
and retrieve some files,
but the servers the user can specify may not all use the same encoding.
Is there any better way than asking the user to specify the encoding?

joller
Michael B Allen
2004-12-22 07:11:57 UTC
On Wed, 22 Dec 2004 05:49:05 +0000 (UTC)
Post by joller
Post by Michael B Allen
The System property file.encoding is used to encode path names. I
believe we determined that that was already set to Big5. So you do not
need to set the encoding. Try it.
Mike
I meant, the user of the client program will ask to connect to some
server and retrieve some files,
But the servers the user can specify may not use the same encoding.
First, if Unicode is negotiated then you don't care. The encoding problem
only arises when the server is running in 8-bit mode. Windows uses Unicode
by default, so the problem is narrowed down to CJK users who like to use
8-bit mode because they have existing filesystems in that encoding. Second,
a user accessing resources on a server will very likely be running in a
locale that is compatible with the server. This means that the file.encoding
will match (as it did in your case).
Post by joller
Is there any better way than asking the user to specify the encoding?
If the file.encoding property does not match the server 8-bit encoding you
have a problem because there is no way for CIFS to negotiate the encoding
(Microsoft solved the multiple codepage problem by switching to Unicode). It
is expected that the client and server will simply be configured to use the
same encoding.
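
To illustrate why a mismatch makes a name unfindable, the sketch below encodes a name with one charset and decodes the same bytes with another, using only the JDK. It does not involve jCIFS or Samba; the class name EncodingMismatch and the helper asSeenByServer are hypothetical, and the "client" and "server" roles are simply the two charsets being compared.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingMismatch {

    // Encodes the name with the client's charset, then decodes the
    // resulting bytes with the server's charset.
    static String asSeenByServer(String name, Charset client, Charset server) {
        return new String(name.getBytes(client), server);
    }

    public static void main(String[] args) {
        Charset big5 = Charset.forName("Big5");
        String name = "\u4e2d\u6587"; // two Chinese characters

        // Same charset on both sides: the name survives the round trip.
        System.out.println(name.equals(asSeenByServer(name, big5, big5)));

        // Client sends UTF-8 bytes, server interprets them as Big5:
        // the server ends up looking for a different name.
        System.out.println(name.equals(asSeenByServer(name, StandardCharsets.UTF_8, big5)));
    }
}
```

The first comparison is true and the second is false, which matches the earlier observation in this thread that exists() returned false with the encoding set to utf8.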

Otherwise, there is no way to dynamically change the encoding. Setting
jcifs.encoding programmatically will have no effect after jcifs classes have
been accessed.
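
In practice that means the property has to be in place before the first jCIFS class is touched, either with -Djcifs.encoding=Big5 on the command line or at the very top of main. The following is a minimal sketch of the second option, assuming jCIFS picks the value up from the System properties when its classes are first loaded; the class name JcifsEncodingSetup and the helper configureEncoding are made up for illustration, and the jCIFS calls themselves are left out so the example runs without the jar.

```java
public class JcifsEncodingSetup {

    // Sets jcifs.encoding to the fallback unless it was already supplied
    // (for example with -Djcifs.encoding=... on the command line).
    // Returns the value that is in effect afterwards.
    static String configureEncoding(String fallback) {
        if (System.getProperty("jcifs.encoding") == null) {
            System.setProperty("jcifs.encoding", fallback);
        }
        return System.getProperty("jcifs.encoding");
    }

    public static void main(String[] args) {
        // Must run before any jCIFS class is referenced.
        String enc = configureEncoding("Big5");
        System.out.println("jcifs.encoding = " + enc);

        // Only after this point would the program create SmbFile objects
        // (omitted here; requires the jCIFS jar on the classpath).
    }
}
```

Because a command-line value is left untouched, the same program can still be pointed at a server with a different 8-bit encoding by restarting it with another -D setting.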

Mike
--
Greedo shoots first? Not in my Star Wars.
joller
2004-12-22 07:42:25 UTC
Post by Michael B Allen
If the file.encoding property does not match the server 8-bit encoding you
have a problem because there is no way for CIFS to negotiate the encoding
(Microsoft solved the multiple codepage problem by switching to Unicode). It
is expected that the client and server will simply be configured to use the
same encoding.
Mike
Got it. Thank you!

joller
