Discussion:
[jcifs] Grabbing a directory and all of its subdirectories
Chris Moesel
2005-02-24 21:00:33 UTC
If you read my previous email, you saw that I have a need for grabbing
the entire contents of a directory tree using the SMB protocol. You
also saw that it takes me over 40 seconds to traverse 5400 files in 600
subdirectories of the tree (NOTE, I don't need the actual contents of
each file, only the information for creating a listing-- name, date,
size, etc).

Is there a best practice for doing this recursive retrieval? My current
algorithm is a pretty simple recursion, but I was wondering if there was
some switch in JCIFS for turning on recursive retrieval, or some API I
was unaware of.

Here's a simplified version of what I'm currently doing:

List getFilesFromDir(String path) {
    SmbFile baseDir = new SmbFile(path);
    SmbFile[] files = baseDir.listFiles();
    List results = new ArrayList();

    for (int i = 0; i < files.length; i++) {
        SmbFile file = files[i];
        if (file.isDirectory()) {
            results.addAll(getFilesFromPath(file.getPath()));
        } else {
            results.add(file);
        }
    }
    return results;
}

Any ideas for improving performance? By the way, using log statements,
I've been able to determine that of the 42 seconds it takes to make this
call on my big directory, only 2 seconds is in my code and the remaining
40 seconds is in the baseDir.listFiles() method.

Also, in case you're wondering, the Microsoft server I'm querying is
about 30 miles away-- when I run the code from the campus where the
server resides, the 42 seconds drops down to about 24 seconds (using
version 1.1.8 in both cases-- it is less with 1.1.3).

Thanks,
Chris
Scovetta, Michael V
2005-02-24 21:04:30 UTC
Chris,

I'd only suggest changing
results.addAll(getFilesFromPath(file.getPath()));
to
results.addAll(getFilesFromDir(file.getPath()));
so it compiles ;)


Mike

Michael Scovetta
Computer Associates
Senior Application Developer


Chris Moesel
2005-02-24 21:10:56 UTC
Thanks Mike,

I guess my on-the-fly compilation checking isn't working in my email
client today. Funny, my email client didn't give me a stacktrace when I
tried sending the code either. I thought Mozilla Thunderbird did
*everything*. ;)

-Chris
Michael B Allen
2005-02-25 01:20:58 UTC
On Thu, 24 Feb 2005 16:00:33 -0500, Chris Moesel wrote:
> If you read my previous email, you saw that I have a need for grabbing
> the entire contents of a directory tree using the SMB protocol. You
> also saw that it takes me over 40 seconds to traverse 5400 files in 600
> subdirectories of the tree (NOTE, I don't need the actual contents of
> each file, only the information for creating a listing-- name, date,
> size, etc).

JCIFS can traverse tens of thousands of files in less time than that, so I
think the bottleneck is the server and/or the network. First, try changing
jcifs.smb.client.listSize to a value of roughly MTU - 80 (1200 is good). This
can be a little better on high-latency networks. Second, note that if you
build up a big list and *then* go back and examine each file, and the
jcifs.smb.client.attrExpirationPeriod has expired on those files, the
client will re-query the server for fresh attributes. That will make
things MUCH slower, so try setting this to 0 to see what happens. Finally,
if you really must have maximum performance, you could change your code
to be like examples/T2Crawler, which uses multiple threads and a
breadth-first search technique that keeps the stale list short.
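
For illustration, here is a minimal sketch of the two property changes. It assumes jCIFS picks up JVM system properties when its classes are first initialized, so the values have to be set before any jCIFS class is touched (passing them as -D options on the command line should be equivalent). The class and method names are hypothetical, and the values are just the ones suggested above, meant as starting points to measure.

```java
// Sketch only: sets the two jCIFS properties named in the reply as JVM
// system properties. JcifsTuning and apply() are hypothetical names.
class JcifsTuning {
    // Values taken from the reply; measure before settling on them.
    static final String LIST_SIZE = "1200";
    static final String ATTR_EXPIRATION_PERIOD = "0";

    // Assumption: must run before the first jCIFS class is loaded,
    // otherwise the library may already have read its configuration.
    static void apply() {
        System.setProperty("jcifs.smb.client.listSize", LIST_SIZE);
        System.setProperty("jcifs.smb.client.attrExpirationPeriod",
                ATTR_EXPIRATION_PERIOD);
    }

    public static void main(String[] args) {
        apply();
        System.out.println("listSize = "
                + System.getProperty("jcifs.smb.client.listSize"));
        System.out.println("attrExpirationPeriod = "
                + System.getProperty("jcifs.smb.client.attrExpirationPeriod"));
    }
}
```

Changing one property at a time and re-timing the traversal makes it easier to tell which setting, if either, accounts for a difference.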

Otherwise, do a diff on the 1.1.3 and 1.1.8 source and see what changed
in the relevant files.
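
To make the breadth-first idea mentioned earlier in this reply concrete, here is a rough single-threaded sketch. It is not the T2Crawler code: it uses java.io.File as a local stand-in for SmbFile so that it runs without jCIFS or a server, and it leaves out the multi-threading. BreadthFirstLister and listFiles are hypothetical names. The point is only the queue: each directory is listed once and its entries are handled right away, level by level, instead of being revisited after a long recursion.

```java
import java.io.File;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Sketch only: breadth-first directory walk using a FIFO queue.
// java.io.File stands in for SmbFile; this is an assumption for
// illustration, not the jCIFS example itself.
class BreadthFirstLister {
    static List<File> listFiles(File root) {
        List<File> results = new ArrayList<>();
        Deque<File> pending = new ArrayDeque<>();
        pending.add(root);

        while (!pending.isEmpty()) {
            File dir = pending.remove();        // oldest directory first
            File[] children = dir.listFiles();  // null if unreadable or not a directory
            if (children == null) {
                continue;
            }
            for (File child : children) {
                if (child.isDirectory()) {
                    pending.add(child);         // visit after the current level
                } else {
                    results.add(child);         // record the file immediately
                }
            }
        }
        return results;
    }

    public static void main(String[] args) {
        File root = new File(args.length > 0 ? args[0] : ".");
        for (File f : listFiles(root)) {
            System.out.println(f.getPath() + "\t" + f.length() + "\t" + f.lastModified());
        }
    }
}
```

Reading name, size, and date as soon as each entry comes back from the listing is what keeps the set of entries with possibly stale attributes small.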

Mike
--
IRC - where men are men, women are men, and the boys are FBI agents.