DirectoryExists() on S3 buckets is extremely slow on Lucee 5

Hey Everyone,

I have an application that I use to write files to an S3 bucket. The bucket
itself, admittedly, has quite a lot of folders and subfolders nested within
it. When a page request is made within my application we check to see if
the folder exists and create it if it doesn't. Pretty basic code:

if(! directoryExists('/S3Mapping')){
    directoryCreate('/S3Mapping');
}

When running this code on Lucee 4.5 the operation takes about 1.2 seconds
to complete. On Lucee 5 it's taking 10x as long: 13 seconds! I've done some
testing, and the time it takes seems to grow exponentially with the
size/complexity of the bucket's directory structure. I understand that the
more folders and subfolders a bucket has, the more Lucee has to do to
traverse the path; however, this was (and still is) working much more
efficiently on 4.5, so the bucket size alone can't be the issue. Has anyone
had similar issues? Any ideas?

Cheers,

Todd:

When I have worked with S3 I have found many quirks in both the ACF and
Railo/Lucee libraries. In the end we went native to solve them. This is not
unique to Lucee, though; we had the same issues with C#.

S3 is a case-sensitive, complex key/value object store whose structured key
names can be used to mimic directories. It does not behave exactly like a
file system.

To find out whether a directory exists, some wrappers first scan the whole
key/value store for keys, then parse each key and check whether part of it
corresponds to your directory.

Even then, there are quite a few performance differences depending on how
you set up the API call. For example, if you don't provide the "delimiter"
and/or "prefix" parameters, the listing takes substantially longer.
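As a rough illustration of that difference, here is a minimal sketch using
the AWS SDK for Java (v1). The bucket name and prefix are made up, and this
only shows why prefix/delimiter matter; it is not what Lucee does internally.

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.ListObjectsRequest;
import com.amazonaws.services.s3.model.ObjectListing;

public class PrefixDelimiterDemo {
    public static void main(String[] args) {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

        // Slow: no prefix/delimiter, so S3 returns every key in the bucket
        // (paged 1000 at a time) and the caller must filter client-side.
        ObjectListing everything = s3.listObjects("my-bucket");

        // Fast: prefix + delimiter restrict the listing to one "folder" level,
        // so S3 filters server-side and returns far fewer keys.
        ListObjectsRequest req = new ListObjectsRequest()
                .withBucketName("my-bucket")  // made-up bucket name
                .withPrefix("S3Mapping/")     // the "directory" in question
                .withDelimiter("/")           // stop at the next folder level
                .withMaxKeys(1);              // one key is enough to prove existence
        ObjectListing scoped = s3.listObjects(req);

        boolean dirExists = !scoped.getObjectSummaries().isEmpty()
                || !scoped.getCommonPrefixes().isEmpty();
        System.out.println("Directory exists: " + dirExists);
    }
}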

All that said.

I did take a peek at the Lucee 5 vs Lucee 4 code, and here is what I see:

In Lucee 4, the implementation passes a more comprehensive set of
parameters to the AWS S3 HTTP API, and the S3 client itself is implemented
by the Lucee team:

e.g.

return s3.listContents(name, prefix, marker, maxKeys);

==> goes through HTTP

In Lucee 5, it looks like the open-source jets3t library is used instead,
so the Lucee team is no longer maintaining its own S3 client. And the call
actually lists the full contents of the bucket, like so:

listObjects(bucketName);

Then the list is filtered as needed. The exists() call just filters the
listing down and checks whether the result is null:

return get(bucketName, objectName, includePseudoFolder) != null;

A better alternative today would be to use the newer method exposed by the
AWS Java SDK that checks for an object's existence: doesObjectExist()

http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/AmazonS3.html
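
For reference, a call along those lines with the AWS SDK for Java (v1)
would look roughly like this. The bucket and key names are made up, and
note that doesObjectExist() checks a single exact key, so a "directory" is
only found this way if a folder-marker key (e.g. "S3Mapping/") was actually
created:

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

public class ObjectExistsDemo {
    public static void main(String[] args) {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

        // A single HEAD request for this exact key, instead of listing
        // and filtering the whole bucket.
        boolean exists = s3.doesObjectExist("my-bucket", "S3Mapping/");
        System.out.println("Key exists: " + exists);
    }
}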

So, in short, you can create a ticket or implement your own call.

For speed, it may be advisable to look at a simple S3 wrapper alternative
and make the calls to S3 yourself.

I think there is an alternative S3 wrapper on RIAForge. You can try it to
see whether you get the same response times.

HTH,

Bilal

Thanks for the reply Bilal,

I think we will be moving towards using the Java SDK and building our own
interface. We need to do that for some other AWS services anyway (DynamoDB
and SQS), so this will be a more consistent (and maintainable) approach.
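
A rough sketch of what such an interface might look like with the AWS SDK
for Java (v1); the class name, bucket handling, and the zero-byte "folder
marker" convention are assumptions for illustration, not our actual code:

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

// Hypothetical helper: a thin wrapper around the AWS SDK that stands in
// for directoryExists()/directoryCreate() against an S3 bucket.
public class S3Directories {
    private final AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
    private final String bucket;

    public S3Directories(String bucket) {
        this.bucket = bucket;
    }

    // One HEAD request against the folder-marker key ("path/") instead of
    // listing the whole bucket. Folders that were never given a marker
    // object would need a prefix listing (as in the earlier sketch) instead.
    public boolean directoryExists(String path) {
        String key = path.endsWith("/") ? path : path + "/";
        return s3.doesObjectExist(bucket, key);
    }

    // Create a zero-byte marker object so the "folder" exists on its own.
    public void directoryCreate(String path) {
        String key = path.endsWith("/") ? path : path + "/";
        s3.putObject(bucket, key, "");
    }
}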

Thanks again :)

Just created a new bug:

https://luceeserver.atlassian.net/browse/LDEV-1176

Watch/Vote… hope this helps.