N00b question: how to deal with quotes and double quotes in cfhttp csv files

Evening, All.

I’m converting a truly ancient application from CF6 to lucee.

Everything works great, the actual code migration went very smoothly and
I’ve managed to get everything working, as well as cleaning up a few old
irritations. So far, so good.

However, I’m having a problem when using the production data. The app uses
cfhttp to create an sql-like result set from a CSV file and, when a single
or double quote appears in a field, lucee appears to ignore the field
separators, line feeds , etc and keeps reading until it finds another
instance of the same character. This may be several lines/records later and
usually ends with a an error stating that there is an incorrect number of
columns in the input line.

I’m running on Ubuntu server 16.10 with the latest Apache2. Lucee was
installed with ‘lucee-5.0.0.252-pl0-linux-x64-installer.run’.

cfhttp is:

I’ve tried encasing the fields of the csv file in different textdelimiters
(’ " ` and others); tried different separator characters and I’m all but
tearing my hair out. I could write a preprocessing filter script but that
wasn’t needed in CF6 and I’m wondering why it’s needed in lucee.

What have I missed?

Let me know if more doco is required and I’ll be happy to post it.

Thanks,
Pete

as an alternative method, take a look at cfspreadsheet
https://github.com/cfsimplicity/lucee-spreadsheet

https://github.com/cfsimplicity/lucee-spreadsheet/wiki/readOn 5 September 2016 at 18:46, Peter Whitby <@Peter_Whitby> wrote:

Evening, All.

I’m converting a truly ancient application from CF6 to lucee.

Everything works great, the actual code migration went very smoothly and
I’ve managed to get everything working, as well as cleaning up a few old
irritations. So far, so good.

However, I’m having a problem when using the production data. The app uses
cfhttp to create an sql-like result set from a CSV file and, when a single
or double quote appears in a field, lucee appears to ignore the field
separators, line feeds , etc and keeps reading until it finds another
instance of the same character. This may be several lines/records later and
usually ends with a an error stating that there is an incorrect number of
columns in the input line.

I’m running on Ubuntu server 16.10 with the latest Apache2. Lucee was
installed with ‘lucee-5.0.0.252-pl0-linux-x64-installer.run’.

cfhttp is:

I’ve tried encasing the fields of the csv file in different textdelimiters
(’ " ` and others); tried different separator characters and I’m all but
tearing my hair out. I could write a preprocessing filter script but that
wasn’t needed in CF6 and I’m wondering why it’s needed in lucee.

What have I missed?

Let me know if more doco is required and I’ll be happy to post it.

Thanks,
Pete


Get 10% off of the regular price for this years CFCamp in Munich, Germany
(Oct. 20th & 21st) with the Lucee discount code Lucee@cfcamp. 189€
instead of 210€. Visit https://ti.to/cfcamp/cfcamp-
2016/discount/Lucee@cfcamp

You received this message because you are subscribed to the Google Groups
“Lucee” group.
To unsubscribe from this group and stop receiving emails from it, send an
email to lucee+unsubscribe@googlegroups.com.
To post to this group, send email to lucee@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/
msgid/lucee/66f1baf0-e710-4d8e-8f21-373aba3d0892%40googlegroups.com
https://groups.google.com/d/msgid/lucee/66f1baf0-e710-4d8e-8f21-373aba3d0892%40googlegroups.com?utm_medium=email&utm_source=footer
.
For more options, visit https://groups.google.com/d/optout.

AJ Mercer
<webonix:net strength=“Industrial” /> http://webonix.net | <webonix:org
community=“Open” /> http://webonix.org
http://twitter.com/webonix

Can you post a sample of the CVS file?

Andrew penhorwoodOn Monday, September 5, 2016 at 7:06:33 AM UTC-4, Peter Whitby wrote:

Evening, All.

I’m converting a truly ancient application from CF6 to lucee.

Everything works great, the actual code migration went very smoothly and
I’ve managed to get everything working, as well as cleaning up a few old
irritations. So far, so good.

However, I’m having a problem when using the production data. The app uses
cfhttp to create an sql-like result set from a CSV file and, when a single
or double quote appears in a field, lucee appears to ignore the field
separators, line feeds , etc and keeps reading until it finds another
instance of the same character. This may be several lines/records later and
usually ends with a an error stating that there is an incorrect number of
columns in the input line.

I’m running on Ubuntu server 16.10 with the latest Apache2. Lucee was
installed with ‘lucee-5.0.0.252-pl0-linux-x64-installer.run’.

cfhttp is:

I’ve tried encasing the fields of the csv file in different textdelimiters
(’ " ` and others); tried different separator characters and I’m all but
tearing my hair out. I could write a preprocessing filter script but that
wasn’t needed in CF6 and I’m wondering why it’s needed in lucee.

What have I missed?

Let me know if more doco is required and I’ll be happy to post it.

Thanks,
Pete

Sample CSV file attached. It has the same problem that breaks the prod app.
Lucee error screen below (and attached, just in case). replacing single
quote characters with double quotes generates the same error.

In the Detail area below, you can see that the field separator | in record
1 is recognised until the 5th field where the error string (H’ve) causes
Lucee to look for it’s matching character in record 3 field 6 (i’) before
accepting new field separators.

As a work-around, I’m pre-processing the csv file to replace quotes and
double quotes with HTML entities (&#39 and &#34), which works fine. I’d
prefer it if Lucee could be told to ignore them…

More information:
I put the second single quote in the same field on record 3 which means
that the field count is correct and Lucee doesn’t fail. However, the data
between the two quotes is included in the summary field for record 1 as
shown below.On Tuesday, 6 September 2016 21:06:17 UTC+10, Andrew Penhorwood wrote:

Can you post a sample of the CVS file?

Andrew penhorwood

On Monday, September 5, 2016 at 7:06:33 AM UTC-4, Peter Whitby wrote:

Evening, All.

I’m converting a truly ancient application from CF6 to lucee.

Everything works great, the actual code migration went very smoothly and
I’ve managed to get everything working, as well as cleaning up a few old
irritations. So far, so good.

However, I’m having a problem when using the production data. The app
uses cfhttp to create an sql-like result set from a CSV file and, when a
single or double quote appears in a field, lucee appears to ignore the
field separators, line feeds , etc and keeps reading until it finds another
instance of the same character. This may be several lines/records later and
usually ends with a an error stating that there is an incorrect number of
columns in the input line.

I’m running on Ubuntu server 16.10 with the latest Apache2. Lucee was
installed with ‘lucee-5.0.0.252-pl0-linux-x64-installer.run’.

cfhttp is:

I’ve tried encasing the fields of the csv file in different
textdelimiters (’ " ` and others); tried different separator characters and
I’m all but tearing my hair out. I could write a preprocessing filter
script but that wasn’t needed in CF6 and I’m wondering why it’s needed in
lucee.

What have I missed?

Let me know if more doco is required and I’ll be happy to post it.

Thanks,
Pete

sample.csv (1.25 KB)

Auto Generated Inline Image 1 (10.3 KB)

Auto Generated Inline Image 2 (5.93 KB)

I downloaded your sample.csv file and created a test. I ran it on ACF 10
and it parses correctly. At this point I think the problem is a bug in how
Lucee is handling the delimiters. I also ran my test on Lucee 4.5.2.018
express. The express version generated the same error that you listed
here. One possible workaround is to add double-quotes (") around each
column in the file creation process. When I did this to the sample.csv and
changed the textqualifier the file processed correctly.

You should open a bug report with Lucee (
https://luceeserver.atlassian.net/secure/Dashboard.jspa ) with a test case
so Mica can look into the issue.

Andrew Penhorwood

Thanks Andrew. I’ll raise a bug report.

Regarding the doublequote textdelimiter - that will not work for random
doublequotes in the fields. I’ll ensure that there’s a sample of that in
the files I provide with the bug report.

My work around is to write a pre-processor and replace any quote or
doublequote with an appropriate html entity (eg &#39).

Thanks - PeteOn Wednesday, 7 September 2016 20:58:59 UTC+10, Andrew Penhorwood wrote:

I downloaded your sample.csv file and created a test. I ran it on ACF 10
and it parses correctly. At this point I think the problem is a bug in how
Lucee is handling the delimiters. I also ran my test on Lucee 4.5.2.018
express. The express version generated the same error that you listed
here. One possible workaround is to add double-quotes (") around each
column in the file creation process. When I did this to the sample.csv and
changed the textqualifier the file processed correctly.

You should open a bug report with Lucee (
https://luceeserver.atlassian.net/secure/Dashboard.jspa ) with a test
case so Mica can look into the issue.

Andrew Penhorwood

I tried to raise a Jira report. The account creation failed with the
following:

Any ideas?

PeteOn Thursday, 8 September 2016 17:47:19 UTC+10, Peter Whitby wrote:

Thanks Andrew. I’ll raise a bug report.

Regarding the doublequote textdelimiter - that will not work for random
doublequotes in the fields. I’ll ensure that there’s a sample of that in
the files I provide with the bug report.

My work around is to write a pre-processor and replace any quote or
doublequote with an appropriate html entity (eg &#39).

Thanks - Pete

On Wednesday, 7 September 2016 20:58:59 UTC+10, Andrew Penhorwood wrote:

I downloaded your sample.csv file and created a test. I ran it on ACF 10
and it parses correctly. At this point I think the problem is a bug in how
Lucee is handling the delimiters. I also ran my test on Lucee 4.5.2.018
express. The express version generated the same error that you listed
here. One possible workaround is to add double-quotes (") around each
column in the file creation process. When I did this to the sample.csv and
changed the textqualifier the file processed correctly.

You should open a bug report with Lucee (
https://luceeserver.atlassian.net/secure/Dashboard.jspa ) with a test
case so Mica can look into the issue.

Andrew Penhorwood

Auto Generated Inline Image 1 (2.92 KB)

No idea on why that happened. Create a new post here about not being able
to add a JIRA issue. At this point only people who are interesting in this
thread are reading it. Did you create an account on the JIRA site? It has
been awhile since I reported anything there.

Andrew PenhorwoodOn Thursday, September 8, 2016 at 3:48:46 AM UTC-4, Peter Whitby wrote:

I tried to raise a Jira report. The account creation failed with the
following:

Any ideas?

Pete

On Thursday, 8 September 2016 17:47:19 UTC+10, Peter Whitby wrote:

Thanks Andrew. I’ll raise a bug report.

Regarding the doublequote textdelimiter - that will not work for random
doublequotes in the fields. I’ll ensure that there’s a sample of that in
the files I provide with the bug report.

My work around is to write a pre-processor and replace any quote or
doublequote with an appropriate html entity (eg &#39).

Thanks - Pete

On Wednesday, 7 September 2016 20:58:59 UTC+10, Andrew Penhorwood wrote:

I downloaded your sample.csv file and created a test. I ran it on ACF
10 and it parses correctly. At this point I think the problem is a bug in
how Lucee is handling the delimiters. I also ran my test on Lucee
4.5.2.018 express. The express version generated the same error that you
listed here. One possible workaround is to add double-quotes (") around
each column in the file creation process. When I did this to the
sample.csv and changed the textqualifier the file processed correctly.

You should open a bug report with Lucee (
https://luceeserver.atlassian.net/secure/Dashboard.jspa ) with a test
case so Mica can look into the issue.

Andrew Penhorwood

As another alternative method, I have used the following function in the
past to parse csv style strings. I tested it against your sample file and
it seemed to work OK:





<cfargument name=“escapeChar” type=“string” default="""" required=“false”>
<cfargument name=“escapedEscapeChar” type=“string” default=""""""
required=“false”>

var currentVal = “”; // string containing the value currently being parsed
var inEscapedCell = false; // set to true when we are parsing an escaped
cell value
var i = 1;
var currentChar = “”;
var rowDelimCheck = “”;
var escapeDelimCheck = “”;
var escapedEscapeCheck = “”;
var aDataSet = arrayNew(1);
var aRowSet = arrayNew(1);
// we parse our string a character at a time
for(i=1; i LTE len(arguments.csvString); i=i+1){
currentChar = mid(arguments.csvString, i, 1);
rowDelimCheck = mid(arguments.csvString, i, len(arguments.rowDelim));
escapeDelimCheck = mid(arguments.csvString, i, len(arguments.escapeChar));
escapedEscapeCheck = mid(arguments.csvString, i,
len(arguments.escapedEscapeChar));
// are we in an escaped cell???
if(inEscapedCell){
// we are in an escaped cell, we need to check for our ending escape
character
if((NOT escapeDelimCheck IS arguments.escapeChar) OR (escapedEscapeCheck IS
arguments.escapedEscapeChar)){
if(escapedEscapeCheck IS arguments.escapedEscapeChar){
currentVal = currentVal & arguments.escapeChar; // log our character
i = i + len(arguments.escapedEscapeChar) - 1;
}else{
currentVal = currentVal & currentChar; // log our character
}
}else{
i = i + len(arguments.escapeChar) - 1;
inEscapedCell = false; // break from our escaped cell
}
}else{
// are we at the end of our cell???
if((currentChar IS arguments.itemDelim) OR (rowDelimCheck IS
arguments.rowDelim)){
arrayAppend(aRowSet, currentVal);// commit our current value
currentVal = “”; // get ready for our next value
if(rowDelimCheck IS arguments.rowDelim){
arrayAppend(aDataSet, duplicate(aRowSet));// commit our current row
aRowSet = arrayNew(1); // get ready for our next row
i = i + len(arguments.rowDelim) - 1;
}
}else{
if(escapeDelimCheck IS arguments.escapeChar){
i = i + len(arguments.escapeChar) - 1;
inEscapedCell = true; // open our escaped cell clause
}else{
currentVal = currentVal & currentChar; // log our character
}
}
}
}
return aDataSet;

<cfset csv = fileRead(expandPath("./sample.csv"))>

<cfset parsed = csvToArray(csv, “|”, chr(10))>

All the best,

MartinOn Monday, 5 September 2016 12:06:33 UTC+1, Peter Whitby wrote:

Evening, All.

I’m converting a truly ancient application from CF6 to lucee.

Everything works great, the actual code migration went very smoothly and
I’ve managed to get everything working, as well as cleaning up a few old
irritations. So far, so good.

However, I’m having a problem when using the production data. The app uses
cfhttp to create an sql-like result set from a CSV file and, when a single
or double quote appears in a field, lucee appears to ignore the field
separators, line feeds , etc and keeps reading until it finds another
instance of the same character. This may be several lines/records later and
usually ends with a an error stating that there is an incorrect number of
columns in the input line.

I’m running on Ubuntu server 16.10 with the latest Apache2. Lucee was
installed with ‘lucee-5.0.0.252-pl0-linux-x64-installer.run’.

cfhttp is:

I’ve tried encasing the fields of the csv file in different textdelimiters
(’ " ` and others); tried different separator characters and I’m all but
tearing my hair out. I could write a preprocessing filter script but that
wasn’t needed in CF6 and I’m wondering why it’s needed in lucee.

What have I missed?

Let me know if more doco is required and I’ll be happy to post it.

Thanks,
Pete

As I stated in my post the error occurred while trying to create an
account. I’ll create a new post. Thanks.On Thursday, 8 September 2016 20:19:57 UTC+10, Andrew Penhorwood wrote:

No idea on why that happened. Create a new post here about not being able
to add a JIRA issue. At this point only people who are interesting in this
thread are reading it. Did you create an account on the JIRA site? It has
been awhile since I reported anything there.

Andrew Penhorwood

On Thursday, September 8, 2016 at 3:48:46 AM UTC-4, Peter Whitby wrote:

I tried to raise a Jira report. The account creation failed with the
following:

Any ideas?

Pete

On Thursday, 8 September 2016 17:47:19 UTC+10, Peter Whitby wrote:

Thanks Andrew. I’ll raise a bug report.

Regarding the doublequote textdelimiter - that will not work for random
doublequotes in the fields. I’ll ensure that there’s a sample of that in
the files I provide with the bug report.

My work around is to write a pre-processor and replace any quote or
doublequote with an appropriate html entity (eg &#39).

Thanks - Pete

On Wednesday, 7 September 2016 20:58:59 UTC+10, Andrew Penhorwood wrote:

I downloaded your sample.csv file and created a test. I ran it on ACF
10 and it parses correctly. At this point I think the problem is a bug in
how Lucee is handling the delimiters. I also ran my test on Lucee
4.5.2.018 express. The express version generated the same error that you
listed here. One possible workaround is to add double-quotes (") around
each column in the file creation process. When I did this to the
sample.csv and changed the textqualifier the file processed correctly.

You should open a bug report with Lucee (
https://luceeserver.atlassian.net/secure/Dashboard.jspa ) with a test
case so Mica can look into the issue.

Andrew Penhorwood