Parsing CSV data in JavaScript/TypeScript

What seems to be trivial can actually become challenging and complex.
The problem with CSV files is a somewhat vague standard or different interpretation of it, and therefore there is a large variation in the formatting of the CSV files.
There is no general standard for the CSV file format, but it is fundamentally described in RFC 4180 .
Theoretically, plain text is also valid CSV.
How are CSV files actually structured?
CSV stands for comma-separated values or character-separated values.
This means that various values (characters, text, numbers, etc.) are separated by commas (separators) which forms the structure.
Example:
Date,Time,Room Temperature,Outside Temperature,
01-02-2023,8:00,23.2,5.2,
01-02-2023,9:00,23.0,5.3,
01-02-2023,10:00,22.9,5.3,
However, there are country-specific differences. In English-speaking countries, a CSV file usually contains commas as a separator. In German-speaking countries, for example, semicolons are often used instead of commas. This is largely due to the fact that in German, for example, the comma is used as a decimal separator.
Now it doesn't actually matter which character is used as a separator, because with certain techniques, every character can be a valid character within a CSV file. However, using a country-specific separator simplifies the complexity of the CSV file and thus the processing of this file.
For example, if we want to use the separator character as a value, then the entire value must be packed into an escape character. A double quote character is usually used for this.
Example:
Text,"One, two and three",Next Value,
However, every character can be a valid character within the CSV file.
This also means the escape or double quote character requires special treatment. In such a case, the double-quote characters contained in the text are doubled and the entire value is then also wrapped in double-quote characters.
Example:
Text,"One, two and three","This is a quoted text: ""Some text""",
The double quote character can be distinguished from the escape character because it exists twice.
However, I have never seen such files myself and I would not recommend using such characters, as using them could tend to be problematic.
How can CSV data be generated?
Exporting CSV files is comparatively easy
So we need to double the escape character if it exists and wrap the text in the escape character if there is a separator or escape character:
const escapeCsvChars = (inputText: string, escapeCharacter: string, delimiter: string) => {
let outputText = inputText;
// If escape character is included it must be escaped by doubling it
if(inputText.includes(escapeCharacter)) {
outputText = outputText.replaceAll(escapeCharacter, escapeCharacter+escapeCharacter);
}
/*
* Complete line must be double quoted if it includes an Escape character or a delimiter character
*/
if(inputText.includes(delimiter) || inputText.includes(escapeCharacter)) {
outputText = `"${outputText}"`;
}
return outputText;
};
We can then process this function as follows by iterating through an array of strings. Using the example of an array that represents a single line of the CSV file:
const delimiter = ',';
const escapeCharacter = '"';
const singleLine = [
'Text',
'One, two and three',
'This is a quoted text: "Some text"',
];
let outputLine = '';
for(const column of singleLine) {
outputLine += escapeCsvChars(column, escapeCharacter, delimiter) + delimiter;
}
console.log(outputLine);
// Expected Output: Text,"One, two and three","This is a quoted text: ""Some text""",
The code snippet can then be expanded as desired, depending on individual requirements.
How can CSV data be parsed?
This part is significantly more time-consuming and complicated than the previous step.
To include every possibility, we need to iterate through each character.
If we know that there are no escape characters in a CSV file, then we could simply perform split()
to a string:
const splittedResult = line.split(delimiter);
Otherwise we just have to parse it laboriously:
const charAfter = 1;
/** Parses a line of CSV */
const parseCsvLine = (line: string, escapeCharacter: string, delimiter: string) => {
const result = [];
let startPosition = 0;
let hasEscape = false;
for(let index = 0; index < line.length; index++) {
const currentChar = line[index];
if(startPosition === index && currentChar === escapeCharacter) {
hasEscape = true;
continue;
}
if(!hasEscape) {
// If line does not have escape then we can simply search for the next delimiter
let currentPosition = line.indexOf(delimiter, startPosition);
// If line does not end with delimiter the position will result with -1, but it is end of line
if(currentPosition === noIndexFound) currentPosition = line.length;
const endPosition = currentPosition;
result.push(line.substring(startPosition, endPosition));
startPosition = currentPosition + charAfter;
index = currentPosition;
continue;
}
/*
* If we have escape we need to search for escape followed by delimiter, but when escape is used these chars could appear as normal text.
* And escape can also be used as normal text and is therfore duplicated
* We can simply check that by checking for duplicate escapes.
* All escape signs are doubled except for the first and the last sign and after the last there comes the escape
*/
const nextChar = line[index+charAfter];
if(currentChar === escapeCharacter && nextChar === escapeCharacter) {
// Just text - ignore it by jumping further
index++;
continue;
}
if(currentChar === escapeCharacter && nextChar === delimiter) {
// Found the end!
const endPosition = index;
// We need to correct the position to cut the escape character
const substring = line.substring(startPosition+charAfter, endPosition);
// Remove duplicate quotes/escapes
const withoutDuplicates = substring.replaceAll(escapeCharacter+escapeCharacter, escapeCharacter);
result.push(withoutDuplicates);
startPosition = endPosition + charAfter + charAfter;
index += charAfter;
hasEscape = false;
continue;
}
}
return result;
};
In summary, we can then call this function to convert a string into an array of strings. Including the simple approach from the start (to improve performance).
const lastElement = -1;
/**
* Converts a line of CSV into a usefull array of content
* CSV can contain it's own delimiters as character but then the whole string will be wrapped in the escape character
* Same happens if the escape character is included, but then the original escape character is used doubled.
*/
const convertCsvLine = (line: string, escapeCharacter: string, delimiter: string) => {
// Cannot parse if format does not match
if(!line.includes(delimiter)) throw new Error('Wrong file format: Missing delimiter');
// Delimiters need to be escaped if you want to use them as plain text - So, if not... Save some performance
if(!line.includes(escapeCharacter)) {
const splittedResult = line.split(delimiter);
// The line can end with delimiter which results in an unnessecary empty string
if(splittedResult.at(lastElement) === '') splittedResult.pop();
return splittedResult;
}
// Otherwise parse deeply
return parseCsvLine(line, escapeCharacter, delimiter);
};
const delimiter = ',';
const escapeCharacter = '"';
const testString = 'Text,"One, two and three","This is a quoted text: ""Some text""",';
console.log(convertCsvLine(testString, escapeCharacter, delimiter));
// Expected output: (3) ['Text', 'One, two and three', 'This is a quoted text: "Some text"']
The code snippet can then be expanded as desired, depending on individual requirements.