Is there a way to limit spaces in cells to the standard space during import?

Posted by: georgeg on 4 April 2025, 4:34 pm EST


    Posted 4 April 2025, 4:34 pm EST - Updated 4 April 2025, 4:39 pm EST

    Hi, we are having a problem when importing files: sometimes a cell that contains a space comes in with a different character code for that space.

    Most of the time we get the **Standard Space (ASCII 32 / Hex 20)**, as shown in this screenshot of Word with formatting symbols enabled: (1)

    George

    But sometimes we encounter this instead: **Non-breaking Space (Unicode U+00A0 / Hex A0)**: (2)

    1. Standard Space (ASCII 32 / Hex 20): This is the most common representation of a space character. It is used in most text and data processing scenarios.

    2. Non-breaking Space (Unicode U+00A0 / Hex A0): This is a special space character that prevents line breaks at its position. It is often used in formatting to keep elements together on the same line.
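    To see concretely why the non-breaking space causes trouble for a C parser, this small sketch (plain JavaScript, not a SpreadJS API) compares the two characters' code points and their UTF-8 bytes. It assumes the text reaches the C program encoded as UTF-8, which the thread does not actually state; with another encoding the byte values would differ.

```javascript
// The two space characters described above.
const standardSpace = " ";         // U+0020
const nonBreakingSpace = "\u00A0"; // U+00A0

// Their code points differ, so a plain equality check against " " fails.
console.log(standardSpace.codePointAt(0).toString(16));    // "20"
console.log(nonBreakingSpace.codePointAt(0).toString(16)); // "a0"
console.log(nonBreakingSpace === standardSpace);           // false

// Assumption: the data is handed to the C program as UTF-8.
// U+0020 is the single byte 0x20, but U+00A0 becomes two bytes, 0xC2 0xA0.
const utf8Bytes = (s) =>
  Array.from(new TextEncoder().encode(s), (b) => b.toString(16));
console.log(utf8Bytes(standardSpace));    // ["20"]
console.log(utf8Bytes(nonBreakingSpace)); // ["c2", "a0"]

// JavaScript's own \s does count U+00A0 as whitespace, which is why
// the difference is easy to miss on the JavaScript side.
console.log(/\s/.test(nonBreakingSpace)); // true
```

    A byte-oriented C parser that splits only on 0x20 would therefore see `0xC2 0xA0` as two ordinary, non-separator bytes.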

    We’d like all spaces converted to the regular space because that makes the data easier to parse downstream in our system program, which is written in C.

    Looking into this further, there are about 15 different types of “space” characters you could run into:

    1. Standard Space (ASCII 32 / Hex 20): The regular space character used in most text.
    2. Non-breaking Space (Unicode U+00A0 / Hex A0): Prevents line breaks at its position.
    3. En Space (Unicode U+2002): A space that is roughly the width of a lowercase "n".
    4. Em Space (Unicode U+2003): A space that is roughly the width of a lowercase "m".
    5. Three-per-em Space (Unicode U+2004): One-third of an em space.
    6. Four-per-em Space (Unicode U+2005): One-fourth of an em space.
    7. Six-per-em Space (Unicode U+2006): One-sixth of an em space.
    8. Figure Space (Unicode U+2007): The same width as a digit.
    9. Punctuation Space (Unicode U+2008): The same width as a period.
    10. Thin Space (Unicode U+2009): Thinner than a standard space.
    11. Hair Space (Unicode U+200A): Even thinner than a thin space.
    12. Zero Width Space (Unicode U+200B): No width, used for word boundaries.
    13. Narrow No-break Space (Unicode U+202F): A narrow version of the non-breaking space.
    14. Medium Mathematical Space (Unicode U+205F): Used in mathematical notation.
    15. Ideographic Space (Unicode U+3000): The width of an ideographic character.
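    Before normalizing anything, it can help to see which of these characters actually appear in a given cell value. The sketch below encodes the fourteen non-standard entries from the list as a lookup table and reports each one found in a string. It is plain JavaScript; `NON_STANDARD_SPACES` and `findNonStandardSpaces` are hypothetical names for illustration, not part of SpreadJS.

```javascript
// Non-standard spaces from the list above, keyed by code point.
// U+0020 is omitted because it is the target character, not a problem.
const NON_STANDARD_SPACES = new Map([
  [0x00a0, "Non-breaking Space"],
  [0x2002, "En Space"],
  [0x2003, "Em Space"],
  [0x2004, "Three-per-em Space"],
  [0x2005, "Four-per-em Space"],
  [0x2006, "Six-per-em Space"],
  [0x2007, "Figure Space"],
  [0x2008, "Punctuation Space"],
  [0x2009, "Thin Space"],
  [0x200a, "Hair Space"],
  [0x200b, "Zero Width Space"],
  [0x202f, "Narrow No-break Space"],
  [0x205f, "Medium Mathematical Space"],
  [0x3000, "Ideographic Space"],
]);

// Hypothetical helper: list every non-standard space in `text`
// with its UTF-16 index, code point, and name.
function findNonStandardSpaces(text) {
  const hits = [];
  let index = 0;
  for (const ch of text) { // for...of iterates by code point
    const cp = ch.codePointAt(0);
    if (NON_STANDARD_SPACES.has(cp)) {
      hits.push({
        index,
        codePoint: "U+" + cp.toString(16).toUpperCase().padStart(4, "0"),
        name: NON_STANDARD_SPACES.get(cp),
      });
    }
    index += ch.length; // advance by UTF-16 code units
  }
  return hits;
}
```

    For example, `findNonStandardSpaces("Part\u00A0No 12\u2009mm")` reports a Non-breaking Space at index 4 and a Thin Space at index 10, while the ordinary space at index 7 is ignored.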
    

    The question is: is there a simple way to convert any type of space into the Standard Space (ASCII 32 / Hex 20), either upon import or by selecting the cells? We really *don’t* want the non-breaking variety of space; we need the Standard Space for our downstream parsing.

  • Posted 7 April 2025, 4:49 am EST

    Hi George,

    Thank you for the detailed explanation and examples — this really helps us understand the issue better.

    We completely understand the need to normalize all types of space characters to the standard space (ASCII 32) for consistent parsing in your downstream system.

    Currently, SpreadJS does not automatically convert these different space characters during import. However, we do have a suggestion you can use as a workaround for now.

    Workaround: Clean All Space Characters

    You can run a simple loop over the imported cells and use a regex like this to normalize any type of space character to Standard Space (ASCII 32):

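    const normalizeSpacesInWorkbook = (workbook) => {
    	// NBSP, U+2000-U+200B, narrow no-break, medium mathematical, and ideographic spaces
    	const spaceVariants = /[\u00A0\u2000-\u200B\u202F\u205F\u3000]/g;
    	const sheetCount = workbook.getSheetCount();
    
    	// Suspend repainting so the setValue calls below don't each trigger a redraw
    	workbook.suspendPaint();
    	try {
    		for (let i = 0; i < sheetCount; i++) {
    			const sheet = workbook.getSheet(i);
    			const rowCount = sheet.getRowCount();
    			const colCount = sheet.getColumnCount();
    
    			for (let row = 0; row < rowCount; row++) {
    				for (let col = 0; col < colCount; col++) {
    					// Skip formula cells so their formulas are not replaced by static values
    					if (sheet.getFormula(row, col)) {
    						continue;
    					}
    					const val = sheet.getValue(row, col);
    					if (typeof val === "string") {
    						// Replace any type of space with standard space
    						const cleaned = val.replace(spaceVariants, " ");
    						// Only write back cells whose text actually changed
    						if (cleaned !== val) {
    							sheet.setValue(row, col, cleaned);
    						}
    					}
    				}
    			}
    		}
    	} finally {
    		workbook.resumePaint();
    	}
    };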

    Call this function once after the import has finished (for example, from the import's success callback), passing in your workbook instance.

    Notes:

    • This works at the workbook level.
    • It checks every cell in every sheet.
    • It replaces:
        • Non-breaking Space \u00A0
        • En Quad through Zero Width Space \u2000-\u200B (this range includes the En, Em, Three-per-em, Four-per-em, Six-per-em, Figure, Punctuation, Thin, and Hair Spaces)
        • Narrow No-break Space \u202F
        • Medium Mathematical Space \u205F
        • Ideographic Space \u3000

    All of these are converted to a standard space (ASCII 32).
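    One design choice worth considering: the regex `/[\u00A0\u2000-\u200B\u202F\u205F\u3000]/g` in the workaround also turns U+200B (Zero Width Space) into a regular space. Because U+200B has no width, that inserts a visible gap where the source text showed none, so a value like `"ABC\u200BDEF"` would split into two tokens downstream. If that is not what your C parser should see, a variant that deletes the zero-width space instead might look like the sketch below. `normalizeSpaces` is a hypothetical helper, not a SpreadJS API; its result could be used in place of the `val.replace(...)` call inside the loop.

```javascript
// Visible space variants: NBSP, U+2000-U+200A (En Quad through Hair Space,
// which also covers U+2000/U+2001, not in the list above), Narrow No-break,
// Medium Mathematical, and Ideographic Space.
const VISIBLE_SPACE_VARIANTS = /[\u00A0\u2000-\u200A\u202F\u205F\u3000]/g;

// Zero Width Space has no width, so it is removed rather than widened.
const ZERO_WIDTH_SPACE = /\u200B/g;

// Hypothetical helper: delete zero-width spaces, then map every
// remaining space variant to ASCII 32.
function normalizeSpaces(text) {
  return text
    .replace(ZERO_WIDTH_SPACE, "")
    .replace(VISIBLE_SPACE_VARIANTS, " ");
}
```

    For example, `normalizeSpaces("Part\u00A0No\u3000X")` yields `"Part No X"`, while `normalizeSpaces("ABC\u200BDEF")` yields `"ABCDEF"`. Whether deleting U+200B is right depends on whether those invisible word boundaries matter to your downstream program.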

    Please let us know if you need further assistance.

    Best regards,

    Ankit


  • Posted 8 April 2025, 10:59 am EST

    Yes, sounds good. We thought we’d have to do something like that; we just wanted to make sure there wasn’t an “easy button” somewhere.

    Thanks!

    George

  • Posted 9 April 2025, 2:32 am EST

    Hi George,

    Glad to hear that this resolves your issue for now.

    However, I would like to point out that SpreadJS is a JavaScript-based product. JavaScript strings are Unicode (UTF-16), so every kind of space is simply another valid character, and SpreadJS keeps whatever space characters the source file contains rather than restricting or normalizing them.

    C programs, by contrast, usually handle text as byte strings, and parsing code written with only ASCII in mind (for example, code that splits on 0x20 or uses isspace() in the default "C" locale) will not treat U+00A0 or the other Unicode spaces as separators. That is why additional processing is required in cases like this.

    Since this is a very niche requirement, the product does not have a built-in option for this kind of processing, so it needs to be handled in custom code like the workaround above.

    Please feel free to reach out if you need any further assistance.

    Best Regards

    Ankit
