OfficeDev/office-js

#NAME issue for custom functions at random times on loading a workbook

Opened this issue · 35 comments

Summary

Following on from #4007 and as noted there, we are seeing a similar issue which severly impacts the confidence of users in our product for something we seemingly have no control over...

Our addin is a shared runtime addin that uses streaming functions and at random times, users just get #NAME for all of our functions.

We load and register the functions as early as possible in our addin and have tried waiting for an Office.onReady(...) and not waiting - it doesn't seem to matter.

This can start from either of the two scenarios noted below:

Scenario 1

  • User already has a workbook open with our functions loaded and working correctly
  • User opens (via Windows File Explorer or via File > Open) another workbook with our functions in it
  • All functions show as #NAME with the xldudf_ syntax
  • Our local logging shows that the calls to CustomFunctions.associate(<name>, <callback>) have been made

Scenario 2

  • User does not have Excel open and there are no rogue Excel processes
  • Opens Excel and loads an existing workbook with our functions in it
  • All functions show as #NAME with the xldudf_ syntax
  • Our local logging shows that the calls to CustomFunctions.associate(<name>, <callback>) have been made

image

We have console logging that on a certain action in the taskpane will attempt to re-associate the functions as was done on addin loading, and we have noticed that when it is in the #NAME state, calling associate again gives the [DuplicatedName] warning as if the functions were actually loaded and working:

image

We use this code with the function associate(id: string, functionObject: Function): void; signature to load the functions:

image

Resolution for users is fairly scorched earth:

  • uninstall the addin
  • close excel
  • start excel
  • open blank workbook (if a user opens a workbook here that has our functions, things stay broken when the addin is auto-installed because of workbook association)
  • install addin
  • create a test of our formula in the blank workbook to ensure its working
  • Load the workbook with our formula in it.

Your Environment

  • Platform [PC desktop, Mac, iOS, Office on the web]: Desktop
  • Host [Excel, Word, PowerPoint, etc.]: Excel
  • Office version number: Version 2404 Build 16.0.17531.20152
  • Operating System: Windows 11

Expected behavior

Functions should load every time, without fail.

Current behavior

Functions are randomly, at times, even when the users is in a working addin session and loads another workbook, returning #NAME

Steps to reproduce

I wish this was reproducible on demand, but it isn't as it's to do with the Officejs core failing to load the custom functions reliably every time.

@demyte we also face this issue in production and it negatively affects the user experience. So far we have not been able to isolate the issue or identify replication steps. On a related note, have you faced an issue where files that have been impacted by the #NAME error have an issue where custom functions then show a #VALUE! or similar error and completely fail to refresh (e.g. the custom function doesn’t get called at all). I documented this problem internally yesterday so that we can file it here, but the starting point was basically when functions failed to register. The only workaround on affected files is to either go to formula bar and press enter on each formula, or do a full recalculation followed by a Calculate Sheet on each affected sheet — something users can’t be expected to figure out on their own let alone do reliably on workbook containing 100+ sheets and hidden sheets.

@gmichaud Yes, we also get that from time to time (not as frequent as this current issue) and have no other way to resolve it beyond what you note.

I agree that it does seem either partially or fully related to this issue of the functions not correctly loading that we are seeing, so hopefully if we can get a resolution to one, we get it for both. 🤞

@demyte thanks for your reply, good to know that it’s not just us experiencing this problem. We have saved a file in this state and will create an issue on GitHub in the coming days to document the problem where custom-function cells no longer recalculate.

Hi @demyte , thanks for reporting this issue. I am not able to reproduce this issue on my side. Could you please help provide a screen recording if the sample file/data is not convenient to share? I think you are right that it may be related to loading of Office JS files.

Hi @MiaofeiWang

It is not something we have been able to create a reproducible scenario for at all as it is so random.

When we get a support call from a user, we are in triage mode and our aim is to get them back and working again asap.


Some thoughts...

The only consistent thing about it is that when it occurs it is on the load of a workbook.

It does not seem to be workbook related as on a reinstall of the add-in and reload of the same workbook, it's all fine.

The fact that when it is in this state and we (as shown above) go and re-associate the custom functions callback it shows as it already being associated, feels to me like there could be some race condition in the OfficeJS loading of the functions.json into Excel and the calls to CustomFunctions.associate(...) that is occasionally being triggered so the CustomFunctions runtime thinks they are loaded but Excel doesn't?

I know there is an internal registration call that ScriptLab uses (CustomFunctionManager.register(jsonMetadataString, code);) to just-in-time register functions into the runtime, could we use that to force reload our functions into the runtime?


We are really invested in getting this issue resolved for our users, as along with @gmichaud #4502, they are real confidence/trust breaking issues for our product, that, from what we know, are not to do with us, but with the OfficeJS implementation.

Please let me know if there is anything I can do to help get a resolution for this.

@MiaofeiWang we don't have any replication steps either and also get support requests for this issue every month. It's usually urgent and we don't have time to do much investigation as we have to get the customer's machine up and running again. I definitely affect the confidence/trust in our solution. Reinstalling the add-in and/or clearing the add-in caches fixes it. Is there a folder, log or registry hive that we should backup for you to be able to investigate this the next time it happens?

Thanks @demyte @gmichaud for the information. Agree that we are dealing with a race condition, which inherently makes it challenging to consistently reproduce the scenario. Currently, we lack specific logs that would allow us to trace each step that occurs.

@gmichaud could you please clarify if the #NAME issue of your add-in also occurs on streaming functions?

Regarding the dynamic registration feature used by ScriptLab, it employs a specialized method that is not yet broadly available for other add-ins. We apologize for any inconvenience this may cause. Our team is actively exploring the feasibility and necessary efforts to make this feature accessible to other add-ins in the future.

@MiaofeiWang we don’t use streaming functions so I can’t comment on that unfortunately

@gmichaud , thanks. May I know if there are entities (instead of primitive types like boolean, number and string) in the file that are output by your functions when the #NAME issue happens during workbook loading? Recently we fixed an issue that loading workbook triggers functions (with entities as output) unexpected re-calculation. We expect this fix will resolve some of the #VALUE! issues.

@MiaofeiWang we don’t return entities, but an important detail - our functions are dynamic-aware and can return either a single scalar value or an array.

@gmichaud @MiaofeiWang correction, actually we do return FNVs (for ISO-formatted dates) which are "kind of" "entities" because they are not simple booleans/numbers/strings. And we have allowCustomDataForDataTypeAny enabled in our manifest.

Our bug is exactly about #VALUE errors #4502 - @MiaofeiWang if the fix is already delivered what is the ETA (or ideally a build number) of the deployment so we can test? 🙏

FYI our support team just mentioned that over the last months we received tens of tickets with the functions stuck in the #NAME state...

@MiaofeiWang Is there any extra detailed platform logging we can enable to assist with this?

As a dev, I understand completely how hard these issues are to lock down to a cause, but we really need Microsoft to get this resolved and provide a robust experience here.

Also, we use allowCustomDataForDataTypeAny:true for our functions and some of them return entities and others just numbers/text. When the #NAME issue occurs, it's for all of our functions.

@wh1t3cAt1k Sure. The re-calculation issue was fixed for Excel on Windows with build >= 16.0.17723.20000. This will prevent from some unexpected re-calculation which may reduce possibility of #VALUE issue. But I have no enough confidence that this can resolve the issue of #4502 .

@demyte Totally understand your feeling. We are investigating the possible root cause of #NAME issue on streaming functions. But unfortunately no luck as we are not able to reproduce this even for one time. So one screen recording may be able to help us.

@MiaofeiWang

One thing we have noticed across all the support calls we are getting for this, is that it can often happen when the app magic link is used, especially for a user that has never used our addin before.

Ours is: ms-excel:https://xpna.app/assets/xpna-onboarding.xlsx

NOTE:

You do not need to be a subscriber to get the #NAME issue as the functions are loaded on addin start (before subscription validation etc...)

The steps would be:

  • Start with a clean excel that has never used our addin
  • Launch the link above
  • Go through the OfficeJS process to add the addin and accept the terms etc
  • You should see in A2 whether the formulas are working or not

Hopefully, following this code path through the source allows you to understand more readily where the issue could be occuring?

One more case today, from a different user…

One more today. This is a user that frequently works with our add-in, we have not pushed any update recently or changed anything on our end that could have triggered this issue. The side-panel and ribbon commands were all working fine -- it's just the custom functions that are not working. We asked the client to clean the web add-ins cache, and issue was resolved.

Reprinting the exact information provided by the client:

We just encountered an issue with my coworker trying to unhide one line and refresh the file I created. Below is a snapshot of what keeps happening to her. We thought that it may have to do with her downloading the recent Microsoft update on her PC, so we exchanged PCs and logged in separately on those.
She was able to login with her credentials on my PC, unhide the one row and refresh without any errors. I was able to login with my credentials on her PC and do the same without any errors. The issue is only occurring on her PC with her credentials. Note: This is the same file her and I are both reviewing, which is under the same location. Unknown

Update! We had another customer call in today, and this time the user was a bit more technical and was willing to give me an hour to troubleshoot. Same symptom as usual -- add-in is loaded, ribbon is there, but all custom functions show the #NAME! error. Excel Intellisense shows none of our custom functions either if you try to build a new cell (even in a new workbook).

Screenshot of Intellisense showing none of our custom functions, even though the add-in is loaded:
Screenshot 2024-06-17 at 5 44 01 PM

How it should look:
Screenshot 2024-06-17 at 5 22 28 PM

I asked the user to backup the %userprofile%\AppData\Local\Microsoft\Office\16.0\Wef folder, and then delete it manually. After restarting Excel and re-adding the add-in, everything was back to normal. I decided to use a folder comparison tool to compare both WEF folders - there are multiple differences (probably cached data and other add-ins that were installed at some point), but one thing in particular struck my eye - there's no folder for our add-in (store ID WA200002311) in the Wef\CustomFunctions\v1.7\Omex folder:

Screenshot 2024-06-17 at 5 46 58 PM

On a working system I decided to manually delete this folder from Wef\CustomFunctions\v1.7\Omex, and the end result is similar to what is experienced by our users -- custom functions stop working, and the only fix is to clear the add-ins cache and reinstall the add-in. I recorded a demo here: https://www.loom.com/share/b074d24a9ad249d9a496631a5f90212f?from_recorder=1&focus_title=1

The JSON files under CustomFunctions point to a Metadata folder located in [guid]\Omex[some binhex64 number]\Metadata -- on both systems the metadata is there, the only thing that appears missing are the files in Wef\CustomFunctions\v1.7\Omex... I can share the content of the WEF folder with Microsoft on request.

Hypothesis: during loading of the add-in metadata, an exception occurs which causes the process to be silently interrupted, resulting in a partially initialized add-in. Excel updates the add-in metadata at regular intervals, which explains why the problem is happening at random, to different users, even when we don't release any updates of our add-in.

Anectodally, we believe we have seen a higher volume of cases around the time we push a new version on AppSource (we've done that sparingly, last time was 2 months ago) but I am not sure there's any relationship. The user calling today has been definitely using the add-in daily without issues, so they would have had the most recent version.

@gmichaud amazing!!

Thanks for getting all that captured.

@gmichaud Thanks for the valuable information! The link files (for custom functions' JS and metadata) are under the path "Wef\CustomFunctions\v1.7\Omex". If those file are not successfully created, Excel cannot find the associated JS function.
We will investigate why and how often this issue happens (tracked by internal work item 9090753).

@MiaofeiWang this is now one of the most common support requests that we receive from the customers, the issue with ugly and doesn't have an easy workaround. According to Velixo support, we have dozens of tickets with the #NAME issue happened in the last several months. I hope this bug fix could be prioritised on your end!

Hi @gmichaud

I tried to replicate the issue by following your step. After I deleted the specific wef folder and reopened the file and re-install the add-in, all work well. Which means re-install add-in can resolve that issue. this is the recording:

Screen.Recording.2024-07-01.at.20.48.09.mov

The hardest part for us is how to consistently replicate this issue. It would be great if you could find some clue that cause the folder missing or some feature can consistently replicate that issue.

Thanks.

@penglongzhaochina would you be able to defensively code against this issue?

By this I mean something like, on addin load, if the manifest says there are functions but the correct folder and structure is not there, then create the correct folder and structure again?

This would then be self-healing and seamless to the end users.

@penglongzhaochina would you be able to defensively code against this issue?

By this I mean something like, on addin load, if the manifest says there are functions but the correct folder and structure is not there, then create the correct folder and structure again?

This would then be self-healing and seamless to the end users.

The addin loading already do that. From my recording you can see that re-install the add-in will create the folder which is missing and resolve the #name issue.

I’m also supporting @demyte idea. This issue is happening often enough to a large number of users across his and our add-in that it warrants adding specific measures into the existing code…

If the code was open sourced I would do a code review and look at all the places there the manifest files are getting cleared/updated, to try to identify scenarios where things can be wrong. There are multiple files getting cached, what if there’s an exception right before functions.json gets saved? Does it retry on next load or does it assume the cache is up-to-date?

Also - do you have detailed logging in your telemetry around this part of the code? Thanks for all your help… if we do find additional clues I will update this!

@penglongzhaochina I do not mean on addin reinstall, on every addin load.

Hi @demyte @gmichaud
I do want to help you resolve this issue. Making our customer successful is our duty. We can do nothing if we can't find a consistent reproduce step. I already debug the code around this functionality, it works well every time just as my recording. Maybe we can replicate this issue by deleting the wef add-in id folder, but this would not happen in the normal using of excel. We need to find what cause the folder missing or other reasons. We also can't write the defensive code before we find the root cause.

Thank you.

@penglongzhaochina in your video you are re-installing, which we all confirm works to resolve the issue, but it is not a workable resolution for users.

I am suggesting having a check on the loading of the addin - every time - that will make sure the folder structure and files etc.. are correct and then call for the functions to get loaded.

I feel like it is something in the core OfficeJS code is causing this to happen, which has come out within the last 3-6 months.

Hi @demyte, @gmichaud

Thank you for your suggestion and effort here. Your suggestion are very valuable. We will dive deep into this issue based on your suggestion. Will let you know once we have any progress.

Velixo internal tracking item: https://3.basecamp.com/3496385/buckets/16689259/card_tables/cards/6462474487

@demyte THANK YOU for suggesting this, yes this is a desired resolution for us as well.

By the way, in our experience, the stuck #NAME state has exacerbated recently indeed, but it may be just the fact that our user base is growing. Otherwise, we had reports from our internal QA of files stuck in the #NAME state from as early as the beginning 2023.

@penglongzhaochina in your video you are re-installing, which we all confirm works to resolve the issue, but it is not a workable resolution for users.

I am suggesting having a check on the loading of the addin - every time - that will make sure the folder structure and files etc.. are correct and then call for the functions to get loaded.

I feel like it is something in the core OfficeJS code is causing this to happen, which has come out within the last 3-6 months.

Hi @demyte @gmichaud @wh1t3cAt1k

Thank you for your patience. One-click of your add-in from "My Add-ins" will make sure the folder structure and files are correct and then call for functions to get loaded. This operation is as light as add-in loading. Please have a try from your side to check if it can resolve your issue.

@penglongzhaochina thanks for the update.

Is this going to work even if the addin is already loaded, i.e. here?

image

@penglongzhaochina thanks for the update.

Is this going to work even if the addin is already loaded, i.e. here?

image

Yes, One-click from here will resolve your issue.

@penglongzhaochina ok, thanks - I'll add it to our support documentation for users to be referred to.

My hope is really for this issue to go away all together, is there a reason that this cannot be done on add-in load every time, if it such a lite weight operation?

I am pushing for this because when a user encounters this, they are forced out of their normal usage, it is confusing for them and in general reflects badly on our product.

Also, the user may not reach out to us immediately for support and may waste time trying to find a resolution on their own and just get more frustrated.

Please consider making a complete fix for this.

@penglongzhaochina first of all, thank you for providing an easier workaround. I will let our support team know about it!

That being said, I completely agree with @demyte and we should recognise this for what it is, just a workaround...

From the perspective of our clients, the UX of completely randomly losing custom functions in the file, seeing #NAME all over makes non-technical users panic, and produces an impression of an unstable and broken product. It definitely affects our ability to sell and renew our service.

To reiterate, our add-in has thousands of unique monthly users, and we already have gotten (literally) dozens of support requests around the #NAME issue alone... We will do our best to teach the existing customers this workaround, but with such numbers and especially as people transition from our legacy COM offering to JS we will continue getting support escalations about this.

I strongly concur that Excel should be able to auto-recover from this situation even if it is yet difficult to understand the root cause of why it happens in the first place.

As @demyte mentioned, a relatively simple check-and-recover-if-needed on start-up would allow the user to have a perfect UX without disruptions.