yashmanne/an_analysis_of_nothing

Site Down: TypeError: sequence item 79: expected str instance, float found

Closed this issue · 3 comments

Running self.scripts.groupby(by='SEID')['Dialogue'].apply('\n'.join) inside recommender._generate_vectors (line 97) raises this error:
sequence item 79 is likely a NaN (a float) and needs to be replaced with a string before the join can work.
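
A minimal sketch of the failure and one possible guard, assuming missing dialogue can be treated as an empty string. The toy DataFrame and the helper name join_dialogue are hypothetical; only the SEID and Dialogue columns and the groupby call come from the code above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for self.scripts, with one missing line of dialogue.
scripts = pd.DataFrame({
    "SEID": ["S01E01", "S01E01", "S01E02"],
    "Dialogue": ["Hello.", np.nan, "Goodbye."],
})

# str.join rejects any non-str item, and NaN is a float.
try:
    "\n".join(scripts["Dialogue"])
    join_failed = False
except TypeError:
    join_failed = True


def join_dialogue(df):
    """Join dialogue per episode, treating missing lines as empty strings."""
    cleaned = df.assign(Dialogue=df["Dialogue"].fillna("").astype(str))
    return cleaned.groupby(by="SEID")["Dialogue"].apply("\n".join)


joined = join_dialogue(scripts)
```

Dropping the rows with dropna(subset=["Dialogue"]) would also avoid the error; filling with an empty string keeps the row count unchanged, which may or may not matter for the vectors.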

However, I was unable to recreate the issue locally.

Update: a similar issue is seen here.
This error appears only on Streamlit Cloud, which seemingly overrides our pandas<2.0 requirement and installs pandas 2.1.4 instead of keeping pandas 1.5.3.

This stems from a reinstall of packages because Streamlit doesn't recognize that altair 4.2.2 is compatible. Instead, it reports: "Streamlit 1.18.0 is present which is incompatible with altair>=5.0.0. Installing altair 4.*" Streamlit then reinstalls packages to its own specifications, namely pandas>0.18 and altair<5, which lets it reinstall altair 4.2.2 and pick the newest pandas, 2.1.4.
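
Since the local and deployed environments ended up with different package versions, it may help to log what is actually installed when the app starts. This is only a sketch using the standard library; the helper names installed_versions and major_version are hypothetical and not part of the repo.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def major_version(version_string):
    """Return the leading integer of a version string such as '2.1.4'."""
    return int(version_string.split(".")[0])


# Print what the environment really resolved to.
versions = installed_versions(["pandas", "altair", "streamlit"])
for name, found in versions.items():
    print(f"{name}: {found}")
```

Comparing major_version(versions["pandas"]) against 2 at startup would make a silent upgrade visible in the logs instead of surfacing later as a TypeError.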

Proposed solution:

According to this blog post, I should remove the explicit definition of altair ==4.2.0 and instead use altair < 5.

If this doesn't work, I should upgrade streamlit to > 1.22.

Unpinning streamlit works. However, without explicitly defining the streamlit version in our requirements, the app remains vulnerable to future breakage.

Update: Now, we get a new error:

The service has encountered an error while checking the health of the Streamlit app: Get "http://localhost:8501/script-health-check": EOF
Streamlit server consistently failed status checks, 
Please fix the errors, push an update to the git repo, or reboot the app.

This is likely due to an OOM error resulting from poor caching with the session state. Streamlit Cloud only allows 1 GB of memory, so we need to use it more efficiently.
One option is more efficient initialization.

Short-term Solution:
Rebooting the app seems to work well, but it eventually crashes again after some number of uses. A longer-term fix is necessary.

A similar blog post shows that even though the data is small, the code might use much more memory. This can be checked with memory_profiler.
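
As a rough first check that needs no extra install, the standard library's tracemalloc can report the peak memory of a single call. This swaps tracemalloc in for memory_profiler and only counts allocations that Python tracks, so treat the number as a lower bound. The helper name peak_memory_mib is hypothetical.

```python
import tracemalloc


def peak_memory_mib(func, *args, **kwargs):
    """Run func and return (result, peak traced memory in MiB)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak / (1024 * 1024)


# Example: measure a deliberately large allocation.
data, peak = peak_memory_mib(lambda: [0] * 1_000_000)
print(f"peak: {peak:.1f} MiB")
```

Wrapping the initialization step (for example, the call that builds the vectors) in this helper would show whether it comes close to the 1 GB limit.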

Long-Term Solution:
Pay for Streamlit Cloud premium.