/rca

Shared knowledge base of incident Root Cause Analyses

Root Cause Analysis

When facing a production incident, we usually need quick mitigation to put out the fire, and there is no time to look deeply at the underlying causes, so we end up treating the symptoms rather than the problem.

The goal of an RCA process is to discover the real cause behind an incident in order to fully understand how to solve it, prevent it from recurring, and keep a record of successful strategies so that we can share knowledge and repeat what worked.

An incident is an event that disrupts or reduces the quality of a production service or product feature and requires an emergency response.

In addition to discovering the root cause, we should strive to provide context and information that will result in an action or a decision: good analysis is actionable analysis.

The content in this repository should be shared only once the incidents are mitigated or resolved, so that potential vulnerabilities are not exposed.

To perform an incident analysis, choose the tool that best fits the situation. Some common examples are 5 Whys, Fishbone Diagram (many potential causes, cause and effect), and Change Impact Analysis.

Incident severity is categorized into the following levels:

  • SEV-1: Critical issue impacting more than 50% of our users (a.k.a. “Oh Fuck!”). The incident degrades the experience to the point where users decide to leave the Decentraland platform. Requires immediate resolution
  • SEV-2: Critical system issue actively impacting a limited number of users. Users can still interact in the Decentraland world, but they are frustrated by the inability to enjoy the full experience. Requires immediate resolution
  • SEV-3: Stability issue or minor user-impacting issue that requires immediate attention from the service owner; otherwise it might become a SEV-2 incident. A very restricted incident that is visible internally, affects only non-critical flows, has no widespread user awareness, and should be mitigated as soon as possible
  • SEV-4: Minor issue requiring action but not affecting the ability to use the platform
  • SEV-5: Cosmetic issue or bug that does not affect the users’ ability to use the platform but that other teams should be made aware of
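For tooling that needs to reference these levels (for example, a script that tags incident files), the scale above can be sketched as a small enumeration. This is only an illustration: the names `Severity`, `IMMEDIATE_RESOLUTION`, `label`, and `requires_immediate_resolution` are hypothetical and not part of this repository.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Severity levels as listed above; a lower number means a more severe incident."""

    SEV_1 = 1  # Critical issue impacting more than 50% of users
    SEV_2 = 2  # Critical system issue impacting a limited number of users
    SEV_3 = 3  # Stability or minor user-impacting issue; may escalate to SEV-2
    SEV_4 = 4  # Minor issue requiring action, platform still usable
    SEV_5 = 5  # Cosmetic issue or bug, shared for awareness


# Levels whose description explicitly says "Requires immediate resolution".
# SEV-3 asks for immediate *attention* from the service owner, which is a
# different requirement, so it is deliberately left out of this set.
IMMEDIATE_RESOLUTION = frozenset({Severity.SEV_1, Severity.SEV_2})


def label(severity: Severity) -> str:
    """Return the label used in this document, e.g. 'SEV-2'."""
    return f"SEV-{int(severity)}"


def requires_immediate_resolution(severity: Severity) -> bool:
    """True for the levels that the list above marks as needing immediate resolution."""
    return severity in IMMEDIATE_RESOLUTION
```

Because the enumeration is ordered, `Severity.SEV_1 < Severity.SEV_3` holds, which makes it easy to sort incidents from most to least severe.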

To add a new incident, use the date of the event as the ID, in the format YYYY-MM-DD. If there is more than one event on the same date, you may need to add a suffix to the file name (for example, 2023-03-09-2).
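The naming rule can be sketched as a small helper that derives the next free ID for a given date. This is a minimal sketch under stated assumptions: the function names `next_incident_id` and `is_valid_id` are hypothetical, and the numeric `-2`, `-3`, ... suffix is inferred from the `2023-03-09-2` entry in the index; the repository does not prescribe a specific suffix style.

```python
import re
from datetime import date
from typing import Iterable

# YYYY-MM-DD, optionally followed by a numeric suffix such as "-2".
# The numeric suffix style is an assumption based on one index entry.
ID_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}(-\d+)?$")


def is_valid_id(incident_id: str) -> bool:
    """Check the shape of an ID and that its date part is a real calendar date."""
    if not ID_PATTERN.match(incident_id):
        return False
    try:
        date.fromisoformat(incident_id[:10])
    except ValueError:
        return False
    return True


def next_incident_id(event_date: date, existing_ids: Iterable[str]) -> str:
    """Return the event date as an ID, adding a suffix if that date is already taken."""
    taken = set(existing_ids)
    base = event_date.isoformat()  # date.isoformat() yields YYYY-MM-DD
    if base not in taken:
        return base
    suffix = 2
    while f"{base}-{suffix}" in taken:
        suffix += 1
    return f"{base}-{suffix}"
```

For example, `next_incident_id(date(2023, 3, 9), ["2023-03-09"])` yields `2023-03-09-2`, since the plain date is already in use.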


Incidents Index

  • 2022-02-05 Wearables not loading in some users' backpacks due to a corrupted dropped wearable
  • 2022-02-12 CDN proxies stopped working affecting the ability to join Decentraland and some sites
  • 2022-04-22 Infura outage caused problems with several services
  • 2022-05-18 Some issues were detected after the explorer release on
  • 2022-06-06 Social metrics tracking discrepancies
  • 2022-07-12 Catalyst node continuously rebooted after an update rollout
  • 2022-07-25 Cloudflare XSS protection prevented some users from deploying scenes or smart wearables
  • 2022-08-02 The Graph indexing delay prevented users from changing their wearables
  • 2022-08-18 Scenes not loading in Europe region
  • 2022-09-01 Users not able to save passport
  • 2022-09-08 Users not able to see or chat with each other
  • 2022-10-17 Some 3D models not rendering
  • 2022-10-27 Marketplace failed to display NFTs
  • 2022-11-04 3D models from other scenes appearing in other scenes
  • 2022-11-13 Users who joined the #mvmf channel noticed huge lags
  • 2022-11-15 NFTs with animated gif thumbnails have stopped showing thumbnails
  • 2022-11-17 Changes in user profile not updating in peer perspective
  • 2022-11-28 Scenes MessageBus not working in production
  • 2022-12-05 Chat & Friends service unavailable
  • 2022-12-07 Some Realms show partial info of others
  • 2022-12-07 Higher than normal crashes on desktop platforms (windows)
  • 2022-12-09 Wrong online users metric on the status page
  • 2022-12-27 SDK Preview doesn’t work
  • 2023-01-05 Desktop launcher doesn't launch correct version
  • 2023-01-09 Users unable to obtain their correct profiles
  • 2023-01-11 Users with many wearables are being shown an empty list
  • 2023-01-12 Some users are not able to list or make friend requests
  • 2023-01-12 Users unable to log in to DG realm
  • 2023-02-01 Loading an avatar with Thunder Earrings is crashing the client
  • 2023-02-02 Teleport freezes for all users using DEBUG_MODE
  • 2023-02-02 Transak widget not working
  • 2023-02-06 Social Service Migration
  • 2023-02-14 Reference client cannot be launched
  • 2023-02-27 Get Friends, Private Chat and Friends Requests not working
  • 2023-03-09-2 Ghost mode in builder
  • 2023-03-22 NFT names not displaying as alias
  • 2023-03-23 Chats showing out of order
  • 2023-03-28 User connections constantly reconnected to the same realm
  • 2023-04-13 Users are not visible in any realm other than Heimdallr
  • 2023-05-08 Some users are not loading
  • 2023-05-23 Cannot log in to Goerli network
  • 2023-08-17 Mic remains open when releasing the T key
  • 2023-08-24 Users are not able to join Decentraland
  • 2023-10-03 Marketplace search not working

Vulnerabilities Index

  • 2022-07-05 Potentially outdated prices provided by the implementation of ChainlinkOracle
  • 2022-07-06 Takeover of broken or expired links
  • 2022-07-13 Arbitrary modification of content stored on S3
  • 2022-07-20 Cloudflare bypass for Biz environment
  • 2022-08-11 Broken access control when deleting single items
  • 2022-08-12 Subdomain takeover of osquery.decentraland.org
  • 2022-08-23 Stored XSS - Execute Malicious Javascript on Victim's Browser
  • 2022-08-28 AWS Credentials leaked in Docker Image
  • 2022-11-07 SQL injection on governance API
  • 2022-11-18 Misconfigured SSO Function Allows Authenticated Access To Grafana
  • 2022-11-22 Dangling Call from wMana