HalcyonGrid/halcyon

Centralized Configuration

appurist opened this issue · 27 comments

I was going to reply in #99 where the discussion started, but as that issue is closed and this is off-topic for it, I'm opening this issue on behalf of @Vinhold so that there is a place for the discussion.

Vin wrote:

I would also suggest reviving the idea that was partially started a couple of years ago: add to the grid services the same command-line options that Halcyon.exe already has, to specify the Halcyon.ini file location and any local override copy, or the option to fetch it from a web file.
I would very much like to see all the configs become a DB-driven web page source served to the services as needed, providing one centrally managed configuration source, with any actual ini files kept as alternate options.

This issue is for discussion of methods to provide configuration settings from a central source to both grid services and region servers.

My comments:
I love the idea of fetching/pulling (or pushing) region-specific configuration information such as port numbers and db/rdb settings. However, none of that communications code should live within region servers or grid servers unless it is being pulled from a Halcyon database table, not an external website. That said, the process that launches a region server (or grid service) could fetch the information and provide that to the servers in a number of ways. Each method has pros and cons.

I don't like the core servers fetching the info directly, because it is a dependency on an external process that may not be in place in simple cases, probably resulting in timeouts or errors or warnings at startup even in the case of a normal, simple run. I would much rather see the simple case where you double-click Halcyon.exe and it starts cleanly and quickly with no dependencies on external servers for configuration.

I think my preference is to do it like other DevOps installations: keep a core set of common configuration settings in a file, but have instance-specific settings (e.g. port numbers) provided by environment variables. This is a common DevOps approach for servers and microservices, and has good support in online hosts as well as Docker/Rancher, Kubernetes and others. It also works just as well if a region or grid service is restarted by a third party, such as if we provide direct support for running as an auto-started Windows Service or Linux daemon process. And a simple restart of the EXE from the same environment works.

I don't like the command-line arguments because it means that if the server is just restarted, a successful server run is bound to a process that fetched the configuration, and that fetch probably wouldn't be applied on a restart. Now, if that fetch wrote the new configuration into an INI file or similar, then a simple restart would work, although it may have stale configuration if the central location was updated but the fetch was not performed.

I do support a short-term addition to enable this process by adding support for the current/same command-line arguments to the central grid services; I was looking into this today, because the code is mostly already there and any review of a better approach may take time.

In the long run though, it may be worthwhile reviewing the Halcyon.ini (and central services XML files), as well as the region.xml files in the Regions subfolder, for a new simpler approach. I'm thinking something like this:

  • all servers (region and central) search the current folder and its parents for a halcyon.ini file (using whichever they find first), and this one halcyon.ini file includes the common settings used on most grids today.
  • then we define a new set of per-instance variables and values, in some modern format. This could be INI, JSON, YAML, etc., although I'd actually suggest INI format in this case because it is simple and already in use on Halcyon servers. Any XML should be replaced, though.
  • Ideally we could even support per-instance variable substitution in the region/server configuration from the previous point. It is fairly trivial to do this without generalizing it into some kind of templating language, e.g. text = oldtext.Replace("{{external_port}}", externalPort), and it is a common approach today (see the sketch below).
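A minimal sketch of that kind of substitution, assuming hypothetical per-instance keys such as external_port and region_uuid supplied by environment variables; the file names, keys and environment variable names are illustrative only, not existing Halcyon settings:

  // Hypothetical illustration: expand {{placeholders}} in a config template
  // using per-instance values taken from environment variables.
  using System;
  using System.Collections.Generic;
  using System.IO;

  static class ConfigTemplating
  {
      // Replaces each {{key}} in the template text with the matching value.
      public static string Expand(string template, IDictionary<string, string> values)
      {
          foreach (var pair in values)
              template = template.Replace("{{" + pair.Key + "}}", pair.Value);
          return template;
      }

      static void Main()
      {
          var values = new Dictionary<string, string>
          {
              ["external_port"] = Environment.GetEnvironmentVariable("REGION_PORT") ?? "9000",
              ["region_uuid"]   = Environment.GetEnvironmentVariable("REGION_UUID") ?? ""
          };

          string template = File.ReadAllText("halcyon.template.ini");
          File.WriteAllText("halcyon.ini", Expand(template, values));
      }
  }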

@appurist has given some interesting possibilities here that my brain is just too tired to think about tonight (I read too many lines of code today; I worked on a Sunday, who would have thought it possible?), so I will have further thoughts on this when my mind is not so tired and I have had a chance to think this one through a little bit. However, whatever end result is decided, we need to keep in mind that world owners setting up the platform to power their worlds should be able to expect a configuration process that is simple and painless. Additionally, configuration files such as UserServer_config.xml or Halcyon.ini should not ever be in the bin directory. Ideally, these should be in a Config directory outside of the /bin folder to ensure that when deploying updates, the world owner doesn't accidentally overwrite the configuration files.

Maybe the solution to this is something like a control panel with a user interface that works on both Windows and Linux and can handle the configuration and launching of the servers, so that we aren't messing with configuration files.

At any rate, I will have more thoughts on this once my brain isn't so tired and I have had a chance to think about this one further.

@emperorstarfinder That is in fact one of the two key points as I see it: simplification, with more defaults or at least a "standard config nobody ever changes" (presumably in halcyon.ini), plus some method to provide the commonly changed, per-region configuration settings through some mechanism other than dumping them into that same halcyon.ini or a weird region.xml file.

Getting a little more specific, one idea I had (I started with this and got a bit off-track) is to have a second file like a region.ini that included only the core db connection string and the region UUID. With those two, everything else (region record including owner UUID, region name, ports, etc) could be looked up in the db.

So most sites would either run MyWorld or something else to create a region record (once), write the region UUID and db connection info into an ini file, and then start the server executable. It would be able to load everything it needs from that limited info.

For central admin, my original suggestion for fetching the info needed above from a central HTTP server basically meant to optionally run a "prelaunch" tool that ran right before the server launch and fetched the info from the HTTP site, saving it into this file above. So that file would be disposable, overwritten from the central HTTP server on sites that used that, otherwise a simple site would just leave the file there and use it repeatedly.

I think most halcyon.ini options don't change much, and a single one should be able to be shared across a grid. It's the connection info and unique region UUID that would then define the region-specifics from the db.

And taking that even further: once a region has db connection, it could remote-fetch the (former?) Halcyon.ini config. Since a region needs a db connection to start, we could basically say these two bits of info are the only ones needed to start a region. No halcyon.ini files, just region.ini with two lines.

I should also point out that a "prelaunch" tool wouldn't be needed if it only updated the region UUID and db connect string, unless the db credentials changed. The idea is to store the rest of it, the stuff that rarely changes, as region options in a db table. No command-line options needed, not even environment variables, although one for the region UUID and one for the db connect string would be ideal; then we wouldn't even need a 2-line region.ini file. The db becomes the central server, and MyWorld and other management tools could just update options there.
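A minimal sketch of that bootstrap, assuming hypothetical REGION_UUID and DB_CONNECTION environment variables (or the equivalent 2-line region.ini) and a lookup against a regions table; the column names here are placeholders, not the actual Halcyon schema:

  // Illustrative only: a start-up that needs nothing but a region UUID and a
  // database connection string, then loads the rest of the region record.
  using System;
  using MySql.Data.MySqlClient;   // assumes the MySQL Connector/NET package

  class RegionBootstrap
  {
      static void Main()
      {
          string connectionString = Environment.GetEnvironmentVariable("DB_CONNECTION");
          string regionUuid       = Environment.GetEnvironmentVariable("REGION_UUID");

          using (var db = new MySqlConnection(connectionString))
          {
              db.Open();
              // Placeholder column names for whatever the region record actually holds.
              var cmd = new MySqlCommand(
                  "SELECT regionName, serverIP, serverPort FROM regions WHERE uuid = @uuid", db);
              cmd.Parameters.AddWithValue("@uuid", regionUuid);

              using (var reader = cmd.ExecuteReader())
              {
                  if (!reader.Read())
                      throw new InvalidOperationException("No region record for " + regionUuid);

                  Console.WriteLine($"Starting {reader["regionName"]} on " +
                                    $"{reader["serverIP"]}:{reader["serverPort"]}");
                  // ...hand these values to the normal simulator start-up path...
              }
          }
      }
  }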

I realize this is way different from how things are done now, and probably not completely coherent from me yet.

@appurist Sorry it took me a little longer to respond. Got to love the day job and all the development headaches that go with it as I am sure you know all too well.

I think we might be on the right track with this idea, though I would want to see it fleshed out more in idea-formulation terms before we start putting it into code. Right now there are configurables (such as the physics and scripting engines) that probably don't need to be overridden and are probably safe to just be coded in as predefined or hardcoded. There might be times that a change is necessary, but that would be better done after very thorough testing to ensure we aren't breaking things in-world on a grid.

Another possibility for a "Prelaunch" tool would be to allow for the ability to check for platform updates and "installing" (for lack of a better word right now) of those updates into the local copy of the platform that runs the world.

I did review the OpenSim.Grid directory last night in my local copy of the Halcyon code and did start doing general code cleanup: added the wonderful curly brackets to the statements they belong in, cleaned up comments, removed commented-out code that is just not necessary any longer, etc. I am just waiting now to see what my security scanning tools return for security results. But ultimately that stuff doesn't change the fact that I think the entire server console code (i.e. UserServer, GridServer, InventoryServer, MessageServer, etc.) could use a full refactor to be much cleaner.

To expand on my comment to Appurist - the idea was to simply add to the three grid services the same startup command-line processing that Halcyon.exe currently has: specifically, the path where Halcyon.ini is to be found and where a local override copy is to be found. This would allow one shared location on a network drive plus a local assignment for that server. The expansion idea would be to read, from the local Halcyon.ini, a URL for website-based configuration loading.
This way, web-based management tools would serve the configuration according to which server was asking for it, as I am currently doing for region startup to provide the region.xml data that used to be in a local file.
My current management operation provides very nice handling of servers in the collection, their external IPs, internal IP addresses and assigned port lists. Using unique port assignments across all servers means the port a region runs on is also the index to the server it is assigned to. The end result is you can change which server you want to run a region on by merely assigning the port number and starting it up. Need to change a region to a new server? No problem: close it, change the port and start it up on the new server. A fast change.
There is a minor consideration in regard to Appurist's thought about non-existent dependencies - like trying to read info from the DB, or looking on a network-mapped drive for a file that is not there. If the mapped drive was not there, the management program would not be telling it to start up. Also, the DB that may have its config info should not be the grid DB, but in the management program's data space only. Keep clean separation between what is an external control process, which anyone may create, and the DB that the grid operation depends on. That way updates in either system do not wipe out the other. I maintain and manage all the MyWorld website and grid management programming using a separate DB from the world one: talk to both as required, but my config settings remain in a separate DB from the world one.

For my purposes, the simple expansion of adding the command line options that Halcyon.exe has to the grid services would be a huge bonus for management control. Setting a URL to get specific service data that is currently in the xml files would be an even greater bonus!
I note also that each of the three grid services (User, Grid, Messaging) has the DB connection string in it. Why wasn't that DB access used for the connection that saves the shared friend edit access? Or is that because the grid services might be able to run multiple instances on different servers and have a local DB access vs. the User DB access, thus requiring two places where a DB may be defined? If so, this whole possible mapping needs to be spelled out somewhere, along with how it's to be done.

@emperorstarfinder Absolutely, I think this is significant enough that it probably warrants something more like an actual RFC doc with specific proposals more than the narrow focus of an Issue trying to resolve a specific problem. And there should be general agreement before any code is written.

@Vinhold I understand the request related to making the command-line options for the INI location uniform across the startups to allow a single halcyon.ini and I am proceeding with that as a separate PR. It's not as trivial a change as it sounds or I would have completed it on Sunday, but I do intend to proceed there.

But I see that as a specific incremental step on a longer path that would probably supersede and perhaps even retire that whole concept. I'm not referring to incompatible changes in the future, just that we have a chance to perhaps eliminate halcyon.ini for the most part (making it optional), or use it mostly for some more obscure edge cases, in the long run. For now we need it, it is the primary configuration tool. But I think we all want to simplify the configuration here, while increasing flexibility and the ability to manage it centrally in the more serious production environments such as MyWorld installations.

What I don't like about the current MyWorld setup is making the Halcyon code directly responsible for contacting the MyWorld website to fetch configuration info. In the short term, anything else (such as the prelaunch tool or script I mentioned) can fetch that data and generate an ini config file for Halcyon, especially if it's only stored in one place. In the longer term, I'd like to see that data in a central Halcyon-supported db table that a region can use even without a halcyon.ini file. That would mean MyWorld could for example write new records into this central table when a new region is allocated, and the region would automatically detect what MyWorld provided when it starts up. All the region would need is to know how to access the db, something it currently also needs to know. We'd just be trimming down the number of configuration settings needed (just the db access info and the region UUID).

So, after reading the last comment, I think I am going to have to concur with @appurist regarding the platform being made directly responsible for contacting the website to get configuration information. The MyWorld approach to fetching region information is similar to what some OpenSim frontends do as well, and to me that does create some dependency issues on the website that I am not really in favor of. Though being able to do some region management, such as restarting a region via the website, is always a good idea.

I think we could better accomplish this by having the website create the region and insert that information into a central DB table with its configurable options. Then the Prelaunch application (or potentially something in halcyon itself) can read the database, spin up the number of region servers necessary, and assign a region to each region instance. It could work this way because the platform would still know the database contact information. The only potential issue here would be region instances spread across multiple VMs, unless we come up with a way to trigger something in each VM that could then spin up the directed number of region instances per the instruction of the central service.

This is part of why I said I would want to see this better articulated and in a more full and complete way before we even begin considering how to put the code to file because it is important to get it right the first time. Plus better documentation is never a bad thing. @appurist I can be reached via discord if you care to get into the more technical weeds on this that maybe would be too confusing to put here.

Why wouldn't the region servers be set up to only need to know the URL of the grid server? The grid server can then be responsible for returning a configuration object that includes database access info etc. This way the region server doesn't even need a locally stored UUID.

Eventually it would then be good if the grid server could push a changed config to the region and have that applied live or with automatic restart.

Knowing the URL of the grid server and fetching the info from there would work as well; however, it is far more complicated to do it that way and creates several new problems. First, it injects another hop and a central component service only to obtain the database URL and credentials. The alternative is to continue doing what we're doing now, which is using a common local configuration file that (and this is the change) would provide a very minimal amount of configuration, a small subset of the same info it provides today: the ultimate in factoring the config data down to the minimum set.

More importantly, using the grid service to relay credentials also opens new security vulnerability possibilities by having a server which (for the first time) would answer a query by providing full db credentials (that is something I would avoid at all costs). While we can work around this by limiting client IP ranges on the MySQL server, I really don't like that this becomes a new requirement of the Halcyon software. This is also a significant change from the current status quo.

Regarding the region UUID, the halcyon region server starting up needs to identify itself somehow. If it is not the region UUID, then it is some other form of instance ID, which from my perspective is avoiding the thing that the region UUID was designed for: to identify a region with an unchanging ID. Everything else, including region name, coordinates, port assignments, etc. is subject to change; the region UUID is the one thing that identifies a region and realistically cannot change. So it is the ID that the region holds up when it checks in (to both the grid service and the database) and says "what do you have for me today".

My proposal here is that we reduce everything else present in the region configurations to only contain possible overrides (or eliminate them if we conclude an override was for historical reasons and no longer valuable) and just fetch the info needed from a new table or additional fields on one of the existing region tables. If it doesn't belong in the regions table itself, I'd personally suggest a new regionconfig table to match viewer regionsettings etc. but this one would include things like the ports to use. However, the regions table already includes ports as well as other settings so it might only need a couple of columns more.

And probably the inverse of what you're suggesting: either a column for centralized service URLs, or perhaps a separate table, since those config items probably form a single singleton record given it's probably the same info for all regions. The regions table already has an entry for a region's asset server URL and User service URL, as well as the ports, so that might even be enough to leave this configuration-less, if the region just knew the db host and credentials, along with its own UUID.

Take a look at:

https://www.hashicorp.com/products/consul (or vault if you are saving security information)

or

https://etcd.io/

Either plays nice with C#, and it's the way most modern service-based infrastructure is deployed. Ideally you wouldn't only read values at startup time; you'd read those that made sense in real time so you can tune/change a running server. Halcyon is a ways from that, but it shows what's possible.

I use Vault with my daytime project and yes it's a good choice for secure storage of credentials. In this case though, it's not really a matter of needing secure storage, unless the instances were generic (all the same), and that brings us back to the matter of identification. Either a region instance needs to know which region it is, or a central service deduces that from some form of context information. I don't think the latter is viable so there is at least one piece of identification configuration to store at the instance.

Again, this is a really simple problem to solve, one that has already been solved with a combination of region.xml and halcyon.ini file (and other xml config files), but the suggestion here is to use the same method -- a local config file -- but to simplify that even further, optionally allowing it to be reduced to only two entries. The remaining configuration can be bootstrapped from those two items. It's not very complicated; it's just removing (or at least making optional) all the obscure current settings, and factoring the required ones to remove redundancy.

I think it's safe to say all of the items in the current Halcyon.ini could be added to a database table called halcyonconfig, and then the Halcyon.ini could just handle the settings from the current UserServer_config.xml, GridServer_config.xml, MessageServer_config.xml and Whip.cfg in one file. The only thing you would then really need is an independent Region.ini file (if we were to keep that), which would be as follows:

  RegionURL = ''        ; URL of the region info created by the website
  ConnectionString = '' ; Database connection info
  SecurityKey = ''      ; Secret key that is currently in Halcyon.ini and the xml config files for the grid services

I think this probably would be the best way to do it, because there are likely config settings that really aren't needed, and many of the current Halcyon.ini configurable options really don't need an override if they are properly tested and known to actually be working correctly. The need to override those (if at all) would likely be rare.

This way grid owners could set those configurables via a website front end's admin panel (i.e. MyWorld, another website front end, etc.) while making sure halcyon itself is only getting that info from the DB, and therefore isn't really reliant on the website itself.

If we don't want the region consoles contacting the website backend for the region info then we would need to discuss a means for the GridServer to send a trigger to provision each region console and assign a region to it. Here though we would need to account for the fact that regions could be on multiple servers or VMs so you would really need some sort of crash cart style approach for that trigger to be sent to each server to spin up the regions.

One alternative I hashed out some parts of with Vinhold this morning over breakfast is as follows.

Assumptions and definitions:

  1. Simulators are distinct from regions due to the need to be able to have multiple simulator versions available and be able to dynamically move regions from one to another and back again.
  2. Simulator: the code that runs at least one region.
  3. Region: the virtual space located at a specific changeable coordinate on the grid, and identified by the region UUID.
  4. Due to a limitation in the PhysX implementation, and preference, a Simulator can only support at most 1 region at a time.
  5. Secure information should never be communicated across the network. The only exception to this is possibly secure information stored in the database and directly accessed by the simulator that needs it.
  6. "core" database is the main central database, as distinct from RDBs.
  7. While I don't like CHAR(36) for UUIDs, leaning instead toward BINARY(16) UUID fields, the database is still MySQL 5.7, and some IDs may not always be UUIDs in a given use case even though the field was designed for them to be.

Then I propose for simulators and regions, based on the work done above by all of you, and my discussions with Vinhold, the following. Please analyze with a fine tooth comb. I am not covering the cases of the other services just yet.

  1. That we require Simulators to have a unique ID distinct from the regions, stored in the Halcyon.ini under the key SimulatorId
  2. A new state, or a repurposed old state if available, is used in the Halcyon boot up state machine: Waiting On Region Config
  3. A new core database table, sim_config, is added that uses simulator_id CHAR(36) NOT NULL as the primary key, and contains additional fields such as, but not limited to:
    • host_address VARCHAR(255) NULLABLE - to contain the current IP or domain name of the server that is running the instance
    • host_port INT NULLABLE - to contain the current port number of the simulator.
    • any additional fields needed for configuring a simulator but not the region that runs on it.
  4. A new core database table, region_config, is added that uses region_id CHAR(36) NOT NULL as the primary key, with the following additional fields. It could be argued that this belongs on the RDBs, but I think that's too complex for a low-impact table and it would prevent having a nice clean FK relation.
    • simulator_id CHAR(36) NULLABLE, FK to sim_config.simulator_id ON UPDATE CASCADE, ON DELETE SET NULL
    • and all the fields needed for defining a region.
  5. When the simulator reaches the Waiting On Region Config state it attempts to insert a new record into the sim_config table.
    • If the insert fails it compares the values in the DB to what it wanted to insert for the host_address and host_port fields, and FAILS boot if they don't match - this condition means that the administrator failed to update the INI when copying files.
    • If the insert fails and the values above do match, the simulator configures itself using the values from the database.
  6. The simulator then attempts to find a record in the region_config table that matches the simulator_id.
    • If there is more than one record, choose one. It could be argued that it should fail with a message and crash.
    • If there is no record, the simulator should wait for some time (5 minutes?) and restart at the top of this state - step 5 in this list. This is basically polling the database, waiting on a record.
    • If there is exactly one record it chooses that record.
  7. At this point the simulator and region settings are loaded and the region continues with normal boot.
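A rough sketch of steps 5 and 6 in C#, assuming the hypothetical sim_config and region_config tables above; the SQL and column handling are illustrative only, not an existing Halcyon schema or API:

  // Illustrative sketch of the "Waiting On Region Config" state: register the
  // simulator's row, then poll region_config until a region is assigned to it.
  using System;
  using System.Threading;
  using MySql.Data.MySqlClient;

  static class WaitForRegionConfig
  {
      public static string Wait(string connStr, string simulatorId, string hostAddress, int hostPort)
      {
          using (var db = new MySqlConnection(connStr))
          {
              db.Open();

              // Step 5: insert our sim_config record; if one already exists, the
              // stored host/port must match what the INI configured us with.
              var insert = new MySqlCommand(
                  "INSERT IGNORE INTO sim_config (simulator_id, host_address, host_port) " +
                  "VALUES (@id, @host, @port)", db);
              insert.Parameters.AddWithValue("@id", simulatorId);
              insert.Parameters.AddWithValue("@host", hostAddress);
              insert.Parameters.AddWithValue("@port", hostPort);
              if (insert.ExecuteNonQuery() == 0)   // row was already there
              {
                  var check = new MySqlCommand(
                      "SELECT host_address, host_port FROM sim_config WHERE simulator_id = @id", db);
                  check.Parameters.AddWithValue("@id", simulatorId);
                  using (var r = check.ExecuteReader())
                  {
                      r.Read();
                      if ((string)r["host_address"] != hostAddress || Convert.ToInt32(r["host_port"]) != hostPort)
                          throw new InvalidOperationException(
                              "sim_config disagrees with the INI; was the INI updated after copying files?");
                  }
              }

              // Step 6: poll for a region_config row assigned to this simulator.
              while (true)
              {
                  var query = new MySqlCommand(
                      "SELECT region_id FROM region_config WHERE simulator_id = @id LIMIT 1", db);
                  query.Parameters.AddWithValue("@id", simulatorId);
                  object regionId = query.ExecuteScalar();
                  if (regionId != null)
                      return (string)regionId;                // step 7: continue normal boot

                  Thread.Sleep(TimeSpan.FromMinutes(5));      // no record yet; wait and retry
              }
          }
      }
  }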

With this setup in place, an administrator doesn't need to set up any new or special services on their region hosts to handle adding or removing regions: on physical hosts, all expected possible simulators could be created, given unique IDs, and started. They will then go into a boot-wait state waiting on region records to be assigned to them. Grids using cloud-based services could do likewise, keeping a selection of hosts "hot" and ready for grid expansion, adding new ones only when the buffer of unused simulators gets small.

Adding a new region to a grid is as simple as adding the record to the database and it will boot automatically within a few minutes.

Removing a region is a little more complex: set the region_config entry's simulator_id field to null and then send a reboot command to the simulator it was pointed at.

Moving a region to a different simulator, whether to move it to a different server or to change versions: Change the region_config entry's simulator_id field to point to the other simulator then send a reboot command to the simulator it was pointed at.
This has a possible race condition where the old simulator's region could still be registered with the grid server when the new simulator tries to register the region. If I understand the process correctly this will simply result in the new server rebooting and trying again, at which point enough time should have passed that the older simulator has deregistered itself. If that's not the case, then it's probable that the new simulator will simply override the old and the only remaining problem would be if the old server hadn't finished serializing to DB before the new simulator tried to deserialize from DB. I don't have a nice solution for that possibility, but I also don't remember enough about the systems involved to know if it's even possible.
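As a concrete illustration of the move and remove operations (again assuming the hypothetical region_config table above), the management side would only need something like:

  // Illustrative: reassign a region to another simulator; the old simulator
  // then gets a reboot command (not shown) so it drops the region.
  using MySql.Data.MySqlClient;

  static class RegionMove
  {
      public static void Move(string connStr, string regionId, string newSimulatorId)
      {
          using (var db = new MySqlConnection(connStr))
          {
              db.Open();
              var cmd = new MySqlCommand(
                  "UPDATE region_config SET simulator_id = @sim WHERE region_id = @region", db);
              // To remove a region instead, set simulator_id to NULL, as described above.
              cmd.Parameters.AddWithValue("@sim", newSimulatorId);
              cmd.Parameters.AddWithValue("@region", regionId);
              cmd.ExecuteNonQuery();
          }
      }
  }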

Since the load on the server from a simulator that's sitting in the Waiting On Region Config state is fairly small, it's also possible to load balance servers by simply choosing to move regions off of, or not add regions to, a server that's got a heavy region taking up the resources, without actually shutting down or removing the simulators.

It could be argued that the sim_config and region_config tables could be combined, due to the Halcyon limitation and preference of 1 region per simulator, by making the region ID field nullable. However, that makes moving a region from one host to another a bit more laborious, and isn't as clean. Keeping them separate also means that if that limitation is ever lifted, the database is ready for it.

I like the idea of having a separate identifier for a deployed instance, although it feels a bit like the blending of two concepts. The (current) region UUID uniquely identifies a region's contents and its attributes: the prims rezzed in the region as well as the name, owner, position on the map, etc.

The other concept of simulator, if separated from region, is not very different from deployment target. This is like the generic slot in which each server is subdivided for stacking purposes, and I believe we referred to one of these as the resource, subdivided into slots, so a combination of a resource (server) + slot (allocated subunit) defined the execution host location/allocation.

The other comments above aren't really getting into identifying deployments and those issues; they are only related to the startup/execution environment. How that is provided is probably a whole separate topic and may depend on the management software. The mapping of deployment IDs to region IDs is indeed a separate thing, so separate that I don't know if it's Halcyon's place to define it. There's no real reason for a region to know where it is running.

When the simulator reaches the Waiting On Region Config state it attempts to insert a new record into the sim_config table.

I think what this is effectively saying is that servers would auto-deploy themselves, if a sim_config record existed for them. I'll have to give that some thought but it seems inverted to me (polling-based rather than event driven). My first impression is that the creation of new regions will be tied to the external billing system and should be part of the management software, not Halcyon, as something is still going to need to allocate the machine it starts up on and provide the installation files, then add the sim_config record, and since these directly cost money it's not something you want pre-allocated and unused.

But this is a bit philosophical, and may be almost entirely related to my preference for business models that fire up paid customer products only on demand when an order is placed, rather than bunched together in larger blocks that act like hotels that will almost always be below 100% occupancy. The latter can be more efficient, especially in times of growth. But they can also be problematic as holes begin to appear and regions are spread across multiple partially-filled servers. I also understand, though, that part of the goal above is the attempt to make placement independent of the region, to allow easier reorganization.

There still needs to be some kind of external management software actually allocating these simulators and assigning regions to simulator IDs. I don't think it matters to Halcyon how that happens, and it should probably be independent of Halcyon. I don't really see why it couldn't or shouldn't be completely separate from a region startup. Maybe a direct question to clarify this for me: would Halcyon code be checking the state and recognizing this Waiting On Config state and behaving differently? It seems to me that this is part of the management tool; that it shouldn't be invoking a new Halcyon.exe until a new config is in place for it. And then it uses its defined region ID to know what to load.

From what I understood the goal of this issue from the first post was to find a way to centralize the configuration for Halcyon servers. It follows therefore that we are discussing, albeit on the fringes, an aspect of deployment and management as the only reason to centralize configuration is for increasing the ease of deployment and management of nontrivial grids.

The current INI system has the concept of a common INI and a local INI which provides overrides to the common one. The current region XML system allows the XML to be pulled from any URL. Note the disparity of features. This is I believe the source of Vin's complaint: the XML can be generated from his MyWorld management system, but the INIs cannot. I personally don't agree with pulling config from a website, especially one that's public on the internets, but that's not my decision - the feature already exists for region XML.

My additional comments past the proposal block were to demonstrate the utility of the concept and clarify why I was making choices I was making in the proposal.

My first thought on how to get a region online was to go event-driven, as I dislike polling. However, doing that requires a server responsible for emitting the event, and databases don't do that. That is why my first proposal in this topic mentioned the grid server: it seemed like the logical unit to do the job.

As to "preallocated and unused" - that's the nature of hardware servers: it's either unallocated unused or preallocated unused - I like the latter better myself. When it comes to dynamically allocated cloud instances that's another story. However while my proposal allows for the preallocated unused condition, it doesn't force it: the grid implementers can do as they wish.

Your final comments remind me of an assumption I forgot to mention: minimizing the need for advanced management software on the region hosts. It seemed natural to me that if the config was moved to a central Halcyon-managed location, whether delivered from the DB or from the grid server, that the region could then handle the case that the config wasn't there yet - and the choice was either to crash out and be rebooted automatically by the OS, or other tooling, or just go into a wait state that doesn't churn disk and CPU quite so much.

Yes, an external tool could be made that watched for region config and then start up the region - but at that point the same tool could download and prep INI and XML files, inject them into the region then boot the region. This issue and Halcyon would play no part in it. However such a system would be very fragile: anytime Halcyon did anything that touched configs the tooling would also need to be adjusted. Not really our problem directly, but still.

I have been very busy working on the region host servers and rebuilding some this last week, so I have not had any time to add to this conversation.

The discussion I had with Ricky was about making clear the distinction between the simulator (the part that exists when executing the Halcyon code) and the region, which is defined by the contents of the region.xml data linked to the content persistence tables. We inherited from OpenSim the idea that the simulator and a region were the same thing, but that was not valid. SL also evolved over time from that idea to a clear separation, when they went from a manual region definition to one where simulators run regions, and setup went from hours of work to merely filling in the name, map location and region type and then assigning it to a simulator to be brought to life. I have simply found a path in the existing Halcyon setup to do the same thing.

Halcyon.exe has two powerful features on the command line: where to find Halcyon.ini and where to get an optional override file I mistakenly called region.ini. It should have been called simulator.ini, and that may be changed later, because it is simulator-specific information overriding what is in Halcyon.ini that identifies the simulator when the world owner tells it to start up. The override ini also exploits the option to get the region.xml data from a website instead of a file. This made it possible to change from a fixed file format to database-driven management control that provides the simulator the region data to run. It also made it possible to rapidly change which simulator and server a region runs in by simple port number assignment on the management side, then starting that simulator up to run it.

We must not lose sight of the world owner as the manager of the world. Keep the simulator simple and fast and allow the management to be external - however it may best be applied. Please also note that Halcyon uses the internet AND an internal network for communication. I am also using that internal network to access the website, which works equally well by internal IP address as by domain name. I suppose if there were some internal DHCP process, an internal name could be applied and translated to the internal IP, so that configuration data is not exposed to the internet. The installations I have set up use a 10Gb internal network that is 10x the speed of the internet connections.

It is world owner management that controls what the simulators do and how they are set up, when they are started and shutdown - the one person who defines when new hardware servers are to be added and more Halcyon simulators created in them. That person determines when a simulator is to be started up and what region is assigned to it. This cannot be automated by Halcyon itself from somehow self starting and going to look at a DB table for what to do. Besides, using a database as a communication method is very slow and consumes a lot of resources while processes wait for something to do.

There are many ways to run a world that are possible, and I have taken the path that makes management of the world easiest to do with some very powerful features that were simply not possible with the historical management methods. I have applied these concepts to world operation in this way only because someone put in the option to get region.xml from a web page instead of a file. The brick and the 2x4 made advanced civilizations possible, and the command-line features for Halcyon.exe do the same here. This discussion is about providing the same control inputs to the grid services, which would allow control of them in the same way that the region simulators can be handled. Right now they are manually set up with a lot of that data duplicated in Halcyon.ini.

The DB connection strings are important only if you can run multiple copies of the grid services acting as multiple gateways into the world, handling clusters of servers. The configurations needed are still files to define what they need to know on start up. But command-line directions to find those files allow instructions on where to get additional information, either from a web page or by finding it in a database table. It would be very nice to have that information as externally controlled sources for the world name, money designator and other related config information, handled by the management website or any other management programming external to Halcyon.

The MyWorld website world management control system is based on making world management and expansion very easy to do. I have the simulator host servers set up as VMs in VMWare, with several VMs per hardware server based on its available resources. The simulators are assigned their port numbers in the region.ini, which also defines the URL of the webpage from which to get the region.xml data. My next level of process will be a services management service set up in each new VM region host server; it would create the simulator installations and manage their startups, shutdowns and restarts based on commands from the website management tools. All of this must be external to Halcyon and not bloat them with only one possible answer to how management has to happen.

Right now the problem is maintaining many copies of Halcyon.ini, one per region host VM server. I could just as easily point to a shared network drive for the common Halcyon.ini and use override ini files for the specific local data per server. This would provide the local server IP addresses and RDB connections for the simulators. Oh, that points out another problem: changing region server assignments will require relocating the region persistence data to the new RDB location. Not a good situation! That will need to be rethought out. There are several ways that can be handled. I have just reached the point of working out how to implement the RDB system.

My first thought on how to get a region online was to go event-driven, as I dislike polling. However, doing that requires a server responsible for emitting the event, and databases don't do that.

Typically a human or an automated process is the event. For example, I don't think a region is going to be started up on its own without someone at least specifying the region name. Someone places an order on the (separate) web management site, or an administrator has pressed the New Region button on some admin page to create a new region, or an automated process of some kind has determined a region needs to be moved from one resource to another (e.g. due to load rebalancing or something similar). There is always some specific (external) event, and that is the code that could just add a new region record to the database, which either carries resource indicators (machine 12, slot 4) and pings the management tool on that machine, or is auto-detected by a simulator waiting to start up, as in your case above. Either way, there is an external event to add a region (or it could be to remove one, or move one, which is both an add and a remove). Something (management software) added the record to the database, and that same thing could invoke something to allocate a new instance.

In my preferred model, whatever added the region record to the database could also invoke a cloud API to create a new region with the region ID passed in, so the new simulator could just start up and find its own configuration in the database. Yes, if there were already an instance waiting for that resource slot, it could slow-poll for this in the database and start once it detected it had a record configured for its slot. I think my earlier point was that all that is needed is the db credentials and a region ID, and even in this case, with a preallocated simulator running, the "preallocated slot ID" could in fact just be the region ID, and once it sees a record appear for itself, it allows the startup to proceed, with the config from the record.
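A sketch of that preferred flow from the management side, assuming the same illustrative region_config table as above and a purely hypothetical provisioning endpoint (the URL and table/column names are placeholders, not anything MyWorld or Halcyon actually exposes):

  // Illustrative management-side flow: write the region record first, then ask
  // the hosting layer to start an instance that knows only its region UUID.
  using System;
  using System.Net.Http;
  using System.Threading.Tasks;
  using MySql.Data.MySqlClient;

  class CreateRegion
  {
      static async Task Main()
      {
          string connStr  = Environment.GetEnvironmentVariable("DB_CONNECTION");
          string regionId = Guid.NewGuid().ToString();

          // 1. The external event (order placed, admin button, rebalancer) writes the record.
          using (var db = new MySqlConnection(connStr))
          {
              db.Open();
              var cmd = new MySqlCommand(
                  "INSERT INTO region_config (region_id, region_name) VALUES (@id, @name)", db);
              cmd.Parameters.AddWithValue("@id", regionId);
              cmd.Parameters.AddWithValue("@name", "New Region");
              cmd.ExecuteNonQuery();
          }

          // 2. Optionally invoke a provisioning API (placeholder URL) so a fresh
          //    instance starts up already knowing its region UUID.
          using (var http = new HttpClient())
          {
              await http.PostAsync(
                  "https://provisioning.example.invalid/simulators?region=" + regionId,
                  new StringContent(string.Empty));
          }
      }
  }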

Yes, an external tool could be made that watched for region config and then start up the region - but at that point the same tool could download and prep INI and XML files, inject them into the region then boot the region. This issue and Halcyon would play no part in it. However such a system would be very fragile: anytime Halcyon did anything that touched configs the tooling would also need to be adjusted. Not really our problem directly, but still.

To be honest, I don't know why a region would ever need to pull an INI or XML file from some external web server. That's not where we store the data, and web management software can easily write records into the database and trigger events with a POST to some URL. The fact that some feature exists that is not (ever?) used in Halcyon in any existing grid doesn't change that. This is a major update and new methods do not need to mimic existing ones.

Yes an external tool could be made to write the region config when a new region creation was triggered. It would probably actually be the website where the XML would have been pulled from in your description. I don't know why there's an assumption here that it would poll this, or something needs to monitor this other than the simulator instance starting up. In my view, simulators should just start up with no dependencies other than knowing which region it is and how to connect to the database to get the info for that region. Anything more than that is inviting multiple cooks into the kitchen.

With this approach, the main thing remaining is how the instance gets allocated in the first place. For a cloud instance it's clear how that would happen (a REST API call from the web-based management software), and the region record would already be in place. If it was physical hardware, some human would be requesting this be added to the rack and in some way providing/copying over the initial installation (which is effectively the same thing as cloud startup, but less automated). I don't see why the simulator or a startup script couldn't just auto-generate a region ID, with the startup script invoking the web management page to let it know the region ID of the newly available simulator.

Having Halcyon core poll an external website itself here, plus the database, for configuration info, seems redundant and a bit backwards (at least two cooks in that kitchen).

The discussion I had with Ricky was about making clear the distinction between the simulator (the part that exists when executing the Halcyon code) and the region, which is defined by the contents of the region.xml data linked to the content persistence tables. We inherited from OpenSim the idea that the simulator and a region were the same thing, but that was not valid.

@Vinhold, OpenSim has (and as far as I know, still has) support for multiple regions per simulator. Halcyon does too, except that the PhysX physics design limits itself to a single instance, so there can only be one region with physics; thus Halcyon has (since the PhysX implementation) been limited to one region per simulator. However, that is not something we inherited. I don't know why you're mentioning any of this, unless you're suggesting we change this back to support more than one region per simulator? That would require major physics work.

Your next comment suggests this is related to being able to automate the allocation of new regions, and that this is somehow related to the INI and XML region file. The most direct way to simplify automation of these files is to eliminate the need for either. Or at least, reduce it down to a single file with a region ID which can be looked up in the database and the host/credentials needed to connect to this database, which is the point of this issue.

This made it possible to change from a fixed file format to database-driven management control that provides the simulator the region data to run.

A more all-encompassing version of this is what is being suggested by this issue.

It also made it possible to rapidly change which simulator and server a region runs in by simple port number assignment on the management side, then starting that simulator up to run it.

Overloading the use of a port number to identify a region may work in your setup, but these are very independent values, and in fact every machine could be using the same ports, as was the case in InWorldz and Islandz and many other grids. I wouldn't want to see port numbers become some new kind of region ID; however, the web management software could maintain its own relationship between these when writing a new region record into the Halcyon database, if desired.

We must not lose sight of the world owner as the manager of the world. Keep the simulator simple and fast and allow the management to be external - however it may best be applied.

The world owner has access to the database, and the management software, and simplifying the simulator by reducing the number of data sources for simulator configuration is the primary objective of this issue. Fetching from an external third-party website is adding complexity, possibly delays, error conditions and security concerns.

It is world owner management that controls what the simulators do and how they are set up, when they are started and shutdown - the one person who defines when new hardware servers are to be added and more Halcyon simulators created in them. That person determines when a simulator is to be started up and what region is assigned to it. This cannot be automated by Halcyon itself from somehow self starting and going to look at a DB table for what to do.

Exactly. This is the external event that contacting a third-party website would need to poll for. Instead, when that one person decides to allocate a new region (or a customer orders one), that management software could simply write the new information into the region-related record in the database, where the rest of the region info is. That person is still invoking it, and the simulator only sees the new data when that person saves the new region record. There is always either a human or an automated process that has decided to create a new region, and that's when the record would be written, and optionally something else invoked (like cloud instance spawning).

There are many ways to run a world that are possible, and I have taken the path that makes management of the world easiest to do with some very powerful features that were simply not possible with the historical management methods. I have applied these concepts to world operation in this way only because someone put in the option to get region.xml from a web page instead of a file.

InWorldz did all of this too, just more automated than you are describing, with a per-machine management tool that was triggered by the web management software, and humans initiating operations through the web management software. I understand that you have needed to implement a management solution based on the existing (current) software and its limitations, and you have done very well given the current constraints. This issue attempts to go well beyond that, eliminating the need for these configuration files and moving a small subset of required data to a region-specific record in the database. There is no need to place arbitrary limitations and constraints on an approach solely because that's the way it is done now. This is a pie-in-the-sky look at what could be done, rather than how we could do a small tweak to what is currently done.

The DB connection strings are important only if you can run multiple copies of the grid services acting as multiple gateways into the world, handling clusters of servers.

I don't know what this means; the DB connection strings are almost the only configuration a region needs to start and operate with persistent data. When combined with a region ID, that's everything a simulator needs, because the DB connection string could allow the simulator to query the other info for that region (such as the region name, region owner, estate ID, X/Y grid placement, port numbers, internal and external IP addresses to use, prim limits, etc).

All of this must be external to Halcyon and not bloat them with only one possible answer to how management has to happen.

Having Halcyon fetch its configuration from the database, the same database your management software must already be using, simplifies things for MyWorld and any other grid management software to come. I don't know how splitting this info, with redundant fields, across multiple configuration sources helps MyWorld. Wouldn't you rather just write the region name, port numbers, etc. into a record that the simulator sees at startup, rather than juggling region.xml and halcyon.ini and database records?

Right now the problem is maintaining many copies of Halcyon.ini one per each region host VM server. I could just as easily point to a shared network drive for the common Halcyon.ini and use override ini files for each specific local data per server.

Or discard them as no longer needed, and just write a configuration record into the database. Done. Think of the database as your shared drive.

changing region server assignments will require relocating the region persistence data to the new RDB location. Not a good situation! That will need to be rethought out. There are several ways that can be handled. I have just reached the point of working out how to implement the RDB system.

Changing the region location wouldn't affect which RDB it uses. You can decide that at any time, in any relationship you want between regions and RDBs. Just make sure the RDB map table has a record for which one was chosen; the relationship is completely arbitrary. For growth, for example, you could assign all existing regions to RDB1, and any new regions created could be assigned to RDB2. This is how the asset server scaling worked at InWorldz, with new assets being written to the highest-numbered asset server (e.g. 4), and older ones being read from instances 1, 2, 3 and 4.
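As a sketch of that kind of arbitrary assignment, assuming a hypothetical region_rdb_map table (neither the table nor the column names are the actual Halcyon schema):

  // Illustrative: pick an RDB for a new region and record the assignment in a
  // map table, so the region-to-RDB relationship stays arbitrary and stable.
  using MySql.Data.MySqlClient;

  static class RdbAssignment
  {
      public static void AssignNewRegion(string coreConnStr, string regionId, string chosenRdbHost)
      {
          using (var db = new MySqlConnection(coreConnStr))
          {
              db.Open();
              var cmd = new MySqlCommand(
                  "INSERT INTO region_rdb_map (region_id, rdb_host) VALUES (@region, @rdb)", db);
              cmd.Parameters.AddWithValue("@region", regionId);
              cmd.Parameters.AddWithValue("@rdb", chosenRdbHost);   // e.g. the newest RDB, for growth
              cmd.ExecuteNonQuery();
          }
      }
  }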

OpenSim has (and as far as I know, still has) support for multiple regions per simulator. Halcyon does too, except that the PhysX physics design limits itself to a single instance, so there can only be one region with physics; thus Halcyon has (since the PhysX implementation) been limited to one region per simulator. However, that is not something we inherited. I don't know why you're mentioning any of this, unless you're suggesting we change this back to support more than one region per simulator? That would require major physics work.

Yes, OpenSimulator still does run multiple regions per region instance; however, they can be bigger regions now, along with the fact that they have the option of multiple physics engines. However, I think that is a better discussion for a separate issue relating to how to handle the issues surrounding physics.

The DB connection strings are important only if you can run multiple copies of the grid services acting as multiple gateways into the world, handling clusters of servers.

I am not really sure what that means either. For all grid and region (simulator) instances to connect to the database and get their information, you would in fact need the DB connection strings regardless of how many copies of the grid services you're running as gateways into the world.

To be honest, I don't know why a region would ever need to pull an INI or XML file from some external web server. That's not where we store the data, and web management software can easily write records into the database and trigger events with a POST to some URL. The fact that some feature exists that is not (ever?) used in Halcyon in any existing grid doesn't change that. This is a major update and new methods do not need to mimic existing ones.

I am going to agree with Appurist on this statement. There is no need for the region instances to contact a website backend only to have that website contact the server and serve up a .xml file for that region instance. This is overly redundant and does add an unnecessary call to the process of starting a region, when the web interfaces can already put the information about a region (i.e. RegionName, Estate Name, Estate Owner Name, etc.) into the database.

changing region server assignments will require relocating the region persistence data to the new RDB location. Not a good situation! That will need to be rethought out. There are several ways that can be handled. I have just reached the point of working out how to implement the RDB system.

Both InWorldz and Linden Lab use the same approach when it comes to RDBs for regions and assets. It is the idea of being scalable that drives the need to use it. As a virtual world grows and expands, it has to have a backbone architecture capable of growing and expanding to support it. So this is in fact not out of line with the standards already being used in well-established virtual worlds. In fact, we use a similar approach relating to scalability in A Galaxy Beyond.

I might have more on this after I get a few hours of sleep.

Ok. I wrote too much in one comment posted at nearly 1:00am. Let's see if I can clear up some of the points I was trying to get to.

  1. Regions and Simulator Distinction.
    This was not a technical association; I was running several regions in one simulator console with Halcyon before David put in the data tables for the running instance per simulator. I do not recall if that was also when PhysX happened or not, but that local data store prevented running multiple regions per console. The advantage to having only one region running in each console had to do with simple management: when a region crashed, it would take out all regions in the same console window. So for operational purposes, it is much better to run only one region per simulator, accessed by its console.
    The definition I am making here is that the region definition is what makes the virtual space, which is run by the simulator. The simulator is set up in a server and given its identity when it's defined. That requires the ini file to give it the port number or the combination of internal IP + port number. I have done it both ways and it's much better to deal with the port number alone and use the rule that all port numbers are unique in the world. There are plenty of them to use. But that port assignment option is only one of a few ways to handle it. That is my intention: that there is more than one way to do this. As for the XML data that defines what the region info is, that can be put into a table in the Grid DB if you wish, but it has to be assigned to the internal IP + port, or use the unique-port option, to identify the assignment. I have that data in the mysite DB in the regionxml table now. Yes, it's around-the-barn control, but that was how it was set up using the option to get the xml from a web page. I really do not care if you want to replace the XML with a DB record. The issue remains the same: you have to provide the ini file information to give the simulator its identity at server configuration time. The simulator is started up only when the grid manager tells it to start up, and it has the region data already assigned to it. The simulator should never be started up on its own or automatically without having the region assignment applied. That is wasting server resources, especially if someone uses AWS-type servers where you are charged by the minute for use (however it's actually done!). That is not that much of an issue for the way I have servers set up; those are the same price per month whether they are used or not. I was not asking for a system redesign, only the ability to use existing Simulator features for the grid services configuration.

I am also using the port number assignment to identify the simulator startup shortcuts on the desktop, each of which passes that number to the control batch file (Halcyon.bat, common to all simulators on the server). That instance opens the console window and checks the exit code to determine whether the termination was an intentional command (shutdown/quit or restart) or a crash occurred, and whether it must end or be restarted. Halcyon.bat passes its parameters on to the simulator instance as Halcyon.exe command-line parameters, pointing to the Halcyon.ini location and where the override ini is located. That provides the simulator all that it needs to know about. The critical information is the server's internal and external IP addresses and, in the override ini file, the instance identity as the combination of internal IP + port number or a unique port number, which can be how to find the region data assignment in the table or, as I am doing now, how to ask for the xml data.

  2. The Grid Services Configuration
    This is where the rubber meets the road. The grid services each have their own xml data and Halcyon.ini to get the data they need to operate. The XML data is nearly all duplicated in Halcyon.ini. That means there are two DB connections defined for each service. Each is exactly the same in the configuration I am implementing. This can only mean one of two things:
    a. The two connection strings were intended to allow multiple instances of each service running in their own servers,
    b. The duplicate connection strings were a result of one programmer not understanding what another programmer had done and the problem was never cleaned up.

So why are the entries in the xml nearly duplicated in the Halcyon.ini? What purpose do the two means of configuration serve that could not be served by Halcyon.ini alone, or by having Halcyon.ini provide only the instance identity plus the means to get the dynamic data elements from a DB record instead? That would make control of the grid name, the money identifier, and other world-specific information much better handled in the DB. The only information needed for them in Halcyon.ini is the 12-character key code assignments, the external and internal IP addresses, and the port numbers, as all of those are identities needed on start-up to provide access to the DB dynamic data.
I am not opposed to simplification of data access and easier management of that in the DB alone. I was only looking for the simple solution that would be faster to use, rather than an extensive change to the whole design. Whatever is done will still require me to rewrite and update all running worlds with web programming changes.

  3. RDB problem I saw while making my first comment: I had in my server plan to set up an RDB MySQL instance per hardware server installed in the world. All simulator host VMs would have their simulators pointed to that RDB instance. This is to take advantage of the internal networking speed in the hardware server.
    The problem happens when a region is reassigned to a simulator in a different hardware server: there will not be any persistence data available to the reassigned region unless I have it copied from the first RDB instance to the new server's RDB tables. What I see you describing are RDB instances set up on some other commonly accessed server, and if there is more than one RDB instance, how are the regions to access their persistence data when they change simulators? The RDB data is, I thought, assigned to the simulators by the Halcyon.ini, not by a region relationship. It is not clear to me how the whole RDB operation is supposed to work.
    Currently the region persistence data is all in the main MySQL DB, so relocating regions on simulators has no problems other than loading the one MySQL server with all the data per region at startup and with the persistence updates. That is starting to cause some user data access lag as the world gets a lot more data in it.
    Correction: Discussion of RDB operation should be in PM (Discord perhaps) and its documentation and implementation examples added to the Wiki here.