MV3D Development Blog » 2010

January 31, 2010

Storage Space

Filed under: Uncategorized — SirGolan @ 3:04 pm

I’ve been trying to come up with a better way to persist MV3D state info. Basically, I need a solution that meets the following:

Synchronous when saving data

Transactional

Extremely fast when saving data

Maintains data consistency

Supports queries

Externally available (i.e. you can view/modify/query the data outside of MV3D)

Low CPU overhead

The current solution does support most of those. The four it’s probably worst at are queries, data consistency, CPU overhead, and external availability. It’s also just kind of wacky. You define a list of properties in your class that should be stored, along with which of those properties you’d like to generate a search index for. When it’s time to save an object, it generates a UUID for it, stuffs all the properties you defined into a special object, pickles it, and then uses Axiom to persist it to SQLite. Index objects stored in Axiom match the UUID with the value of the indexed property in the object. Querying is also a little odd since you use magical properties on a query object (q.a == 12). Unfortunately, the object type you are querying for may not have an a attribute, and the query object doesn’t care either way. Another interesting part of the system is that it requires all servers to store their data locally. I’m not sure how I feel about this, since some MV3D data really isn’t useful to other servers, at least as the design stands now. However, it does require that there be two servers for every item stored in order to recover from a catastrophic failure of a single server.
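To make the “magical properties” problem concrete, here is a minimal sketch of how a query object like that could behave. This is not MV3D’s actual code: Query, Condition, and matches are hypothetical names chosen for illustration.

```python
class Condition:
    """A recorded comparison such as a == 12 (hypothetical helper)."""

    def __init__(self, name, value):
        self.name = name
        self.value = value

    def matches(self, obj):
        # A missing attribute silently fails to match instead of raising.
        missing = object()
        return getattr(obj, self.name, missing) == self.value


class _Field:
    """Stands in for an attribute name until it is compared to a value."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, value):
        # Build a Condition instead of returning a bool.
        return Condition(self.name, value)


class Query:
    """Any public attribute access yields a field, real or not."""

    def __getattr__(self, name):
        if name.startswith("_"):
            raise AttributeError(name)
        return _Field(name)
```

Because Query never looks at the class being queried, a misspelled attribute such as q.nmae == "mike" builds a perfectly valid condition that simply matches nothing.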

I’ve tried in the past to go with a completely Axiom-based solution, but that didn’t work well because of the restrictions Axiom puts on any classes you mark up as Items that can be persisted. One thing I’d been thinking about was a dual object approach where you have the in-memory object, and when it needs to persist, it stores all its persistable attributes into a specific Axiom class. This system could even use powerUps. It had a couple of downsides: you’d have to define twice as many classes, and you’d end up defining your data structure twice as well. The other option is to have a utility generate that extra code, but I’m not a big fan of code generation, so I’d like to avoid that.

What I’m currently thinking of is adding a new service type to MV3D that would generally act like a data access layer (DAL). You’d mark up classes in Python with which attributes should be stored, and I’m thinking of combining this with the attributes that are sent over the network. In order to persist an object that had this markup, you’d hand it over to this service. It’d then store the marked up attributes and return a unique ID you can use to retrieve the object later. Technically, at this point it really doesn’t matter where or how the data is stored, but I’m going to go into that a bit. There would be several low level functions: register schema, upgrade schema, add object, update object, get object, and query. Basically, a schema in this sense is the markup of a particular class, which defines the attributes to store and what type they are. For SQL-based stores, this will be directly associated with a table. Schemas will be versioned so that when a newer version is registered, all data is upgraded. I see this happening by renaming the existing table, creating a new one with the new schema, and running code on each element to upgrade it.
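The register/upgrade step described above could look roughly like this with Python’s sqlite3 module. It is only a sketch under assumptions: Schema, register_schema, the schema_versions bookkeeping table, and the _v1-style name for the renamed table are all made up for illustration, and a real implementation would want the whole upgrade wrapped in one explicit transaction.

```python
import sqlite3


class Schema:
    """Hypothetical class markup: a table name, a version, and typed columns."""

    def __init__(self, table, version, columns):
        self.table = table
        self.version = version
        self.columns = columns  # e.g. {"name": "TEXT", "age": "INTEGER"}


def _create_table(conn, schema):
    cols = ", ".join(f"{name} {sqltype}" for name, sqltype in schema.columns.items())
    conn.execute(f"CREATE TABLE {schema.table} (id INTEGER PRIMARY KEY, {cols})")


def register_schema(conn, schema, upgrade=None):
    """Create the table, or upgrade it if an older version is registered."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_versions "
        "(tbl TEXT PRIMARY KEY, version INTEGER)"
    )
    row = conn.execute(
        "SELECT version FROM schema_versions WHERE tbl = ?", (schema.table,)
    ).fetchone()
    if row is None:
        _create_table(conn, schema)
    elif row[0] < schema.version:
        # Rename the existing table, then create a new one with the new schema.
        old = f"{schema.table}_v{row[0]}"
        conn.execute(f"ALTER TABLE {schema.table} RENAME TO {old}")
        _create_table(conn, schema)
        cur = conn.execute(f"SELECT * FROM {old}")
        names = [d[0] for d in cur.description]
        for values in cur.fetchall():
            # Run the caller's upgrade code on each element, then re-insert it.
            item = upgrade(dict(zip(names, values)))
            keys = ", ".join(item)
            marks = ", ".join("?" for _ in item)
            conn.execute(
                f"INSERT INTO {schema.table} ({keys}) VALUES ({marks})",
                list(item.values()),
            )
    conn.execute(
        "INSERT OR REPLACE INTO schema_versions VALUES (?, ?)",
        (schema.table, schema.version),
    )
    conn.commit()
```

The old table is left in place after the upgrade in this sketch, which would make it possible to inspect or recover the pre-upgrade rows.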

After the schema is created, adding an object would just be a SQL insert and updating would be similar. Both of those could return the primary key of the row as an ID. Combining the primary key with the object’s class and the service’s location would give you a way to retrieve that object from anywhere. Queries could make use of the class markup objects so that users can write something like: store.query(Person, Person.name == "mike").
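Here is one way the insert, update, and query calls might fit together, again sketched on top of sqlite3. Column and Store are hypothetical names, only equality comparisons are supported, and the sketch assumes each class’s constructor takes its columns in declaration order.

```python
import sqlite3


class Column:
    """Class-level markup for one stored attribute (hypothetical)."""

    def __init__(self, sqltype):
        self.sqltype = sqltype
        self.name = None

    def __set_name__(self, owner, name):
        self.name = name

    def __eq__(self, value):
        # Person.name == "mike" builds a SQL fragment instead of a bool.
        return (f"{self.name} = ?", value)


class Person:
    name = Column("TEXT")
    age = Column("INTEGER")

    def __init__(self, name, age):
        self.name = name
        self.age = age


def columns(cls):
    """Names of the marked up attributes, in declaration order."""
    return [k for k, v in vars(cls).items() if isinstance(v, Column)]


class Store:
    def __init__(self, conn):
        self.conn = conn

    def register(self, cls):
        cols = ", ".join(f"{c} {vars(cls)[c].sqltype}" for c in columns(cls))
        self.conn.execute(
            f"CREATE TABLE IF NOT EXISTS {cls.__name__} (id INTEGER PRIMARY KEY, {cols})"
        )

    def add(self, obj):
        names = columns(type(obj))
        marks = ", ".join("?" for _ in names)
        cur = self.conn.execute(
            f"INSERT INTO {type(obj).__name__} ({', '.join(names)}) VALUES ({marks})",
            [getattr(obj, n) for n in names],
        )
        self.conn.commit()
        return cur.lastrowid  # the primary key doubles as the object's ID

    def update(self, obj_id, obj):
        names = columns(type(obj))
        sets = ", ".join(f"{n} = ?" for n in names)
        self.conn.execute(
            f"UPDATE {type(obj).__name__} SET {sets} WHERE id = ?",
            [getattr(obj, n) for n in names] + [obj_id],
        )
        self.conn.commit()
        return obj_id

    def query(self, cls, condition):
        where, value = condition
        names = ", ".join(columns(cls))
        rows = self.conn.execute(
            f"SELECT {names} FROM {cls.__name__} WHERE {where}", (value,)
        )
        return [cls(*row) for row in rows]
```

Unlike a free-form query object, Person.nmae would raise an AttributeError right away, because the condition is built from attributes that actually exist on the class.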

Looking at the requirements I mentioned, the major ones that it fails are being synchronous and maintaining data consistency. I put synchronous when saving up there because I like to be able to wrap methods that modify data in an @autoStore decorator which stores the object after the method runs. If this were async, either that function would have to return a Deferred, or you’d lose data consistency if there was a failure. It would also be sweet to be able to define the markup classes as descriptors which would a) store the object and b) notify any network clients of the changes. A possible solution would be to strategically limit the remotely available functionality of the service to the low level operations I mentioned previously, and to keep those operations from having anything to do with converting objects into persistable data. This would require a local object/service to interact with the remote store. The local code could build a queue of transactions to send to the master store, and this queue could be persisted via SQLite. This way, the data would be synchronously stored to disk locally and then sent off to the remote store whenever.
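A rough sketch of the local queue idea, with an @autoStore-style decorator feeding it, might look like the following. LocalQueue, its pending table, and the queue, id, and stored_attrs members expected on the decorated object are assumptions for the example, and changes are serialized as JSON purely to keep it short.

```python
import functools
import json
import sqlite3


class LocalQueue:
    """FIFO of pending writes, kept in a local SQLite database."""

    def __init__(self, path):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS pending "
            "(seq INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT)"
        )
        self.conn.commit()

    def push(self, change):
        # Returns only after the change is committed to the local database.
        self.conn.execute(
            "INSERT INTO pending (payload) VALUES (?)", (json.dumps(change),)
        )
        self.conn.commit()

    def flush(self, send):
        """Send queued changes in order; stop at the first connection failure."""
        sent = 0
        rows = self.conn.execute(
            "SELECT seq, payload FROM pending ORDER BY seq"
        ).fetchall()
        for seq, payload in rows:
            try:
                send(json.loads(payload))
            except ConnectionError:
                break  # remote store unreachable: keep the rest queued
            self.conn.execute("DELETE FROM pending WHERE seq = ?", (seq,))
            self.conn.commit()
            sent += 1
        return sent


def autoStore(method):
    """Queue the object's stored attributes after the wrapped method runs."""

    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        result = method(self, *args, **kwargs)
        self.queue.push({"id": self.id, "attrs": self.stored_attrs()})
        return result

    return wrapper
```

The decorated method stays synchronous and returns a plain value, while flush can be called from a timer or background task whenever the remote store is reachable.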

One other issue here is that theoretically multiple servers could access the same object in the store at the same time. Yes, that sounds like a feature, but the aforementioned queue causes some issues with doing this. The data in the master store may not be 100% up to date. What might be interesting is to create a locking mechanism whereby when loading an object from the store, you acquire a lock on it. The hard part will be making sure that the lock goes away whenever you stop using the object and that failure conditions (such as your server crashing while holding locks on multiple objects) are properly handled.
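One common way to keep a crashed server from holding locks forever is to give each lock an expiry time that the holder must keep renewing. That is a swapped-in technique, not something this design has settled on, and LeaseLocks below is a hypothetical in-memory sketch of it.

```python
import time


class LeaseLocks:
    """In-memory lock table where each lock expires unless renewed."""

    def __init__(self, ttl=30.0, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock  # injectable so tests can control time
        self.leases = {}  # object id -> (owner, expiry time)

    def acquire(self, obj_id, owner):
        """Take or renew the lock; return False if another owner holds it."""
        now = self.clock()
        holder = self.leases.get(obj_id)
        if holder is not None and holder[0] != owner and holder[1] > now:
            return False
        self.leases[obj_id] = (owner, now + self.ttl)
        return True

    def release(self, obj_id, owner):
        """Drop the lock, but only if this owner is the current holder."""
        holder = self.leases.get(obj_id)
        if holder is not None and holder[0] == owner:
            del self.leases[obj_id]
```

A server that crashes simply stops renewing, so its locks become available once the lease runs out. The cost is that a slow but still running server can lose a lock it believes it holds, which this sketch does nothing to detect.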

Another issue is what to do if the store rejects a change that seemed like it would succeed locally. If we go with writes that return immediately and don’t wait for success, it would already be too late to tell the original caller that something went wrong. What are some of the reasons this would happen?

Remote storage server is down. Ok, just keep it in the queue until it’s back up.

Remote storage server is out of space. Keeping it in the queue is probably best.

Schema mismatch or invalid data. This seems like the worst failure mode.

What do we do with a schema mismatch or invalid data? This indicates a fairly serious problem, so maybe it deserves a serious resolution: revert the object and all other objects related to the failed transaction, both on the remote store and in memory. That still doesn’t seem like a great way to resolve the issue, but it would maintain data consistency.
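The three failure cases above could be told apart while draining the queue, roughly like this. The exception classes and the send and revert callables are hypothetical stand-ins, since the post does not define how the remote store reports errors.

```python
class StoreUnavailable(Exception):
    """Remote storage server is down."""


class StoreFull(Exception):
    """Remote storage server is out of space."""


class InvalidChange(Exception):
    """Schema mismatch or invalid data."""


def process(transactions, send, revert):
    """Send queued transactions in order and return those still pending."""
    pending = list(transactions)
    while pending:
        txn = pending[0]
        try:
            send(txn)
        except (StoreUnavailable, StoreFull):
            break  # keep everything queued and try again later
        except InvalidChange:
            # The store will never accept this one, so undo it instead.
            revert(txn)
        pending.pop(0)
    return pending
```

Later transactions that build on a reverted one are not handled here. They would presumably need to be reverted as well to keep the data consistent.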

All in all, this seems like it’d be pretty challenging to implement, and it’d mean that I’d be maintaining a DAL/ORM as opposed to using a pre-existing one. I’m generally against re-inventing the wheel like that, but all the Python ORMs I know about are designed for webapps and don’t translate well to MMOGs.

Really at this point, I’m looking for someone to talk me out of this and tell me why it’s a horrible idea. Otherwise, I might just be crazy enough to try it out. One thing that’s bugging me is that the current mechanism works. I haven’t had any lost data issues or anything; however, I’ve found that sometimes it’s just easier to blow away the whole store than to, say, revert a change I made.

At the very least, I like how the new system abstracts out the DAL to a service that can optionally be on a remote server. This makes it so the underlying persistence technology can be pretty much anything. Another major benefit would be the ability to freeze objects by storing them and stopping simulation of them. Then you could unfreeze them later on a different server. What’s this good for? Well, making it so your character leaves the world when you log out, for one. This isn’t currently possible without a load of hacks.


January 24, 2010

Ship it!

Filed under: Uncategorized — SirGolan @ 3:09 pm

MV3D Version 0.40 will be officially released today. I’m building the Windows binaries and source release now. I’m happy with the release, but visually, it’s a bit of a step backwards. I don’t have any good screenshots to show, or a movie detailing the new features. There are new features though. They just aren’t visible, and if they are working properly, you should never need to worry about them.

The main theme of this release was completely refactoring the scalability and redundancy of MV3D at the core. This was a very big success. I now feel comfortable saying that MV3D is ready to scale. In a properly configured cluster, items will be both spread across the servers and have redundancy in the case of a failure. Items can be transferred from server to server based on load balancing needs.

Looking at MV3D as a whole, there are still a few elements that need major work. The most obvious right now are tools, the client, and possibly data storage (sadly). It seems to me that tools to generate content are less useful if the data storage mechanism changes significantly (since either I’ll have to write a converter or you’ll have to lose all your stuff). On top of that, making the client look good and perform well is hard to do and mostly pointless without any content.

With that said, the logical order is to work on data storage first, then tools, and finally fix up the client. I’m hoping to not spend too much time on data storage. I have some ideas, and generally want to make whatever I come up with retain the same API. For tools, RED needs to be extended and generally made useful. There also needs to be a way to more or less export and import whole areas (or even whole realms) at a time. Any project bigger than a single person messing around will want to have this ability so that they can version control their world.

While fixing a recent bug, I came to realize that a tool for managing MV3D clusters is desperately needed. My repro steps for that bug involved checking in to SVN, updating two servers, restarting them, and then connecting to their SSH console. It would have been nice to be able to run 2 or more servers on my desktop for testing. So, this is another tool that should be forthcoming.

While I’m talking about that, the reason I had to repro the bug on the servers as opposed to in unit tests is that, while there are unit test facilities for testing a full server against a real-ish client, there isn’t much in the way of full integration tests between multiple servers. Some of the problems I was tracking down ended up being issues that came up only when you started two servers, created some stuff, shut them down, restarted them, and tried to create more stuff. Integration test helpers for scenarios like this would be pretty awesome.

Another headache I ran into while rebuilding the demo servers was that since the starter world initialization code is written to work on a single server that provides all services, it doesn’t work well when you have a separate directory or account server. This meant I was basically typing the code on the console to create the world. No one should be expected to have to do that. A tool is needed that can bootstrap a set of servers to get them to the point where other tools like RED can be used to create a world.

There are plenty more things that need tools, but so I don’t bore people more than usual, I’ll move on to the client. The client seems slow. In reality, it gets 150-160 FPS, but it still feels sluggish for some reason. This is probably physics related, but it makes things look bad. The other big issue on the client is that since the server sends position corrections to the client’s avatar instead of the other way around, moving around can be a little clunky (even with the smoothing that’s done now if there’s a discrepancy). The UI on the client is very clunky and ugly. I’d really like to get that looking better. There’s tons of stuff really that needs help.

Sounds like a lot of work, doesn’t it? It definitely is. If you’ve gotten this far, you must be interested in MV3D, so why not contribute? There are a ton of open tickets now that would be excellent for someone looking to get started.

That’s all for now! If you try out the new release, just be warned that it’s visually the same as the last release. I’m hoping that as tools come online, I’ll be able to add more content to the demo world, but that depends on what code changes I have to make on the server to support the tools.
