srmeier/KnightOnline

Access violations

Closed this issue · 11 comments

ekran goruntusu 8

ekran goruntusu 6

Sorry for the Turkish VS2017.

In the first image it says that:
"at 0x017... on KnightOnline.exe a special case thas occurred.
0xC000... : Write access violation at position 0xCCC... "

And in the second image, it says that:
"Exception occurred: read access violation.
CN3UIWndBase::m_sRecoveryJobInfo.pItemSource is nullptr."

I often face with the errors in the first image, and they generally happen at some specific lines of code.

A call stack and locals/autos window would be very much useful for the first image.
The latter appears that the source item isn't set, but you give us nothing to go on as you don't say how you specifically produced that. Did it involve dragging the item or did you right-click it in the UI, for example?

Details are important.

Actually, the first type of errors happens randomly. I mean sometimes right after when I logged in to the game, sometimes during the game, there is no specific correlation between the events when the error occurs, as far as I see.

I don't not have detailed info about the errors right now, but I will take rigorous notes when these types of errors occur in the future.

Without a call stack at the very least, it's impossible to say anything concrete about the first one.
A call stack is important because it gives us a strong idea of where this is happening.

What we can see from the screenshot is the following:

  1. The crash probably happened in __VertexTransformed::Set().
  2. As it doesn't do anything other than set member variables, and it's throwing an access violation with address 0xccccccc (which in debug builds, means the memory is not initialised), assuming VS is even correct about the method it's crashing in, something is trying to use an address to a vertex that hasn't actually been set.

As __VertexTransformed is used often, without a call stack, we don't know which use case this was and thus can't further determine what is using an uninitialised pointer, nor why.

It's very important you always include this information with crashes. I'm not sure why they're even hidden from your IDE; these windows are always useful while debugging, which is why they should be enabled by default (I haven't used VS2017 though, but they've always shown up by default in all versions of VS up to 2015 at least).

The error says:
"Exception occurred: read access violation.
_Mycont was 0x4BF17B00."

screenshot 10
while
screenshot 12

screenshot 14

It is again a "Write access violation" error.

screenshot 15

screenshot 16

While

screenshot 17

Okay. I actually know why the second one's happening, as it's something I've fixed before in another project.

			m_pVB->Lock( 0, 0, (void**)&pVertices, 0 );

This is failing, which in turn is causing the following code to blindly run and crash.

It fails when the device is inaccessible, which tends to happen when you alt+tab or ctrl+alt+del.
What needs to be happening here is we need to be handling these cases and recovering from it.

What you'll find after implementing error handling these Lock() calls (not just this one) is certain resources will be lost because they're not managed by DirectX (and we don't handle it ourselves), so they'll need to be moved into D3DPOOL_MANAGED so that DirectX can restore them automatically for us.

Regarding the first one, I would move up the callstack to CGameProcedure::Tick() and check the locals there. It seems to be an issue with the packet handling, but it's not clear why. I don't recall how networking works in this project, so it may possibly be a race condition (packet being retrieved / removed from the queue while another is being added) but I couldn't say for certain.

std::deque isn't thread-safe. We should make it thread-safe with synchronization objects?
@twostars what do you suggest?

Right, but which threads are we talking about? Because the client operates in a primarily single-threaded manner.

Are we talking about packets? These are pooled in the main thread and handled on each frame (again, in the main thread).

So I don't understand how a race condition can occur if the queue is handled in one thread. Sounds to me like stacko.

Okay. Had to sift through the screenshots above because I didn't even recall seeing one about packets specifically, but what is there is fairly vague.

Looking at this project's source base, I can tell you one thing for certain: all packet logic is handled in the main thread.

We handle receive requests via windows messaging (WndProcMain gets called).
Socket requests are as such:

		case WM_SOCKETMSG: {
			switch(WSAGETSELECTEVENT(lParam))
			{
				case FD_READ: {
					CGameProcedure::s_pSocket->Receive();
				} break;

Receive(), of course, calls ReceiveProcess() which ultimately appends the packet to the queue:

					m_qRecvPkt.push(pkt);

The message pump is called via LocalInput::Tick() (... this is terrible, but ok), which polls events from SDL:

	while(SDL_PollEvent(&uSDLEvents)) {
...
			case SDL_SYSWMEVENT: {
				// TEMP: until things become less window's dependent
				WndProcMain(uSDLEvents.syswm.msg->msg.win.hwnd,
					uSDLEvents.syswm.msg->msg.win.msg,
					uSDLEvents.syswm.msg->msg.win.wParam,
					uSDLEvents.syswm.msg->msg.win.lParam
				);
			} break;

This is called via CGameProcedure::Tick().

void CGameProcedure::Tick()
{
	s_pLocalInput->Tick(); // Å°º¸µå¿Í ¸¶¿ì½º·ÎºÎÅÍ ÀÔ·ÂÀ» ¹Þ´Â´Ù.

Which is ultimately called by the main game thread:

	CGameBase::s_bRunning = true;

	while(CGameBase::s_bRunning) {
		CGameProcedure::TickActive();
		CGameProcedure::RenderActive();
	}

The packet being read from the queue is also handled in CGameProcedure::Tick():

	while (!s_pSocket->m_qRecvPkt.empty())
	{
		auto pkt = s_pSocket->m_qRecvPkt.front();
		if (!ProcessPacket(*pkt))
			CLogWriter::Write("Invalid Packet... (%d)", pkt->GetOpcode());

		delete pkt;
		s_pSocket->m_qRecvPkt.pop();
	}

So it's not a race condition; these are handled pretty obviously in the same thread.

So what is causing it? Well, likely something being modified when the packet itself is handled.
The queue itself is out -- that's only modified here and in ReceiveProcess(), neither of which will get handled directly by packets.

The screenshot above is terrible, so I can't say for certain, however the only other likely option is s_pSocket itself.

So what to do about it? I'd swap() the queue over to a localised queue so s_pSocket can be disregarded while actually processing the packets. That should cut that potential issue out.

As for whether or not that's actually the issue, I'm not sure. There's not enough data to go off to be sure.

The packet-related one is an overrun which was fixed by f5e6dc2.
It's caused by WIZ_TEST_PACKET (0xff) which can be invoked automatically by hotkey, but bad communication in general can easily cause this too.

The others are all separate issues so it's difficult to say much on them, but one issue for the various separate issues isn't terribly helpful, so I'm closing this one and opening up separate issues for the other 2 crashes: #147 and #148