Sign in to follow this  
Followers 0
SMT

[bug] random crashes #3, deadlocks

1 post in this topic

i've catched one crash when accessing invalid OnlineUser pointer. the reason is unguarded pointer usage in ClientManager::setPkLock, user become offline and was deleted in another thread

void setPkLock(const User::Ptr& p, const string& aPk, const string& aLock) {

				OnlineUser* ou;

				{

						Lock l(cs);

						OnlineIterC i = onlineUsers.find(p->getCID());

						if(i == onlineUsers.end()) return;


						ou = i->second;

						ou->getIdentity().set("PK", aPk);

						ou->getIdentity().set("LO", aLock);

				}

				ou->getClient().updated(*ou); // <- ou valid only when ConnectionManager::cs locked

		}
well, i've changed code into
void setPkLock(const User::Ptr& p, const string& aPk, const string& aLock) {

				OnlineUser* ou;

				{

						Lock l(cs);

						OnlineIterC i = onlineUsers.find(p->getCID());

						if(i == onlineUsers.end()) return;


						ou = i->second;

						ou->getIdentity().set("PK", aPk);

						ou->getIdentity().set("LO", aLock);

						ou->getClient().updated(*ou); // !SMT!-fix

				}

		}

and got deadlock after 30 minutes

consider threads:

_____________________thread 3052:

UserConnection::on(BufferedSocketListener::Line, aLine = "$Lock=... Pk=...")

...fire(UserConnectionListener::CLock()) <= Speaker<UserConnectionListener>::cs locked!

ConnectionManager::on(UserConnectionListener::CLock)

ClientManager::setPkLock()

...Lock l(cs) <= ClientManager::cs locked!

Client::updated

...fire(ClientListener::UserUpdated() <= Speaker<ClientListener>::cs locked!

_____________________thread 2268:

NmdcHub::onLine(aLine = "$Search...")

...fire(ClientListener::NmdcSearch()) <= Speaker<ClientListener>::cs locked!

ClientManager::on(NmdcSearch)

...fire(ClientManagerListener::IncomingSearch()) <= Speaker<ClientManagerListener>::cs locked!

_____________________thread 844:

NmdcHub::onLine(aLine = "$MyInfo $All ...")

NmdcHub::getUser()

ClientManager::putOnline()

...fire(ClientManagerListener::UserConnected() <= Speaker<ClientManagerListener>::cs locked!

QueueManager::on(ClientManagerListener::UserConnected)

ConnectionManager::getDownloadConnection()

...Lock l(cs) <= ConnectionManager::cs locked!

_____________________thread 2216:

BufferedSocket::checkEvents()

BufferedSocket::fail()

...fire(BufferedSocketListener::Failed()) <= Speaker<BufferedSocketListener>::cs locked!

UserConnection::on(BufferedSocketListener::Failed)

...fire(UserConnectionListener::Failed()) <= Speaker<UserConnectionListener>::cs locked!

________________________________________________________________________________

___

as you see, each thread locked by next and last thread locked by first.

huh, it's really was not easy to debug, deadlock appeared when Apex got 96 threads.

i want to see locks hierarhy graph from program architect (every upper-level lock should not locked if lower-level lock is already locked), but i don't know who developed original 'client' library code

the only solution is not to pass OnlineUser& into on(UserUpdated) event, but User::Ptr instead. bugfix code will be available on new version (affects many files but now my working copy contains a lot of unfinished things, unrelated to this problem, needed to be done)

ps:

some functions in NmdcHub, also invokes getUser(nick) and this OnlineUser pointer is not guarded. i think, all ClientListener events, that passing OnlineUser& should be rewritten in same way

Share this post


Link to post
Share on other sites
Sign in to follow this  
Followers 0