I got a call saying the database server is inaccessible because tempdb is full. I get online and try to connect because I want to see the error for myself. Even when someone gives me a screenshot; I can’t help myself, I want to see it firsthand. I confirm, yes, it’s that error they’ve given me:
I can’t connect via SSMS. Now, I could have just restarted the SQL Server service to clear out tempdb and start over, especially since it’s a production issue causing a customer outage. The problem with this is that I won’t know who the problem child or children are then. I will have no idea what filled up tempdb or how we can prevent it in the future.
Now is where I thank my past self because I made sure that the dedicated admin connection (DAC) is enabled on all of my SQL Servers. If you don’t know what that is or how to enabled it, click here and read Kendra Little’s explanation. The TL;DR is you can connect via DAC even if you can’t connect with an error like in the above screenshot (and others where the server may be too busy to allow connections), but DAC has a specific way you need to connect.
As part of this post, you will learn how to:
How to connect with the DAC via SSMS
- First, you have to be remoted into the machine where SQL Server is running.
- To connect with DAC in SSMS, you will need to connect with the Database Engine Query button, which you can see highlighted below:
- When the login dialog box pops up, you will need to prefix your connection with Admin: as shown below:
- Once you’ve connected, you will see something like the following screenshot in SSMS, which shows you are connected with the DAC:
Important note: You can make one connection to the DAC. If you try to connect with it a second time, you will get an error:
Good news though, one is all you need to get in there and see what’s going on.
Querying to find the root cause of a full tempdb
The next challenge with a full tempdb is that you can’t join anything to anything else. Not even system tables or views. If you are used to using sp_whoisactive or sp_who2, you can’t use them with a full tempdb. You have to resort to unjoined queries to get to the root cause.
To begin with, I used sp_who to get a list of connections and what dbname they are using. This is when I noticed a user creating a table in tempdb:
At this point, I assumed that this user may be the culprit, but I also continued to look around if there were any other users doing anything suspect. I also looked at any sessions that were in a state of awaiting command on tempdb:
Count of connections by user
I noticed with sp_who there were a LOT of connections from the application, so I did a query to get a list of total connections by user:
SELECT a.* FROM (SELECT DB_NAME(dbid) as DBName, COUNT(dbid) as NumberOfConnections, loginame as LoginName FROM sys.sysprocesses WHERE dbid > 0 GROUP BY dbid, loginame) a ORDER BY a.DBName, a.LoginName;
The above query will give you results like this:
There is nothing very alarming about those connections since they are on my test system. The application was connected 104 times on the production system, which seems a bit more alarming to me. Thing is, I don’t know how many connections it normally has so I had to continue on with specifically looking at tempdb connections.
Determining what query each session is executing
Back to sp_who results, I noted the spids on any sessions that were “using” tempdb even if the session was in a status of sleeping. At this point I ran the following query on each spid:
DBCC INPUTBUFFER(63) GO
This returns something like the following screenshot, which gives you the EventInfo. The EventInfo is the query the user is trying to execute.
In this case, I created a query to fill up tempdb (thank you to dbdigger for that query) so that’s what you are seeing in the results above, and in this case, I know it’s the culprit. With the production incident, it seemed to be a ROW_NUMBER() OVER PARTITION BY query that caused the issue. We grant direct database access to users in many cases (I know it’s not a great idea and it predates my time there), so it can be a crap shoot what queries are running.
While I was digging around, I wanted to know exactly why the application was using tempdb. It turns out it was some stored procs. I needed to see the stored proc definition, which you can’t do via the SSMS GUI because you are connected via DAC. You won’t be able to navigate via the GUI. In this case, I had to use the following query to get the stored proc definition:
SELECT Name, OBJECT_DEFINITION(object_id) AS [Definition] FROM sys.procedures WHERE OBJECT_DEFINITION(OBJECT_ID) LIKE '%storedprodname%' GO
The stored proc is creating temp tables, so not a big surprise, but it does this all the time, day in and day out, so I assumed it was still a one-off user query that had been filling up tempdb.
I went ahead and killed that user session.
That didn’t seem to allow the app to work again. This is the app that external clients use to get data they pay us for, so I didn’t want to sit around waiting any longer. I went ahead and killed all tempdb queries to make sure tempdb freed up as fast as possible.
SELECT 'KILL ' + CAST(session_id as varchar(100)) AS Sessions_to_kill FROM sys.dm_exec_requests where session_id in (63,64,72) GO
Then things started to flow again.
It was an eye-opening experience. For the user, who got cut off from production access, but also for me because I never had to use the DAC before. I didn’t realize that my normal troubleshooting queries wouldn’t work. Again, thank you past self for enabling the DAC. I knew I might need it someday but never thought about that specific day until it was upon me.