Sunday, June 6, 2010

Troubleshooting Tips

Even if you know this technology thoroughly, it is impossible to implement a protocol stack or a set of test cases that needs no troubleshooting. In reality, most engineers have relatively good knowledge of a specific area or a specific layer but much less knowledge of the other layers. However, when a problem happens, we usually have to analyze it across multiple layers, which means we need knowledge of several layers and of the detailed interrelations between them.

In a word, there is no way to troubleshoot in a single shot and there is no shortcut. A third of troubleshooting skill comes from knowledge, a third comes from experience, and the remaining third comes from the combination of the two.

In this section, I will try to put down some troubleshooting tips, mostly based on my experience.

Tools for troubleshooting

Generally, the more tools you have, the easier it is to troubleshoot. I hope to have at least the following tools as a minimum (in many cases even this minimum is not met, which gives me more headaches).

i) Logging tools on the network emulator (they should provide not only the signaling log (L3 and above) but also all the lower layer logs)
ii) Logging tools on the UE (in this case as well, we need not only the signaling log but also all the lower layer logs)
iii) RF Vector Spectrum Analyzer (this should have a good quality zero-span mode with triggering capability; it helps a lot when troubleshooting the RACH process or the handover process)

The most important initial 5 steps

We have to know every detail of the 5 most important steps for registration, and all the factors influencing them. The steps are as follows:

i) RACH Preamble
ii) RACH Response (Msg 2)
iii) RRC Connection Request (Msg 3)
iv) RRC Connection Setup
v) RRC Connection Setup Complete

The first thing we have to consider is the timing requirement between each step and the next. The time interval between i) and ii) is 0~12 subframes. The requirement between ii) and iii) is 6 subframes. The network should complete the lower layer configuration for Msg3 reception at least 4 subframes before Msg3 arrives at the network.
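
To make these timing rules concrete, here is a minimal Python sketch of how I might check them against timestamps pulled from a log. The function name, the argument names and the idea of expressing every event as an absolute subframe count are my own assumptions, not something any particular logging tool provides. The three limits are simply the numbers quoted above, and treating the 6 subframe gap as an exact value is also an assumption.

```python
# Hypothetical helper: checks the three timing rules quoted in the text.
# All arguments are absolute subframe counts (1 subframe = 1 ms).

MAX_PREAMBLE_TO_RAR = 12   # i) -> ii): 0 to 12 subframes (from the text)
RAR_TO_MSG3 = 6            # ii) -> iii): 6 subframes (assumed exact here)
MIN_CONFIG_LEAD = 4        # lower layer config ready >= 4 subframes before Msg3


def check_rach_timing(preamble_sf, rar_sf, msg3_sf, msg3_config_sf):
    """Return a list of human-readable timing violations (empty if none)."""
    problems = []

    gap = rar_sf - preamble_sf
    if not 0 <= gap <= MAX_PREAMBLE_TO_RAR:
        problems.append(f"RAR came {gap} subframes after the preamble")

    gap = msg3_sf - rar_sf
    if gap != RAR_TO_MSG3:
        problems.append(f"Msg3 came {gap} subframes after the RAR")

    lead = msg3_sf - msg3_config_sf
    if lead < MIN_CONFIG_LEAD:
        problems.append(f"Msg3 config was ready only {lead} subframes early")

    return problems


if __name__ == "__main__":
    # Made-up subframe numbers, for illustration only.
    for line in check_rach_timing(100, 120, 126, 125):
        print(line)
```

An empty list means all three rules were met. With the made-up numbers in the example, the response is too late and the Msg3 configuration is not ready early enough, so two problems are reported.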


"No Service" on Power On

When you turn on a UE connected to the network simulator, you will see a "Searching Network..." message for several seconds, and you will be sweating a lot during this period if you are a protocol stack developer or a test case developer.

If the UE goes on to the next step and starts registration, you will be happy. The problem is when it stops searching and a "No Service" message pops up.

The first step would be to read sections 5.1 and 5.2 of 36.331 and get a clear understanding of the expected procedure on the UE and network sides.

If I have a UE logging tool, I would first check it to see whether the UE correctly decoded at least the MIB, SIB1 and SIB2. When the UE side log is not available, or the UE log shows that any one of these was not received, we have to look at the network side log or the protocol stack source code, if available. In most cases you would see that the MIB, SIB1 and SIB2 are not missing, i.e. the network does transmit them. Then why does the UE fail to decode them?

Two possibilities that I can think of:
i) The scheduling information in SIB1 for the other SIBs is wrong, so that multiple SIBs overwrite each other.
ii) There is no problem in the scheduling, but the UE has some issue with being tuned to the specific schedule. (This kind of situation should not happen when the technology is mature, but it is possible in the initial phase of a technology like LTE, and I have experienced it.)
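
One way to sanity-check possibility i) is to compute where each SI-window falls and look for windows that land on top of each other. The sketch below assumes the SI-window rule as I recall it from 36.331 section 5.2.3 (x = (n - 1) * w, window starts at subframe x mod 10 of a radio frame where SFN mod T = floor(x / 10)); please verify it against the spec before relying on it. The function names and the 640 subframe horizon are my own choices for illustration.

```python
from itertools import combinations


def si_windows(window_len_sf, periodicities_rf, horizon_sf):
    """List (msg_index, start_sf, end_sf) for every SI-window in the horizon.

    window_len_sf    : si-WindowLength in subframes (w)
    periodicities_rf : si-Periodicity (T) of each SI message in radio frames,
                       in the order they appear in SIB1
    """
    windows = []
    for n, period_rf in enumerate(periodicities_rf, start=1):
        x = (n - 1) * window_len_sf
        if x // 10 >= period_rf:
            # SFN mod T can never equal floor(x / 10): no window ever occurs.
            raise ValueError(f"SI message {n} can never be scheduled")
        start = x                      # equals 10 * floor(x / 10) + x mod 10
        while start < horizon_sf:
            windows.append((n, start, start + window_len_sf))
            start += period_rf * 10    # next occurrence, T radio frames later
    return windows


def overlapping_si_windows(window_len_sf, periodicities_rf, horizon_sf=640):
    """Return sorted pairs of SI message indices whose windows overlap."""
    clashes = set()
    wins = si_windows(window_len_sf, periodicities_rf, horizon_sf)
    for (n1, s1, e1), (n2, s2, e2) in combinations(wins, 2):
        if n1 != n2 and s1 < e2 and s2 < e1:
            clashes.add((min(n1, n2), max(n1, n2)))
    return sorted(clashes)


if __name__ == "__main__":
    # Example values only: a 40 ms window with three SI messages.
    print(overlapping_si_windows(40, [8, 8, 16]))
    # The same messages with a 20 ms window.
    print(overlapping_si_windows(20, [8, 8, 16]))
```

With these example values, the 40 ms window makes the third SI message land on the second occurrence of the first one, while the 20 ms window leaves all three apart. This only models the SI messages listed in SIB1, not the MIB or SIB1 itself.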

Network detected but no further progress

The "No Service" message is shown on the UE screen but no registration process starts. The first thing to check at this stage is whether the UE sent a RACH preamble or not. How do we verify this?
i) Check the UE log to see whether it says "RACH" was transmitted.
ii) Check the network emulator log to see whether it received the "PRACH" signal (you need a network emulator with very detailed logging capability to show this).
iii) Use a spectrum analyzer to detect the PRACH from the UE (since the analyzer does not know exactly when the PRACH signal comes in, and the PRACH is a burst type of signal, put the spectrum analyzer in zero-span mode and set a proper trigger for it).
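
If the analyzer can export the zero-span trace (power versus time), a simple threshold search is often enough to confirm whether any burst is there at all. The sketch below is only an illustration: it assumes the trace is available as a plain list of power values in dBm, and the trace values and the -60 dBm threshold are made up.

```python
def find_bursts(trace_dbm, threshold_dbm):
    """Return (first_index, last_index) for each run of samples above threshold."""
    bursts = []
    start = None
    for i, power in enumerate(trace_dbm):
        if power > threshold_dbm and start is None:
            start = i                       # rising edge: burst begins
        elif power <= threshold_dbm and start is not None:
            bursts.append((start, i - 1))   # falling edge: burst ended
            start = None
    if start is not None:                   # trace ended while still in a burst
        bursts.append((start, len(trace_dbm) - 1))
    return bursts


if __name__ == "__main__":
    # Made-up trace: noise floor around -90 dBm with two short bursts.
    trace = [-90, -90, -30, -31, -90, -29, -90]
    print(find_bursts(trace, threshold_dbm=-60))
```

An empty result over a long enough capture suggests the UE never transmitted, which points back to the UE side rather than to the network.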

UE keeps sending PRACH

In the normal case, if the UE sends PRACH and the network sends a RACH Response, the UE is supposed to stop sending PRACH and initiate the RRC session by sending 'RRC Connection Request'. If the UE keeps sending PRACH, it means there is some issue with the 'RACH Response' process.

i) Check the network emulator log to see whether it received the "PRACH" signal (you need a network emulator with very detailed logging capability to show this).
ii) Check the network emulator log to see whether it sent the RA Response.
iii) Check the network emulator log to see whether the timing requirement between PRACH reception and RA Response has been satisfied. (Even though the network sends the RA Response, the UE keeps retrying the RACH process if the response was sent too late.)
iv) Check the UE log to see whether it says "RACH Response" was received.
v) Check the UE log to see whether the PRACH transmission and the RA Response happened within the timing requirement.

Unfortunately, in this case it is hard to use a spectrum analyzer: the downlink carries so many other signals that it is hard to set a trigger on the spectrum analyzer to detect the RACH Response, unless the analyzer has a specific decoding capability that lets it use the RACH Response itself as a trigger.

Another possibility would be the following case:
i) The UE transmits the PRACH Preamble
ii) The network sends the RACH Response
iii) The UE properly decodes the RACH Response
iv) The UE sends 'RRC Connection Request'
v) The network fails to decode 'RRC Connection Request' and does not send 'RRC Connection Setup'
vi) (Timeout for 'RRC Connection Setup') The UE reinitiates the PRACH process

If you look at the network side log, you would not see 'RRC Connection Request' even though the UE log says it sent the message.
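
This kind of cross-check between the two logs is easy to automate once both logs are reduced to an ordered list of message names. The sketch below is a minimal illustration under that assumption. The function name is hypothetical and the message lists are made up to mirror the case above, they are not the output of any real tool.

```python
def first_lost_message(sent, received):
    """Return the first message in `sent` that has no match in `received`.

    Both arguments are lists of message names in time order. Messages are
    matched in order, so a name sent twice must also be received twice.
    Returns None if every sent message was received.
    """
    pos = 0
    for name in sent:
        try:
            # Search only after the previously matched message.
            pos = received.index(name, pos) + 1
        except ValueError:
            return name
    return None


if __name__ == "__main__":
    # Made-up uplink logs mirroring steps i) to vi) above.
    ue_sent = ["PRACH Preamble", "RRC Connection Request", "PRACH Preamble"]
    nw_received = ["PRACH Preamble", "PRACH Preamble"]
    print(first_lost_message(ue_sent, nw_received))
```

Here the first uplink message that never shows up on the network side is 'RRC Connection Request', which is exactly the symptom of this case.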

The most common cause of this situation is related to steps iii) and iv). If you read the 'Understand RACH !' section, you will remember that the RACH Response message carries a 'UL Grant', which basically carries the resource allocation for the 'RRC Connection Request' message. If the UE decodes the RA Response message incorrectly, it will send 'RRC Connection Request' in the wrong location and the network will fail to decode it even though the UE sent the message. Another possibility is on the network side. If the network sends a wrong RACH Response message (a wrong UL Grant) that differs from its own MAC layer setting for UL CCCH, it will fail to decode the message. This kind of problem happens pretty often when you create test cases for UE testing. Suppose you have a working test scenario at a certain system bandwidth, you change only the system bandwidth, and all of a sudden the RACH process fails. In this case the first place to check is the UL Grant field of the RA Response message.
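
To see why a bandwidth change alone can break the UL Grant, it helps to look at how a resource block allocation is packed into a single number. The sketch below assumes the allocation is coded as a resource indication value (RIV) using the formula I recall from 36.213 section 8.1. It does not model the truncation or padding of the fixed size resource block assignment field described in 36.213 section 6.2, so treat it as an illustration, not as a full UL Grant decoder. The function names and the 50 and 100 RB values (10 MHz and 20 MHz) are just example choices.

```python
def encode_riv(n_rb_ul, rb_start, length):
    """Pack (rb_start, length) into a RIV for an uplink bandwidth of n_rb_ul RBs."""
    if length < 1 or rb_start < 0 or rb_start + length > n_rb_ul:
        raise ValueError("allocation does not fit in the bandwidth")
    if (length - 1) <= n_rb_ul // 2:
        return n_rb_ul * (length - 1) + rb_start
    return n_rb_ul * (n_rb_ul - length + 1) + (n_rb_ul - 1 - rb_start)


def decode_riv(n_rb_ul, riv):
    """Unpack a RIV into (rb_start, length) for an uplink bandwidth of n_rb_ul RBs."""
    length = riv // n_rb_ul + 1
    rb_start = riv % n_rb_ul
    if rb_start + length > n_rb_ul:
        # The value was produced by the second branch of encode_riv.
        length = n_rb_ul - length + 2
        rb_start = n_rb_ul - 1 - rb_start
    return rb_start, length


if __name__ == "__main__":
    # Allocation of 2 RBs starting at RB 2, encoded for a 50 RB bandwidth.
    riv = encode_riv(50, rb_start=2, length=2)
    print(riv)
    # Same value read with the right and with the wrong bandwidth.
    print(decode_riv(50, riv))
    print(decode_riv(100, riv))
```

With these example values, the number that means "2 RBs starting at RB 2" at 50 RBs is read as "1 RB starting at RB 52" at 100 RBs. So if a test case keeps a hard-coded grant value after the bandwidth is changed, the UE and the network MAC end up looking at different resources.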