Sunday, June 6, 2010

Troubleshooting Tips

Even with complete knowledge of this technology, it is impossible to implement a protocol stack or test cases that never need any troubleshooting. In reality, most engineers have relatively good knowledge of a specific area or a specific layer but much less knowledge of other layers. However, when a problem happens, we usually have to analyze across multiple layers, meaning we need knowledge of several layers and of the detailed interrelations between them.

In a word, there is no way to troubleshoot in a single shot and no shortcut for it. A third of it comes from knowledge, a third comes from experience, and the other third comes from the combination of the two.

In this section, I will try to put down some troubleshooting tips, mostly based on my experience.

Tools for troubleshooting

Generally, the more tools you have, the easier it is to troubleshoot. I hope to have at least the following tools as a minimum (in many cases even this minimum is not met, giving me more headache):

i) Logging tool on the network emulator (it should provide not only the signaling log (L3 and above) but also all the lower layer logs)
ii) Logging tool on the UE (here as well we need not only the signaling log but also all the lower layer logs)
iii) RF vector spectrum analyzer (it should have a good-quality zero-span mode with triggering capability; it helps a lot when troubleshooting the RACH process or handover process).

The 5 most important initial steps

The 5 most important steps for registration are as follows.
We have to know every detail of this process and all the factors influencing it.

i) RACH Preamble
ii) RACH Response (Msg 2)
iii) RRC Connection Request (Msg 3)
iv) RRC Connection Setup
v) RRC Connection Setup Complete

The first thing we have to consider is the timing requirement between each step and the following step. The time interval between i) and ii) is 0~12 subframes. The requirement between ii) and iii) is 6 subframes. The network should complete the lower layer configuration for Msg3 reception at least 4 subframes before Msg3 arrives at the network. (A small sketch for checking these timings from log timestamps is shown below.)
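The following is a minimal sketch, not a definitive tool, of how you might sanity-check these timings once you have pulled the absolute subframe numbers of Msg1, Msg2, and Msg3 out of the UE or network emulator log. The subframe values in the example are hypothetical; the constants simply reflect the requirements stated above.

# Minimal sketch: check the Msg1/Msg2/Msg3 timing requirements described above.
# Inputs are absolute subframe numbers taken from your logs (hypothetical here).
def check_rach_timing(preamble_sf, rar_sf, msg3_sf):
    issues = []
    if not (0 <= rar_sf - preamble_sf <= 12):
        issues.append("RA Response outside the 0~12 subframe window after the preamble")
    if msg3_sf - rar_sf < 6:
        issues.append("Msg3 (RRC Connection Request) earlier than 6 subframes after the RA Response")
    return issues or ["timing looks OK"]

# Example: preamble at subframe 100, RA Response at 105, Msg3 at 111
print(check_rach_timing(100, 105, 111))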


"No Service" on Power On

When you turn on a UE connected to the network simulator, you will see the "Searching Network..." message for several seconds, and you will be sweating a lot during this period if you are a protocol stack developer or test case developer.

If it goes to the next step and the UE starts registration, you will be happy. The problem happens when it stops searching and the "No Service" message pops up.

The first step would be to read sections 5.1 and 5.2 of 36.331 and get a clear understanding of the expected procedure on the UE and network side.

If I have the UE logging tool, I first check it to see whether the UE correctly decoded at least the MIB, SIB1, and SIB2. When the UE side log is not available, or the UE log shows that any of these was not received, we have to look at the network side log or the protocol stack source code if it is available. In most cases you will see that the MIB, SIB1, and SIB2 are not missing. Then why does the UE fail to decode them?

There are two possibilities that I can think of:
i) The scheduling information in SIB1 for the other SIBs is wrong, so that multiple SIBs overwrite each other (a small sketch for checking this is shown after this list).
ii) There is no problem in the scheduling, but the UE has some issue tuning to the specific schedule. (This kind of situation would not happen when the technology is mature, but it is possible at the initial phase of a technology like LTE, and I have experienced it.)
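For possibility i), here is a minimal sketch, assuming you can extract from your test case configuration the subframes on which each SI message is actually transmitted (expressed as a period in subframes plus an offset, both hypothetical inputs below). It simply flags any subframe where two SI messages land on top of each other.

# Minimal sketch: detect SI messages that collide on the same subframe.
def find_si_collisions(si_messages, horizon=10240):
    """si_messages: dict of name -> (period_in_subframes, offset_in_subframes)."""
    occupied, collisions = {}, []
    for name, (period, offset) in si_messages.items():
        for sf in range(offset, horizon, period):
            if sf in occupied:
                collisions.append((sf, occupied[sf], name))
            else:
                occupied[sf] = name
    return collisions

# Hypothetical schedule: SIB2 every 160 ms at offset 20, SIB3 every 320 ms at offset 20
print(find_si_collisions({"SIB2": (160, 20), "SIB3": (320, 20)}))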

Network detected but no further progress

"No Service" message shown on UE screen but no registration process starts. The first item you have to check at this stage is to check whether UE sent RACH or not. How do we verify this ?
i) Check UE log if the log says "RACH" get transmitted
ii) Check Network emulator log if it received "PRACH" signal (You need to have Network emulator which has very detailed logging capability to show this).
iii) Use spectrum analyzer to detect PRACH from UE (Since the signal analyzer does not know exactly the PRACH signal comes in and the PRACH is a burst type of signal, put the spectrum analyzer in zero-span mode and set the proper trigger for it).
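The zero-span check in iii) essentially amounts to looking for a burst of power that lasts roughly a preamble duration (about 1 ms for preamble format 0). Here is a rough sketch over a hypothetical power-vs-time trace; the threshold, sample spacing, and trace values are assumptions you would adapt to your own capture.

# Rough sketch: find PRACH-like bursts in a zero-span power trace (dBm samples).
def find_bursts(trace_dbm, sample_period_ms, threshold_dbm=-60.0, min_len_ms=0.8):
    bursts, start = [], None
    for i, p in enumerate(trace_dbm):
        if p > threshold_dbm and start is None:
            start = i
        elif p <= threshold_dbm and start is not None:
            if (i - start) * sample_period_ms >= min_len_ms:
                bursts.append((start * sample_period_ms, i * sample_period_ms))
            start = None
    return bursts  # list of (start_ms, end_ms) candidate bursts

trace = [-90.0] * 50 + [-40.0] * 12 + [-90.0] * 50   # fake trace with a 1.2 ms burst
print(find_bursts(trace, sample_period_ms=0.1))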

UE keeps sending PRACH

In the normal case, when the UE sends a PRACH preamble and the network sends the RACH Response, the UE is supposed to stop sending PRACH and initiate the RRC session by sending 'RRC Connection Request'. If the UE keeps sending PRACH, it means there is some issue with processing the 'RACH Response'.

i) Check the network emulator log to see whether it received the "PRACH" signal (you need a network emulator with very detailed logging capability to show this).
ii) Check the network emulator log to see whether it sent the RA Response.
iii) Check the network emulator log to see whether the timing requirement between PRACH reception and RA Response was satisfied (even if the network sends the RA Response, the UE keeps retrying the RACH process if the network sent it too late).
iv) Check the UE log to see whether it says the "RACH Response" was received.
v) Check the UE log to see whether the PRACH transmission and RA Response happened within the timing requirement.

Unfortunately, in this case it is hard to use a spectrum analyzer, because the downlink carries so many other signals that it is difficult to set a trigger on the analyzer to detect the RACH Response, unless the analyzer has specific decoding capability so that it can use the RACH Response itself as a trigger.

Another possibility would be the following case:
i) UE transmits the PRACH preamble
ii) Network sends the RACH Response
iii) UE properly decodes the RACH Response
iv) UE sends 'RRC Connection Request'
v) Network fails to decode 'RRC Connection Request' and does not send 'RRC Connection Setup'
vi) (Timeout waiting for 'RRC Connection Setup') UE reinitiates the PRACH process

If you look at the network side log, you will not see 'RRC Connection Request' even though the UE log says it sent the message.

The most common cause of this situation is related to steps iii) and iv). If you have read the 'Understand RACH !' section, you will remember that the RACH Response message carries an 'UL Grant', which basically carries the resource allocation for the 'RRC Connection Request' message. If the UE incorrectly decodes the 'RA Response' message, it will send 'RRC Connection Request' in the wrong location and the network will fail to decode it, even though the UE sent the message. Another possibility is on the network side: if the network sends a wrong RACH Response (a wrong UL Grant) which does not match its own MAC layer configuration for the UL CCCH, it will fail to decode the message. This kind of problem happens quite often when you create test cases for UE testing. If you have a working test scenario on a certain system bandwidth, then just change the system bandwidth and all of a sudden the RACH process fails, the first place to check is the UL Grant field of the RA Response message. (A small sketch for splitting this field is shown below.)
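As a starting point for that check, here is a minimal sketch for splitting the 20-bit UL Grant carried in the RA Response into its fields, using the field widths given in 36.213 section 6.2. The example grant value is hypothetical, and the interpretation of the 10-bit resource block assignment depends on the system bandwidth, so only the raw field values are shown here.

# Minimal sketch: split the 20-bit RAR UL Grant into its fields (36.213 6.2).
def split_rar_ul_grant(grant20):
    fields = [("hopping_flag", 1),
              ("rb_assignment", 10),   # needs further bandwidth-dependent interpretation
              ("truncated_mcs", 4),
              ("tpc", 3),
              ("ul_delay", 1),
              ("csi_request", 1)]
    out, shift = {}, 20
    for name, width in fields:
        shift -= width
        out[name] = (grant20 >> shift) & ((1 << width) - 1)
    return out

print(split_rar_ul_grant(0x12345))  # hypothetical 20-bit grant value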




Wednesday, June 2, 2010

DCI

When you study the physical frame structure of LTE, you may be impressed by the flexibility (meaning, in another way, the complexity) of all the possible ways of resource allocation. It is a combination of the time domain, the frequency domain, and the modulation scheme. Especially in the frequency domain, there are so many resource blocks you can use (100 resource blocks in the case of 20 MHz bandwidth), and if you think of all the possible permutations of these variables, the number is huge. Then you would have this question (at least I had this question): how can the other party (the receiving side) figure out exactly where in the slot and with which modulation scheme the sender (transmitter) transmitted the data (subframe)? I just captured the physical signal, but how can I (the receiver) decode this signal? This is where the term 'DCI (Downlink Control Information)' comes in.

It is the DCI which carries this detailed information, such as which resource blocks carry your data, which modulation scheme you have to use to decode the data, and some other additional information. It means you (the receiver) first have to decode the DCI, and based on the information you get from the DCI you can decode the real data. Without the DCI, decoding the data delivered to you is impossible. Not only in LTE, but in most wireless communication systems the receiver has the same problem (the same question). In WCDMA R99, the slot format and TFCI carry this information; in HSDPA, HS-SCCH carries it; and in HSUPA, E-TFCI carries it.

In terms of protocol implementation with respect to carrying this information, R99 seems to be the most complicated one. You had to define all the possible combinations of resource allocation in the form of a TFCS (a kind of look-up table for TFCI), you had to convey that information through L3 messages (e.g., the Radio Bearer Setup message and the RRC Connection Setup message), and the transmitter also had to configure itself according to the table. A lot of errors, meaning headaches, came from mismatches between the TFCS information configured in the L3 message and the configuration the transmitter applied to itself (the transmitter's lower layer configuration). It was too much of a headache for me. HSDPA relieved the headache a lot, since it carries this information directly on HS-SCCH and this job is done by the MAC layer. The resource allocation information carried by HS-SCCH is called 'TFRI'. So I did not have to care much about L3 messages, but I still needed to jump around multiple different 3GPP documents to define any meaningful TFRIs. Another complication was that even in HSDPA we still use the R99 DPCH for power control and signaling purposes, so I could not completely remove the headache of handling the TFCS. Now in LTE, this information is carried by the DCI as explained above, and we only have to care about a couple of parameters such as the number of RBs, the starting RB, and the modulation scheme, and I don't have to care about configuring these things in RRC messages. This is a kind of blessing to me.

As one example showing how/when DCI is used, refer to http://jaekuryu.blogspot.com/2010/01/lte-signalinig-essentials.html section "Uplink Data Transmission Scheduling - Persistent Scheduling"

Types of DCIs

DCI carries the following information:
i) UL resource allocations (persistent and non-persistent)
ii) Descriptions of the DL data transmitted to the UE.

L1 signaling is done by the DCI, and up to 8 DCIs can be carried in the PDCCH. These DCIs can have 6 formats: 1 format for UL scheduling, 2 formats for non-MIMO DL scheduling, 1 format for MIMO DL scheduling, and 2 formats for UL power control.

Format 0 : UL SIMO and UL Power Control. This functions as a Grant for UL transmission
Format 1, 1 A : DL SIMO and UL Power Control
Format 2 : DL MIMO and UL Power Control
Format 3 : UL Power Control Only (for multiple UEs)
Format 3A : UL Power Control Only (for multiple UEs)


DCI has various formats for the information sent to define resource allocations. The resource allocation information contains the following items:
i) the number of resource blocks being used
ii) the duration of the allocation in TTIs
iii) support for multiple antenna transmission

What determines the DCI format for a specific situation?

There are two major factors that determine the DCI format for a specific situation:
i) RNTI Type
ii) Transmission Mode

This means that you cannot change one of these parameters arbitrarily; you always have to think of the relationship between them whenever you change either one. Otherwise you will spend a long time troubleshooting :-)

The tables in 3GPP 36.213 (e.g., Tables 7.1-2 and 7.1-5) show the relationships between RNTI type, transmission mode, and DCI format. A small lookup sketch based on them follows.
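The following is a small, partial lookup in the spirit of 36.213 Table 7.1-5: which DCI formats a UE monitors when the PDCCH is scrambled with C-RNTI, for a few transmission modes. This is only an illustrative subset I have filled in by hand; check the table in the spec for the full and authoritative mapping.

# Partial (C-RNTI only) mapping from transmission mode to candidate DCI formats.
DCI_BY_TM_CRNTI = {
    1: {"DCI 1A", "DCI 1"},    # TM1: single antenna port
    2: {"DCI 1A", "DCI 1"},    # TM2: transmit diversity
    3: {"DCI 1A", "DCI 2A"},   # TM3: open-loop spatial multiplexing
    4: {"DCI 1A", "DCI 2"},    # TM4: closed-loop spatial multiplexing
}

def candidate_dci_formats(tm, rnti_type="C-RNTI"):
    if rnti_type != "C-RNTI" or tm not in DCI_BY_TM_CRNTI:
        raise ValueError("not covered by this simplified table")
    return DCI_BY_TM_CRNTI[tm]

print(candidate_dci_formats(1))   # -> {'DCI 1A', 'DCI 1'}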





Any relation between DCI format and Layer 3 signaling messages?

Yes, there is a relationship. You have to know which DCI format is required for which RRC message. The tables in 3GPP 36.321 show the relationship between RNTI and logical channel, and you already know which RRC message is carried by which logical channel. So with a two-step induction, you can figure out the link between an RRC message and its corresponding DCI format.


For example, if you look at the "Security Mode Command" message in section 6.2.2 of 36.331, it says:

Signalling radio bearer: SRB1
RLC-SAP: AM
Logical channel: DCCH
Direction: E-UTRAN to UE

If you look at the table, you will see this message uses C-RNTI, and you can figure out the possible candidates from Table 7.1-5 of 36.213. If you also have detailed information about the transmission mode, you can pinpoint exactly which DCI format you have to use for this message in a specific case. Assuming the transmission mode is TM1 and the scheduling is dynamic scheduling, Table 7.1-2 tells you that C-RNTI is used. With this RNTI type and TM, Table 7.1-5 shows that this case uses DCI Format 1 or DCI Format 1A. (The sketch below chains these two lookups.)
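Here is a minimal sketch of that two-step induction. The mappings below are small illustrative subsets I have filled in by hand (DCCH/DTCH under dynamic scheduling use C-RNTI; with C-RNTI and TM1 or TM2 the candidates are DCI 1A / DCI 1); see 36.321 and 36.213 Tables 7.1-2 and 7.1-5 for the complete picture.

# Minimal sketch: RRC message -> logical channel -> RNTI -> candidate DCI formats.
RNTI_BY_LOGICAL_CHANNEL = {"DCCH": "C-RNTI", "DTCH": "C-RNTI"}     # step 1 (36.321, dynamic scheduling)
DCI_BY_RNTI_AND_TM = {("C-RNTI", 1): {"DCI 1A", "DCI 1"},
                      ("C-RNTI", 2): {"DCI 1A", "DCI 1"}}           # step 2 (36.213 Table 7.1-5, subset)

def dci_formats_for_rrc_message(logical_channel, tm):
    rnti = RNTI_BY_LOGICAL_CHANNEL[logical_channel]
    return rnti, DCI_BY_RNTI_AND_TM[(rnti, tm)]

# Security Mode Command: carried on SRB1 -> DCCH logical channel; assume TM1
print(dci_formats_for_rrc_message("DCCH", 1))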

DCI Decoding Examples

Example 1 > DCI Format 0, value = 0x2584A800

You can figure out Start_RB and N_RB (the number of allocated RBs) from the RIV value carried in the resource block assignment field.

How can I calculate Start_RB and N_RB from the RIV? The simple calculation (valid for the non-mirrored case, i.e., when the allocation does not exceed roughly half the system bandwidth; otherwise the mirrored RIV formula in 36.213 applies) is as follows:
i) N_RB = Floor(RIV / MAX_N_RB) + 1 = Floor(1200 / 50) + 1 = 25, where MAX_N_RB = 50 in this case since this is a 10 MHz system BW.
ii) Start_RB = RIV mod MAX_N_RB = 1200 mod 50 = 0
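For completeness, here is a minimal sketch of the whole decoding, assuming the 32-bit value above is the DCI Format 0 payload left-aligned (MSB first) with the field order of 36.212 5.3.3.1.1 (format 0/1A flag, hopping flag, then the resource block assignment field). Under that assumption, the RIV field of 0x2584A800 works out to 1200, which is where the number used above comes from.

# Sketch: recover RIV, Start_RB, and N_RB from a DCI Format 0 word (FDD, 10 MHz).
import math

def decode_dci0_riv(dci_word, n_ul_rb=50, word_bits=32):
    riv_bits = math.ceil(math.log2(n_ul_rb * (n_ul_rb + 1) / 2))   # 11 bits for 50 RBs
    bits = format(dci_word, f'0{word_bits}b')                      # MSB-first bit string
    riv = int(bits[2:2 + riv_bits], 2)                             # skip format flag + hopping flag
    n_rb = riv // n_ul_rb + 1                                      # number of allocated RBs
    start_rb = riv % n_ul_rb                                       # first allocated RB
    return riv, start_rb, n_rb                                     # non-mirrored case only

print(decode_dci0_riv(0x2584A800))   # -> (1200, 0, 25)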