Welcome to Digi Forum, where you can ask questions and receive answers from other members of the community.

How can we monitor packet statistics on the ConnectWiME-9210?

This question aims to find a method for low-level monitoring of WiFi network "health" on a ConnectWiME-9210.

This would be similar to using the void GetEthStats(Stacktest_stats_t *stats) function on the wired version.

History - We use both the ConnectME-9210 and ConnectWiME-9210 products. On a number of occasions the WiME has shown every sign of being connected (amber light solid; wlan_stats showing good signal strength, SSID, IP address, netmask, etc.), but all data activity is dead: there is no blinking green light, and the internal connection log (viewed after a reboot) shows failures to reach the internet, while all other threads appear to be working correctly. The device may have been up for many weeks or only a few days.

A power cycle is required to correct the condition, and it may not recur for days, weeks or months. We also have the watchdog timer enabled on the devices, but whatever is happening does not stall the thread scheduler, so the watchdog never triggers a reboot.

So, as an interim measure, we would like a method to determine whether the Redpine-to-Digi interface or some other pipe has stopped functioning correctly, and then reboot the device to restore network connectivity.

Ideally we would simply take down the WiFi driver and restart it, so as not to interfere with the other functionality and ongoing tasks running on the box. A Redpine reset function is exposed, but what is missing are the steps required to gracefully take the network down, reset the Redpine, and bring everything back up.

The goal is not to reset things simply because there is no connection. Instead, we would first check whether the driver feedback from naWlnGetStatus (examining variables such as wlnStatus.state, wlnStatus.rx_signal, wlnStatus.bss_addr[x], wlnStatus.ssid, etc.) reports a "valid" state, and then examine data activity (IP data activity or similar). If the two conflict about the state of the network, we would reset it, logging the event beforehand so that we have a history of these events.
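One way to express the "conflict" test described above is sketched below. It is a minimal sketch, not NET+OS code: `wifi_status_snapshot`, `status_looks_valid` and `wifi_status_conflicts` are hypothetical names, and the snapshot only mirrors the wlnStatus fields mentioned here (the real wln_status layout and state constants come from the NET+OS headers). How `data_seen_in_window` is computed, for example from IP activity or driver counters, is left to the application.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* HYPOTHETICAL mirror of the wln_status fields named in the question.
 * Check the real wln_status layout in the NET+OS headers before adapting. */
typedef struct {
    bool    associated;   /* derived from wlnStatus.state (assumed)   */
    int     rx_signal;    /* wlnStatus.rx_signal                      */
    uint8_t bss_addr[6];  /* wlnStatus.bss_addr                       */
    char    ssid[33];     /* wlnStatus.ssid, NUL-terminated (assumed) */
} wifi_status_snapshot;

/* True if the driver claims a usable association: associated state,
 * non-zero BSSID, non-empty SSID and a non-zero received signal.
 * tx_rate, rx_rate and tx_power are deliberately not checked, because
 * the Redpine driver reports them as 0 on this platform. */
static bool status_looks_valid(const wifi_status_snapshot *s)
{
    static const uint8_t zero_mac[6] = {0};

    if (!s->associated)
        return false;
    if (memcmp(s->bss_addr, zero_mac, sizeof zero_mac) == 0)
        return false;
    if (s->ssid[0] == '\0')
        return false;
    return s->rx_signal != 0;
}

/* A conflict: the status says "connected" but no data activity was seen
 * during the monitoring window. A plain "not connected" status is not a
 * conflict and does not, by itself, justify a reset. */
static bool wifi_status_conflicts(const wifi_status_snapshot *s,
                                  bool data_seen_in_window)
{
    return status_looks_valid(s) && !data_seen_in_window;
}
```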

We are still trying to determine whether this is a hardware-related issue: when units have been replaced in the field, the problem seems to have been resolved. The catch is that when we then set up those devices in our lab, they run without issue for weeks.

Another noteworthy point is that the wired 9210 devices never show this behavior, running the same user software compiled for each version.

We are open to any and all advice; if anyone has experience using the ConnectWiME-9210, please share it.

I would much rather determine the cause of the issue than use the "wifi watchdog" approach, because it tends to hide real issues and I am against that. However, reliability comes first, especially with devices that are controlled remotely from great distances.

At a minimum, a good 'hook' into the data stream might provide clues as to what is broken and where.

Thanks!
Brooks
asked Jul 7, 2014 in NET+OS by bdorroh New to the Community (32 points)
edited Jul 7, 2014 by bdorroh
Has there been any resolution to this issue?  In our application that is also running on the Digi Wi-ME 9210 module we are seeing this same behavior.  We also did not see any of these issues with the wired ME 9210 modules.
Hello

Digi International has issued an update to the Redpine driver (the wireless driver for the wi-me9210) through the package manager, as follows:

In previous releases, the description of the API naWlnGetStatistics did NOT include support for the wi-me9210. This has been corrected. The only caveat is that the fields tx_bc_frames and rx_bc_frames of the filled-in structure will always be 0 (zero). You can therefore look at the fields tx_bytes, tx_frames, rx_bytes and rx_frames to get a feel for whether the driver has "locked up" or not.
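With these four counters, a stall check reduces to comparing two samples. The sketch below assumes a hypothetical `wln_counters` mirror of the structure naWlnGetStatistics fills in (the real type name and field widths come from the NET+OS headers) and simply ignores the two broadcast counters.

```c
#include <stdbool.h>
#include <stdint.h>

/* HYPOTHETICAL mirror of the six fields returned by naWlnGetStatistics.
 * The uint32_t widths are an assumption; use the real NET+OS structure. */
typedef struct {
    uint32_t tx_bytes, tx_frames, rx_bytes, rx_frames;
    uint32_t tx_bc_frames, rx_bc_frames;  /* always 0 on the wi-me9210 */
} wln_counters;

/* True if the driver moved any data since the previous sample.
 * Only the four fields the driver fills in are compared; the broadcast
 * counters never change on this platform and are ignored. Testing for
 * inequality rather than "greater than" means a counter that wrapped
 * around is still seen as progress. */
static bool wln_counters_advanced(const wln_counters *prev,
                                  const wln_counters *cur)
{
    return cur->tx_bytes  != prev->tx_bytes  ||
           cur->tx_frames != prev->tx_frames ||
           cur->rx_bytes  != prev->rx_bytes  ||
           cur->rx_frames != prev->rx_frames;
}
```

A monitor thread could take a sample at a fixed interval and treat only several consecutive samples without progress as a "locked up" driver; the interval and the count are application choices.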

In previous versions of NET+OS that supported the wi-me9210, the API naWlnStopDriver was not supported on the wi-me9210. Thus a method for stopping and restarting the driver was not supplied. This has been corrected in this release. If you perceive that the driver is "locked up" as discussed above, with this NET+OS patch you can now stop and restart the wireless driver.
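The stop/restart sequence, with the logging and the fallback to a full reboot discussed in the question, could be organized as a small state machine that a monitor thread drives one step at a time. This is a hypothetical design, not a NET+OS API: only naWlnStopDriver is named in this thread and its signature is not shown here, so the actual driver calls are left to the caller, and `recovery_state`, `recovery_begin` and `recovery_advance` are illustrative names.

```c
#include <stdbool.h>

/* Steps of a HYPOTHETICAL recovery sequence. The caller performs the
 * current step, then reports whether it succeeded. */
typedef enum {
    STEP_LOG,           /* persist the event before touching the driver */
    STEP_STOP_DRIVER,   /* e.g. a wrapper around naWlnStopDriver        */
    STEP_START_DRIVER,  /* application-specific driver restart          */
    STEP_VERIFY_LINK,   /* e.g. wait for tx/rx counters to move again   */
    STEP_DONE,
    STEP_REBOOT
} recovery_step;

typedef struct {
    recovery_step step;
    int restarts;       /* driver restarts attempted so far */
    int max_restarts;
} recovery_state;

static void recovery_begin(recovery_state *rs, int max_restarts)
{
    rs->step = STEP_LOG;
    rs->restarts = 0;
    rs->max_restarts = max_restarts;
}

/* Advance after the current step has been carried out; ok reports
 * whether it succeeded. Returns the next step to perform. */
static recovery_step recovery_advance(recovery_state *rs, bool ok)
{
    switch (rs->step) {
    case STEP_LOG:
        rs->step = STEP_STOP_DRIVER;  /* a failed log write does not block recovery */
        break;
    case STEP_STOP_DRIVER:
        rs->step = ok ? STEP_START_DRIVER : STEP_REBOOT;
        break;
    case STEP_START_DRIVER:
        rs->restarts++;
        rs->step = ok ? STEP_VERIFY_LINK : STEP_REBOOT;
        break;
    case STEP_VERIFY_LINK:
        if (ok)
            rs->step = STEP_DONE;
        else if (rs->restarts < rs->max_restarts)
            rs->step = STEP_STOP_DRIVER;  /* try another restart */
        else
            rs->step = STEP_REBOOT;
        break;
    default:
        break;                            /* DONE and REBOOT are terminal */
    }
    return rs->step;
}
```

Keeping the sequence as data rather than callbacks lets the monitor thread log each transition and stop at any step without blocking other tasks on the box.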

This is available as an update to NET+OS V7.5.2.

Hopefully this will help you with your issues.


1 Answer

I'm afraid NET+OS offers little for extracting statistics from the Redpine radio this module uses.
The only one that comes to mind is:
int naWlnGetStatus(wln_status * status);
Arguments
  status - area to store data

Return values
  0 - successful
  negative - error

See also
  wln_status

Version information
  Available since version 6.0

Notes
On the connectwime9210 platform, the fields tx_rate, rx_rate and tx_power are always 0, as returned by the Redpine driver, which adjusts these values automatically at runtime.

Linux has
IOCTL Name: OID_RPS_GET_STATS

Description

This command can be issued only when the driver is operating in the RF Evaluation mode.

This ioctl requests the Lite-Fi™ device to return various vital statistics of the wireless connection. The following structure describes the statistics of the wireless connection:

typedef struct _RSI_API_WLAN_STATS {
u16 tx_frames;
u16 rx_frames;
u16 tx_retries;
u16 rx_retries;
u16 rx_duplicates;
u16 crc_errors;
u16 buffer_full;
u16 cca_stuck;
u16 false_rxstart;
u16 false_cca;
u16 rx_data;
u16 tx_rate;
u16 false_trigger;
u16 signal_power;
u16 noise_pwr;
u16 rssi_loc;
u16 tx_ack;
u16 rx_ack;
u32 false_cca_time;
u32 idle_time;
u32 tsf;
u16 time_in_backoff;
u16 titout_drops;
u16 beacon_loss_cnt;
} RSI_API_WLAN_STATS;
tx_frames - number of successfully transmitted frames. (data, management + control frames).
rx_frames - number of successfully received frames (data, management + control frames).
tx_retries - total number of retransmitted frames.
rx_retries - total number of retransmitted frames received.
rx_duplicates - total number of duplicated frames.
crc_errors - total number of CRC errors.
cca_stuck - When the CCA signal goes low, but no frame is received before the timeout.
false_rxstart - Whenever CCA signal goes high, while receiving Rx vector.
false_cca - Whenever CCA goes low and before the interrupt comes, the CCA goes high again.
rx_data - Number of data frames received.
tx_rate - The rate at which the frame is transmitted.
false_trigger - Whenever CCA goes high, while receiving rx header.
signal_power - Signal power of the received frame.
noise_pwr - Noise power of the received frame.
rssi_loc - Calculated RSSI of the received frame.
tx_ack - total number of transmitted ACK frames.
rx_ack - total number of received ACK frames.
false_cca_time - Total time spent in false CCA.
idle_time - Total idle time.
tsf - Desired time interval at which the credibility of the frame is to be checked.
time_in_backoff - Total time spent in backoff.
titout_drops - Number of times packets have been dropped due to backoff.
beacon_loss_cnt - Number of beacons that are lost or not received within the time interval.
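Because these counters are totals, a monitor would typically compare two samples rather than read absolute values. The sketch below assumes the counters are cumulative and wrap at 16 bits, uses `uint16_t` in place of the driver's `u16`, and copies only four fields into a hypothetical `wlan_stats_sample`; `retries_per_100_tx` is an illustrative metric, not one the driver reports. As noted above, this ioctl is only available in RF Evaluation mode.

```c
#include <stdint.h>

/* HYPOTHETICAL subset of RSI_API_WLAN_STATS with fixed-width types.
 * ASSUMPTION: the counters are cumulative and wrap at 16 bits. */
typedef struct {
    uint16_t tx_frames;
    uint16_t tx_retries;
    uint16_t crc_errors;
    uint16_t beacon_loss_cnt;
} wlan_stats_sample;

/* Per-interval view derived from two samples. */
typedef struct {
    uint16_t tx_frames, tx_retries, crc_errors, beacons_lost;
    unsigned retries_per_100_tx;  /* 0 when nothing was sent */
} wlan_stats_interval;

/* Difference of two 16-bit counters, correct across a single wrap. */
static uint16_t delta16(uint16_t prev, uint16_t cur)
{
    return (uint16_t)(cur - prev);
}

static wlan_stats_interval wlan_stats_diff(const wlan_stats_sample *prev,
                                           const wlan_stats_sample *cur)
{
    wlan_stats_interval d;

    d.tx_frames    = delta16(prev->tx_frames, cur->tx_frames);
    d.tx_retries   = delta16(prev->tx_retries, cur->tx_retries);
    d.crc_errors   = delta16(prev->crc_errors, cur->crc_errors);
    d.beacons_lost = delta16(prev->beacon_loss_cnt, cur->beacon_loss_cnt);
    d.retries_per_100_tx = d.tx_frames
        ? (unsigned)d.tx_retries * 100u / d.tx_frames
        : 0u;
    return d;
}
```

A health monitor would watch how these interval values change (for example beacons_lost climbing while tx_frames stays flat) rather than the raw totals.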

IOCTL name: OID_RPS_GET_STATS
Default value: N/A
Input parameter: None
Output parameter: A variable of type RSI_API_WLAN_STATS to get the statistics. Output size: 36 bytes.
Explicit reset required: No
Usage

To get the statistics, enter the following at the command prompt:



# iwpriv wlan0 get_stats
answered Jul 15, 2014 by LeonidM Veteran of the Digi Community (4,457 points)
The API reference manual excludes the connectwi-me9210 from the supported platforms for naWlnGetStatistics. It turns out that the API is about 66% supported: naWlnGetStatistics returns a structure containing six fields, as follows:
tx_bytes, tx_frames, rx_bytes, rx_frames, tx_bc_frames, rx_bc_frames.

Because of restrictions in the Redpine WiFi driver, the tx_bc_frames and rx_bc_frames fields are not available and will always be 0 (zero). The other four values returned, tx_bytes, tx_frames, rx_bytes and rx_frames, are correct.

So four of the six fields contain correct values while two of the six fields are not filled in and remain 0.
Thank you both for the responses, which are valuable. We discovered this after going through support, who clarified that the documentation was not 100 percent correct and that this data was available. We then wrote a routine to monitor the 'pipe' for data activity.

Technical support has reproduced the issue and escalated it to engineering. Apparently the condition was already known at the engineering level, and other customers were already using the same method we came up with to detect a condition whose only remedy is to reboot the device.

This method is full of pitfalls and is not a desirable solution; it is simply what I called before a "wifi watchdog" routine.

The key is deciding when to throw in the towel and reboot based on a lack of data activity, and no metric we have found for this is 100 percent accurate.

- State One - The device reports being connected, but there are no data transfers.
The obvious first approach was to check the status values for a good SSID and signal strength, and then check the statistics: if neither tx_bytes nor rx_bytes changed for X time period, reboot. We have seen this numerous times. This condition denotes a clear problem, and a short timeout before rebooting would be justified.

- State Two - The device reports not being associated with an AP, but the AP is on and working fine.
The cause of this condition is unknown, and it looks the same as when the AP is powered off: tx_bytes and rx_bytes remain at 0. Should we reboot periodically, with a longer timeout? We have seen this condition a few times.
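The two states above map naturally onto two timeouts. A minimal sketch, in which all names and the 5-minute and 60-minute values are hypothetical placeholders rather than recommendations: `classify` turns the association status and data activity into one of the two states, and `tracker_update` reports when a state has persisted past its timeout.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    WIFI_OK,         /* associated and data is moving             */
    WIFI_STATE_ONE,  /* reports connected, but no data transfers  */
    WIFI_STATE_TWO   /* reports not associated with an AP         */
} wifi_condition;

/* HYPOTHETICAL timeouts; suitable values depend on the application. */
#define STATE_ONE_TIMEOUT_S (5u * 60u)   /* clear fault: act quickly        */
#define STATE_TWO_TIMEOUT_S (60u * 60u)  /* may just be the AP being off    */

static wifi_condition classify(bool associated, bool data_moving)
{
    if (!associated)
        return WIFI_STATE_TWO;
    return data_moving ? WIFI_OK : WIFI_STATE_ONE;
}

static bool should_reboot(wifi_condition c, uint32_t seconds_in_condition)
{
    switch (c) {
    case WIFI_STATE_ONE: return seconds_in_condition >= STATE_ONE_TIMEOUT_S;
    case WIFI_STATE_TWO: return seconds_in_condition >= STATE_TWO_TIMEOUT_S;
    default:             return false;
    }
}

/* Remembers when the current condition started. */
typedef struct {
    wifi_condition cond;
    uint32_t since;  /* seconds */
} condition_tracker;

/* Feed one classification per poll; returns true once the current
 * condition has lasted at least as long as its timeout. */
static bool tracker_update(condition_tracker *t, wifi_condition c,
                           uint32_t now_s)
{
    if (c != t->cond) {
        t->cond = c;
        t->since = now_s;
    }
    return should_reboot(c, now_s - t->since);
}
```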

The timing of the reboot must be coordinated with other ongoing activities and functions on the device, which all appear to function fine during these conditions. There are numerous things to check and consider before rebooting the device at an arbitrary moment, which adds complexity to our code simply to work around this issue.
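One way to coordinate the reboot with other activities is a gate that defers it while critical work is in progress, but not indefinitely. This is a hypothetical sketch: `reboot_gate` and its functions are illustrative names, the 120-second limit in the usage below is arbitrary, and on a real multitasking system the busy counter would need a mutex or atomic operations.

```c
#include <stdbool.h>
#include <stdint.h>

/* HYPOTHETICAL reboot gate. Other tasks bracket critical operations with
 * busy_enter/busy_leave; the WiFi monitor calls reboot_request and then
 * polls reboot_now. Single-threaded for clarity: protect busy on an RTOS. */
typedef struct {
    unsigned busy;          /* critical operations in progress  */
    bool     requested;
    uint32_t requested_at;  /* seconds                          */
    uint32_t max_defer_s;   /* stop waiting after this long     */
} reboot_gate;

static void busy_enter(reboot_gate *g) { g->busy++; }
static void busy_leave(reboot_gate *g) { if (g->busy) g->busy--; }

/* Record the first request only, so the deferral limit is not reset. */
static void reboot_request(reboot_gate *g, uint32_t now_s)
{
    if (!g->requested) {
        g->requested = true;
        g->requested_at = now_s;
    }
}

/* True when a reboot was requested and either nothing critical is running
 * or the request has waited at least max_defer_s seconds. */
static bool reboot_now(const reboot_gate *g, uint32_t now_s)
{
    if (!g->requested)
        return false;
    return g->busy == 0 || now_s - g->requested_at >= g->max_defer_s;
}
```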

We have also had instances of the WiFi simply refusing to join an open router in other environments, where all other devices join fine. When the router is moved to another location (site), with nothing on the router touched, the module joins just fine. This suggests that data activity from some other local wireless device is somehow preventing the Digi WiFi from negotiating correctly, or is putting it in a state similar to the conditions above.

I am afraid the simple answer is that the WiFi driver interface still needs work and additional testing. While it may work fine in most conditions, we have recently seen about 10% of these modules fail to join or drop the connection. The number may be higher, as some customers may simply power cycle the device themselves without reporting it to our support staff; our cloud metrics seem to corroborate this.

If either of you sharp guys has any thoughts on the above, I would appreciate them.
...