New bootloader image won't correctly recover using DHCP/TFTP

0 votes
I've been working on testing the ability to recover bad image.bin uploads using the bootloader's DHCP/TFTP recovery mechanism. I'm using tftpd32, which didn't work on Vista, but on XP it worked to upload a new image on modules that had an OLD bootloader image. But after updating the bootloader using a newly compiled (from NET+OS v7.3) rom.bin file, the recovery sequence does send out a DHCP discovery request, and the DHCP server program offers up a valid IP address, but the module ignores the offer and tries again 2 more times before failing and moving on to attempt a serial recovery.

When capturing debug output during the recovery attempt, I see "TFTP...", indicating that the downloadImageUsingTftp() routine is being called in blmain.c. Then, after the 3 attempts at getting IP parameters via DHCP, I get the message "DHCP fails", indicating that the board_initialize_dhcp() call failed.
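
For anyone following along, the control flow described above can be sketched roughly as follows. This is only an illustration of the observed behavior (print "TFTP...", try DHCP three times, print "DHCP fails", fall back to serial); it is not the actual blmain.c code, and every `demo_` name here is hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>

/* Three attempts in total, matching the behavior observed in the post. */
#define DEMO_DHCP_ATTEMPTS 3

typedef enum { DEMO_RECOVERED_TFTP, DEMO_FALLBACK_SERIAL } demo_result_t;

/* Hypothetical callback standing in for one DHCP discover/offer exchange.
 * Returns true if the module accepted an offer and has IP parameters. */
typedef bool (*demo_dhcp_try_fn)(void);

/* Sketch of the recovery sequence: DHCP first, serial as the fallback. */
demo_result_t demo_recover(demo_dhcp_try_fn try_dhcp)
{
    printf("TFTP...\n");
    for (int attempt = 0; attempt < DEMO_DHCP_ATTEMPTS; attempt++) {
        if (try_dhcp()) {
            /* A real bootloader would now fetch the image via TFTP. */
            return DEMO_RECOVERED_TFTP;
        }
    }
    printf("DHCP fails\n");
    return DEMO_FALLBACK_SERIAL;
}

/* Stub DHCP exchanges used to exercise both paths. */
static int demo_calls;
bool demo_dhcp_always_fails(void) { demo_calls++; return false; }
bool demo_dhcp_succeeds(void)     { demo_calls++; return true; }
```

Calling `demo_recover(demo_dhcp_always_fails)` follows the path seen on the modules with the new bootloader: three DHCP attempts, then the serial fallback.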

Has anyone else had this problem using a bootloader built from v7.3?
asked Oct 23, 2008 in NET+OS by jfichtner New to the Community (12 points)
recategorized Dec 18, 2013 by tuxembb

29 Answers

0 votes
Yeah, maybe my problem is different. I do have a new image, but...

The release notes of one of the patches says this:

- - -
Title
TFTP recovery failure
Case: none
Date Fixed: 04/18/08
Description
Image download via TFTP completes, but does not execute.

Solution
Changed all members of the tftpc_conn_t structure to "volatile" as this structure is
accessed both from the bootloader code and the Ethernet interrupt processing code.

- - -
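
To illustrate why the `volatile` change in that release note matters, here is a minimal sketch. It assumes a state structure that is written by an interrupt handler and polled by the main bootloader loop. The structure and function names are hypothetical stand-ins; this is not the real tftpc_conn_t layout or Digi's code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical transfer state shared between an interrupt handler and the
 * main loop. Without volatile, an optimizing compiler may read
 * transfer_done once, keep it in a register, and never see the update. */
typedef struct {
    volatile bool     transfer_done;
    volatile uint32_t bytes_received;
} demo_conn_t;

static demo_conn_t demo_conn;

/* Stand-in for the Ethernet interrupt handler: records one packet. */
void demo_eth_isr(uint32_t packet_len, bool last_packet)
{
    demo_conn.bytes_received += packet_len;
    if (last_packet) {
        demo_conn.transfer_done = true;
    }
}

/* Main-loop side: poll the flag up to max_polls times.
 * Each iteration re-reads the flag from memory because it is volatile. */
bool demo_wait_for_transfer(uint32_t max_polls)
{
    while (max_polls-- > 0) {
        if (demo_conn.transfer_done) {
            return true;
        }
    }
    return false;
}
```

If the flag were not `volatile`, the polling loop could legally be compiled so that it never observes the handler's write. That would be consistent with a download that completes but is never acted on, though the release note does not say that this is exactly what happened.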

This tells me that even though I am downloading the new code, chances are I'm never executing it. If this is the case, I now have a ton of modules that have code in them that can't be replaced (they are soldered onto a PCB and can't use the serial recovery console, at least not without a ton of work to make my app pass through serial data from that port to another).

Jeff.
answered Dec 11, 2008 by jwormsley Community Contributor (78 points)
0 votes
It seems like that bug would have to be present in the pre-loaded bootloader (scary?), not in your image. Have you tried uploading an app with FTP (like the sample) and uploading the bootloader (rom.bin) built with your application, and then uploading your new image via recovery?

Though, if this were the case even the FTP app wouldn't take, and your modules would basically all need to be RMA'd, as no image would "take".
answered Dec 11, 2008 by nfgaida New to the Community (27 points)
0 votes
I don't know for sure, but I suspect that the "factory fresh" modules ship with a bootloader compiled from the latest NET+OS. And as the NET+OS 7.3 TFTP recovery issue patch documentation explains, a bootloader built with v7.3 does not use the image uploaded via TFTP (i.e. recovery doesn't work!). So I'm betting that there is a set of modules out there that have that bad bootloader image on them.

I'm currently planning to upload my own bootloader image immediately after uploading my application image just to be sure that I know that the installed bootloader image will work properly (unless hardware changes).


Message was edited by: jfichtner
answered Dec 11, 2008 by jfichtner New to the Community (12 points)
0 votes
Well, the original bootloader present when the modules were shipped accepted my 7.3 no patch image.bin. It would appear that, via TFTP, those modules will not accept any other bin. And since the firmware is locked for some other reason that prevents me from doing a normal FTP, it looks like I'm well and truly screwed.
answered Dec 12, 2008 by jwormsley Community Contributor (78 points)
0 votes
The bootloader that comes on the modules is not what's used to upload a new image.bin. There is an application running that allows image uploads via FTP, or via the netosprog.exe program. The problem (it seems) is that the normal TFTP bootloader recovery method doesn't work with these modules, so since you've uploaded an image that doesn't work, the module is now a paperweight!

This news does scare me, though, since it seems to mean that Digi actually changed something in hardware that is not compatible with images compiled under previous NET+OS versions. And they did so without any warning. So in a year when they do that again, my production image will now have to be updated to the newest NET+OS? And every year after that? I really hope that is not the case. Especially if there's no advance notice. This would mean that my company would have to stop production until I ported to the new OS and fully tested! That is very, very bad news indeed.
answered Dec 12, 2008 by jfichtner New to the Community (12 points)
0 votes
Have either of you confirmed this with Digi (the hardware change, that is)?

It seems like that would be a pretty major change.

The ConnectMe modules that we just purchased came with the FTP app from 6.X (at least, that is what it said when I logged in). I was able to upload my initial firmware via the FTP, and (on a working module) upload emergency firmware via TFTP. So maybe the modules I have aren't the "new new" ones that have this issue that you are seeing? It seems odd that something that huge would make it past quality control.
answered Dec 12, 2008 by nfgaida New to the Community (27 points)
0 votes
I definitely have not confirmed, nor have I seen, the suspected hardware change issue. Only the TFTP recovery issue.
answered Dec 12, 2008 by jfichtner New to the Community (12 points)
0 votes
The original issue I had revolved around DNS lookups. My 7.0 app worked fine on older modules, up until the module which has an R under the end of the bar code. The next module we got had a T under the end of the bar code. With these modules, DNS lookups no longer worked. This was confirmed and I worked with Charlie to get a fix. The fix was two-fold. One, I had to make a few changes to my app to disable secondary interfaces, and two was to upgrade to 7.3. However, with my 7.3 apps, I am finding that they will work for a while, then lock up as described earlier (and that may or may not be related to 7.3, or my code), and when that happens, I can't recover because the TFTP recovery method doesn't work. If I could recover, I could probably figure out if the lockup I am experiencing is caused by 7.3 or my code. But since I can't recover, I'm doing nothing but making $50 bricks.
answered Dec 12, 2008 by jwormsley Community Contributor (78 points)
0 votes
So you're saying that the modules work for a while, then "lock up" permanently? In other words, a reboot doesn't fix it? If that's the case, my first thought would be that some sort of non-volatile setting or state is being set somewhere that is causing the module to hang in some call. I might be stating the obvious here, but I just wanted to be clear.

Do you have a JTAG unit and debugger to test on? Is the situation repeatable enough that you could use the IDE to step through on startup, after the situation occurs, and find out where it hangs?

Another thought is that if your application runs long enough, can you use it to immediately update your bootloader to one that works (via FTP)? That way you could recover when it starts hanging again?

But now I'm thinking that if it's some nonvolatile setting that's causing it, then replacing the firmware image with an identical one won't fix the problem.
answered Dec 12, 2008 by jfichtner New to the Community (12 points)
0 votes
Yes, once it has locked up, it never returns, but I don't know what triggers it. Anecdotally, I'd say you are right: something that updates the nonvolatile storage causes it, as most reports are that it happens when network settings are changed. It is similar to the situation that occurs when someone sets an incorrect static IP, but not the same, because in that case something like netosprog can get it back, and in my case you never get a TCP stack up for netosprog to talk to. I can verify via Ethereal/Wireshark that no TCP stack comes up. Not a peep from anything with a Digi MAC.
answered Dec 12, 2008 by jwormsley Community Contributor (78 points)
...