Machine crash - need more cache, memory, cpu??

Status
Not open for further replies.
Hi fellow techies!

I'm an application and interface programmer (I'm a software geek) who has a problem with a real-time application -- real-time is not my specialty. I have two questions: 1) what should I be looking at to isolate the problem, and 2) what would be the best hardware fix -- a faster CPU, more RAM, more cache, or more processors? (I will be trying to optimize the software too.)

Here's the background:

The application controls a motor, handles two video inputs, and does some image processing on those images, i.e., resizing, stitching them together into bigger pictures, mouse tracking overlays, etc. The software is multi-threaded and runs on a dual-core CPU.

After running a while, the whole application crashes on a function that we call a zillion times: BitBlt. Different threads can be the starting point of the crash, but the crash is always in BitBlt when it checks for a valid destination. For example, one crash came from a video thread and another from the stitching thread.
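
Since different threads end up in BitBlt against shared images, one thing I can try is making sure only one thread touches a given destination at a time, and that the destination rectangle is checked before the copy. Here is a minimal, portable C++ sketch of that idea. `Surface` and `guarded_blit` are hypothetical names of mine standing in for our real destination and the BitBlt call; this is not our actual code and not the GDI API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a shared drawing destination (in the real
// application this would be a device context and its bitmap).
struct Surface {
    int width;
    int height;
    std::vector<std::uint32_t> pixels;  // width * height entries, row-major
    std::mutex lock;                    // guards pixels

    Surface(int w, int h)
        : width(w), height(h),
          pixels(static_cast<std::size_t>(w) * static_cast<std::size_t>(h), 0) {}
};

// Fills a w x h rectangle at (x, y) with `color` while holding the surface lock.
// Returns false (and changes nothing) when the rectangle does not fit,
// so the caller can log the failure instead of crashing.
bool guarded_blit(Surface& dst, int x, int y, int w, int h, std::uint32_t color) {
    std::lock_guard<std::mutex> hold(dst.lock);

    // Validate the destination rectangle before writing anything.
    if (x < 0 || y < 0 || w <= 0 || h <= 0) return false;
    if (w > dst.width - x || h > dst.height - y) return false;

    // Internal invariant: the buffer matches the declared size.
    assert(dst.pixels.size() ==
           static_cast<std::size_t>(dst.width) * static_cast<std::size_t>(dst.height));

    for (int row = y; row < y + h; ++row) {
        for (int col = x; col < x + w; ++col) {
            dst.pixels[static_cast<std::size_t>(row) * dst.width + col] = color;
        }
    }
    return true;
}
```

Each video or stitching thread would call `guarded_blit` instead of drawing directly. A `false` return is a natural place to log which thread and which rectangle were involved.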

When we run it with just one video input, it runs fine. Here's a picture of the performance monitor:

View attachment 22496

The black line hovering around 60% is the CPU (% Processor Time). The green line at the top that dips like an icicle is the cache (copy reads/sec). The yellow line at the bottom is the cache (fast reads/sec). And the olive line that peaks with the yellow line is the memory (pages/sec).

The CPU dip on the left side of the chart is when we are going from one group of images to a second group and we put the video to sleep while the motors are moved. The yellow/olive line jumps that follow immediately are from another thread that is stitching the final composite image and outputting it.

When we add the second video input, the CPU maxes out at 100% and it looks like the cache is doing a lot of paging:

View attachment 22500

The final stitching (the yellow/olive jump in the middle) happens a few minutes after the start of a new image group (which has fallen off the chart). Shortly after, you can see the CPU (black line) taking a dive as the program crashes. Note: it takes a couple of image groups before the crash happens -- about 10 human minutes.

To stop the CPU from pegging, I'm putting the non-critical video into a long sleep loop. (I can't take down the video thread entirely because I lose data.) And I also take down a non-critical tracking mouse overlay thread. Things look a bit better:

View attachment 22501

Now the CPU hovers around 96% and the cache (copy reads/sec) has settled down. In this scenario, you can see two distinct threads: the switch between image groups (CPU dip) on the left middle and the final compositing thread on the right middle.

Now back to my questions:
1) What would be reasonable settings in the performance meter to help isolate the issue? (I'm not a hardware person and could use some direction.)

2) What would be reasonable to add to the hardware to give us a better safety margin to keep the software from crashing? Do we need more cache? Change from a dual-core to a quad? More RAM?

Or is there a system setting we should be looking at, like page size? (We are running XP Pro.)

Thanks so much for any help.
 
I'm not good with hardware specs and the guy who knows has left for the day. I think we have: Intel board (DQ965) with E6400 Dual Core CPU (2.13 GHz with 2M cache).

We just got some quads (Q6600) in this afternoon and the hardware guy installed them in the motherboard before he left. I ran the software and it ran faster (CPU running about 60%) but it died at just about the same spot (just getting there faster).
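
Because the faster CPU only gets us to the same crash sooner, one possibility I want to rule out is that something builds up with each image group rather than with load, for example GDI objects that are created and never released. Below is a minimal sketch of how I could sample that from inside the application. `GetGuiResources` with `GR_GDIOBJECTS` is a real Win32 call; `sample_gdi_objects` and `looks_like_leak` are hypothetical helper names of mine, and the leak test is just a simple rule of thumb I am assuming, not an established method.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#ifdef _WIN32
#include <windows.h>
#endif

// Returns the number of GDI objects currently held by this process,
// or -1 when the code is not built for Windows.
long sample_gdi_objects() {
#ifdef _WIN32
    return static_cast<long>(GetGuiResources(GetCurrentProcess(), GR_GDIOBJECTS));
#else
    return -1;
#endif
}

// Rule of thumb (an assumption, not a standard test): the samples never
// decrease and the last one is at least min_growth above the first.
bool looks_like_leak(const std::vector<long>& samples, long min_growth) {
    assert(min_growth > 0);
    if (samples.size() < 2) return false;
    for (std::size_t i = 1; i < samples.size(); ++i) {
        if (samples[i] < samples[i - 1]) return false;  // count dropped: objects are being freed
    }
    return samples.back() - samples.front() >= min_growth;
}
```

The idea would be to record one sample at the end of every image group and check the series after a few groups. If the count climbs steadily until the crash, more hardware will not help and the fix is in how we create and release bitmaps and device contexts.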

We also tried just adding more RAM (and increasing the paging size) with the same CPU, but it ran slower (with a lot of cache paging). It also died.

Help!

What should I be looking at to help identify the problem???

Thanks.
 
As far as cooling, it has two fans that pull air through the box with the boards aligned to allow air flow -- as the hardware guy explained it to me. (These are custom builds.)
 