Excel file builder evaluating desktops

By Doug8765
May 30, 2015
Post New Reply
  1. Hi -
    I have not figured out whether I should build my own computer, but I'm trying to evaluate the ads I see for desktops. My problem with desktop ads is that when proffered computers go up in price they always dramatically increase disk space, video power, memory and CPU, but I think I only need enhanced memory and CPU. I don't know how to evaluate the offerings.

    I am an Excel file builder. I spend a lot of time building 100-200 megabyte Excel files and they just crawl. So frustrating. New cell entry is slow, navigation is slow, but I don't have more than 3 gigabytes of core Excel files. My backups are far more than that, but when the prices of a new proffered desktop heads into the most expensive then they seem to cost much more for features I will not need. What configuration do I need to do to get far faster Excel performance? Am I on the wrong track completely? How much faster are current desktop offerings than my HP p6240f?

    My current system is a HP p6240f, Intel Core 2 Quad CPU, Q8300 2.50GHz with 8 GB installed memory running Windows 7.

    I always appreciate the feedback from TechSpot forums.

  2. jobeard

    jobeard TS Ambassador Posts: 9,156   +598

    Hmm; Yes large spreadsheets eat resources. Typical systems are 2+ Ghz and 4GB ram

    A single sheet with 1000's of rows is basically unusable anyway and the user has to use 'find' to even locate where to begin.

    The typical solution is to use a roll-up to summarize data elsewhere.

    Take a large company Pnl as an example

    Major items in the PnL are
    • ordinary income
    • gross Profit
    • expenses
    • net ordinary income
    • other income/expense
    • net other income
    • net Profit

    Imaging the size of the income and expense sections if there are several departments and perhaps regional breakouts! Placing each of the above in its own spreadsheet and rolling-up the results allows access to the level of detail needed by the user.

    Learning to roll-up begins with limited data and adding spreadsheets to the master file (thus the file is still has large, just better structured)

    When that is mastered, the project is exported into one file per spreadsheet and the master sheet need only add the excel reference for an external file

    Maintenance can then be performed on the unique spreadsheets without the necessity of accessing ALL the data ALL the time.
  3. Doug8765

    Doug8765 TS Booster Topic Starter Posts: 189

    Hi J.O.Beard -
    I agree that it would be helpful if my models were such that I could do that. I tried an experiment recently with a reasonably large dataset, trying to keep the data and the model completely separate, but that entailed massive numbers of offsheet references, which only slowed everything down even more plus it made modifications more difficult.

    I'm about to double the amount of data - part of the reason I'm looking for a hardware resolution - but it's possible that the roll-ups that you describe are about keeping the offsheet references as summary/agglomeration values and thus few in number and not moved about much. My sheets need access to all the raw data, so that's been not available, but I should rethink.

    One of the tactics I've used successfully is to add new data in brand new sheets, kind of like scaffolding, so I get the instant response time. Then I paste the whole giant piece into the model and fixup the linkages. That works pretty well.

    New systems seem to always add so much more disk space and video power when CPU and memory gets scaled up, which makes the cost unjustifiable.

    Always glad to read your replies.

  4. jobeard

    jobeard TS Ambassador Posts: 9,156   +598

    Divide and conquer is a fundamental principle in 'tuning' (systems, programs, and databases) so I would struggle to factor the data into independent files (perhaps CSV) so the main report could be obtained by
    • delete report.csv
    • append file1.csv to report.csv
    • append file2.csv to report.csv
    • ...
    • append fileN.csv to report.csv
    each fileN.csv is the easy to maintain and update and the final report.csv easily imported for analysis

    The issue then becomes the formulas in the cells. How prolific are they?
  5. Doug8765

    Doug8765 TS Booster Topic Starter Posts: 189

    Hi -
    I write lots of formulas. I try to make everything dynamic and formulas are the way to do that. I roughly guess that I use 10x formulas for each data point. Some more, some less.

  6. jobeard

    jobeard TS Ambassador Posts: 9,156   +598

    Ouch; that's where the processing goes!! If every cell has just one formula and the sheet has 1000's of rows; GULP.

    Number crunching (as opposed to graphing) in the magnitude is usually done with an
    Array Processor. Big main frames have these but not sure you can get a pci card with it.
  7. Hi -
    Are we writing about a special purpose computer - an array processor? This is the first I've heard of Excel used in conjunction with an array processor. Seems like it would cost just as much as buying all that storage equipment, but perhaps it might work faster. I'll google it.

  8. jobeard

    jobeard TS Ambassador Posts: 9,156   +598

    An Array Processor is much like a floating point chip or the GPU; special purpose ALU (Arithmetic Logic Unit). The APU is dedicated to number crunching.

    If one is could be found, then this would create a co-requsite of code in Excel to implement it. Probabilities are low that both would be available.
  9. Hi -
    In the course of googling about array processors, I found that I've been using a volatile function - INDIRECT().

    Something to think about.

  10. jobeard

    jobeard TS Ambassador Posts: 9,156   +598

  11. jobeard

    jobeard TS Ambassador Posts: 9,156   +598

    Hmm; I also just found
    Excel sheet has 17179869184 cells and there is no reason why we should not use a few to support us in our formulas or models. Use helper cells, they keep your worksheet simple and easy to understand.​
    The issue is the fixed, finite limit of cells
  12. Doug8765

    Doug8765 TS Booster Topic Starter Posts: 189

    Hi -
    First, the helper cells thing. I've no problem breaking down a complex formula into intermediate steps. It's a form of documentation. Makes debugging easier. Doesn't always mean that I don't remember to do so all the time. 8->

    I've had time to think about INDIRECT(). There's no way I'll stop using INDIRECT() - or INDEX(). I do all this Excel work for dynamic responses. If a performance hit is the price then a hardware upgrade sounds like the alternative.

  13. jobeard

    jobeard TS Ambassador Posts: 9,156   +598

    Well, if the budget can stand it, go for a server class box
    • 8 cores with hyper-threading
    • 16gb memory
    Remember however, if you publish the final results to others, they too need this horsepower.

    Speaking of publishing (aka redistributing): -- consider export to PDF which obviously creates a R/O copy of the work (suggesting the table needs a publishing date)
  14. jobeard

    jobeard TS Ambassador Posts: 9,156   +598

  15. Doug8765

    Doug8765 TS Booster Topic Starter Posts: 189

    Hi J.O.Beard -
    I get embarrassed when something as simple as getting a new version of the Excel software slips by me. I'm using Excel 2007. This first of the links you posted for me - the msdn article - makes clear that some nice improvements for larger files have been built into Excel 2010 and Excel 2013. MS Office 2010 would be less of a budget hit. I'll look further at 2010 and 2013.

    I post results of my work, but not my files.

    The default setting for number of processors is the number of processors in the computer, so I'm already using all 4. Thanks for suggesting a higher number and higher amount of memory.

    The volatile functions problem does not concern me. RAND() is the truly volatile function, since by definition it changes value every single recalc - every downstream cell is recalc'd. I don't put it in key places of large files. I never use NOW() or TODAY(). Most volatile functions are not like RAND(), for instance INDIRECT(). INDIRECT() - which I use a lot - usually does not change, unless there's a geometry change in the sheet. If the values don't change then there's no recalc - the cell will be looked at but not recalc'd.

    Thank you.

  16. jobeard

    jobeard TS Ambassador Posts: 9,156   +598

    Common principles in optimization are
    1. do it once
    2. do it right
    and I see
    you're on that path.

    Personally, I disdain doing the MS thing of recommending "more hardware" - - especially when some known solutions can lead to better results or at least more efficiently derived results.

    There were some write-ups on using trade-offs to improve resource consumption. Try some on smaller tables until you convince yourself that you're at a diminishing ROI. Then punt - - hey your time is valuable too :)
  17. Doug8765

    Doug8765 TS Booster Topic Starter Posts: 189

    Hi J.O.Beard -
    I'm a fan of optimization - from my assembler days. Find something that works and lean on it. Excel works. Thanks for reminding me.

    I've appreciated your help before. Thanks again.


Similar Topics

Add New Comment

You need to be a member to leave a comment. Join thousands of tech enthusiasts and participate.
TechSpot Account You may also...