coding query for geeks

    First Previous
  • jrmat 5 Sep 2019 14:31:36 108 posts
    Seen 35 minutes ago
    Registered 2 years ago
    A friend of mine is producing a staffing rota, bit it's a nightmare. There's one person per place, however, many special conditions, some places are only open at certain times, some people work at different times, so e people need to work in a certain place on a certain time, unless they're on holiday etc etc.

    To solve this I created a spreadsheet with all the above 'rules' and a way of scoring a solution.

    Then using some code in R I would read in the rules, generate a random schedule, score it. I'd run this as many times as I could and hold onto the best scoring solution at the end of the run, of say 10k runs.

    The problem is that it took ages to run, seemingly getting slower per run (loop iteration). So I'm trying to find another way of running it.

    I thought of using C, which would be much quicker. However the user is not a techie and the rules, such as holidays, would change, so ideally I'd like to have an excel front end and then somewhere in the backend all the 'magic' happens. Is it possible to have the front end in excel and then using VBA hook up to some c code, run the loop, then spit out the best fit solution back to excel?
  • Phattso 5 Sep 2019 14:44:42 24,819 posts
    Seen 2 hours ago
    Registered 16 years ago
    The R code should be plenty fast for something like this. If itís getting slower each run that suggests a coding flaw?
  • jrmat 5 Sep 2019 14:58:29 108 posts
    Seen 35 minutes ago
    Registered 2 years ago
    That's what I thought. Ideally I'd like to run it for a loop of a million. I'd thought that'd run in minutes. I'll revisit my code but has anyone run a loop millions of times without any degradation?
  • nudistpete 5 Sep 2019 15:00:02 861 posts
    Seen 2 hours ago
    Registered 2 years ago
    Phattso wrote:
    If itís getting slower each run that suggests a coding flaw?
    This sounds right. Generally if something's getting slower you're running out of memory or you're doing something daft.

    I'm not sure what debugging is like in R, I've only ever written scripts to produce datasets in obscure formats in it, but a couple of print statements in various places (and after every 100 loops) should help you find out where it's an arse.

    If you need to go faster, look at C# or Java. Both are quick to learn and are quite quick for stuff like this, there are plenty of ways of reading from/writing to excel sheets in C#.
  • TechnoHippy 5 Sep 2019 15:05:55 15,382 posts
    Seen 1 hour ago
    Registered 16 years ago
    Are you storing lots of things in arrays that are growing with each iteration?
  • neilka 5 Sep 2019 15:28:51 22,656 posts
    Seen 42 minutes ago
    Registered 14 years ago
    10 PRINT "GET BACK TO WORK"
    20 GOTO 10
  • jrmat 5 Sep 2019 16:05:57 108 posts
    Seen 35 minutes ago
    Registered 2 years ago
    TechnoHippy wrote:
    Are you storing lots of things in arrays that are growing with each iteration?
    I'll have to revisit my code, but I'm sure I'm not. I store the best scoring schedule, but that's it.
  • jrmat 5 Sep 2019 16:07:00 108 posts
    Seen 35 minutes ago
    Registered 2 years ago
    To add I've done a similar exercise a few times and similar things have happened. It might be an error in my code but the code isn't that complex.
  • senso-ji 5 Sep 2019 16:11:26 9,791 posts
    Seen 22 minutes ago
    Registered 11 years ago
    Post the code and we'll take a look at it. I'm not an expert in R but I have dabbled in it once or twice.
  • jrmat 5 Sep 2019 16:41:27 108 posts
    Seen 35 minutes ago
    Registered 2 years ago
    Will dig it out when I get home.
  • TechnoHippy 5 Sep 2019 16:44:34 15,382 posts
    Seen 1 hour ago
    Registered 16 years ago
    jrmat wrote:
    TechnoHippy wrote:
    Are you storing lots of things in arrays that are growing with each iteration?
    I'll have to revisit my code, but I'm sure I'm not. I store the best scoring schedule, but that's it.
    If you're dynamically resizing the array then that can become a performance hog over time.
  • jrmat 5 Sep 2019 22:11:41 108 posts
    Seen 35 minutes ago
    Registered 2 years ago
    Here's the code below:



    # generate random number
    # checks:
    # 1. does it conflict with any pre-required dates
    # 2. does it conflict with any previously generated for that shift
    # does optom work that shift?
    # is practice open?
    # when completed run fitness algorithm
    # work am and pm in same place if practice is closed in pm and optom working all day

    # optom has to work at these practices at these times
    file_omust
  • jrmat 5 Sep 2019 22:13:12 108 posts
    Seen 35 minutes ago
    Registered 2 years ago
    Argh, anyone know how to get around the problem of using angled brackets?
  • monkehhh 5 Sep 2019 22:26:18 5,142 posts
    Seen 6 minutes ago
    Registered 11 years ago
    Try Pastebin and post the link?
  • askew 5 Sep 2019 22:31:34 19,770 posts
    Seen 55 minutes ago
    Registered 14 years ago
    If that's R, it's very good at self-documenting ;)
  • jrmat 5 Sep 2019 22:37:08 108 posts
    Seen 35 minutes ago
    Registered 2 years ago
    Thanks, here's the link: https://pastebin.com/ucuM5efy
  • senso-ji 5 Sep 2019 22:53:40 9,791 posts
    Seen 22 minutes ago
    Registered 11 years ago
    Ok. Well, first off, running a nested for loop for thousands of iterations is going to slow things down. Your algorithms' time complexities will be n^2. I don't know how long it takes to check each permutation but 10k squared is going to be sluggish.

    I don't have the data at hand, but is there a way to store some of the data in a dictionary/hash table instead, and look that up when checking for each permutation?

    Another method is to maybe sort through the data first, then it might only need to loop through the array once, if that helps.
  • IMO 6 Sep 2019 07:21:17 7,021 posts
    Seen 58 minutes ago
    Registered 15 years ago
    I'm no R expert so take all of this with the appropriately sized pinch of salt.

    Expanding on what senso-ji said, It seems like, for at least some elements, you are generating the same information on every iteration. i.e. for "is optom working a particular shift" or "is there a compulsory practice for optom?" - given that the input is not changing, you should be able to pre-calculate that and pass it into the runrota function - it would also give you a chance to identify the r,c combinations that might actually change on every iteration (I'm guessing docalc is non-deterministic)

    I imagine the newsample generation is going to be computationally relatively expensive, so you should only calculate that if you have to.
  • mothercruncher 6 Sep 2019 07:35:42 16,474 posts
    Seen 28 minutes ago
    Registered 13 years ago
    Jesus, I feel disgusted. OK geeks- itís solved now- can we get back to whacking each otherís arses with rolled up towels and trying to look up Sandraís skirt now please?
  • jrmat 6 Sep 2019 10:06:05 108 posts
    Seen 35 minutes ago
    Registered 2 years ago
    Thank for the thoughts, yes, I think there are a couple of ways to run this differently to reduce the complexity.

    Omust, osched are grids in the same layout as output, which get compared to see if the optom has a location already assigned or if the optom doesn't need to work that shift and so doesn't need to have a shift assigned. From what you've said I think I need to do as much as I can outside the main loop and pass it in, I'm sure there's a way to do this.

    Regarding dictionaries and hash tables, I didn't think you could do these in R? I believe using a list or a vector would be quicker, so I could convert the whole thing from a data frame to a list, this should speed it up somewhat.
  • senso-ji 6 Sep 2019 10:29:16 9,791 posts
    Seen 22 minutes ago
    Registered 11 years ago
    If R's answer to dictionaries or hash tables is vectors then you can try that.
  • jrmat 6 Sep 2019 10:35:07 108 posts
    Seen 35 minutes ago
    Registered 2 years ago
    Yeah, I think that alone will make quite a difference.
  • BinaryBob101 6 Sep 2019 21:30:33 26,979 posts
    Seen 60 minutes ago
    Registered 10 years ago
    Is this where we talk about sous vide at home?
  • Nazo 7 Sep 2019 10:17:57 1,039 posts
    Seen 2 hours ago
    Registered 10 years ago
    Your DoCalc function takes 3 parameters but only appears to only use 1 of them.

    If I'm reading the code correctly (and I'm probably not, fuck untyped languages), the only thing that varies between runs is that it uses a random number based on the number of rows? Probably you'll get diminishing returns from increasing the number of iterations, it might be better to do all the permutations? How many rows are there?

    Also, I believe R passes variables by value so passing those tables and arrays to functions could be quite memory intensive. That might be why your program slows down the longer it runs.
  • jrmat 7 Sep 2019 14:46:46 108 posts
    Seen 35 minutes ago
    Registered 2 years ago
    You're right about the docalc function, great spot, I'll fix that one. There are a couple of tweaks I can make to removes some declarations out of the internals functions.

    I'll have to do the maths to figure out the number of permutations, but that could well be quicker. Great idea. I'm away from my computer right now so will check it later. Is there anywhere I can put the underlying files up? There's nothing confidential there.

    I ran a loop of half a million iterations and it took 336 minutes. I've plotted out a handful of different sized loops and it follows a very linear relationship. I had thought it was an exponential one but it's not.

    Edited by jrmat at 14:48:17 07-09-2019
  • Nazo 7 Sep 2019 16:46:14 1,039 posts
    Seen 2 hours ago
    Registered 10 years ago
    Github or something like Dropbox and a shared link?
  • jrmat 8 Sep 2019 17:34:16 108 posts
    Seen 35 minutes ago
    Registered 2 years ago
    Link!

    I've uploaded the file with a brief explanation of what you're looking at. You'll need to export just the top left table as a csv to match the names in the code.

    I actually need to use a 2d array, so switching it to a vector won't work, which is annoying. I'm comparing the schedule for all the locations and to itself for each Optom.
  • First Previous
Log in or register to reply

Sometimes posts may contain links to online retail stores. If you click on one and make a purchase we may receive a small commission. For more information, go here.