PhD progress

Wednesday, 27 February 2013

Generation of four random weights in Excel

Hello!

Following my first Excel related post (Creating automatically series of repeated sequential numbers in Excel - Link), below you can find the solution to a different challenge that we faced while working in one of our research projects. The post explains how to generate four random weights in Excel, based on specific limitations.

Our example explains the method used in order to automatically generate X times (in our case X was equal to 1000), a set of four weights under the following specifications:

- Each weight has an original value:

- The sum of the four weights (original & random) should equal to one:

- Each weight can have a variation of ±0.1 from its original value (±10% of the scale).

An example of the original values and the minimum and maximum values of each weight is given at the following two tables. We are only interested in the set of 4 weights that are presented on the 2nd line of each table.

A table containing the values (original, low and high) of each weight should be present in a table in the excel file so that the user can call those values in the formulas, when necessary. Avoid including specific values in a formula, as it will limit your flexibility of doing changes that are updated automatically everywhere.

In order to randomly choose a number in a specified interval (e.g. (a,b) ) in Excel, all we need to do is to use this formula in a cell: =rand()*(b-a)+a

1. Initially

are randomly picked X times in the designated interval, following a uniform distribution.

2. We define the change in

3. Due to the restriction for the sum of weights (1), the sum of the changes in weights should be equal to zero. Thus:

4. Due to the restriction of the intervals that the weights should be placed, we know that:

5. From (2), (3) and (4)

and therefore:

All terms are known and by solving for

we obtain:

6. By the definition of our project, we know that

7. At this point, we need to check if the interval of

should be modified to follow our restrictions. This is simply done by creating the intersection of the intervals given by (5) and (6).

8. We randomly pick

X times in the modified interval that we calculated in step 7 above, following a uniform distribution.

9. From (1) we calculate

by subtracting from the unit

and thus:

In this way, we are able to calculate the four weights for the purposes that we need them. The implementation in Excel is extremely easy:

- Weights w1 and w2 are calculated based on the formula: =rand()*(b-a)+a

- For weight w3, there could be various implementations. The one I used was:

a. To calculate the values w3(low_b) and w3(high_b) provided by formula (5) in two columns

b. To use the formula:
=RAND()*(MIN(w3high;w3high_b)-MAX(w3low;w3low_b))+MAX(w3low;w3low_b)

- Weight w4 is simply calculated based on the formula: =1-w1-w2-w3

Saturday, 13 October 2012

Creating automatically series of repeated sequential numbers in Excel

Even though excel can be very handy for many user-types and for many different applications, there are some cases where it seems that its developers were not carrying much about the built-in functionalities. As a result, they probably considered that all user are able to produce small pieces of code or are born with programming way of thinking, which obviously is not the case in our everyday life. After this small introduction, let's go straight to the point:

I'll summarise here three different ways of producing automatically and with very little effort a series of repeated sequential numbers. What does that mean exactly?
- Sequential numbers: Even though it is self-explanatory, obviously we're talking about a series of (user specified) numbers, that are within a specified interval,and there are no numbers missing in the sequense. e.g. 0,1,2,3,4,5,6,7,8,9 is a series of 10 sequential numbers in [0,9]

- Repeated: with this we imply that each number is repeated X times before we move to the next digit. In our example, if we wanted to repeat X=3 times, our series of repeated sequential numbers is 0,0,0,1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6,7,7,7,8,8,8,9,9,9.

Obviously, if you have such a limited series, you don't need any kind of automation, you just go to the related excel cells and fill in the numbers, period. BUT what do you do if you have a series of 85 numbers that are repeated 10.000 times? Obviously, you cannot fill in those 850.000! lines manually.

While struggling to find a solution and with quite some time spent on the internet, i've figured out the following three solutions, in order of preference for their use:

using the command 'MOD', which provides the remainder of a number when it's divided with a specific denominator, nested in an 'IF' conditional.
using the command 'INT', which rounds downwards a given number, to the closest integer.
using an 'IF' conditional formula.

In detail, we will use as an example a series with 20 numbers repeated for 10 times (remember: how many times these numbers are repeated is irrelevant to the formulas needed to input in excel). It's easier to just calculate those numbers in a separate sheet, then copy only the values to your destination cells, and discard the formulas:

For the first solution we follow these steps:
- Insert the first number, e.g. 0, in our first cell, e.g. A1
- Insert in the cell below, e.g. A2, the following command: =IF(MOD(ROW(),20)=1,A1+1,A1)
- Drag down the formula to the desired number of cells, e.g. till A200 in our example.
- Obviously, the only adjustable parameter is the number '20' which can be set according to the needs of the user.
For the second solution we follow this step:
- Insert in the first cell, e.g. A1, the command: =INT((ROW()+19)/20)
- Drag down the formula to the desired number of cells, e.g. till A200 in our example.
- In this case, the adjustable parameters are obvious, '19' and '20'.
For the third solution we follow these steps:
- Insert a zero (0) in each cell of A1:A20 (twenty cells)
- Insert in the cell below, e.g. A21, the following formula: =IF(A20=E1,A20+1,A20)
- Drag down the formula to the desired number of cells, e.g. till A200 in our example.
- This solution is obviously harder to update and the least effective of the three, but i kept it in the list for the crazy ones that may prefer it.

Regarding the first two approaches, they work equally well (like a charm) and have saved me plenty of precious time.

This way of producing repeatable sequential numbers can be very handy for anyone working on Monte Carlo methods, uncertainty analysis and/or sensitivity analysis of models as well as for any other data handling purpose. It is well expected that anyone could argue and point out "Why work on excel for a Monte Carlo application instead of Matlab or R" and I could immediately agree. But this discussion is out of the scope of my post.

For any questions or remarks, feel free to leave a comment or contact me by email.

Thursday, 11 October 2012

Update (and justifications)

It's interesting to notice that my last post on the blog was published in October 2011...and now we've already reached October 2012! There are two interpretations of this fact: the optimistic and the pessimistic one....read on.

Obviously the pessimistic would be that "nothing worth to share with the world has happened during this one year of my PhD programme and no valuable work has been produced". Luckily, this is not the case and we are on the optimistic side: There have been numerous interesting things happening, such as seminars, conferences, projects, long fights with MS Excel, interesting results produced, submittions of articles and abstracts, collaborations with other PhD students of my group, non-academic activities, lots of travelling, weddings and kids everywhere...and thus, finding time to report, on a regular basis, on everything that has happened was practically impossible (given my origin and the strong forces of procrastination that rule this world)!

To my atonement, some posts have been saved as drafts for months on the blog and were not published on time (or never if you prefer), other were kept forever in the word document containing 'updates for the blog' and so on.

Hopefully, i'll be able to produce a summary of only the really interesting things and make few posts in the next days. Please bear with me and thank you for the interest in my blog :)

PS. I won't promise (cause i hate breaking promises) but my idea of providing small and useful articles on MCDA, uncertainty handling, sensitivity analysis, challenges with excel and more, is still vivid and hopefully will be implemented soon. The first two topics will be excel related and specifically on the 'generation of random weights for Monte Carlo analysis' and 'generation of long series of repeatable numbers'.

Monday, 3 October 2011

Week 17

The entire week 17 (26-30/09) was devoted to the preparation, programming and execution of the Sensitivity Analysis of the Nanex project, based on the Monte Carlo simulation, in MS Excel. The method included the calculation of the uncertainty factor for each indicator used, the corresponding error for each Line of Evidence, the application of the Monte Carlo method for 1000 runs for each group of Exposure Scenarios and the calculation of the relevant average differences in terms of values and position for each exposure scenario.

Side activities:
Calculus PhD course, SUWAMA course evaluations summary, AMORE recap, Triennial PhD plan.

Sunday, 25 September 2011

Week 8 to week 16

Summer has been challenging, in many different ways. Many interesting things have been happening and breaking the rule of updating the blog at the end of each working week (or every 2nd week the latest) won't help me much in providing a good quality summary. Nevertheless, you can find below the main points of interest between week 8 (25-29/07) and week 16 (19-23/09), organised per project.

AMORE:

After the meeting with the partners from EDF, we summarised the data obtained from the experts throught the questionnaires and planned the next steps. Main points of work:

Update of the criteria hierarchical tree, in order to take into consideration the experts' input (week 9)
Re-design of the scales used that will be used in the 2nd expert questionnaire (weeks 8-9)
Preparation of Intermediate report for delivery to project coordinator (week 8)
Research on Group Decision (GD) theories for application into AMORE scheme (week 10)

TRUST:

This project is not a direct part of my work but we have been working on the development of a very interesting application in MATLAB environment for simulating aquifers for Groundwater level variation (GLV) project with the use of linear algebra and the calculating powers of the software. (weeks 8-10).

SUWAMA:

The SUWAMA conference took place in Istanbul in September 2011. During week 14, we have been intensively working on the preparation of two presentations on:

- Multicriteria Decision Analysis (MCDA) and Decision Support Systems (DSS)

together with Dr. Alex Zabeo, that included both theory about the abovementioned sectors as well as many examples and applications for fair understanding of the topic by the participants. They were delivered at the preparatory course, as part of the pre-conference activities.

Others:

Participation at the seminar of Dr. Nobert (partner of KULTUrisk) regarding Ensemble Forecasts and Ensemble Prediction Systems (EPS) (week 16).
Meeting with KULTUrisk partners (CVR, Department of Economics, Dr. Nobert-week 16).
Introduction to NANEX project and the assessment of nanomaterials, for performing a sensitivity analysis in the near future.
Attendance of PhD course (Calculus I) delivered by Prof. Giove (will last till mid-November).
General study of various mathematic and other sectors:

Matrix computations and Operations with MATLAB
Outliers in statistics
Group Decision Theories
Sugeno and Choquet integrals
Preferential Independence and Rationality

Note: I was on holiday for three weeks (11, 12 and 13).

Monday, 25 July 2011

Week 4 to 7

Below you can find the summary of the work done between weeks 4 (27/6-1/7) and 7 (18-22/7).

AMORE project related:

Discussions with EDF on the procedure to be followed regarding the extraction of expert knowledge (week 4)

Summarisation of the large email/document exchange
Postponement of the preparation of one of the questionnaires for later on.

Meeting with Professor S. Giove (week 6) to discuss procedures used within AMORE, from mathematical point of view.

Preparation of the first questionnaire for the extraction of expert knowledge for the tasks of the AMORE project (week 5&6)

Construction of the scheme, based on the notes from EDF
Construction of the questionnaire in pdf format and sending to the experts (creating experts database as well)
Collection of answers and summarisation of the results to be used at the meeting between CVR and EDF (21 July 2011)