A Theory of Human Capital Investment

Panhaboth Kun
Oct 18, 2021

In this essay I model the firm as a producer of human capital (one may think of this firm as a government) and show that efficiency conditions systematically require that investment be smoothed out among individuals or groups of individuals in the economy. We then investigate the factors that skew this result by adding a further complexity to the model: differences in the characteristics of individuals or groups. Using this more complex model I demonstrate a general solution for distributing education investment funds. But the key feature of the tools we develop here lies in their ability to analyse existing policies rather than design new ones. Indeed, reform is needed only if we are confident that the status quo is inefficient. Thus, the mathematical rigour that will ensue serves the single purpose of arriving at an inefficiency index, which in this essay I call the lambda-multiplier.

Introduction

We begin by defining the problem faced by the government as a producer of human capital. In so doing, we enrich our model with objectives and constraints defined over two distinct classes of objects called ‘given parameters’ and ‘policy parameters’. The policy parameters are our solution of interest, and the optimal values they take on depend on the given parameters. In mathematical terms, optimal policy parameters are functions of the givens. Before we begin the task, though, it is imperative that we acknowledge that our problem exists within the fabric of many interconnected problems. As such, while given parameters are taken to be constant in the definition of this problem, they may themselves have been policy parameters in some other problem. We will shortly see concretely what I mean by this.

The idea of the problem is as follows: given an investment fund for education E, how should it be divided among a pool of students in order to maximise the total returns to education? For both geometrical and algebraic simplicity, I opt to restrict the pool to contain only two students, but the insights that we will develop come without loss of generality. Let us formalise the idea of the problem.

Consider the idea that students transform the education invested in them into income at a later time. Thus, each student is equipped with what I will call the ‘education transformation function’, whose value is future income and whose argument is the education invested in them today. We will denote this transformation function as f(eᵢ) for individual i. Let us give shape to this important function. First, it is safe to assume that the returns to education increase when more is invested in education. So, f increases in eᵢ. Second, we assume diminishing marginal returns to investment at the individual level: for all individuals, the next dollar invested will never contribute to f as much as the previous dollar did. This will prove to be a key assumption in the determination of our optimal policy parameters. Finally, we give f a position in the Cartesian space. The return to zero investment is zero future income. This assumption is not contentious if education is interpreted as any activity, in a classroom or otherwise, aimed at the production of future income. In fact, under that interpretation f(0) = 0 is not merely uncontentious but necessarily the case. One functional form that satisfies these conditions, and the one we work with in this essay, is:

f(eᵢ) = eᵢᵃ for any 0 < a < 1 and i ∈ {1, 2}.

The exponent a is a given parameter, and it is not the solution to some other problem that the government faces. This parameter defines the sharpness of f’s curvature, which in turn reflects the degree of diminishing marginal returns to education at any investment level; indeed, the government does not choose such things (at least, this shall be our position for now). The closer a is to 1, the more linear the curve is and so the less the individual is subject to diminishing marginal returns to education. The closer it is to 0, the greater the difference in outcome between the ‘early dollars’ invested and the ‘later dollars’. I illustrate f(eᵢ) for two different values of a:

Figure 1. The red curve is close to linear. Diminishing marginal returns to investment manifest more clearly in the blue curve.

Now, you might notice that the values that f takes on are rather small over a very long interval: one would need to invest an astronomically large amount in education in order to receive one unit of income. While this is only a matter of units and will not be of relevance, I proceed to fix the problem anyway to convince the reader that it needs no fixing in the first place. In particular, we easily fix the problem by giving f an appropriate scale to achieve the curve that we desire (see diagram below). But notice that the scale factor must be the same for every individual’s education transformation function. Because f is expressed in nominal terms, giving different scale factors to different individuals would mean that these individuals are transforming education into income of different currencies or within economies whose average price levels differ. We are concerned here with a single country’s investment fund for education, not the world’s, and therefore a scale factor to fix the issue of units must be the same for every individual.

Figure 2. Scaling the education transformation function fixes the issue where f is too small over a large range of x.

Soon, we will work with scales but interpret them differently. So, if it happens to be the case that equal scale factors across all individuals’ transformation functions give us the same policy parameters as if the scale had been omitted, then the fact that f takes on rather small values over a long interval is irrelevant.

Further, we will assume that a is the same for everyone. Diminishing marginal returns to education are modelled like gravity: set in stone. The government’s objective is to maximise the sum total of future income by choosing the right e₁ and e₂ subject to the constraint that the sum total of investments does not exceed the available fund. More formally, the government maximises the objective function, denoted by F:

F(e₁, e₂) = e₁ᵃ + e₂ᵃ

s.t. e₁ + e₂ ≤ E.

This type of production function is called ‘additively separable’. It is expressible as a sum of single-variable functions of each of its inputs. This kind of function is not commonly used in production theory because it implies that not all of its inputs are required in the production process. In the context of the average coffee shop, this would make for a poor model of production if its inputs are labour and capital. Without the necessary tools, there can be no coffee. But nor can there be any if there are no baristas. More formally, our production function differs from its neoclassical counterpart in that no single input is essential: output can be positive even when one of the inputs is zero. This is completely fine in our model: splurging the whole investment fund on one person will not necessarily lead to zero contribution to output in the future.

As for the constraint, prices do not come into it because e₁ and e₂ are already expressed in nominal terms: the cost of $100 of investment is simply $100. Furthermore, it is timely to note that the E parameter is a given in this problem but of course is not set in stone. What proportion of tax revenue should be allocated to investment in human capital was once itself a policy parameter: a solution to some optimisation problem. We will deduce three properties of F and one property of the constraint set that will be of great help to us in visualising the problem.

1. F is increasing in eᵢ for all i
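
A one-line check of Property 1, sketched under the functional form chosen above (the partial derivative is positive for any positive investment level):

```latex
\[
\frac{\partial F}{\partial e_i} = a\,e_i^{\,a-1} > 0
\quad \text{for all } e_i > 0,\ 0 < a < 1,\ i \in \{1, 2\}.
\]
```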

2. F-level sets are convex towards the origin

There are many ways by which we could have proven this property. I have opted to show that the slope of every level set in the e₁e₂-space is negative. We know that the gradient vector in this space points towards the direction of F’s maximal differential. Therefore, the direction vector orthogonal to the gradient must be parallel to the slope of the level set: if the fastest way to hike up a mountain is to walk directly towards its peak, then walking perpendicularly to this direction on the horizontal plane amounts to walking along the mountain’s contour. The vector v is the direction vector of interest, and it is scaled so that its first component is 1, leaving λ to be interpreted as the instantaneous rate of change of e₂ with respect to e₁.
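
The computation described above can be sketched as follows, assuming the direction vector is written v = (1, λ) as the text states. The last line is an added step: it shows that the slope becomes less steep as e₁ rises along a level set, which is what convexity towards the origin requires on top of a negative slope.

```latex
\begin{align*}
\nabla F(e_1, e_2) &= \left( a\,e_1^{\,a-1},\; a\,e_2^{\,a-1} \right), \qquad v = (1, \lambda), \\
\nabla F \cdot v = 0 \;&\Longrightarrow\; a\,e_1^{\,a-1} + \lambda\, a\,e_2^{\,a-1} = 0
\;\Longrightarrow\; \lambda = -\left( \frac{e_2}{e_1} \right)^{1-a} < 0, \\
\frac{d\lambda}{de_1}\bigg|_{F = \text{const}}
&= (1-a)\left( \frac{e_2}{e_1} \right)^{-a} \frac{e_2 - \lambda e_1}{e_1^{2}} > 0 .
\end{align*}
```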

3. F-level sets are symmetric about e₁ = e₂

For this it suffices to show that if (a, b) is on a level set, then so too is (b, a). The proof of this is rather simple: for any total contribution to future income F, assume F = F(a, b). Then, F = F(a, b) = F(b, a) because F(b, a) is a re-arrangement of F(a, b). The proof is complete.

4. The constraint set is a convex set

As a matter of definition, a set is convex if and only if all convex combinations of any two elements in the set are also elements of the set. A convex combination is any linear combination whose scalar coefficients add up to 1.

Take any two elements (a, b) and (c, d) of the constraint set, so that a + b ≤ E and c + d ≤ E, and consider the convex combination α(a, b) + (1 - α)(c, d). By definition, α is a real number in the interval [0, 1]. The components of the combination sum to α(a + b) + (1 - α)(c + d) = α[(a + b) - (c + d)] + (c + d). If α = 1 or α = 0, then the convex combination reduces to (a, b) or (c, d), respectively, both of which satisfy the constraint by assumption. If instead 0 < α < 1, then both α and 1 - α lie strictly between 0 and 1, and one of the following cases must ensue:

  • (c, d) satisfies the constraint at the boundary and (a, b) satisfies the constraint in the interior. In other words, c + d = E and a + b < E so that (a + b) - (c + d) < 0. Therefore, the sum of the components must be less than E because it is less than c + d as 0 < α < 1.
  • (a, b) satisfies the constraint at the boundary and (c, d) satisfies the constraint in the interior. In other words, a + b = E and c + d < E. The sum α[(a + b) - (c + d)] + (c + d) is monotonically increasing in α, taking the value c + d at α = 0 and a + b at α = 1, so for 0 < α < 1 it lies strictly between the two. Because its upper bound is a + b = E, the sum cannot exceed E.
  • Both (a, b) and (c, d) satisfy the constraint at the boundary. In other words, a + b = E and c + d = E so that (a + b) - (c + d) = 0. Therefore, the sum of the components reduces to c + d, which, as we had presupposed, does satisfy the constraint.
  • Both (a, b) and (c, d) satisfy the constraint in the interior. In other words, a + b < E and c + d < E. The sum of the components is a weighted average of a + b and c + d with weights α and 1 - α, so it too is less than E.

For all collectively exhaustive cases regarding the position of the two vectors of interest, any convex combination of them satisfies the constraint (i.e., is an element of the constraint set) and so the constraint set is a convex set.

Implications of the Properties of F and the Constraint Set

Properties 1, 2 and 4 guarantee a unique solution at the boundary of the constraint set. In other words, if those three conditions hold, then the constraint binds at the maximum value of F. This, as we will soon see when we relax some of the assumptions and perform a full-fledged mathematical optimisation, greatly simplifies the problem by allowing us to treat the constraint as an equality and solve it with a Lagrangian. For now, it suffices to visualise our solution geometrically because Property 3 also guarantees that the solution lies on the line e₁ = e₂:

Figure 3.

Our solution is characterised by the system of equations:

  • e₁ = e₂
  • e₁ + e₂ = E

Recall here the two classes of parameters: E is not an unknown because it is, as modelled in this problem, a given parameter. So, this is a system of two equations in two unknowns, yielding us a unique solution. Substituting the first equation into the second, we get:

2e₂ = E

so that e₂ = E/2. Substituting this into either equation, we get e₁ = E/2. From this it follows: the optimal distribution of investment in human capital is equal investment for all. The solution takes this form entirely because of the assumptions we imposed on each of the relevant classes of objects. Some of these assumptions have been stated explicitly. For example, the smoothing out of investment is attributable to our assumption of diminishing marginal returns. If e₁ > e₂, then the marginal returns to education for individual 1 are always less than those for individual 2, such that forgoing an infinitesimally small amount of investment in individual 1 in favour of individual 2 guarantees a net increase in total future income. Similarly, if e₂ > e₁, then the marginal returns to education for individual 2 are always less than those for individual 1, such that forgoing a small amount of investment in individual 2 in favour of individual 1 is guaranteed to lead to a net increase in total future income. Therefore, the optimum pulls e₁ and e₂ towards each other until they are equal. The exhaustion of the entire investment fund is attributable to our assumption of increasing education transformation functions: if every dollar increase in investment now leads to some gain in future income, then it would not make sense to leave some of the dollars lying around uninvested.
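
The equal-split result can be checked numerically. The following is a minimal sketch, not part of the original derivation: the function names, the grid resolution and the values E = 100 and a = 0.5 are all illustrative assumptions. It takes the constraint as binding and searches over e₁ on a grid.

```python
def total_income(e1, e2, a):
    """Total future income F(e1, e2) = e1**a + e2**a."""
    return e1 ** a + e2 ** a


def best_split(E, a, steps=1000):
    """Grid-search e1 on [0, E] with the constraint binding (e2 = E - e1).

    Hypothetical helper: returns the grid value of e1 that maximises F.
    """
    candidates = [E * k / steps for k in range(steps + 1)]
    return max(candidates, key=lambda e1: total_income(e1, E - e1, a))


# Illustrative values only: a fund of 100 and a = 0.5.
E, a = 100.0, 0.5
e1_star = best_split(E, a)
print(e1_star, E - e1_star)  # both components come out at E / 2
```

Repeating the search with any other a strictly between 0 and 1 moves the value of F but not the location of the maximum, which is the geometric point made by Property 3.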

The implicit assumption in this model is that all students in the economy are the same, since we have characterised each by exactly the same education transformation function. Why did we not make this an explicit assumption? In keeping with our philosophy of science, we explicitly prescribe axioms to our model only if they are a fine reflection of the circumstances of life as it is truly lived. But even then, we will not always be able to find the set of axioms that collectively exhaust every dimension of the facts that make economic reality well-defined. Sometimes, indeed often, we will forget. And so in this case, we have forgotten about the other functional forms for f that nevertheless satisfy our prescribed axioms. Our decision to settle on

f(eᵢ) = eᵢᵃ for any 0 < a < 1 and i ∈ {1, 2}

arose out of thin air.

A Richer Model

Now we turn our attention to admitting potential differences between students in how successfully each can transform money today into money tomorrow. As suggested in the word ‘potential’, we will develop a more general model that can still capture the previous in its entirety. This is a needed model - as is often the case, one could be sent to a private school and perform poorly in the future for a variety of different reasons. For one, if the investment occurs in the form of a direct cash transfer, then there is no guarantee that the individual uses it to pay for education in the first place. We model this by including scale parameters in the students’ education transformation functions. We take these as given. In particular:

f(eᵢ) = τᵢeᵢᵃ for any 0 < a < 1, i ∈ {1, 2}, τᵢ > 0.

We assume here that there is no possibility of debt, so that no individual can transform a positive investment into a negative income. Hence τᵢ > 0. From this it follows that the total contribution to future output for a given investment vector (e₁, e₂) is:

F(e₁, e₂) = τ₁e₁ᵃ + τ₂e₂ᵃ.

Figure 1 shows the f of two students with different values of the scale parameter. Instead of interpreting the scale as giving existence to an appropriate price level, we now interpret it as a productivity factor. Individual 2 is more productive or efficient because they can transform a given investment today into more income in the future, compared to individual 1.

The constraint set has not changed so it is, as before, a convex set and the boundary has a constant slope of -1. As an exercise, the reader can confirm that F is still increasing in each of the components of the investment vector and that the F-level sets are still convex towards the origin. But in general, F-level sets are no longer symmetric about e₁ = e₂. Let us, as an aside though, determine the condition for such symmetry:
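
One way to obtain the condition, sketched here with x and y standing for any two investment levels (symbols introduced for this sketch): symmetry about e₁ = e₂ requires that swapping the two components leaves F unchanged.

```latex
\begin{align*}
F(x, y) = F(y, x)
&\iff \tau_1 x^{a} + \tau_2 y^{a} = \tau_1 y^{a} + \tau_2 x^{a} \\
&\iff (\tau_1 - \tau_2)\left( x^{a} - y^{a} \right) = 0 .
\end{align*}
```

For this to hold for every pair with x ≠ y, the first factor must vanish.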

Notice that F-level sets are symmetric about e₁ = e₂ if and only if τ₁ = τ₂. Furthermore, we know that, given the constraint, the optimisation problem with level sets symmetric about e₁ = e₂ yields policy parameters restricted to e₁ = e₂. Therefore, in this model, the optimal policy parameters are e₁ = e₂ = E/2 if and only if τ₁ = τ₂. This is precisely what I meant earlier when I spoke of the irrelevance of scaling the education transformation function to match the way in which we express currency, or equivalently, to equip our economy with the appropriate price level. To reiterate, the exercise of giving our economy an appropriate price level involves giving the same scale to every individual’s education transformation function. As we have just seen, doing so yields the same policy parameters as if we had not done it in the first place. This insight extends to the other interpretation as well: only relative productivity will play a role in determining the optimal policy parameters. We capture this in a theorem.

Theorem: Optimal human capital investment is invariant to identical scalar transforms of the individuals’ education transformation function.

Having equipped ourselves with economic intuition and geometric insight, I turn our attention to the full-fledged optimisation problem. First, recall that Properties 1, 2 and 4 still hold, so that a unique solution exists at the boundary. The constraint can therefore be treated as an equality, and we set up the Lagrangian as follows:
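
A sketch of this step, with the constraint taken as binding. The multiplier symbol μ is an assumption of this sketch (chosen to avoid a clash with the slope λ used earlier), and α denotes the exponent written as a in the education transformation function.

```latex
\begin{align*}
\mathcal{L}(e_1, e_2, \mu) &= \tau_1 e_1^{\alpha} + \tau_2 e_2^{\alpha} + \mu\,(E - e_1 - e_2), \\
\frac{\partial \mathcal{L}}{\partial e_i} = 0 \;&\Longrightarrow\; \alpha\,\tau_i\, e_i^{\alpha - 1} = \mu, \quad i \in \{1, 2\}, \\
\frac{e_1}{e_2} &= \left( \frac{\tau_1}{\tau_2} \right)^{\frac{1}{1-\alpha}}, \qquad e_1 + e_2 = E, \\
e_1^{*} &= \frac{E}{1 + \left( \tau_1/\tau_2 \right)^{\frac{1}{\alpha - 1}}}, \qquad
e_2^{*} = \frac{E}{1 + \left( \tau_1/\tau_2 \right)^{\frac{1}{1 - \alpha}}} .
\end{align*}
```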

Notice that the optimal investment policy reduces to e₁ = e₂ = E/2 if τ₁ = τ₂, as is consistent with our earlier exercise. To see this, simply work with τ₁ = τ₂, which implies τ₁/τ₂ = 1. Substituting τ₁/τ₂ = 1 into our solution functions, we see that e₁ = e₂ = E/(1 + 1) = E/2 because 1 raised to any power is 1, including the exponent 1/(α - 1), where α is the exponent written as a in the education transformation function and 0 < α < 1.

Furthermore, how the optimal policy responds to changes in the τ-parameters is captured entirely in how it responds to changes in the ratio between τ₁ and τ₂. This is because our solution functions can be expressed as taking in the ratio τ₁/τ₂ instead of the full τ vector as one of their parameters - to verify this, notice that every instance of τ₁ and τ₂ on the right-hand side of both solution functions appears in ratio form. In that sense, it is not the absolute values of the τ-parameters that matter in the determination of optimal policy, but rather how τ₁ and τ₂ compare with each other as a ratio. In other words, how investment funds for education should be distributed depends on the relative productivity of the individuals in the pool.
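
The dependence on the ratio alone can be illustrated with a short sketch. It assumes the solution function implied by the first-order conditions, e₁* = E / (1 + (τ₁/τ₂)^(1/(α - 1))) with e₂* = E - e₁*; the function name and the numbers are illustrative assumptions.

```python
def optimal_split(E, tau1, tau2, alpha):
    """Return (e1*, e2*) for F = tau1*e1**alpha + tau2*e2**alpha with e1 + e2 = E.

    Hypothetical helper: uses the closed form implied by the first-order
    conditions, which depends on tau1 and tau2 only through their ratio.
    """
    ratio = tau1 / tau2
    e1 = E / (1.0 + ratio ** (1.0 / (alpha - 1.0)))
    return e1, E - e1


# Illustrative values: scaling both productivities by 10 leaves the split unchanged.
base = optimal_split(100.0, 0.85, 1.0, 0.5)
scaled = optimal_split(100.0, 8.5, 10.0, 0.5)
equal = optimal_split(100.0, 3.0, 3.0, 0.5)
print(base, scaled, equal)
```

The first two results coincide because only τ₁/τ₂ enters the formula, and the third returns the equal split because the ratio is 1.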

While these insights are a mathematical proof of our earlier claims (namely, those surrounding the redundancy of equipping our economy with a meaningful price level, and those surrounding the importance of the productivity ratio), they also greatly shorten any comparative statics analysis we decide to conduct. For the solution functions in particular, we need only differentiate e₁* with respect to the ratio τ₁/τ₂ and realise that the rate of change of e₂* with respect to that same ratio is the negation of that partial derivative, because any change in e₁* must be offset by an equal and opposite change in e₂* so that the constraint remains binding. As can be seen, our expression of education in nominal terms has proven very mathematically convenient. With this in mind, we find the partial derivatives. Using the quotient rule:
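
A sketch of the differentiation, writing ρ for the ratio τ₁/τ₂ (a shorthand introduced here) and taking e₁* = E / (1 + ρ^(1/(α - 1))) from the first-order conditions:

```latex
\begin{align*}
\frac{\partial e_1^{*}}{\partial \rho}
&= \frac{-E \cdot \frac{1}{\alpha - 1}\, \rho^{\frac{1}{\alpha - 1} - 1}}{\left( 1 + \rho^{\frac{1}{\alpha - 1}} \right)^{2}}
= \frac{E\, \rho^{\frac{2 - \alpha}{\alpha - 1}}}{(1 - \alpha)\left( 1 + \rho^{\frac{1}{\alpha - 1}} \right)^{2}}, \\
\frac{\partial e_2^{*}}{\partial \rho} &= -\frac{\partial e_1^{*}}{\partial \rho} .
\end{align*}
```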

We now see that e₁* increases in τ₁/τ₂ since α - 1 < 0. Further, e₂* must decrease in that same ratio because both cannot increase at the same time if we are still to satisfy the constraint. In particular, if e₁* increases by some value d, then e₂* must decrease by d as we had earlier posited. The first partial derivative I computed by hand and the second followed by implication. We can use the chain rule to decompose the effect into the two partial derivatives that are responsible for the change to the optimal solution. More precisely, if τ₁ increases, then the ratio increases and, by virtue of the partial derivative, so too does e₁*. If τ₂ increases, then the ratio decreases and so too does e₁*. Investment smoothing is thus skewed by differences in the relative productivity of each individual. This gives rise to a window of investment vectors within which the optimum diverges from the equal split. But the tendency for divergence vanishes in the limit as |τ₁ - τ₂| → 0.

A rigorous interpretation of the word ‘tendency’ could involve constructing a σ-algebra on this line and assigning its elements a probability measure in such a way that the distance |τ₁ - τ₂| corresponds to the probability of divergence. This is especially useful if we want to model productivity as being composed of random variables (think of talent) and calculate the natural likelihood of divergence.

Implications for Policy Design and Analysis

To begin our reflection, I want to develop our insights into one main theorem. Often in practice, we see divergences between investment funds allocated to individuals or, on a larger scale, groups of individuals. These differences are attributable to a number of phenomena: traditional expectations of society, poverty exacerbated by poorly contrived economic policies, or, the classic, market failure. So, a question naturally arises: are these differences, setting aside for now the question of equity, even efficient? We can sketch an answer to this question with the results we have just examined. In particular, take the difference between e₁* and e₂*. This difference, by construction, is the efficient difference in investment levels between two individuals, given their productivity ratio and the degree of diminishing marginal returns to education.

Take note that if we divide both sides by E and take the absolute value of the result, then we have compacted our object further by removing E from the right-hand side. This formula is key to our main theorem: it is the efficient absolute difference in investment levels between two individuals as a percentage of the total investment fund, given their productivity ratio. Because this is rather a mouthful, we will for conciseness refer to it as the ‘summary of an efficient policy’, the reason for whose name is but a mathematical property. Let us perform the necessary algebraic manipulations so that we arrive at this key formula:

This formula is so key that we make it a full-fledged function, denoted by Γ.
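
Under the solution functions implied by the first-order conditions, e₁* = E / (1 + ρ^(1/(α - 1))) and e₂* = E - e₁* with ρ = τ₁/τ₂ (a shorthand introduced here), the manipulation can be sketched as:

```latex
\begin{align*}
e_1^{*} - e_2^{*} &= E\,\frac{\rho^{\frac{1}{1-\alpha}} - 1}{\rho^{\frac{1}{1-\alpha}} + 1}, \\
\Gamma\!\left( \rho, \alpha \right) &= \left| \frac{e_1^{*} - e_2^{*}}{E} \right|
= \frac{\left| \rho^{\frac{1}{1-\alpha}} - 1 \right|}{\rho^{\frac{1}{1-\alpha}} + 1} .
\end{align*}
```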

At this point we can’t help but picture an exercise: provided that we have an estimate for α, we can construct an investment distribution function if we also have each student’s normalised exam performance. One method of normalisation compatible with our model, in the full spirit of Walras’s numeraire, is to divide each score by the top performer’s so that if, for example, the score of individual one is 34/50 and the score of individual two is 40/50, then τ₁ = 0.85 and τ₂ = 1. Plugging these into Γ and using McKinsey and Co.’s estimate for the United States’ labour share of income (0.567) as our α (purely for example’s sake: we have yet to develop a properly concrete interpretation of the parameter α)¹, we arrive at the result that the efficient difference in the levels of investment as a percentage of the total investment fund is approximately 18.5% in favour of individual two. So, if we have an absolute amount of available investment fund E, then we can compute the absolute amount that each student is to receive by calculating EΓ to obtain the absolute difference and solving this simultaneously with e₁* + e₂* = E. In other words, we solve the following system of equations:

  • |e₁* - e₂*| = ΓE, with the larger share going to the individual with the higher τ
  • e₁* + e₂* = E

Just to complete our example, if E = $40,000, then e₁* ≈ $16,300 and e₂* ≈ $23,700. I would like to point out that we can work with absolute values, as in the definition of Γ, because we know from the results of our comparative statics analysis that eᵢ* increases in τᵢ, so that the better performer always receives the relatively higher level of investment. To analyse how fair such a mechanism is requires us to further fine-grain our model, and so is a conversation for a different day. With that being said, we have completely derived the optimal policy parameters.
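
The worked example can be reproduced with a short sketch. It assumes Γ(ρ, α) = |ρ^(1/(1 - α)) - 1| / (ρ^(1/(1 - α)) + 1) with ρ = τ₁/τ₂, which is the form implied by the first-order conditions; the function names are illustrative, and the inputs are the ones used in the text.

```python
def gamma(ratio, alpha):
    """Efficient absolute investment gap as a share of the fund E.

    Assumed form: |r - 1| / (r + 1) with r = ratio ** (1 / (1 - alpha)).
    """
    r = ratio ** (1.0 / (1.0 - alpha))
    return abs(r - 1.0) / (r + 1.0)


def split_from_gamma(E, tau1, tau2, alpha):
    """Solve |e1 - e2| = gamma * E together with e1 + e2 = E.

    The larger share goes to the individual with the higher tau.
    """
    g = gamma(tau1 / tau2, alpha)
    high = E * (1.0 + g) / 2.0
    low = E - high
    return (high, low) if tau1 >= tau2 else (low, high)


# Inputs from the text: tau1 = 0.85, tau2 = 1, alpha = 0.567, E = $40,000.
g = gamma(0.85 / 1.0, 0.567)
e1, e2 = split_from_gamma(40_000.0, 0.85, 1.0, 0.567)
print(round(g, 3), round(e1), round(e2))  # about 0.185, 16290 and 23710
```

The unrounded split differs from the figures quoted above by roughly ten dollars, because those figures use Γ rounded to 18.5%.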

But the main point of constructing Γ lies beyond the question of design. Indeed, if we had only wanted to derive the optimal policy parameters, we could have simply plugged the given parameters into the solution functions obtained when we solved the Lagrangian’s first-order conditions (FOCs) for an interior maximum. Still, expressing our solution in some other form (in this case the optimum percentage difference in investment levels) that corresponds one-to-one with the actual policy parameters provides us with greater depth of insight while making conversations about it more compact. What I want to finally turn our attention to is analysis as opposed to design.

Imagine that we encounter a policy that discriminates in education investment based on sex, so that the observed deviation of investment between the sexes expressed as a percentage of the total fund is Δ. Here, our model as it is doesn’t need further generalisation: we can read our two-individual interpretation as referring to two groups of individuals. How does Δ fare against the optimum? Δ is an observed value, whereas the optimum is what we have just derived, so Δ is the summary of an efficient policy only if:

Δ = Γ(τ₁/τ₂, α).

With the observed value Δ and the given parameter α, we can solve for the productivity ratio that must be the case if the current policy in place is indeed efficient. This productivity ratio we denote by [τ₁*/τ₂* | Δ]. We relate the above equation to the productivity ratio more explicitly: an investment fund distribution mechanism is efficient only if the productivity ratio between the groups by virtue of only the selection criteria into the groups, τ₁/τ₂, satisfies Δ = Γ(τ₁/τ₂, α).
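
Solving Δ = Γ(τ₁/τ₂, α) for the ratio can be sketched as follows. The sketch assumes the same form of Γ as implied by the first-order conditions and, because Γ is an absolute value, returns the ratio of the favoured group's productivity to the other group's (a number of at least 1). The function names are illustrative assumptions.

```python
def gamma(ratio, alpha):
    """Assumed form of the efficient gap: |r - 1| / (r + 1), r = ratio ** (1 / (1 - alpha))."""
    r = ratio ** (1.0 / (1.0 - alpha))
    return abs(r - 1.0) / (r + 1.0)


def implied_ratio(delta, alpha):
    """Productivity ratio (favoured over other) that would make an observed gap efficient.

    Inverts gamma on ratios >= 1: r = (1 + delta) / (1 - delta), ratio = r ** (1 - alpha).
    """
    if not 0.0 <= delta < 1.0:
        raise ValueError("delta must lie in [0, 1)")
    r = (1.0 + delta) / (1.0 - delta)
    return r ** (1.0 - alpha)


# Illustrative check: an observed gap of 18.5% with alpha = 0.567.
ratio = implied_ratio(0.185, 0.567)
print(ratio, gamma(ratio, 0.567))
```

Feeding the implied ratio back into Γ returns the observed gap, which is the round trip that the efficiency condition describes.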

With this in mind, we can employ econometric methods to obtain estimates of the true value of τ₁/τ₂. These estimates are denoted in bold as τ₁/τ₂. From this it follows:

Main Theorem: Δ is the summary of an efficient policy only if τ₁/τ₂ = [τ₁*/τ₂* | Δ].

We are finally equipped to construct relations that quantify the degree of inefficiency. The future contribution to income of the optimal investment vector is given by F(e₁*, e₂*) = τ₁e₁*ᵃ + τ₂e₂*ᵃ. The future contribution to income of the current investment vector is F(δ₁, δ₂) = τ₁δ₁ᵃ + τ₂δ₂ᵃ where δ₁ and δ₂ are the investment levels under the current policy. Then, scaling the latter to match the former, we get:

so that

Definition of the λ-multiplier. λ - 1 can be interpreted as the proportional increase in future contribution to income were we to move from the current investment vector to the optimal one.
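
Read as λ = F(e₁*, e₂*) / F(δ₁, δ₂), the multiplier can be computed with a short sketch. Two assumptions are made here and are not from the text: the current policy is taken to exhaust the fund, so that E = δ₁ + δ₂, and the optimum is taken from the closed form implied by the first-order conditions. Function names and numbers are illustrative.

```python
def total_income(e1, e2, tau1, tau2, alpha):
    """F(e1, e2) = tau1*e1**alpha + tau2*e2**alpha."""
    return tau1 * e1 ** alpha + tau2 * e2 ** alpha


def optimal_split(E, tau1, tau2, alpha):
    """Closed-form optimum implied by the first-order conditions (assumed form)."""
    e1 = E / (1.0 + (tau1 / tau2) ** (1.0 / (alpha - 1.0)))
    return e1, E - e1


def lambda_multiplier(d1, d2, tau1, tau2, alpha):
    """Ratio of optimal to current future income.

    Assumption: the current policy spends the whole fund, E = d1 + d2.
    """
    e1, e2 = optimal_split(d1 + d2, tau1, tau2, alpha)
    optimal = total_income(e1, e2, tau1, tau2, alpha)
    current = total_income(d1, d2, tau1, tau2, alpha)
    return optimal / current


# Illustrative values: equally productive groups but a 30,000 / 10,000 split.
lam = lambda_multiplier(30_000.0, 10_000.0, 1.0, 1.0, 0.5)
print(lam, lam - 1.0)
```

A value of λ equal to 1 means the current policy is already efficient; anything above 1 measures the forgone income as a proportion of what the current policy delivers.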

I leave the details of this econometric exercise to the interested reader. For now, a bigger-picture reflection ensues.

Reflections

In our most recent example, we dabbled a little with education policies that discriminate purely based on sex. This need not be interpreted as discrimination in public policy alone. Traditions and cultures, while they do influence government policies, nevertheless come up with policies of their own - that is simply part of the fabric of human society. Our model is applicable to the analysis of policy-makers as an abstract entity. Thus, it is applicable to all institutions, whether it be the free market or the state, the tradition or the household, that engage in the activity of human capital investment.

But with regard to this brief thought experiment on discrimination, the implications of our theory are clear. The efficient variance in investment distribution between the sexes increases in nature’s tendency to endow each sex with relatively different levels of productivity. Remember that the econometrics exercise requires us to come up with a ceteris paribus estimate of the true value of relative productivity. Group characteristics other than sex must be held constant, which rules out all absurd reasons as to why there should be investment differentials. If girls tend to be made to stay at home more than boys by virtue of societal traditions, and so may not have had much chance to hone their productivity from early on, then this does not necessarily justify differentials in government-financed investment. Perhaps that particular policy put in place by society had been narrow and stunted in the first place. So, even without engaging in this exercise with absolute scrutiny, are we to believe that nature systematically endows our students with productivity levels conditional on their sex, or at least conditional on it enough to warrant so much of our focus on determining gender roles? If not, then the window within which investment should diverge is rather small, and our efficiency conditions systematically tend to smooth out investment.

References

¹ https://www.mckinsey.com/featured-insights/employment-and-growth/a-new-look-at-the-declining-labor-share-of-income-in-the-united-states
