Deep-dive to UNDP strategic resemblance with the private sector and internal response trends of UNDP teams on the strategies with Job Postings datasets

Abir
Feb 10, 2022

Source: https://www.freepik.com/vectors/business created by rawpixel.com — www.freepik.com

Disclaimer: This article is co-authored by Juliana Negrini. We also thank Ken Yoonseong Jung and Gorkhan Dikmener for their guidance and support.

Introduction

One way to gain insight into the strategic and thematic focus of an organisation is to analyse its hiring practices and trends. With this in mind, our project investigated the large dataset of UNDP job postings accumulated over the organisation’s history. By applying NLP techniques to extract the skills and other job-specific characteristics mentioned in the postings, we gained a deeper understanding of the building blocks of the UNDP workforce and, as a consequence, of the organisation’s strategic goals and values.

UNDP

As a UN-affiliated organisation operating worldwide, the UNDP sets global strategic development goals for all member countries. The UNDP’s regional offices, including regional hubs and country offices, are tasked with implementing these goals on the ground in the different member countries. To that end, the UNDP’s offices draw on a wide range of personnel operating both nationally and internationally. An overview of the UNDP’s organigram is shown in Figure 1.

Figure 1: UNDP Organigram (As of April 2021)

UNDP Strategy

The UNDP defines its strategy for the coming year(s) with regard to its partnerships and sustainable growth goals in the different countries. The strategic documents published by UNDP include:

  • Private Sector Strategy: aims to assist countries to align private sector activities and investments with the 2030 Agenda
  • DCED Briefs: The Donor Committee for Enterprise Development (DCED) briefs describe effective ways to create economic opportunities for the poor, based on practical experience in Private Sector Development (PSD)
  • Digital Transformation Strategy: fosters new collaboration models, introducing supporting systems, structures and mechanisms to drive innovation, and building capabilities to develop and apply digital solutions that will enhance the quality, efficiency and effectiveness of our work
  • Strategic Plan: An overarching strategic orientation encompassing all UNDP efforts

Background

Related Studies

The methodology applied in our project takes inspiration from the work of Sibarani et al. [1] who presented an ontology-guided job market demand analysis under the umbrella of the EU-funded EDSA project. The Ontology-Based Information Extraction (OBIE) method presented by the authors relies on the Skills and Recruitment Ontology (SARO), which extends both the European Skills, Competences, Qualifications and Occupations Ontology (ESCO) as well as the JobPosting Ontology of Schema.org.

Figure 2: SARO Ontology (Source: https://elisasibarani.github.io/SARO/saro_model_instances_legend.jpg)

Key Technology

I. Named Entity Recognition (NER)

Named entity recognition (NER), also referred to as entity extraction, is a natural language processing (NLP) technique for extracting information from unstructured text. NER recognises nouns of interest in a text, i.e. words or strings of words, and classifies them into predefined categories. Identifying entities is key to several applications, including text summarisation, search engines, the semantic web, topic modelling, and machine translation (Augenstein, Derczynski and Bontcheva [4]). This is yet another field where Machine Learning and Deep Learning have been used to improve accuracy, as explained in more detail by Li et al. (2020)[5] and Augenstein, Derczynski and Bontcheva (2017)[6].

II. Simple Knowledge Organisation System (SKOS)

SKOS is a standard data model for thesauri, classification schemes, taxonomies etc. designed by the World Wide Web Consortium (W3C). SKOS data can be expressed as machine-readable RDF triples [#]. In the SKOS data model, the entities of a taxonomy or vocabulary are represented as Concepts within a Concept Scheme. Most importantly, SKOS allows Concepts to be linked via semantic relations, most notably:

  • related: links two semantically related concepts
  • broader: relates a concept to a concept that is more general in meaning
  • narrower: relates a concept to a concept that is more specific in meaning
  • exact_match: is used to link two concepts, indicating a high degree of confidence that the concepts can be used interchangeably across a wide range of information retrieval applications
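To make these relations concrete, here is a minimal sketch that stores them as plain Python tuples rather than in an RDF library. The concept names and the helper functions `add_broader`, `add_related` and `objects` are illustrative assumptions and not part of any SKOS tooling. In the SKOS specification the properties are written `skos:broader`, `skos:narrower`, `skos:related` and `skos:exactMatch`; `broader` and `narrower` are inverses of each other and `related` is symmetric.

```python
# Triples are stored as (subject, predicate, object) tuples.
triples = set()

def add_broader(narrow, broad):
    """Record that `broad` is more general than `narrow`, plus the inverse link."""
    triples.add((narrow, "skos:broader", broad))
    triples.add((broad, "skos:narrower", narrow))

def add_related(a, b):
    """skos:related is symmetric, so both directions are stored."""
    triples.add((a, "skos:related", b))
    triples.add((b, "skos:related", a))

def objects(subject, predicate):
    """All objects linked to `subject` through `predicate`, sorted for stable output."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)

# Hypothetical concepts, for illustration only.
add_broader("renewable energy", "energy")
add_related("renewable energy", "green economy")
```

Calling `objects("energy", "skos:narrower")` then lists the more specific concepts filed under `energy`.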

The Project

In this project, we aim to analyse UNDP job postings and to measure their alignment with the different UNDP strategy documents. We do so by first automatically extracting relevant concepts from the UNDP job posting dataset using an NER model. We then organise the collected concepts into a controlled SKOS vocabulary, which can be used in the future as a reference for analysing job postings and other UNDP artifacts related to recruitment. We also use the extracted concepts for two purposes: to analyse the trends observable in the dataset across regions over the period 2007–2020, and to assess the alignment with UNDP strategy documents. Throughout this project, we made use of a number of UNDP resources such as the UNBIS vocabulary, which we describe in detail in our References section. We summarise our methodology in Figure 3.

Figure 3: Project Methodology

I. Data Collection

The job postings advertised at the UNDP’s jobs platform have a relatively uniform header structure including fields such as location, type of contract, required languages, expected duration of the assignment etc. The job posting’s body is usually structured into the following sections:

  • Background: a description of the organisation offering the position, its goals and the specific project the future employee will be appointed to
  • Duties and Responsibilities: outlines the functions expected to be filled by the future employee and may include a description of the expected deliverables or outcomes at the end of the engagement
  • Competencies: an outline of the required capabilities the candidates should have
  • Required Skills and Experiences: an outline of characteristics and experiences the candidates should have, often quantified in years. This field also includes educational requirements.

Table 1: Dataset Description

Job postings advertised between 2004 and 2020 were extracted from the job platform as raw text, cleaned and parsed into the corresponding fields resulting in a dataset of 73,555 samples.
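As an illustration of the parsing step, the sketch below splits a raw posting body into the sections listed above. It is not the parser used in the project: the function name `split_sections` and the assumption that every heading sits alone on its own line are ours, and real postings are likely to need more tolerant rules.

```python
import re

# Section headings as described in the text; real postings may vary in wording.
SECTION_HEADINGS = [
    "Background",
    "Duties and Responsibilities",
    "Competencies",
    "Required Skills and Experience",
]

def split_sections(raw_text):
    """Split a posting body into {heading: text}; missing sections map to None."""
    pattern = "|".join(re.escape(h) for h in SECTION_HEADINGS)
    # Assumption: a heading occupies a whole line (optional plural "s" allowed).
    matches = list(re.finditer(
        rf"^({pattern})s?[ \t]*$", raw_text, flags=re.MULTILINE | re.IGNORECASE
    ))
    sections = {h: None for h in SECTION_HEADINGS}
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(raw_text)
        key = next(h for h in SECTION_HEADINGS if h.lower() == m.group(1).lower())
        sections[key] = raw_text[m.end():end].strip() or None
    return sections

sample = """Background
UNDP supports the national government.
Duties and Responsibilities
Draft reports.
Required Skills and Experiences
Master's degree; 5 years of experience."""
parsed = split_sections(sample)
```

Sections that are absent from a posting, such as Competencies in the sample, are returned as `None`, which makes missing values easy to count afterwards.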

Table 1 displays a summary of the Missing and Unique Categories for each feature. Overall, all fields were successfully extracted with 1% of missing values.

II. Data Exploration

With regard to geographic reach, 16 regions and 177 countries are listed. The dataset covers approximately 91% of the 193 member states recognised by the UN. For some job postings the location was not available, as the position was described as Home-Based. Figure 4 shows the distribution of job postings per country in the year 2020.

Figure 4: Distribution of Job Postings per Member Country

Throughout the years, the USA, Afghanistan and Bangladesh have had the highest number of job postings, followed by Indonesia and Ukraine. The high concentration of job postings in the USA can be explained by the location of the UNDP HQ, while the other countries coincide with focal points of conflict over the last two decades, where UNDP’s mission has been to provide humanitarian aid and support development. Figure 4 also shows the share of the top five countries, and Figure 5 outlines the evolution of the number of job postings in those countries from 2007 to 2020. While the top five countries still predominate overall, their share has been on a downward trend since 2015.

Figure 5: Evolution of the Number of Job Postings for the top 5 Countries per Year

Performing a similar analysis by region in Figure 6, it becomes clear that Africa, and more specifically sub-Saharan Africa, has had a substantial share across the years, with a peak in 2015. A similar pattern is seen in Asia, and southern Asia specifically. This regional grouping provides a more substantive view of UNDP’s activity worldwide, revealing its extensive engagement in developing regions. Furthermore, it highlights a trend of increasing Home-Based openings alongside a decreasing number of openings for most regions. The need to adjust to home-office settings due to the Covid-19 pandemic is a plausible explanation for this pattern.

Figure 6: Heatmap of Job Postings by Region and by Year

The fields related to the UNDP internal job hierarchy are Type of Contract, Position Level and Staff Category. The UNDP staff categories reflect increasing levels of responsibilities and requirements. The staff categories are as follows [Source: United Nations Staff Categories]:

  • Professional and higher categories (P and D): employees of this category are internationally recruited and are expected to serve at different duty stations throughout their career with the Organization.
  • General Service and related categories (G, TC, S, PIA, LT): functions in the General Service and related categories include administrative, secretarial and clerical support as well as specialized technical functions such as printing, security and buildings maintenance. Staff in the General Service and related categories are generally recruited locally from the area in which the particular office is located but could be of any nationality.
  • National Professional Officers (NO): are nationals of the country in which they are serving and their functions must have a national context, i.e. functions that require national experience or knowledge of the national language, culture, institutions, and systems.
  • Field Service (FS): Staff in the Field Service category are normally recruited internationally to serve in field missions. Field Service staff members provide administrative, technical, logistics and other support services to United Nations field missions.
  • Senior Appointments (SG, DSG, USG and ASG): the highest positions in the Secretariat either by appointment of the Organization’s legislative organs or Chief Administrative Officer. These positions include: Secretary-General, Deputy Secretary-General, Under-Secretary-General (USG) and Assistant Secretary-General (ASG).

The share of each of these fields is displayed below.

Figure 7: Distribution of Staff Categories, Contract Types and Position Levels

While National and International Consultant appear in both Staff Category and Position Level, these two features carry different data. Position Level is more granular: it conveys the job category (indicated by the letters P, D, NO) as well as the experience required for the role. For example, job category P-2 requires a minimum of 2 years of experience. Similarly, a P-6 or D-1 requires a minimum of 15 years of experience (reference).

The languages required for the role are also described in the dataset as a separate feature. In total, seven values are available: English, Chinese, Arabic, Russian, French, Spanish, and Unspecified for unknown language requirements. The total number of job postings by language requirement is given in Figure 8, where English is the most frequently required language.

Figure 8: Distribution of Job Postings by Required Language

Figure 9 below displays the language trend over the years. English was removed from the plot to enable better visualisation of the other languages. We observe an increasing demand for Arabic language skills from 2008 to 2014 and a decline since then, which coincides with humanitarian missions in Arabic-speaking countries, particularly following the Arab Spring. Another interesting development is the slow but significant increase in demand for Chinese language skills, which may be a result of current political developments in China’s neighbouring countries.

Figure 9: Yearly Distribution of Required Languages

III. Preprocessing

Semi-Automatic Extraction

In order to train the NER model, we first need to annotate a subset of the job posting data, which can be used as a truth set. Normally, truth set creation is undertaken by domain experts. To fill this expertise gap, we made use of several official UNDP frameworks, the UNBIS vocabulary, as well as established resources such as the HECOS vocabulary for Higher Education Classification of Subjects. The relevant concepts from these resources are looked up in the job postings, and their occurrences are extracted using regular expressions.
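The lookup can be pictured with the following sketch, which turns a small vocabulary into character-offset annotations of the kind commonly used as NER training data. The function name `annotate`, the labels and the example terms are assumptions made for illustration; the project's actual rules were built with spaCy's rule-based matching, which this standard-library version only approximates.

```python
import re

def annotate(text, vocabulary):
    """Return sorted (start, end, label) spans for vocabulary terms found in text.

    `vocabulary` maps a label (e.g. "COMPETENCY") to a list of terms.
    """
    spans = []
    for label, terms in vocabulary.items():
        # Longest terms first so "project management" wins over "management".
        ordered = sorted(terms, key=len, reverse=True)
        pattern = r"\b(?:" + "|".join(re.escape(t) for t in ordered) + r")\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            spans.append((m.start(), m.end(), label))
    return sorted(spans)

# Hypothetical vocabulary and posting text.
vocab = {
    "COMPETENCY": ["project management", "procurement"],
    "EDUCATION": ["economics", "law"],
}
text = "Degree in Economics; experience in project management and procurement."
spans = annotate(text, vocab)
```

The word boundaries (`\b`) keep short terms such as "law" from matching inside longer words, which is one of the simpler ways to limit false positives in a dictionary lookup.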

The internal UNDP frameworks, used to extract competencies and soft skills from the skills and experiences field, define detailed taxonomies of the characteristics required of personnel working in individual jobs across functional areas and across all career streams of the UNDP. These include:

  • The Technical Competencies Framework: taxonomies of the technical knowledge, skills and experiences. These competencies are further subdivided into thematic and non-thematic competencies. Thematic competencies are closely related to UNDP strategic goals such as poverty reduction while non-thematic competencies are more general and include for example governance.
  • People Management Competencies: leadership competencies to be demonstrated by all personnel and not only those in formal leadership/management roles
  • Cross-Functional Competencies: capture knowledge and skills to be demonstrated by personnel in a significant number of jobs across career tracks and streams.
  • Core Behavioural Competencies: attitudes and behaviours expected of every individual working in the organisation.

The HECOS vocabulary used to extract educational requirements from job postings gathers subjects and persistent areas or branches of knowledge or learning that are studied in higher education. It is developed by HESA, the Higher Education Statistics Agency, who are the experts in UK higher education data, and the designated data body for England.

We further made use of the list of UNDP units and the UNDP organigram (Figure 1) to extract the employing party in the job postings. Lastly, the topics are extracted using UNBIS, which is a multilingual database of the controlled vocabulary used to describe UN documents and other materials in the UN Digital Library’s collection.

IV. Methodology

1. NER Model

Currently, several NLP libraries can be used to create custom NER models. Motivated by the results of Shelar et al. (2020), we decided to use the spaCy library, as it achieved a higher score than Apache OpenNLP and TensorFlow. spaCy is an open-source library for NLP applications in Python. It contains powerful tools to support text preprocessing, information extraction and NLP models. The spaCy library also provides a pre-trained NER model that can recognise entities such as person, organisation and date. For this project, spaCy was used to build an NER model that recognises custom entities (soft skills, duties and responsibilities, units, educational fields). The datasets for the NER model were created using spaCy’s Rule-Based Matching module to accommodate the project timeline, as manual labelling for the individual features would have been time-consuming. Further information on NER models using spaCy can be found here.

A summary of the NER model results can be found below. The column “number of initial fields” describes the unique tags collected from external sources or from our sample data analysis. As rule-based labelling was used, not all of the tags provided were found in the dataset. As a result, for most features, the number of tags found by the final NER model, shown in the column “NER final fields”, is smaller. Considering the complexity of some of the terms being extracted and the absence of manual labelling, the outcome of the NER was satisfactory. The model was able to capture the tags it was trained with and showed the capability to generalise to similar tags.

Figure 10: NER Model Results

2. SKOS Vocabulary Creation

We use SKOS to organise the entities extracted in the NER step into a linked vocabulary with a semantic relationship hierarchy. Moreover, we link the concepts created in this way to pre-existing concepts from UNBIS where applicable. This ensures a broad coverage of concepts and allows for queries spanning both vocabularies.

We construct our vocabulary using the open-source tool vocabseditor. We used the tool’s API as well as the Python library rdflib to query UNBIS and curate exact_match relations. We also use SKOS Play! to visualise the vocabulary and to test for conflicts, specifically cyclic hierarchical relationships.
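The kind of conflict meant here can be illustrated with a short sketch that searches the broader relation for a cycle. It is a generic depth-first search written for this article, not the check that SKOS Play! performs internally, and the function name `find_broader_cycle` and the toy hierarchies are assumptions.

```python
def find_broader_cycle(broader):
    """Return one cycle in the `broader` relation as a list of concepts, or None.

    `broader` maps each concept to the list of its broader (parent) concepts.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    state = {}

    def visit(node, path):
        state[node] = GREY
        path.append(node)
        for parent in broader.get(node, []):
            if state.get(parent, WHITE) == GREY:
                # The parent is already on the current path: a cycle is closed.
                return path[path.index(parent):] + [parent]
            if state.get(parent, WHITE) == WHITE:
                cycle = visit(parent, path)
                if cycle:
                    return cycle
        path.pop()
        state[node] = BLACK
        return None

    for concept in list(broader):
        if state.get(concept, WHITE) == WHITE:
            cycle = visit(concept, [])
            if cycle:
                return cycle
    return None

# Hypothetical hierarchies: the first is a valid chain, the second loops back.
valid = {"solar energy": ["renewable energy"], "renewable energy": ["energy"]}
cyclic = {"a": ["b"], "b": ["c"], "c": ["a"]}
```

A cycle such as a → b → c → a would mean that a concept is, transitively, broader than itself, which breaks any roll-up along the hierarchy.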

Figure 11: High-Level Hierarchy of the SKOS Vocabulary

3. Technical Competency Co-Word Occurrence Network

We applied a simplified version of the word co-occurrence algorithm introduced by Sibarani et al. [1] to visualise the top co-occurring thematic and non-thematic technical competencies in the skills and experiences field of job postings.

We start by creating a symmetric matrix of co-occurrence over all the job postings in the dataset for the technical competencies. We then transform the matrix into a correlation matrix by calculating the equivalence index for each pair, defined as:

Eᵢⱼ = (Cᵢⱼ)²/(Cᵢ * Cⱼ)

where Cᵢ and Cⱼ denote the number of occurrences of the concepts i and j respectively in the dataset, while Cᵢⱼ denotes the number of co-occurrences of the two concepts. If a competency is mentioned several times within a single job posting, this counts only once towards its overall dataset occurrence.
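A minimal sketch of this computation is given below. It assumes that each posting has already been reduced to a list of competency labels; the function name `equivalence_index` and the toy postings are ours.

```python
from collections import Counter
from itertools import combinations

def equivalence_index(postings):
    """Compute E_ij = C_ij^2 / (C_i * C_j) for every co-occurring pair.

    `postings` is a list of competency lists, one per job posting.
    """
    occurrences = Counter()
    co_occurrences = Counter()
    for competencies in postings:
        # Repeated mentions within one posting count only once.
        unique = sorted(set(competencies))
        occurrences.update(unique)
        co_occurrences.update(combinations(unique, 2))
    return {
        pair: c_ij ** 2 / (occurrences[pair[0]] * occurrences[pair[1]])
        for pair, c_ij in co_occurrences.items()
    }

# Toy data: "gender" and "poverty" each occur in 3 postings and co-occur in 2.
postings = [
    ["gender", "poverty", "gender"],
    ["gender", "poverty"],
    ["gender", "climate"],
    ["poverty"],
]
e = equivalence_index(postings)
```

For the toy data, the pair (gender, poverty) gets E = 2² / (3 · 3) = 4/9, while (climate, gender) gets 1² / (1 · 3) = 1/3. Pairs that never co-occur are simply absent from the result.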

We then follow the algorithm introduced by the authors consisting of two passes:

  • Pass-1: The link with the highest e-coefficient is added to the network, then all links resulting from a breadth-first search with an e-coefficient exceeding the chosen threshold
  • Pass-2: Each Pass-1 sub-network is extended by adding links with an e-coefficient exceeding the threshold for which the source and the target node belong to two different sub-networks

The threshold of 10 co-occurrences used by the authors did not prove adequate in our case, as the technical competencies network is strongly connected. Instead, we chose to use the average of the equivalence index over all the concept pairs.
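The two passes and the mean threshold can be sketched as follows. This is a simplified reading of the procedure and not the code used in the project. Two points are assumptions of the sketch: the mean is taken over the pairs that co-occur at least once, and sub-networks are capped at `max_nodes` concepts, since without a cap the breadth-first search would absorb every reachable link in Pass-1 and leave nothing for Pass-2.

```python
from collections import deque
from statistics import mean

def two_pass_networks(e_index, max_nodes=4):
    """Group concepts into sub-networks (Pass-1) and find cross links (Pass-2).

    `e_index` maps (concept_a, concept_b) -> equivalence index.
    `max_nodes` caps the sub-network size; it is an assumption of this sketch.
    """
    threshold = mean(e_index.values())
    # Candidate links above the threshold, strongest first.
    links = sorted(
        ((e, a, b) for (a, b), e in e_index.items() if e > threshold),
        reverse=True,
    )
    network_of = {}  # concept -> sub-network id
    internal = []    # one list of (a, b, e) links per sub-network

    # Pass-1: seed each sub-network with the strongest unassigned link,
    # then grow it breadth-first along above-threshold links.
    for e, a, b in links:
        if a in network_of or b in network_of:
            continue
        net_id = len(internal)
        network_of[a] = network_of[b] = net_id
        internal.append([(a, b, e)])
        size = 2
        queue = deque([a, b])
        while queue and size < max_nodes:
            node = queue.popleft()
            for e2, x, y in links:
                if node not in (x, y):
                    continue
                other = y if x == node else x
                if other in network_of or size >= max_nodes:
                    continue
                network_of[other] = net_id
                internal[net_id].append((node, other, e2))
                queue.append(other)
                size += 1

    # Pass-2: above-threshold links whose endpoints sit in different sub-networks.
    external = [
        (a, b, e) for e, a, b in links
        if a in network_of and b in network_of and network_of[a] != network_of[b]
    ]
    return threshold, internal, external

# Hypothetical equivalence indices for five concepts.
e_index = {
    ("a", "b"): 0.9, ("b", "c"): 0.8, ("c", "d"): 0.7,
    ("d", "e"): 0.6, ("a", "e"): 0.1, ("b", "d"): 0.05,
}
threshold, internal, external = two_pass_networks(e_index, max_nodes=3)
```

With the cap set to three concepts, the toy data yields one sub-network around a, b and c, a second one around d and e, and a single Pass-2 link between c and d.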

We used the Cytoscape tool to build the network from the incidence matrix we created as a .csv. The resulting networks can be found here: thematic technical competencies network, non-thematic technical competencies network.

4. Similarity with Strategic Documents

We followed two approaches to assess the similarity between job postings and UNDP strategy documents.

Approach 1:

We calculate the cosine similarity between each job posting and the strategy documents. For each job posting and each document, a 1000-dimensional vector is extracted based on the top 1000 n-grams emerging from the job postings by means of tf-idf. By averaging the scores by year, we are able to visualise the evolution of the alignment of recruiting with the UNDP’s strategic goals. The results of this approach can be viewed in the tableau dashboard.
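The sketch below shows one way to carry out this comparison with the standard library only. It is not the project's implementation: the tokeniser, the use of unigrams and bigrams, the smoothed idf formula and the choice to rank n-grams by their total tf-idf weight over the postings are all assumptions, as are the function names and the example texts.

```python
import math
import re
from collections import Counter

def ngrams(text, n_max=2):
    """Lower-cased word n-grams of length 1..n_max."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]

def fit_vocabulary(postings, max_features=1000):
    """Keep the n-grams with the highest total tf-idf weight over the postings."""
    docs = [Counter(ngrams(p)) for p in postings]
    df = Counter(term for doc in docs for term in doc)
    # Smoothed idf (an assumption of this sketch).
    idf = {t: math.log((1 + len(docs)) / (1 + df[t])) + 1 for t in df}
    weight = Counter()
    for doc in docs:
        for term, tf in doc.items():
            weight[term] += tf * idf[term]
    vocab = [t for t, _ in weight.most_common(max_features)]
    return vocab, idf

def vectorise(text, vocab, idf):
    """Project any text (posting or strategy document) onto the fitted vocabulary."""
    counts = Counter(ngrams(text))
    return [counts[t] * idf[t] for t in vocab]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical postings and strategy text.
postings = [
    "Project management for climate adaptation projects",
    "Procurement finance assistant",
    "Climate policy advisor for renewable energy",
]
strategy = "The strategy prioritises climate action and renewable energy"
vocab, idf = fit_vocabulary(postings)
strategy_vec = vectorise(strategy, vocab, idf)
scores = [cosine(vectorise(p, vocab, idf), strategy_vec) for p in postings]
```

Because the vocabulary is fitted on the postings alone, terms that appear only in the strategy document do not contribute to the score. Averaging `scores` by posting year would then give the yearly trend.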

Approach 2:

In the second approach, we leverage the concepts extracted and curated in SKOS, specifically the thematic competencies, to calculate the similarity with the strategy documents based on word occurrence. We count the occurrences of thematic technical competencies in each document. Only competencies actually appearing in the document are kept and used as the feature vector for the job postings. The occurrences are then normalised by the total number of occurrences. We visualise the competencies as dimensions of a plotly polar plot and compare their importance in the document to their representation in the job postings by region. The plots can be found here.
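The counting and normalisation steps can be sketched as follows. The function name `competency_profile`, the competency list and the two texts are hypothetical, and normalising the postings over the document's dimensions only is our reading of the description rather than a confirmed detail. The polar plot itself is not reproduced here.

```python
import re
from collections import Counter

def competency_profile(text, competencies):
    """Relative frequency of each competency phrase in `text` (shares sum to 1)."""
    lowered = text.lower()
    counts = Counter()
    for c in competencies:
        n = len(re.findall(r"\b" + re.escape(c.lower()) + r"\b", lowered))
        if n:
            counts[c] = n
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()} if total else {}

# Hypothetical competency list and texts.
competencies = ["green economy", "innovation", "agriculture", "blue economy"]
strategy = "Innovation and the green economy drive growth; innovation needs partners."
regional_postings = (
    "Experience in agriculture and green economy projects. "
    "Agriculture background preferred."
)

doc_profile = competency_profile(strategy, competencies)
# Only competencies that appear in the strategy document become dimensions.
dimensions = list(doc_profile)
posting_profile = competency_profile(regional_postings, dimensions)
```

In this toy case, innovation carries two thirds of the weight in the strategy text but does not appear in the regional postings at all, which is the kind of gap the polar plots are meant to expose.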

V. Results & Insights

1. NER Extracted Entities:

Soft Skills

Figure 12: Word Cloud of Soft Skills

The definition of soft skills used in this work relates to the core behavioural competencies described in UNDP internal documentation. The main purpose is to capture the expected attitudes and behaviours of future UNDP employees. The outcome of this extraction can not only help UNDP better understand the required skills by region or role, but is also a first step towards predicting the demand for future skills.

Main findings:

  • The top 10 soft skills do not change over the years. The capacities to manage and to communicate are the top skills throughout.
  • Soft skills occurred most often in 2014 and 2015, each of which accounted for 9% of the total. The other years accounted for broadly similar shares, varying between 6% (2009) and 8.7% (2020).
  • The main difference between 2010 and 2020 is that motivation and confidence are now within the top 7 most required skills. In 2010, only 25 job postings mentioned confidence as a desired skill; by 2020, this number had increased almost 15-fold.

Figure 13: Prevalence of Soft Skills

Topics

Extracting the main topics of a job posting can help to understand the theme or concept behind the role. For this task, the background field provided rich information for most of the job postings. Within the UNDP context, this can help contextualise the areas UNDP is most involved in and visualise topic trends over the years. For future projects, this information can be used to better understand the personnel profile (skills, education, role level) that is commonly required when handling specific topics. The UNBIS vocabulary was used as a reference to build the vocabulary for a tf-idf matrix. The three words from the background field with the highest scores were defined as the topics of the job posting. To reduce the number of topics, entries with fewer than 50 occurrences were replaced by the concept one level up in the hierarchy. Figure XX below displays how the share of the 10 most common topics is distributed across the years.
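The roll-up of rare topics can be illustrated with the short sketch below. The function name `roll_up_rare_topics`, the toy hierarchy and the counts are assumptions; in the project the parent concepts come from the UNBIS hierarchy.

```python
from collections import Counter

def roll_up_rare_topics(topics, broader, min_count=50):
    """Replace topics seen fewer than `min_count` times by their broader concept.

    `topics` holds one label per (posting, topic) assignment and `broader`
    maps a topic to its parent in the vocabulary hierarchy. Topics without
    a known parent are left unchanged.
    """
    counts = Counter(topics)
    return [
        broader.get(t, t) if counts[t] < min_count else t
        for t in topics
    ]

# Hypothetical hierarchy and topic assignments.
broader = {"solar energy": "energy", "wind energy": "energy"}
topics = ["women"] * 60 + ["solar energy"] * 30 + ["wind energy"] * 25
rolled = Counter(roll_up_rare_topics(topics, broader))
```

Here the two rare energy topics are merged into their parent, so `rolled` contains 60 assignments for women and 55 for energy. The sketch applies a single roll-up step; a parent that is still rare afterwards would need a further pass.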

Main findings:

  • Of the job postings related to Women as a topic, only 1% date from 2007 while 14% date from 2016. Regarding Climate, most job postings date from the period 2017 to 2019.

Figure 14: Distribution of Topics by Year

Due to the large share of UN Women job postings, topics related to Women, Equality and Gender appear as the most recurrent themes.

Figure 15: Prevalence of Topics

The graph below shows the topics and their share among the 2020 job postings that list Mathematics as an educational field. Covid-19, Technology and Research appear as expected. However, it is interesting to see that there is also demand for specific countries and Rural Development:

Figure 16: Topic Distribution for 2020

Units

The list of UNDP units was retrieved from the UNDP organigram, as displayed in Figure XX. Each unit’s name and its possible abbreviations and variations were added to the Rule Entity Tagger. The final model was not able to recognise the unit for a large number of samples (70%). A review of the non-classified samples showed that the unit was not easily recognisable in the majority of them, even for a human reader.

  • Figure XX helps to understand the share of UN Women job postings in the dataset and how their focus may have a strong influence on the analysis.

Figure 17: Prevalence of Units

Education

  • Overall, Economics, Social Sciences and Public Administration are the most common Educational Fields, followed by Law and Design.
  • For 2020, Economics and Social Sciences remain the top 2 fields. However, Communications is third, surpassing Law. International Development comes in fifth.
  • It is also important to highlight the sharp upward trend that Communications displays in the line graph below (light blue). In 2017 it surpassed Public Administration, and it has remained among the top three Educational Fields since then.

Figure 18: Distribution of Education Fields

Responsibilities

Figure 19: Word Cloud of Responsibilities

  • Public Administration is the most common task found in our analysis. In line with what we showed in the Education field analysis, Public Administration as a duty has been in decline since 2016, which may explain why Communications has become a more relevant Educational Field.
  • More specialised tasks, such as Technical Evaluation, Financial Evaluation and Cumulative Analysis, display a clear upward trend.
  • Capacity Building is growing consistently year by year, while Advisory Services seem to be in decline.

Figure 20: Trend Analysis of Responsibilities

Technical Competencies (thematic Area)

  • Gender Mainstreaming has been an important competency since 2010, and it is currently maintaining an upward trend
  • On the other hand, HIV AIDS peaked in 2009 and 2012 as the second most relevant competency. Even though it has been on a downward trend, it has remained the fifth most required competency.

Figure 21: Distribution of Thematic Technical Competencies

Technical Competencies (non-thematic Areas)

  • The bar graph shows the top ten competencies overall. Soft skills are among the most required competencies. The most required skills associated with technical competencies are Project Management, Procurement and Information & Technology (IT).
  • For 2020, diversity & inclusion was the most popular competency. Together with Writing skills, these are the only competencies to show a prominent upward trend.
  • Even though Learning & Development is still within the top three competencies for 2020, it appears to have been losing share since 2015. Several of the top ten competencies shown in the line graph below behave similarly to Learning & Development: they peak around 2015 and decline in the following years. It will be interesting to see which competencies replace them in the coming years.

Figure 22: Distribution of Non-Thematic Technical Competencies

Job Title

The Job Title is a highly relevant field. In some instances, it summarises the role name, project topic, and other additional details (nationals only, part-time, etc.). For this reason, it was a challenge to define a proper way to extract and organise this information, as there was no discernable pattern. From the Job Title, two additional fields were extracted. The first was Supportive Information, containing keywords such as lead, director, expert, locum. The second field describes the scope of the role, i.e. if the role was open only for national applicants. As an initial attempt to extract the Job Title, the NER model was trained with samples containing a simple job description, e.g. driver, IT analyst or nurse.

Main Findings:

  • From the initial list of almost 65,000 unique Job Titles, data cleansing and NER reduced the number of unique Job Titles to 4,000. The NER model was not able to capture a Job Title for all samples, as some instances do not clearly describe the role name, e.g. consultant on enabling solid state lighting market transformation & promotion of light emitting diode (led) lighting.
  • This initial effort allowed the analysis of Soft Skills and other features by Job Title. In Figure 23, an example query is shown for the Job Title “Engineer”. Note how the Duties and Responsibilities and Education fields have changed from their original top values.

Figure 23: Example Query by Job Title

2. Co-Word Analysis

When looking at the networks formed by the co-occurrence analysis of technical competencies from the skills and experiences field, we can see that the core competencies are public administration and project management for the thematic and non-thematic networks respectively. This result agrees with the NER extraction of responsibilities.

Both networks are concentrated around the core competency and present a relatively low number of clusters. However, some unusual associations within some clusters may prove insightful. For example, the non-thematic cluster around risk management includes artificial intelligence and financial innovation, hinting at a heightened interest in applying new technologies in this domain. Moreover, we can observe the formation of clusters around a relatively new set of competencies such as graphic design and social media analytics.

Figure 24: Non-Thematic Technical Competencies Network (left, blue) and Thematic Technical Competencies Network (right, magenta)

This finding is also mirrored in the thematic competency network as one cluster associates technology and automation to access to treatment and renewable energy while another establishes a link between entrepreneurship and digital economy. Another important finding is the emergence of competencies such as green commodities, value chain development, and green economy under agriculture, pointing to a focus on ecologically-aware transformation.

3. Similarity Analysis

The results of our first approach to similarity assessment with strategy documents, shown in Figure 25, point to an increase in the alignment of job postings with the UNDP strategic plan 2022–2025. A similar yet less pronounced trend is observed for the private sector strategy 2018–2022 and the digital strategy of 2019.

Figure 25: Trend Analysis of Cosine Similarity with Strategy Documents

In the second approach, we look more closely at the dimensions mentioned in the strategy documents and analyse to what extent these are mirrored in the job postings for the different regions of the world where UNDP operates.

a. Private Sector Strategy

Figure 26 depicts the overlapping of the private sector strategy 2018–2022 with the job postings from the different regions based on thematic competencies as dimensions. A more detailed view is available here.

Figure 26: Alignment with Private Sector Strategy 2018–2022 for different regions

Overall, we observe a considerable overlap along dimensions such as agriculture and land, which belong to the broad knowledge domain of Nature, Climate and Energy. However, other concepts are still lacking in importance. This is the case for innovation and green economy, and for renewable energy with the exception of the region of Northern Europe. Other insightful findings pertain to the relatively low importance of the concepts of impact investment and public private partnerships for the different regions.

b. DCED Briefs

A trend similar to that for the private sector strategy is observable for the DCED briefs. However, in the case of the DCED briefs we can observe gaps for additional concepts such as value chain development, entrepreneurship and youth employment, which have low coverage in the job postings of the different regions. A more detailed view of the results can be found here.

Figure 27: Alignment with DCED Briefs for different regions

c. Digital Strategy 2019

We observe the alignment of the job postings with the digital strategy from 2019 in Figure 28. A more detailed view can be found here.

Notably, a limited alignment is observed for core concepts such as digital transformation, automation, and innovation, which confirms the results for the other documents. However, a positive result concerns the concept of security, which has consistently high coverage across the documents, surpassing the importance of this concept in the digital strategy document.

Figure 28: Alignment with Digital Strategy for different regions

d. Strategic Plan 2022–2025

The strategic plan 2022–2025 puts emphasis on a wide variety of thematic competencies, including relatively novel sustainability goals such as blue economy, urban resilience and gig economy, which have relatively low representation in the job posting dataset for all regions. This is also true for concepts such as youth political participation, multidimensional poverty and impact measurement. A more detailed view of the results for the different regions can be found here.

Figure 29: Alignment with Strategic Plan 2022–2025 for different regions

Discussion & Future Work

Regarding the application of NER methodology for information extraction, we can identify the following points as possible improvements:

  • Manual labelling could provide a better training set for the model on more complex fields. For example, the duties and responsibilities, thematic area and Unit fields would benefit from labelling by domain experts to improve the quality of the model output.
  • If GPU resources are available, it would be interesting to see whether a transformer architecture would significantly improve performance, not only in terms of metric scores but also in its capacity to generalise to new tags.
  • Find a better way to split the data into training and test sets, as we suspect that the similarity between job postings may have contributed to the high test-set F-scores.
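The train/test concern in the last point can be illustrated with a minimal sketch. It is not the method used in the project: it assumes that near-duplicate postings can be detected with Jaccard similarity over word tokens, and the function names, the threshold and the split fraction are all hypothetical. The idea is to group similar postings first and then assign whole groups to either the training or the test set, so that a posting in the test set has no near-copy in the training set.

```python
import random
import re


def token_set(text):
    """Return the set of lowercased word tokens in a posting."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_near_duplicates(postings, threshold=0.8):
    """Give postings whose similarity is >= threshold the same group id.

    Uses a small union-find structure. The pairwise comparison is
    quadratic, which is fine for a sketch but not for a large dataset.
    """
    parent = list(range(len(postings)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    tokens = [token_set(p) for p in postings]
    for i in range(len(postings)):
        for j in range(i + 1, len(postings)):
            if jaccard(tokens[i], tokens[j]) >= threshold:
                parent[find(j)] = find(i)
    return [find(i) for i in range(len(postings))]


def grouped_split(postings, test_fraction=0.2, threshold=0.8, seed=0):
    """Split posting indices so that no group spans train and test."""
    groups = group_near_duplicates(postings, threshold)
    group_ids = sorted(set(groups))
    random.Random(seed).shuffle(group_ids)

    test_groups, n_test = set(), 0
    target = test_fraction * len(postings)
    for g in group_ids:
        if n_test >= target:
            break
        test_groups.add(g)
        n_test += groups.count(g)

    train = [i for i, g in enumerate(groups) if g not in test_groups]
    test = [i for i, g in enumerate(groups) if g in test_groups]
    return train, test
```

Because whole groups are moved together, the test set can end up slightly larger than the requested fraction. F-scores measured on such a split would give a more conservative estimate of how well the NER model generalises to genuinely new postings.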

As for the similarity with strategy documents, we make note of the following weaknesses:

  • Both approaches used are based on keyword frequency. The first approach uses tf-idf and assesses similarity based on the top 1000 n-grams. However, the extracted n-grams are not guaranteed to be relevant for the similarity assessment, and the use of cosine similarity hinders explainability. The second approach, based on the frequency of occurrence of thematic competencies, improves explainability and relevance, but the occurrence frequency may be skewed by ambiguity of context. For example, a concept such as migration may mean the movement of human beings in one context and the movement of data between two IT infrastructures in another.
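To make these weaknesses concrete, here is a minimal, standard-library sketch of a tf-idf and cosine similarity comparison. It is an illustration under stated assumptions rather than the project's implementation: the tokenisation, the smoothed idf formula, the use of unigrams and bigrams, and the choice to rank the top n-grams by their summed tf-idf weight are all assumptions, and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def ngrams(text, n_max=2):
    """Word n-grams of length 1..n_max from lowercased text."""
    words = re.findall(r"[a-z]+", text.lower())
    return [
        " ".join(words[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(words) - n + 1)
    ]


def tfidf_vectors(docs, top_k=1000, n_max=2):
    """Return (vocabulary, one tf-idf vector per document).

    Assumption: the vocabulary is cut to the top_k n-grams ranked by
    their tf-idf weight summed over all documents.
    """
    counts = [Counter(ngrams(d, n_max)) for d in docs]
    df = Counter(term for c in counts for term in c)  # document frequency
    n_docs = len(docs)
    # Smoothed idf, so that terms present in every document keep weight 1.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in df}

    total = Counter()
    for c in counts:
        for term, tf in c.items():
            total[term] += tf * idf[term]
    vocab = [t for t, _ in total.most_common(top_k)]
    return vocab, [[c[t] * idf[t] for t in vocab] for c in counts]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy texts (invented for illustration, not taken from the dataset).
docs = [
    "migration and displacement of people across borders",      # strategy text
    "data migration between servers and cloud infrastructure",  # IT posting
    "procurement of office supplies",                           # unrelated posting
]
vocab, vectors = tfidf_vectors(docs)
sim_it = cosine(vectors[0], vectors[1])
sim_other = cosine(vectors[0], vectors[2])
```

In this toy example the IT posting scores as more similar to the strategy text than the unrelated posting does, purely because both contain the word migration and a stop word. The single similarity number gives no hint of which n-grams drove the score, which is the explainability issue noted above, and the shared keyword carries a different meaning in each text, which is the ambiguity issue.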

While the different similarity analyses may be a good first assessment of the alignment between strategy and recruitment, our project experience taught us that deeper, more domain-driven metrics are needed to synthesize more targeted insights from the available data. In particular, we think that the strategic goals have to be translated into models measuring their level of success. These models can include the job posting characteristics as one variable among others. Additionally, it should be taken into account whether the advertised position has indeed been filled and over which period of time. This would also allow a more targeted alignment with the strategy enforced during the corresponding time period.

Resources

UNBIS:

The UNBIS vocabulary is a multilingual database of the controlled vocabulary used to describe UN documents and other materials in the UN Digital Library’s collection.

HECOS:

The Higher Education Classification of Subjects (HECOS) vocabulary gathers subjects and persistent areas or branches of knowledge or learning that are studied in higher education. It is developed by HESA, the Higher Education Statistics Agency, who are the experts in UK higher education data, and the designated data body for England.

UNDP Frameworks:

These consist of internal UNDP frameworks defining detailed taxonomies of the characteristics required of personnel working in individual jobs across functional areas and across all career streams of the UNDP. These include:

  • The Technical Competencies Framework: taxonomies of the technical knowledge, skills and experiences (thematic, non-thematic)
  • People Management Competencies: leadership competencies to be demonstrated by all personnel and not only those in formal leadership/management roles
  • Cross-Functional Competencies: capture knowledge and skills to be demonstrated by personnel in a significant number of jobs across career tracks and streams.
  • Core Behavioural Competencies: attitudes and behaviours expected of every individual working in the organisation.

References

Tableau Dashboard

Thematic Competency Network

Non-Thematic Competency Network

Similarity Analysis Plots

SKOS Vocabulary

GitHub Repository
