Handbook of Graphs and Networks in People Analytics
Keith McNulty · 2022
In a sentence
A practical handbook teaching the theory and applied methods of graph and network analysis for studying people, groups and organizations, with worked examples in R and Python.
Most of us live inside huge graphs—social networks, family trees, communication systems—yet few people know how to analyze the network structures that shape connection, influence and information flow in organizations. This handbook by Keith McNulty demystifies graph and network analysis for students and practitioners in the social, organizational and people-analytics fields, explaining just enough theory to support analytical curiosity while teaching the concrete, reproducible methods needed to create, visualize and analyze graphs using freely available open-source tools. From building graphs out of messy rectangular data, to computing paths, distance, centrality, communities and cliques, to persisting data in graph databases, the book grounds every concept in real example datasets and runnable code. Readers finish able to apply network thinking to organizational problems such as onboarding new hires, encouraging diverse collaboration, finding influential employees, detecting communities, and identifying superconnectors—without expensive proprietary software.
The four lenses
- Science
- Statistics
- Systems
- Strategy
The model
An inferred framework expressing how the design of data/graph modeling and connection definitions (levers) produce measurable network-structural states (centrality, distance, community structure, assortativity, similarity) that in turn shape psychological and behavioral states (connection, information flow, familiarity, influence) and organizational outcomes (collaboration, integration, productivity, retention, resilience).
Connection/Edge Definition · design lever
The analyst's choice of which entities to connect and how to define the relationship (edge) between them, including direction, weight and entity types, derived from the underlying data and the analytic question.
Graph Visualization Design · design lever
Decisions about how a graph is laid out and styled (vertex size, color, edge thickness, layout algorithm) to effectively communicate intended inferences about network structure and dynamics to an audience.
Degree Centrality · psychological state
The number of edges connected to a vertex, representing immediate connection or reach of an individual within a network; a fundamental measure of a vertex's local importance and immediate social connectivity.
Closeness Centrality · psychological state
The inverse of the sum of distances from a vertex to all other vertices, measuring how efficiently a vertex can reach the entire network and thus its potential to spread information quickly.
Betweenness Centrality · psychological state
The extent to which a vertex lies on the shortest paths between other pairs of vertices, indicating its role as a 'superconnector' or bridge whose removal would most disrupt overall network connectivity.
Eigenvector Centrality (Influence) · psychological state
A measure of how connected a vertex is to other influential vertices, capturing relative prestige or influence; a vertex scores highly by linking to a few highly influential vertices, to many vertices, or both.
Network Distance and Diameter · psychological state
The length of shortest paths between vertices and the maximum such distance in a graph, representing closeness, familiarity and likelihood of information reaching across the network or between individuals.
Community/Clique Structure · behavioral pattern
The partitioning of a network into densely connected subgroups (communities, cliques) relative to between-group connection, capturing the natural pockets of intense interaction within a population.
Network Assortativity · behavioral pattern
The tendency of vertices with similar properties (e.g., department, degree) to connect to one another, measuring homophily and preferential attachment within the network and informing the tradeoff between resilience and diversity.
Vertex/Graph Similarity · behavioral pattern
The degree to which two vertices share immediate neighbors, or two graphs share edge sets, used to infer latent similarity between individuals or between alternative definitions of connection in a network.
Persistent Graph Data Infrastructure · contextual condition
The storage of relational data in a graph database (labelled-property graph or RDF) so connections can be queried and analyzed repeatedly and efficiently, enabling mature, scalable organizational network analysis.
Organizational and Social Outcomes · outcome metric
Downstream outcomes the book motivates network analysis to improve, including social integration of new hires, diverse collaboration, productivity, retention, well-being, network resilience and effective information flow.
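Power iteration makes the eigenvector centrality definition concrete. The sketch below is minimal and written in plain Python. It assumes an undirected, connected graph stored as a dict mapping each vertex to its neighbours. The function name and parameters are illustrative; they are not the book's code or any library's API.

```python
import math


def eigenvector_centrality(adj, iterations=100, tol=1e-10):
    """Eigenvector centrality of an undirected, connected graph by power iteration.

    adj maps each vertex to an iterable of its neighbours. Scores are
    rescaled so the most central vertex gets 1.0.
    """
    x = {v: 1.0 for v in adj}
    for _ in range(iterations):
        # Iterate with (A + I): same eigenvectors as A, but avoids the
        # oscillation plain power iteration shows on bipartite graphs.
        new = {v: x[v] + sum(x[w] for w in adj[v]) for v in adj}
        norm = math.sqrt(sum(s * s for s in new.values()))
        new = {v: s / norm for v, s in new.items()}
        converged = max(abs(new[v] - x[v]) for v in adj) < tol
        x = new
        if converged:
            break
    top = max(x.values())
    return {v: s / top for v, s in x.items()}


# A star: one hub linked to four otherwise unconnected people
star = {"hub": ["a", "b", "c", "d"], "a": ["hub"], "b": ["hub"], "c": ["hub"], "d": ["hub"]}
print(eigenvector_centrality(star))  # hub 1.0, each leaf about 0.5
```

Shifting the iteration to A + I is a standard trick. Plain power iteration on A would never settle on a star, because a star is bipartite.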
How they connect
- connection definition → influences degree centrality
- connection definition → influences distance diameter
- connection definition → influences community structure
- degree centrality → predicts organizational outcomes
- closeness centrality → predicts organizational outcomes
- betweenness centrality → predicts organizational outcomes
- eigenvector centrality → predicts organizational outcomes
- distance diameter − influences organizational outcomes
- community structure → influences organizational outcomes
- assortativity → influences organizational outcomes
- vertex graph similarity → correlates with community structure
- graph visualization design → moderates organizational outcomes
- graph data persistence → moderates organizational outcomes
- connection definition → influences assortativity
A candidate measure
Handbook of Graphs and Networks in People Analytics — derived measurement candidates
Connection/Edge Definition
documented transformation logic; number of relationship types modeled; edge weighting scheme
self-report suitability: low
Graph Visualization Design
fidelity of visual encoding to underlying metric; avoidance of 'hairball' clutter; use of fixed random seed
self-report suitability: low
Degree Centrality
degree count; in-degree; out-degree; 1st order ego size
self-report suitability: low
Closeness Centrality
inverse sum of shortest-path distances; normalized closeness
self-report suitability: none
Betweenness Centrality
betweenness score; normalized betweenness
self-report suitability: none
Eigenvector Centrality (Influence)
eigenvector centrality value; hub score; authority score
self-report suitability: low
Network Distance and Diameter
pairwise distance; average distance; diameter
self-report suitability: none
Community/Clique Structure
community assignment; modularity; number/size of communities; largest clique size
self-report suitability: low
Network Assortativity
nominal assortativity coefficient; degree assortativity coefficient
self-report suitability: none
Vertex/Graph Similarity
Jaccard coefficient; dice coefficient; inverse-log-weighted similarity; edge-set Jaccard
self-report suitability: none
Persistent Graph Data Infrastructure
graph DB maturity level; query language adoption; schema completeness
self-report suitability: low
Organizational and Social Outcomes
integration time; collaboration diversity index; retention rate; engagement score; message propagation rate
self-report suitability: medium
The story
The reader
A technical practitioner, analyst, student or researcher who wants to understand and analyze connections among people, groups and organizations.
External problem
Their data is stored in transactional/rectangular form and they lack the skills to model, visualize and analyze relationships as networks.
Internal problem
They feel that network analysis is an esoteric 'dark art' beyond their reach and worry they need expensive specialist software.
Philosophical problem
It's wrong that valuable insights about connection, influence and community remain hidden when free, rigorous, open-source methods exist to reveal them.
The plan
- Learn elementary graph theory and how to create graph objects from data.
- Visualize graphs effectively using layouts and styling.
- Restructure existing data into graph-friendly edge and vertex sets.
- Measure paths, distance, centrality, communities and cliques.
- Persist data in graph databases for repeatable analysis.
Success
- The reader can confidently build, visualize and analyze networks to solve organizational problems like onboarding, diverse collaboration and finding influencers.
- They can democratize network analysis in their organization and have more productive, data-fluent conversations.
- They no longer need expensive vendor software to derive rigorous network insights.
At stake
- They keep treating connection data as mere transactions and miss the hidden social structures shaping outcomes.
- They remain dependent on costly, inflexible software or fail to act on the network dynamics affecting performance, retention and well-being.
- Important relationships, influencers and at-risk connections go unidentified.
Chapter by chapter
ch01 Graphs Everywhere!
This chapter explores the pervasive role of graphs in various domains, framing them as essential mathematical models that help analyze complex systems like social networks, connections, and information flow.
- Graphs serve as powerful mathematical models that can dramatically enhance our understanding of complex systems, such as social networks.
- The Seven Bridges of Königsberg exemplifies how foundational problems in graph theory can have broad implications for modern analysis.
- Effective communication and decision-making are considerably improved when we recognize and apply graph principles.
- Engaging with the dynamics of relationships using graph theory helps clarify and navigate the complexities of human interactions.
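The Königsberg puzzle reduces to counting degrees. The sketch below treats the four land masses as vertices and the seven bridges as parallel edges of a multigraph. The labels A to D are hypothetical. Euler showed that a connected graph has a walk crossing every edge exactly once if and only if zero or two of its vertices have odd degree.

```python
from collections import Counter

# Hypothetical labels: A = central island, B = north bank,
# C = south bank, D = eastern land mass. Repeated pairs are parallel bridges.
bridges = [
    ("A", "B"), ("A", "B"),
    ("A", "C"), ("A", "C"),
    ("A", "D"), ("B", "D"), ("C", "D"),
]


def degrees(edges):
    """Count edge endpoints per vertex (parallel edges count separately)."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def has_euler_walk(edges):
    """Euler's condition, assuming the (multi)graph is connected."""
    odd = [v for v, d in degrees(edges).items() if d % 2 == 1]
    return len(odd) in (0, 2)


print(dict(degrees(bridges)))   # {'A': 5, 'B': 3, 'C': 3, 'D': 3}
print(has_euler_walk(bridges))  # False: all four land masses have odd degree
```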
ch02 Working with Graphs
This chapter provides a comprehensive overview of graph theory and practical methodologies for creating and manipulating graphs using R and Python.
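Creating a graph object usually starts from an edge list. The book uses R and Python graph packages for this. As a library-free illustration, the hypothetical helper below builds an adjacency-set representation in plain Python. It is not the igraph or networkx API, and it assumes a simple graph with no self-loops.

```python
def graph_from_edgelist(edges, directed=False):
    """Build an adjacency-set dict from (from, to) pairs.

    Assumes a simple graph: duplicate edges collapse into one and
    self-loops are not expected.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set())      # keep v as a vertex even with no out-edges
        if not directed:
            adj[v].add(u)             # undirected: record the edge both ways
    return adj


def edge_count(adj, directed=False):
    total = sum(len(nbrs) for nbrs in adj.values())
    return total if directed else total // 2  # undirected edges are stored twice


# Hypothetical "who works with whom" edge list
edges = [("Ana", "Ben"), ("Ana", "Chen"), ("Ben", "Chen"), ("Chen", "Dee")]
g = graph_from_edgelist(edges)
print(len(g), edge_count(g))  # 4 4  (vertices, edges)
print(sorted(g["Chen"]))      # ['Ana', 'Ben', 'Dee']
```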
ch03 Visualizing Graphs
This chapter delves into the methodology and tools for visualizing graphs using both R and Python, emphasizing the importance of effective graph representation for data analysis.
ch04 Restructuring Data for Use in Graphs
This chapter outlines methods for transforming various data formats into structured graph representations, essential for data analysis and visualization.
- Graph analysis and visualization require transforming raw data into structured edge and vertex sets suited to graph representation.
- Simple graphs can reveal essential relationships in management hierarchies and customer connections.
- The ability to connect customers through sales representatives and common purchases provides deeper insights into strategic decisions.
- Scraping semi-structured documents into coherent data formats unlocks new analytical potential within reams of qualitative information.
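Both restructuring patterns above fit in a few lines of plain Python. The tables, column names and helper functions are hypothetical. The first helper turns an employee/manager table into directed reporting edges. The second projects customer/rep pairs onto customer-to-customer edges (a bipartite projection), weighting each pair by the number of shared representatives.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical rectangular HR data: one row per employee
employees = [
    {"id": "E1", "manager": None},
    {"id": "E2", "manager": "E1"},
    {"id": "E3", "manager": "E1"},
    {"id": "E4", "manager": "E2"},
]


def hierarchy_edges(rows):
    """Directed manager -> direct-report edges from a rectangular table."""
    return [(row["manager"], row["id"]) for row in rows if row["manager"] is not None]


# Hypothetical (customer, sales rep) pairs
sales = [("C1", "R1"), ("C2", "R1"), ("C3", "R2"), ("C1", "R2"), ("C2", "R2")]


def customers_via_reps(pairs):
    """Customer-customer edges weighted by the number of shared reps."""
    customers_of = defaultdict(set)
    for customer, rep in pairs:
        customers_of[rep].add(customer)
    weights = defaultdict(int)
    for customers in customers_of.values():
        for a, b in combinations(sorted(customers), 2):  # sorted: one key per pair
            weights[a, b] += 1
    return dict(weights)


print(hierarchy_edges(employees))  # [('E1', 'E2'), ('E1', 'E3'), ('E2', 'E4')]
print(customers_via_reps(sales))   # {('C1', 'C2'): 2, ('C1', 'C3'): 1, ('C2', 'C3'): 1}
```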
ch05p01 Paths and Distance (part 1/2)
This chapter explores the theory of graph traversal, particularly focusing on paths, distance, and algorithms for finding the shortest paths within graph structures.
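Breadth-first search is the standard way to find a shortest path in an unweighted graph. The minimal sketch below assumes the graph is a dict mapping each vertex to a list of neighbours, and the function name is illustrative. Weighted graphs would need a different algorithm, such as Dijkstra's.

```python
from collections import deque


def shortest_path(adj, source, target):
    """One shortest path from source to target via breadth-first search.

    Returns the vertices along the path, or None if target is unreachable.
    Unweighted edges only.
    """
    parent = {source: None}           # doubles as the visited set
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if v == target:
            path = []
            while v is not None:      # follow parent links back to source
                path.append(v)
                v = parent[v]
            return path[::-1]
        for w in adj.get(v, ()):
            if w not in parent:
                parent[w] = v
                queue.append(w)
    return None


adj = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"],
       "D": ["B", "C", "E"], "E": ["D"], "F": []}
print(shortest_path(adj, "A", "E"))  # ['A', 'B', 'D', 'E']
print(shortest_path(adj, "A", "F"))  # None
```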
ch05p02 Paths and Distance (part 2/2)
This chapter delves into the intricacies of graph visualization techniques applied to social networks, emphasizing modern approaches in R for representing complex connections and communities.
- Force-directed layouts suit social networks well because they place connected vertices close together, which makes clusters easier to see.
- The choice of layout algorithm can drastically alter the interpretation of community structures in a graph.
- Using layers in static visualizations can increase informative value and distinguish between various attributes of the data.
- Interactive visualizations open avenues for deeper exploration and understanding of network data.
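A simplified spring embedder in the style of Fruchterman-Reingold shows how a force-directed layout works. It is a sketch, not the algorithm used by any particular plotting package. Every pair of vertices repels, each edge pulls its endpoints together, and a cooling step size lets the layout settle. The fixed seed makes the result reproducible, which matters when comparing plots.

```python
import math
import random


def force_directed_layout(vertices, edges, iterations=200, seed=42):
    """Simplified Fruchterman-Reingold-style layout: returns {vertex: (x, y)}."""
    rng = random.Random(seed)                 # fixed seed -> reproducible layout
    pos = {v: [rng.random(), rng.random()] for v in vertices}
    k = math.sqrt(1.0 / len(vertices))        # ideal edge length
    temp = 0.1                                # max step per iteration
    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in vertices}
        # Every pair repels with force k^2 / d
        for i, u in enumerate(vertices):
            for v in vertices[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = max(math.hypot(dx, dy), 1e-9)
                f = k * k / d
                disp[u][0] += dx / d * f
                disp[u][1] += dy / d * f
                disp[v][0] -= dx / d * f
                disp[v][1] -= dy / d * f
        # Each edge attracts its endpoints with force d^2 / k
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = max(math.hypot(dx, dy), 1e-9)
            f = d * d / k
            disp[u][0] -= dx / d * f
            disp[u][1] -= dy / d * f
            disp[v][0] += dx / d * f
            disp[v][1] += dy / d * f
        # Move each vertex along its net force, capped by the temperature
        for v in vertices:
            dx, dy = disp[v]
            d = max(math.hypot(dx, dy), 1e-9)
            step = min(d, temp)
            pos[v][0] += dx / d * step
            pos[v][1] += dy / d * step
        temp *= 0.98                          # cool down
    return {v: (x, y) for v, (x, y) in pos.items()}


# Two tightly knit trios linked by one relationship (c-d)
vertices = list("abcdef")
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
layout = force_directed_layout(vertices, edges)
```

When these coordinates are plotted, the two trios should appear as separate clusters joined by the c-d edge.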
ch06 Paths and Distance
In this chapter, the concepts of graph traversal, paths, and distance are explored, emphasizing their importance in understanding network structures and relationships among vertices.
- Graph traversal is an essential skill for efficiently navigating networks and determining optimal connections.
- The distinction between simple paths, which never revisit a vertex, and paths in general is critical when analyzing complex data.
- Breadth-first search visits vertices in order of their distance from the start, which makes it the natural choice for finding shortest paths in unweighted graphs; depth-first search is better suited to other tasks.
- Distance is a vital measure in graph analysis, with practical implications across physical, social, and organizational networks.
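Distance, average distance and diameter all come from running breadth-first search from every vertex. The minimal sketch below assumes an unweighted, undirected graph stored as a dict of neighbour lists, with at least one pair of vertices joined by a path. Pairs with no connecting path are skipped, which is one common convention for disconnected graphs.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Distance in edges from source to every vertex reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def distance_summary(adj):
    """(average distance, diameter) over vertex pairs joined by some path."""
    all_dist = {v: bfs_distances(adj, v) for v in adj}
    pairs = [all_dist[u][v] for u, v in combinations(adj, 2) if v in all_dist[u]]
    return sum(pairs) / len(pairs), max(pairs)


# A four-person chain: A - B - C - D
chain = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
print(distance_summary(chain))  # average 10/6 (about 1.67), diameter 3
```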
ch07 Vertex Importance and Centrality
This chapter delves into the mathematical definitions and practical implications of vertex importance within networks, focusing specifically on various centrality measures that elucidate roles individuals play in organizational structures.
- Understanding vertex centrality is crucial for analyzing organizational networks and enhancing communication flows.
- Degree centrality offers a simple but effective measure of a vertex's immediate connections within a network.
- Closeness centrality reveals how quickly individuals can share information across the organization, aiding efficiency.
- Betweenness centrality indicates who plays a pivotal role in maintaining connectivity between dispersed network segments, serving as potential ‘superconnectors.’
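The three centrality measures above can all be computed with breadth-first search. The plain-Python sketch below assumes an unweighted, undirected, connected graph stored as a dict of neighbour lists. Degree is a neighbour count. Closeness is the inverse of the summed distances, without normalization. Betweenness uses Brandes' algorithm, also unnormalized. Library implementations may normalize differently.

```python
from collections import deque


def bfs(adj, s):
    """Shortest-path distances from s (unweighted)."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def degree_centrality(adj):
    return {v: len(adj[v]) for v in adj}


def closeness_centrality(adj):
    # Inverse of the sum of distances to all other vertices
    return {v: 1 / sum(bfs(adj, v).values()) for v in adj}


def betweenness_centrality(adj):
    """Brandes' algorithm, unweighted and undirected, unnormalized."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        order = []                     # vertices in non-decreasing distance from s
        preds = {v: [] for v in adj}   # predecessors on shortest paths from s
        sigma = dict.fromkeys(adj, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies, farthest vertices first
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted once from each end
    return {v: b / 2 for v, b in bc.items()}


star = {"hub": ["a", "b", "c", "d"], "a": ["hub"], "b": ["hub"], "c": ["hub"], "d": ["hub"]}
print(degree_centrality(star)["hub"])       # 4
print(closeness_centrality(star)["hub"])    # 0.25
print(betweenness_centrality(star)["hub"])  # 6.0
```

The hub of a star lies on the only shortest path between each of the six leaf pairs. That makes it the archetypal superconnector.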
ch08 Components, Communities and Cliques
This chapter explores the intricacies of group dynamics within networks, focusing on how to identify and analyze specific subgroups—comprising components, communities, and cliques—that influence overall network behavior.
- Identifying subgroups within networks is critical for drawing actionable insights about organizational dynamics.
- Connected components underpin measures such as diameter, since distances between vertices in different components are undefined.
- Vertex partitions created through cuts are essential for efficient network analysis, particularly in identifying community structures.
- The Louvain algorithm detects communities by greedily optimizing modularity, which compares the density of connections within communities with what would be expected at random.
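Modularity is the quantity Louvain tries to increase. The sketch below finds connected components with an iterative depth-first traversal and scores a given partition with Newman's modularity. It does not implement Louvain itself. It assumes an undirected, unweighted simple graph stored as a dict of neighbour lists, and the function names are illustrative.

```python
def connected_components(adj):
    """List of vertex sets, one per connected component (iterative DFS)."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        component, stack = {start}, [start]
        while stack:
            v = stack.pop()
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    component.add(w)
                    stack.append(w)
        components.append(component)
    return components


def modularity(adj, community):
    """Newman modularity Q = sum over communities c of L_c / m - (d_c / 2m)^2.

    L_c: edges inside c; d_c: total degree of c; m: total number of edges.
    """
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    inside, degree = {}, {}
    for v, nbrs in adj.items():
        c = community[v]
        degree[c] = degree.get(c, 0) + len(nbrs)
        # Each internal edge is seen from both ends, so count half each time
        inside[c] = inside.get(c, 0) + sum(0.5 for w in nbrs if community[w] == c)
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two triangles joined by a single bridge edge c-d
g = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
     "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"]}
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
print(len(connected_components(g)))    # 1
print(round(modularity(g, split), 3))  # 0.357
```

Putting every vertex in one community gives Q = 0, so the positive score for the two-triangle split reflects real community structure.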
ch09 Assortativity and Similarity
This chapter explores the concepts of assortativity and vertex similarity in networks, highlighting their implications for understanding social structures and relationships within organizational contexts.
- High assortativity in networks can indicate robust but potentially insular communities, underscoring the need for diverse connections.
- The assortativity coefficient can vary significantly based on the dataset, illustrating the nuances of social connections.
- Understanding vertex similarity can serve as an effective alternative when detailed vertex information is unavailable.
- Methods for calculating graph similarity provide unique insights into how different interactions correspond within the same set of entities.
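Nominal assortativity and Jaccard similarity are both short computations. The plain-Python sketch below assumes undirected edges and a categorical attribute such as department. For assortativity it follows Newman's formula r = (Σ e_ii − Σ a_i²) / (1 − Σ a_i²), where e_ij is the fraction of edge ends joining category i to category j. The helper names and the example data are hypothetical.

```python
from collections import defaultdict


def nominal_assortativity(edges, attr):
    """Newman's assortativity coefficient for a categorical vertex attribute.

    Undefined (division by zero) if every vertex shares one category.
    """
    e = defaultdict(float)
    ends = 2 * len(edges)
    for u, v in edges:               # count each undirected edge in both directions
        e[attr[u], attr[v]] += 1 / ends
        e[attr[v], attr[u]] += 1 / ends
    cats = {attr[v] for edge in edges for v in edge}
    trace = sum(e[c, c] for c in cats)
    a = {c: sum(e[c, d] for d in cats) for c in cats}
    sum_a2 = sum(x * x for x in a.values())
    return (trace - sum_a2) / (1 - sum_a2)


def jaccard(adj, u, v):
    """Vertex similarity: shared neighbours over combined neighbours."""
    nu, nv = set(adj[u]), set(adj[v])
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0


def edge_set_jaccard(edges1, edges2):
    """Graph similarity: Jaccard coefficient of two undirected edge sets."""
    s1 = {frozenset(e) for e in edges1}
    s2 = {frozenset(e) for e in edges2}
    union = s1 | s2
    return len(s1 & s2) / len(union) if union else 1.0  # two empty graphs: identical


edges = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
dept = {"a": "Sales", "b": "Sales", "c": "Sales", "d": "IT", "e": "IT", "f": "IT"}
print(round(nominal_assortativity(edges, dept), 3))  # 0.714
```

A value near +1 signals departments that mostly talk among themselves. That is the insular pattern the first bullet above warns about.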
ch10 Graphs as Databases
This chapter delves into the necessity and methodology for organizations to persist data in a graph structure, enhancing the speed and efficiency of network analysis.
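Graph databases such as Neo4j store data as a labelled-property graph and answer pattern queries in languages such as Cypher. The toy class below only mimics that model in memory to illustrate the idea. It is a hypothetical sketch, not a database client or any vendor's API. Its match method answers a pattern similar to the Cypher query MATCH (p:Person)-[:WORKS_IN]->(d:Department {name: 'Sales'}) RETURN p, d.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: str
    label: str                       # e.g. "Person", "Department"
    props: dict = field(default_factory=dict)


@dataclass
class Rel:
    start: str
    rel_type: str                    # e.g. "WORKS_IN"
    end: str
    props: dict = field(default_factory=dict)


class ToyPropertyGraph:
    """In-memory stand-in for a labelled-property graph (not a real database)."""

    def __init__(self):
        self.nodes = {}
        self.rels = []

    def add_node(self, node):
        self.nodes[node.id] = node

    def add_rel(self, rel):
        self.rels.append(rel)

    def match(self, start_label, rel_type, end_label, **end_props):
        """(start, end) id pairs for (:start_label)-[:rel_type]->(:end_label {end_props})."""
        results = []
        for r in self.rels:
            s, e = self.nodes[r.start], self.nodes[r.end]
            if (s.label == start_label and r.rel_type == rel_type
                    and e.label == end_label
                    and all(e.props.get(k) == val for k, val in end_props.items())):
                results.append((s.id, e.id))
        return results


g = ToyPropertyGraph()
g.add_node(Node("p1", "Person", {"name": "Ana"}))
g.add_node(Node("p2", "Person", {"name": "Ben"}))
g.add_node(Node("d1", "Department", {"name": "Sales"}))
g.add_node(Node("d2", "Department", {"name": "IT"}))
g.add_rel(Rel("p1", "WORKS_IN", "d1"))
g.add_rel(Rel("p2", "WORKS_IN", "d2"))
print(g.match("Person", "WORKS_IN", "Department", name="Sales"))  # [('p1', 'd1')]
```

A real graph database adds indexes, persistence and a query planner on top of this model. Those features are what make repeated queries fast at organizational scale.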