<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://quantifiedcuriosities.com/feed.xml" rel="self" type="application/atom+xml" /><link href="https://quantifiedcuriosities.com/" rel="alternate" type="text/html" /><updated>2026-02-18T10:49:04+01:00</updated><id>https://quantifiedcuriosities.com/feed.xml</id><title type="html">Quantified Curiosities</title><subtitle>Finding clarity in the things no one asked to be clarified.</subtitle><author><name>Elie Maalouly</name></author><entry><title type="html">Quantifying Love</title><link href="https://quantifiedcuriosities.com/posts/arabic-love-word-analysis/" rel="alternate" type="text/html" title="Quantifying Love" /><published>2025-09-04T00:00:00+02:00</published><updated>2025-09-04T00:00:00+02:00</updated><id>https://quantifiedcuriosities.com/posts/arabic-love-word-analysis</id><content type="html" xml:base="https://quantifiedcuriosities.com/posts/arabic-love-word-analysis/"><![CDATA[<p>How do poets name love? A data-driven exploration of 14 Arabic words for love that traces their etymological roots, poetic functions, and shifting meanings across twelve historical eras. Using a 2-million-verse corpus and sentiment analysis, the analysis shows how word choice maps to emotional polarity and co‑occurrence communities, revealing that poets consistently exploit subtle lexical differences to shape tone, intensity, and literary effect. The data and code for this project are available on <a href="https://github.com/eliemaalouly/arabic-love-word-analysis" target="_blank" rel="noopener noreferrer">GitHub</a>.</p>

<p><br /></p>

<h1 id="introduction">Introduction</h1>

<p><a href="https://en.wikipedia.org/wiki/Al-Ma'arri" target="_blank" rel="noopener noreferrer">Abu al-‘Ala al-Ma‘arri</a>, the famed poet and linguist who had lost his sight, once entered a gathering of scholars. As he made his way through the crowd, he accidentally stepped on a man’s foot. Irritated and not realizing who stood before him, the man blurted out, “Who is this dog that couldn’t see me?” Offended, Abu al-‘Ala was quick with his reply, <a href="/assets/images/posts/arabic-love-word-analysis/dog.jpg" class="image-popup">“The dog is the one who does not know seventy names for a dog.”</a><sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup></p>

<p>One might think this little anecdote illustrating the richness of the Arabic language is an exaggeration, but it is not. Not even in the slightest! Arabic has a rich and extensive vocabulary. Scholars have compiled books listing the synonyms of various Arabic words: for example, there are 70 words to describe honey, 200 names for a snake, and <a href="/assets/images/posts/arabic-love-word-analysis/lion.jpg" class="image-popup">400 words for a lion!</a> Yes, 400!<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup> That may be a bit extreme and may seem redundant. After all, do we really need so many words for a lion?</p>

<p>To grasp this, we need to appreciate the role language played in Arab society. Before the rise of Islam, most Arabs were not literate, and poetry was the primary medium of communication<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>. It carried news, preserved history, and told stories. Through verse, events were immortalized. Poets held such influence that a single line could disgrace an entire tribe or elevate it to honor<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>. Out of this deep reverence for words, the Arabs became unrivaled masters of their language. Repetition in oral poetry was frowned upon, so a great poet had to show mastery by avoiding redundancy, constantly reaching for new words, metaphors, and imagery. This drove poets to preserve, invent, and refine synonyms.</p>

<p>Not only did that result in an insane number of synonyms, but each synonym carries a slightly different and nuanced meaning! Let’s stick with lion for a moment. Don’t worry, we will not go through the 400 words, but here are a few to demonstrate the different shades of meaning, each evoking different qualities and characteristics. For instance, the basic and common word for lion is <span dir="rtl" lang="ar">أَسَدْ</span><button type="button" class="pronounce-btn" data-audio="/assets/sounds/arabic-love-word-analysis/asad.mp3" aria-label="Play pronunciation of asad" title="Play pronunciation">🔊</button> (asad). But when the lion is praised for its courage, you call it <span dir="rtl" lang="ar">لَيْث</span><button type="button" class="pronounce-btn" data-audio="/assets/sounds/arabic-love-word-analysis/layth.mp3" aria-label="Play pronunciation of layth" title="Play pronunciation">🔊</button> (layth). If it charges so fiercely that even camels scatter, it is a <span dir="rtl" lang="ar">قَسْوَرَة</span><button type="button" class="pronounce-btn" data-audio="/assets/sounds/arabic-love-word-analysis/qaswara.mp3" aria-label="Play pronunciation of qaswara" title="Play pronunciation">🔊</button> (qaswara). A massive, crushing lion is called a <span dir="rtl" lang="ar">هِزَبْر</span><button type="button" class="pronounce-btn" data-audio="/assets/sounds/arabic-love-word-analysis/hizabr.mp3" aria-label="Play pronunciation of hizabr" title="Play pronunciation">🔊</button> (hizabr), while a biting, wide-jawed lion is a <span dir="rtl" lang="ar">ضَيْغَم</span><button type="button" class="pronounce-btn" data-audio="/assets/sounds/arabic-love-word-analysis/daygham.mp3" aria-label="Play pronunciation of daygham" title="Play pronunciation">🔊</button> (daygham).
If the lion is wild and ferocious, they call it <span dir="rtl" lang="ar">ضِرْغَامْ</span><button type="button" class="pronounce-btn" data-audio="/assets/sounds/arabic-love-word-analysis/dirgham.mp3" aria-label="Play pronunciation of dirgham" title="Play pronunciation">🔊</button> (dirgham). And when poets wanted a word for majesty and awe, they used <span dir="rtl" lang="ar">غَضَنْفَرْ</span><button type="button" class="pronounce-btn" data-audio="/assets/sounds/arabic-love-word-analysis/ghadanfar.mp3" aria-label="Play pronunciation of ghadanfar" title="Play pronunciation">🔊</button> (ghadanfar) or <span dir="rtl" lang="ar">حَيْدَرَة</span><button type="button" class="pronounce-btn" data-audio="/assets/sounds/arabic-love-word-analysis/haydara.mp3" aria-label="Play pronunciation of haydara" title="Play pronunciation">🔊</button> (haydara) (these names were so noble they became epithets for warriors).</p>

<p class="notice--warning">If you were ever thinking about learning Arabic in any serious manner, this might have scared you off completely! But don’t be discouraged. As a native Arabic speaker, I only know one word for lion! Because we only use one in daily colloquial conversations. I know one word for honey, one word for water, one for flimflam, etc. Because <strong>fortunately</strong>, <a href="/assets/images/posts/arabic-love-word-analysis/zuhayr-ibn-abi-sulma.gif" class="image-popup">none of us is a pre-Islamic era Bedouin tribe-spokesperson poet fighting for the honor of their tribe with that <em>killer verse</em> by plugging in that <em>sweet, sweet synonym!</em></a></p>

<p>But I digress. We’re not here to talk about lions. If you’re anything like Van Halen, you might have noticed from the title of this post that so far I <a href="https://www.youtube.com/watch?v=qtwBFz6lfrY" target="_blank" rel="noopener noreferrer">Ain’t Talkin’ ‘Bout Love</a>, so let’s move on to the main course. The linguistic richness of the Arabic language is not limited to animals. It has an astonishing variety of words for love, each capturing different facets of this complex emotion. Given that poetry was the main form of artistic expression in the pre-Islamic era, and that love was a recurrent theme in the poetry of that era, it is no surprise that more than 50 words<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup> for love exist today as a result of that early interest.</p>

<p><br /></p>

<h1 id="the-semantic-field-of-love">The Semantic Field of Love</h1>

<p>In classical Arabic, the semantic field of love is treated as one half of a spectrum that contains love and hate. On one side of the spectrum lies love, made up of 14 distinct stages, while on the opposite side lies hate with its own stages<sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup>. The stages of love start with inclination and infatuation, leading into growth and proliferation, and culminate in either madness or self-sacrifice for the beloved. To paraphrase from <em>The Beloved in Middle Eastern Literatures</em><sup id="fnref:6:1" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup>: If love is like the pull of electric charges, hate works in reverse as repulsion. It starts with indifference, intensifies into hostility, and at its peak becomes the urge to cause harm. At its core, love creates and multiplies, while hate destroys and annihilates. However, the extremes of love can be just as dangerous as those of hate, only instead of hurting others, love taken too far turns its harm inward.</p>

<p>But here we will only focus on the “love” half of the spectrum. As mentioned earlier, there are over 50 synonyms for love in Arabic, all with their small nuances, but here we will only consider 14 of the most common words as they have been shown in studies to represent distinct psychological stages<sup id="fnref:6:2" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup>. You will see that most of the words of love either evolved from another word or are derived from roots that are unrelated to love itself. In Arabic, each of these words of love represents an emotion or psychological state that <strong><em>resembles</em></strong> some feeling evoked by the root from which the word is derived. You will understand as we delve into them one by one. Let’s start with the most neutral words on the spectrum and finish with the most extreme.</p>

<ol>
  <li>
    <p><button type="button" class="pronounce-btn" data-audio="/assets/sounds/arabic-love-word-analysis/hawa.mp3" aria-label="Play pronunciation of hawa" title="Play pronunciation">🔊</button>
<span dir="rtl" lang="ar">هَوَى</span> (hawa):
<br />Hawa means an inclination toward something or someone. The word originally refers to “air” and the “void”, and by extension conveys the idea of “falling into emptiness”. From this sense arises hawa as a term for succumbing to the passions of the soul, or more simply: “falling in love”. It is derived from a root that means “to blow”. This first stage of love is thus seen as fleeting, much like the wind; it is unavoidable, and it can rise and fall.</p>
  </li>
  <li>
    <p><button type="button" class="pronounce-btn" data-audio="/assets/sounds/arabic-love-word-analysis/wudd.mp3" aria-label="Play pronunciation of wudd" title="Play pronunciation">🔊</button>
<span dir="rtl" lang="ar">وُدّ</span> (wudd):
<br />Wudd signifies a warm and tender affection. It is often used to describe the love between friends or family, but it can also refer to romantic love in its early stages, where there is a wish to get close to someone.</p>
  </li>
  <li>
    <p><button type="button" class="pronounce-btn" data-audio="/assets/sounds/arabic-love-word-analysis/hubb.mp3" aria-label="Play pronunciation of hubb" title="Play pronunciation">🔊</button>
<span dir="rtl" lang="ar">حُبّ</span> (hubb):
<br />Hubb is the most common verbal noun for “love” in Arabic. The word hubb is derived from the same root as the word “seed”. Hubb is seen to resemble a seed, capable of growing, giving life, and branching out like a tree through procreation. Yet its origin lies in an invisible intention of the heart, much like a hidden seed beneath the soil.</p>
  </li>
  <li>
    <p><button type="button" class="pronounce-btn" data-audio="/assets/sounds/arabic-love-word-analysis/shaghaf.mp3" aria-label="Play pronunciation of shaghaf" title="Play pronunciation">🔊</button>
<span dir="rtl" lang="ar">شَغَفْ</span> (shaghaf):
<br />The word originally means the outer membrane of the heart. Shaghaf refers to a passion that has gone beyond outer, “superficial” attraction, reached the core, and completely covered the heart.</p>
  </li>
  <li>
    <p><button type="button" class="pronounce-btn" data-audio="/assets/sounds/arabic-love-word-analysis/sababa.mp3" aria-label="Play pronunciation of sababa" title="Play pronunciation">🔊</button>
<span dir="rtl" lang="ar">صَبابَة</span> (sababa):
<br />Sababa comes from an Arabic root that means “to pour” or “to spill”. This reflects the idea of love as spilling the essence of one’s heart onto someone else, symbolizing deep faithfulness in love.</p>
  </li>
  <li>
    <p><button type="button" class="pronounce-btn" data-audio="/assets/sounds/arabic-love-word-analysis/ishq.mp3" aria-label="Play pronunciation of 'ishq" title="Play pronunciation">🔊</button>
<span dir="rtl" lang="ar">عِشْق</span> (‘ishq):
<br />‘Ishq is a very common word for love that comes from the word <span dir="rtl" lang="ar">عَشَّ</span> <button type="button" class="pronounce-btn" data-audio="/assets/sounds/arabic-love-word-analysis/assha.mp3" aria-label="Play pronunciation of 'assha" title="Play pronunciation">🔊</button> (‘assha), which means “to be nested in”. In a sense, the lover is now <span class="easter-egg-container">nested<span class="easter-egg">🥚</span></span> in their beloved. At this stage, the lover is said to be inseparable from their beloved.</p>
  </li>
  <li>
    <p><button type="button" class="pronounce-btn" data-audio="/assets/sounds/arabic-love-word-analysis/wala.mp3" aria-label="Play pronunciation of wala'" title="Play pronunciation">🔊</button>
<span dir="rtl" lang="ar">وَلَعْ</span> (wala’):
<br />Wala’ comes from a word that means “burning”. It is a psychological state that resembles “catching on fire”. This is not a poetic expression; the word refers to incredible pain that actually resembles being burned. We can see here that we have moved into the more extreme side of the spectrum of love.</p>
  </li>
  <li>
    <p><button type="button" class="pronounce-btn" data-audio="/assets/sounds/arabic-love-word-analysis/gharam.mp3" aria-label="Play pronunciation of gharam" title="Play pronunciation">🔊</button>
<span dir="rtl" lang="ar">غَرَامْ</span> (gharam):
<br />It comes from the word “debt” or “the price one must pay”. Gharam was also used to describe the torture of hell (<a href="/assets/images/posts/arabic-love-word-analysis/happy-music.jpeg" class="image-popup">yes, we’re getting serious now!</a>). At this stage, love resembles the inescapable torment that one feels when they are preoccupied with loving someone at all times.</p>
  </li>
  <li>
    <p><button type="button" class="pronounce-btn" data-audio="/assets/sounds/arabic-love-word-analysis/huyam.mp3" aria-label="Play pronunciation of huyam" title="Play pronunciation">🔊</button>
<span dir="rtl" lang="ar">هُيَامْ</span> (huyam):
<br />Quite simply, huyam is the madness of love. The word comes from an illness that afflicts camels and causes them to go thirsty and wander astray in the desert. It is an extreme feeling given how long a camel can go without water in the hot and dry desert. At this stage, love represents the total surrender of reason. A lover becomes overwhelmed, unable to envision existence apart from their beloved. Consumed entirely by passion, they drift away from reality and descend into the torment of madness. A striking example is Qais in <a href="https://en.wikipedia.org/wiki/Layla_and_Majnun" target="_blank" rel="noopener noreferrer"><em>Layla and Majnun</em></a>. After Layla’s death, he grows numb and withdraws into a profound trance. In time, he loses all awareness of himself and the physical world, earning the name majnun (madman), and living in solitude until his eventual death.</p>
  </li>
  <li>
    <p><button type="button" class="pronounce-btn" data-audio="/assets/sounds/arabic-love-word-analysis/taym.mp3" aria-label="Play pronunciation of taym" title="Play pronunciation">🔊</button>
<span dir="rtl" lang="ar">تَيْم</span> (taym):
<br /> Taym originally means enslavement. Since love is now etched in the heart and mind, the lover becomes enslaved to the object of their adoration. They are now chained to their beloved and refuse to let go of this love that consumes them.</p>
  </li>
  <li>
    <p><button type="button" class="pronounce-btn" data-audio="/assets/sounds/arabic-love-word-analysis/walah.mp3" aria-label="Play pronunciation of walah" title="Play pronunciation">🔊</button>
<span dir="rtl" lang="ar">وَلَهْ</span> (walah):
<br /> Walah, very simply, refers to a state resembling losing one’s mind. You might be asking, isn’t that what <em>huyam</em> is? And the explanation of huyam was much cooler, so why is walah considered worse? Well, they might look the same if we just look at them outwardly (i.e., madness). But in fact, they differ in their moral weight. Whereas huyam is seen as a human weakness caused by excessive passion, walah is viewed as being so overcome with passion that it leads to the negligence of God.</p>
  </li>
  <li>
    <p><button type="button" class="pronounce-btn" data-audio="/assets/sounds/arabic-love-word-analysis/jawa.mp3" aria-label="Play pronunciation of jawa" title="Play pronunciation">🔊</button>
<span dir="rtl" lang="ar">جَوَى</span> (jawa):
<br /> The word jawa means grief. As a synonym of love, it represents a psychological state where the lover feels they are being consumed from the inside, like they are being wasted away, like they are being internally burned with acid! It is a deep, burning, internal grief caused by unfulfilled love, longing, or heartbreak. Unlike the previous synonyms, jawa is not only an emotional madness, it’s a chronic sickness that is physically destructive.</p>
  </li>
  <li>
    <p><button type="button" class="pronounce-btn" data-audio="/assets/sounds/arabic-love-word-analysis/fitna.mp3" aria-label="Play pronunciation of fitna" title="Play pronunciation">🔊</button>
<span dir="rtl" lang="ar">فِتْنَة</span> (fitna):
<br /> Fitna, like jawa, is a love that is physically destructive. At this stage, the lover is in a state of internal turmoil akin to the process of smelting gold in a furnace (I know it sounds weird, but you have to admit it doesn’t sound pleasant).</p>
  </li>
  <li>
    <p><button type="button" class="pronounce-btn" data-audio="/assets/sounds/arabic-love-word-analysis/tawq.mp3" aria-label="Play pronunciation of tawq" title="Play pronunciation">🔊</button>
<span dir="rtl" lang="ar">تَوْق</span> (tawq):
<br /> And we’re at the final stage, tawq. It represents fighting one’s own psyche for the sake of the beloved to the extent of <strong>self-sacrifice</strong> (yes, we are at this point now!).</p>
  </li>
</ol>

<p>So let’s summarize what the spectrum of love looks like. The early stages of love begin with a simple attraction or leaning toward someone (hawa), followed by the stage of openly showing affection (wudd). This can then develop into hubb, a love marked by proliferation and growth. Beyond this third stage, there remain only three higher forms of love that carry no negative implications: the state of wholehearted attachment (shaghaf), the outpouring of the heart’s deepest feelings (sababa), and finally, the condition of complete inseparability from the beloved (‘ishq).</p>

<p>The seventh through fourteenth terms in this sequence carry underlying associations of emotional and physical harm. While poets, mystics, and storytellers have celebrated these words as expressions of deep love, their original roots suggest tones of caution, reproach, or warning. This critical sense is not directly stated in literary usage, but it remains embedded in the etymological origins of the words, which were first applied in contexts unrelated to love. In particular, the <em>early</em> Arabic speaker had in mind that love can be similar to: catching on fire (wala’), being in debt or tortured in hell (gharam), being lost and thirsty in the desert (huyam), being enslaved (taym), losing one’s mind (walah), having your insides burn (jawa), having your body smelted in a furnace (fitna), and an internal struggle leading to sacrificing oneself (tawq).</p>

<p>An early Arabic speaker could consciously perceive the emotional dangers attached to these words; however, as lexical usage evolved, writers and poets may have used these terms unconsciously to express <em>intense</em> cases of love<sup id="fnref:6:3" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup>. I can say that personally, I was not aware of the presence of this negative side of the spectrum of love. I had encountered some of these words in Arabic poetry, but I never really understood the differences between them. In daily conversation, we only use three of these words (hubb, gharam, ‘ishq). Hubb is by far the most commonly used term, while gharam is a close second. I never truly grasped the nuances between hubb and gharam, except that I unconsciously would use gharam to denote a more passionate kind of love, whereas I would use hubb for a more general and pure sense of love. ‘Ishq is not a word we commonly use as is; we use the word <span dir="rtl" lang="ar">عُشّاقْ</span><button type="button" class="pronounce-btn" data-audio="/assets/sounds/arabic-love-word-analysis/ushaq.mp3" aria-label="Play pronunciation of 'ushaq" title="Play pronunciation">🔊</button> (‘ushaq), which is derived from ‘ishq, and it means lovers. Valentine’s Day, for example, is called <span dir="rtl" lang="ar">عِيدُ العُشّاقْ</span><button type="button" class="pronounce-btn" data-audio="/assets/sounds/arabic-love-word-analysis/ushaq2.mp3" aria-label="Play pronunciation of 'id al'ushaq" title="Play pronunciation">🔊</button> (‘id al ‘ushaq), which literally translates to the “Festival of Lovers.”</p>

<p><br /></p>

<h1 id="the-big-questions">The Big Questions</h1>

<p>With all that in mind, I set out to explore how these words are used in Arabic poetry as it has evolved over time. I had the following questions in mind:</p>
<ol>
  <li>Which of these words are most prevalent in Arabic poetry?</li>
  <li>How has the usage of these words changed across time periods?</li>
  <li>Are there differences in usage between poets of different genders?</li>
  <li>Do poets use multiple synonyms of love in the same poem? And if they do, which ones are most commonly paired together?</li>
  <li>Regardless of the semantic differences, are there any patterns in how these words are practically used in poetry? Are certain words more associated with positive vs negative emotional tones?</li>
</ol>

<p><br /></p>

<h1 id="dataset">Dataset</h1>

<p>To answer these questions, I set out to find or scrape a large corpus of Arabic poetry. Luckily, I found a huge dataset of Arabic poetry from the classical pre-Islamic era all the way to modern times. The dataset, <a href="https://doi.org/10.7910/DVN/PJPWOY" target="_blank" rel="noopener noreferrer">AraPoems</a><sup id="fnref:7" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">7</a></sup>, is compiled from two online sources: <a href="https://poetry.dctabudhabi.ae/" target="_blank" rel="noopener noreferrer">Almausua</a> and <a href="https://www.aldiwan.net/" target="_blank" rel="noopener noreferrer">Aldiwan</a>. It consists of more than <em>2 million</em> verses of Arabic poetry, <em>165,220</em> poems, and <em>5,384</em> poets spanning <em>12</em> time periods.</p>

<p><br /></p>

<h1 id="keyword-extraction">Keyword Extraction</h1>

<p>With the dataset in hand, I wrote a Python script to extract the occurrences of the 14 words of love in all the verses. But it is not as simple as just matching the word in the verse. For example, the word hubb can appear in several different forms. My initial solution was to match the root of the words (which is unique); however, the available NLP tools for Arabic could not do what I needed. My next solution was to manually find all the possible <a href="https://en.wikipedia.org/wiki/Lexeme" target="_blank" rel="noopener noreferrer">lexemes</a> for each word and try to match them. With <a href="https://github.com/CAMeL-Lab/camel_tools" target="_blank" rel="noopener noreferrer">CAMeL Tools</a>, I was able to extract and match the lexemes of each word. But the next problem was that most words can have multiple meanings. For example, the lexeme for gharam can also mean “to fine”. So I manually went through the list of possible meanings of each lexeme and filtered out the ones that are unrelated to love.</p>

<p>I then created a JSON file functioning as a “glossary” with the 14 words I wish to extract, the acceptable lexemes for each word, and the acceptable meanings for each lexeme. Here is a sample of the glossary:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="w">    </span><span class="nl">"wala'"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="nl">"label"</span><span class="p">:</span><span class="w"> </span><span class="s2">"ولع"</span><span class="p">,</span><span class="w">
        </span><span class="nl">"label_en"</span><span class="p">:</span><span class="w"> </span><span class="s2">"wala'"</span><span class="p">,</span><span class="w">
        </span><span class="nl">"lemmas"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
            </span><span class="s2">"ولع"</span><span class="p">,</span><span class="w">
            </span><span class="s2">"اولع"</span><span class="p">,</span><span class="w">
            </span><span class="s2">"مولع"</span><span class="p">,</span><span class="w">
            </span><span class="s2">"تولع"</span><span class="w">
        </span><span class="p">],</span><span class="w">
        </span><span class="nl">"gloss_include"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
            </span><span class="s2">"passionate"</span><span class="p">,</span><span class="w">
            </span><span class="s2">"passion"</span><span class="p">,</span><span class="w">
            </span><span class="s2">"enamored"</span><span class="p">,</span><span class="w">
            </span><span class="s2">"desire"</span><span class="p">,</span><span class="w">
            </span><span class="s2">"be enamored"</span><span class="p">,</span><span class="w">
            </span><span class="s2">"fall in love"</span><span class="w">
        </span><span class="p">]</span><span class="w">
    </span><span class="p">}</span><span class="err">,</span><span class="w">
    </span><span class="nl">"gharam"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="nl">"label"</span><span class="p">:</span><span class="w"> </span><span class="s2">"غرام"</span><span class="p">,</span><span class="w">
        </span><span class="nl">"label_en"</span><span class="p">:</span><span class="w"> </span><span class="s2">"gharam"</span><span class="p">,</span><span class="w">
        </span><span class="nl">"lemmas"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
            </span><span class="s2">"غرام"</span><span class="p">,</span><span class="w">
            </span><span class="s2">"مغرم"</span><span class="w">
        </span><span class="p">],</span><span class="w">
        </span><span class="nl">"gloss_include"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
            </span><span class="s2">"infatuation"</span><span class="p">,</span><span class="w">
            </span><span class="s2">"infatuated"</span><span class="p">,</span><span class="w">
            </span><span class="s2">"enamored"</span><span class="w">
        </span><span class="p">]</span><span class="w">
    </span><span class="p">}</span><span class="w">
</span></code></pre></div></div>
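
<p>To make the matching step concrete, here is a minimal sketch of how a glossary entry like the ones above could be applied to a single morphological analysis. It is not the project’s actual script. It assumes each analysis arrives as a dictionary with a <code>lex</code> field (a diacritized lemma, possibly with a sense suffix such as <code>_1</code>) and a <code>gloss</code> field (English glosses separated by <code>;</code>). The helper names <code>normalize_lemma</code> and <code>match_love_word</code> and the two sample analyses are hypothetical.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import re

# One entry copied from the glossary sample above (lemmas and glosses only).
GLOSSARY = {
    "gharam": {
        "lemmas": ["غرام", "مغرم"],
        "gloss_include": ["infatuation", "infatuated", "enamored"],
    },
}

# Arabic diacritic marks, from fathatan (U+064B) through sukun (U+0652).
DIACRITICS = re.compile("[\u064B-\u0652]")
# Alef with hamza above, hamza below, and madda; all folded to bare alef (U+0627).
ALEF_FORMS = re.compile("[\u0623\u0625\u0622]")


def normalize_lemma(lemma):
    """Strip diacritics and a trailing sense index, then fold alef variants."""
    bare = DIACRITICS.sub("", lemma)
    bare = re.sub(r"_\d+$", "", bare)
    return ALEF_FORMS.sub("\u0627", bare)


def match_love_word(analysis, glossary):
    """Return the glossary key whose lemma and gloss lists both accept the analysis, else None."""
    lemma = normalize_lemma(analysis["lex"])
    # Assumption: alternative glosses are separated by ';'.
    glosses = set(part.strip() for part in analysis["gloss"].split(";"))
    for key, entry in glossary.items():
        if lemma in entry["lemmas"] and glosses.intersection(entry["gloss_include"]):
            return key
    return None


# Hypothetical analyses for two readings of the same lemma.
love_reading = {"lex": "غَرام_1", "gloss": "infatuation;passion"}
fine_reading = {"lex": "غَرام_2", "gloss": "fine;penalty"}

print(match_love_word(love_reading, GLOSSARY))
print(match_love_word(fine_reading, GLOSSARY))
</code></pre></div></div>

<p>Requiring both a lemma match and a gloss match is what keeps the “to fine” reading of gharam out of the results in this sketch: the lemma alone is ambiguous, so the gloss list acts as the filter.</p>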

<p><a href="https://github.com/CAMeL-Lab/camel_tools" target="_blank" rel="noopener noreferrer">CAMeL Tools</a> also provides a sentiment analysis tool, which was very useful for my analysis. It was able to predict the sentiment (positive, negative, neutral) of each verse of poetry. Finally, I extracted the occurrences of each love word in the verses by matching its lexeme and meaning against the glossary I built, and stored each occurrence in a new dataset, along with the verse it was found in, the sentiment of the verse, the poem name, the poet, the poet’s gender, and the time period. The extracted dataset consists of <em>152,512</em> occurrences of a word of love in <em>57,024</em> unique poems!</p>

<p><br /></p>

<h1 id="analysis">Analysis</h1>

<p>So let’s answer the simplest question: <strong>Which love word is most prevalent in Arabic poetry?</strong></p>

<figure class=""><a href="/assets/images/posts/arabic-love-word-analysis/overall_distribution.png" class="image-popup"><img src="/assets/images/posts/arabic-love-word-analysis/overall_distribution.png" alt="The overall distribution of love words in the corpus" /></a></figure>

<p>Hubb is by far the most common, accounting for almost half of the occurrences of the love words. Given how prevalent hubb is in modern Arabic, it is not surprising that it completely dominates the corpus. Hawa is in second place with a fair share of around 18%, followed by the rest of the words, each with a share of less than 10%.</p>
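
<p>For readers who want to reproduce this kind of breakdown, here is a minimal sketch of how the shares could be computed from a list of matched occurrences. The labels below are toy values, not the extracted dataset, and the function name <code>shares</code> is hypothetical.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from collections import Counter

# Toy occurrence labels, one per matched love word (not the post's data).
occurrences = ["hubb", "hubb", "hawa", "hubb", "ishq", "hawa", "hubb", "wudd"]


def shares(labels):
    """Return each label's percentage share of all occurrences, largest first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {word: 100 * n / total for word, n in counts.most_common()}


for word, pct in shares(occurrences).items():
    print(f"{word}: {pct:.1f}%")
</code></pre></div></div>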

<h2 id="temporal-variations">Temporal Variations</h2>

<p>Now let’s go a bit deeper. <strong>Has the prevalence of these love words changed over time periods?</strong></p>

<p>Let’s take a broad look at the distribution of each of the words in different time periods<sup id="fnref:8" role="doc-noteref"><a href="#fn:8" class="footnote" rel="footnote">8</a></sup>. The time periods or eras that are included in the dataset are: <a href="https://en.wikipedia.org/wiki/Pre-Islamic_Arabia" target="_blank" rel="noopener noreferrer">Pre-Islam</a> (~610 CE), <a href="https://www.oxfordreference.com/display/10.1093/acref/9780191836954.001.0001/acref-9780191836954-e-194" target="_blank" rel="noopener noreferrer">Seasoned</a><sup id="fnref:9" role="doc-noteref"><a href="#fn:9" class="footnote" rel="footnote">9</a></sup>, <a href="https://en.wikipedia.org/wiki/First_Islamic_state" target="_blank" rel="noopener noreferrer">Islamic</a> (610–661), <a href="https://en.wikipedia.org/wiki/Umayyad_Caliphate" target="_blank" rel="noopener noreferrer">Umayyad</a> (661–750), <a href="https://en.wikipedia.org/wiki/Al-Andalus" target="_blank" rel="noopener noreferrer">Andalusian</a> (711–1492), Dual-eras<sup id="fnref:10" role="doc-noteref"><a href="#fn:10" class="footnote" rel="footnote">10</a></sup>, <a href="https://en.wikipedia.org/wiki/Abbasid_Caliphate" target="_blank" rel="noopener noreferrer">Abbasid</a> (750–1517), <a href="https://en.wikipedia.org/wiki/Fatimid_Caliphate" target="_blank" rel="noopener noreferrer">Fatimid</a> (909–1171), <a href="https://en.wikipedia.org/wiki/Ayyubid_dynasty" target="_blank" rel="noopener noreferrer">Ayyubid</a> (1171–1260), <a href="https://en.wikipedia.org/wiki/Mamluk_Sultanate" target="_blank" rel="noopener noreferrer">Mamluk</a> (1250–1517), <a href="https://en.wikipedia.org/wiki/Ottoman_Empire" target="_blank" rel="noopener noreferrer">Ottoman</a> (1517–1922), and Modern (1922~).</p>

<figure class=""><a href="/assets/images/posts/arabic-love-word-analysis/vocabulary_evolution_era.png" class="image-popup"><img src="/assets/images/posts/arabic-love-word-analysis/vocabulary_evolution_era.png" alt="Evolution of Love Vocabulary in Arabic Poetry" /></a></figure>

<p>We can see that hubb is the dominant love word in all periods, consistently between 40% and 55%. It is clearly the “core” word for love in Arabic. Hawa, our second most common synonym, rose in usage from the Pre-Islamic period and the seasoned era up to the Islamic era, and it remained relatively stable from then on. ‘Ishq was relatively absent in the early periods but rose drastically after the Umayyad period. This sudden rise can be explained by the emergence of <a href="https://en.wikipedia.org/wiki/Sufism" target="_blank" rel="noopener noreferrer">Sufi</a> poetry (Islamic Mysticism), where it became the preferred word for divine love. It became a core concept in the doctrine of Islamic Mysticism and is sometimes referred to as “the basis of creation”. Wudd fluctuates quite a bit over time, rising and falling, but seems to have been most popular in the early eras. Gharam is more interesting; it had a share of around 4% in the pre-Islam era, then dropped significantly in the subsequent early eras, only to become popular again in later eras. It is surprisingly difficult to find information on the history of these words, so we can only speculate. This dip might have happened due to its single use in the <a href="https://corpus.quran.com/wordmorphology.jsp?location=(25:65:11)" target="_blank" rel="noopener noreferrer">Quran</a>. It was mentioned once as a description of the suffering of hell. This might have caused the poets of the time to be extra cautious with the word, and perhaps to actively avoid it.</p>

<p>In general, we can see an overall pattern where hubb is the anchor word, hawa is the second most used word, ‘ishq and gharam are the late bloomers, and wudd is the old-fashioned term!</p>

<p>Now, let’s test whether these patterns are significant. To test whether the distribution of love words depends on the time period, we can run a \(\chi^2\) (pronounced chi-squared) test of independence.</p>

<div class="notice--danger">
  
<p><strong>The dreaded terminology alert</strong>: I will take a moment to explain the \(\chi^2\) test as painlessly as possible. If you already know it, feel free to ignore this box. If you don’t know and you don’t care, just know that it’s a statistical test to determine whether there is a significant association between two variables.</p>

<p><strong>The \(\chi^2\) Test</strong></p>

<p>Let’s explain the \(\chi^2\) test with a very simple example. Imagine you walk into a bookstore. You see that there is one section per book genre in the bookstore. And the sections look the same size, so now you have the <em>expectation</em> that the bookstore is stocked pretty evenly across genres. But as you walk around, you notice that the mystery shelves are packed full of books, the romance shelves hold just a few books, the history section has tons of books, and the sci-fi shelves have almost none. This is where the \(\chi^2\) test shines; it will answer the following question for you:</p>

<p>“Are these differences just random, or are they too big to be explained by chance? Do customers (or the booksellers) actually prefer some genres over others?”</p>

<p>The \(\chi^2\) test compares the expected values (which are that the books are distributed evenly) to the actual values (what you actually observed). If they are close, the test suggests that the differences are most probably due to chance. If they are far, the test suggests that there is probably a pattern here: maybe some genres do dominate.</p>

<p>This is exactly what we will do with our data. The \(\chi^2\) test will look at the expected distribution of love words across time periods and compare it with the actual distribution.</p>

</div>

<p>I will spare you the details of the output (for those who are interested, the outcome of the test is in the Jupyter notebook in the <a href="https://github.com/eliemaalouly/arabic-love-word-analysis" target="_blank" rel="noopener noreferrer">GitHub</a> repository). <a href="/assets/images/posts/arabic-love-word-analysis/drumroll.gif" class="image-popup">And the results are:</a></p>

<p>Era <strong>does</strong> matter for which love term gets used. But the effect size is small, which means that the patterns exist, but they’re subtle. This could be due to the dominance of hubb in every era. Now let’s do something more interesting. By looking at the residuals of the \(\chi^2\) test, we can see whether each love word is under or over-represented in each era<sup id="fnref:11" role="doc-noteref"><a href="#fn:11" class="footnote" rel="footnote">11</a></sup>.</p>
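<p>For readers who want to see the mechanics, here is a minimal sketch of a \(\chi^2\) test of independence with residuals. The counts are made up for illustration (they are not the corpus numbers), and the choice of Pearson residuals and Cramér's V as the effect size is an assumption; the notebook in the repository may compute these differently.</p>

```python
import math

# Hypothetical counts (NOT the real corpus): rows are eras, columns are love words.
words = ["hubb", "hawa", "wudd"]
observed = {
    "early": [50, 20, 30],
    "late": [70, 20, 10],
}

row_totals = {era: sum(counts) for era, counts in observed.items()}
col_totals = [sum(counts[j] for counts in observed.values()) for j in range(len(words))]
grand_total = sum(row_totals.values())

chi2 = 0.0
residuals = {}
for era, counts in observed.items():
    residuals[era] = []
    for j, obs in enumerate(counts):
        # Expected count if word choice were independent of era
        expected = row_totals[era] * col_totals[j] / grand_total
        chi2 += (obs - expected) ** 2 / expected
        # Pearson residual: positive = over-represented, negative = under-represented
        residuals[era].append((obs - expected) / math.sqrt(expected))

dof = (len(observed) - 1) * (len(words) - 1)
# Cramér's V, one common effect-size measure for a chi-squared test (assumed here)
cramers_v = math.sqrt(chi2 / (grand_total * min(len(observed) - 1, len(words) - 1)))

print(f"chi2 = {chi2:.2f}, dof = {dof}, Cramér's V = {cramers_v:.2f}")
for era, res in residuals.items():
    print(era, {w: round(r, 2) for w, r in zip(words, res)})
```

<p>In this toy table, wudd gets a positive residual in the "early" row and a negative one in the "late" row, which is the kind of signal a residual heatmap colors red and blue. In practice, a library routine such as SciPy's <code>chi2_contingency</code> would also return the p-value.</p>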

<figure class=""><a href="/assets/images/posts/arabic-love-word-analysis/word_representation_era.png" class="image-popup"><img src="/assets/images/posts/arabic-love-word-analysis/word_representation_era.png" alt="Over and Under-Representation of Love Words by Era" /></a></figure>

<p>This heatmap shows the representation of each word in different eras. The blue boxes are words that are underrepresented in this era. In other words, they appear much less than we expect them to. The red boxes, conversely, are over-represented words (they appear more than expected). For example, we can see that gharam was under-represented in the early eras, then became over-represented in the later eras, starting from the Ayyubid era. This heatmap can show us the patterns that were a bit too difficult to see in the previous figure. We can see the decrease in popularity of wudd as it advances through the eras and becomes more and more underrepresented. The light blue and red colors in the boxes of the hubb column indicate a very slight fluctuation of the representation of this word in different eras. In other words, it is fairly stable across time (compared, for example, to wudd, which fluctuates heavily).</p>

<h2 id="gender-influences">Gender Influences</h2>

<p>In our extracted dataset, we have 3,050 poets in total. Unsurprisingly, the poets are mostly male. Specifically, we have 2,937 males and 112 females. That means only 3.7% of the poets are female! Given this huge mismatch in the proportions of the genders, I wouldn’t draw any real conclusions from such lopsided data, but we can still take a quick look.</p>

<table>
  <thead>
    <tr>
      <th> </th>
      <th>fitna</th>
      <th>gharam</th>
      <th>hawa</th>
      <th>hubb</th>
      <th>huyam</th>
      <th>‘ishq</th>
      <th>jawa</th>
      <th>sababa</th>
      <th>shaghaf</th>
      <th>tawq</th>
      <th>taym</th>
      <th>wala’</th>
      <th>walah</th>
      <th>wudd</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Female</td>
      <td>0.1</td>
      <td>4.5</td>
      <td>14.5</td>
      <td>54.6</td>
      <td>2.5</td>
      <td>9.2</td>
      <td>0.2</td>
      <td>2.7</td>
      <td>3.5</td>
      <td>1.0</td>
      <td>0.6</td>
      <td>1.3</td>
      <td>1.3</td>
      <td>4.1</td>
    </tr>
    <tr>
      <td>Male</td>
      <td>0.3</td>
      <td>6.9</td>
      <td>18.4</td>
      <td>46.4</td>
      <td>3.6</td>
      <td>8.3</td>
      <td>0.3</td>
      <td>3.6</td>
      <td>1.1</td>
      <td>0.8</td>
      <td>0.7</td>
      <td>1.5</td>
      <td>0.8</td>
      <td>7.4</td>
    </tr>
  </tbody>
</table>

<p>The table shows the percent share of love words by each gender in the dataset. Even though the female poets used hubb more than male poets, the overall trend is very similar. Hubb takes the biggest share of love words, with hawa in second place and ‘ishq in third. I don’t think there’s much to be said about the differences in usage between male and female poets, given the large disparity in sample size and the fact that the overall pattern doesn’t deviate much between the two.</p>

<h2 id="stylistic-function">Stylistic Function</h2>

<p>Now for the parts I’m most looking forward to: Exploring how exactly these words have been used. First, I wanted to see how often different words of love co-occur in the same poem.</p>

<figure class=""><a href="/assets/images/posts/arabic-love-word-analysis/cooccurence_per_poem.png" class="image-popup"><img src="/assets/images/posts/arabic-love-word-analysis/cooccurence_per_poem.png" alt="Distribution of Distinct Love Words per Poem" /></a></figure>

<p>Most of the poems include only one unique love word; around 13,500 poems include two unique words. Very few poems include more than 7 unique love words. The maximum number of unique love words in a single poem was 10, and that happened in 3 poems! Now, let’s look at the average number of unique love words per poem in different eras.</p>

<figure class=""><a href="/assets/images/posts/arabic-love-word-analysis/cooccurences_over_era.png" class="image-popup"><img src="/assets/images/posts/arabic-love-word-analysis/cooccurences_over_era.png" alt="Average Distinct Love Words per Poem by Era" /></a></figure>

<p>There is a clear upward trend here despite the fluctuations. This might suggest that poets in later eras were more experimental in their use of love vocabulary, or that there is a gradual increase in lexical diversity and a stylistic shift toward richer rhetorical variation in the words of love, reaching a peak in the Ottoman period, where the average poem uses nearly two different love words!</p>

<p>So now we know it was not uncommon for distinct love words to appear together in the same poem. But is there a pattern to which words appeared together?</p>

<p>If you have been following along with the previous project, you might have guessed what I am about to do: A Network Analysis!</p>

<p>If you want to learn what a network analysis is or if you just want a refresher, I go into the details in <a href="/posts/greek-myth-network1/" target="_blank" rel="noopener noreferrer">Mapping the Mythos Part I</a>. If you’re ready, let’s keep going. We can create a network made of co-occurrences of these love words within each poem, and then what we need to look at specifically is whether this network of words splits into separate communities. The communities would represent the words that most often appear together. So here we go!</p>

<figure class=""><a href="/assets/images/posts/arabic-love-word-analysis/communities.png" class="image-popup"><img src="/assets/images/posts/arabic-love-word-analysis/communities.png" alt="Communities of Love Word Co-occurrence" /></a></figure>

<p>We can see something very interesting. The network splits into two communities. The first community is made up of the words hubb, hawa, wudd, sababa, ‘ishq, gharam, while the rest of the words make up the second community. But look at the actual words in these communities. They correspond quite nicely to the two main polarities (positive and negative) in the semantic field of love, with one exception (gharam and shaghaf have switched teams!). This is quite interesting, and it does suggest one thing: as Arabic poetry has evolved over time, the poets have still preserved this distinction (albeit probably unconsciously) between the positive and negative words of love. This is just a suggestion, but what we can clearly see is that there is a tendency for poets to use the love words that have the same polarity in the same poem. The reasons for that could be an awareness of the semantic distinctions between these words, or it could be, as mentioned earlier, that as lexical usage evolved, writers and poets may have used the negative words of love unconsciously to express <em>intense</em> cases of love<sup id="fnref:6:4" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup>. Can we know for sure? Probably not. Buuuuut, we can definitely explore this a bit further.</p>
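<p>To make the construction of the co-occurrence network concrete, here is a small sketch of how the edges could be counted. The poems are invented for illustration, and the rule of counting each word once per poem is an assumption about the preprocessing rather than a description of the actual notebook.</p>

```python
from collections import Counter
from itertools import combinations

# Hypothetical poems, each reduced to the love words it contains (NOT real data).
poems = [
    ["hubb", "hawa", "hubb"],
    ["hubb", "hawa", "wudd"],
    ["jawa", "huyam"],
    ["hubb"],
    ["jawa", "huyam", "walah"],
]

edge_weights = Counter()
distinct_per_poem = []
for poem in poems:
    unique_words = sorted(set(poem))  # count each word once per poem
    distinct_per_poem.append(len(unique_words))
    # Every unordered pair of distinct words in the poem is one co-occurrence
    for pair in combinations(unique_words, 2):
        edge_weights[pair] += 1

print("Average distinct love words per poem:", sum(distinct_per_poem) / len(poems))
for (a, b), weight in edge_weights.most_common():
    print(f"{a} - {b}: {weight}")
```

<p>The resulting weighted edge list is what a community-detection algorithm would take as input. Which algorithm was used for the figure is not stated here, so that step is left out of the sketch.</p>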

<h2 id="sentiment-analysis">Sentiment Analysis</h2>

<p>Luckily, we have predicted the sentiment of each poetry verse that contains a word of love. We can therefore analyze whether each of the love words has a tendency to appear in a verse with a specific sentiment. We can use the <a href="https://en.wikipedia.org/wiki/Lift_(data_mining)" target="_blank" rel="noopener noreferrer">lift</a> to compute the association between each love word and the sentiment classes.</p>

<div class="notice--danger">
  
<p><strong>The dreaded terminology alert</strong>: I will take a moment to explain “lift” as painlessly as possible. If you already know it, feel free to ignore this box.</p>

<p>Just think of lift as a way to see if two things happen <strong>together</strong> more often than you’d expect by chance. If lift is equal to 1, then these two things happen together just by coincidence, like two strangers wearing the same color shirt. Nothing special. If lift is greater than 1, then these things happen together more often than just by chance, like mornings and coffee! They are probably connected. If lift is less than 1, then these things happen less often together than pure chance, like cats and bathtubs! They probably actively avoid each other.</p>

<p>For the mathematically curious, I’m going to slide this in right here, but feel free to ignore:</p>

\[\text{Lift}(A, B) = \frac{P(A \cap B)}{P(A) \cdot P(B)}\]


</div>
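<p>As a rough illustration of the formula, the sketch below computes lift from a handful of invented (word, sentiment) pairs. The data and the helper name <code>lift</code> are hypothetical and only meant to show the arithmetic.</p>

```python
from collections import Counter

# Hypothetical (word, sentiment) pairs, one per verse; NOT the real corpus.
verses = [
    ("hubb", "positive"), ("hubb", "positive"), ("hubb", "neutral"),
    ("hubb", "negative"), ("hawa", "neutral"), ("hawa", "neutral"),
    ("hawa", "positive"), ("jawa", "negative"), ("jawa", "negative"),
    ("jawa", "neutral"),
]

n = len(verses)
word_counts = Counter(word for word, _ in verses)
sentiment_counts = Counter(sentiment for _, sentiment in verses)
pair_counts = Counter(verses)

def lift(word, sentiment):
    """Lift(A, B) = P(A and B) / (P(A) * P(B))."""
    p_joint = pair_counts[(word, sentiment)] / n
    p_word = word_counts[word] / n
    p_sentiment = sentiment_counts[sentiment] / n
    return p_joint / (p_word * p_sentiment)

for word in word_counts:
    scores = {s: round(lift(word, s), 2) for s in sentiment_counts}
    print(word, scores)
```

<p>In this toy data, jawa and negative sentiment get a lift above 2 (they appear together more than twice as often as chance would predict), while jawa and positive sentiment get a lift of 0 because they never co-occur.</p>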

<p>After calculating the lift for each love word with the three sentiment classes (neutral, positive, negative), we can construct a scale of “increasing” sentiment. The scale will start with the words that are most associated with neutral sentiment, then move into the words associated with positive sentiment, and finally, the words associated with negative sentiment. And we can order them based on their lift values (higher lift would correspond to a higher association with a sentiment, and therefore it would be further on the scale).</p>

<figure class=""><a href="/assets/images/posts/arabic-love-word-analysis/word_sentiment_spectrum.png" class="image-popup"><img src="/assets/images/posts/arabic-love-word-analysis/word_sentiment_spectrum.png" alt="Love Words Along Polarity Path (Neutral → Positive → Negative)" /></a></figure>

<p>This is beautiful! This also corresponds well to the spectrum of the semantic field of love. Two words here are more associated with neutral sentiment (hawa and wudd), and they are also the first two words in the semantic field (the lighter words). Then there are three words associated with positive sentiment (‘ishq, hubb, and shaghaf), which are also part of the positive spectrum. Sababa is the only word from the positive spectrum of love that appears more associated with negative sentiment in poetic verses.</p>

<p>We’ve seen that not only do these words of love form two separate communities that almost fully correspond to the positive and negative sides of the spectrum of love (semantically), but they also exhibit distinct patterns of usage in poetry that correspond to the emotional sentiments of that spectrum. I think this offers some evidence that poets are making conscious choices about the words they use based on their emotional connotations. At the very least, it does not exclude the possibility that they are aware of these nuances, even in modern poetry.</p>

<p><br /></p>

<h1 id="conclusion">Conclusion</h1>

<p>We have a saying, <span dir="rtl" lang="ar">يَحِقُّ لِلشَّاعِرِ مَا لَا يَحِقُّ لِغَيْرِهِ</span><button type="button" class="pronounce-btn" data-audio="/assets/sounds/arabic-love-word-analysis/poetic-license-proverb.mp3" aria-label="Play pronunciation of poetic license proverb" title="Play pronunciation">🔊</button>
(yahiqqu lil-shaʿir ma la yahiqqu lighayrihi), which translates to <em>The poet has the right to do what others do not.</em> It highlights the unique creative liberties that poets can take in their work, allowing them to express emotions and ideas in ways that may not be permissible in other forms of discourse. I wanted to find out whether poets take advantage of the huge lexicon of love in order to make their verses fit in with the poetic rhythms more easily (regardless of whether this word fits the sentiment). In other words, do they cheat the system? Who’s going to know whether this or that word of love is more appropriate anyway, given that most modern Arabic readers are unaware of the differences between these words? And based on what we have seen from the data, I am inclined to say that they do not!</p>

<p>We’ve seen how the words they use cluster together, and how they are more likely to be used in specific sentiments that generally align with the classical semantic field of love. This suggests that poets are aware of the subtle differences between these words and they use them intentionally to convey specific emotional tones.</p>

<p>Well, this was fun! As usual, you can find all the data and code I used on <a href="https://github.com/eliemaalouly/arabic-love-word-analysis" target="_blank" rel="noopener noreferrer">GitHub</a>. Feel free to explore the data yourself and prove me wrong!</p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p>This is a famous anecdote about Abu al-‘Ala al-Ma‘arri (973–1057) from <em>“Al-Lta’if fi al-Lugha, Ahmad ibn Mustafa al-Bayeedi, Dar al-Fadila, Cairo”</em>. In Arabic rhetoric, simply calling someone a dog was considered a low, unsophisticated insult. A refined speaker, especially in a gathering of scholars, could have chosen from many other, more eloquent expressions or metaphors. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:2" role="doc-endnote">
      <p><a href="https://www.academia.edu/32748592/Names_of_the_Lion_by_Ibn_Kh%C4%81lawayh" target="_blank" rel="noopener noreferrer">Larsen, David. Names of the Lion by Ibn Khālawayh.</a> <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:3" role="doc-endnote">
      <p>McDonald, M. V. (1978). Orally transmitted poetry in pre-Islamic Arabia and other pre-literate societies. <em>Journal of Arabic Literature</em>, 14-31. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:4" role="doc-endnote">
      <p><a href="https://retrospectjournal.com/2025/02/16/nomadic-and-nostalgic-how-pre-islamic-arabian-poetry-reflected-and-reinforced-the-contemporary-bedouin-lifestyle" target="_blank" rel="noopener noreferrer">“Nomadic and Nostalgic: How Pre-Islamic Arabian Poetry Reflected and Reinforced the Contemporary Bedouin Lifestyle”. Retrospect Journal. February 16, 2025.</a> <a href="#fnref:4" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:5" role="doc-endnote">
      <p><a href="https://archive.org/details/REWATULMUIBBNWeNuzhetuLMutqnIbnQajjimElDewzijjeh" target="_blank" rel="noopener noreferrer">“rawdat almuhibiyn wanuzhat almushtaqin”. ibn qiam aljawzia. Internet Archive</a> <a href="#fnref:5" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:6" role="doc-endnote">
      <p>Obiedat, A. Z. (2017). The Semantic Field of Love in Classical Arabic: Understanding the Subconscious Meaning Preserved in the Ḥubb Synonyms and Antonyms through Their Etymologies. In <em>The Beloved in Middle Eastern Literatures: The Culture of Love and Languishing</em>. Eds. Alireza Korangy, Hanadi Al-Samman, and Michael C. Beard. London, New York: I.B. Tauris, 300-323. <a href="#fnref:6" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:6:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:6:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a> <a href="#fnref:6:3" class="reversefootnote" role="doc-backlink">&#8617;<sup>4</sup></a> <a href="#fnref:6:4" class="reversefootnote" role="doc-backlink">&#8617;<sup>5</sup></a></p>
    </li>
    <li id="fn:7" role="doc-endnote">
      <p>Qarah, F. (2023). AraPoems: An Extensive Dataset of Arabic Poetry Associated with Verses, Rhymes, Meters, and More (Version V1) <a href="https://doi.org/doi:10.7910/DVN/PJPWOY" target="_blank" rel="noopener noreferrer">dataset</a>. Harvard Dataverse. <a href="#fnref:7" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:8" role="doc-endnote">
      <p>The eras are not perfectly chronological, as they represent not only different time periods, but also different geographical areas and political entities. The Arabic-speaking world was sometimes divided among more than one government, and each developed its own literary traditions. Sometimes caliphates overlapped in time but differed in space, resulting in different “eras”. An example of this is the Abbasid caliphate from 750 to 1517 and a rival caliphate (the Fatimid) from 909 to 1171. The Abbasid caliphate was recognized in most of the Sunni regions, while the Fatimid caliphate was recognized in North Africa and Egypt. The poets are thus placed in an “era” based on the dominant caliphate in their region during their lifetime. And some “eras” are only literary eras, such as the “seasoned” and the “dual-eras”, which represent poets that were active between two time periods. <a href="#fnref:8" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:9" role="doc-endnote">
      <p>Poets who were active in both the pre-Islamic and early Islamic periods. <a href="#fnref:9" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:10" role="doc-endnote">
      <p>Poets who were active in both the late Umayyad and the early Abbasid periods. <a href="#fnref:10" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:11" role="doc-endnote">
      <p>Do you remember how the test compares the observed value to the expected value? Here, by measuring how far the word is from the expected value, we can quantify to what extent the word appeared more or less than expected. <a href="#fnref:11" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Elie Maalouly</name></author><summary type="html"><![CDATA[How do poets name love? A data-driven exploration of 14 Arabic words for love that traces their etymological roots, poetic functions, and shifting meanings across twelve historical eras. Using a 2-million-verse corpus and sentiment analysis, the analysis shows how word choice maps to emotional polarity and co‑occurrence communities, revealing that poets consistently exploit subtle lexical differences to shape tone, intensity, and literary effect. The data and code for this project is available on GitHub.]]></summary></entry><entry><title type="html">Mapping the Mythos: Part II</title><link href="https://quantifiedcuriosities.com/posts/greek-myth-network2/" rel="alternate" type="text/html" title="Mapping the Mythos: Part II" /><published>2025-06-18T00:00:00+02:00</published><updated>2025-06-18T00:00:00+02:00</updated><id>https://quantifiedcuriosities.com/posts/greek-myth-network2</id><content type="html" xml:base="https://quantifiedcuriosities.com/posts/greek-myth-network2/"><![CDATA[<p>In the first part of this series, we laid the groundwork by scraping datasets of Greek mythological texts and characters, extracting connections between them. Now, in Part II, we delve into the network analysis of these mythological figures to uncover who truly holds the center stage in this ancient drama. As mentioned in the first part, the code for this project is available on <a href="https://github.com/eliemaalouly/greek-myth-network" target="_blank" rel="noopener noreferrer">GitHub</a>.</p>

<p><br /></p>

<h1 id="basic-network-statistics">Basic Network Statistics</h1>

<p>Using the edges and nodes datasets that were created in the first part, we can get some basic network statistics with the help of the <a href="https://networkx.org/" target="_blank" rel="noopener noreferrer">NetworkX</a> library.</p>

<p class="notice--danger"><strong>The dreaded terminology alert</strong>: Several new terms will be introduced, which may be overwhelming and frankly a bit boring! I’ll try to make it as painless as possible by explaining all these new terms in the simplest way possible. Feel free to skip ahead if you are already familiar with them.</p>

<p>I am going to introduce some key concepts in network analysis required to understand the basic statistics. This might get boring but just bear with me. From Part I, you are already familiar with the concepts of nodes and edges. Here are some additional terms that will be used in the analysis:</p>
<ul>
  <li><strong>Average degree</strong>: The average number of connections (edges) each node (character) has in the network. It gives an idea of how interconnected the network is. For example, if the average degree is 100, it means each character is connected to about 100 other characters on average, indicating a highly interconnected network. On the other hand, if the average degree is 1, it means most characters are only connected to one other character, indicating a sparse network.</li>
  <li><strong>Density</strong>: The ratio of the number of edges in the network to the maximum possible number of edges. It ranges from 0 (no connections) to 1 (complete connectivity). A density of 0.1 means that only 10% of all possible connections between characters are actually present. For example, let’s keep this simple and imagine we have a network of only 4 characters (A, B, C, D). The maximum <strong>possible</strong> number of edges in this network is 6 (A-B, A-C, A-D, B-C, B-D, C-D). That is just the theoretical maximum. However, if we <strong>actually</strong> have only 2 edges (A-B and C-D), the density would be 2/6 = 0.33 or 33%. A lower percentage indicates a more sparse network, while a higher percentage indicates a more connected network.</li>
  <li><strong>Connected components</strong>: Subsets of the network where every node is reachable from every other node within that subset. If a network has <strong>multiple</strong> connected components, it means there are isolated groups of characters that do not interact with each other. Let’s take a simple example with 6 characters: A, B, C, D, E, F. If A is only connected to B and C, and D is only connected to E and F, we have two connected components: {A, B, C} and {D, E, F}, because any character in {A, B, C} can reach any other character in that group<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>, and the same goes for {D, E, F}. However, there are no connections between the two components, which means that characters in one component cannot interact with characters in the other.</li>
  <li><strong>Diameter</strong>: The longest shortest path (Could they have chosen a more confusing term?) between any two nodes in the network. Let’s say we have a network of 5 characters: A, B, C, D, E. If A is connected to B, B to C, C to D, and D to E, the diameter would be 4 because the longest shortest path between any two characters is A to E (A → B → C → D → E). This means that the longest distance between any two characters in the network is 4 steps. If you still find this confusing, I like to think of it as traveling between cities. There are hundreds of cities in your state, and usually several routes between any two of them: a train that passes through 3 other cities, or a bus that passes through 5. For each pair of cities, you note the shortest route. The <strong>diameter</strong> is the longest of those shortest routes across all pairs of cities. In other words, it’s the worst-case scenario for how far apart two characters can be in the network. It gives us an idea of how “spread out” the network is. If the diameter is small, it means that most characters are relatively close to each other in terms of connections. If the diameter is large, it means that some characters are very far apart in the network.</li>
  <li><strong>Average shortest path length</strong>: The average number of edges in the shortest path between all pairs of nodes. As a simple example, let’s say we have a network of 4 characters: A, B, C, D. If A is connected to B and C, and B is connected to D, the shortest path between A and D is 2 (A → B → D). The shortest path between B and C is 2 (B → A → C). The shortest path between C and D is 3 (C → A → B → D). We repeat this for all pairs of characters. The average shortest path length is the average of these values over all pairs. In this example, the six pairs give (1 + 1 + 2 + 2 + 1 + 3) / 6 ≈ 1.67. This metric helps us understand how quickly information can travel through the network: here, on average, you can reach any character from any other character in fewer than 2 steps. A short average path is one sign of a small-world network.</li>
</ul>
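<p>If you prefer code to definitions, the terms above can be sketched in a few lines of plain Python. This is only an illustration on the toy examples from the list, using hypothetical helper names (<code>build_graph</code>, <code>distances_from</code>, <code>summarize</code>); the actual analysis relies on NetworkX, which provides these statistics out of the box.</p>

```python
from collections import deque
from itertools import combinations

def build_graph(nodes, edges):
    """Return an undirected adjacency map {node: set of neighbours}."""
    graph = {node: set() for node in nodes}
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def distances_from(graph, source):
    """Breadth-first search: shortest path length from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

def connected_components(graph):
    """Group nodes into sets where every node can reach every other node."""
    seen, components = set(), []
    for node in graph:
        if node not in seen:
            component = set(distances_from(graph, node))
            seen |= component
            components.append(component)
    return components

def summarize(nodes, edges):
    graph = build_graph(nodes, edges)
    n = len(nodes)
    stats = {
        "average_degree": 2 * len(edges) / n,
        "density": len(edges) / (n * (n - 1) / 2),
        "components": len(connected_components(graph)),
    }
    # Diameter and average path length are only defined when the graph is connected
    if stats["components"] == 1:
        lengths = [distances_from(graph, a)[b] for a, b in combinations(nodes, 2)]
        stats["diameter"] = max(lengths)
        stats["average_shortest_path"] = sum(lengths) / len(lengths)
    return stats

# Density example: 4 characters, 2 of the 6 possible edges
print(summarize("ABCD", [("A", "B"), ("C", "D")]))
# Connected components example: {A, B, C} and {D, E, F}
print(summarize("ABCDEF", [("A", "B"), ("A", "C"), ("D", "E"), ("D", "F")]))
# Diameter example: a chain A-B-C-D-E
print(summarize("ABCDE", [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]))
```

<p>The sketch skips the diameter and the average shortest path when the graph has more than one component, which is also why those two numbers are usually reported for the largest component only.</p>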

<p>Phewww. That was a lot of new terms, but I hope it was clear enough. Now let’s make use of these terms!</p>

<p>The mythological character network extracted from the corpus consists of <strong>943 unique characters</strong> (nodes) and <strong>26,696 undirected<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup> relationships</strong> (edges) derived from co-occurrences in myth texts. This structure provides a foundational view of how interconnected the figures of Greek mythology are.</p>

<p>The network has an <strong>average degree of 57</strong>, meaning each character is directly connected to roughly 57 others on average. The network exhibits a moderate level of connectivity, with a density of <strong>0.06</strong>. This indicates that only a small fraction (<strong>6%</strong>) of all possible character pairs are actually connected, a reflection of how mythological figures tend to appear in limited narrative contexts or clusters.</p>
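<p>As a quick sanity check, both figures follow directly from the node and edge counts reported above. Writing \(N = 943\) for the number of characters and \(E = 26{,}696\) for the number of edges, and using the standard formulas for an undirected graph:</p>

\[\text{average degree} = \frac{2E}{N} = \frac{2 \times 26{,}696}{943} \approx 56.6\]

\[\text{density} = \frac{2E}{N(N-1)} = \frac{2 \times 26{,}696}{943 \times 942} \approx 0.060\]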

<p>Remarkably, the network is almost entirely connected. The full graph is composed of <strong>two connected components</strong>, suggesting that Greek mythology, as represented in this dataset, contains two isolated narrative groups. Looking closer at these groups, we find that the largest component contains <strong>941 characters</strong>, accounting for almost all of the mythological interactions. This indicates that nearly all known mythological figures appear in a shared narrative universe, with overlapping appearances across stories and texts. The smaller group contains only two characters, namely <strong>Phantasos</strong> and <strong>Phobetor</strong>. These characters only made one appearance in Ovid’s Metamorphoses Book 11 where they co-occurred in one line mentioning the sons of Morpheus:</p>

<blockquote>
  <p>That son, by the gods above was called Icelos—but the inhabitants of earth called him Phobetor—and a third son, named Phantasos, cleverly could change himself into the forms of earth that have no life; into a statue, water, or a tree.</p>
</blockquote>

<p class="small"><cite>Ovid</cite> — Metamorphoses, Book 11, Section 8: <strong>“Ceyx &amp; Halcyone”</strong></p>

<p>As these two characters were not mentioned<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup> anywhere else in our dataset, they formed their own “isolated island”, completely disconnected from all the other 941 characters living in one main narrative group.</p>

<p>Focusing on the main cluster, the <strong>diameter</strong>, the longest shortest path between any two characters, is <strong>5</strong>, while the <strong>average shortest path length</strong> is <strong>2.15</strong>. These values demonstrate a clear small-world property, where characters are typically only two or three steps apart and never more than five, reinforcing the idea that Greek myths are densely interlinked despite covering vast thematic and temporal ground.</p>

<p><strong>So what do all these network statistics tell us?</strong> Well, I wouldn’t say they show anything surprising so far (except for maybe the average path length), but they do provide a solid foundation for further analysis. The network is highly interconnected, with most characters appearing in multiple stories and contexts. The small-world property suggests that any two characters (whether minor or major) can quickly connect to each other in about 2 steps on average (which is insane if you think about it given the huge number of characters!), reflecting the fluidity and adaptability of Greek mythology as a narrative tradition.</p>

<p><br /></p>

<h1 id="centrality-analysis">Centrality Analysis</h1>

<p>Now let’s get to the juicy part of the analysis: <strong>centrality measures</strong>. To evaluate which characters play structurally important roles in Greek mythology, we applied several centrality metrics (<span title="Who interacts with the most people?">degree</span>, <span title="Who connects different groups?">betweenness</span>, <span title="Who has powerful friends?">eigenvector</span>, and <span title="Who gets attention from important people?">Pagerank</span>) from network theory. Each measure highlights a different dimension of influence, from sheer popularity to narrative bridge-building or deeper systemic significance. With <a href="https://networkx.org/" target="_blank" rel="noopener noreferrer">NetworkX</a>, we can easily compute these centrality measures for our mythological network. Let’s take a look at the top 10 characters for each centrality measure:</p>

<figure class=""><a href="/assets/images/posts/greek-myth-network2/ranked_centrality_charts.png" class="image-popup"><img src="/assets/images/posts/greek-myth-network2/ranked_centrality_charts.png" alt="" /></a></figure>

<p>We can see that the top 10 spots for each centrality measure are dominated by almost the same characters (Olympian gods), with some minor variations. This indicates that these characters are not only popular (high <span title="Who interacts with the most people?">degree centrality</span>), but also play important roles in connecting different groups (high <span title="Who connects different groups?">betweenness centrality</span>), have powerful friends (high <span title="Who has powerful friends?">eigenvector centrality</span>), and get attention from important people (high <span title="Who gets attention from important people?">Pagerank centrality</span>). Zeus, the king of the gods, unsurprisingly takes the top position in all four measures, followed closely by Apollo and Heracles who are fighting for the second place.</p>

<p>Heracles is the only non-Olympian to make it to the top 10 in all of the measures, which is quite interesting. He is a demigod, the son of Zeus and Alcmene, and is known for his incredible strength and heroic deeds. His presence in the top ranks (let alone consistently between the 2<sup>nd</sup> and 3<sup>rd</sup> place) indicates that he plays a significant role in the mythological network, connecting various characters and stories. The only mortal character to make it to the top 10 is Odysseus, who is known for his cunning and intelligence. He is the protagonist of Homer’s “Odyssey” and is often portrayed as a hero who overcomes great challenges through his wit and resourcefulness. Odysseus only infiltrates the top 10 in <span title="Who connects different groups?">betweenness centrality</span> obtaining 7<sup>th</sup> place, which indicates that he plays a significant role in connecting different groups of characters, but is not as popular or influential as the Olympian gods. His role in the network is more about bridging gaps between different stories and characters, rather than being a central figure in the mythology itself.</p>
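<p>As a rough idea of what this looks like in code, the sketch below computes the four measures on a tiny toy graph with NetworkX. The edges and the <code>top_k</code> helper are illustrative assumptions, not the project’s actual data or code, and recent NetworkX releases may need SciPy installed for <code>nx.pagerank</code>:</p>

```python
import networkx as nx

# Toy undirected co-occurrence graph (illustrative edges, not the real data).
G = nx.Graph()
G.add_edges_from([
    ("Zeus", "Apollo"),
    ("Zeus", "Heracles"),
    ("Zeus", "Hera"),
    ("Zeus", "Athena"),
    ("Apollo", "Heracles"),
    ("Athena", "Odysseus"),
])

# Each call returns a dict mapping node to score.
measures = {
    "degree": nx.degree_centrality(G),
    "betweenness": nx.betweenness_centrality(G),
    "eigenvector": nx.eigenvector_centrality(G, max_iter=1000),
    "pagerank": nx.pagerank(G),
}


def top_k(scores, k=3):
    """Names of the k highest-scoring nodes, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


top = {name: top_k(scores) for name, scores in measures.items()}
for name, ranking in top.items():
    print(name, ranking)
```

<p>Even in this toy graph the hub (Zeus) tops every measure, while a character like Athena scores on betweenness mainly because she is the only route to Odysseus.</p>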

<p><br /></p>

<h1 id="community-insights">Community Insights</h1>

<p>Let’s see what else we can learn from the network. If we think of the network as one big party, some people form their own tight-knit circles where they chat and interact more frequently, while others are more isolated or only interact with a few people. Community detection algorithms identify these circles, showing us how characters are grouped based on their connections and how the groups relate to each other.</p>

<p>Using the <a href="https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.community.modularity_max.greedy_modularity_communities.html" target="_blank" rel="noopener noreferrer">greedy modularity maximization</a> method in <a href="https://networkx.org/" target="_blank" rel="noopener noreferrer">NetworkX</a>, I uncovered 7 distinct communities with highly imbalanced sizes within the network.</p>

<figure class=""><a href="/assets/images/posts/greek-myth-network2/communities_sizes.png" class="image-popup"><img src="/assets/images/posts/greek-myth-network2/communities_sizes.png" alt="" /></a></figure>

<p>We can see 2 large communities, 1 medium-sized community, and 4 small-sized communities. The three largest communities account for the vast majority of the characters:</p>

<ol>
  <li>Community 1: 417</li>
  <li>Community 2: 355</li>
  <li>Community 3: 148</li>
</ol>

<p>The remaining communities contain just a few characters each, often due to isolated stories or minor connections.</p>
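<p>To illustrate the method on something small, the sketch below runs NetworkX’s <code>greedy_modularity_communities</code> on two tight-knit toy circles joined by a single bridge. The character groupings and the bridging edge are made up for illustration and are not taken from the dataset:</p>

```python
from itertools import combinations

import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Two tight-knit toy circles joined by a single bridge (illustrative names).
olympians = ["Zeus", "Hera", "Apollo", "Artemis"]
trojan_war = ["Achilles", "Agamemnon", "Paris", "Priam"]

G = nx.Graph()
G.add_edges_from(combinations(olympians, 2))
G.add_edges_from(combinations(trojan_war, 2))
G.add_edge("Apollo", "Paris")  # the only link between the two circles

# Returns a list of frozensets, largest community first.
communities = greedy_modularity_communities(G)
sizes = [len(c) for c in communities]
print(sizes)
for community in communities:
    print(sorted(community))
```

<p>The algorithm keeps merging groups as long as modularity improves, so it stops once the two circles are separated: joining them across the single bridge would lower the score. Note that each character ends up in exactly one community.</p>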

<p>Let’s take a look at the most central characters in each of the top 3 communities:</p>

<table class="text-center">
  <thead>
    <tr>
      <th>Community 1</th>
      <th>Community 2</th>
      <th>Community 3</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Zeus</td>
      <td>Achilles</td>
      <td>Athena</td>
    </tr>
    <tr>
      <td>Apollo</td>
      <td>Agamemnon</td>
      <td>Eleusis</td>
    </tr>
    <tr>
      <td>Heracles</td>
      <td>Odysseus</td>
      <td>Amphictyon</td>
    </tr>
    <tr>
      <td>Poseidon</td>
      <td>Theseus</td>
      <td>Stymphalus</td>
    </tr>
    <tr>
      <td>Demeter</td>
      <td>Ares</td>
      <td>Phorbas</td>
    </tr>
    <tr>
      <td>Artemis</td>
      <td>Paris</td>
      <td>Niobe</td>
    </tr>
    <tr>
      <td>Aphrodite</td>
      <td>Sparta</td>
      <td>Tantalus</td>
    </tr>
    <tr>
      <td>Hera</td>
      <td>Priam</td>
      <td>Deucalion</td>
    </tr>
    <tr>
      <td>Hermes</td>
      <td><span class="easter-egg-container">Helen of Troy<span class="easter-egg">🥚</span></span></td>
      <td>Asopus</td>
    </tr>
    <tr>
      <td>Cronus</td>
      <td>Asclepius</td>
      <td>Xuthus</td>
    </tr>
  </tbody>
</table>

<p>Looking strictly at the most central characters, we can see a dominance of Olympian gods in Community 1, while Community 2 is dominated by characters of the Trojan War. I tried to find a pattern in how these communities are structured, but the only one that stands out is that characters from epics seem to populate Community 2, while characters from plays seem to populate Community 1. While community detection identified seven distinct structural clusters, their boundaries do not always align neatly with literary sources or periods, and some communities seem to reflect thematic or functional groupings rather than strict textual divisions. This reflects the deeply interwoven nature of Greek mythology: key figures reappear across genres and eras, and their connections frequently cross community boundaries, forming a web of relationships that transcends individual texts and periods.</p>

<p><br /></p>

<h1 id="shifts-in-character-roles">Shifts in Character Roles</h1>

<h2 id="transcultural-dynamics">Transcultural Dynamics</h2>

<p>Greek mythology evolved over centuries, shaped by shifting literary traditions, political landscapes, philosophical influences, and cultural exchanges. Although Roman adaptations of Greek myths often retained core characters and narratives, they introduced new interpretations and emphases. They were not truly a <a href="/assets/images/posts/greek-myth-network2/greek-roman.png" class="image-popup">copy of Greek mythology</a>, but rather a reinterpretation that reflected Roman values and beliefs. To explore how the roles of characters changed across those two cultures, I used the language of the original texts as a proxy for culture (Greek language for Greek texts, Latin for Roman texts). I then calculated the centrality measures separately depending on whether the edge was extracted from a Greek or Latin text. This allows us to compare the centrality ranks of characters in Greek and Latin mythological texts, revealing how their roles evolved across cultures.</p>
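<p>One way to picture this split is to tag every co-occurrence edge with the language of its source text and then count connections separately per language. The sketch below does this for degree only; the edge list, the language labels and the helper name are hypothetical, and the project’s actual data structures may differ:</p>

```python
from collections import defaultdict

# Hypothetical edge list: (character, character, language of the source text).
edges = [
    ("Zeus", "Hera", "greek"),
    ("Zeus", "Apollo", "greek"),
    ("Zeus", "Hermes", "greek"),
    ("Apollo", "Hermes", "greek"),
    ("Apollo", "Scylla", "latin"),
    ("Apollo", "Zeus", "latin"),
    ("Scylla", "Hestia", "latin"),
]


def degree_by_language(edge_list):
    """Map each language to {character: number of distinct neighbours}."""
    neighbours = defaultdict(lambda: defaultdict(set))
    for u, v, language in edge_list:
        neighbours[language][u].add(v)
        neighbours[language][v].add(u)
    return {
        language: {node: len(adj) for node, adj in nodes.items()}
        for language, nodes in neighbours.items()
    }


degrees = degree_by_language(edges)
for language, scores in degrees.items():
    ranking = sorted(scores, key=scores.get, reverse=True)
    print(language, ranking)
```

<p>A character who never appears in one language’s texts simply has no entry for that language, which is one reason ranks can swing so widely between the two sets of texts.</p>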

<p>Let’s take a look at the top 5 characters in each centrality measure for both Greek and Latin texts.</p>

<figure class=""><a href="/assets/images/posts/greek-myth-network2/centrality_language.png" class="image-popup"><img src="/assets/images/posts/greek-myth-network2/centrality_language.png" alt="" /></a></figure>

<p>We can see that there is a significant difference in the centrality ranks of characters between the Greek and Latin texts. The most striking difference is the fall of Zeus from 1<sup>st</sup> place in all centrality measures in the Greek texts to 2<sup>nd</sup> place in <span title="Who connects different groups?">betweenness</span> and <span title="Who gets attention from important people?">Pagerank centrality</span> in the Latin texts, while dropping out of the top 5 completely in the remaining measures! Zeus was dethroned in the Latin texts by Apollo, who took the top spot in all centrality measures!</p>

<p>Another notable change is the decrease in the influence of Olympians in the Latin texts. The top 5 ranks in Greek texts are almost completely dominated by Olympian gods, the only exception being Heracles. However, in the Latin texts, Apollo is the only Olympian to make it to the top 5 in <span title="Who interacts with the most people?">degree centrality</span>. Apollo and Zeus are the only Olympians in the top 5 in <span title="Who has powerful friends?">eigenvector centrality</span>, while Apollo and Poseidon are the only Olympians in the top 5 in <span title="Who gets attention from important people?">Pagerank centrality</span>. Only in the case of <span title="Who connects different groups?">betweenness centrality</span> do we see a clear dominance of Olympians, with Apollo, Zeus, Poseidon, and Athena taking 4 of the top 5 spots. This indicates that the roles of characters in Greek mythology changed significantly when adapted into Latin texts. Olympian gods are now less central to the narrative (with the exception of Apollo). Their diminished dominance in <span title="Who has powerful friends?">eigenvector</span> and <span title="Who gets attention from important people?">Pagerank centrality</span> suggests a shift in narrative focus, with other characters gaining prominence in the Latin adaptations. This could reflect changing cultural values or narrative priorities in Roman society, where the emphasis may have shifted from divine authority to human agency. However, the role of the Olympians is still significant, as they still occupy the top ranks in <span title="Who connects different groups?">betweenness centrality</span> and <span title="Who interacts with the most people?">degree centrality</span>. This indicates that they still play important roles in connecting different groups of characters, but their influence is not as dominant as it was in the Greek texts.</p>

<p>Next, let’s take a look at characters that have a significant difference in their centrality ranks between the Greek and Latin texts. This will help us identify characters whose roles changed significantly across cultures, either gaining or losing influence in the mythological narrative. First, the Olympians with the most extreme change (a shift of more than 200 places) in their <span title="Who interacts with the most people?">degree centrality</span> ranks between the Greek and Latin texts are:</p>

<table class="text-center">
  <thead>
    <tr>
      <th>Olympian</th>
      <th>Greek Rank</th>
      <th>Roman Rank</th>
      <th>Rank Change</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Hestia</td>
      <td>497</td>
      <td>226</td>
      <td>+271</td>
    </tr>
    <tr>
      <td>Hebe</td>
      <td>264</td>
      <td>528</td>
      <td>-264</td>
    </tr>
    <tr>
      <td>Hermes</td>
      <td>7</td>
      <td>226</td>
      <td>-219</td>
    </tr>
  </tbody>
</table>
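<p>Judging by the numbers in the table, the rank change appears to be the Greek rank minus the Roman rank, so a positive value means the character sits closer to the top in the Latin texts. A minimal sketch that reproduces the column from the two rank columns (the function name is my own):</p>

```python
# (Greek rank, Roman rank) pairs copied from the table above.
ranks = {
    "Hestia": (497, 226),
    "Hebe": (264, 528),
    "Hermes": (7, 226),
}


def rank_change(greek_rank, roman_rank):
    """Positive when the character is ranked closer to 1 in the Latin texts."""
    return greek_rank - roman_rank


changes = {name: rank_change(*pair) for name, pair in ranks.items()}

# Sort by the size of the shift, largest first.
for name in sorted(changes, key=lambda n: abs(changes[n]), reverse=True):
    print(f"{name}: {changes[name]:+d}")
```

<p>Hestia and Hermes sharing Roman rank 226 suggests that tied characters receive the same rank, though that is an inference from the table rather than something stated in the analysis.</p>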

<p>Next, the Greek deities (non-Olympians) with the most extreme change (a shift of more than 500 places) in their <span title="Who interacts with the most people?">degree centrality</span> ranks between the Greek and Latin texts are:</p>

<table class="text-center">
  <thead>
    <tr>
      <th>Deity</th>
      <th>Greek Rank</th>
      <th>Roman Rank</th>
      <th>Rank Change</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Scylla</td>
      <td>685</td>
      <td>41</td>
      <td>+644</td>
    </tr>
    <tr>
      <td>Hymen</td>
      <td>880</td>
      <td>282</td>
      <td>+598</td>
    </tr>
    <tr>
      <td>Dionysus</td>
      <td>10</td>
      <td>597</td>
      <td>-587</td>
    </tr>
    <tr>
      <td>Helle</td>
      <td>651</td>
      <td>69</td>
      <td>+582</td>
    </tr>
    <tr>
      <td>Pheme</td>
      <td>637</td>
      <td>59</td>
      <td>+578</td>
    </tr>
    <tr>
      <td>Atropos</td>
      <td>762</td>
      <td>212</td>
      <td>+550</td>
    </tr>
    <tr>
      <td>Asclepius</td>
      <td>52</td>
      <td>597</td>
      <td>-545</td>
    </tr>
    <tr>
      <td>Chaos</td>
      <td>637</td>
      <td>94</td>
      <td>+543</td>
    </tr>
    <tr>
      <td>Iacchus</td>
      <td>827</td>
      <td>315</td>
      <td>+512</td>
    </tr>
    <tr>
      <td>Helios</td>
      <td>85</td>
      <td>597</td>
      <td>-512</td>
    </tr>
  </tbody>
</table>

<p>Now let’s take a look at the top 10 characters with the most extreme change in their <span title="Who interacts with the most people?">degree centrality</span> ranks between the Greek and Latin texts.</p>

<figure class=""><a href="/assets/images/posts/greek-myth-network2/centrality_rank_change_language.png" class="image-popup"><img src="/assets/images/posts/greek-myth-network2/centrality_rank_change_language.png" alt="" /></a></figure>

<p>We do see some extreme changes in the ranks of some characters. Hestia, for example, has a rank change of +271 places, moving from 497th place in the Greek texts to 226th place in the Latin texts. This is an interesting change, especially considering that Hestia (Vesta in Roman mythology) became a more prominent figure in Roman mythology, where she was associated with the hearth and home. Her improved rank in the Latin texts reflects her elevated status in Roman culture, where she was regarded as a goddess of domesticity and family life and associated with the much-revered order of the Vestal Virgins, who were in charge of maintaining the sacred fire and performed rituals ensuring Rome’s protection.</p>

<p>We also see a significant drop in the <span title="Who interacts with the most people?">degree centrality</span> rank of Hermes from 7th place in the Greek texts to 226th place in the Latin texts. This indicates that Hermes, the messenger god, lost a considerable amount of influence in the Latin texts compared to the Greek ones. This change does make sense when we consider that the Roman equivalent of Hermes, Mercury, was given a different role<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup> in Roman mythology, where he was more associated with commerce and trade rather than being a messenger of the gods. In Latin texts, Mercury appears more as a functional deity, often in lists and rituals, but barely as a character in stories. As Roman culture absorbed Greek mythology, and mixed it with Etruscan influences and their own traditions, the role of Hermes in the Greek texts was split among various deities in Roman mythology, such as Janus<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup> who was regarded as the guardian of doorways and transitions. This shift in focus likely contributed to his reduced centrality in the Latin texts.</p>

<p>I don’t know about you, but this was shocking to me! Dionysus has gone from being in the top 10 in Greek texts to being ranked 597<sup>th</sup> in Latin texts! I took a look at his other centrality measures for an extra check. There is a drop in his <span title="Who connects different groups?">betweenness centrality</span> rank from 6th place in Greek texts to 307th place in the Latin ones. The drop is more severe in <span title="Who gets attention from important people?">Pagerank centrality</span>, from 7th place to 628th! For <span title="Who has powerful friends?">eigenvector centrality</span>, he drops from 4th place to 639th! That’s a drop of 635 places! This is an astonishing change, indicating that his role in the mythological network has diminished from a major god in the Greek texts to an almost insignificant minor character in the Latin texts. But why is that? How can the role of such a major Greek god be so drastically reduced in the Latin texts?</p>

<p>After a bit of digging, I found a plausible and very interesting explanation for this. In Roman mythology, Dionysus was referred to as Bacchus<sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup>, who was associated with wine, fertility, and ritual madness. In ancient Greek society, Dionysus had a special place. He was central to mystery cults such as the Dionysian Mysteries<sup id="fnref:7" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">7</a></sup>, which were secret religious rites that celebrated the god’s death and rebirth and promised salvation and ecstatic experiences. He was also intimately tied to the theater, as many Greek plays were performed in his honor during festivals like the City Dionysia<sup id="fnref:8" role="doc-noteref"><a href="#fn:8" class="footnote" rel="footnote">8</a></sup> in Athens. Dionysus symbolized the liberation of the self from societal constraints, especially through intoxication and ritual madness. On the other hand, Roman religion was more structured, public, and politically oriented. It emphasized social order and civic duty over ecstatic experience. Bacchic rituals, associated with secrecy, sexual liberation, and emotional excess, conflicted with the Roman values of gravitas and constantia<sup id="fnref:9" role="doc-noteref"><a href="#fn:9" class="footnote" rel="footnote">9</a></sup>. Then came the Bacchanalian scandal<sup id="fnref:10" role="doc-noteref"><a href="#fn:10" class="footnote" rel="footnote">10</a></sup> of 186 BCE, when the perception of Bacchus as a dangerous figure reached its peak, with descriptions of the Bacchanalia as subversive and immoral, implicating them in criminal activity, sexual promiscuity, and political conspiracies.
This culminated in the Roman Senate’s decree, the “Senatus consultum de Bacchanalibus”<sup id="fnref:11" role="doc-noteref"><a href="#fn:11" class="footnote" rel="footnote">11</a></sup>, which severely restricted the Bacchic cult and its practices. The decree effectively banned the Bacchanalia, limiting their gatherings to only a few times a year and under strict supervision. This marked a significant shift in how Bacchus was perceived in Roman society, transforming him from a major deity associated with ecstatic worship to a more subdued and controlled figure.</p>

<p>But this was possibly less about preventing the moral deterioration of Roman society than about a political move to consolidate power and control over the common people. The Roman elite saw the Bacchanalia as a threat to their authority, as the cult’s secretive nature and ecstatic rituals could potentially challenge the established order. In Rome, Bacchus was the champion of the plebeians, the common people, during commemorations like the Liberalia<sup id="fnref:12" role="doc-noteref"><a href="#fn:12" class="footnote" rel="footnote">12</a></sup>. He symbolized freedom and social equality, resonating strongly with the lower classes. The Bacchic cult crossed lines of class and gender, including in its ranks slaves, freedmen, women, and men, thus becoming a socially disruptive force<sup id="fnref:13" role="doc-noteref"><a href="#fn:13" class="footnote" rel="footnote">13</a></sup>. Dionysus was also linked to Spartacus, who was believed to be Thracian<sup id="fnref:14" role="doc-noteref"><a href="#fn:14" class="footnote" rel="footnote">14</a></sup>, and became a symbol of his revolution. His wife (according to Plutarch) was a <em>“prophetess and subject to visitations of the Dionysian frenzy”</em><sup id="fnref:15" role="doc-noteref"><a href="#fn:15" class="footnote" rel="footnote">15</a></sup>. Dionysus became the rallying cry for a number of Rome’s opponents such as the rebel slaves in Sicily in the second century B.C.E., and the Anatolian King Mithradates of Pontus, whose protracted battle against Rome was still ongoing at the time of Spartacus’s uprising in 73 B.C.E.<sup id="fnref:16" role="doc-noteref"><a href="#fn:16" class="footnote" rel="footnote">16</a></sup> I believe this to be a plausible explanation for the drastic change in the role given to Dionysus in the Latin texts.
The Roman upper classes’ fear of the Bacchic cult’s potential to disrupt social order and challenge their authority led to a significant reduction in the prominence of Bacchus in their texts (and let’s not forget that most if not all the texts we have today were written by the upper classes who had the education, resources, and time to write), resulting in his diminished centrality in the mythological network.</p>

<p>Ok, that was a long detour, but I hope you found it as fascinating as I did! Here’s a <a href="/assets/images/posts/greek-myth-network2/duck-bacchus.png" class="image-popup">Dionysian duck</a> as a celebration for all of us!</p>

<h2 id="gender-distribution">Gender Distribution</h2>

<p>Let’s finally look at how centrality is distributed across genders. First, I looked at how the characters are distributed across the genders:</p>

<table class="text-center">
  <thead>
    <tr>
      <th>Gender</th>
      <th>Count</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Male</td>
      <td>614</td>
    </tr>
    <tr>
      <td>Female</td>
      <td>312</td>
    </tr>
    <tr>
      <td>Male organism</td>
      <td>4</td>
    </tr>
    <tr>
      <td>Female organism</td>
      <td>4</td>
    </tr>
    <tr>
      <td>Trans man</td>
      <td>2</td>
    </tr>
    <tr>
      <td>Hermaphrodite</td>
      <td>1</td>
    </tr>
    <tr>
      <td>Intersex</td>
      <td>1</td>
    </tr>
  </tbody>
</table>

<p>Because so few characters fall outside the Male and Female categories, I have only included Male and Female characters in this analysis. Male characters are clearly over-represented: of these 926 characters, around <strong>66.3%</strong> are male and <strong>33.7%</strong> are female. But how about the centrality measures? Are there any differences between male and female characters? I first checked the average centrality ranks for male and female characters. In all of the centrality measures, male characters were more central on average, which is not surprising but also not that interesting. Looking at the average alone can be misleading, as it does not take into account how the ranks are distributed. So let’s look at the distribution of the <span title="Who gets attention from important people?">Pagerank centrality</span> ranks.</p>

<figure class=""><a href="/assets/images/posts/greek-myth-network2/pagerank_by_gender.png" class="image-popup"><img src="/assets/images/posts/greek-myth-network2/pagerank_by_gender.png" alt="" /></a></figure>

<p>From the figure, we can see that the lower-numbered ranks (more central characters) are more densely populated by male characters, while the higher-numbered ranks (less central characters) are more densely populated by female characters. Male characters are therefore given a more prominent role when we look at the Greek mythological universe as a whole. Although the figure only shows <span title="Who gets attention from important people?">Pagerank centrality</span>, the distributions of all the other centrality measures show a similar pattern, with male characters dominating the most central ranks.</p>

<p>Is this pattern still present when we compare the Greek and Latin texts? Let’s take a look at the <span title="Who interacts with the most people?">degree centrality</span> and <span title="Who gets attention from important people?">Pagerank centrality</span> ranks of the male and female characters in the Greek and Latin texts.</p>

<figure class=""><a href="/assets/images/posts/greek-myth-network2/degree_by_gender_language.png" class="image-popup"><img src="/assets/images/posts/greek-myth-network2/degree_by_gender_language.png" alt="" /></a></figure>

<figure class=""><a href="/assets/images/posts/greek-myth-network2/pagerank_by_gender_language.png" class="image-popup"><img src="/assets/images/posts/greek-myth-network2/pagerank_by_gender_language.png" alt="" /></a></figure>

<p>Here a more interesting pattern emerges. In the Greek texts, male characters dominate the lower ranks (more central characters) in both <span title="Who interacts with the most people?">degree</span> and <span title="Who gets attention from important people?">Pagerank centrality</span>. The male characters are therefore given a more prominent forward-facing role (<span title="Who interacts with the most people?">degree centrality</span>) and a more influential role (<span title="Who gets attention from important people?">Pagerank centrality</span>). However, in the Latin texts, the pattern is different. While male characters do still dominate the lower ranks (although slightly) in <span title="Who interacts with the most people?">degree centrality</span>, the distribution of <span title="Who gets attention from important people?">Pagerank centrality</span> ranks has shifted in favor of the female characters.</p>

<p>This shift could be attributed to the changing roles of women in Roman society compared to Greek society. In ancient Greece, women had no political rights (couldn’t vote or hold office) and were largely confined to the domestic sphere<sup id="fnref:17" role="doc-noteref"><a href="#fn:17" class="footnote" rel="footnote">17</a></sup> (although some exceptions existed, such as in Sparta, where girls received physical training and more freedom). They were expected to stay out of the public eye. In ancient Rome, however, women had more legal autonomy (although only slightly more than in ancient Greece). Though they still couldn’t vote or hold office, they could own property, manage their own finances, and even run businesses<sup id="fnref:18" role="doc-noteref"><a href="#fn:18" class="footnote" rel="footnote">18</a></sup>. They could even influence politics behind the scenes through their relationships with powerful men. Women participated in public life to a greater extent than in Greece, and some had important priestly roles, such as the Vestal Virgins who were responsible for maintaining the sacred fire of Vesta.</p>

<p>Influential Roman women like Livia (wife of Augustus) and Agrippina the Younger wielded real political influence through family ties. Livia was regarded as a trusted confidant and advisor to her husband, Emperor Augustus, and played a key role in shaping imperial succession. She was also rumored to have been involved in the deaths of several political rivals to ensure her son Tiberius’s ascension to the throne<sup id="fnref:19" role="doc-noteref"><a href="#fn:19" class="footnote" rel="footnote">19</a></sup>. Agrippina the Younger, mother of Emperor Nero, was another powerful figure who exerted significant influence over her son’s rule<sup id="fnref:20" role="doc-noteref"><a href="#fn:20" class="footnote" rel="footnote">20</a></sup>. Ancient writers portray her as ambitious, fierce, obstinate, and politically astute, maneuvering dynastic politics to her advantage to elevate Nero to the throne.</p>

<p>There are many more examples, but what is important to note is that while Roman women couldn’t hold formal office, these elite figures effectively operated as backchannels: steering decisions, shaping alliances, and influencing imperial succession through their proximity to power. I think this might give us a clue as to why in the Latin texts women have a higher <span title="Who gets attention from important people?">Pagerank centrality</span> while having a lower <span title="Who interacts with the most people?">degree centrality</span> than men. And I believe it is a reflection of how the power of women was viewed in Roman society. The public perceived real power as being with men. They were the face of the republic and empire (public officials, generals, etc.) and were the ones who interacted publicly in political life. This might translate to a high <span title="Who interacts with the most people?">degree centrality</span>, as characters who have the most interactions with other characters. Women, on the other hand, were often seen as having a more behind-the-scenes role, influencing decisions through their connections to influential men without being in the public eye. This might translate to a high <span title="Who gets attention from important people?">Pagerank centrality</span>, as characters who are connected to important characters, but do not necessarily have a large number of connections overall.</p>

<p class="notice--info">Now, this is just a hypothesis, and I would love to hear your thoughts on this! It is important to remember that all these are just interpretations based on the data <strong>I have</strong> and how the <strong>relationships</strong> were defined. The data I have is not exhaustive: there are still many more texts that I have not been able to get my hands on, and many others never survived to the present day. The data used here is only a small fraction and probably not enough to get anything close to a definitive answer to the questions we might have. The relationships were defined as co-occurrences, which is not always accurate: characters can be mentioned together in the same sentence without having any real relationship, while characters mentioned far apart can have one. And even if we did have perfect, complete data, there are still so many factors that could have influenced the roles of characters in Greek and Latin texts. The data is a reflection of the stories that were told and the roles that characters played in those stories. It is not a definitive representation of the characters themselves or their roles in society. With all that being said, it is still fun to explore the data and speculate on the possible links between the data and the real world, as long as we keep these limitations in mind and do not take the results as definitive answers.</p>

<p><br /></p>

<h1 id="conclusion">Conclusion</h1>

<p>All right, we made it! I hope you enjoyed this deep dive into the Greek mythological network! We explored how characters are connected, who the most central characters are, and how their roles changed across cultures. There’s a lot more data to explore in these datasets, and I encourage you to take a look at the repository and play around with the data yourself. You can find the datasets, notebooks, and all the code on <a href="https://github.com/eliemaalouly/greek-myth-network" target="_blank" rel="noopener noreferrer">GitHub</a>.</p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p>Every character in the component can reach every other character in that component through a series of edges, even if not directly connected. In this example, A has a direct connection to B and C, therefore C can reach B through A (C → A → B). <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:2" role="doc-endnote">
      <p>The edges are undirected, meaning that relationships between characters do not have a direction (e.g., “A is related to B” is the same as “B is related to A”). This allows us to focus on the connections themselves rather than their directionality. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:3" role="doc-endnote">
      <p>Actually, they might have been mentioned in the dataset, but in that case, they would have been mentioned without the co-occurrence of any other character, keeping them isolated. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:4" role="doc-endnote">
      <p><a href="https://ancientlinks.blogspot.com/2011/01/differences-between-hermes-and-mercury.html" target="_blank" rel="noopener noreferrer">“Differences between Hermes and Mercury.” Ancient Links (blog). January 25, 2011</a> <a href="#fnref:4" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:5" role="doc-endnote">
      <p><a href="https://www.worldhistory.org/Janus/" target="_blank" rel="noopener noreferrer">“Janus.” World History Encyclopedia. February 6, 2015</a> <a href="#fnref:5" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:6" role="doc-endnote">
      <p><a href="https://www.worldhistory.org/Bacchus/" target="_blank" rel="noopener noreferrer">“Bacchus.” World History Encyclopedia. August 8, 2023</a> <a href="#fnref:6" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:7" role="doc-endnote">
      <p><a href="https://en.wikipedia.org/wiki/Dionysian_Mysteries" target="_blank" rel="noopener noreferrer">“Dionysian Mysteries.” Wikipedia. June 12, 2025</a> <a href="#fnref:7" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:8" role="doc-endnote">
      <p><a href="https://en.wikipedia.org/wiki/Dionysia" target="_blank" rel="noopener noreferrer">“City Dionysia.” Wikipedia. June 11, 2025</a> <a href="#fnref:8" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:9" role="doc-endnote">
      <p><a href="https://en.wikipedia.org/wiki/Mos_maiorum#Gravitas_and_constantia" target="_blank" rel="noopener noreferrer">“Gravitas and constantia.” Wikipedia. February 26, 2025</a> <a href="#fnref:9" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:10" role="doc-endnote">
      <p><a href="https://roman-empire.net/religion/bacchanalia" target="_blank" rel="noopener noreferrer">“Bacchanalia: Exploring the Ancient Roman Festivals of Excess.” The Roman Empire</a> <a href="#fnref:10" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:11" role="doc-endnote">
      <p><a href="https://en.wikipedia.org/wiki/Senatus_consultum_de_Bacchanalibus" target="_blank" rel="noopener noreferrer">“Senatus consultum de Bacchanalibus.” Wikipedia. January 4, 2025</a> <a href="#fnref:11" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:12" role="doc-endnote">
      <p><a href="https://en.wikipedia.org/wiki/Liberalia" target="_blank" rel="noopener noreferrer">“Liberalia.” Wikipedia. May 20, 2024</a> <a href="#fnref:12" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:13" role="doc-endnote">
      <p><a href="https://romanpagan.wordpress.com/bacchus/" target="_blank" rel="noopener noreferrer">“Bacchus.” Roman Pagan (blog)</a> <a href="#fnref:13" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:14" role="doc-endnote">
      <p>Dionysus was a patron god of Thrace. <a href="#fnref:14" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:15" role="doc-endnote">
      <p><a href="https://penelope.uchicago.edu/Thayer/E/Roman/Texts/Plutarch/Lives/Crassus*.html" target="_blank" rel="noopener noreferrer">“The Life of Crassus.” Plutarch. The Parallel Lives. Loeb Classical Library Edition, 1916</a> <a href="#fnref:15" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:16" role="doc-endnote">
      <p><a href="https://barrystrauss.com/the-unbelievable-mostly-untold-story-of-spartacuss-wife/" target="_blank" rel="noopener noreferrer">“The Unbelievable (Mostly) Untold Tale of Spartacus’s Wife.” Barry Strauss (blog), April 8, 2013</a> <a href="#fnref:16" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:17" role="doc-endnote">
      <p><a href="https://www.worldhistory.org/article/927/women-in-ancient-greece/" target="_blank" rel="noopener noreferrer">“Women in Ancient Greece.” World History Encyclopedia. July 27, 2016</a> <a href="#fnref:17" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:18" role="doc-endnote">
      <p><a href="https://www.worldhistory.org/article/659/the-role-of-women-in-the-roman-world/" target="_blank" rel="noopener noreferrer">“The Role of Women in the Roman World.” World History Encyclopedia. February 22, 2014</a> <a href="#fnref:18" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:19" role="doc-endnote">
      <p><a href="https://medium.com/@ancient.rome/the-most-powerful-woman-in-the-history-of-ancient-rome-29c69a741cb7" target="_blank" rel="noopener noreferrer">“The Most Powerful Woman in the History of Ancient Rome.” Medium. April 24, 2024</a> <a href="#fnref:19" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:20" role="doc-endnote">
      <p><a href="https://www.historyextra.com/period/roman/agrippina-younger-empress-ancient-rome-empress-nero-caligula/" target="_blank" rel="noopener noreferrer">“Agrippina the Younger: the first true empress of Ancient Rome.” HistoryExtra. March, 2019</a> <a href="#fnref:20" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Elie Maalouly</name></author><summary type="html"><![CDATA[In the first part of this series, we laid the groundwork by scraping datasets of Greek mythological texts and characters, extracting connections between them. Now, in Part II, we delve into the network analysis of these mythological figures to uncover who truly holds the center stage in this ancient drama. As mentioned in the first part, the code for this project is available on GitHub.]]></summary></entry><entry><title type="html">Mapping the Mythos: Part I</title><link href="https://quantifiedcuriosities.com/posts/greek-myth-network1/" rel="alternate" type="text/html" title="Mapping the Mythos: Part I" /><published>2025-06-17T00:00:00+02:00</published><updated>2025-06-17T00:00:00+02:00</updated><id>https://quantifiedcuriosities.com/posts/greek-myth-network1</id><content type="html" xml:base="https://quantifiedcuriosities.com/posts/greek-myth-network1/"><![CDATA[<p>In this two-part series, we will explore the fascinating world of Greek mythology through the lens of network analysis. By examining the relationships between characters, we can uncover hidden patterns and insights about the mythological universe and the people who wrote them. In Part I, we will focus on the data gathering and information extraction process, while Part II will delve into the analysis of the central figures and their roles within the narrative structure. The code for this project is available on <a href="https://github.com/eliemaalouly/greek-myth-network" target="_blank" rel="noopener noreferrer">GitHub</a>.</p>

<p><br /></p>

<h1 id="introduction">Introduction</h1>

<p>Ancient Greek mythology is more than just a collection of stories. It’s a vast, interconnected web of gods, heroes, monsters, and tragedies. But what if we treated this mythology not just as literature, but as a system, a network we could map, analyze, and explore?</p>

<p>I started with a set of big, open-ended questions, the kind you’d ask if you were trying to understand mythology not just as literature, but as a complex network of relationships:</p>
<ol>
  <li>Who are the most central and influential characters in Greek mythology?</li>
  <li>What communities or clusters emerge and what do they correspond to?</li>
  <li>How does a character’s centrality relate to their gender?</li>
  <li>How does this centrality shift when we transition from Greek to Roman mythology?</li>
</ol>

<p>These aren’t just trivia questions; they touch on the structural logic of myth. Greek mythology isn’t random (for the most part); it’s a universe with its own internal order. But most of that order is hidden in prose, spread across dozens of authors and centuries of storytelling. With this project, I wanted to extract this hidden structure. By representing the mythological universe as a graph with characters as nodes and relationships as edges, we can use tools from network analysis to surface new insights and hopefully answer these questions.</p>

<p><br /></p>

<h1 id="methodology">Methodology</h1>

<p>To explore these questions, I built a dataset of characters and relationships directly from ancient mythological texts. I began by scraping English translations of primary sources from <a href="https://www.theoi.com" target="_blank" rel="noopener noreferrer">Theoi.com</a>, parsing and structuring the text into smaller narrative units. I then compiled a comprehensive list of mythological characters using <a href="https://www.wikidata.org/" target="_blank" rel="noopener noreferrer">Wikidata</a>, filtering for relevance and type. From there, I used natural language processing to identify character mentions and infer relationships based on co-occurrences. These relationships were stored as edges in a graph, enabling network analysis to uncover patterns of centrality, clustering, and thematic structure across the mythological corpus.</p>

<p>Now in order to answer our questions we need to define our methodology properly. Here it would be useful to take a break from the technical details to talk about centrality in general, and how we will use it in this project. What does it mean to be central in a network? What are the different ways we can measure it? And how do these measures relate to the mythology we’re studying? How will we calculate these measures?</p>

<p class="notice--danger"><strong>The dreaded terminology alert</strong>: Several new terms will be introduced, which may be overwhelming and frankly a bit boring! I’ll try to make it as painless as possible by explaining all these new terms in the simplest way possible. Feel free to skip ahead if you are already familiar with them.</p>

<h2 id="centrality-in-networks">Centrality in Networks</h2>

<p>The two most basic terms that will be used throughout this project are <strong>nodes</strong> and <strong>edges</strong> and they are the simplest to understand. Simply put, nodes are the characters in our mythological network, and edges are the relationships between them. In network analysis, we represent these characters and their relationships as a graph, where nodes are points and edges are lines connecting them. This graph structure allows us to analyze the relationships between characters in a systematic way. There isn’t much more to say on these two terms, so let’s move on to the more interesting part: centrality.</p>

<p>Centrality is a key concept in network analysis, measuring the importance or influence of a node (character) within a network. In the context of Greek mythology, centrality can help us identify which characters are most pivotal to the narrative structure. There are several different ways to measure centrality based on how we decide to define it. So here is a brief simplified overview of the main centrality measures we will use in this project and how they relate to our mythological network:</p>
<ol>
  <li><strong>Degree Centrality</strong>: The simplest measure, counting the number of direct connections a node has. It tells us who has the most relationships (“You’re important because a lot of people are directly connected to you”).<br />
    <blockquote>
      <p>In Greek mythology, <strong>Who interacts with the most people?</strong><br />
Some characters are everywhere. You see them in many myths interacting with lots of other figures. For example, we can imagine that Zeus has a high degree centrality because he’s directly connected (<a href="/assets/images/posts/greek-myth-network1/zeus.jpg" class="image-popup">unfortunately not in a nice way!</a>) to countless others regardless of whether they are major or minor characters. It’s the number of relationships that counts here (and Zeus is no slouch in that department!).</p>
    </blockquote>
  </li>
  <li><strong>Betweenness Centrality</strong>: This measures how often a node lies on the shortest path between other nodes. It tells us who connects separate groups and helps information travel through the network (“You’re important because others rely on you to stay connected”).<br />
    <blockquote>
      <p>In Greek mythology, <strong>Who connects different groups?</strong><br />
Some characters move between different parts of the mythological universe, such as the world of gods, mortals, and the underworld, or between Greek tragedies and Epics. They’re not just famous, they’re the link between different parts of the mythology. For example, Hermes might have high betweenness centrality because he connects gods to mortals, acting as a messenger and guide to the underworld. Hermes is therefore central, because without him, a lot of characters and places wouldn’t be connected.</p>
    </blockquote>
  </li>
  <li><strong>Eigenvector Centrality</strong>: This measure considers the quality and influence of a node’s connections. It tells us who is linked to other important people, not just how many people they know (“You’re important because your friends are important”).<br />
    <blockquote>
      <p>In Greek mythology, <strong>Who has powerful friends?</strong><br />
Not all connections are equal. Some characters are influential because they’re close to other influential characters. For example, Athena may not have the most connections, but her relationships are high-quality. She advises Odysseus, Perseus, and Heracles, and she’s a trusted daughter of Zeus. Athena is therefore central because her connections are big players in the myths.</p>
    </blockquote>
  </li>
  <li><strong>PageRank</strong>: It measures the influence of a node based on the number and quality of links it receives, similar to eigenvector centrality but with a probabilistic interpretation. It tells us how often you’re mentioned or pointed to by respected others, even if not directly connected to many (“You’re important because important people choose to talk about or link to you”).<br />
    <blockquote>
      <p>In Greek mythology, <strong>Who gets attention from important people?</strong><br />
Some characters aren’t in every story, but when they show up, it’s because someone important is talking about them or dealing with them. For example, Hades might not have the most connections, but he’s mentioned by many important characters like Zeus and Persephone. He’s central because he’s mentioned or visited by characters who matter a lot, even if he doesn’t leave the Underworld often.</p>
    </blockquote>
  </li>
</ol>
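<p>To make two of these measures a bit more concrete, here is a minimal sketch in plain Python that computes degree centrality and PageRank on a tiny toy network. The edge list, the function names, and the damping factor of 0.85 are assumptions made for illustration; this is not the code or the data used in the project.</p>

```python
from collections import defaultdict

# Hypothetical toy network: undirected, weighted edges between characters.
edges = [
    ("Zeus", "Hera", 1.0),
    ("Zeus", "Athena", 1.0),
    ("Zeus", "Hermes", 1.0),
    ("Hermes", "Odysseus", 1.0),
    ("Athena", "Odysseus", 1.0),
]

# Build a symmetric adjacency map: neighbors[a][b] = weight of the edge a-b.
neighbors = defaultdict(dict)
for a, b, w in edges:
    neighbors[a][b] = w
    neighbors[b][a] = w


def degree_centrality(neighbors):
    """Share of the other nodes that each node is directly connected to."""
    n = len(neighbors)
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


def pagerank(neighbors, damping=0.85, iterations=100):
    """Power-iteration PageRank on an undirected, weighted graph.

    Assumes every node has at least one edge (no isolated nodes).
    """
    nodes = list(neighbors)
    rank = {node: 1.0 / len(nodes) for node in nodes}
    # Total edge weight attached to each node, used to split its rank.
    strength = {node: sum(neighbors[node].values()) for node in nodes}
    for _ in range(iterations):
        new_rank = {}
        for node in nodes:
            # Each neighbor passes on rank in proportion to the edge weight.
            incoming = sum(
                rank[other] * w / strength[other]
                for other, w in neighbors[node].items()
            )
            new_rank[node] = (1 - damping) / len(nodes) + damping * incoming
        rank = new_rank
    return rank


degree = degree_centrality(neighbors)
pr = pagerank(neighbors)
for node in sorted(neighbors, key=pr.get, reverse=True):
    print(f"{node:10s} degree={degree[node]:.2f} pagerank={pr[node]:.3f}")
```

<p>In this toy network, Zeus comes out on top for both measures simply because he has the most neighbors. The two measures only start to disagree in larger networks, where a character can have few connections that all lead to highly ranked characters.</p>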

<p class="notice--info"><strong>Note</strong>: If you are new to these concepts it might get confusing and you might start mixing them up. Therefore, as we will come across these concepts throughout the project, if you don’t remember what each one means, just hover over the terms in the text and a tooltip will appear with a brief reminder. For example, if you hover over <span title="Who interacts with the most people?"><strong>Degree centrality</strong></span>, you will see a tooltip explaining that it represents who interacts with the most people.</p>

<h2 id="defining-relationships">Defining Relationships</h2>

<p>To analyze the relationships between characters, we need to define what constitutes a relationship for the purposes of this project. Relationships can be complex and multifaceted, but here we will focus only on the co-occurrences of characters within the same narrative context. This means that if two characters appear within a specified number of sentences in the same myth or story, they will be considered connected. Co-occurrence is a common way to define relationships when building networks from text, especially when dealing with large corpora where explicit relationships may not be clearly stated.</p>

<p>Now there is a bias in this approach, as we are going to use different genres of texts each with its own narrative style and structure. For example, tragedies and epic poetry like the “Iliad” or “Odyssey” may have a large number of connections for the lead characters compared to a small Homeric Hymn. This will cause the characters in long texts to dominate the connections in the network and would give these characters an unfair advantage. One way to mitigate this bias is by applying a weight to the extracted relationships in the texts.</p>

<p>Generally, in network analysis, weighting relationships allows us to differentiate between strong and weak connections. For example, a connection between family members may be considered stronger than a connection between acquaintances. However, this is not always necessary and depends entirely on the type of analysis you want to perform. In our case, it is a useful tool to balance the relationships extracted from different texts, especially when dealing with texts of varying lengths and narrative styles. I will consider each “myth” as a full narrative unit regardless of its length, and the relationships between the characters will be normalized by the total number of relationships in that myth. This way, we can ensure that characters from shorter texts are not unfairly overshadowed by those from longer texts. For example, in the Iliad, a connection between Achilles and Hector will be weighted by the total number of connections between all characters in the text (which might be in the thousands), so that it does not dominate the network simply because the text is longer. Additionally, I applied a log transformation to keep the network balanced and to limit the influence of outliers. Because the logarithm is monotonic, it preserves the relative order of the edge weights while “compressing” the largest ones, so no single pair overwhelms the rest of the network. This makes it easier to see meaningful relationships across all texts, not just the loudest ones.</p>

<p>For those who want their math fix, the final weight of the relationship between two characters \(i\) and \(j\) is given by:
<a name="eq-weights"></a></p>

\[W_{ij} = \log\left(1 + \sum_t \frac{c_{ij}^{(t)}}{C^{(t)}} \right)\]

<p>where:</p>
<ul>
  <li>\(W_{ij}\) is the final weight of the relationship between characters \(i\) and \(j\) across all myths,</li>
  <li>\(c_{ij}^{(t)}\) is the number of co-occurrences between characters \(i\) and \(j\) in myth \(t\),</li>
  <li>\(C^{(t)}\) is the total number of co-occurrences of all character pairs in myth \(t\).</li>
</ul>
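<p>The formula above can be sketched in a few lines of Python. The input structure, the myth identifiers, and the counts are invented for illustration, and the natural logarithm is an assumption, since the formula does not fix a base.</p>

```python
import math
from collections import defaultdict

# Hypothetical raw counts: myth id -> {(character_i, character_j): co-occurrences}.
cooccurrences = {
    "long_epic_book": {("Achilles", "Hector"): 40, ("Hector", "Priam"): 10},
    "short_hymn": {("Apollo", "Hermes"): 2},
}


def edge_weights(cooccurrences):
    """W_ij = log(1 + sum over myths t of c_ij^(t) / C^(t))."""
    normalized = defaultdict(float)
    for counts in cooccurrences.values():
        total = sum(counts.values())  # C^(t): all pair co-occurrences in myth t
        for pair, count in counts.items():
            key = tuple(sorted(pair))  # undirected: (A, B) is the same as (B, A)
            normalized[key] += count / total
    # log1p(x) computes log(1 + x) using the natural logarithm.
    return {pair: math.log1p(share) for pair, share in normalized.items()}


weights = edge_weights(cooccurrences)
for pair, weight in sorted(weights.items()):
    print(pair, round(weight, 3))
```

<p>With these toy counts, the only pair in the short hymn ends up with a weight comparable to the dominant pair of the much longer text, which is the balancing effect the normalization is meant to achieve.</p>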

<p><br /></p>

<h1 id="data-collection">Data Collection</h1>

<p>While the myths are culturally ubiquitous, the data isn’t. Most of these stories live in unstructured, messy texts, scattered across classical literature websites. Luckily we can find a good collection of English translations of primary sources on the <a href="https://www.theoi.com/Library.html" target="_blank" rel="noopener noreferrer">Theoi.com library</a>. This site is a treasure trove of mythological texts, including epic poetry, tragedies, and other narratives. However, the data is not structured in a way that makes it easy to extract. Therefore, it was necessary to scrape the text from the site and parse it into smaller narrative units (which was not that pleasant).</p>

<h2 id="text-corpus">Text Corpus</h2>

<p>I used <a href="https://www.crummy.com/software/BeautifulSoup/bs4/doc/" target="_blank" rel="noopener noreferrer">Beautiful Soup</a> to first scrape a list of available texts in the Theoi library along with their metadata (URLs, author names, time period, original language). Then I built a crawler to scrape the text of each item from the list while excluding nonprimary sources (I’m looking at you <a href="https://en.wikipedia.org/wiki/John_Tzetzes" target="_blank" rel="noopener noreferrer">Tzetzes</a>!). Due to the structure of the Theoi library, the texts were not organized in a straightforward manner, so I had to have the crawler navigate through multiple levels of links to reach the full parts of every text and store each text in a txt file. The crawler was designed to handle pagination and nested links, ensuring that all relevant texts were collected. Additionally, when a table of contents was present, the crawler would follow the links to each section and store the title of each section in a JSON file, allowing for a more organized dataset.</p>

<p>The text files were then imported and split based on the structure of each text (some have sections with Roman numerals, some with Arabic numerals, some with titles and some without, some have a title only in the table of contents, etc.). The result is a clean and structured dataset of mythological texts, where each entry corresponds to the smallest possible division of a text (e.g., one myth, or one book section). I also classified each text as an epic, a dialogue (tragedy or comedy), or other. The dataset comprises 1441 entries. Here is an example of what the dataset looks like:</p>

<table>
  <thead>
    <tr>
      <th>index</th>
      <th>author</th>
      <th>title</th>
      <th>url</th>
      <th>language</th>
      <th>period</th>
      <th>years</th>
      <th>text</th>
      <th>chapter</th>
      <th>genre</th>
      <th>section</th>
      <th>myth</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>54</td>
      <td>Homer</td>
      <td>Iliad</td>
      <td>https://www.theoi.com/Text/HomerIliad1.html</td>
      <td>Greek</td>
      <td>Archaic</td>
      <td>800 B.C. - 500 B.C.</td>
      <td>And the son of Atreus, Menelaus, dear to Ares,…</td>
      <td>17</td>
      <td>Epic</td>
      <td>None</td>
      <td>None</td>
    </tr>
    <tr>
      <td>195</td>
      <td>Apollodorus</td>
      <td>The Library</td>
      <td>https://www.theoi.com/Text/Apollodorus1.html</td>
      <td>Greek</td>
      <td>Hellenistic</td>
      <td>300 B.C. - 100 B.C.</td>
      <td>Reigning over Calydon, Oeneus was the first who received…</td>
      <td>1</td>
      <td>Other</td>
      <td>8</td>
      <td>Oeneus, Meleager, Tydeus</td>
    </tr>
  </tbody>
</table>

<h2 id="character-dataset">Character Dataset</h2>

<p>To identify co-occurrences in the texts, we need to be able to correctly recognize the characters. The problem with using Named Entity Recognition (NER) is that it is not always accurate, especially with mythological characters. Many characters have multiple names or epithets, and some characters are not recognized by NER models at all as these tools are not really trained on such texts. Therefore, I decided to use a curated list of mythological characters from <a href="https://www.wikidata.org/" target="_blank" rel="noopener noreferrer">Wikidata</a> as a reference.</p>

<p>I queried the database to extract all the instances of the <a href="https://www.wikidata.org/wiki/Q22988604" target="_blank" rel="noopener noreferrer">mythological Greek character</a> class and to recursively extract all the instances of its subclasses, along with some of their metadata that might be useful (name, aliases, Roman name, Roman aliases, domain, gender, siteLinkCount, etc.). This resulted in a huge dataset of over 6000 Greek mythological characters, each with a list of aliases. The problem is that multiple characters share the same name or alias, which makes it impossible to tell them apart when identifying co-occurrences in the texts. So I had to deduplicate the dataset by keeping only one character per name or alias. I did this by giving each character an importance score and keeping the more “important” one. This score is based on the presence of a <a href="https://www.theoi.com" target="_blank" rel="noopener noreferrer">Theoi.com</a> page for the character (characters with a page are more important), and the siteLinkCount<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup> (characters with a higher siteLinkCount are more important). After removing all the duplicate names, I was left with a final dataset containing 1076 unique characters, each with their metadata and aliases. Here is an example of what the dataset looks like:</p>

<table>
  <thead>
    <tr>
      <th>id</th>
      <th>name</th>
      <th>description</th>
      <th>type</th>
      <th>domain</th>
      <th>gender</th>
      <th>aliases</th>
      <th>residence</th>
      <th>theoi_url</th>
      <th>level1</th>
      <th>level2</th>
      <th>level3</th>
      <th>level4</th>
      <th>level5</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Q174278</td>
      <td>Medea</td>
      <td>daughter of King Aeëtes of Colchis</td>
      <td>Mythological Greek Character</td>
      <td> </td>
      <td>female</td>
      <td> </td>
      <td> </td>
      <td> </td>
      <td>Mythological Greek Character</td>
      <td> </td>
      <td> </td>
      <td> </td>
      <td> </td>
    </tr>
    <tr>
      <td>Q37340</td>
      <td>Apollo</td>
      <td>god in Greek and later Roman mythology</td>
      <td>Olympian god</td>
      <td> </td>
      <td>male</td>
      <td>Apollōn, Aploun, Apellōn, Apeilōn, Phoebus, Apollon, Mogounos, Mogons</td>
      <td>Olympus</td>
      <td>https://www.theoi.com/Olympios/Apollon.html</td>
      <td>Mythological Greek Character</td>
      <td>Greek deity</td>
      <td>Olympian god</td>
      <td> </td>
      <td> </td>
    </tr>
    <tr>
      <td>Q420269</td>
      <td>Arges</td>
      <td>cyclops in Greek mythology</td>
      <td>cyclops</td>
      <td> </td>
      <td>male</td>
      <td>Pyraemon, Acmonides</td>
      <td> </td>
      <td> </td>
      <td>Mythological Greek Character</td>
      <td>cyclops</td>
      <td> </td>
      <td> </td>
      <td> </td>
    </tr>
  </tbody>
</table>
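<p>The deduplication step described before the table can be sketched roughly as follows. The records, the field names, and the exact tie-breaking rule are assumptions made for illustration (the sitelink counts are invented), so treat this as one possible reading of the procedure rather than the project’s actual code.</p>

```python
# Invented toy records; ids, field names, and counts are placeholders.
candidates = [
    {"id": "ajax-1", "name": "Ajax", "aliases": ["Aias"],
     "has_theoi_page": True, "sitelinks": 50},
    {"id": "ajax-2", "name": "Ajax", "aliases": [],
     "has_theoi_page": False, "sitelinks": 30},
    {"id": "apollo", "name": "Apollo", "aliases": ["Phoebus"],
     "has_theoi_page": True, "sitelinks": 90},
    {"id": "phoebus-minor", "name": "Phoebus", "aliases": [],
     "has_theoi_page": False, "sitelinks": 5},
]


def importance(record):
    """Characters with a Theoi page rank first, then by sitelink count."""
    return (record["has_theoi_page"], record["sitelinks"])


def deduplicate(records):
    """Keep at most one character per name or alias, most important first."""
    claimed = set()  # every name or alias already owned by a kept character
    kept = []
    for record in sorted(records, key=importance, reverse=True):
        if record["name"] in claimed:
            continue  # a more important character already owns this name
        free_aliases = [a for a in record["aliases"] if a not in claimed]
        claimed.add(record["name"])
        claimed.update(free_aliases)
        kept.append({**record, "aliases": free_aliases})
    return kept


unique_characters = deduplicate(candidates)
print([record["id"] for record in unique_characters])
```

<p>Processing the records from most to least important means that a contested name or alias always goes to the character with the higher score, and the less important namesake is dropped.</p>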

<p><br /></p>

<h1 id="nodes-and-edges-extraction">Node and Edge Extraction</h1>

<p>Now that we have the text corpus and the character dataset, we can extract the relationships between characters. I used <a href="https://spacy.io/" target="_blank" rel="noopener noreferrer">spaCy</a>, a powerful NLP library, to process the text and identify character mentions. I created a custom pipeline that uses the character dataset to recognize mentions of characters in the text and resolve any alias to the character’s canonical name. This way, we can ensure that all mentions of a character are correctly identified, regardless of the name or alias used in the text. I set a two-sentence window to identify co-occurrences, meaning that if two characters appear within two sentences of each other<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>, they are considered connected.</p>
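<p>As a rough illustration of the idea, here is a simplified sketch that swaps the spaCy pipeline for plain regular expressions. The alias table and the sample text are invented, and the window is interpreted as “same sentence or the next one”, which is an assumption about how the window is applied.</p>

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical alias table: every name or alias maps to a canonical name.
ALIASES = {
    "Zeus": "Zeus", "Jove": "Zeus",
    "Athena": "Athena", "Pallas": "Athena",
    "Odysseus": "Odysseus",
}


def characters_per_sentence(text):
    """Naive sentence split, then resolve each matched word to a canonical name."""
    sentences = re.split(r"[.!?]+\s*", text)
    found = []
    for sentence in sentences:
        # Single-word ASCII names only; real aliases need a proper matcher.
        words = re.findall(r"[A-Za-z]+", sentence)
        found.append({ALIASES[w] for w in words if w in ALIASES})
    return found


def count_cooccurrences(text, window=2):
    """Count character pairs that appear within `window` consecutive sentences."""
    per_sentence = characters_per_sentence(text)
    counts = Counter()
    for i, current in enumerate(per_sentence):
        # Pairs inside the same sentence.
        for pair in combinations(sorted(current), 2):
            counts[pair] += 1
        # Pairs between this sentence and the following ones in the window.
        for later in per_sentence[i + 1 : i + window]:
            for a in current:
                for b in later:
                    if a != b:
                        counts[tuple(sorted((a, b)))] += 1
    return counts


sample = "Pallas spoke to Odysseus. Jove watched from above. The sea was calm. Zeus slept."
pairs = count_cooccurrences(sample)
print(dict(pairs))
```

<p>Note how “Pallas” and “Jove” are counted as Athena and Zeus, and how the final mention of Zeus creates no edge because no other character appears in the neighboring sentences.</p>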

<p>Additionally, for dialogue-type texts, I used additional criteria to identify relationships. If a character is mentioned within the speech of another character, they will be considered connected. This is particularly useful for tragedies and comedies where characters often speak about or address each other directly. I also considered each consecutive speaker to be connected to the previous speaker, as they are part of the same dialogue exchange. This allows us to capture the dynamics of conversations and interactions between characters in a more nuanced way.</p>
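<p>These two dialogue rules can be sketched as follows, assuming the text has already been parsed into speaker turns with the characters mentioned in each speech. The turn structure and the sample exchange are invented for illustration.</p>

```python
from collections import Counter

# Hypothetical parsed dialogue: (speaker, characters mentioned in the speech).
dialogue = [
    ("Antigone", {"Creon", "Polynices"}),
    ("Ismene", {"Creon"}),
    ("Antigone", set()),
]


def dialogue_edges(turns):
    """Connect speakers to the characters they mention and to the previous speaker."""
    counts = Counter()
    previous = None
    for speaker, mentioned in turns:
        # Rule 1: a character mentioned within a speech is linked to the speaker.
        for other in mentioned:
            if other != speaker:
                counts[tuple(sorted((speaker, other)))] += 1
        # Rule 2: consecutive speakers are linked as part of the same exchange.
        if previous is not None and previous != speaker:
            counts[tuple(sorted((previous, speaker)))] += 1
        previous = speaker
    return counts


edges_from_dialogue = dialogue_edges(dialogue)
print(dict(edges_from_dialogue))
```

<p>Pairs are stored as sorted tuples so that the edges stay undirected, regardless of who speaks and who is spoken about.</p>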

<p>After going through the entire text corpus with <a href="https://spacy.io/" target="_blank" rel="noopener noreferrer">spaCy</a>, I was able to extract a list of character pairs with their co-occurrences. For a network analysis, we need both an edges dataset and a nodes dataset. For the edges dataset, I normalized the relationships as described in the <a href="#eq-weights">weighting formula</a> based on the total number of co-occurrences in each myth (or the smallest available division of text). This resulted in a final dataset of character relationships, where each entry represents a unique connection between two characters along with its weight. The resulting dataset contains 48,414 unique relationships between characters. For the nodes dataset, I simply used the characters in the character dataset that appear in the edges dataset, along with their metadata. The final nodes dataset contains 943 unique characters.</p>

<hr />

<p><br />
This concludes the data-gathering and information-extraction process. In the next part, we will analyze the network structure and try to find answers to our initial questions. We will explore the centrality measures we defined earlier and see what patterns and relationships they reveal within the mythological universe.</p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p>The number of sitelinks on the character’s Wikidata item, i.e. the number of pages on Wikimedia projects (such as Wikipedia articles in different languages) that are linked to it. A higher number suggests that the character is more important or more frequently covered in other contexts. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:2" role="doc-endnote">
      <p>The choice of a two-sentence window is somewhat arbitrary. A window of 3-5 sentences is more common, but it is mainly used when analyzing a network built from a much smaller number of texts. In this project, I chose a two-sentence window because it is a bit more conservative than the 3-5 sentence window, but more flexible than a single-sentence window. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Elie Maalouly</name></author><summary type="html"><![CDATA[In this two-part series, we will explore the fascinating world of Greek mythology through the lens of network analysis. By examining the relationships between characters, we can uncover hidden patterns and insights about the mythological universe and the people who wrote them. In Part I, we will focus on the data gathering and information extraction process, while Part II will delve into the analysis of the central figures and their roles within the narrative structure. The code for this project is available on GitHub.]]></summary></entry></feed>