Extend the use of supplemental variables in GDA by applying machine learning to the free text descriptive response portion and combining it with MCA analysis

419kfj
October 08, 2023

Linking the distribution of individuals in the space revealed by MCA with qualitative surveys is mentioned in [1] and has been practiced in research [2]. In Japan, the text analysis tool KH Coder [3] has become remarkably popular and is used in many social surveys.
KH Coder's built-in functions make it possible to link this text analysis with the selected (closed-ended) answers. Our first attempt at a mixed research method uses this functionality.
The next step is to add the frequently occurring words (important words) obtained at this stage to the individual coordinates as supplementary variables in the MCA and to analyze them with a GDA method [4].
In this report, as a further step, we present an example [5] in which frequently occurring words (important words) were tagged as positive/negative by machine learning and analyzed as supplementary variables.
This approach extends the use of supplementary variables in GDA.

References
• [1] Le Roux, Brigitte, and Henry Rouanet. 2010. "Multiple Correspondence Analysis." Quantitative Applications in the Social Sciences 163. Thousand Oaks, Calif.: Sage Publications. ("Between quantity and quality, there is geometry.", p. 1)
• [2] Tony Bennett, Mike Savage, Elizabeth Silva, Alan Warde, Modesto Gayo-Cal, and David Wright. "Culture, Class, Distinction." Routledge, 2009, 2010.
• [3] KH Coder: https://khcoder.net/en/
• [4] Following [1] and using the GDAtools package for R: Robette N. (2023), GDAtools: Geometric Data Analysis in R, version 2.0, https://nicolas-robette.github.io/GDAtools/
• [5] Kazuo Fujimoto and Kazuya Ohata, "Development of a method for analyzing participant satisfaction survey data that combines MCA and Aspect Based Sentiment Analysis" (in Japanese), NLP2023. https://www.anlp.jp/proceedings/annual_meeting/2023/pdf_dir/Q1-11.pdf
• (in English) https://419kfj.sakura.ne.jp/db/wp-content/uploads/2023/09/nlp2023−article_01−13v1.1_eng.pdf

Transcript

  1. Extend the use of supplemental variables in GDA
    by applying machine learning to the free text
    descriptive response portion and combining it with
    MCA analysis ver1.0
    CARME2023 09/28 Room2
    11:00-12:30
    Kazuo Fujimoto [email protected]
    Project Researcher
    Institute for Mathematics and Computer Science
    Tsuda University

  2. Very short self-introduction: After CARME…
    After CARME2015,
    this translated book
    was published.
    After CARME2019!

  3. So after CARME2023…
    • Not decided …
    2023/9/28 CARME2023@University of Bonn 3

  4. Abstract

  5. References

  6. Software related references
    • Higuchi, Koichi 2017 "A Two-Step Approach to Quantitative Content Analysis: KH Coder Tutorial using Anne of Green Gables (Part II)" Ritsumeikan Social Sciences Review 53(1): 137-147. [PDF File] https://khcoder.net/en/
    • Robette N. (2023), GDAtools: Geometric Data Analysis in R, version 2.0, https://nicolas-robette.github.io/GDAtools/
    • RStudio Team (2020). RStudio: Integrated Development for R. RStudio, PBC, Boston, MA. URL http://www.rstudio.com/.
    • R Core Team (2023). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
    • Wickham H, Averick M, Bryan J, Chang W, McGowan LD, François R, Grolemund G, Hayes A, Henry L, Hester J, Kuhn M, Pedersen TL, Miller E, Bache SM, Müller K, Ooms J, Robinson D, Seidel DP, Spinu V, Takahashi K, Vaughan D, Wilke C, Woo K, Yutani H (2019). "Welcome to the tidyverse." Journal of Open Source Software, 4(43), 1686. doi:10.21105/joss.01686.

  7. Notice and Apology
    • Because of a problem with the reporter's application, permission to reuse the raw data was not granted. The graphs and other information in the following report are therefore taken from the report to the Association for Natural Language Processing in March 2023, and no new analysis was conducted.
    • Referenced report
    • Kazuo Fujimoto and Kazuya Ohata, "Development of a method for analyzing participant satisfaction survey data that combines MCA and Aspect Based Sentiment Analysis" (in Japanese), NLP2023 (https://www.anlp.jp/proceedings/annual_meeting/2023/pdf_dir/Q1-11.pdf) (English version)

  8. Outline of my presentation

  9. Outline of my presentation
    • Characteristics of the data (congratulatory responses)
    • Challenge:
    • How can we extract improvement measures and issues when most of the responses are "good"?
    • Step 0: Exploratory Data Analysis (EDA) and MCA, and basic text mining, done separately.
    • Step 1: Focus on free-text responses. Linking text mining and MCA.
    • Step 2: Focus on the ambiguity of the most frequently used key words and phrases. Adding tags (positive/negative/none) by machine learning (ABSA: Aspect-Based Sentiment Analysis).
    • Step 3: Project the tagged words onto the MCA individual map.
    • Issue: The individuals who used the important tagged words can be plotted on the whole individual map, but the amount of tagging depends on the dictionary used by the machine learning.
    • Also, the MCA map is very skewed to begin with, so we would like to deepen the analysis by using CSA.

  10. Schematic overview of this report
    • Projecting tagged extracted words as supplementary variables into the space produced by MCA.
    • Our trial creates supplementary variables by text mining and machine-learning tagging, plots them in the space of individuals, and thereby develops another mixed research method.
    • MCA and mixed research methods: the famous phrase "Between quantity and quality, there is geometry." (Le Roux, Brigitte, & Henry Rouanet. 2010. "Multiple correspondence analysis.", chapter 1)

  11. Data Structure
    [Figure: data table with one row per respondent (ID …, m-3, m-2, m-1, …), columns Var1, Var2, …, Varn, and the open-ended free-text answer parts.]

  12. Step 0: MCA and Text Mining Separately
    [Figure: the data table (ID …, N-3, N-2, N-1, N; Var1, Var2, …, Vark; free-text parts) analyzed in two separate parts.]
    • Closed-ended variables: specific MCA.
    • Free-text parts: text mining with KH Coder. One variable and its categories can be plotted together with the words in a co-occurrence network and a CA plot.
    • The mutual relations are examined with the KWIC concordance.

  13. Step 1: MCA and frequent words as supplementary variables
    [Figure: the data table (ID …, N-3, N-2, N-1, N; Var1, Var2, …, Vark; free-text parts) extended with 0/1 indicator columns Word1, Word2, Word3, …, Wordk.]
    • Specific MCA and SDA.
    • Interpret the words using the KWIC of Step 0.
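
The 0/1 word columns of Step 1 can be pictured with a small sketch. This is a minimal Python illustration, not the authors' KH Coder / GDAtools workflow: the respondent IDs, the tokenized answers, and the helper name `word_indicators` are all hypothetical, and tokenization is assumed to have been done beforehand.

```python
from collections import Counter


def word_indicators(tokens_by_id, top_k):
    """Return (words, rows): the top_k most frequent words and, per respondent,
    a 0/1 list saying whether each word occurs in that respondent's answer."""
    # Count each word once per respondent (document frequency).
    doc_freq = Counter()
    for tokens in tokens_by_id.values():
        doc_freq.update(set(tokens))
    # Sort by descending frequency, then alphabetically for a stable order.
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    words = [w for w, _ in ranked[:top_k]]
    rows = {
        rid: [1 if w in set(tokens) else 0 for w in words]
        for rid, tokens in tokens_by_id.items()
    }
    return words, rows


# Hypothetical, already-tokenized free-text answers keyed by respondent ID.
answers = {
    "r1": ["time", "short", "exercise"],
    "r2": ["exercise", "content", "good"],
    "r3": [],  # no free-text answer
    "r4": ["time", "content", "exercise"],
}

words, rows = word_indicators(answers, top_k=3)
print(words)
for rid, row in rows.items():
    print(rid, row)
```

Each row can then be appended to the respondent's closed-ended answers, so that the word columns act as supplementary variables and do not take part in building the MCA space.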

  14. Using the KWIC of Step 0
    • We found that the frequent words have ambiguous meanings.
    • So we took another approach, as follows:
    • Put a p or n tag on each word; p means "positive" and n means "negative".
    • We carry out this tagging using Aspect-Based Sentiment Analysis (ABSA).
    • After tagging the words, build a data frame of them as supplementary variables.
    • Overlay them on the individual space generated by MCA.

  15. Step 2: MCA and tagged frequent words as supplementary variables
    [Figure: the data table (ID …, N-3, N-2, N-1, N; Var1, Var2, …, Vark; free-text parts) extended with 0/1 indicator columns Word1/p, Word1/n, Word2/p, …, Word/n.]
    • Specific MCA and SDA.
    • Interpret the words using the KWIC of Step 0.
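
The tagged columns (Word1/p, Word1/n, …) can be sketched in the same spirit. This Python fragment is only an illustration under stated assumptions: the `(word, polarity)` pairs stand in for the output of an ABSA model, which is not reproduced here, and `tagged_indicators` is a hypothetical helper name.

```python
def tagged_indicators(tags_by_id, words):
    """Build 0/1 columns named 'word/p' and 'word/n' for each word of interest."""
    columns = [f"{w}/{pol}" for w in words for pol in ("p", "n")]
    rows = {}
    for rid, tags in tags_by_id.items():
        present = {f"{w}/{pol}" for w, pol in tags}
        rows[rid] = [1 if c in present else 0 for c in columns]
    return columns, rows


# Hypothetical ABSA output: (aspect word, polarity) pairs per respondent.
absa_output = {
    "r1": [("time", "n"), ("exercise", "p")],
    "r2": [("exercise", "p"), ("content", "p")],
    "r3": [],  # nothing tagged
    "r4": [("time", "n"), ("content", "n"), ("exercise", "n")],
}

columns, rows = tagged_indicators(absa_output, ["time", "exercise", "content"])
print(columns)
for rid, row in rows.items():
    print(rid, row)
```

Splitting each word into a /p and a /n column is what lets an ambiguous word such as "time" appear at two different places on the map.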

  16. Step 0 and Step 1

  17. Characteristics of the data and Challenge
    • Characteristics of the data (congratulatory responses)
    • Responses were selected on a 5-point scale.
    • Most responses are 5 or 4. Average is ….
    • The seminar was an information security workshop, and the participants were highly motivated.
    • Challenge: How can we extract improvement measures and issues when most of the responses are "good"?
    • If it were sufficient to conclude from these results that the event was a success, there would be nothing more to say.
    • However, it is necessary to identify the issues that need to be addressed in order to make the event even better.

  18. Step 0: Exploratory Data Analysis (EDA) and MCA
    • Number of respondents: 2001
    • Confirm the relationship between satisfaction and the responses.
    • Analysis of the distribution of the data by MCA confirms the tendency of the dissatisfied respondents.
    • Responses that could lead to improvement (free-text responses) are not found in the dissatisfied group.
    • An analysis of the free-text responses of the satisfied group is therefore needed.

  19. Pairs display of "Skills improved" and "Understanding"
    • A large portion of "understanding" is accounted for by "skills: improved".
    • Note: "Don't understand" and skill improvement are not related.
    • Congratulatory responses
    • "That wasn't so bad, was it?" (polite responses)
    • Responses that confirm one's own involvement and self-identification
    • "As long as you participated, there should be results."
    • There are issues to be clarified here.
    [Figure: pairs plot of "skills: improved" (very improved, improved, no change, Don't know, NA) against "understanding" (understand well, understand, Don't understand some, Don't understand many, NA).]

  20. These three questions are biased toward the positive.
    • Instructor's explanation and the other questions, focusing on understanding.
    • Seen in this way, responses about "instructor explanation", "support", and "response" are considered to be uninformative with respect to "understanding".
    [Figure: panels for understanding, instructor explanation, support, and responses; axis: ← Positive / Negative →.]

  21. Step 1: Focus on free-text responses.
    Linking text mining and MCA
    Respondents with extremely low satisfaction did not answer the open-ended (free-text) questions either.
    Therefore, they cannot be used to explore areas for improvement in the workshops.

  22. Space generation by MCA
    (speMCA with only NA excluded)
    [Figure: MCA map; annotations: "Completely disagree.", "Clustering of response patterns".]

  23. Number of responses and response rate to open-ended free-text questions (Q15-2, Q20, Q22)
    • Answered all three questions: 223 (14.8%)
    • Reasons for "understand" responses (Q15-2):
    • 742+348+87+223=1400
    • 70.0%
    • Course environment (Q20):
    • 73+348+223+9=653
    • 32.6%
    • Other overall impressions (Q22):
    • 24+87+223+9=343
    • 17.1%
    [Figure labels: Reasons for "understand", Course environment, Overall impressions.]
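
The response-rate arithmetic on this slide can be re-derived with a short Python sketch. The seven counts are taken from the sums above, but which count belongs to which combination of questions is an assumption inferred from those sums, since the figure itself is not available in the transcript. The denominator 2001 is the number of respondents reported in Step 0.

```python
# Counts per combination of answered questions, read off the sums on the slide.
# The assignment of each count to a combination is inferred (assumption).
regions = {
    frozenset({"Q15-2"}): 742,
    frozenset({"Q15-2", "Q20"}): 348,
    frozenset({"Q15-2", "Q22"}): 87,
    frozenset({"Q15-2", "Q20", "Q22"}): 223,
    frozenset({"Q20"}): 73,
    frozenset({"Q20", "Q22"}): 9,
    frozenset({"Q22"}): 24,
}
N_RESPONDENTS = 2001  # number of respondents reported in Step 0


def answered(question):
    """Total number of respondents who answered the given question."""
    return sum(n for qs, n in regions.items() if question in qs)


totals = {q: answered(q) for q in ("Q15-2", "Q20", "Q22")}
rates = {q: round(100 * n / N_RESPONDENTS, 1) for q, n in totals.items()}

# Respondents who answered at least one free-text question.
any_answer = sum(regions.values())
all_three = regions[frozenset({"Q15-2", "Q20", "Q22"})]
all_three_share = round(100 * all_three / any_answer, 1)

print(totals)
print(rates)
print(any_answer, all_three_share)
```

Under this reading, the per-question rates match the slide when divided by 2001, while the 14.8% for "all three questions" matches 223 divided by the 1506 respondents who answered at least one free-text question, not by 2001.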

  24. Step 2

  25. Step 2: Focus on the ambiguity of frequently used key words and phrases.
    • Tag these (positive/negative/none) by machine learning (ABSA).
    • Words with both p and n occurrences:
    • time, exercise, content, knowledge, terminology, explanation, training, lecture
    • Top 5 negative words: ('time', 84), ('exercise', 72), ('content', 52), ('knowledge', 44), ('term', 30)
    • Top 5 positive words: ('exercise', 120), ('content', 94), ('explanation', 78), ('training', 49), ('lecture', 37)
    • The table on the next slide lists the extracted words without the p/n tag. The frequent words detected by the aspect-based sentiment analysis are marked in it.
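
Rankings of this kind, and the list of words that occur with both tags, can be computed from tagged pairs as in the following Python sketch. The input pairs are invented for illustration and do not reproduce the survey data; `polarity_summary` is a hypothetical helper name.

```python
from collections import Counter


def polarity_summary(tagged_words, top_n=5):
    """Return (top positive, top negative, words seen with both tags)."""
    pos = Counter(w for w, pol in tagged_words if pol == "p")
    neg = Counter(w for w, pol in tagged_words if pol == "n")
    both = sorted(set(pos) & set(neg))
    return pos.most_common(top_n), neg.most_common(top_n), both


# Hypothetical (word, polarity) pairs; "p" = positive, "n" = negative.
tagged = [
    ("exercise", "p"), ("exercise", "p"), ("exercise", "n"),
    ("time", "n"), ("time", "n"), ("time", "p"),
    ("content", "p"), ("lecture", "p"),
]

top_pos, top_neg, ambiguous = polarity_summary(tagged, top_n=2)
print(top_pos)
print(top_neg)
print(ambiguous)
```

Words that end up in the `ambiguous` list are the ones for which separate /p and /n supplementary columns are informative.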

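  26. Words with a high number of occurrences with ambiguous usage
    • Time
    • Exercise
    • Contents
    • Knowledge
    • Explanation
    • Training
    • Specific terms
    • Lecture
    Extracted words (抽出語) and number of occurrences (出現回数):
    Rank  Term                        Frequency
    1     理解 (understand)           583
    2     時間 (time)                 516
    3     思う (think)                488
    4     インシデント (incident)     387
    5     演習 (exercise)             363
    6     内容 (content)              351
    7     対応 (response/handling)    316
    8     知識 (knowledge)            307
    9     感じる (feel)               254
    10    ログ (log)                  208
    11    事前 (in advance)           183
    12    実際 (actual)               182
    13    説明 (explanation)          175
    14    学習 (learning)             171
    15    部分 (part)                 168
    16    多い (many)                 160
    17    もう少し (a little more)    154
    18    受講 (attending a course)   153
    19    セキュリティ (security)     151
    20    報告 (report)               149
    21    流れ (flow)                 148
    22    発生 (occurrence)           143
    23    ありがとう (thank you)      141
    24    研修 (training)             135
    25    非常 (very)                 134
    26    勉強 (study)                134
    27    業務 (work duties)          130
    28    行う (carry out)            127
    29    解析 (analysis)             124
    30    情報 (information)          124
    31    具体 (concrete)             122
    32    用語 (term)                 119
    33    難しい (difficult)          107
    34    グループ (group)            98
    35    参加 (participation)        98
    36    分かる (understand)         98
    37    自分 (oneself)              95
    38    良い (good)                 94
    39    講義 (lecture)              93
    40    必要 (necessary)            93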

  27. Response patterns for each question
    [Figure: panels for Skill improved, Understanding, Explanation of lecturer, Adequate Speed ?, Supports, Responses to Questions.]

  28. Step 3: Project the tagged words onto the MCA individual map.
    "Explanation" and "content" are characterized by negative expressions (successfully separated).
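
One common way to place a supplementary category on the individual map is to plot the mean point of the individuals who carry that category. The Python sketch below illustrates that idea only; the coordinates, respondent IDs, and indicator rows are hypothetical, and in practice the coordinates would come from the MCA output (for example from GDAtools in R), which is not called here.

```python
def category_mean_points(coords, indicators, columns):
    """For each 0/1 column, average the MCA coordinates of the individuals coded 1.

    Returns {column name: (mean coordinates, number of individuals)}.
    """
    means = {}
    for j, name in enumerate(columns):
        members = [coords[rid] for rid, row in indicators.items() if row[j] == 1]
        if not members:
            continue  # nobody carries this tagged word
        n_axes = len(members[0])
        mean = [sum(p[a] for p in members) / len(members) for a in range(n_axes)]
        means[name] = (mean, len(members))
    return means


# Hypothetical coordinates of individuals on the first two MCA axes.
coords = {
    "r1": (0.5, -0.25),
    "r2": (-0.5, 0.25),
    "r3": (1.5, 0.75),
    "r4": (1.0, 0.5),
}
columns = ["explanation/n", "content/n", "content/p"]
# Hypothetical tagged-word indicators (one 0/1 entry per column).
indicators = {
    "r1": [0, 0, 1],
    "r2": [0, 0, 1],
    "r3": [1, 1, 0],
    "r4": [1, 0, 0],
}

means = category_mean_points(coords, indicators, columns)
for name, (point, n) in means.items():
    print(name, point, n)
```

Keeping the count next to each mean point matters here, because a tagged word used by only a few respondents gives an unstable position on the map.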

  29. Interim Summary and Future Issues

  30. Interim Summary and Future Issues
    • As shown above, the results suggest that entering words extracted by text mining from the free-text responses as supplementary variables in MCA allows the free-text portion to be analyzed together with the categorical variables.
    • They also suggest that text mining can be used not only to extract words but also, by tagging them with machine learning, to enable more detailed analysis.
    • The key issue to be addressed is whether workshop participants can be encouraged to respond to the free-text questions.
    • Since the distribution of congratulatory responses is highly skewed, we would like to deepen the analysis by using CSA and other methods.

  31. Summary by charts
    [Figure: summary flow chart. Recoverable labels:]
    • Questionnaire (ID, Var1, Var2, …, Varn): MCA, MCA map.
    • Open-ended free-text answers: KH Coder / text mining, frequency list of words, KWIC concordance, co-occurrence network map, CA map.
    • Step 0: analyze separately; analyze the text by KWIC, referring to the maps.
    • Step 1: SDA with supplementary variables.
    • Step 2: ABSA / machine learning, tagged words, GDA/SDA.

  32. Acknowledgments

  33. Acknowledgments
    • This paper would not have been possible without the machine learning (ABSA) run by Kazuya Ohata, NICT's 2022 RA; thank you again for the co-authored paper and the poster session presentation at the Natural Language Processing Conference NLP2023 in March 2023.
    • The reporter's research on multiple correspondence analysis is also supported by Grant-in-Aid for Scientific Research (KAKENHI) 20K02162, "Research on Categorical Data Analysis Methods Focusing on Geometric Arrangement of Data". We would like to express our gratitude for the support. https://kaken.nii.ac.jp/ja/grant/KAKENHI-PROJECT-20K02162/

  34. Thank you for your attention.
    Questions and suggestions are
    welcome.
    [email protected]

  35. MEMO