Sunny Rai on Using Large Language Models to Understand the Depiction of Shame and Pride in Bollywood versus Hollywood

Rai and Rajagopalan explore the cross-cultural analysis of shame and pride in film

SHRUTI RAJAGOPALAN: Welcome to Ideas of India, a podcast where we examine academic ideas that can propel India forward. My name is Shruti Rajagopalan, and we are kicking off the 2025 job market series, where I speak with young scholars entering the academic job market about their latest research on India. 

Our second scholar in the series is Sunny Rai, who is a postdoctoral fellow at the Department of Computer and Information Science at the University of Pennsylvania. She received her Ph.D. in Computer Engineering from the University of Delhi. 

Her research focuses on misinformation, mental health and cross-cultural variations in human language. We spoke about her co-authored job market paper titled, Social Norms in Cinema: A Cross-Cultural Analysis of Shame, Pride and Prejudice. We talked about depictions of shame and pride and heroism in Indian versus American films, the challenges with textual analysis of a visual medium, and much more.

For a full transcript of this conversation, including helpful links of all the references mentioned, click the link in the show notes or visit mercatus.org/podcasts.

Hi, Sunny. Welcome to the show. It’s such a pleasure to have you here.

SUNNY RAI: Thank you so much for having me. I’m super glad to be on this podcast.

RAJAGOPALAN: You might be our first computer scientist, I think. We’ve had all sorts of people. We’ve had historians, economists, geographers, political scientists, every kind. But I think you might be our first computer scientist. It’s very, very exciting. I feel like we have expanded.

RAI: I feel honored to be part of this group. Of course, my research basically focuses on expanding these computational methods for social science.

RAJAGOPALAN: No, the reason I really enjoyed reading your paper and we invited you here is you’re looking at social norms in cinema. The listeners of this podcast know that I’m a little obsessed with film. That’s an obvious fit. You’re more broadly looking at how can we use these large language models and other tools, to look at cross-cultural behavior or cross-cultural norms. 

You and your co-authors [Khushang Jilesh Zaveri, Shreya Havaldar, Soumna Nema, Lyle Ungar, Sharath Chandra Guntuku] in this particular paper mine the subtitles of more than 5,000 Bollywood and Hollywood movies. What you’re studying is how these movies portray social emotions of shame and pride. In particular, you try to triangulate who’s feeling these emotions, who’s targeted, what kinds of actions are triggered, and so on. By combining behavioral science and psychology and looking at language as one part of your toolkit, and also with the help of large language models, which can now look at a lot of material very quickly, you quantify these cross-cultural differences.

This was an absolutely fascinating thing to look into. I was terrified when we saw a PhD in computer science. I was like, “Oh, my God, is this going to be something I don’t understand at all?” It was super fun to read the paper. Let me give away the headline findings before you get into it. What you find is, as one would expect, the Bollywood dialogues talk about shame far more than you see in films in the United States. Shame in Hollywood, when you do perceive it, is very self-focused, whereas in Bollywood movies, it’s very specific. It’s very tied or gendered.

The way women and men are shamed is quite different. The sources are also different, whether it comes from family or society and so on and so forth. This is just absolutely fascinating. These are things I would have expected, but it’s still very cool to see it in this particular format. Thank you for coming.

Shame and Pride in Film

RAI: Thank you so much, Shruti, for that wonderful summary. You have covered almost everything. Let me start with what got me interested in social norms. I have a super curious five-year-old nephew. One day, I was watching him scrolling cartoons on YouTube. I noticed that it was all nice. He was having fun. But very quickly, I saw that some of the characters and stories were filled with harmful stereotypes, things we definitely wouldn’t want our kids to pick up.

That’s when I thought, what if we build a system to detect social norms depicted in these videos? That way, it would be easier for parents to identify videos, books, or content that they want their kids to watch, and it will solve so many problems for us. Soon, I realized that it’s not that simple. Characters don’t really come out and say the norms they are teaching. Instead, these norms are hidden in the way they talk, act, and interact.

That got me thinking: How do we teach a machine to pick up on this unspoken social knowledge? Like you rightly said, the latest language models like ChatGPT are incredibly good at understanding human language. Most of us have already used them and are in awe of this technology. Why don’t we just ask ChatGPT to give us the norms depicted in the videos?

In some cases, yes, you can definitely do that if you are interested in norms of Western society. These models do not do so well when it comes to underrepresented cultures and their training data. That’s where the problem arises. The real research question then became, How can we teach machines to identify social norms across different cultures? That’s a very interesting question. That made me wonder, How do we ourselves learn social norms? What teaches us that this behavior is acceptable and this behavior is not acceptable?

We figure out norms by observing how people react to our behaviors. If we get praised for our behavior, we are more likely to do it again. If we are punished, we learn that we should not repeat this if we do not want to bear the same consequences again. In our research, we focused especially on these sanctions to uncover social norms, and it worked. We tested our approach on two very different cultures: Hollywood in America and Bollywood in India. These cultures are very different, and the norms that we uncovered aligned with known cultural tendencies.

Hollywood norms were more about self-accountability and competence, whereas Bollywood norms were more about family honor and gender roles for women. That was something which was very interesting. You know what? Like I said in the beginning, it was my nephew who motivated me to study social norms. You would be surprised to learn that the World Bank team is actually deploying this approach to detect harmful gender stereotypes in short videos and movies. They are trying to implement this approach for developing nations to understand what kind of content is shown to young women or girls in those countries.

RAJAGOPALAN: Here, before we get into how this is deployed, can you give us an example? All of us consume a lot of. . . at least I consume a lot of movies. What would be a stereotypical Hollywood film and an example of how they depict shame? Bollywood, I’m guessing, there are a whole range of examples of how we shame women. 

RAI: Yes, of course. Take the example of the movie Kabir Singh, which was popular for all the wrong reasons. In this movie, there is a scene where Preeti, played by Kiara Advani, introduces her boyfriend, Kabir Singh (Shahid Kapoor), to her family. Her father calls her shameless for bringing dishonor to the family. He even makes himself clear that he will decide whom she marries.

That’s something very commonly depicted in Bollywood movies. Likewise, it’s not really only about the gender norms. We found that, for example, in the movie Dear Zindagi. Again, it’s a very nice movie, and the lead actress is Alia Bhatt. In one of the scenes, she is talking about the challenges of mental illness and how even her immediate family feels shame or avoids discussing mental disorders and providing the option for going to therapy. These are very subtle cues, and this is how things come up in these movies. I would also like to take the name of Pushpa 2, which is. . .

RAJAGOPALAN: Oh, I haven’t watched that yet and probably will not. 

RAI: It made so much money. It was one of the very big blockbusters. When I was watching that movie, by then I had done this paper, and I was like, “This movie is such a huge mine of social norms.” The lead character is constantly being sanctioned and shamed for lacking his father’s name. Likewise, there were women characters in the movie who were maybe abused by other characters, and were constantly shamed for that happening, and their family members were being shamed. I felt that this is something which is still happening. You would assume that we have moved on. We are, right now, in 2025. We might have stopped shaming women for getting raped, but that’s not the case even in today’s world.

RAJAGOPALAN: At the very least, start shaming the rapists, which also you don’t see all the time.

RAI: Yes, exactly. Even today, it’s the same thing that’s happening, and that’s very sad.

RAJAGOPALAN: What are some examples of movies in the West? How do they depict shame? What would be a classic example?

RAI: In Hollywood movies, shame happens when you have failed to do a certain duty. For example, you were not honest, people discovered you were not honest, and therefore, you would be shamed. You would be shamed if you are incompetent. You are not able to make a good living, honest living; that will bring you shame. For women, it’s about social appearance. If you have a certain type of appearance, certain social etiquette, or a certain way about yourself, then you would be praised. In every society, there is this thing that if women behave more openly, it’s something that troubles people around them. Again, you do not want women to have very open opinions, I guess. 

They’re just different reasons. It’s mostly because of promiscuity. Women are more shamed in Hollywood.

RAJAGOPALAN: They’re slut-shamed?

RAI: Slut-shaming, yes.

RAJAGOPALAN: What would be an example of men being ashamed for not doing their duty or something like that? 

RAI: For example, if you went on a date night but you didn’t call back after the date. You were trying to avoid confrontation. That’s considered shameful. You should be honest in your interaction. It’s not a very strong shame marker, but at least there will be subtle shame.

Likewise, if you are selling poor-quality goods, that’s again something that will show up in the movie, and people will shame you for that kind of behavior. It’s again about ethics and honesty, and competency. These are very central themes when it comes to what is considered socially appropriate behavior in Hollywood movies.

RAJAGOPALAN: I guess the other difference in Hollywood movies, it’s a lot about the individual and what they do and don’t do. Whereas in Bollywood movies, it’s a lot to do with your relationship with someone else. Even the most famous example I can think of shaming men is Deewaar, right? You write, Mera Baap Chor Hai [My father is a thief]. You tattoo that on the hand. The kid has done nothing wrong, but that’s what you do. It’s usually in relation to, you bring shame upon the family for something that you may not even have done, but just simply because you’re in a particular social structure or milieu or something like that.

RAI: Yes, exactly. Yes, that’s completely true. That’s what we also discovered. These two societies differ so much in the way they are shaming people, as well as the reasons they are shaming people. Again, like you pointed out, India has a very interdependent culture. What children are doing will influence parents’ honor. What parents are doing will, in turn, influence children’s honor and their status in society.

Whereas in the case of Hollywood, it’s not the case. What parents have done is completely different. Maybe you benefit from it, but you will have to, again, choose your own goals and pick your own achievements to be respected by society. Honor is not so interconnected in Hollywood. Whereas in the case of Bollywood, yes, it is very interconnected.

Teaching Machines Norms 

RAJAGOPALAN: This, of course, as you point out, is a reflection of the two cultures. Now, the setup is very interesting. It’s very intuitive. All the examples you give make a lot of sense. Now, my question is, how do you go about testing something like this? 

RAI: I’m glad you asked this. We actually designed a study with two different approaches because they give us two very different kinds of insights. These large language models, normally, we definitely want something more explainable to validate the findings that we are obtaining through these black boxes.

Our first approach is the explainable component of our approach. It tells you the kind of words people are using around shame and pride expression. For example, the dictionary we use for the vocabulary method is called LIWC, Linguistic Inquiry and Word Count. It basically categorizes words into broader cognitive categories. For example, happiness will be called a “positive emotion,” and there would be so many words which are depicting happiness. They will all be grouped under positive emotion. 

Likewise, there are categories like first-person pronoun and second-person pronoun. When we are analyzing shame-related expressions, we can actually count the number of times these first-person pronouns have occurred in their vicinity. This can tell us whether shame is self-oriented. Likewise, if there are more second-person pronouns, then we can say that shame is outward-oriented. You are shaming someone else, you are regulating someone else’s behavior. But in the former case, you are regulating your own behavior. This approach helps us map the style and patterns of how sanction is expressed across cultures.
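Editor's note: the counting idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline. LIWC is a licensed dictionary, so the three word lists below are small hypothetical stand-ins, and the function name and the one-line window are assumptions made for the example.

```python
import re

# Hypothetical mini word lists standing in for LIWC categories.
SHAME_WORDS = {"shame", "ashamed", "shameful", "shameless"}
FIRST_PERSON = {"i", "me", "my", "mine", "myself"}
SECOND_PERSON = {"you", "your", "yours", "yourself"}


def tokenize(text):
    # Lowercase and keep alphabetic runs only, so "I'm" yields "i" and "m".
    return re.findall(r"[a-z]+", text.lower())


def pronoun_counts_near_shame(lines, window=1):
    """Count first- and second-person pronouns in subtitle lines that fall
    within `window` lines of any line containing a shame word."""
    tokenized = [tokenize(line) for line in lines]
    hits = [i for i, toks in enumerate(tokenized) if SHAME_WORDS & set(toks)]

    counts = {"first_person": 0, "second_person": 0}
    seen = set()  # avoid counting a line twice when windows overlap
    for i in hits:
        for j in range(max(0, i - window), min(len(lines), i + window + 1)):
            if j in seen:
                continue
            seen.add(j)
            for tok in tokenized[j]:
                if tok in FIRST_PERSON:
                    counts["first_person"] += 1
                elif tok in SECOND_PERSON:
                    counts["second_person"] += 1
    return counts


if __name__ == "__main__":
    self_oriented = ["I lied to them.", "I am so ashamed of myself.", "Let's go."]
    other_oriented = ["You have no shame.", "Your father trusted you."]
    print(pronoun_counts_near_shame(self_oriented))
    print(pronoun_counts_near_shame(other_oriented))
```

A higher first-person count around shame words would point toward self-oriented shame, and a higher second-person count toward shame directed at someone else, which is the contrast Rai draws.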

The second is the prompting method. Here we take advantage of powerful language models like GPT-4 to understand why shame and pride are expressed. The first approach is about how; the second approach is about why. If someone says, “I feel proud” or “I feel ashamed,” we basically give the surrounding dialogues to the model, and ask, “Could you tell the reason, based on the given dialogues, why this person is feeling pride or shame?”

Together, these two approaches give us both a surface view of the language—that is, what words people use—and the deeper story revealing what motivates those feelings.
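Editor's note: the prompting method can be illustrated by showing how such a question might be assembled from a subtitle excerpt. This is a sketch under assumptions: the exact prompt used in the paper is not given in this conversation, the function name and the two-line context window are hypothetical, and the step of sending the prompt to a model such as GPT-4 is deliberately left out.

```python
def build_reason_prompt(dialogue, index, emotion, context=2):
    """Build a question asking why a character feels `emotion`.

    `dialogue` is a list of subtitle lines, `index` is the position of the
    line where the emotion is expressed, and `context` is how many lines
    before and after that line are shown to the model (an assumed default).
    """
    if emotion not in ("shame", "pride"):
        raise ValueError("emotion must be 'shame' or 'pride'")

    start = max(0, index - context)
    end = min(len(dialogue), index + context + 1)
    excerpt = "\n".join(
        f"{n + 1}. {line}" for n, line in enumerate(dialogue[start:end])
    )
    target = dialogue[index]
    return (
        "Here are consecutive lines of dialogue from a film:\n"
        f"{excerpt}\n\n"
        f'In the line "{target}", a character expresses {emotion}. '
        "Could you tell the reason, based on the given dialogues, "
        f"why this person is feeling {emotion}?"
    )


if __name__ == "__main__":
    scene = [
        "Where were you?",
        "I sold the house.",
        "I feel ashamed.",
        "Father trusted you.",
        "Leave now.",
    ]
    print(build_reason_prompt(scene, index=2, emotion="shame", context=1))
```

The free-text reasons returned by the model are what would later be grouped into themes such as accountability or family honor.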

RAJAGOPALAN: Now, what do you find in the difference between these two approaches? Did both of them confirm each other? Or what did you find when you tested for this?

RAI: Both of them confirmed each other. You will see that, in Bollywood, we saw an increased use of social references. For example, “family”, “we.” There is a collective sense when there is a conversation about the social emotion, whereas in the case of Hollywood, you would have noticed that there is a very high correlation with first-person pronouns. There is a very high correlation with anxiety and guilt. People actually feel bad when they do something wrong. It’s mostly about something that is not ethically correct.

Whereas in case of Bollywood, there is anger. There is a female association. The thing is that, in the case of Bollywood, shame is more like a regulatory component. It is used to control others’ behavior. Whereas in case of Hollywood, it’s more about, it was not the morally right thing to do, and therefore, you are feeling bad. You are feeling guilt and anxiety about it. Whereas in case of Bollywood, people are expressing anger, and people are calling you out because you have done something, and the idea is to make them aware of the consequences so that the same thing doesn’t happen again.

When we did the prompting approach, we found these reasons. When we clustered them together, we found a similar pattern. In the case of Hollywood, it was more about accountability, incompetence, and social etiquette. Whereas in case of Bollywood, it was more about respecting elders, gender roles, son’s duty towards parents, and things like that. We did find that the results we found were aligning with each other.

RAJAGOPALAN: Except slut-shaming, which is just universally an external sanction, and that’s just bad for women all over the world [chuckles]. What you’ve done is super impressive. You’ve actually managed to find subtitles of more than 5,000 films and run them through your model. Films are fundamentally a visual medium.

Textual Analysis in a Visual Medium

Now, it seems to me like there’s an automatic bias when you rely exclusively on dialogue. There are going to be some films which are very verbose. If something is written by Aaron Sorkin or Salim–Javed, it’s going to have lots and lots of dialogue, just nonstop dialogue. Whereas there might be other kinds of films which are just much more screenplay-driven, background music-driven. They’re not as verbose, or at least, not all the members are that verbose. What is a good way of thinking about that, and how do you control for that?

RAI: That’s a very good question. I completely agree with you that our approach can capture norm violations only if they are verbally expressed. If someone is communicating sanction through their glance or body language, it will be completely missed by our model. To overcome this, we did think about using movie scripts because they come with stage directions. It basically tells how this character is feeling, what does the surrounding look like for this character, and what do other characters think about this character? The things that we infer from the scene, they are all written in the script. The only challenge is that not many movie scripts are available online.

RAJAGOPALAN: Most of the Indian films don’t even have a movie script sometimes, let alone make them available later.

RAI: Ideally, we would have loved to use movie scripts, but those were not available. In future work, we would definitely love to integrate visual cues. Now we have multi-modal language models that can work with images as well as dialogues. That’s something we are looking forward to.

The Trouble with Subtitles and Scripts

RAJAGOPALAN: That would be great. What about songs? Songs do get picked up in subtitles. Bollywood is just full of songs, and there’s a lot expressed through songs. There’s also shame and things like that. Famously, you have Laaga Chunari Mein Daag or something like that. Do you also include songs as part of dialogues in the subtitles?

RAI: No, we do not. I agree that’s. . .

RAJAGOPALAN: You have to do a separate one just on songs, I think. That should be the next paper [chuckles].

RAI: Yes, I’ll tell you why. Like you said, songs are very metaphorical in nature. It’s not really like a literal conversation that’s happening between characters. If you want to understand songs, you need to have an additional component in your method to work with creative text. In fact, I did my PhD on metaphors. I love metaphors [chuckles]. Yes, I definitely appreciate this component. Songs do have very deep meanings, and they do have very deep messages.

In fact, songs all around us, whether they are in movies or not, they do carry a deep social meaning. What are the behaviors that you should do? For example, if you think of wedding songs, they also have so many customs and to-dos for women—“This is how you should be behaving once you get married.” That’s a very interesting avenue, but in this paper, we only focused on spoken dialogues.

RAJAGOPALAN: Yes, and I wonder if the songs, because they’re metaphorical, can actually compensate for the lack of a visual component or the lack of a screenplay that you have because there is some broader metaphor that’s at least captured through the song, though it’s not captured through the screenplay. It would be super interesting to see that in future work. Now, the next part is, you use subtitles. Basically, even though this is a cross-cultural analysis, the cross-culture is boiled down to subtitles which are in English, if I understand this correctly.

Now, the trouble is, and I’ve spent the last 18 years in the United States watching Hindi films, and so they’re usually subtitled. The subtitles are horrible, until very recently. I think last year or maybe a little bit before that, Nasreen Munni Kabir came on the podcast, and she’d recently started subtitling, and she does it amazingly well. We were just talking about how bad movie subtitles were. My favorite bad subtitle of all time is from this movie called Mangal Pandey: The Rising, where Kailash Kher is singing this rousing title song, and he says, “Mangal, Mangal, Mangal, Mangal,” and the subtitle is “Tuesday, Tuesday, Tuesday, Tuesday.”

I just laughed out loud when I saw that. I was telling Nasreen Munni Kabir, I was just grateful it didn’t say “Mars, Mars, Mars, Mars” instead of Tuesday. It can only get worse as the subtitling is done by machines. How do you solve for just automatic biases here? A lot of subtitling, especially post-Google and YouTube, is just done by their model, which was also terrible. It wasn’t done by humans. What is a good way to think about this or solve for this?

RAI: What you are referring to is ambiguity.

RAJAGOPALAN: Just plain incorrect.

RAI: [chuckles] Data quality is one of the biggest challenges when it comes to training models. Like you might have already heard, garbage in, garbage out. If you’re teaching a model “Tuesday, Tuesday,” it will also output “Tuesday, Tuesday” only. It will not, by default, learn that it’s the name of a person, a character. In theory, the cleanest solution would be to buy the movies and hire people to carefully transcribe all the dialogues. Basically, you do your own subtitling. But that just does not scale. Human transcription is expensive and slow. In fact, it could take someone two to four hours to label these situations.

RAJAGOPALAN: Per movie, and you have 5,000 movies, of which half of them are Bollywood movies. That’s a big dataset.

RAI: Yes. You would wonder, “What about automatic transcription?” Like you said, these subtitles are probably auto-generated, and that’s why they are bad. With movies, because there are multiple characters speaking sometimes at once, there are a lot of errors because of that as well. The model is not able to clearly segment dialogue, which was delivered in parts by one character, so it just jumbled up everything.

Ideally, we would have loved to have the actual transcription in the original native language so that we are not relying on these machine translations, but subtitles are the only things that are widely available, and they are mostly available in English because a person who knows Hindi or wants to watch a Hindi movie is probably aware of the Hindi language and doesn’t need Hindi subtitles.

It’s mostly the language which is more commonly spoken in the world. While subtitles are not perfect, we put in safeguards to overcome the challenges and these kinds of errors. We always manually review a random set of samples to check quality. In this particular case, since we were only interested in dialogues around shame and pride, the dataset was not that big, and we were able to manually review everything ourselves. That is how we made sure that we are not capturing any “Tuesday, Tuesday.”
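Editor's note: drawing a random set of excerpts for manual review, as described above, might look like the following. This is only a sketch. The function name, the fixed seed, and the sample size are assumptions for illustration and are not taken from the paper.

```python
import random


def sample_for_review(excerpts, k, seed=0):
    """Return up to `k` excerpts chosen uniformly at random without
    replacement. A fixed seed makes the review sample reproducible."""
    rng = random.Random(seed)
    k = min(k, len(excerpts))
    return rng.sample(excerpts, k)


if __name__ == "__main__":
    excerpts = [f"excerpt {i}" for i in range(100)]
    for item in sample_for_review(excerpts, k=5):
        print(item)
```

When the filtered dataset is small enough, as Rai says it was here, the sample size can simply be the whole set, which amounts to reviewing everything by hand.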

RAJAGOPALAN: It’s not like “Mangal, Mangal,” “Tuesday, Tuesday.” No, I think you’re absolutely right. You’re looking for something so specific, and it’s a very clear, precise emotion with very specific words associated with it. If something goes wrong in translation, you can always correct. I’m also more broadly curious. You have this conundrum. If you rely too much on dialogue and too much on subtitling, there’s one kind of bias.

Now, how do you correct for under-representation of certain kinds of shame? Recently, it was the 50th year of Sholay, for instance. Sholay has two leading women. One of them is Hema Malini, who is Basanti, who is a working woman, and just always colorful and chatty and so on. The other one is Jaya Bachchan, and she, Radha, she’s a widow. She has maybe six lines in the entire film. It’s also very clearly shown how colorful and happy and chirpy she was before she became a widow. Then she’s just in all white.

Now, there is this social shame associated with being a widow. You can’t be flamboyant. You can’t sing. You can’t have color on you. There’s a whole bunch of things going on, but it never gets captured in dialogue. Except maybe when they’re telling the story of Radha before she became a widow. There are four sentences or five sentences there, and that’s it.

What is a way to think about these sorts of things more broadly? I’m not trying to say that your one paper has to solve for everything, but when you look at something which is so specific, like shame—more broadly, how does one think about capturing all of this? Are there things you look for in dialogue, like looking for treatment of widows or treatment of older women, or treatment of children born out of wedlock or something very specific?

RAI: Yes, that’s a very, very good point. In fact, we are completely aware that explicit keyword matching of shame and pride only covers a very small set of norm violations. There are so many situations where norm violation might be expressed through other emotions or might be expressed without saying any emotion. Maybe you are just slapping someone, and that’s the sanction. That’s where you violate the norm. In fact, domestic abuse is a form of sanction, also. There is definitely a very strong limitation when you are looking for explicit words like shame and pride.

We are trying to overcome this by trying to infer implicit sanctions. For this purpose, again, we are relying on this. We are relying on large language models, and we are basically trying to teach them what does shame look like or what does embarrassment look like. We have this big list of emotions that might cause sanction. We are explaining the definition. We are giving some examples that “this” would be considered a case of shame, or “this” would be considered a case of guilt, to basically make it understand what is something that is caused by external evaluation and what is something that is caused by internal evaluation.

That is something ongoing. In future, I hope that we will be able to figure this out and we will also be able to quantify how many norm violations can be captured by explicit keyword matching versus how many are missed out. Right now, we don’t know what is the coverage of the approach. That’s also another challenge.
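Editor's note: the coverage question raised above could be measured roughly as follows if a hand-labeled set of excerpts were available. Everything here is hypothetical: the keyword pattern, the function name, and the example labels are made up for illustration, and no such figure is reported in this conversation.

```python
import re

# Hypothetical explicit keyword pattern for shame and pride expressions.
KEYWORDS = re.compile(r"\b(shame\w*|ashamed|pride|proud\w*)\b", re.IGNORECASE)


def keyword_coverage(labeled):
    """Fraction of hand-labeled sanctions that explicit keywords would catch.

    `labeled` is a list of (text, is_sanction) pairs, where `is_sanction`
    is a human judgment that the excerpt sanctions a norm violation.
    Returns None when there are no labeled sanctions.
    """
    sanctions = [text for text, is_sanction in labeled if is_sanction]
    if not sanctions:
        return None
    caught = sum(1 for text in sanctions if KEYWORDS.search(text))
    return caught / len(sanctions)


if __name__ == "__main__":
    labeled = [
        ("You should be ashamed.", True),
        ("Get out of my house.", True),      # implicit sanction, no keyword
        ("Nice weather.", False),
        ("Have you no shame?", True),
        ("He slapped her and walked away.", True),  # sanction by action
    ]
    print(keyword_coverage(labeled))
```

The excerpts the pattern misses are exactly the implicit sanctions Rai describes, such as a slap or an order to leave, which carry no shame or pride word at all.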

Self-Shaming vs. Other-Shaming

RAJAGOPALAN: Yes. The reason I asked the question is not to say that the paper has to have all of this completely covered and comprehensive, but more about what you’re talking about is so contextual and so socially contextual that, at some point, when you’re extrapolating, if we manage to remove that context completely, you’re going to get biased one way or another way. It’s going to be over- or underrepresented.

I guess it’s like any other complicated problem. You need to write multiple papers and look at it in multiple different ways. What was something that surprised you when you were looking at it? The gender thing doesn’t surprise either of us. The fact that young people are ashamed more than older people doesn’t surprise us. Something simple like policemen always show up five minutes late. That was pretty famous in the ’70s and ’80s movies. That doesn’t surprise us. Was there something that did surprise you when you were looking into this?

RAI: Yes. I think the one thing that surprised me was this self-shaming versus other-shaming. When you see the result, I think it becomes intuitive and you think, “Yes, of course, that is how it would be.” Before we did this paper, I wasn’t even aware that there could be these two versions of shaming or self-regulation. There is one version where you try to self-regulate rather than depending on society to tell you what is good or bad. Then there is another version where society is actually regulated so much by other people telling you what is appropriate and what is not appropriate. That is what determines what we will do in life or how we will behave with others.

That was something very surprising to me. In fact, that led me to explore this further. As of now, I’m also working on another paper where we are trying to look at how does an LLM behave when it comes to these kinds of situations. There are multiple objectives. There is self-objective and then there is other-objective. Now, who does it prioritize? Does it prioritize family over you, or does it prioritize you over work?

That’s something because these language models are mostly trained on Western data. They are being told to put yourself first and be moral and ethical. What about these cultural nuances where you are expected to sacrifice yourself or your family? You are expected to let go of your wellness because you want to provide for your children?

RAJAGOPALAN: Or give up your lover because your best friend is in love with her. That’s another classic love triangle trope, which is so annoying.

RAI: Yes, exactly. This is something we are actively analyzing. Very soon, maybe there will be another paper. I expect to release this work by the end of December.

LLM Alignment Needs a Culture Check

RAJAGOPALAN: I really look forward to reading that. I’ve been asking you all these questions about film, which is obviously what you’re looking at. But that’s also not what you primarily do. You could do this in any context. You could do this through newspapers. You could do this through surveys. You could do this through song. Now, if I ask you to put on your computer scientist hat. If you were advising people who were building some of these large language models, what would you advise them to do or not do?

AI safety is now a really big part of the conversation. Localization and making things contextual to local culture is, again, a very big part of the conversation. How can your research inform some of what’s happening there? That stuff is literally being built as we speak.

RAI: Yes, exactly. This whole domain is called norm discovery. Using these norms to align language models is also known as value alignment or LLM alignment. That is something that is actively pursued right now in the AI domain. Well, I started with the motivation that I will extract these norms, and I will also align language models. I will also fine-tune them to make them more culturally aware and more engaging for the target users. At the end of the paper, I realized that the more important thing is to characterize what you are teaching to large language models.

When we looked at this dataset, we found that, first of all, sanctioning is something very biased against the weaker sections of society. For example, women will be more shamed. Likewise, someone with lower income will be more shamed for the similar behavior or will be more sanctioned for the similar behavior. If you are teaching an LLM whether something is appropriate or not, it also needs to be considered, Is it for women? Is it for a peon? Is it for a CEO? Does it matter? Do people care if a politician is doing something inappropriate?

I think now, for alignment, we need to take a step back and carefully look at the data which we are feeding to this model. Are we teaching it to express social bias? If a woman is asking if she can go out and attend a concert, is it going to tell you that probably you should be back home by 8:00 PM? Does it say a similar thing to men? That’s something we need to be very cognizant of.

RAJAGOPALAN: Even between two women, so many Hindi films where the stepdaughter—or the Cinderella factor, as we call it—the niece or the orphan is treated so much worse than the other girls at home, and things like that. It’s the Seeta Aur Geeta phenomenon, as one would say. This stuff is really fascinating. I really look forward to reading your other work. This was a lot of fun for me to read. I was actively trying to think of examples.

If you ever want to do something on Bollywood songs, please come back and chat with us because the only thing I like more than Bollywood movies is probably Bollywood songs. It’ll be an enormous amount of fun.

RAI: Yes. It was super fun for me also. I loved the way you helped me understand some of the aspects, like the song thing, which I was not really focusing on. Maybe I will start looking at it again. Thank you so much.

RAJAGOPALAN: The song thing, it’s really funny. I was literally thinking of Laaga Chunari Mein Daag. It is such a phenomenal song. It’s raag Bhairavi, and Sahir Ludhianvi wrote the song, Manna Dey sang it. We forget the kind of shame they are talking about because the song is so fantastic. We sing this at competitions. On Indian Idol someone sings it, they get lots of claps or applause because it’s such a tough song to sing.

Actually, the content of the song . . . it’s not the best signal we’re sending out there. It’s amazing how one can think about some of these things. The songs are particularly interesting. But again, lyrics are such a tiny part of the song. I imagine you’re going to run into a lot of trouble even when you do that project, but it would be fun for me to see what you find.

RAI: Yes, of course. I think that’s what I also noticed in children’s videos. It would appear so harmless. Everybody is thinking, “What’s wrong with this depiction?” You will see that there is a group of boys, and they’re bullying a girl and not letting her play with them. Maybe there is always this woman who is depicted in the role of a stereotypical mother preparing food, packing lunches, and the father is nowhere [laughs] to be seen. I really felt that, “Is this what we are teaching to our kids? That this is how you will be behaving toward people around you?”

RAJAGOPALAN: You know, Sunny, you’re a lot younger than me, I believe. But I thought when we were growing up, every Hindi movie song had what I called the playful stalker. In the song, it’s like the men are harassing women. They’re pinching their cheeks, they’re pulling their dupatta, they’re pulling their hair. By the end of the song, the girl magically falls in love with them. All of us were raised by the movies to believe that this is how people fall in love.

If someone had done that in real life, you would have smacked them. You’re so right. It’s so bizarre, the content that some of us have grown up on, and maybe some of us have even internalized without realizing what it is we are consuming. Now, you have a second responsibility. You have to do this for kids, and you have to figure out how we do this in developing countries for young girls. You’re also simultaneously training neural networks and the large language models, which is like training a five-year-old in some sense.

Looking Ahead: A Final Reflection

RAI: Yes, exactly. I think now we are getting exposed to these technologies more and more. It is very important that we understand what these models will be expressing and how they will be expressing it because end users are so much in awe of these technologies. Some people actually think that whatever they are saying is right, and that’s how things should be. It’s very important that we know what we are teaching to these models and we evaluate what they are learning. Eventually, there should be some form of regulation on what kinds of things these technologies should and should not express.

RAJAGOPALAN: Yes. That’s why I’m looking forward to this research. I think you have your work cut out for you. You must have this long list of research projects. Thank you so much for coming on the show, Sunny, and talking to us about this research. This was a lot of fun for me to read.

RAI: Thank you so much for having me. I really enjoyed our conversation. Thank you.

About Ideas of India

Hosted by Senior Research Fellow Shruti Rajagopalan, the Ideas of India podcast examines the academic ideas that can propel India forward.