Earlier this month, I read in a Gawker article that Bill Cosby’s star on the Hollywood Walk of Fame was defaced. This got me wondering whether other people with stars on the Walk of Fame have been accused of crimes. So I went to Wikipedia’s list to see if I could glean anything. It turns out that there are far more Hollywood stars than I care to read about, so I decided to do some data scraping and analysis.
First I had to get all the names from the Wikipedia webpage. I omitted three entries:
- Bird Bacaw was the sole inductee under the “Zoology” category. It turns out Bird Bacaw does not have a Wikipedia page, and while I am sure Bird led a fascinating life, I don’t think knowing about him/her will help me answer my question.
- Mayor Tom Bradley was the sole inductee under the “Mayor” category. Again, I think that Mayor Tom Bradley is/was an interesting person, but since he is such an anomaly I decided to not include him either.
- The Dodgers were inducted under the “Special” category. It must be nice to be in a “Special” category (literally) all of one’s own.
After taking out these three, I was left with stars that were each assigned to one of the five main categories: Live Performance, Motion Pictures, Radio, Recording, and Television.
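A minimal sketch of this filtering step, written in Python for illustration and not taken from the original R analysis. The record layout (name, category) and the "Example Star" entries are hypothetical; only the category names and the omitted inductees come from the text above.

```python
# The five main categories named above.
MAIN_CATEGORIES = {
    "Live Performance",
    "Motion Pictures",
    "Radio",
    "Recording",
    "Television",
}

def keep_main_categories(entries):
    """Return only the (name, category) pairs whose category is a main one."""
    return [(name, category) for name, category in entries
            if category in MAIN_CATEGORIES]

# Hypothetical input: two placeholder stars plus two of the omitted entries.
entries = [
    ("Example Star A", "Television"),
    ("Tom Bradley", "Mayor"),
    ("The Dodgers", "Special"),
    ("Example Star B", "Recording"),
]

kept = keep_main_categories(entries)
print(kept)
```

Filtering on the category rather than on a hard-coded list of names means any other one-off category would be dropped the same way.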
I then created a list of the Wikipedia links for each star on the Hollywood Walk of Fame. After that, I scraped and cleaned the text from each star’s Wikipedia page. Lastly, I used regular expressions to learn about word frequencies for words dealing with crime, which I have visualized below. Not surprisingly, these distributions were all right-skewed.
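The clean-then-count part of that pipeline might look roughly like the sketch below. It is Python using only the standard library, not the original R code, and it works on an HTML string that has already been downloaded. The helper names (`TextExtractor`, `clean_text`, `count_word`) and the sample HTML are made up for illustration.

```python
import re
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect text nodes from HTML, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)

def clean_text(html):
    """Strip tags and citation markers, collapse whitespace, lowercase."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Joining with a space keeps adjacent paragraphs from fusing together.
    text = " ".join(parser.parts)
    text = re.sub(r"\[\d+\]", " ", text)  # drop markers such as [12]
    text = re.sub(r"\s+", " ", text)
    return text.strip().lower()

def count_word(text, word):
    """Count whole-word occurrences of `word` in already-cleaned text."""
    return len(re.findall(r"\b" + re.escape(word) + r"\b", text))

# Made-up HTML snippet, not a real Wikipedia page.
sample_html = (
    "<p>He was <b>convicted</b> in 1960.[3]</p>"
    "<script>var convicted = 1;</script>"
    "<p>Convicted again in 1962.</p>"
)
cleaned = clean_text(sample_html)
print(cleaned)
print(count_word(cleaned, "convicted"))
```

The word-boundary anchors (`\b`) keep "convicted" from matching inside longer words, and skipping script content avoids counting text a reader never sees.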
Disclaimer: I am just looking at word frequencies in Wikipedia pages. If someone’s Wikipedia page includes the word “murder” several times, that could indicate that they portrayed someone in a film that dealt with murder (example: Angela Lansbury), that they are an anti-violence advocate, or some other explanation.
Terms of Interest
In particular, I looked into words related to accusations (alleged, controversy), legal proceedings (charged, convicted, statutory, legal, illegal, police, sentenced), and specific crimes (murder, harassment, assault, rape).
I looked into the root words “allege” and “controversy”, and plotted histograms of the number of times each of these words or its variants is included in the Wikipedia page for each of the stars on the Walk of Fame.
Here we can see right-skewed distributions for the number of times that “alleged” and “controversy” showed up in the Wikipedia pages.
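Counting a root word together with its variants, and then tallying the per-page counts that a histogram would display, could be sketched as follows. This is Python for illustration only; the patterns are my guess at sensible variants, not the expressions used in the original analysis, and the three pages are invented.

```python
import re
from collections import Counter

# Assumed variant lists. A bare stem such as "alleg" would also match
# unrelated words like "allegiance", so the endings are spelled out.
ROOT_PATTERNS = {
    "alleged": re.compile(
        r"\balleg(?:e|ed|es|edly|ing|ation|ations)\b", re.IGNORECASE
    ),
    "controversy": re.compile(
        r"\bcontrovers(?:y|ies|ial|ially)\b", re.IGNORECASE
    ),
}

def count_root(text, root):
    """Count occurrences of a root word and its listed variants in text."""
    return len(ROOT_PATTERNS[root].findall(text))

def count_distribution(counts_per_page):
    """Map each mention count to the number of pages with that count."""
    return Counter(counts_per_page)

# Invented page texts, not real Wikipedia content.
pages = {
    "Page A": "The allegations were denied. He allegedly left. A controversial film.",
    "Page B": "No relevant terms here.",
    "Page C": "The controversy faded.",
}

alleged_counts = [count_root(text, "alleged") for text in pages.values()]
histogram = count_distribution(alleged_counts)
print(alleged_counts)
print(sorted(histogram.items()))
```

A right-skewed distribution shows up here as most of the mass sitting at low counts with a thin tail of pages at higher counts.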
I decided to look into who had 5 or more mentions of alleged. Here’s who popped up:
- Paula Abdul (5)
- James Brown (8)
- Bill Cosby (10)
- New Kids on the Block (5)
- Mickey Rooney (5)
- Lizabeth Scott (6)
Here are the stars who had 5 or more mentions of controversy:
- Muhammad Ali (5)
- Marlon Brando (6)
- Charlie Chaplin (10)
- Elia Kazan (8)
- The Monkees (6)
- Martin Scorsese (5)
- The Smothers Brothers (5)
These allegations and controversies range from improprieties in reality TV show competitions (Paula Abdul) to supposed communism (Charlie Chaplin).
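Both lists above come down to a threshold filter on per-page counts. A small Python sketch, for illustration only: the function name is hypothetical, the first three counts are copied from the “alleged” list, and “Example Star” is a made-up entry below the cutoff.

```python
def stars_at_or_above(counts, threshold):
    """Return (name, count) pairs with count >= threshold, sorted by name."""
    return sorted(
        (name, n) for name, n in counts.items() if n >= threshold
    )

alleged_mentions = {
    "Bill Cosby": 10,    # from the list above
    "James Brown": 8,    # from the list above
    "Mickey Rooney": 5,  # from the list above
    "Example Star": 1,   # hypothetical, below the cutoff
}

flagged = stars_at_or_above(alleged_mentions, threshold=5)
print(flagged)
```

The cutoff of 5 is a judgment call; lowering it lengthens the list quickly because of the right skew.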
Legal Proceedings Terms
I looked at several different root words relating to legal proceedings, but let’s focus on sentenced and convicted.
Here are the stars who had the word sentenced appear at least 3 times in their Wikipedia page:
- Chuck Berry (6)
- Rory Calhoun (3)
- Ronald Reagan (4)
- Wesley Snipes (4)
- Kiefer Sutherland (3)
Here are the stars who had the word convicted appear at least twice:
- Chuck Berry (2)
- James Brown (4)
- Sean “Diddy” Combs (2)
- Spade Cooley (2)
- Broderick Crawford (2)
- The Doors (2)
- Farrah Fawcett (2)
- George Kennedy (2)
- Lizabeth Scott (3)
When I was playing around with these data, I came up with a few questions that I didn’t get around to:
- How can I build an accurate classification system to identify people with Wikipedia pages who have been convicted of crimes?
- Where are stars born? Are some birth states over-represented on the Walk of Fame?
- Can we find a way to predict if/when someone will receive a star on the Walk of Fame?
Knowing how many times the word “convicted” is used in a Wikipedia page is an imperfect measure of whether or not a person was actually convicted of a crime. In other words, my gut is telling me that this measure has low specificity and probably also low sensitivity. However, this exercise did bring some interesting cases to my attention. We have a few people who are included in more than one list. In particular, Lizabeth Scott has a fascinating story that seems like it could have been the basis for the film L.A. Confidential. I also got to practice my scraping skills and work on some basic text analysis. If you want to see how I did this analysis, or perhaps add to what I did, my RStudio .Rmd file is up on GitHub.
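To make the sensitivity and specificity point concrete, here is a small Python sketch of how the two could be computed if hand-checked labels were available. Everything in it is hypothetical: the word counts, the true labels, and the cutoff of 2 are invented, and nothing here is a result from the analysis.

```python
def sensitivity_specificity(predicted, actual):
    """Return (sensitivity, specificity) for two parallel lists of booleans.

    sensitivity = TP / (TP + FN): share of truly convicted people flagged.
    specificity = TN / (TN + FP): share of non-convicted people not flagged.
    Either value is None when its denominator is zero.
    """
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fn = sum(1 for p, a in pairs if not p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    fp = sum(1 for p, a in pairs if p and not a)
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity

# Invented data: mentions of "convicted" per page and hand-checked truth.
convicted_mentions = [2, 0, 3, 0, 1, 0]
actually_convicted = [True, True, False, False, False, False]

# Flag a page when the word appears at least twice (hypothetical cutoff).
flagged = [n >= 2 for n in convicted_mentions]

sens, spec = sensitivity_specificity(flagged, actually_convicted)
print(sens, spec)
```

In this toy data the second page belongs to someone who was convicted but whose page never uses the word (a false negative), and the third page uses the word three times about someone who was not convicted (a false positive), which are the two failure modes the word-count approach is exposed to.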