The Exploration of Demographic Factors on The Survival State of Chinese Movie Theaters in LA

Project Abstract

This project investigates the factors that contributed to the emergence and decline of Chinatown movie theaters in Los Angeles, and to their financial health, from 1940 to 2000. Using demographic census data and other datasets, we aim to identify factors that could be linked to this phenomenon, including the density of Asian immigrants and the industry distribution at both the theaters’ primary addresses and in their adjacent neighborhoods. I used an interactive ArcGIS Online map to plot the theaters in LA and found two main clusters, a pattern that differs from the distribution of Chinese-language movie theaters in other counties and states. This prompted a further question: whether location and clustering affect the theaters’ survival. Using ordinal regression and multilevel logistic regression in R, we found that marriage rate, business industry, Chinese population, and other factors could contribute to the operating condition of theaters. Location, however, was not a significant factor in the theaters’ performance. This research could be applied to the study of Chinatown movie theaters across North America and contribute to charting their general history and development.

Introduction

This research is part of a larger project on Chinatown movie theaters in North America. Theaters in North America screened Chinese-language movies as early as the 1920s, and such theaters peaked from the 1960s through the 1990s, as Hong Kong distributors circulated Cantonese and Mandarin films through dozens of dedicated theaters in various cities. This summer, we focus on the Chinese movie theaters in LA, which differ from those in other cities in that they are mainly concentrated in two clusters.

Like other Chinese-language theaters, which peaked in the 1970s and 1980s and declined in the 1990s, the LA theaters saw their financial health change alongside the demographic shift created by a new wave of immigrants from Asia and Central America, primarily fueled by the Immigration and Nationality Act of 1965. The development of downtown Chinatown also reflects this change, as it became more diversified in the 1960s. However, unlike other cities, LA experienced another immigration trend in the 1980s. Post-1980 globalization profoundly shaped Asian immigration, assimilation, and development in a new ethnoburb, the San Gabriel Valley. These two Chinese population centers match the clusters on the map where Chinese-language movie theaters are located. We therefore address two research questions. The first is which demographic factors contribute to the open or closed status of Chinese-language theaters and to their financial health. The second is whether location (downtown or suburban) is a significant factor in the theaters’ survival.

Data Source

I mainly use demographic census data retrieved from Social Explorer, since its datasets are tract-based (Source). As it lacks the 1950 Decennial Census, I manually entered the information from the tables in the 1950 United States Census file (Source).

For location, because we want detailed demographic information, we use census tracts as the unit of observation: we find the addresses of the theaters and of their neighboring commercial areas and attach each address to its corresponding tract. Because tract boundaries changed from decade to decade between 1940 and 2000, I use the Census Bureau Geocoder to obtain the 2010 geographies and then identify the corresponding tracts in earlier decades. Since Social Explorer provides census data on 2010 tract boundaries from 1970 onward, I mainly need to identify the tracts for 1940, 1950, and 1960. Here are the tracts in different decades after geocoding, along with the theaters’ basic information, including their current location and years of operation.

Basic Information About Theaters

I use GIS to plot all theaters with their basic information by importing the dataset as the attribute table. The map has several layers. The base layer is the Dark Gray Canvas. The second layer shows the counties in California, and the third shows the tracts I chose as observations, both those where theaters are located and their adjacent tracts; they appear as polygons on the map. The fourth layer contains the individual observations, which can be clustered around each theater.

I ran a clustering analysis on these points and found two main clusters: one in downtown LA with 63 features, and another around Monterey Park, San Gabriel, Alhambra, and Rosemead with 65 features. On the map, the cluster density downtown is higher than in the suburbs, meaning that downtown theaters are more concentrated than suburban ones. The second-level clusters show the same pattern: there are only two clusters downtown but four in the suburbs, which correspond to the four main suburban theaters. This is because the downtown Chinese movie theaters are generally located around Chinatown, which makes them denser, while the suburban area spans more cities and its population is more scattered.

There are seven main theaters:

LA Downtown Theaters
- King Hing Theatre
- Kim Sing Theatre
- Pagoda Cinema

Suburb Theaters
- Bard’s Garfield Egyptian Theatre
- Kuo Hwa 2 Cinema
- Kuo Hwa Theatre
- Monterey Theatre
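
As a rough illustration of the clustering step described above, the sketch below groups point coordinates into two clusters with k-means from base R. This is a stand-in technique, not the ArcGIS Online clustering used for the map, and the coordinates are made-up placeholders near downtown LA and the San Gabriel Valley rather than the real observation points. The name `obs_points` is also hypothetical.

```r
# Hypothetical points: three near downtown LA, three near Monterey Park.
# These are illustrative values, not the actual observation coordinates.
obs_points <- data.frame(
  lon = c(-118.240, -118.238, -118.236, -118.125, -118.120, -118.110),
  lat = c(34.062, 34.064, 34.060, 34.062, 34.068, 34.080)
)

set.seed(1)  # k-means starts from random centers, so fix the seed
fit <- kmeans(obs_points[, c("lon", "lat")], centers = 2, nstart = 10)

# Cluster label for each point and the number of points per cluster
obs_points$cluster <- fit$cluster
cluster_sizes <- table(obs_points$cluster)
print(cluster_sizes)
```

With real data, comparing the within-cluster spread of the two groups would be one way to check the observation that the downtown cluster is denser than the suburban one.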

Variable Selection

After reviewing all variables available in the demographic census datasets for each decade, we chose twenty independent variables that could potentially relate to the theaters’ health, and defined categorical dependent variables representing the theaters’ opening status and financial health. We recode each theater’s years of operation within a decade into three categories, and recode the theaters’ tax filing records retrieved from California FTB data into categories that indicate their financial condition.

Dependent Variables:
- Opening Status
  - 0-Closed for the entire decade
  - 1-Open for part of the decade
  - 2-Open for the entire decade
- Financial Health
  - 0-Unknown or closed
  - 1-Suspended or terminated while open
  - 2-At least one penalty
  - 3-At least 1 SI
  - 4-Very healthy: filing taxes for the existing year
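
The recoding of opening status could look roughly like the sketch below. It assumes each theater has an opening year and a last year of operation, and that a decade runs from its first year through that year plus nine. The function name `opening_status`, its arguments, and the boundary handling are assumptions for illustration, not the original recoding script. The last lines show one way to store the financial health codes as an ordered factor, which ordinal regression functions in R generally expect.

```r
# Opening status for one or more decades.
# open_year:    first year the theater operated
# close_year:   last year the theater operated
# decade_start: first year of the decade (1970 stands for 1970-1979)
opening_status <- function(open_year, close_year, decade_start) {
  decade_end <- decade_start + 9
  closed_all <- close_year < decade_start | open_year > decade_end
  open_all   <- open_year <= decade_start & close_year >= decade_end
  # 0 = closed all decade, 2 = open all decade, 1 = open for part of it
  ifelse(closed_all, 0L, ifelse(open_all, 2L, 1L))
}

# A hypothetical theater that operated from 1962 through 1985
decades <- c(1950, 1960, 1970, 1980, 1990)
status  <- opening_status(1962, 1985, decades)
print(data.frame(decade = decades, status = status))

# Financial health codes (0-4) stored as an ordered factor
health <- factor(c(0, 4, 2), levels = 0:4, ordered = TRUE)
```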

Independent Variables:

Here is an overview of the variables we chose for the first draft. We add one variable, Real Estate Values, derived from the percentage change in the assessed total value of each location’s parcel in each decade (Source). However, many values are missing for several decades.

Data Management

The next step is to download the datasets from Social Explorer and clean each one.

First, we import all datasets. Since the 1960 dataset can only be downloaded with all tracts included, I import it first and then select the tracts we want. For the 1950 dataset, I import the version I compiled manually in this sheet.

Import Datasets
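
The import code itself is not shown here, so the following is only a minimal sketch of how one export might be read with base R. The helper name `read_census`, the file contents, and the header text are assumptions. The sketch also shows why the cleaning code below refers to columns such as `X..Under.5.Years`: `read.csv` passes headers through `make.names`, which turns a header like `% Under 5 Years` into a syntactically valid name.

```r
# Read one census export. Columns are read as text so that the later
# as.numeric() calls behave the same way for every column.
read_census <- function(path) {
  read.csv(path, colClasses = "character", check.names = TRUE)
}

# Tiny stand-in file with made-up values, used only to show the header change
demo_path <- tempfile(fileext = ".csv")
writeLines(c('"Tract","% Under 5 Years","Total Population"',
             '"2071","6.5","3200"'), demo_path)

demo <- read_census(demo_path)
print(names(demo))

# With a real export the call might look like this (hypothetical file name):
# x1940T <- read_census("1940_tracts.csv")
```

The printed names are `Tract`, `X..Under.5.Years`, and `Total.Population`, matching the naming pattern used in the cleaning steps.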

Clean up 1940 dataset

# dplyr provides %>%, rename(), and select(), which the cleaning code uses throughout
library(dplyr)

x1940T$Age1<-as.numeric(x1940T$X..Under.5.Years)+as.numeric(x1940T$X..5.to.9.Years)+as.numeric(x1940T$X..10.to.14.Years)+as.numeric(x1940T$X..15.to.19.Years)
x1940T$Age2<-as.numeric(x1940T$X..20.to.24.Years)+as.numeric(x1940T$X..25.to.29.Years)+as.numeric(x1940T$X..30.to.34.Years)
x1940T$Age3<-as.numeric(x1940T$X..35.to.39.Years)+as.numeric(x1940T$X..40.to.44.Years)+as.numeric(x1940T$X..45.to.49.Years)+as.numeric(x1940T$X..50.to.54.Years)+as.numeric(x1940T$X..55.to.59.Years)+as.numeric(x1940T$X..60.to.64.Years)
x1940T$Age4<-as.numeric(x1940T$X..65.to.69.Years)+as.numeric(x1940T$X..70.to.74.Years)+as.numeric(x1940T$X..75.Years.and.over)
x1940T<-x1940T%>%
  rename(FemaleP=X..Female, MaleP=X..Male, WhiteP=X..White, BlackP=X..Black, OtherRaceP=X..Other, WhiteNativeBorn=White.Population..Native.Born,WhiteForeignBorn=White.Population..Foreign.Born)

x1940T$Educ1 <- as.numeric(x1940T$X..Population.Age.25.and.Over..Less.Than.High.School)
x1940T$Educ2 <- as.numeric(x1940T$X..Population.Age.25.and.Over..Some.High.School.Or.More)
x1940T$Educ3 <- as.numeric(x1940T$X..Population.Age.25.and.Over..Some.College.Or.More)
x1940T$Household1 <- as.numeric(x1940T$X..Housing.Units.Reporting.Value.of.Home..Under..500..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x1940T$X..Housing.Units.Reporting.Value.of.Home...500.to..699..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x1940T$X..Housing.Units.Reporting.Value.of.Home...700.to..999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x1940T$X..Housing.Units.Reporting.Value.of.Home...1.000.to..1.499..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x1940T$X..Housing.Units.Reporting.Value.of.Home...1.500.to..1.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x1940T$X..Housing.Units.Reporting.Value.of.Home...2.000.to..2.499..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x1940T$X..Housing.Units.Reporting.Value.of.Home...2.500.to..2.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x1940T$X..Housing.Units.Reporting.Value.of.Home...3.000.to..3.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x1940T$X..Housing.Units.Reporting.Value.of.Home...4.000.to..4.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x1940T$X..Housing.Units.Reporting.Value.of.Home...5.000.to..5.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x1940T$X..Housing.Units.Reporting.Value.of.Home...6.000.to..7.499..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x1940T$X..Housing.Units.Reporting.Value.of.Home...7.500.to..9.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)
x1940T$Household2 <- as.numeric(x1940T$X..Housing.Units.Reporting.Value.of.Home...10.000.to..14.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x1940T$X..Housing.Units.Reporting.Value.of.Home...15.000.to..19.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)


x1940T<-x1940T%>%
  rename(NativeForeignP = X..White.Population..Native.Born,
         ForeignBornP = X..White.Population..Foreign.Born,
         LaborP =X..Population.Age.14.and.Over..In.Labor.Force,
         NoLaberP=X..Population.Age.14.and.Over..Not.In.Labor.Force,
         CivilianLaborP = X..Population.Age.14.and.Over..In.Labor.Force..In.Civilian.Labor.Force,
         EmployedWorker = X..Population.Age.14.and.Over..In.Labor.Force..In.Civilian.Labor.Force..Employed,
         Unemployedworker = X..Population.Age.14.and.Over..In.Labor.Force..In.Civilian.Labor.Force..Unemployed..Seeking.Work.,
         Occupation.ProfessionalWorkerP = X..Employed.Civilian.Population.Age.14.and.Over..Professional.Workers,
         Occupation.ManagersP = X..Employed.Civilian.Population.Age.14.and.Over..Proprietors.Managers.Officials,
         Occupation.ClericalP = X..Employed.Civilian.Population.Age.14.and.Over..Clerical.Sales.Kindred.Workers,
         Occupation.CraftmanP = X..Employed.Civilian.Population.Age.14.and.Over..Craftmen.Foremen.Kindred.Workers,
         Occupation.DomesticServiceP = X..Employed.Civilian.Population.Age.14.and.Over..Domestic.Service.Workers,
         Household3= X..Housing.Units.Reporting.Value.of.Home...20.000.and.Over..Dollars.adjusted.for.inflation.to.match.value.in.2010.,
         OtherRace=Other,
         Population=Total.Population.1,
         PopulationDensity=Population.Density.per.sq..mile)

x1940T<-x1940T%>%
  select(Tract, Population, PopulationDensity, Male,MaleP,Female,FemaleP,White, Black, OtherRace, WhiteP, BlackP, OtherRaceP,
         Age1, Age2, Age3, Age4, NativeForeignP, ForeignBornP,
         Educ1, Educ2, Educ3,LaborP, NoLaberP, CivilianLaborP, EmployedWorker, Unemployedworker, Occupation.ProfessionalWorkerP,Occupation.ManagersP,
         Occupation.ClericalP, Occupation.CraftmanP, Occupation.DomesticServiceP,Household1, Household2, Household3)
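
The age, education, and home-value groupings above all follow the same pattern: convert several columns to numeric and add them up. A small helper, sketched here, could express that pattern once. `sum_cols` is a hypothetical name that does not appear in the original code, and the toy data frame is invented for illustration. Converting through `as.character` first is a precaution in case a column was read as a factor, where a bare `as.numeric` would return level codes instead of the values.

```r
# Add up several columns of a data frame after converting them to numeric.
# Stops with a clear message if any requested column is missing.
sum_cols <- function(df, cols) {
  absent <- setdiff(cols, names(df))
  if (length(absent) > 0) {
    stop("Columns not found: ", paste(absent, collapse = ", "))
  }
  numeric_cols <- lapply(df[cols], function(x) as.numeric(as.character(x)))
  Reduce(`+`, numeric_cols)  # element-wise sum; NA in any column gives NA
}

# Toy example with made-up percentages stored as text
toy <- data.frame(a = c("1", "2"), b = c("3.5", "4"), c = c("0.5", "1"),
                  stringsAsFactors = FALSE)
toy$total <- sum_cols(toy, c("a", "b", "c"))
print(toy)
```

With this helper, a grouping such as `Age2` could be written as `x1940T$Age2 <- sum_cols(x1940T, c("X..20.to.24.Years", "X..25.to.29.Years", "X..30.to.34.Years"))`.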

Clean up 1950 dataset

x1950T<-x1950T%>%
  rename(Population=Total.Population,
         PopulationDensity="Population. Density.per.sq..mile",
         LaborP=LaberP,
         NoLaberP=NolaberP)
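
Because the 1950 table was entered by hand, it may be worth confirming that its column names line up with the other decades after renaming. The sketch below is one way to do that with base R; `missing_columns` is a hypothetical helper and the two small data frames are placeholders, not the real tables.

```r
# Columns present in the reference data frame but absent from another one
missing_columns <- function(reference, other) {
  setdiff(names(reference), names(other))
}

# Placeholder tables standing in for two decades
ref_decade   <- data.frame(Tract = "A", Population = 1, LaborP = 50, NoLaberP = 50)
other_decade <- data.frame(Tract = "A", Population = 1, LaborP = 48)

gaps <- missing_columns(ref_decade, other_decade)
print(gaps)
```

Applied to the real data, `missing_columns(x1940T, x1950T)` would list any variables that still need to be renamed or entered in the 1950 sheet.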

Clean up 1960 dataset

x1960T<-x1960T%>%
  rename(Female=Total.Population..Female, Male=Total.Population..Male, FemaleP=X..Total.Population..Female, MaleP=X..Total.Population..Male, WhiteP=X..Total.Population..White, 
         White=Total.Population..White, Black=Total.Population..Black, OtherRace=Total.Population..Other.Race, BlackP=X..Total.Population..Black, OtherRaceP=X..Total.Population..Other.Race, ForeignBornP=X..Foreign.Stock.Population..Foreign.Born,ForeignBorn=Foreign.Stock.Population..Foreign.Born,
         NativeForeign=Foreign.Stock.Population..Native.of.foreign.or.mixed.parentage,NativeForeignP=X..Foreign.Stock.Population..Native.of.foreign.or.mixed.parentage)

x1960T$Age1<-as.numeric(x1960T$X..Total.Population..Under.5.Years)+as.numeric(x1960T$X..Total.Population..5.to.9.Years)+as.numeric(x1960T$X..Total.Population..10.to.14.Years)+as.numeric(x1960T$X..Total.Population..15.to.19.Years)
x1960T$Age2<-as.numeric(x1960T$X..Total.Population..20.to.24.Years)+as.numeric(x1960T$X..Total.Population..25.to.29.Years)+as.numeric(x1960T$X..Total.Population..30.to.34.Years)
x1960T$Age3<-as.numeric(x1960T$X..Total.Population..35.to.39.Years)+as.numeric(x1960T$X..Total.Population..40.to.44.Years)+as.numeric(x1960T$X..Total.Population..45.to.49.Years)+as.numeric(x1960T$X..Total.Population..50.to.54.Years)+as.numeric(x1960T$X..Total.Population..55.to.59.Years)+as.numeric(x1960T$X..Total.Population..60.to.64.Years)
x1960T$Age4<-as.numeric(x1960T$X..Total.Population..65.to.69.Years)+as.numeric(x1960T$X..Total.Population..70.to.74.Years)+as.numeric(x1960T$X..Total.Population..75.Years.and.Over)

x1960T$Educ1 <- as.numeric(x1960T$X..Population.Age.25...No.school.years.completed)+as.numeric(x1960T$X..Population.Age.25...Elementary.or.more)-as.numeric(x1960T$X..Population.Age.25...High.school.or.more)
x1960T$Educ2 <- as.numeric(x1960T$X..Population.Age.25...High.school.or.more)
x1960T$Educ3 <- as.numeric(x1960T$X..Population.Age.25...College.or.more)

x1960T<-x1960T%>%
  rename(Single = X..Population.14.years.and.over..Single,
         Married = X..Population.14.years.and.over..Married..not.separated,
         Separated=X..Population.14.years.and.over..Separated,
         Widowed=X..Population.14.years.and.over..Widowed,
         Divorced=X..Population.14.years.and.over..Divorced)

x1960T$Income1 <- as.numeric(x1960T$X..Households..Less.than..1.000..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+as.numeric(x1960T$X..Households...1.000....1.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x1960T$X..Households...2.000....2.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+as.numeric(x1960T$X..Households...3.000....3.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x1960T$X..Households...4.000....4.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+as.numeric(x1960T$X..Households...5.000....5.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x1960T$X..Households...6.000....6.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+as.numeric(x1960T$X..Households...7.000....7.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x1960T$X..Households...8.000....8.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+as.numeric(x1960T$X..Households...9.000....9.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)
x1960T$Income2 <-as.numeric(x1960T$X..Households...10.000....14.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)
x1960T$Income3 <-as.numeric(x1960T$X..Households...15.000....24.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)
x1960T$Income4 <-as.numeric(x1960T$X..Households...25.000.and.over..Dollars.adjusted.for.inflation.to.match.value.in.2010.)

x1960T<-x1960T%>%
  rename(LaborP =X..Total.Population.Age.14...In.Labor.Force,
         NoLaberP=X..Total.Population.Age.14...Not.In.Labor.Force,
         CivilianLaborP = X..Total.Population.Age.14...In.Labor.Force..In.Civilian.Labor.Force,
         EmployedWorker = X..Total.Population.Age.14...In.Labor.Force..In.Civilian.Labor.Force..Employed,
         Unemployedworker = X..Total.Population.Age.14...In.Labor.Force..In.Civilian.Labor.Force..Unemployed,
         Occupation.ProfessionalWorkerP = X..Employed.Civilians.14...Professional..technical..and.kindred.workers,
         Occupation.ManagersP = X..Employed.Civilians.14...Managers..officials..and.proprietors,
         Occupation.ClericalP = X..Employed.Civilians.14...Clerical.and.kindred.workers,
         Occupation.CraftmanP = X..Employed.Civilians.14...Craftsmen..foremen..and.kindred.workers,
         Occupation.DomesticServiceP = X..Employed.Civilians.14...Private.household.workers,
         IndustryManufactory=X..Employed.Civilians.Age.14...Machinery,
         IndustryFood=X..Employed.Civilians.Age.14...Food.and.kindred.industries,
         IndustryTextile=X..Employed.Civilians.Age.14...Textile.and.apparel,
         IndustryPublishing=X..Employed.Civilians.Age.14...Printing..publishing..and.allied,
         IndustryCommunication=X..Employed.Civilians.Age.14...Communications..utilities..sanitary.services,
         IndustrySale=X..Employed.Civilians.Age.14...Wholesale.trade,
         Hospitality=X..Employed.Civilians.Age.14...Eating.and.drinking.places,
         IndustryRetail=X..Employed.Civilians.Age.14...Other.retail,
         IndustryBusiness=X..Employed.Civilians.Age.14...Business.and.repair.services,
         Population=Total.Population,
         PopulationDensity=Population.Density.per.sq..mile)

x1960T$Household1 <- as.numeric(x1960T$X..Owner.Occupied.Units.Reporting.Value..Under..5.000..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x1960T$X..Owner.Occupied.Units.Reporting.Value...5.000.to..9.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)
x1960T$Household2 <- as.numeric(x1960T$X..Owner.Occupied.Units.Reporting.Value...10.000.to..14.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x1960T$X..Owner.Occupied.Units.Reporting.Value...15.000.to..19.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)
x1960T$Household3 <- as.numeric(x1960T$X..Owner.Occupied.Units.Reporting.Value...20.000.to..24.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x1960T$X..Owner.Occupied.Units.Reporting.Value...25.000.or.more..Dollars.adjusted.for.inflation.to.match.value.in.2010.)

x1960T<-x1960T%>%
  select(Tract, Population, PopulationDensity, Male,MaleP,Female,FemaleP,White, Black, OtherRace, WhiteP, BlackP, OtherRaceP,
         Age1, Age2, Age3, Age4, Single, Married, Separated, Widowed, Divorced,Educ1, Educ2, Educ3,LaborP, NoLaberP, CivilianLaborP, EmployedWorker, Unemployedworker, Occupation.ProfessionalWorkerP,Occupation.ManagersP,
         Occupation.ClericalP, Occupation.CraftmanP, Occupation.DomesticServiceP,Household1, Household2, Household3,IndustryManufactory,IndustryFood,IndustryTextile,
         IndustryPublishing,IndustryCommunication,IndustrySale,Hospitality,IndustryRetail,IndustryBusiness,ForeignBorn,NativeForeign,ForeignBornP,NativeForeignP,Income1,Income2,Income3,Income4)
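
Since the grouped age variables are sums of percentage columns, the four groups should add up to roughly 100 in every tract. The following sketch shows a simple check of that property. `shares_ok`, the tolerance of one percentage point, and the toy values are assumptions made for illustration.

```r
# TRUE for rows where the given percentage columns sum to about 100.
# Rows with a missing value return NA rather than TRUE or FALSE.
shares_ok <- function(df, cols, tol = 1) {
  total <- rowSums(df[cols])
  abs(total - 100) <= tol
}

# Toy tracts with made-up shares; the second row is deliberately wrong
toy_age <- data.frame(
  Age1 = c(30.0, 30.0),
  Age2 = c(25.5, 25.5),
  Age3 = c(34.5, 34.5),
  Age4 = c(10.0, 20.0)
)
age_check <- shares_ok(toy_age, c("Age1", "Age2", "Age3", "Age4"))
print(age_check)
```

On the cleaned table, `x1960T[!shares_ok(x1960T, c("Age1", "Age2", "Age3", "Age4")), ]` would show the tracts whose age groups do not add up, which can point to a mistyped or misassigned column.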

Clean up 1970 dataset

x1970T<-x1970T%>%
  rename(Female=Total.Population..Female, Male=Total.Population..Male, FemaleP=X..Total.Population..Female, MaleP=X..Total.Population..Male, WhiteP=X..White, 
         OtherRace=Other, BlackP=X..Black, OtherRaceP=X..Other, ForeignBornP=X..Count.of.Persons..Foreign.Born,ForeignBorn=Count.of.Persons..Foreign.Born,
         NativeForeign=Count.of.Persons.of.Foreign.Stock..Native..of.Foreign.or.Mixed.Parentage.,NativeForeignP=X..Count.of.Persons.of.Foreign.Stock..Native..of.Foreign.or.Mixed.Parentage.,
         NativeSouthwestAsiaP=X..Count.of.Persons.of.Foreign.Stock..Native..of.Foreign.or.Mixed.Parentage...Southwest.Asia,
         NativeForeignJapanP=X..Count.of.Persons.of.Foreign.Stock..Native..of.Foreign.or.Mixed.Parentage...Japan,
         NativeForeignChinaP=X..Count.of.Persons.of.Foreign.Stock..Native..of.Foreign.or.Mixed.Parentage...China,
         NativeForeignOtherAsiaP=X..Count.of.Persons.of.Foreign.Stock..Native..of.Foreign.or.Mixed.Parentage...Other.Asia,
         ForeignSouthwestAsiaP=X..Count.of.Persons.of.Foreign.Stock..Foreign.Born..Southwest.Asia,
         ForeignChinaP=X..Count.of.Persons.of.Foreign.Stock..Foreign.Born..China,
         ForeignJapanP=X..Count.of.Persons.of.Foreign.Stock..Foreign.Born..Japan,
         ForeignOtherAsiaP=X..Count.of.Persons.of.Foreign.Stock..Foreign.Born..Other.Asia)

x1970T$Age1<-as.numeric(x1970T$X..Total.Population..Under.5.Years)+as.numeric(x1970T$X..Total.Population..5.to.9.Years)+as.numeric(x1970T$X..Total.Population..10.to.14.Years)+as.numeric(x1970T$X..Total.Population..15.to.17.Years)
x1970T$Age2<-as.numeric(x1970T$X..Total.Population..18.to.24.Years)+as.numeric(x1970T$X..Total.Population..25.to.34.Years)
x1970T$Age3<-as.numeric(x1970T$X..Total.Population..35.to.44.Years)+as.numeric(x1970T$X..Total.Population..45.to.54.Years)+as.numeric(x1970T$X..Total.Population..55.to.64.Years)
x1970T$Age4<-as.numeric(x1970T$X..Total.Population..65.to.74.Years)+as.numeric(x1970T$X..Total.Population..75.Years.and.over)

x1970T<-x1970T%>%
  rename(Single = X..Count.of.Persons.14.Years.Old.and.over..Never.Married,
         Married = X..Count.of.Persons.14.Years.Old.and.over..Married,
         Separated=X..Count.of.Persons.14.Years.Old.and.over..Separated,
         Widowed=X..Count.of.Persons.14.Years.Old.and.over..Widowed,
         Divorced=X..Count.of.Persons.14.Years.Old.and.over..Divorced)

x1970T$Educ1 <- as.numeric(x1970T$X..Population.25.Years.Old.and.over..No.School.Years.Completed..Includes.Nursery.and.Kindergarten.)+as.numeric(x1970T$X..Population.25.Years.Old.and.over..1.8.Years.of.Elementary.Education.or.More)-
  as.numeric(x1970T$X..Population.25.Years.Old.and.over..1.4.Years.of.High.School.Education.or.More)
x1970T$Educ2 <- as.numeric(x1970T$X..Population.25.Years.Old.and.over..1.4.Years.of.High.School.Education.or.More)
x1970T$Educ3 <- as.numeric(x1970T$X..Population.25.Years.Old.and.over..1.5.Years.of.College.Education.or.More)

x1970T$Household1 <- as.numeric(x1970T$X..Count.of.Units.for.Which.Value.Is.Tabulated..Less.Than..5.000..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x1970T$X..Count.of.Units.for.Which.Value.Is.Tabulated...5.000....7.499..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x1970T$X..Count.of.Units.for.Which.Value.Is.Tabulated...7.500....9.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)
x1970T$Household2 <- as.numeric(x1970T$X..Count.of.Units.for.Which.Value.Is.Tabulated...10.000....12.499..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x1970T$X..Count.of.Units.for.Which.Value.Is.Tabulated...12.500....14.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x1970T$X..Count.of.Units.for.Which.Value.Is.Tabulated...15.000....17.499..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x1970T$X..Count.of.Units.for.Which.Value.Is.Tabulated...17.500....19.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)
x1970T$Household3 <- as.numeric(x1970T$X..Count.of.Units.for.Which.Value.Is.Tabulated...20.000....24.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x1970T$X..Count.of.Units.for.Which.Value.Is.Tabulated...25.000....34.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x1970T$X..Count.of.Units.for.Which.Value.Is.Tabulated...35.000....49.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x1970T$X..Count.of.Units.for.Which.Value.Is.Tabulated...50.000.or.More..Dollars.adjusted.for.inflation.to.match.value.in.2010.)

x1970T$Income1 <- as.numeric(x1970T$X..Count.of.Persons.14.Years.Old.and.over..Without.Income..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x1970T$X..Count.of.Persons.14.Years.Old.and.over..Income..1....999.or.Loss..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x1970T$X..Count.of.Persons.14.Years.Old.and.over..Income..1.000....1.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x1970T$X..Count.of.Persons.14.Years.Old.and.over..Income..2.000....2.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x1970T$X..Count.of.Persons.14.Years.Old.and.over..Income..3.000....3.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x1970T$X..Count.of.Persons.14.Years.Old.and.over..Income..4.000....4.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x1970T$X..Count.of.Persons.14.Years.Old.and.over..Income..5.000....5.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x1970T$X..Count.of.Persons.14.Years.Old.and.over..Income..6.000....6.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x1970T$X..Count.of.Persons.14.Years.Old.and.over..Income..7.000....7.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x1970T$X..Count.of.Persons.14.Years.Old.and.over..Income..8.000....8.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x1970T$X..Count.of.Persons.14.Years.Old.and.over..Income..9.000....9.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)
x1970T$Income2 <-as.numeric(x1970T$X..Count.of.Persons.14.Years.Old.and.over..Income..10.000....14.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)
x1970T$Income3 <-as.numeric(x1970T$X..Count.of.Persons.14.Years.Old.and.over..Income..15.000....24.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)
x1970T$Income4 <-as.numeric(x1970T$X..Count.of.Persons.14.Years.Old.and.over..Income..25.000.and.over..Dollars.adjusted.for.inflation.to.match.value.in.2010.)

x1970T$IndustryManufactory<-as.numeric(x1970T$X..Employed.Population.16...Machinery..Except.Electrical)+
  as.numeric(x1970T$X..Employed.Population.16...Electrical.Machinery..Equipment..and.Supplies)
x1970T$IndustryBusiness<-as.numeric(x1970T$X..Employed.Population.16...Business.Services)+
  as.numeric(x1970T$X..Employed.Population.16...Repair.Services)
x1970T$IndustryRetail<-as.numeric(x1970T$X..Employed.Population.16...General.Merchandise.Retailing)+
  as.numeric(x1970T$X..Employed.Population.16...Motor.Vehicles.Retailing.and.Service.Stations)+
  as.numeric(x1970T$X..Employed.Population.16...Other.Retail.Trade)

x1970T<-x1970T%>%
  rename(LaborP =X..Population.16.Years.Old.and.over..in.Labor.Force,
         NoLaberP=X..Population.16.Years.Old.and.over..Not.in.Labor.Force,
         CivilianLaborP = X..Population.16.Years.Old.and.over..in.Labor.Force..in.Civilian.Labor.Force,
         EmployedWorker = X..Population.16.Years.Old.and.over..in.Labor.Force..in.Civilian.Labor.Force..Employed,
         Unemployedworker = X..Population.16.Years.Old.and.over..in.Labor.Force..in.Civilian.Labor.Force..Unemployed,
         Occupation.ProfessionalWorkerP = X..Count.of.Employed.Persons.16.Years.Old.and.over..Professional..Technical..and.Kindred.Workers,
         Occupation.ManagersP = X..Count.of.Employed.Persons.16.Years.Old.and.over..Managers.and.Administrators..Except.Farm,
         Occupation.ClericalP = X..Count.of.Employed.Persons.16.Years.Old.and.over..Clerical.and.Kindred.Workers,
         Occupation.CraftmanP = X..Count.of.Employed.Persons.16.Years.Old.and.over..Craftsmen..Foremen..and.Kindred.Workers,
         Occupation.DomesticServiceP = X..Count.of.Employed.Persons.16.Years.Old.and.over..Private.Household.Workers,
         IndustryFood=X..Employed.Population.16...Food.and.Kindred.Products,
         IndustryTextile=X..Employed.Population.16...Textile.Mill.and.Other.Fabricated.Textile.Products,
         IndustryPublishing=X..Employed.Population.16...Printing..Publishing..and.Allied.Industries,
         IndustryCommunication=X..Employed.Population.16...Communications,
         IndustrySale=X..Employed.Population.16...Wholesale.Trade,
         Hospitality=X..Employed.Population.16...Eating.and.Drinking.Places,
         Population=Total.Population,
         PopulationDensity=Population.Density..per.sq..mile.)

x1970T<-x1970T%>%
  select(Tract, Population, PopulationDensity, Male,MaleP,Female,FemaleP,White, Black, OtherRace, WhiteP, BlackP, OtherRaceP,
         Age1, Age2, Age3, Age4, Single, Married, Separated, Widowed, Divorced,Educ1, Educ2, Educ3,LaborP, NoLaberP, CivilianLaborP, EmployedWorker, Unemployedworker, Occupation.ProfessionalWorkerP,Occupation.ManagersP,
         Occupation.ClericalP, Occupation.CraftmanP, Occupation.DomesticServiceP,Household1, Household2, Household3,IndustryManufactory,IndustryFood,IndustryTextile,
         IndustryPublishing,IndustryCommunication,IndustrySale,Hospitality,IndustryRetail,IndustryBusiness,ForeignBorn,NativeForeign,ForeignBornP,NativeForeignP,Income1,Income2,Income3,Income4,
         ForeignChinaP,ForeignJapanP,ForeignSouthwestAsiaP,ForeignOtherAsiaP,NativeSouthwestAsiaP,NativeForeignChinaP,NativeForeignJapanP,NativeForeignOtherAsiaP)

Clean up 1980 dataset

x1980T$OtherRace<-as.numeric(x1980T$Total.Population..Dollars.adjusted.for.inflation.to.match.value.in.2010..3)-
  as.numeric(x1980T$Total.Population..White..Dollars.adjusted.for.inflation.to.match.value.in.2010.)-as.numeric(x1980T$Total.Population..Black..Dollars.adjusted.for.inflation.to.match.value.in.2010.)
x1980T$OtherRaceP<-100-as.numeric(x1980T$X..Total.Population..White..Dollars.adjusted.for.inflation.to.match.value.in.2010.)-
  as.numeric(x1980T$X..Total.Population..Black..Dollars.adjusted.for.inflation.to.match.value.in.2010.)
x1980T$Asian<-as.numeric(x1980T$Asian.and.Pacific.Islander..Dollars.adjusted.for.inflation.to.match.value.in.2010.)-
  as.numeric(x1980T$Asian.and.Pacific.Islander..Hawaiian..Dollars.adjusted.for.inflation.to.match.value.in.2010.)-
  as.numeric(x1980T$Asian.and.Pacific.Islander..Guamanian..Dollars.adjusted.for.inflation.to.match.value.in.2010.)-
  as.numeric(x1980T$Asian.and.Pacific.Islander..Samoan..Dollars.adjusted.for.inflation.to.match.value.in.2010.)
# AsianP is the Asian share of the Asian and Pacific Islander group (100 minus the
# Hawaiian, Guamanian, and Samoan percentages), not a share of the total population.
# This assumes the X.. columns below are percentages of that group.
x1980T$AsianP<-100-as.numeric(x1980T$X..Asian.and.Pacific.Islander..Hawaiian..Dollars.adjusted.for.inflation.to.match.value.in.2010.)-
  as.numeric(x1980T$X..Asian.and.Pacific.Islander..Guamanian..Dollars.adjusted.for.inflation.to.match.value.in.2010.)-
  as.numeric(x1980T$X..Asian.and.Pacific.Islander..Samoan..Dollars.adjusted.for.inflation.to.match.value.in.2010.)

x1980T<-x1980T%>%
  rename(Female=Total.Population..Female..Dollars.adjusted.for.inflation.to.match.value.in.2010., Male=Total.Population..Male..Dollars.adjusted.for.inflation.to.match.value.in.2010., 
         FemaleP=X..Total.Population..Female..Dollars.adjusted.for.inflation.to.match.value.in.2010., MaleP=X..Total.Population..Male..Dollars.adjusted.for.inflation.to.match.value.in.2010., 
         White=Total.Population..White..Dollars.adjusted.for.inflation.to.match.value.in.2010.,Black=Total.Population..Black..Dollars.adjusted.for.inflation.to.match.value.in.2010.,
         WhiteP=X..Total.Population..White..Dollars.adjusted.for.inflation.to.match.value.in.2010., 
         BlackP=X..Total.Population..Black..Dollars.adjusted.for.inflation.to.match.value.in.2010., 
         JapaneseP=X..Asian.and.Pacific.Islander..Japanese..Dollars.adjusted.for.inflation.to.match.value.in.2010.,
         ChineseP=X..Asian.and.Pacific.Islander..Chinese..Dollars.adjusted.for.inflation.to.match.value.in.2010.,
         FilipinoP=X..Asian.and.Pacific.Islander..Filipino..Dollars.adjusted.for.inflation.to.match.value.in.2010.,
         KoreanP=X..Asian.and.Pacific.Islander..Korean..Dollars.adjusted.for.inflation.to.match.value.in.2010.,
         AsianIndianP=X..Asian.and.Pacific.Islander..Asian.Indian..Dollars.adjusted.for.inflation.to.match.value.in.2010.,
         VietnameseP=X..Asian.and.Pacific.Islander..Vietnamese..Dollars.adjusted.for.inflation.to.match.value.in.2010.,
         HouseholdWhiteP=X..Households..White..Dollars.adjusted.for.inflation.to.match.value.in.2010.,
         HouseholdBlackP=X..Households..Black..Dollars.adjusted.for.inflation.to.match.value.in.2010.,
         HouseholdAsianandPacificP=X..Households..Asian.and.Pacific.Islander..Dollars.adjusted.for.inflation.to.match.value.in.2010.,
         HouseholdOthersP=X..Households..Other..Dollars.adjusted.for.inflation.to.match.value.in.2010.,
         AsianBelowPovertyLevelP=X..Asian.and.Pacific.Islander.Population.for.Whom.Poverty.Status.is.Determined..Below.Poverty.Level..Dollars.adjusted.for.inflation.to.match.value.in.2010.,
         AsianAbovePovertyLevelP=X..Asian.and.Pacific.Islander.Population.for.Whom.Poverty.Status.is.Determined..Above.Poverty.Level..Dollars.adjusted.for.inflation.to.match.value.in.2010.,
         ForeignBornP=X..Total.Population..Foreign.Born..Dollars.adjusted.for.inflation.to.match.value.in.2010.,
         ForeignBorn=Total.Population..Foreign.Born..Dollars.adjusted.for.inflation.to.match.value.in.2010.,
         Population=Total.Population,
         PopulationDensity=Population.Density..per.sq..mile.)

x1980T$Age1<-as.numeric(x1980T$X..Total.Population..Under.5.Year..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+as.numeric(x1980T$X..Total.Population..5.to.9.Years..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+as.numeric(x1980T$X..Total.Population..10.to.14.Years..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+as.numeric(x1980T$X..Total.Population..15.to.17.Years..Dollars.adjusted.for.inflation.to.match.value.in.2010.)
x1980T$Age2<-as.numeric(x1980T$X..Total.Population..18.to.24.Years..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+as.numeric(x1980T$X..Total.Population..25.to.34.Years..Dollars.adjusted.for.inflation.to.match.value.in.2010.)
x1980T$Age3<-as.numeric(x1980T$X..Total.Population..35.to.44.Years..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+as.numeric(x1980T$X..Total.Population..45.to.54.Years..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+as.numeric(x1980T$X..Total.Population..55.to.64.Years..Dollars.adjusted.for.inflation.to.match.value.in.2010.)
x1980T$Age4<-as.numeric(x1980T$X..Total.Population..65.to.74.Years..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+as.numeric(x1980T$X..Total.Population..75.to.84.Years..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+as.numeric(x1980T$X..Total.Population..85.Years.and.over..Dollars.adjusted.for.inflation.to.match.value.in.2010.)

x1980T<-x1980T%>%
  rename(Single = X..Persons.15.Years.and.Over..Single..Dollars.adjusted.for.inflation.to.match.value.in.2010.,
         Married = X..Persons.15.Years.and.Over..Now.Married..Except.Separated..Dollars.adjusted.for.inflation.to.match.value.in.2010.,
         Separated=X..Persons.15.Years.and.Over..Separated..Dollars.adjusted.for.inflation.to.match.value.in.2010.,
         Widowed=X..Persons.15.Years.and.Over..Widowed..Dollars.adjusted.for.inflation.to.match.value.in.2010.,
         Divorced=X..Persons.15.Years.and.Over..Divorced..Dollars.adjusted.for.inflation.to.match.value.in.2010.)

x1980T<-x1980T%>%
  rename(Educ1=X..Persons.25.Years.Old.and.Over..Elementary..0.to.8.Years..or.less..Dollars.adjusted.for.inflation.to.match.value.in.2010.,
         Educ2=X..Persons.25.Years.Old.and.Over..High.School.or.more..Dollars.adjusted.for.inflation.to.match.value.in.2010.,
         Educ3=X..Persons.25.Years.Old.and.Over..College.or.more..Dollars.adjusted.for.inflation.to.match.value.in.2010.)

x1980T$Income1 <- 100*(as.numeric(x1980T$Households..Less.than..2.500..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x1980T$Households...2.500.to..4.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x1980T$Households...5.000.to..7.499..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x1980T$Households...7.500.to..9.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.))/as.numeric(x1980T$Households..Dollars.adjusted.for.inflation.to.match.value.in.2010..1)
x1980T$Income2 <- 100*(as.numeric(x1980T$Households...10.000.to..12.499..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
                         as.numeric(x1980T$Households...12.500.to..14.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.))/as.numeric(x1980T$Households..Dollars.adjusted.for.inflation.to.match.value.in.2010..1)
x1980T$Income3 <- 100*(as.numeric(x1980T$Households...15.000.to..17.499..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
                         as.numeric(x1980T$Households...17.500.to..19.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
                         as.numeric(x1980T$Households...20.000.to..22.499..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
                         as.numeric(x1980T$Households...22.500.to..24.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.))/as.numeric(x1980T$Households..Dollars.adjusted.for.inflation.to.match.value.in.2010..1)
x1980T$Income4 <- 100*(as.numeric(x1980T$Households...25.000.to..27.499..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
                         as.numeric(x1980T$Households...27.500.to..29.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
                         as.numeric(x1980T$Households...30.000.to..34.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
                         as.numeric(x1980T$Households...35.000.to..39.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
                         as.numeric(x1980T$Households...40.000.to..49.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
                         as.numeric(x1980T$Households...50.000.to..74.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
                         as.numeric(x1980T$Households...75.000.or.more..Dollars.adjusted.for.inflation.to.match.value.in.2010.))/as.numeric(x1980T$Households..Dollars.adjusted.for.inflation.to.match.value.in.2010..1)

x1980T<-x1980T%>%
  rename(NoLaberP=X..Persons.16.Years.and.Over..Not.in.Labor.Force..Dollars.adjusted.for.inflation.to.match.value.in.2010.,
         CivilianLaborP = X..Persons.16.Years.and.Over..Civilian.Labor.Force..Dollars.adjusted.for.inflation.to.match.value.in.2010.,
         EmployedWorker = X..Persons.16.Years.and.Over..Civilian.Labor.Force..Employed..Dollars.adjusted.for.inflation.to.match.value.in.2010.,
         Unemployedworker = X..Persons.16.Years.and.Over..Civilian.Labor.Force..Unemployed..Dollars.adjusted.for.inflation.to.match.value.in.2010.,
         Occupation.ProfessionalWorkerP = X..Employed.Persons.16.Years.and.Over..Professional.and.Related.Services..Dollars.adjusted.for.inflation.to.match.value.in.2010.,
         Occupation.ManagersP = X..Employed.Persons.16.Years.and.Over..Managerial.and.Professional.Specialty.Occupations..Dollars.adjusted.for.inflation.to.match.value.in.2010.,
         Occupation.ClericalP = X..Employed.Persons.16.Years.and.Over..Finance..Insurance..and.Real.Estate..Dollars.adjusted.for.inflation.to.match.value.in.2010.,
         Occupation.CraftmanP = X..Employed.Persons.16.Years.and.Over..Precision.Production..Craft..and.Repair.Occupations..Dollars.adjusted.for.inflation.to.match.value.in.2010.,
         Occupation.DomesticServiceP = X..Employed.Persons.16.Years.and.Over..Service.Occupations..Private.Household.Occupations..Dollars.adjusted.for.inflation.to.match.value.in.2010.,
         IndustryManufactory=X..Employed.Persons.16.Years.and.Over..Manufacturing..Dollars.adjusted.for.inflation.to.match.value.in.2010.,
         IndustryCommunication=X..Employed.Persons.16.Years.and.Over..Communications.and.Other.Public.Utilities..Dollars.adjusted.for.inflation.to.match.value.in.2010.,
         IndustrySale=X..Employed.Persons.16.Years.and.Over..Wholesale.Trade..Dollars.adjusted.for.inflation.to.match.value.in.2010.,
         Hospitality=X..Employed.Persons.16.Years.and.Over..Personal..Entertainment..and.Recreation.Services..Dollars.adjusted.for.inflation.to.match.value.in.2010.,
         IndustryRetail=X..Employed.Persons.16.Years.and.Over..Retail.Trade..Dollars.adjusted.for.inflation.to.match.value.in.2010.,
         IndustryBusiness=X..Employed.Persons.16.Years.and.Over..Business.and.Repair.Services..Dollars.adjusted.for.inflation.to.match.value.in.2010.)
x1980T$LaborP <-100-as.numeric(x1980T$NoLaberP)

x1980T<-x1980T%>%
  select(Tract, Population, PopulationDensity, Male,MaleP,Female,FemaleP,White, Black, OtherRace, WhiteP, BlackP, OtherRaceP,
         JapaneseP,ChineseP, FilipinoP,KoreanP, AsianIndianP, VietnameseP, HouseholdWhiteP,HouseholdBlackP,HouseholdAsianandPacificP,HouseholdOthersP,
         AsianBelowPovertyLevelP,AsianAbovePovertyLevelP,Asian, AsianP, 
         Age1, Age2, Age3, Age4, Single, Married, Separated, Widowed, Divorced,Educ1, Educ2, Educ3,LaborP, NoLaberP, CivilianLaborP, EmployedWorker, Unemployedworker, Occupation.ProfessionalWorkerP,Occupation.ManagersP,
         Occupation.ClericalP, Occupation.CraftmanP, Occupation.DomesticServiceP,IndustryManufactory,
         IndustryCommunication,IndustrySale,Hospitality,IndustryRetail,IndustryBusiness,ForeignBorn,ForeignBornP,Income1,Income2,Income3,Income4)
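
The repeated `as.numeric(...) + as.numeric(...)` sums used to build the age and income groups could be condensed with a small helper. The sketch below is not part of the original analysis: `sum_cols()` and the toy column names are hypothetical, and it assumes the census columns arrive as character or factor values.

```r
# Hypothetical helper: coerce the named columns to numeric and add them row-wise.
sum_cols <- function(data, cols) {
  # as.character() first, so factor columns yield their labels rather than integer codes
  numeric_cols <- lapply(data[cols], function(x) as.numeric(as.character(x)))
  Reduce(`+`, numeric_cols)
}

# Toy data standing in for a census extract (column names and values are made up)
toy <- data.frame(
  pct_under5 = c("6.1", "7.0"),
  pct_5to9   = c("5.9", "6.5"),
  pct_10to14 = c("5.0", "6.0"),
  stringsAsFactors = TRUE
)

toy$Age1 <- sum_cols(toy, c("pct_under5", "pct_5to9", "pct_10to14"))
print(toy$Age1)
```

Passing the column names as a character vector keeps each group definition on one line, which may make a missing or duplicated bracket easier to spot.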

Clean up 1990 dataset

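# Express the Asian share as a percentage (0-100) so it is on the same scale as the other "P" variables
x1990T$AsianP<-100*as.numeric(x1990T$Asian)/as.numeric(x1990T$Total.Population)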
x1990T<-x1990T%>%
  rename(Female=Total.Population..Female, Male=Total.Population..Male, 
         FemaleP=X..Total.Population..Female, MaleP=X..Total.Population..Male, 
         White=Persons..White,Black=Persons..Black,
         WhiteP=X..Persons..White, BlackP=X..Persons..Black, 
         JapaneseP=X..Asian..Japanese,
         ChineseP=X..Asian..Chinese,
         FilipinoP=X..Asian..Filipino,
         KoreanP=X..Asian..Korean,
         AsianIndianP=X..Asian..Asian.Indian,
         VietnameseP=X..Asian..Vietnamese,
         HouseholdWhiteP=X..Households.With.a.White.Householder,
         HouseholdBlackP=X..Households.With.a.Black.Householder,
         HouseholdAsianandPacificP=X..Households.With.a.Asian.or.Pacific.Islander.Householder,
         AsianBelowPovertyLevelP=X..Asian.or.Pacific.Islander.Persons.for.whom.poverty.status.is.determined..Income.in.1989.below.poverty.level,
         AsianAbovePovertyLevelP=X..Asian.or.Pacific.Islander.Persons.for.whom.poverty.status.is.determined..Income.in.1989.above.poverty.level,
         ForeignBornP=X..Total.Population..Foreign.born,
         ForeignBorn=Total.Population..Foreign.born,
         ForeignBornBefore1960=X..Foreign.born.persons..Before.1960,
         ForeignBorn1960to1969=X..Foreign.born.persons..1960.to.1969,
         ForeignBorn1970to1979=X..Foreign.born.persons..1970.to.1979,
         ForeignBornafter1980=X..Foreign.born.persons..1980.to.1990)
x1990T$OtherRace<-as.numeric(x1990T$Persons.1)-as.numeric(x1990T$White)-as.numeric(x1990T$Black)
x1990T$OtherRaceP<-100-as.numeric(x1990T$WhiteP)-as.numeric(x1990T$BlackP)

x1990T$Age1<-as.numeric(x1990T$X..Persons..Under.5.year)+as.numeric(x1990T$X..Persons..5.to.9.years)+as.numeric(x1990T$X..Persons..10.to.14.years)+as.numeric(x1990T$X..Persons..15.to.17.years)
x1990T$Age2<-as.numeric(x1990T$X..Persons..18.to.24.years)+as.numeric(x1990T$X..Persons..25.to.34.years)
x1990T$Age3<-as.numeric(x1990T$X..Persons..35.to.44.years)+as.numeric(x1990T$X..Persons..45.to.54.years)+as.numeric(x1990T$X..Persons..55.to.64.years)
x1990T$Age4<-as.numeric(x1990T$X..Persons..65.to.74.years)+as.numeric(x1990T$X..Persons..75.to.84.years)+as.numeric(x1990T$X..Persons..85.years.and.over)


x1990T<-x1990T%>%
  rename(Single = X..Persons.15.years.and.over..Never.married,
         Married = X..Persons.15.years.and.over..Now.married..except.separated,
         Separated=X..Persons.15.years.and.over..Separated,
         Widowed=X..Persons.15.years.and.over..Widowed,
         Divorced=X..Persons.15.years.and.over..Divorced)

x1990T<-x1990T%>%
  rename(Educ1=X..Persons.25.years.and.over..Less.Than.High.School,
         Educ2=X..Persons.25.years.and.over..High.school.graduate.or.more..includes.equivalency.,
         Educ3=X..Persons.25.years.and.over..Some.college.or.more)

x1990T$Income1 <- as.numeric(x1990T$X..Households..Less.than..5.000..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
                         as.numeric(x1990T$X..Households...5.000.to..9.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)
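# Income2 covers $10,000 to $14,999, matching the 1980 definition; the $10,000 to $12,499 column name is assumed to follow the same export pattern
x1990T$Income2 <- as.numeric(x1990T$X..Households...10.000.to..12.499..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
                         as.numeric(x1990T$X..Households...12.500.to..14.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)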
x1990T$Income3 <- as.numeric(x1990T$X..Households...15.000.to..17.499..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
                         as.numeric(x1990T$X..Households...17.500.to..19.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
                         as.numeric(x1990T$X..Households...20.000.to..22.499..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
                         as.numeric(x1990T$X..Households...22.500.to..24.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)
x1990T$Income4 <- as.numeric(x1990T$X..Households...25.000.to..27.499..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
                         as.numeric(x1990T$X..Households...27.500.to..29.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
                         as.numeric(x1990T$X..Households...30.000.to..32.499..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
                         as.numeric(x1990T$X..Households...32.500.to..34.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
                         as.numeric(x1990T$X..Households...35.000.to..37.499..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
                         as.numeric(x1990T$X..Households...37.500.to..39.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
                         as.numeric(x1990T$X..Households...40.000.to..42.499..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
                         as.numeric(x1990T$X..Households...42.500.to..44.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
                         as.numeric(x1990T$X..Households...45.000.to..47.499..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x1990T$X..Households...47.500.to..49.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x1990T$X..Households...50.000.to..54.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x1990T$X..Households...55.000.to..59.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x1990T$X..Households...60.000.to..74.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x1990T$X..Households...75.000.to..99.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x1990T$X..Households...100.000.to..124.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x1990T$X..Households...125.000.to..149.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x1990T$X..Households...150.000.or.more..Dollars.adjusted.for.inflation.to.match.value.in.2010.)

x1990T$Household1 <- as.numeric(x1990T$X..Specified.owner.occupied.housing.units..Less.than..20.000..Dollars.adjusted.for.inflation.to.match.value.in.2010.)/2
x1990T$Household2<- as.numeric(x1990T$X..Specified.owner.occupied.housing.units..Less.than..20.000..Dollars.adjusted.for.inflation.to.match.value.in.2010.)/2
x1990T$Household3 <- as.numeric(x1990T$X..Specified.owner.occupied.housing.units...20.000.to..49.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x1990T$X..Specified.owner.occupied.housing.units...50.000.to..99.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x1990T$X..Specified.owner.occupied.housing.units...100.000.to..149.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)
x1990T$Household4<-as.numeric(x1990T$X..Specified.owner.occupied.housing.units...150.000.to..299.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x1990T$X..Specified.owner.occupied.housing.units...300.000.to..499.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x1990T$X..Specified.owner.occupied.housing.units...500.000.or.more..Dollars.adjusted.for.inflation.to.match.value.in.2010.)

x1990T<-x1990T%>%
  rename(LaborP =X..Population.16.years.and.over..In.labor.force,
         NoLaberP=X..Population.16.years.and.over..Not.in.labor.force,
         CivilianLaborP = X..Population.16.years.and.over..In.labor.force..Civilian,
         EmployedWorker = X..Population.16.years.and.over..In.labor.force..Civilian..Employed,
         Unemployedworker = X..Population.16.years.and.over..In.labor.force..Civilian..Unemployed,
         Occupation.ProfessionalWorkerP = X..Employed.persons.16.years.and.over..Professional.and.related.services,
         Occupation.ManagersP = X..Employed.persons.16.years.and.over..Managerial.and.professional.specialty.occupations,
         Occupation.ClericalP = X..Employed.persons.16.years.and.over..Finance..insurance..and.real.estate,
         Occupation.CraftmanP = X..Employed.persons.16.years.and.over..Precision.production..craft..and.repair.occupations,
         Occupation.DomesticServiceP = X..Employed.persons.16.years.and.over..Service.occupations..Private.household.occupations,
         IndustryManufactory=X..Employed.persons.16.years.and.over..Manufacturing..nondurable.goods,
         IndustryCommunication=X..Employed.persons.16.years.and.over..Communications.and.other.public.utilities,
         IndustrySale=X..Employed.persons.16.years.and.over..Wholesale.trade,
         Hospitality=X..Employed.persons.16.years.and.over..Entertainment.and.recreation.services,
         IndustryRetail=X..Employed.persons.16.years.and.over..Retail.trade,
         IndustryBusiness=X..Employed.persons.16.years.and.over..Business.and.repair.services,
         Population=Total.Population,
         PopulationDensity=Population.Density..per.sq..mile.)

x1990T<-x1990T%>%
  select(Tract, Population, PopulationDensity, Male,MaleP,Female,FemaleP,White, Black, OtherRace, WhiteP, BlackP, OtherRaceP,
         JapaneseP,ChineseP, FilipinoP,KoreanP, AsianIndianP, VietnameseP, HouseholdWhiteP,HouseholdBlackP,HouseholdAsianandPacificP,
         AsianBelowPovertyLevelP,AsianAbovePovertyLevelP,Asian, AsianP, 
         Age1, Age2, Age3, Age4, Single, Married, Separated, Widowed, Divorced,Educ1, Educ2, Educ3,LaborP, NoLaberP, CivilianLaborP, EmployedWorker, Unemployedworker, Occupation.ProfessionalWorkerP,Occupation.ManagersP,
         Occupation.ClericalP, Occupation.CraftmanP, Occupation.DomesticServiceP,IndustryManufactory,
         IndustryCommunication,IndustrySale,Hospitality,IndustryRetail,IndustryBusiness,ForeignBorn,ForeignBornP,Income1,Income2,Income3,Income4, Household1, Household2, Household3,Household4,
         ForeignBornafter1980,ForeignBorn1970to1979,ForeignBorn1960to1969,ForeignBornBefore1960)

Clean up 2000 dataset

x2000T$OtherRace<-as.numeric(x2000T$Total.Population.4)-as.numeric(x2000T$White.Alone)-as.numeric(x2000T$Black.or.African.American.Alone)
x2000T$OtherRaceP<-100-as.numeric(x2000T$X..White.Alone)-as.numeric(x2000T$X..Black.or.African.American.Alone)
x2000T$HouseholdAsianandPacificP<-as.numeric(x2000T$X..Households..with.a.Householder.Who.is.Asian.Alone)+as.numeric(x2000T$X..Households..with.a.Householder.Who.is.Native.Hawaiian.and.Other.Pacific.Islander.Alone)
x2000T<-x2000T%>%
  rename(FemaleP=X..Female, MaleP=X..Male, 
         White=White.Alone,Black=Black.or.African.American.Alone,
         WhiteP=X..White.Alone, BlackP=X..Black.or.African.American.Alone, 
         Asian=Asian.Alone, AsianP=X..Asian.Alone,
         JapaneseP=X..Japanese,
         ChineseP=X..Chinese..Except.Taiwanese,
         FilipinoP=X..Filipino,
         KoreanP=X..Korean,
         TaiwaneseP=X..Taiwanese,
         AsianIndianP=X..Asian.Indian,
         VietnameseP=X..Vietnamese,
         HouseholdWhiteP=X..Households..with.a.Householder.Who.is.White.Alone,
         HouseholdBlackP=X..Households..with.a.Householder.Who.is.Black.or.African.American.Alone,
         AsianBelowPovertyLevelP=X..Asian.Population.for.Whom.Poverty.Status.is.Determined..Income.in.1999.Below.Poverty.Level,
         AsianAbovePovertyLevelP=X..Asian.Population.for.Whom.Poverty.Status.is.Determined..Income.in.1999.at.or.above.Poverty.Level,
         ForeignBornP=X..Foreign.Born,
         ForeignBorn=Foreign.Born,
         ForeignBornBefore1960=X..Year.of.Entry.for.the.Foreign.Born.Population..Before.1965,
         ForeignBorn1960to1969=X..Year.of.Entry.for.the.Foreign.Born.Population..1965.to.1969)
x2000T$ForeignBorn1970to1979<-as.numeric(x2000T$X..Year.of.Entry.for.the.Foreign.Born.Population..1970.to.1974)+as.numeric(x2000T$X..Year.of.Entry.for.the.Foreign.Born.Population..1975.to.1979)
x2000T$ForeignBornafter1980<-as.numeric(x2000T$X..Year.of.Entry.for.the.Foreign.Born.Population..1980.to.1984)+as.numeric(x2000T$X..Year.of.Entry.for.the.Foreign.Born.Population..1985.to.1989)+as.numeric(x2000T$X..Year.of.Entry.for.the.Foreign.Born.Population..1990.to.1994)+
  as.numeric(x2000T$X..Year.of.Entry.for.the.Foreign.Born.Population..1995.to.March.2000)

x2000T$Age1<-as.numeric(x2000T$X..Under.5.Years)+as.numeric(x2000T$X..5.to.9.Years)+as.numeric(x2000T$X..10.to.14.Years)+as.numeric(x2000T$X..15.to.17.Years)
x2000T$Age2<-as.numeric(x2000T$X..18.to.24.Years)+as.numeric(x2000T$X..25.to.34.Years)
x2000T$Age3<-as.numeric(x2000T$X..35.to.44.Years)+as.numeric(x2000T$X..45.to.54.Years)+as.numeric(x2000T$X..55.to.64.Years)
x2000T$Age4<-as.numeric(x2000T$X..65.to.74.Years)+as.numeric(x2000T$X..75.to.84.Years)+as.numeric(x2000T$X..85.Years.and.over)

x2000T<-x2000T%>%
  rename(Single = X..Population.15.Years.and.Over..Never.Married,
         Married = X..Population.15.Years.and.Over..Now.Married..not.Including.Separated.,
         Separated=X..Population.15.Years.and.Over..Separated,
         Widowed=X..Population.15.Years.and.Over..Widowed,
         Divorced=X..Population.15.Years.and.Over..Divorced)

x2000T<-x2000T%>%
  rename(Educ1=X..Population.25.Years.and.Over..Less.than.High.School,
         Educ2=X..Population.25.Years.and.Over..High.School.Graduate.or.More..Includes.Equivalency.,
         Educ3=X..Population.25.Years.and.Over..Some.College.or.more)

x2000T$Income1 <- as.numeric(x2000T$X..Household.Income..Less.than..10.000..Dollars.adjusted.for.inflation.to.match.value.in.2010.)
x2000T$Income2 <- as.numeric(x2000T$X..Household.Income...10.000.to..14.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)
x2000T$Income3 <- as.numeric(x2000T$X..Household.Income...15.000.to..19.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x2000T$X..Household.Income...20.000.to..24.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)
x2000T$Income4 <- as.numeric(x2000T$X..Household.Income...25.000.to..29.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x2000T$X..Household.Income...30.000.to..34.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x2000T$X..Household.Income...35.000.to..39.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
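  as.numeric(x2000T$X..Household.Income...40.000.to..44.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  # The $45,000 to $49,999 bracket was missing from the sum; its column name is assumed to follow the same export pattern
  as.numeric(x2000T$X..Household.Income...45.000.to..49.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x2000T$X..Household.Income...50.000.to..59.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+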
  as.numeric(x2000T$X..Household.Income...60.000.to..74.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x2000T$X..Household.Income...75.000.to..99.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x2000T$X..Household.Income...100.000.to..124.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x2000T$X..Household.Income...125.000.to..149.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x2000T$X..Household.Income...150.000.to..199.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x2000T$X..Household.Income...200.000.or.more..Dollars.adjusted.for.inflation.to.match.value.in.2010.)

x2000T$Household1 <- as.numeric(x2000T$X..Value.for.All.Owner.Occupied.Housing.Units..Less.than..20.000..Dollars.adjusted.for.inflation.to.match.value.in.2010.)/2
x2000T$Household2<- as.numeric(x2000T$X..Value.for.All.Owner.Occupied.Housing.Units..Less.than..20.000..Dollars.adjusted.for.inflation.to.match.value.in.2010.)/2
x2000T$Household3 <- as.numeric(x2000T$X..Value.for.All.Owner.Occupied.Housing.Units...20.000.to..49.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x2000T$X..Value.for.All.Owner.Occupied.Housing.Units...50.000.to..99.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x2000T$X..Value.for.All.Owner.Occupied.Housing.Units...100.000.to..149.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)
x2000T$Household4<-as.numeric(x2000T$X..Value.for.All.Owner.Occupied.Housing.Units...150.000.to..299.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x2000T$X..Value.for.All.Owner.Occupied.Housing.Units...300.000.to..499.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x2000T$X..Value.for.All.Owner.Occupied.Housing.Units...500.000.to..749.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x2000T$X..Value.for.All.Owner.Occupied.Housing.Units...750.000.to..999.999..Dollars.adjusted.for.inflation.to.match.value.in.2010.)+
  as.numeric(x2000T$X..Value.for.All.Owner.Occupied.Housing.Units...1.000.000.or.more..Dollars.adjusted.for.inflation.to.match.value.in.2010.)

x2000T<-x2000T%>%
  rename(LaborP =X..Population.16.Years.and.Over..In.Labor.Force,
         NoLaberP=X..Population.16.Years.and.Over..Not.in.Labor.Force,
         CivilianLaborP = X..Population.16.Years.and.Over..In.Labor.Force..Civilian,
         EmployedWorker = X..Population.16.Years.and.Over..In.Labor.Force..Civilian..Employed,
         Unemployedworker = X..Population.16.Years.and.Over..In.Labor.Force..Civilian..Unemployed,
         EmployedAsianLaborP=X..Asian.16.Years.Old.in.Civilian.Labor.Force..Employed,
         UnemployedAsianLaborP=X..Asian.16.Years.Old.in.Civilian.Labor.Force..Unemployed,
         Occupation.ProfessionalWorkerP = X..Employed.Civilian.Population.16.Years.and.Over..Professional.and.Related.Occupations,
         Occupation.ManagersP = X..Employed.Civilian.Population.16.Years.and.Over..Management..Business..and.Financial.Operations.Occupations,
         Occupation.ClericalP = X..Employed.Civilian.Population.16.Years.and.Over..Finance..Insurance..Real.Estate.and.Rental.and.Leasing,
         Occupation.CraftmanP = X..Employed.Civilian.Population.16.Years.and.Over..Production.Occupations,
         Occupation.DomesticServiceP =X..Employed.Civilian.Population.16.Years.and.Over..Personal.Care.and.Service.Occupations,
         IndustryManufactory=X..Employed.Civilian.Population.16.Years.and.Over..Manufacturing,
         IndustrySale=X..Employed.Civilian.Population.16.Years.and.Over..Wholesale.Trade,
         IndustryPublishing=X..Employed.Civilian.Population.16.Years.and.Over..Information,
         IndustryFood=X..Employed.Civilian.Population.16.Years.and.Over..Food.Preparation.and.Serving.Related.Occupations,
         Hospitality=X..Employed.Civilian.Population.16.Years.and.Over..Arts..Entertainment..Recreation..Accommodation.and.Food.Services,
         IndustryRetail=X..Employed.Civilian.Population.16.Years.and.Over..Retail.Trade,
         IndustryBusiness=X..Employed.Civilian.Population.16.Years.and.Over..Sales.and.Related.Occupations,
         Population=Total.Population,
         PopulationDensity=Population.Density..per.sq..mile.)

x2000T<-x2000T%>%
  select(Tract, Population, PopulationDensity, Male,MaleP,Female,FemaleP,White, Black, OtherRace, WhiteP, BlackP, OtherRaceP,
         JapaneseP,ChineseP, FilipinoP,KoreanP, AsianIndianP, VietnameseP, HouseholdWhiteP,HouseholdBlackP,HouseholdAsianandPacificP,
         AsianBelowPovertyLevelP,AsianAbovePovertyLevelP,Asian, AsianP, 
         Age1, Age2, Age3, Age4, Single, Married, Separated, Widowed, Divorced,Educ1, Educ2, Educ3,LaborP, NoLaberP, CivilianLaborP, EmployedWorker, Unemployedworker, Occupation.ProfessionalWorkerP,Occupation.ManagersP,
         Occupation.ClericalP, Occupation.CraftmanP, Occupation.DomesticServiceP,IndustryManufactory,
         IndustrySale,Hospitality,IndustryRetail,IndustryBusiness,ForeignBorn,ForeignBornP,Income1,Income2,Income3,Income4, Household1, Household2, Household3,Household4,
         ForeignBornafter1980,ForeignBorn1970to1979,ForeignBorn1960to1969,ForeignBornBefore1960,TaiwaneseP,EmployedAsianLaborP,UnemployedAsianLaborP,IndustryPublishing,IndustryFood)

Merge datasets

x1940T$Year <- "1940"
x1950T$Year <- "1950"
x1960T$Year <- "1960"
x1970T$Year <- "1970"
x1980T$Year <- "1980"
x1990T$Year <- "1990"
x2000T$Year <- "2000"

col_names1940 <- names(x1940T)
col_names1950 <- names(x1950T)
col_names1960 <- names(x1960T)
col_names1970 <- names(x1970T)
col_names1980 <- names(x1980T)
col_names1990 <- names(x1990T)
col_names2000 <- names(x2000T)
# Convert every column to character so the yearly tables bind cleanly; numeric columns are restored after the merge
x1940T[,col_names1940] <- lapply(x1940T[,col_names1940] , as.character)
x1950T[,col_names1950] <- lapply(x1950T[,col_names1950] , as.character)
x1960T[,col_names1960] <- lapply(x1960T[,col_names1960] , as.character)
x1970T[,col_names1970] <- lapply(x1970T[,col_names1970] , as.character)
x1980T[,col_names1980] <- lapply(x1980T[,col_names1980] , as.character)
x1990T[,col_names1990] <- lapply(x1990T[,col_names1990] , as.character)
x2000T[,col_names2000] <- lapply(x2000T[,col_names2000] , as.character)
bind_rows(x1940T,x1950T,x1960T,x1970T,x1980T,x1990T,x2000T)->all_data
all_data<- all_data%>%
  # fixed=TRUE matches each pattern literally, so "." is not treated as a regex wildcard
  mutate(Tract=gsub("Census Tract ","",Tract,fixed=TRUE))%>%
  mutate(Tract=gsub("0000","",Tract,fixed=TRUE))%>%
  mutate(Tract=gsub("116.0","116",Tract,fixed=TRUE))%>%
  mutate(Tract=gsub("117.0","117",Tract,fixed=TRUE))%>%
  mutate(Tract=gsub("118.0","118",Tract,fixed=TRUE))%>%
  mutate(Tract=gsub("119.0","119",Tract,fixed=TRUE))%>%
  mutate(Tract=gsub("478.0","478",Tract,fixed=TRUE))%>%
  mutate(Tract=gsub("479.0","479",Tract,fixed=TRUE))%>%
  mutate(Tract=gsub("480.0","480",Tract,fixed=TRUE))%>%
  mutate(Tract=gsub("481.0","481",Tract,fixed=TRUE))
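
When tract labels are cleaned with `gsub()`, a dot in the pattern is a regular-expression wildcard unless `fixed = TRUE` is set. The sketch below uses made-up labels (not values from the census files) to show how the two modes can differ.

```r
# Made-up tract labels for illustration only
labels <- c("Census Tract 116.0", "Census Tract 11610", "Census Tract 2071")

# Regex match: "." stands for any character, so "11610" is rewritten as well
regex_result <- gsub("116.0", "116", labels)

# Literal match: only the exact text "116.0" is rewritten
fixed_result <- gsub("116.0", "116", labels, fixed = TRUE)

print(regex_result)
print(fixed_result)
```

With the regex form, two different tracts can collapse into the same identifier, which would then pick up the wrong census row in a later merge.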

# Import the dataset that holds the theater, year, and tract variables
Final<- read_csv("~/Desktop/theater/allyears/Final - Table.csv", col_types = cols(Year = col_integer(),TotalAssessedChange = col_number()))

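# Merge the theater table with the census data on the matching keys Tract and Year
df <- merge(x = Final, y = all_data, by = c("Tract", "Year"),all.x = TRUE)

# Convert columns 8 to 92 (assumed here to be the census variables) back to numeric
df[, c(8:92)]<-apply(df[, c(8:92)],2,function(x) as.numeric(as.character(x)))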
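
Because the merge keeps every theater row (`all.x = TRUE`), a tract label that was not normalized correctly shows up only as a row of missing census values. A minimal check, sketched here on toy tables with made-up values and hypothetical names (`theaters`, `census`), is to list the keys that found no match.

```r
# Toy stand-ins for the theater table and the census table (values are made up)
theaters <- data.frame(
  Tract    = c("116", "117", "4800"),
  Year     = c(1980L, 1980L, 1990L),
  Theaters = c("A", "B", "C")
)
census <- data.frame(
  Tract      = c("116", "117"),
  Year       = c(1980L, 1980L),
  Population = c(3200, 4100)
)

# Left join: every theater row is kept, matched or not
merged <- merge(x = theaters, y = census, by = c("Tract", "Year"), all.x = TRUE)

# Rows without a census match have NA in the census columns
unmatched <- merged[is.na(merged$Population), c("Tract", "Year")]
print(unmatched)
```

An empty `unmatched` table would suggest that every theater-decade found its census tract.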

Recode some variables

# Make a new variable that categorizes Population into six levels

df$PopLevel[df$Population >5000]<- 6
df$PopLevel[df$Population > 4000 & 5000>=df$Population]<-5
df$PopLevel[df$Population > 3000 & 4000>=df$Population]<-4
df$PopLevel[df$Population > 2000 & 3000>=df$Population]<-3
df$PopLevel[df$Population > 1000 & 2000>=df$Population]<-2
df$PopLevel[1000>=df$Population]<-1
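
The same six population levels can also be produced in one call with base R's `cut()`. This is an alternative sketch rather than the code used in the analysis, and the `population` vector is made up to exercise the boundaries.

```r
# Made-up population counts, including values on the bracket boundaries
population <- c(800, 1000, 1001, 2500, 5000, 5001, NA)

# Intervals are closed on the right by default: (-Inf,1000], (1000,2000], ..., (5000,Inf)
pop_level <- cut(population,
                 breaks = c(-Inf, 1000, 2000, 3000, 4000, 5000, Inf),
                 labels = FALSE)  # labels = FALSE returns integer codes 1 to 6
print(pop_level)
```

Keeping the break points in a single vector makes the bracket edges explicit, and missing populations stay `NA`.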

# Make a new variable that shows whether the theater is downtown or in the suburbs
df$Location[df$Theaters %in% c("Kim Sing Theatre","King Hing Theatre","Pagoda Cinema")]<-"Downtown"
df$Location[df$Theaters %in% c("Monterey Theatre","Bard’s Garfield Egyptian Theatre\n","Kuo Hwa 2 Cinema","Kuo Hwa Theatre")]<-"Suburb"
DataDT<- split(df,df$Location)$Downtown
DataSB<- split(df,df$Location)$Suburb

# Turn some variables into factors
df$`Main Address`[df$`Main Address`== "\bYes" | df$`Main Address`=="Yes"] <-"Yes"
df$`Main Address`<-as.factor(as.character(df$`Main Address`))
df$status<-factor(df$Opening,labels=c("Closing Entire Decade","Opening Several Years", "Opening Entire Decade"))
df$Address<-factor(df$`Main Address`,labels=c("Theaters Neighborhood","Theater Main Location"))
df$Tract<-as.factor(df$Tract)
df$Theaters<-as.factor(df$Theaters)
df$Year<-as.factor(df$Year)
df$PopLevel<-as.factor(df$PopLevel)
df$Location<-as.factor(df$Location)
df$AgeYoung<-df$Age1+df$Age2
df$LowEduc<-df$Educ1

df<-df%>%
  select(Tract, Year,Theaters, AgeYoung, Opening, "Financial Health", TotalAssessedChange, "Main Address", Location,PopLevel,PopulationDensity, MaleP,WhiteP,BlackP,OtherRaceP, ForeignBornP, LaborP, NoLaberP, CivilianLaborP, EmployedWorker, Unemployedworker, Occupation.ProfessionalWorkerP,Occupation.ManagersP, Occupation.ClericalP, Occupation.CraftmanP,Occupation.DomesticServiceP, Household1,Household2,Household3, Income1,Income2,Income3,Income4,Single,Married, Separated, Widowed, Divorced, IndustryManufactory,IndustryPublishing, Hospitality , IndustryRetail, IndustryBusiness,JapaneseP, ChineseP, FilipinoP, KoreanP, AsianIndianP, VietnameseP,HouseholdAsianandPacificP, AsianBelowPovertyLevelP, AsianP,  ForeignBornafter1980,ForeignBorn1970to1979, ForeignBorn1960to1969,ForeignBornBefore1960,LowEduc)

df$Opening <- factor(df$Opening)
df$`Financial Health` <- factor(df$`Financial Health`)
df$Address<-factor(df$`Main Address`,labels=c("Theaters Neighborhood","Theater Main Location"))
df$status<-factor(df$Opening,labels=c("Closing Entire Decade","Opening Several Years", "Opening Entire Decade"))

Dataset basic summary

library(qacEDA)
contents(df)

## 
## The data frame df has 147 observations and 59 variables.
## 
## Overall
##  pos varname                        type    n_unique n_miss pct_miss
##   1  Tract                          factor   42        0    0%      
##   2  Year                           factor    7        0    0%      
##   3  Theaters                       factor    7        0    0%      
##   4  AgeYoung                       numeric  88        0    0%      
##   5  Opening                        factor    3        0    0%      
##   6  Financial Health               factor    5        0    0%      
##   7  TotalAssessedChange            numeric  40       87    59%     
##   8  Main Address                   factor    2        0    0%      
##   9  Location                       factor    2        0    0%      
##  10  PopLevel                       factor    6        0    0%      
##  11  PopulationDensity              numeric 102        0    0%      
##  12  MaleP                          numeric  80        0    0%      
##  13  WhiteP                         numeric  86        0    0%      
##  14  BlackP                         numeric  71        0    0%      
##  15  OtherRaceP                     numeric  86        0    0%      
##  16  ForeignBornP                   numeric  87        0    0%      
##  17  LaborP                         numeric  85        0    0%      
##  18  NoLaberP                       numeric  85        0    0%      
##  19  CivilianLaborP                 numeric  86        0    0%      
##  20  EmployedWorker                 numeric  86        0    0%      
##  21  Unemployedworker               numeric  80        0    0%      
##  22  Occupation.ProfessionalWorkerP numeric  83        0    0%      
##  23  Occupation.ManagersP           numeric  84        0    0%      
##  24  Occupation.ClericalP           numeric  77        0    0%      
##  25  Occupation.CraftmanP           numeric  81        0    0%      
##  26  Occupation.DomesticServiceP    numeric  72        0    0%      
##  27  Household1                     numeric  41       46    31%     
##  28  Household2                     numeric  42       46    31%     
##  29  Household3                     numeric  50       46    31%     
##  30  Income1                        numeric  75       21    14%     
##  31  Income2                        numeric  73       21    14%     
##  32  Income3                        numeric  64       21    14%     
##  33  Income4                        numeric  69       21    14%     
##  34  Single                         numeric  74       21    14%     
##  35  Married                        numeric  77       21    14%     
##  36  Separated                      numeric  58       42    29%     
##  37  Widowed                        numeric  64       42    29%     
##  38  Divorced                       numeric  58       42    29%     
##  39  IndustryManufactory            numeric  62       42    29%     
##  40  IndustryPublishing             numeric  34       84    57%     
##  41  Hospitality                    numeric  61       42    29%     
##  42  IndustryRetail                 numeric  61       42    29%     
##  43  IndustryBusiness               numeric  57       42    29%     
##  44  JapaneseP                      numeric  37       84    57%     
##  45  ChineseP                       numeric  42       84    57%     
##  46  FilipinoP                      numeric  40       84    57%     
##  47  KoreanP                        numeric  37       84    57%     
##  48  AsianIndianP                   numeric  34       84    57%     
##  49  VietnameseP                    numeric  40       84    57%     
##  50  HouseholdAsianandPacificP      numeric  44       84    57%     
##  51  AsianBelowPovertyLevelP        numeric  41       85    58%     
##  52  AsianP                         numeric  44       84    57%     
##  53  ForeignBornafter1980           numeric  29      105    71%     
##  54  ForeignBorn1970to1979          numeric  30      105    71%     
##  55  ForeignBorn1960to1969          numeric  28      105    71%     
##  56  ForeignBornBefore1960          numeric  28      105    71%     
##  57  LowEduc                        numeric  86        0    0%      
##  58  Address                        factor    2        0    0%      
##  59  status                         factor    3        0    0%      
## 
## Numeric Variables
##                                  n    mean      sd  skew    min     p25  median
## AgeYoung                       147   53.35    9.37  0.41  25.47   47.86   52.41
## TotalAssessedChange             60   64.87  114.74  2.06 -61.06   17.43   19.51
## PopulationDensity              147 9879.44 6301.17  1.37  36.21 5734.39 8730.02
## MaleP                          147   51.63    9.09  1.69  21.43   47.03   48.22
## WhiteP                         147   60.38   35.39 -0.31   4.66   26.28   68.88
## BlackP                         147    5.42   10.86  2.94   0.00    0.15    0.97
## OtherRaceP                     147   34.20   35.27  0.56   0.00    1.66   21.75
## ForeignBornP                   147   40.95   24.50  0.28   0.00   19.59   36.22
## LaborP                         147   51.14   11.02 -1.84   0.17   48.93   52.69
## NoLaberP                       147   48.86   11.02  1.84  35.19   41.96   47.31
## CivilianLaborP                 147   50.73   11.23 -1.73   0.17   47.21   52.66
## EmployedWorker                 147   46.46   11.53 -1.22   0.17   40.45   49.53
## Unemployedworker               147    4.26    2.95  1.65   0.00    2.38    3.58
## Occupation.ProfessionalWorkerP 147   10.87    6.99  0.20   0.00    6.31    9.20
## Occupation.ManagersP           147   10.36    6.94  0.95   0.00    6.36    7.53
## Occupation.ClericalP           147   11.44    8.90  0.96   0.00    5.63    9.11
## Occupation.CraftmanP           147   11.43    6.65  1.83   0.00    8.55    9.88
## Occupation.DomesticServiceP    147    1.70    1.70  1.30   0.00    0.50    1.10
## Household1                     101    0.99    1.55  1.86   0.00    0.00    0.24
## Household2                     101    1.94    3.84  3.01   0.00    0.00    0.32
## Household3                     101   62.39   41.85 -0.40   0.00   11.55   94.71
## Income1                        126   32.85   29.79  1.13   0.00   11.04   25.81
## Income2                        126    7.21    4.77  0.33   0.00    3.31    7.04
## Income3                        126   14.47    9.14  0.01   0.00    9.91   14.57
## Income4                        126   43.50   26.44 -0.32   0.00   20.75   42.44
## Single                         126   29.17    8.90  0.40  14.14   23.34   28.35
## Married                        126   52.39   10.84  0.02  29.31   47.57   50.91
## Separated                      105    3.54    3.27  2.24   0.59    1.83    2.54
## Widowed                        105    9.17    2.94  0.00   2.98    6.84    9.24
## Divorced                       105    6.46    2.86  0.80   1.90    4.71    6.49
## IndustryManufactory            105   14.74   10.98  0.53   0.00    5.79    9.93
## IndustryPublishing              63    2.20    1.51  0.85   0.00    1.60    2.08
## Hospitality                    105    7.53    9.12  1.52   0.00    1.91    3.21
## IndustryRetail                 105   15.48    9.26  1.30   0.00   10.00   12.40
## IndustryBusiness               105    5.78    4.01  1.37   0.00    3.22    4.64
## JapaneseP                       63    6.33    8.67  1.42   0.36    0.53    2.11
## ChineseP                        63   68.20   19.79 -1.07   7.27   58.04   73.16
## FilipinoP                       63    4.58    5.36  1.34   0.07    0.42    2.40
## KoreanP                         63    2.84    3.95  1.82   0.08    0.30    0.89
## AsianIndianP                    63    0.88    1.16  2.16   0.00    0.27    0.38
## VietnameseP                     63   12.32    5.66  1.07   4.29    7.48   11.67
## HouseholdAsianandPacificP       63   51.70   29.73 -0.28   0.00   22.23   55.43
## AsianBelowPovertyLevelP         62   26.68   11.30  2.13   8.00   21.60   23.21
## AsianP                          63   54.83   42.86 -0.29   0.03    0.82   67.78
## ForeignBornafter1980            42   69.63   10.86 -0.33  40.06   62.81   67.28
## ForeignBorn1970to1979           42   20.13    7.74  0.73   8.75   13.03   21.20
## ForeignBorn1960to1969           42    5.09    2.67  0.25   0.00    3.37    4.56
## ForeignBornBefore1960           42    5.15    1.98  0.38   1.30    4.17    5.14
## LowEduc                        147   41.91   19.35  0.43   0.00   26.69   37.23
##                                     p75      max
## AgeYoung                          58.59    79.46
## TotalAssessedChange               32.78   443.68
## PopulationDensity              14267.88 40489.24
## MaleP                             51.37    91.31
## WhiteP                            97.84   100.00
## BlackP                             3.77    57.39
## OtherRaceP                        65.87    94.03
## ForeignBornP                      58.05    86.04
## LaborP                            58.04    64.81
## NoLaberP                          51.07    99.83
## CivilianLaborP                    57.85    64.70
## EmployedWorker                    54.16    62.43
## Unemployedworker                   4.90    13.40
## Occupation.ProfessionalWorkerP    16.48    25.90
## Occupation.ManagersP              14.44    31.03
## Occupation.ClericalP              16.78    34.47
## Occupation.CraftmanP              14.05    34.19
## Occupation.DomesticServiceP        2.35     6.88
## Household1                         1.32     8.11
## Household2                         2.47    17.64
## Household3                        99.88   100.00
## Income1                           39.75    97.60
## Income2                           10.87    18.30
## Income3                           21.94    36.60
## Income4                           67.64    93.25
## Single                            33.63    50.42
## Married                           60.27    75.84
## Separated                          3.27    16.46
## Widowed                           11.31    14.57
## Divorced                           8.28    19.05
## IndustryManufactory               24.72    42.59
## IndustryPublishing                 2.83     7.06
## Hospitality                        8.53    30.46
## IndustryRetail                    18.57    39.53
## IndustryBusiness                   6.88    22.22
## JapaneseP                          6.88    28.41
## ChineseP                          83.12    88.50
## FilipinoP                          6.61    21.56
## KoreanP                            4.04    16.62
## AsianIndianP                       1.08     5.61
## VietnameseP                       16.92    32.54
## HouseholdAsianandPacificP         78.84    87.49
## AsianBelowPovertyLevelP           29.44    83.50
## AsianP                            99.50   100.00
## ForeignBornafter1980              80.34    88.09
## ForeignBorn1970to1979             24.41    45.89
## ForeignBorn1960to1969              7.50    11.15
## ForeignBornBefore1960              6.65    10.84
## LowEduc                           58.52    80.24
## 
## Categorical Variables
##  variable         level                n   pct 
##  Tract            116                    2 0.01
##                   117                    2 0.01
##                   118                    6 0.04
##                   119                    8 0.05
##                   2060.10                4 0.03
##                   2060.20                4 0.03
##                   2061                   3 0.02
##                   2065                   4 0.03
##                   2071                   1 0.01
##                   2071.01                4 0.03
##                   (32 more levels)     109 0.74
##  Year             1940                  21 0.14
##                   1950                  21 0.14
##                   1960                  21 0.14
##                   1970                  21 0.14
##                   1980                  21 0.14
##                   1990                  21 0.14
##                   2000                  21 0.14
##  Theaters         Bard’s Garfield Egyp  21 0.14
##                   Kim Sing Theatre      21 0.14
##                   King Hing Theatre     21 0.14
##                   Kuo Hwa 2 Cinema      21 0.14
##                   Kuo Hwa Theatre       21 0.14
##                   Monterey Theatre      21 0.14
##                   Pagoda Cinema         21 0.14
##  Opening          0                     78 0.53
##                   1                     36 0.24
##                   2                     33 0.22
##  Financial Health 0                     66 0.45
##                   1                     39 0.27
##                   2                      3 0.02
##                   3                      3 0.02
##                   4                     36 0.24
##  Main Address     No                    98 0.67
##                   Yes                   49 0.33
##  Location         Downtown              63 0.43
##                   Suburb                84 0.57
##  PopLevel         1                      6 0.04
##                   2                     28 0.19
##                   3                     27 0.18
##                   4                     21 0.14
##                   5                     25 0.17
##                   6                     40 0.27
##  Address          Theaters Neighborhoo  98 0.67
##                   Theater Main Locatio  49 0.33
##  status           Closing Entire Decad  78 0.53
##                   Opening Several Year  36 0.24
##                   Opening Entire Decad  33 0.22
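
The missing-value counts in this summary are pooled across all years. To see whether a variable is missing only in particular decades, the share of missing values can be tabulated by year. The sketch below uses a small made-up data frame (toy and its values are hypothetical) in which one variable is recorded in every decade and another only from 1980 onward.

```r
# Hypothetical toy data: Married is recorded every decade,
# ChineseP only from 1980 onward
toy <- data.frame(
  Year     = rep(c(1960, 1970, 1980, 1990, 2000), each = 2),
  Married  = c(50, 52, 48, 51, 47, 49, 45, 46, 44, 43),
  ChineseP = c(NA, NA, NA, NA, 60, 65, 70, 72, 75, 80)
)

# Share of missing values per variable within each year
vars <- c("Married", "ChineseP")
miss_by_year <- sapply(vars, function(v) tapply(is.na(toy[[v]]), toy$Year, mean))
print(miss_by_year)

# Variables observed in every year versus those with gaps in some years
main_vars <- vars[colSums(miss_by_year) == 0]
late_vars <- setdiff(vars, main_vars)
```

Applied to df, the same tabulation would separate variables usable across all decades from those that can only enter a model restricted to later years.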

Dependent Variables:

- Opening Status
  - 0-Closed for the entire decade
  - 1-Open for part of the decade
  - 2-Open for the entire decade
- Financial Health
  - 0-Unknown or closed
  - 1-Suspended or terminated while open
  - 2-At least one penalty
  - 3-At least 1 SI
  - 4-Very healthy: filing taxes for the existing year

df$Opening <- factor(df$Opening,
                     levels=c(0,1,2),
                     ordered = FALSE)

df$`Financial Health` <- factor(df$`Financial Health`,
                                levels=c(0,1,2,3,4),
                                ordered = TRUE)

Independent Variables

PopLevel
- 1-Population 1000 or less
- 2-Population above 1000, up to 2000
- 3-Population above 2000, up to 3000
- 4-Population above 3000, up to 4000
- 5-Population above 4000, up to 5000
- 6-Population more than 5000
AgeYoung: Proportion of the population aged 35 and younger
TotalAssessedChange: Percentage change in the total assessed value of the location’s parcel in each decade
LowEduc: Proportion of the population that never attended high school
Household Value
- Household1 - Percent of households valued at less than 10000
- Household2 - Percent of households valued at more than 10000 and less than 20000
- Household3 - Percent of households valued at more than 20000
Household Income
- Income1 - Proportion of people with a household income of less than 10000
- Income2 - Proportion of people with a household income of 10000-15000
- Income3 - Proportion of people with a household income of 15000-25000
- Income4 - Proportion of people with a household income of more than 25000
MaleP: Proportion of males in the total population
Race
- WhiteP: Proportion of the population that is White
- BlackP: Proportion of the population that is Black
- OtherRaceP: Proportion of the population of other races
- AsianP: Proportion of the population that is Asian
- ChineseP: Proportion of the population that is Chinese
- FilipinoP: Proportion of the population that is Filipino
- KoreanP: Proportion of the population that is Korean
- AsianIndianP: Proportion of the population that is Asian Indian
- VietnameseP: Proportion of the population that is Vietnamese
- HouseholdAsianandPacificP: Proportion of households that are Asian and Pacific
ForeignBornP: Proportion of the population born in foreign countries
Labor:
- LaborP: Proportion of the population in the labor force
- NoLaberP: Proportion of the population not in the labor force
- CivilianLaborP: Proportion of the population in the civilian labor force
- EmployedWorker: Proportion of the population that is in the civilian labor force and employed
- Unemployedworker: Proportion of the population that is in the civilian labor force and unemployed
Occupation
- Occupation.ProfessionalWorkerP: Proportion of the population (16+) in professional specialty occupations
- Occupation.ManagersP: Proportion of the population (16+) in executive, administrative, and managerial occupations
- Occupation.ClericalP: Proportion of the population (16+) in administrative support occupations, including clerical
- Occupation.CraftmanP: Proportion of the population (16+) who are craftsmen, foremen, and kindred workers
- Occupation.DomesticServiceP: Proportion of the population (16+) in private household occupations
Marriage Status
- Single: Proportion of the population (16+) who are single
- Married: Proportion of the population (16+) who are married
- Separated: Proportion of the population (16+) who are separated
- Widowed: Proportion of the population (16+) who are widowed
- Divorced: Proportion of the population (16+) who are divorced
Industry
- IndustryManufactory: Proportion of the population (16+) in the manufacturing industry
- IndustryPublishing: Proportion of the population (16+) in the printing and publishing industry
- Hospitality: Proportion of the population (16+) in the food, entertainment, and recreation services industry
- IndustryRetail: Proportion of the population (16+) in the retail trade industry
- IndustryBusiness: Proportion of the population (16+) in the business industry
Foreign Born Year
- ForeignBornafter1980: Proportion of the population (16+) born in foreign countries after 1980
- ForeignBorn1970to1979: Proportion of the population (16+) born in foreign countries from 1970 to 1979
- ForeignBorn1960to1969: Proportion of the population (16+) born in foreign countries from 1960 to 1969
- ForeignBornBefore1960: Proportion of the population (16+) born in foreign countries before 1960
AsianBelowPovertyLevelP: Proportion of the Asian population below the standard poverty level

Data Visualization with Statistical Plots

library(ggplot2)
library("ggthemes")
ggplot(df, aes(Year, ForeignBornP, color=Location, shape=Address))+
  geom_point(size = 4)+
  theme_light()+
  ylab("Percentage Born in Foreign Countries")+
  xlab("Year")+
  ggtitle("Proportion of Foreign Born in the Population from 1940 to 2000")

ggplot(data=df)+
  geom_bar(aes(x=Year,
               fill=status),
           position="fill", width=0.5)+
  scale_fill_brewer("Opening Status",palette = "BuPu")+
  facet_grid(Location~.)+
  ylab("Opening Status Proportion")+
  xlab("Year")+
  theme_bw()

As the first graph shows, the proportion of the population born in foreign countries increased from the 1940s to the 1970s in both the downtown and suburban areas. During this period, the proportion of foreign-born residents was higher downtown than in the suburbs. This could be explained by the new Chinatown built in the 1930s, which is exactly where the downtown theaters are located.

Beginning in the 1960s, Chinatown became much more diverse in the backgrounds of its Chinese residents and business people. Immigrants, whose numbers grew steadily between the 1950s and the 1990s, came from many different parts of China, as well as from Hong Kong, Taiwan, and Southeast Asia. Chinatown changed particularly fast during the 1980s, as more and more Chinese from Southeast Asia opened businesses there. This is consistent with the theaters being more likely to be open from 1970 to 1990, especially downtown.

From 1970 to 2000, meanwhile, more and more foreign-born residents came to the suburbs, especially the San Gabriel Valley, where the suburban theaters are located. This could relate to the new wave of Asian immigration that began in 1960 and peaked in 1990. This new ethnoburb mainly attracted Chinese immigrants from Hong Kong, Taiwan, and Mainland China, which could also explain why some Chinese movie theaters that specifically screened Hong Kong films lasted through the 1980s and 1990s.

It is worth noting that the post-1980 globalization trend has profoundly shaped Asian immigration, assimilation, and development in the San Gabriel Valley. The San Gabriel Valley Chinese community was created under global, national, and local contexts and had stronger global connections and internal stratification.

df$FinancialStatus<- factor(df$`Financial Health`, labels = c("Unknown or Closing","Suspend or Termination While Opening","At Least a Penalty","At Least 1 SI-Minor Warn","Very Healthy-Always Filing The Tax"))
ggplot(data=df)+
  geom_bar(aes(x=Year,
               fill=FinancialStatus),
           position="fill", width=0.5)+
  scale_fill_brewer("Financial Health")+
  facet_grid(Location~.)+
  ylab("Financial Health")+
  xlab("Year")+
  theme_bw()

The distribution of financial health from 1940 to 2000 closely resembles that of opening status, and both response variables peak around 1970. While the opening status of downtown and suburban theaters varies over time and follows different patterns, their financial health follows a similar trend. Downtown, the theaters performed well financially in the 1960s and 1970s but not in other decades. This may be caused by missing tax records for some decades, but it could also reflect that Chinatown became more diversified and more financially successful as more immigrants arrived from 1960, and that Asian residents began moving from downtown to the suburbs from 1980.

However, although the financial health of suburban theaters did not change considerably during this period, it still deteriorated from 1980, similar to the trend for downtown theaters. It is important, and fascinating, to examine the factors that could relate to this phenomenon and affect the opening status of the theaters. Because our dependent variables are categorical and ordinal, I first use ordinal regression (a statistical technique that predicts an ordinal dependent variable from a set of independent variables) to see whether those variables are associated. Later I fit a multilevel logistic regression (a statistical technique that estimates the probability that a yes/no event will occur while accounting for dependency in the data, such as pupils being nested in classrooms), so that location (Downtown or Suburb) serves as the higher-level group and the theaters within each of these two clusters serve as the lower-level groups.

Choosing the variables

Because we have many variables and few observations, we reduce the set of variables by testing each one’s significance against the dependent variables separately, and keep ten variables in the final regression. However, a group of variables is only available from 1980 to 2000 and therefore has many missing values. I leave these variables out of the final regression and instead include them in an additional regression restricted to observations between 1980 and 2000 to check their significance. Here is an example of how I chose the variables based on their p-values. It is also important to check the correlation between predictors, since we do not want predictors to be strongly correlated.

library(ordinal)

## 
## Attaching package: 'ordinal'

## The following object is masked from 'package:dplyr':
## 
##     slice

library(reshape2)
library(lmtest)

## Loading required package: zoo

## 
## Attaching package: 'zoo'

## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

##Test significance level of variables which have few missing values
mod1<-clm(`Financial Health`~Theaters+Location+AgeYoung, data=df)
summary(mod1)

## formula: `Financial Health` ~ Theaters + Location + AgeYoung
## data:    df
## 
##  link  threshold nobs logLik  AIC    niter max.grad cond.H 
##  logit flexible  147  -135.29 292.59 7(2)  2.91e-13 4.8e+05
## 
## Coefficients: (1 not defined because of singularities)
##                           Estimate Std. Error z value Pr(>|z|)    
## TheatersKim Sing Theatre  -1.93635    0.62652  -3.091 0.001997 ** 
## TheatersKing Hing Theatre -0.66423    0.63620  -1.044 0.296457    
## TheatersKuo Hwa 2 Cinema  -3.22192    0.78058  -4.128 3.67e-05 ***
## TheatersKuo Hwa Theatre    2.67054    0.76922   3.472 0.000517 ***
## TheatersMonterey Theatre  -2.42591    0.66508  -3.648 0.000265 ***
## TheatersPagoda Cinema     -0.87559    0.59759  -1.465 0.142864    
## LocationSuburb                  NA         NA      NA       NA    
## AgeYoung                   0.01573    0.01885   0.835 0.403957    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Threshold coefficients:
##     Estimate Std. Error z value
## 0|1  -0.6144     1.0845  -0.567
## 1|2   1.2072     1.0888   1.109
## 2|3   1.3933     1.0896   1.279
## 3|4   1.5743     1.0897   1.445

#Finding correlations between independent variables
sub1<-subset(df, select=c("Opening","OtherRaceP","LowEduc","Unemployedworker","Occupation.ProfessionalWorkerP","Occupation.DomesticServiceP","Income2","Married","IndustryManufactory","IndustryBusiness"))
sub1<-data.frame(apply(sub1,2,function(x) as.numeric(as.character(x))))

cor_plot(sub1,number=TRUE)

In the correlation matrix, the red color represents the negative correlation and the purple color represents the positive correlation. The lighter the color, the weaker the correlation is; the darker the color, the stronger the correlation is.

The plot shows which pairs of variables are strongly correlated, so we can choose independent variables that are not closely related to each other but are potentially related to the dependent variables. For instance, the share of professional workers and the share of people with a lower education level are strongly negatively correlated: the more people with a lower education level in an area, the fewer professional workers there are. The share of professional workers is also negatively correlated with Income2 (the proportion of people with a household income of 10000-15000). Income2 is the middle income band, so its share represents the proportion of the middle class. The correlation therefore suggests that the larger the middle-class proportion, the fewer people work in the professional sector. Thus, I rule out the professional-worker variable, since it can be represented by either LowEduc or Income2. Applying this rule together with the correlation levels and the one-to-one significance tests, I finally chose ten variables for the final regression model (not including those that only appear in later decades).
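
The screening rule described above can also be applied numerically instead of by reading the plot. The following is a minimal sketch on simulated data: the predictors x1, x2, x3 and the 0.7 cutoff are assumptions for illustration, not values used in this analysis.

```r
set.seed(1)
n <- 100

# Hypothetical predictors: x2 is built to track x1 closely, x3 is independent
x1 <- rnorm(n)
x2 <- -0.9 * x1 + rnorm(n, sd = 0.3)
x3 <- rnorm(n)
toy <- data.frame(x1, x2, x3)

# Pairwise correlations, tolerating missing values as in the census data
cors <- cor(toy, use = "pairwise.complete.obs")

# List each pair once (upper triangle) whose |r| exceeds the assumed cutoff
cutoff <- 0.7
idx <- which(abs(cors) > cutoff & upper.tri(cors), arr.ind = TRUE)
flagged <- data.frame(var1 = rownames(cors)[idx[, "row"]],
                      var2 = colnames(cors)[idx[, "col"]],
                      r    = cors[idx])
print(flagged)
```

For each flagged pair, only one of the two variables would be carried forward, as was done here when the professional-worker variable was dropped.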

Ordinal Regression (Opening Status)

We finally chose ten variables in the final regression based on the correlation plot and their significance levels with the opening status.

require(ordinal)
modFinal<-clm(Opening~Theaters+Year+OtherRaceP+LowEduc+Unemployedworker+Occupation.DomesticServiceP+Income2+Married+IndustryManufactory+IndustryBusiness, data=df)

summary(modFinal)

## formula: 
## Opening ~ Theaters + Year + OtherRaceP + LowEduc + Unemployedworker + Occupation.DomesticServiceP + Income2 + Married + IndustryManufactory + IndustryBusiness
## data:    df
## 
##  link  threshold nobs logLik AIC    niter max.grad cond.H 
##  logit flexible  105  -50.49 140.99 7(0)  6.68e-11 8.9e+06
## 
## Coefficients:
##                              Estimate Std. Error z value Pr(>|z|)    
## TheatersKim Sing Theatre     -3.08724    2.53960  -1.216  0.22412    
## TheatersKing Hing Theatre    -2.15027    2.54552  -0.845  0.39826    
## TheatersKuo Hwa 2 Cinema      0.33004    1.07894   0.306  0.75969    
## TheatersKuo Hwa Theatre       4.51794    1.30949   3.450  0.00056 ***
## TheatersMonterey Theatre     -0.93464    1.06410  -0.878  0.37976    
## TheatersPagoda Cinema         1.52522    2.55507   0.597  0.55055    
## Year1970                      1.36166    1.23412   1.103  0.26988    
## Year1980                     -1.83230    3.62106  -0.506  0.61285    
## Year1990                     -4.80889    3.49219  -1.377  0.16850    
## Year2000                    -14.65039    5.51489  -2.657  0.00790 ** 
## OtherRaceP                    0.04548    0.03233   1.407  0.15948    
## LowEduc                       0.07495    0.06247   1.200  0.23026    
## Unemployedworker             -0.38432    0.31373  -1.225  0.22058    
## Occupation.DomesticServiceP   0.33000    0.47039   0.702  0.48296    
## Income2                       0.08883    0.19861   0.447  0.65467    
## Married                      -0.18642    0.07903  -2.359  0.01833 *  
## IndustryManufactory           0.07362    0.09901   0.744  0.45712    
## IndustryBusiness              0.49258    0.28407   1.734  0.08291 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Threshold coefficients:
##     Estimate Std. Error z value
## 0|1   -5.993      5.007  -1.197
## 1|2   -1.549      4.969  -0.312
## (42 observations deleted due to missingness)

exp(-0.18642)

## [1] 0.829925

exp(0.49258)

## [1] 1.636533
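
The two exponentiated coefficients above are odds ratios. As a rough supplement, approximate 95% Wald intervals can be computed from the estimates and standard errors printed in the summary. This is a sketch that assumes the normal approximation is adequate for a sample of this size. The numbers are copied from the output above rather than extracted from the fitted object.

```r
# Estimates and standard errors copied from summary(modFinal) above
est <- c(Married = -0.18642, IndustryBusiness = 0.49258)
se  <- c(Married =  0.07903, IndustryBusiness = 0.28407)

z <- qnorm(0.975)  # about 1.96 for a 95% interval

# Odds ratios with Wald confidence limits on the odds-ratio scale
or_table <- data.frame(
  odds_ratio = exp(est),
  lower      = exp(est - z * se),
  upper      = exp(est + z * se)
)
print(round(or_table, 3))
```

The interval for Married stays below 1, while the interval for IndustryBusiness spans 1, which agrees with the p-values reported in the summary.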

##Data Visualization based on the regression
new_obs1<-expand.grid(Theaters="Kim Sing Theatre",
                      Year=c("1960","1970","1980","1990","2000"),
                      OtherRaceP=mean(df$OtherRaceP,na.rm = TRUE ),
                      LowEduc=mean(df$LowEduc,na.rm = TRUE),
                      Unemployedworker=mean(df$Unemployedworker,na.rm = TRUE),
                      Occupation.ProfessionalWorkerP=mean(df$Occupation.ProfessionalWorkerP,na.rm = TRUE),
                      Occupation.DomesticServiceP=mean(df$Occupation.DomesticServiceP,na.rm = TRUE),
                      Income2=mean(df$Income2,na.rm = TRUE),
                      Married=seq(0,80,by=20),
                      IndustryManufactory=mean(df$IndustryManufactory,na.rm = TRUE),
                      IndustryBusiness=mean(df$IndustryBusiness,na.rm = TRUE))

predictions1<-predict(modFinal,new_obs1,type="p")
prediction_data1<-cbind(new_obs1,predictions1)
prediction_data1_long <-melt(prediction_data1, id=c("Married","Year","Theaters","OtherRaceP","LowEduc","Unemployedworker","Occupation.ProfessionalWorkerP","Occupation.DomesticServiceP","Income2","IndustryManufactory","IndustryBusiness"))

prediction_data1_long$status<-factor(prediction_data1_long$variable,labels=c("Closing Entire Decade","Opening Several Years", "Opening Entire Decade"))
prediction_data1_long1<-prediction_data1_long[,c("Theaters","Married","value","status","Year")]

ggplot(data=prediction_data1_long1)+
  geom_area(aes(x=Married, y=value, fill=status),
           stat="identity", alpha=0.5)+
  facet_grid(.~Year, labeller = label_both)+
  scale_fill_manual("Theater Opening Status",values=c("pink","blue","navy"))+
  ylab("Predicted Cumulative Proportions")+
  xlab("Proportion of Married Population")+
  theme_clean()

As the model summary shows, after controlling for theater and year, two more variables stand out. The first is the marriage rate (Married, p = 0.018), which is negatively related to opening status: the higher the proportion of married people in an area during a decade, the more likely its theaters are to be closed. This suggests that married people are less likely to go to Chinese movie theaters, which points to a link between marriage and leisure activities. The plot shows that marriage is negatively related to theaters’ operating status in every decade shown, but the predicted probabilities differ across decades. In 2000, theaters were unlikely to be open even where the marriage rate was low, so in the later decades marriage carries less weight in explaining Chinese movie theaters’ status. This raises another question: what else contributed to the closure of Chinese movie theaters?

## Year is on the x-axis and the married proportion on the y-axis
boxplot(Married~Year,data=df, main="Final Data",
   xlab="Year", ylab="Proportion of Married People in the Population")

Another assumption is that the marriage rate relates to immigration: the marriage rate was unusual in 1980 and 1990, the period when a new wave of immigration appeared. The boxplot may also reflect new trends in people’s lifestyles and recreational activities in the 1980s and 1990s, and I am still exploring factors that could account for this phenomenon.

More specifically, the coefficient is on the log-odds scale: a 1-unit increase in the proportion of married people lowers the log-odds of a theater being in a higher opening category by 0.186. Equivalently, the odds of being in a higher opening category are multiplied by exp(-0.186) = 0.83, holding other variables fixed.
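As a rough check on this interpretation, the odds ratio can be paired with an approximate 95% Wald interval computed from the estimate and standard error printed in the model summary. This is only a sketch: the variable names `beta`, `se`, and `ci` are illustrative, and the Wald interval is an assumption rather than the interval the original analysis reported.

```r
# Estimate and standard error for "Married", copied from the model summary.
beta <- -0.18642
se   <- 0.07903

# Wald 95% interval on the log-odds scale, then exponentiated
# to move to the odds-ratio scale.
z <- qnorm(0.975)
odds_ratio <- exp(beta)
ci <- exp(beta + c(-1, 1) * z * se)

round(c(OR = odds_ratio, lower = ci[1], upper = ci[2]), 3)
```

The whole interval lies below 1, which agrees with the negative coefficient being significant at the 5% level.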

Another variable related to opening status is the size of the business industry, which has a positive but only marginally significant relationship with the dependent variable (p = 0.083). The more people work in the business industry, the more likely theaters are to stay open in that area during that period. To be more specific, as the proportion of people in the business industry increases by 1 unit, the odds of a theater being in a higher opening category are multiplied by exp(0.493) = 1.64, holding other variables fixed. This is reasonable, since theaters are more likely to be open in more prosperous areas, and both the downtown area and the suburbs became clusters where Chinese residents achieved financial success. Since Chinatown’s economy centers on hospitality, I also tested whether hospitality strongly correlates with the theaters’ opening status. The two are not highly related, which prompts us to consider businesses in those clusters other than the food and recreation industries.

Ordinal Regression (Financial Health)

modFinal2<-clm(`Financial Health`~Theaters+LowEduc+Unemployedworker+Occupation.DomesticServiceP+Married+IndustryManufactory+OtherRaceP+IndustryBusiness+`Main Address`, data=df)
summary(modFinal2)

## formula: 
## `Financial Health` ~ Theaters + LowEduc + Unemployedworker + Occupation.DomesticServiceP + Married + IndustryManufactory + OtherRaceP + IndustryBusiness + `Main Address`
## data:    df
## 
##  link  threshold nobs logLik AIC    niter max.grad cond.H 
##  logit flexible  105  -62.07 160.14 9(1)  3.83e-13 3.4e+06
## 
## Coefficients:
##                              Estimate Std. Error z value Pr(>|z|)    
## TheatersKim Sing Theatre     0.136626   1.874335   0.073 0.941891    
## TheatersKing Hing Theatre    5.803995   2.035792   2.851 0.004359 ** 
## TheatersKuo Hwa 2 Cinema    -8.105591   1.697566  -4.775 1.80e-06 ***
## TheatersKuo Hwa Theatre      6.069761   1.446116   4.197 2.70e-05 ***
## TheatersMonterey Theatre    -5.098475   1.330559  -3.832 0.000127 ***
## TheatersPagoda Cinema        4.095790   1.900574   2.155 0.031160 *  
## LowEduc                     -0.138456   0.049998  -2.769 0.005619 ** 
## Unemployedworker             0.134936   0.227610   0.593 0.553288    
## Occupation.DomesticServiceP -0.026345   0.216971  -0.121 0.903359    
## Married                      0.001012   0.041547   0.024 0.980572    
## IndustryManufactory         -0.089403   0.038131  -2.345 0.019045 *  
## OtherRaceP                  -0.006342   0.014044  -0.452 0.651567    
## IndustryBusiness            -0.563349   0.129630  -4.346 1.39e-05 ***
## `Main Address`Yes           -0.326672   0.522556  -0.625 0.531877    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Threshold coefficients:
##     Estimate Std. Error z value
## 0|1  -12.619      3.085  -4.091
## 1|2   -6.418      2.384  -2.692
## 2|3   -5.917      2.376  -2.490
## 3|4   -5.390      2.364  -2.280
## (42 observations deleted due to missingness)

exp(-0.56)

## [1] 0.5712091

exp(-0.089)

## [1] 0.9148456

exp(-0.138)

## [1] 0.8710987

As the regression result shows, low education level, the manufacturing industry, and the business industry all have a significant negative association with the theaters’ financial health. Surprisingly, this time the business industry does not contribute to the theaters’ success but is negatively related to it. As the proportion of people in the business industry increases by 1 unit, the odds of a theater being in a better financial health category are multiplied by exp(-0.56) = 0.57, holding other variables fixed. Therefore, the more people work in the business industry, the worse the theaters’ financial health.

The same pattern holds for the manufacturing industry, although the association is weaker: the more people work in manufacturing, the worse the theaters’ financial health. As the proportion of people in the manufacturing industry increases by 1 unit, the odds of a theater being in a better financial health category are multiplied by exp(-0.089) = 0.91, holding other variables fixed. One possible explanation is that a larger manufacturing workforce leaves residents with less spare time to go to the theaters.

The third variable is low education level, which is also negatively related to the theaters’ financial health. As the proportion of people who have not been to high school increases by 1 unit, the odds of a theater being in a better financial health category are multiplied by exp(-0.138) = 0.87, holding other variables fixed. Education level therefore matters for theaters’ financial health: people with more education and higher literacy appear more likely to watch Chinese movies or go to Chinese movie theaters.

1980-2000 Logistic Regression

We then introduce a second regression that uses variables with no values for the 1940s to the 1970s. The regression is therefore based on the 1980s to 2000s dataset, with observations that have missing values deleted. For this regression, I recoded opening status from three levels to two levels that state only whether a theater was open or closed. Because the outcome is now binary, we use binary logistic regression (glm with family = "binomial").

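## Recode the dependent variable: 1 = open for all or part of the decade, 0 = closed all decade
df2<-df
df2$Opening1[df2$Opening==1|df2$Opening==2]<-1
df2$Opening1[df2$Opening==0]<-0
df2$Opening1<-as.factor(df2$Opening1)
## Keep a copy with all columns for the hierarchical model (mod2) fitted later
df3<-df2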

df2<-subset(df2,select=c("Opening1", "Year","Location","Theaters","ChineseP","JapaneseP","AsianIndianP","KoreanP","VietnameseP","AsianBelowPovertyLevelP"))

df2<-na.omit(df2)
modadd<-glm(Opening1~Location+Theaters+JapaneseP+ChineseP+KoreanP+VietnameseP+AsianBelowPovertyLevelP, family="binomial",data=df2)
summary(modadd)

## 
## Call:
## glm(formula = Opening1 ~ Location + Theaters + JapaneseP + ChineseP + 
##     KoreanP + VietnameseP + AsianBelowPovertyLevelP, family = "binomial", 
##     data = df2)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -1.95118  -0.01253   0.00000   0.01601   1.48163  
## 
## Coefficients: (1 not defined because of singularities)
##                            Estimate Std. Error z value Pr(>|z|)  
## (Intercept)               -192.8316  3645.3735  -0.053   0.9578  
## LocationSuburb              -9.0857  3643.3437  -0.002   0.9980  
## TheatersKim Sing Theatre   -36.2082  3643.5350  -0.010   0.9921  
## TheatersKing Hing Theatre  -26.5222  3643.3451  -0.007   0.9942  
## TheatersKuo Hwa 2 Cinema    -2.0398     2.7791  -0.734   0.4630  
## TheatersKuo Hwa Theatre     14.3433     9.2115   1.557   0.1194  
## TheatersMonterey Theatre   -19.3673    11.2743  -1.718   0.0858 .
## TheatersPagoda Cinema            NA         NA      NA       NA  
## JapaneseP                    3.3846     2.0434   1.656   0.0977 .
## ChineseP                     2.1315     1.2232   1.742   0.0814 .
## KoreanP                      0.7267     0.6628   1.096   0.2729  
## VietnameseP                  1.3597     0.8554   1.590   0.1119  
## AsianBelowPovertyLevelP      0.9429     0.5880   1.604   0.1088  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 84.33  on 61  degrees of freedom
## Residual deviance: 22.92  on 50  degrees of freedom
## AIC: 46.92
## 
## Number of Fisher Scoring iterations: 19

exp(2.13)

## [1] 8.414867

exp(3.38)

## [1] 29.37077

levels(df$Theaters)

## [1] "Bard’s Garfield Egyptian Theatre\n" "Kim Sing Theatre"                  
## [3] "King Hing Theatre"                  "Kuo Hwa 2 Cinema"                  
## [5] "Kuo Hwa Theatre"                    "Monterey Theatre"                  
## [7] "Pagoda Cinema"

new_obs2<-expand.grid(Theaters="Pagoda Cinema",
                      Location=c("Downtown","Suburb"),
                      JapaneseP=mean(df2$JapaneseP,na.rm = TRUE ),
                      AsianIndianP=mean(df2$AsianIndianP,na.rm = TRUE),
                      VietnameseP=mean(df2$VietnameseP,na.rm = TRUE),
                      ChineseP=seq(55,68,by=0.01),
                      KoreanP=mean(df2$KoreanP,na.rm = TRUE),
                      AsianBelowPovertyLevelP=mean(df2$AsianBelowPovertyLevelP,na.rm = TRUE))

predictions2<-predict(modadd,new_obs2,type="response")
prediction_data2<-cbind(new_obs2,predictions2)
prediction_data2_long <-melt(prediction_data2, id=c("JapaneseP","ChineseP","AsianIndianP","KoreanP","VietnameseP","AsianBelowPovertyLevelP","Location","Theaters"))

prediction_data2_long$status<-factor(prediction_data2_long$variable, labels = c("Opening"))

ggplot(data=prediction_data2_long)+
  geom_area(aes(x=ChineseP, y=value, fill="Opening"))+
  scale_fill_manual("Theater Status",values = "#C3D7A4")+
  geom_line(aes(x=ChineseP, y=value))+
  facet_grid(Location~., labeller = label_both)+
  ylab("Predicted Probability of Theater Being Open")+
  xlab("Proportion of Chinese in The Total Population")+
  theme_clean()

As this regression result shows, no variable is significant at the 5% level. However, the proportions of Chinese and Japanese residents are marginally significant (p < 0.1) and positively related to theaters’ opening status. As the proportion of Chinese in the population increases by 1 unit, the odds of a theater being open are multiplied by exp(2.13) = 8.41, holding other variables fixed. As the proportion of Japanese in the population increases by 1 unit, the odds of a theater being open are multiplied by exp(3.38) = 29.37, holding other variables fixed. Note that the proportion of Chinese is much higher than that of other Asian groups, so its per-unit effect is statistically smaller than that of the Japanese proportion. Still, it contributes a lot because of its large base population both downtown and in the suburbs.

The graph for one example theater indicates that the more Chinese residents in the area, the more likely the theater is to be open. It also suggests that a growing Chinese population contributes to theaters’ success more readily downtown than in the suburbs. If the Chinese proportion is 60%, a downtown theater has about a 50% likelihood of being open, while a suburban theater is more likely to be closed.
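One caution when reading these predictions: several standard errors in the logistic regression output are in the thousands, and the fit needed 19 Fisher scoring iterations, which may indicate (quasi-)complete separation in this small sample. The sketch below re-enters the printed coefficient table and flags the affected terms. The data frame `coefs` and the cutoff `se_cutoff` are illustrative assumptions, not part of the original analysis.

```r
# Coefficient table re-entered by hand from the summary(modadd) output.
coefs <- data.frame(
  term = c("(Intercept)", "LocationSuburb", "TheatersKim Sing Theatre",
           "TheatersKing Hing Theatre", "TheatersKuo Hwa 2 Cinema",
           "TheatersKuo Hwa Theatre", "TheatersMonterey Theatre",
           "JapaneseP", "ChineseP", "KoreanP", "VietnameseP",
           "AsianBelowPovertyLevelP"),
  estimate = c(-192.8316, -9.0857, -36.2082, -26.5222, -2.0398, 14.3433,
               -19.3673, 3.3846, 2.1315, 0.7267, 1.3597, 0.9429),
  std_error = c(3645.3735, 3643.3437, 3643.5350, 3643.3451, 2.7791, 9.2115,
                11.2743, 2.0434, 1.2232, 0.6628, 0.8554, 0.5880)
)

# Arbitrary cutoff (an assumption): a standard error this large on the
# log-odds scale means the estimate is essentially undetermined.
se_cutoff <- 100
flagged <- coefs$term[coefs$std_error > se_cutoff]
flagged
```

The flagged terms are the intercept, the location effect, and two theater dummies, so the downtown versus suburb contrast in the plot should be read with care.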

Hierarchical Logistic Regression

Finally, we fit an additional hierarchical logistic regression to better model the structure of the dataset, which has several levels: the first level is the theater, and the second level is the location. The lower level represents the smaller unit, and the higher level represents the larger group. Downtown and suburb are the two clusters that stand out on the map of theaters, and each theater can itself be regarded as a small cluster, since I include both its own tract and its adjacent tracts in the dataset. Therefore, we use hierarchical logistic regression to better identify the significant variables.
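To make the two levels concrete: in lme4's formula syntax, a nested term such as `(1|Location/Theaters)` is shorthand for `(1|Location) + (1|Location:Theaters)`, that is, one random intercept per location plus one per theater within its location. The base R sketch below builds those two grouping factors on toy data. The theater names and rows are hypothetical placeholders, not the study data.

```r
# Hypothetical toy data: placeholder theaters nested in two locations.
toy <- data.frame(
  Location = c("Downtown", "Downtown", "Downtown", "Suburb", "Suburb", "Suburb"),
  Theaters = c("Theater A", "Theater A", "Theater B",
               "Theater C", "Theater C", "Theater D")
)

# Higher level: one group (random intercept) per location.
location_groups <- factor(toy$Location)

# Lower level: one group per theater within its location.
theater_groups <- interaction(toy$Location, toy$Theaters, drop = TRUE)

nlevels(location_groups)
nlevels(theater_groups)
```

In the fitted models these two factors appear in the random-effects table as `Location` and `Theaters:Location`.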

library(lme4)

## Loading required package: Matrix

## 
## Attaching package: 'lme4'

## The following objects are masked from 'package:ordinal':
## 
##     ranef, VarCorr

mod2<-glmer(Opening1~(1 + 1|Location/Theaters)+LowEduc+Occupation.ProfessionalWorkerP+Married+IndustryBusiness+Location, family = "binomial",data=df3)
summary(mod2)

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: 
## Opening1 ~ (1 + 1 | Location/Theaters) + LowEduc + Occupation.ProfessionalWorkerP +  
##     Married + IndustryBusiness + Location
##    Data: df3
## 
##      AIC      BIC   logLik deviance df.resid 
##     80.7    101.9    -32.3     64.7       97 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.0367 -0.0532  0.0004  0.1568  5.2921 
## 
## Random effects:
##  Groups            Name        Variance  Std.Dev. 
##  Theaters:Location (Intercept) 1.845e+01 4.2956575
##  Location          (Intercept) 1.307e-08 0.0001143
## Number of obs: 105, groups:  Theaters:Location, 7; Location, 2
## 
## Fixed effects:
##                                Estimate Std. Error z value Pr(>|z|)   
## (Intercept)                    50.84100   18.70888   2.717  0.00658 **
## LowEduc                        -0.04663    0.05657  -0.824  0.40977   
## Occupation.ProfessionalWorkerP -0.73456    0.38063  -1.930  0.05363 . 
## Married                        -0.65591    0.24630  -2.663  0.00774 **
## IndustryBusiness               -1.12843    0.42237  -2.672  0.00755 **
## LocationSuburb                  5.52195    5.68680   0.971  0.33154   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) LowEdc Oc.PWP Marrid IndstB
## LowEduc     -0.241                            
## Occptn.PrWP -0.935  0.068                     
## Married     -0.968  0.054  0.943              
## IndstryBsns -0.926  0.042  0.897  0.933       
## LocatinSbrb  0.542  0.296 -0.713 -0.684 -0.632

exp(-0.65591)

## [1] 0.5189696

exp(-1.12843)

## [1] 0.3235408

new_obs3<-expand.grid(Theaters=c("King Hing Theatre"),
                      Location=c("Downtown"),
                      LowEduc=mean(df$LowEduc,na.rm = TRUE),
                      Married=mean(df$Married,na.rm = TRUE),
                      Occupation.ProfessionalWorkerP=mean(df$Occupation.ProfessionalWorkerP,na.rm = TRUE),
                      IndustryBusiness=seq(2,8,by=1))

predictions3<-predict(mod2,new_obs3,allow.new.levels = TRUE,type="response")
prediction_data3<-cbind(new_obs3,predictions3)
prediction_data3_long <-melt(prediction_data3, id=c("LowEduc","Theaters","Occupation.ProfessionalWorkerP","IndustryBusiness","Married","Location"))



prediction_data3_long$status<-factor(prediction_data3_long$variable, labels = c("Opening"))

  
ggplot(data=prediction_data3_long)+
  geom_area(aes(x=IndustryBusiness, y=value, fill="Opening"))+
  scale_fill_manual("Theater Status",values=c("lightblue"))+
  geom_line(aes(x=IndustryBusiness, y=value))+
  ylab("Predicted Probability of Theater Being Open")+
  xlab("Proportion of People in The Business Industry")+
  theme_classic()

The regression result and graph show, surprisingly, almost the same result as the ordinal regression: the only two significant variables are “Married” and “IndustryBusiness.” However, while the married population is still negatively related to the opening status, the business industry turns from positively to negatively related. We can compare the two models’ AIC to see which fits better. In statistics, AIC is used to compare candidate models and determine which is the best fit for the data; lower AIC values normally indicate a better-fitting model. Because the logistic regression’s AIC (80.7) is lower than the ordinal one (141), we prefer the logistic model. Thus, as the proportion of married people in the population increases by 1 unit, the odds of a theater being open are multiplied by exp(-0.656) = 0.519, holding other variables fixed. As the proportion of people in the business industry increases by 1 unit, the odds of a theater being open are multiplied by exp(-1.128) = 0.324, holding other variables fixed. As I assumed before, such a relationship could reflect immigrants’ new entertainment habits and businesses that potentially compete with the theater industry.
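For reference, AIC is 2k minus twice the log-likelihood, where k is the number of estimated parameters. The sketch below reproduces two of the printed AIC values from their log-likelihoods. The helper `aic_from_loglik` and the parameter counts are my own reading of the output tables, so treat them as assumptions. Note also that AIC values are only directly comparable between models fitted to the same response on the same observations; the ordinal model uses the three-level status and the logistic model the recoded binary status, so the comparison in the text is best read as indicative.

```r
# AIC = 2k - 2*logLik, where k counts all estimated parameters.
aic_from_loglik <- function(loglik, k) 2 * k - 2 * loglik

# Ordinal financial-health model: 14 coefficients + 4 thresholds.
aic_ordinal_fin <- aic_from_loglik(-62.07, k = 14 + 4)

# Hierarchical logistic model: 6 fixed effects + 2 random-intercept variances.
# The printed logLik (-32.3) is rounded, so this only matches 80.7 approximately.
aic_hier <- aic_from_loglik(-32.3, k = 6 + 2)

c(ordinal_financial_health = aic_ordinal_fin, hierarchical_logistic = aic_hier)
```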

1980-2000 Hierarchical Logistic Regression

mod3<-glmer(Opening1~(1|Location/Theaters)+Location+JapaneseP+ChineseP+KoreanP+VietnameseP+AsianBelowPovertyLevelP, family="binomial",data=df2)

## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge with max|grad| = 0.00655929 (tol = 0.002, component 1)

## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : Model is nearly unidentifiable: large eigenvalue ratio
##  - Rescale variables?

summary(mod3)

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: 
## Opening1 ~ (1 | Location/Theaters) + Location + JapaneseP + ChineseP +  
##     KoreanP + VietnameseP + AsianBelowPovertyLevelP
##    Data: df2
## 
##      AIC      BIC   logLik deviance df.resid 
##     68.2     87.3    -25.1     50.2       53 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.3933 -0.2210  0.0095  0.1767  1.4278 
## 
## Random effects:
##  Groups            Name        Variance  Std.Dev.
##  Theaters:Location (Intercept) 1.682e+01 4.101350
##  Location          (Intercept) 7.654e-05 0.008749
## Number of obs: 62, groups:  Theaters:Location, 7; Location, 2
## 
## Fixed effects:
##                         Estimate Std. Error z value Pr(>|z|)  
## (Intercept)             -82.0838    38.9055  -2.110   0.0349 *
## LocationSuburb            4.0305     4.5227   0.891   0.3728  
## JapaneseP                 1.1202     0.5609   1.997   0.0458 *
## ChineseP                  0.8032     0.3810   2.108   0.0350 *
## KoreanP                   0.7744     0.4946   1.566   0.1174  
## VietnameseP               0.6666     0.3275   2.035   0.0418 *
## AsianBelowPovertyLevelP   0.3029     0.1665   1.819   0.0690 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) LctnSb JapnsP ChinsP KorenP VtnmsP
## LocatinSbrb -0.647                                   
## JapaneseP   -0.956  0.596                            
## ChineseP    -0.997  0.605  0.958                     
## KoreanP     -0.502  0.202  0.260  0.507              
## VietnameseP -0.844  0.409  0.751  0.848  0.641       
## AsnBlwPvrLP -0.842  0.605  0.852  0.827  0.217  0.483
## optimizer (Nelder_Mead) convergence code: 0 (OK)
## Model failed to converge with max|grad| = 0.00655929 (tol = 0.002, component 1)
## Model is nearly unidentifiable: large eigenvalue ratio
##  - Rescale variables?

exp(1.1202)

## [1] 3.065467

exp(0.667)

## [1] 1.948383

exp(0.8032)

## [1] 2.232674

new_obs3<-expand.grid(Theaters="Pagoda Cinema",
                      Location=c("Downtown","Suburb"),
                      JapaneseP=mean(df2$JapaneseP),
                      AsianIndianP=mean(df2$AsianIndianP),
                      VietnameseP=seq(0,20,by=0.1),
                      ChineseP=mean(df2$ChineseP),
                      KoreanP=mean(df2$KoreanP),
                      AsianBelowPovertyLevelP=mean(df2$AsianBelowPovertyLevelP))


predictions3<-predict(mod3,new_obs3,allow.new.levels = TRUE,type="response")
prediction_data3<-cbind(new_obs3,predictions3)
prediction_data3_long <-melt(prediction_data3, id=c("JapaneseP","ChineseP","AsianIndianP","KoreanP","VietnameseP","AsianBelowPovertyLevelP","Location","Theaters"))

prediction_data3_long$status<-factor(prediction_data3_long$variable)

  
ggplot(data=prediction_data3_long)+
  geom_area(aes(x=VietnameseP, y=value, fill="Opening"))+
  geom_line(aes(x=VietnameseP, y=value))+
  scale_fill_manual("Theater Status",values=c("#52854C"))+
  facet_grid(Location~., labeller = label_both)+
  ylab("Predicted Probability of Theater Being Open")+
  xlab("Proportion of Vietnamese in The Total Population")+
  theme_clean()

I also ran another regression on the smaller dataset covering 1980 to 2000, to learn more about the variables that are only available for this period. Besides “ChineseP” and “JapaneseP,” whose positive relationship with opening status we already saw in the single-level logistic regression, the new variable “VietnameseP” also shows a significant positive relationship with theaters’ opening status. This is consistent with the arrival of many Vietnamese immigrants in America, especially in suburban areas, from 1980 onward, alongside the well-known wave of Chinese immigration.

More specifically, as the proportion of Chinese in the population increases by 1 unit, the odds of a theater being open are multiplied by exp(0.8032) = 2.23, holding other variables fixed. As the proportion of Japanese or Vietnamese in the population increases by 1 unit, the odds of a theater being open are multiplied by exp(1.1202) = 3.07 or exp(0.667) = 1.95, respectively, holding other variables fixed.

The graph (for one example theater) shows the positive relationship between the Vietnamese population and the opening status of Chinese movie theaters, with no notable difference in this relationship between downtown and the suburbs.

Let’s compare the two models again. This time the AIC for the hierarchical logistic model is 68.2, while the single-level logistic regression’s AIC is 46.92. Although the single-level model’s AIC is lower, AIC is not the only consideration in choosing a model. The single-level model includes location only as a fixed effect alongside the theater dummies, several of which have very large standard errors. The multilevel model instead treats location and theater as nested clusters, which matches how the theaters are distributed, and in it location is not a significant factor. Therefore, it is more reasonable to account for the clusters in our case and use the multilevel logistic regression.
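One way to quantify how little the location cluster contributes is to split the latent-scale variance of the multilevel model into its components. The sketch below uses the random-intercept variances printed in the summary of the 1980-2000 hierarchical model, together with the standard logistic residual variance of pi^2/3. The variable names are illustrative, and because that fit reported convergence warnings, the shares are rough indications only.

```r
# Random-intercept variances copied from the summary(mod3) output.
var_theater  <- 1.682e+01   # Theaters:Location
var_location <- 7.654e-05   # Location

# Latent-scale residual variance of a logistic model.
var_resid <- pi^2 / 3
total <- var_theater + var_location + var_resid

# Share of latent variance attributable to each level.
share_location <- var_location / total
share_theater  <- var_theater / total

round(c(location = share_location, theater_within_location = share_theater), 4)
```

Nearly all of the clustered variation sits at the theater level, while the share attributable to location is close to zero, which is in line with location not being a significant factor.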

Conclusion

Based on the ordinal models and multilevel logistic regressions, we found, surprisingly, that among many demographic factors (education, age, marital status, population, race, income, household value, industry, and occupation), two variables are significantly negatively related to the theaters’ opening status. The first is marriage: the more people are married, the less likely theaters are to be open, which makes us think about how married people allocate their recreational activities and leisure time. After checking the marriage rate in different decades, we found that it was lower in the 1980s and the 1990s than in other decades, which makes us consider the influence of immigration policies and people’s changing lifestyles. As for the business industry, the result differs across models. In the ordinal model it is positively related to the opening status, while in the logistic model it is negatively related. Because the business industry is also negatively associated with the theaters’ financial health, and the AIC for the logistic model is lower than that of the ordinal model, it is more reasonable to assume that the business industry harms the theaters’ success. The more people join the business industry, or the more an area focuses on business, the less likely the theaters are to be open or financially successful.

A study shows that when the local labor market becomes tighter, the minority labor force, which normally concentrates at the lower level of the job hierarchy, is most likely to be squeezed out of employment. This could partially explain the decline of the Chinese movie theaters in the 1990s and the negative relationship between business industry prosperity and theaters’ success. The movie industry in Chinatown is mainly operated by self-employed minorities and concentrates on low-skill sectors with easy entry, so it can be regarded as low in the job hierarchy and is easily pushed out when competing with businesses operated by whites or positioned higher in that hierarchy. Therefore, Chinese movie theaters quickly decreased in number as other businesses developed in the 1990s. This is consistent with another finding: the manufacturing industry and low education level are also negatively related to the theaters’ financial health. It seems that people who go to Chinese movie theaters have higher education and literacy levels, which makes us consider the target audiences of Chinese movie theaters and their backgrounds.

Although we did not find a direct relationship between the Asian population and the theaters’ opening status, or confirm our assumption that immigration is a significant factor in the theaters’ success (AsianP is not significantly related to the dependent variable), we have shown that the Chinese, Japanese, and Vietnamese populations are positively related to theaters’ opening status. These theaters’ target audiences are Asian, and most of the theaters are operated by Chinese. Many Chinese and Vietnamese immigrants arrived between the 1970s and the 2000s, which likely contributed greatly to theaters’ financial success and kept them open for a long time. We cannot test this relationship for the whole study period because we lack detailed information about Asian subgroups from the 1940s to the 2000s, but we can still assume they are possibly related. Another interesting finding is that location is not an essential factor in the opening status of theaters: whether a theater is downtown or in the suburbs does not greatly affect its operating condition.

Therefore, the business industry, Asian population (Japanese, Chinese, Vietnamese), married population, education level, and manufacturing industry are all important factors that could affect the opening status of theaters and their financial health, whereas location is not a significant factor in theaters’ survival. Our findings could be more accurate if we included more theaters in LA and in other counties or states, to obtain a larger sample and identify general factors that influence the financial health of Chinese-language movie theaters. More research on immigration policy and theaters’ histories is needed to provide more concrete evidence for these relationships. As for the choice of model, I tried ordinal, multinomial, and hierarchical logistic regression. The hierarchical logistic regression is likely the most accurate, since it is cluster-based and allows several levels of groups, and controlling for those groups helps us better identify the possible factors. The post-estimation analysis could go deeper, and I am interested in exploring more findings after the regression.

Acknowledgement:

I would like to thank Prof. Dombrowski for her continued guidance and support and Sophie Gilbert for her help. I would also like to thank Prof. Kaparakis, Prof. Nazarro, Prof. Kabacoff, Prof. Rose, Prof. Gooyabadi, Prof. Oleinikov, and the QAC.

References

Acs, Z.J. 2007. Entrepreneurship, economic growth and public policy. Small Business Economics 28: 109–122.

Allen, J. P. and Turner, E. (1997). The Ethnic Quilt. Northridge, California: The Center for Geographical Studies, California State University, Northridge.

Social Explorer. Los Angeles. From https://www.socialexplorer.com/explore-tables

1950 Census of Population. From https://www2.census.gov/library/publications/decennial/1950/population-volume-1/vol-01-01.pdf

Resources:

Data Source: https://www.socialexplorer.com/explore-tables

1950 Census: https://www2.census.gov/library/publications/decennial/1950/population-volume-3/41557421v3p2ch07.pdf

Census Tract Geocoder: https://geocoding.geo.census.gov/geocoder/geographies/address?form

Articles:

CHINESE IMMIGRATION AND ITS IMPLICATIONS ON URBAN MANAGEMENT IN LOS ANGELES: https://www.jstor.org/stable/pdf/24872617.pdf

1990s: The Golden Decade : CHINATOWN LOS ANGELES : Revitalized Community Rises From Shock Waves of Change: https://www.latimes.com/archives/la-xpm-1990-01-15-ss-97-story.html