Blog | NAD Digital Evangelism | Columbia, MD

Discovering Mission-Oriented Patterns with Open Data in New York City

1/5/2017

Harvey Alférez, Ph.D

Data Scientist, School of Engineering and Technology, Montemorelos University, Mexico
www.harveyalferez.com

Traffic in NYC as captured by my wife Doris’ camera lens

There is a wealth of open data on the Web. Seventh-day Adventists can use this data freely to find ways to help the people who live in our cities. This post describes how the students in my Pattern Recognition course at Montemorelos University and I have used open data and machine learning, a key component of data science, to discover interesting mission-oriented patterns for the church in NYC.

In my courses, I mostly focus on analyzing open data from NYC for two reasons: 1) NYC has pivotal significance in our church’s ongoing Mission to the Cities project; and 2) NYC provides a portal that makes the wealth of public data generated by various NYC agencies and other city organizations available for public use [1].

Although the number of traffic deaths in NYC has fallen [2], city officials and traffic-safety groups agree that more aggressive steps must be taken to reach Mayor Bill de Blasio’s goal of eliminating traffic deaths in the city [3]. With this problem in mind, we analyzed a dataset of motor vehicle collisions in NYC that is freely provided by the New York City Police Department (NYPD) [4].

The dataset we studied was created in 2014 and updated in 2016. It records motor vehicle collisions in the Bronx, Brooklyn, Manhattan, Queens, and Staten Island from 2014 to 2016. This is a large dataset, with 932,904 registered incidents! Moreover, each registered incident is described by 30 variables.

With traditional queries and spreadsheet analysis, it is quite difficult (and sometimes impossible) to uncover hidden patterns in large quantities of data in a timely way, as in our case study. In such cases, machine learning, which “gives computers the ability to learn without being explicitly programmed” [5], can help us grasp patterns we did not even know existed.

From the set of 30 variables, we chose a subset for the experiments. First, we chose the Date and Time variables because we wanted to know the day and time of each traffic incident. The Zip Code, Borough, Longitude, and Latitude variables were chosen because we wanted to know where the accidents occurred. We were also interested in identifying the groups of road users who were injured the most, so we included the Injured Persons, Injured Pedestrians, Injured Motorists, and Injured Cyclists variables. Last but not least, we wanted to determine what caused each accident and the type of vehicle involved, so we chose the Contributing Factor Vehicle 1 and Vehicle Type Code 1 variables.
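For readers who want to try this selection step outside Weka, here is a minimal Python sketch, assuming the collisions data has been exported as CSV text. The upper-case header spellings (such as `CONTRIBUTING FACTOR VEHICLE 1`) are an assumption based on the NYPD export and may differ between versions of the file, the MM/DD/YYYY date format is likewise assumed, and the derived `DAY` column and the `select_columns` name are hypothetical additions for examining weekday patterns.

```python
import csv
import io
from datetime import datetime

# Columns named in the post. The exact header spellings are an assumption
# based on the NYPD export; check them against the file you download.
SELECTED = [
    "DATE", "TIME", "BOROUGH", "ZIP CODE", "LATITUDE", "LONGITUDE",
    "NUMBER OF PERSONS INJURED", "NUMBER OF PEDESTRIANS INJURED",
    "NUMBER OF MOTORIST INJURED", "NUMBER OF CYCLIST INJURED",
    "CONTRIBUTING FACTOR VEHICLE 1", "VEHICLE TYPE CODE 1",
]

# Fixed English day names, so the result does not depend on the system locale.
DAYS = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")


def select_columns(csv_text):
    """Keep only the selected columns and add a hypothetical derived DAY column."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        # Columns absent from this export become empty strings instead of errors.
        row = {name: record.get(name) or "" for name in SELECTED}
        # Assumes dates are written as MM/DD/YYYY.
        date = datetime.strptime(row["DATE"], "%m/%d/%Y")
        row["DAY"] = DAYS[date.weekday()]
        rows.append(row)
    return rows
```

Because missing columns come back as empty strings rather than raising an error, it is worth checking the headers of the real file before running a large job.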

To analyze the data, we used Weka, a powerful tool for machine learning [6]. Although Weka contains a large range of machine learning algorithms, for our exploration we used the K-Means algorithm: the input data is unlabeled (there is no predefined category to predict), so we needed an unsupervised clustering method.
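To make the idea behind K-Means concrete, the sketch below implements the standard algorithm (Lloyd's iterations) in plain Python on numeric features such as latitude and longitude. It is only an illustration, not the Weka implementation the team used: Weka's SimpleKMeans can also handle nominal attributes such as borough or vehicle type, which this numeric-only sketch does not, and the function name and parameters are hypothetical.

```python
import math
import random


def kmeans(points, k, max_iterations=100, seed=0):
    """Cluster numeric points with Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialise the centroids with k input points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: attach every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(axis) / len(members) for axis in zip(*members)]
    return centroids, labels
```

Because K-Means relies on distances, features measured on very different scales (for example, coordinates versus injury counts) should normally be normalized first, and the number of clusters k has to be chosen by the analyst.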

Our findings are as follows:

On Fridays, near Prospect Park in Brooklyn, the dataset registers around 77,000 pedicab accidents between 2014 and 2016. This finding can help Seventh-day Adventist congregations in the area (see the map below) raise awareness in the community to prevent these kinds of accidents. For example, Pathfinders could go to the park on a field day and ride with banners as flags on their bicycles, or wear shirts with images and messages saying “Be aware of your surroundings. Be swift on the brakes.” Information booths could also display pamphlets offering cycling safety tips to avoid collisions, as well as advice on what to do if people are involved in a traffic accident. A mini clinic nearby to treat minor injuries would also come in handy.
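One possible way to read a pattern like this from a clustering result is to describe each cluster by its size and the most frequent value of each nominal variable, much as a K-Means centroid summarizes its members. The sketch below assumes each incident is a dictionary keyed by column name and that a list of cluster labels is already available from K-Means; it is an illustration, not necessarily how the team arrived at this finding, and the `describe_clusters` name is hypothetical.

```python
from collections import Counter


def describe_clusters(rows, labels, fields):
    """For each cluster label, report its size and the most common value of each field."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    summary = {}
    for label, members in sorted(groups.items()):
        # most_common(1) returns [(value, count)]; keep only the value (the mode).
        modes = {f: Counter(m[f] for m in members).most_common(1)[0][0] for f in fields}
        summary[label] = {"size": len(members), **modes}
    return summary
```

A cluster whose modes are, say, a Friday and a pedicab would then point the analyst toward a pattern worth checking against the raw counts.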

Alcohol has been one of the biggest contributors to accidents. Moreover, accidents involving bicycles have caused the highest number of deaths and injuries. Church members could look for innovative ways to inform the general population about the dangers of drinking and driving, and about cycling safety.
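A ranking like this can be checked with a simple aggregation. The minimal sketch below sums persons injured and killed for each value of a chosen column, such as the contributing factor or the vehicle type. The `NUMBER OF PERSONS INJURED` and `NUMBER OF PERSONS KILLED` header names are assumed from the NYPD export, blank values are grouped under a placeholder label, and the `casualties_by` name is hypothetical.

```python
from collections import defaultdict


def casualties_by(rows, key):
    """Sum persons injured and killed per value of `key`, highest total first."""
    totals = defaultdict(int)
    for row in rows:
        # Blank or missing counts are treated as zero.
        injured = int(row.get("NUMBER OF PERSONS INJURED") or 0)
        killed = int(row.get("NUMBER OF PERSONS KILLED") or 0)
        totals[row.get(key) or "(blank)"] += injured + killed
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Calling it once with the contributing-factor column and once with the vehicle-type column gives the two rankings discussed above.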

On Thursdays, Fridays, and Saturdays, drivers tend to drive aggressively, which increases the number of accidents on those days. One solution the church could offer is a social media campaign on stress management at the end of the week.
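Whether a contributing factor concentrates at the end of the week can be checked by counting incidents per weekday. This sketch assumes each row already carries an English weekday name under a `DAY` key and uses `Aggressive Driving/Road Rage` as the factor label in its example; both are assumptions about how the data was prepared, and the `factor_share_by_day` name is hypothetical. It reports each day's share as well as the raw count, because busier days show more accidents of every kind.

```python
from collections import Counter

DAYS = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")


def factor_share_by_day(rows, factor, column="CONTRIBUTING FACTOR VEHICLE 1"):
    """Return (day, matching incidents, share of that day's incidents) for each weekday."""
    totals = Counter(row["DAY"] for row in rows)
    matches = Counter(row["DAY"] for row in rows if row[column] == factor)
    return [
        # Days with no incidents at all get a share of 0.0 instead of dividing by zero.
        (day, matches[day], matches[day] / totals[day] if totals[day] else 0.0)
        for day in DAYS
    ]
```

A share that rises from Thursday through Saturday would support the end-of-week pattern more convincingly than raw counts alone.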

Let us use the knowledge that open data offers us to make a difference in the cities. As the results above show, big problems can have simple, implementable solutions through which church members could make an extraordinary difference in their communities. Although the large dataset in our case study could have been analyzed manually, doing so would have taken weeks or even months. In our case, the process took just a few days and considerably less human analysis (computers did the hard work).

I thank the students in my Pattern Recognition course, Anthony, Claudia, Carlos, Isaías, Jairo, Eduard, Marco, Jaziel, and Carlos, for their intense work on the experiments.

References:

1. The City of New York, “NYC Open Data,” (n.d.), https://data.cityofnewyork.us.

2. E. G. Fitzsimmons, “Number of Traffic Deaths in New York Falls for a Second Straight Year,” (2016), http://www.nytimes.com/2016/01/02/nyregion/number-of-traffic-deaths-in-new-york-falls-for-a-second-year-in-a-row.html.

3. M. Flegenheimer, “De Blasio Outlines Steps to Eliminate Traffic Deaths,” (2014), https://www.nytimes.com/2014/02/19/nyregion/de-blasio-unveils-plans-to-eliminate-traffic-deaths.html.

4. NYPD, “NYPD Motor Vehicle Collisions,” (2014), https://data.cityofnewyork.us/Public-Safety/NYPD-Motor-Vehicle-Collisions/h9gi-nx95.

5. P. Simon, Too Big to Ignore: The Business Case for Big Data (Hoboken, NJ: Wiley, 2013).

6. The University of Waikato, “Weka 3: Data Mining Software in Java,” (n.d.), http://www.cs.waikato.ac.nz/ml/weka/.

Angie

1/6/2017 08:53:40 pm

Well done! This is a great example of how the Church can respond better to the needs of their surrounding societies.
An infographic showing the results of some of those data analyses would give the reader more insight into both the complexity and the power of your method.
Keep up the good work!

Jamie Schneider

1/7/2017 09:44:57 pm

Thanks! Good suggestion.

Harvey

1/19/2017 08:49:00 am

Thank you for your suggestion, Angie! Let's keep in touch. God bless.
