[vc_row row_type=”3″ blox_image=”20714″ align_center=”aligncenter” page_title=”page-title-x” blox_dark=”true” parallax_speed=”8″][vc_column]
[vc_column_text]
Extraction and analysis of non-profit organization data
using big data engineering and data collection tools
[/vc_column_text]
[/vc_column][/vc_row][vc_row][vc_column width=”1/4″]
About The Client
[/vc_column][vc_column width=”1/2″]
[vc_column_text]
The project was implemented for a startup based in the US.
[/vc_column_text]
[/vc_column][vc_column width=”1/4″][/vc_column][/vc_row][vc_row][vc_column width=”1/3″][vc_single_image image=”21718″ img_size=”550×350″][/vc_column][vc_column width=”1/3″][vc_single_image image=”21368″ img_size=”550×350″][/vc_column][vc_column width=”1/3″][vc_single_image image=”21333″ img_size=”550×350″][/vc_column][/vc_row][vc_row][vc_column]
[vc_column_text]
About this project
[/vc_column_text][vc_row_inner][vc_column_inner width=”1/2″][vc_column_text]
Project Description:
There are thousands of non-profit organizations in the USA. Although these organizations are tax-exempt, twice per year they fill out a number of tax-related information forms. The forms are produced and provided by US state agencies and are open to the public. The completed forms are digitized and stored in an Amazon S3 bucket, so they are publicly available. The project parses the XML versions of these forms and gathers data from them. Our pipeline focuses on collecting useful information about the staff of these non-profit organizations, such as salary, working hours per week, titles, emails, and phone numbers. The pipeline can also filter the data at the organization level to extract information about each non-profit organization, such as the number of employees, their contact information, and the organization's revenue. The final aim of the project is to build a talent pool of the staff members of these non-profit organizations and provide this data to interested parties.
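
As an illustration of the parsing step described above, here is a minimal sketch in Python using only the standard library. The XML layout, the element names (`Filing`, `Organization`, `Staff`, `Person`, and so on), and the `parse_filing` helper are hypothetical and chosen for readability; the real forms use a different and much richer schema, and the project's actual code is not shown here. The sketch extracts one record per staff member and one summary record per organization.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified filing used only for illustration.
# The real forms follow a different, richer schema.
SAMPLE_XML = """
<Filing>
  <Organization>
    <Name>Example Charity</Name>
    <Revenue>250000</Revenue>
  </Organization>
  <Staff>
    <Person>
      <Name>Jane Doe</Name>
      <Title>Executive Director</Title>
      <HoursPerWeek>40</HoursPerWeek>
      <Salary>90000</Salary>
    </Person>
    <Person>
      <Name>John Roe</Name>
      <Title>Treasurer</Title>
      <HoursPerWeek>10</HoursPerWeek>
      <Salary>15000</Salary>
    </Person>
  </Staff>
</Filing>
"""


def parse_filing(xml_text):
    """Return (organization record, list of staff records) for one filing."""
    root = ET.fromstring(xml_text.strip())
    org_name = root.findtext("Organization/Name", default="").strip()

    # Staff-level records: one dictionary per person listed in the filing.
    staff = []
    for person in root.iterfind("Staff/Person"):
        staff.append({
            "organization": org_name,
            "name": person.findtext("Name", default="").strip(),
            "title": person.findtext("Title", default="").strip(),
            "hours_per_week": float(person.findtext("HoursPerWeek", default="0")),
            "salary": float(person.findtext("Salary", default="0")),
        })

    # Organization-level record: summary fields for the whole filing.
    organization = {
        "name": org_name,
        "revenue": float(root.findtext("Organization/Revenue", default="0")),
        "staff_count": len(staff),
    }
    return organization, staff


organization, staff = parse_filing(SAMPLE_XML)
print(organization)
for member in staff:
    print(member)
```

In a fuller pipeline, records like these could be collected into a Pandas DataFrame and loaded into Postgres, which matches the technologies listed for the project; that part is left out of this sketch.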
[/vc_column_text][/vc_column_inner][vc_column_inner width=”1/2″][vc_column_text]
Technologies:
- AWS, Python, Scrapy, Pandas, Postgres.
[/vc_column_text][/vc_column_inner][/vc_row_inner]
[/vc_column][/vc_row]