Simple ETL process using Java/Spark
$30-250 USD
Paid on delivery
Write an ETL process using Java, Spark, and HDFS.
Copy the input file to HDFS
Read the input file from HDFS using Java & Spark
Perform the following function on the dataset:
Average_Calculation()
For each stock and each month, calculate the average trading volume and the average trading price.
That is, for each stock and each month, calculate the average of VOLUME and the average of the stock's CLOSE_PRICE.
STOCK, AVG_VOLUME, AVG_CLOSING_PRICE, MONTH, YEAR
AAPL, 4343434 , 85, JULY, 2007
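The grouping and averaging described above can be sketched in plain Java, without Spark, to pin down the expected result before moving it to a cluster. This is only an illustration under assumptions: the class names `AverageCalculation`, `Trade`, and `Totals` are hypothetical, the sample trades in `main` are made up, and the two-decimal output formatting is a choice the posting does not specify. In a Spark job the same logic would typically be expressed as a group-by on stock, year, and month with average aggregations.

```java
import java.time.LocalDate;
import java.time.Month;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class AverageCalculation {

    /** One parsed input row, reduced to the fields the averages need (hypothetical type). */
    static final class Trade {
        final String stock;
        final double closePrice;
        final long volume;
        final LocalDate date;

        Trade(String stock, double closePrice, long volume, LocalDate date) {
            this.stock = stock;
            this.closePrice = closePrice;
            this.volume = volume;
            this.date = date;
        }
    }

    /** Running sums for one (stock, year, month) group. */
    static final class Totals {
        double volumeSum;
        double closeSum;
        long count;
    }

    /** Returns rows shaped like: STOCK, AVG_VOLUME, AVG_CLOSING_PRICE, MONTH, YEAR. */
    static List<String> averageCalculation(List<Trade> trades) {
        // TreeMap keeps the output sorted by stock, then year, then month.
        // ASSUMPTION: stock symbols never contain the '|' separator.
        Map<String, Totals> groups = new TreeMap<>();
        for (Trade t : trades) {
            String key = String.format(Locale.ROOT, "%s|%04d|%02d",
                    t.stock, t.date.getYear(), t.date.getMonthValue());
            Totals totals = groups.computeIfAbsent(key, k -> new Totals());
            totals.volumeSum += t.volume;
            totals.closeSum += t.closePrice;
            totals.count++;
        }

        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, Totals> entry : groups.entrySet()) {
            String[] key = entry.getKey().split("\\|");
            Totals totals = entry.getValue();
            // ASSUMPTION: two decimal places; the posting does not state a precision.
            rows.add(String.format(Locale.ROOT, "%s, %.2f, %.2f, %s, %s",
                    key[0],
                    totals.volumeSum / totals.count,
                    totals.closeSum / totals.count,
                    Month.of(Integer.parseInt(key[2])), // e.g. JULY
                    key[1]));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up sample trades, for illustration only.
        List<Trade> trades = Arrays.asList(
                new Trade("AAPL", 80.0, 4000000L, LocalDate.of(2007, 7, 2)),
                new Trade("AAPL", 90.0, 5000000L, LocalDate.of(2007, 7, 3)),
                new Trade("AAPL", 100.0, 1000000L, LocalDate.of(2007, 8, 1)));
        averageCalculation(trades).forEach(System.out::println);
    }
}
```

With the made-up trades in `main`, the two July rows collapse into a single JULY 2007 row and the August trade forms its own group.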
Write the output file back to HDFS
Run the ETL process using Spark in both cluster mode and client mode
Document all errors encountered and how each was resolved
------------------------------------------------------------------------------------------------------------------------------
Source input file:
STOCK, ASK_PRICE, BID_PRICE, OPEN_PRICE, CLOSE_PRICE, VOLUME, DATE
AAPL, 100.01, 100.02, 99.5, 99.7, 343434000, 12/7/2001
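Parsing one line of this input format could look like the sketch below. The names `StockRecordParser` and `StockRecord` are hypothetical. The date pattern is also an assumption: the posting does not say whether 12/7/2001 means month/day/year or day/month/year, and this sketch assumes month/day/year. If the file is day-first, the pattern would need to be `d/M/yyyy` instead.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

class StockRecordParser {

    // ASSUMPTION: dates are month/day/year; the posting does not specify the order.
    static final DateTimeFormatter DATE_FORMAT = DateTimeFormatter.ofPattern("M/d/yyyy");

    /** One row of the source file, with the columns in the order the header lists them. */
    static final class StockRecord {
        final String stock;
        final double askPrice;
        final double bidPrice;
        final double openPrice;
        final double closePrice;
        final long volume;
        final LocalDate date;

        StockRecord(String stock, double askPrice, double bidPrice, double openPrice,
                    double closePrice, long volume, LocalDate date) {
            this.stock = stock;
            this.askPrice = askPrice;
            this.bidPrice = bidPrice;
            this.openPrice = openPrice;
            this.closePrice = closePrice;
            this.volume = volume;
            this.date = date;
        }
    }

    /** Splits a data line on commas and trims the padding around each field. */
    static StockRecord parse(String line) {
        String[] fields = line.split(",");
        if (fields.length != 7) {
            throw new IllegalArgumentException(
                    "Expected 7 fields but got " + fields.length + ": " + line);
        }
        return new StockRecord(
                fields[0].trim(),
                Double.parseDouble(fields[1].trim()),
                Double.parseDouble(fields[2].trim()),
                Double.parseDouble(fields[3].trim()),
                Double.parseDouble(fields[4].trim()),
                Long.parseLong(fields[5].trim()),
                LocalDate.parse(fields[6].trim(), DATE_FORMAT));
    }

    public static void main(String[] args) {
        // The sample row from the posting.
        StockRecord r = parse("AAPL, 100.01, 100.02, 99.5, 99.7, 343434000, 12/7/2001");
        System.out.println(r.stock + " " + r.closePrice + " " + r.volume + " " + r.date);
    }
}
```

Trimming each field matters because the sample row pads values with spaces after the commas. The header line would need to be skipped before calling `parse`.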
Destination file:
STOCK, AVG_VOLUME, AVG_CLOSING_PRICE, MONTH, YEAR
AAPL, 4343434 , 85, JULY, 2007
Project ID: #16258187
About the project
10 freelancers are bidding an average of $153 for this job
Hi, I have expertise in Spark, Scala, Java, and Hadoop. I have written production scripts and a Scala job that processes HDFS data and writes back to HDFS. I have read JSON, XML, CSV, tab, Avro, Parquet, and ORC file formats. I have read Hive...
Hi, I am Amit. I have experience in Spark and Java. I can write the code as per the requirements you have given. Please share the input file for testing. I can provide you with documentation as well. Looking forward...
Hi, I am a professional Big Data Consultant with over 5 years of experience. I have read your request and am interested in working for you, as I am an expert in Spark with Scala and HDFS and can write a Spark script for this pr...
I have briefly read the description on Java development, and I can deliver as per the requirements.
I am interested in working on this project as I have relevant experience in Big Data, Sqoop, Hadoop, Spark, Hive, Kafka, Spark Streaming, RDD, DataFrame, Dataset, Python, Scala, and Java. I am well versed in Installation an...