Cloud-based Data Infrastructure and ETL

Client Challenges

  • Lack of data engineering and BI expertise within the Client's core team
  • Tight timelines (the new data infrastructure and the ETL populating the BI-ready data had to be part of the upcoming release of the new ERP platform)
  • Operational teams in the country markets and the central management team in the EU needed the same view of business performance metrics and KPIs

Services

  • Building the entire BI infrastructure in the cloud from scratch (data sources: relational databases of the Client's online e-commerce microservices; various third-party Web APIs, including Exponea, LeadSquared, Google Analytics, Mixpanel, K2, logzio, prism.io and others)
  • Enabling the Client to start building award-winning ML solutions on top of reliable, clean master data
  • Helping the Client to start its own BI reporting team capable of using Power BI as a reporting tool
  • Training the Client's personnel (BI analysts, product managers, software developers, data analysts) to use their DWH and BI infrastructure
  • Extensive use of the GCP stack (namely BigQuery, Dataflow, Dataproc, Composer, GCS and Cloud Functions), together with Metabase and Stitch
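
For illustration only, the kind of transformation step such an ETL performs can be sketched in plain Python. This is not the project's actual pipeline: the record layout, the field names (country, sold_on, price_eur) and the functions clean and daily_sales_kpi are hypothetical, and a production version would more likely run as PySpark or SQL on the GCP services listed above. The sketch normalizes rows from two differently formatted sources and aggregates them into one KPI table, so that country teams and central management would read the same numbers.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw rows, as they might arrive from a relational
# database and from a third-party Web API (assumed field names).
RAW_ORDERS = [
    {"source": "orders_db", "country": "cz", "sold_on": "2021-03-01", "price_eur": "12500.00"},
    {"source": "orders_db", "country": "CZ ", "sold_on": "2021-03-01", "price_eur": "9800.50"},
    {"source": "web_api", "country": "pl", "sold_on": "2021-03-01", "price_eur": None},
    {"source": "web_api", "country": "PL", "sold_on": "2021-03-02", "price_eur": "15000"},
]


def clean(row):
    """Normalize one raw row; return None if the price is missing."""
    if row.get("price_eur") is None:
        return None
    return {
        "country": row["country"].strip().upper(),      # "cz", "CZ " -> "CZ"
        "sold_on": date.fromisoformat(row["sold_on"]),  # string -> date
        "price_eur": float(row["price_eur"]),           # string -> number
    }


def daily_sales_kpi(raw_rows):
    """Aggregate cleaned rows into cars sold and revenue per country and day."""
    totals = defaultdict(lambda: {"cars_sold": 0, "revenue_eur": 0.0})
    for raw in raw_rows:
        row = clean(raw)
        if row is None:
            continue  # skip unusable rows instead of distorting the KPI
        key = (row["country"], row["sold_on"])
        totals[key]["cars_sold"] += 1
        totals[key]["revenue_eur"] += row["price_eur"]
    return dict(totals)


if __name__ == "__main__":
    for (country, day), kpi in sorted(daily_sales_kpi(RAW_ORDERS).items()):
        print(country, day.isoformat(), kpi["cars_sold"], kpi["revenue_eur"])
```

Keeping the cleaning rules in a single function means every report built on the resulting table inherits the same definitions, which is the point of a shared, BI-ready data layer.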

Client: E-commerce platform operator (used car sales, operated in 10 countries)

Segment: Big Data, Data Warehouse, ETL

User base: Hundreds of users in different country offices of the Client

Technology stack: GCP, Python, PySpark, SQL

Team structure: 1 Data Architect, 1 Sr Data Engineer, 1 Jr Data Engineer