1. Set up Application Default Credentials (ADC) in your local environment, following the steps from this [page](https://cloud.google.com/bigquery/docs/authentication#client-libs):
* Install the Google Cloud CLI, then initialize it by running the following command:
`gcloud init`
* Create local authentication credentials for your Google Account:
`gcloud auth application-default login`
* A login screen is displayed. After you log in, your credentials are stored in the local credential file used by ADC.
For more information about working with ADC in a local environment, see [Local development environment](https://cloud.google.com/docs/authentication/provide-credentials-adc#local-dev). A quick sketch for verifying that ADC works appears after the setup steps below.
1. Run `./gradlew build`
You should see: `BUILD SUCCESSFUL`
1. The project should now be ready.
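Before running the project, it may help to confirm that ADC is picked up correctly. Below is a minimal verification sketch (not part of the project's code; it assumes the `google-auth-library-oauth2-http` library, which the BigQuery client pulls in transitively, is on the classpath):

```java
import com.google.auth.oauth2.GoogleCredentials;

// Hypothetical helper class, for illustration only.
public class AdcCheck {
    public static void main(String[] args) throws Exception {
        // Picks up the credentials created by `gcloud auth application-default login`.
        // Throws an IOException if no Application Default Credentials are found.
        GoogleCredentials credentials = GoogleCredentials.getApplicationDefault();
        System.out.println("ADC found: " + credentials.getClass().getSimpleName());
    }
}
```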
# Usage
The project can be run with: `./gradlew run`
This runs it with the default configuration.
To change the configuration, use the following command-line parameters:
1. **files** - number of files to fetch from GitHub
1. **batch** - number of files in one batch
1. **mode**
   * mode=1 - use example files (for testing purposes)
   * mode=2 - fetch files from GitHub
   * mode=3 - delete the local DB with queries (for testing purposes)
1. **offset** - number of already processed files (files from GitHub are ordered by repository name), so processed files won't be fetched again
1. **sample**
   * sample=1 - use the sample GitHub collections (for testing purposes)
   * sample=2 - use the original GitHub collection
Parameters can be used like this:
* `./gradlew run -Dfiles=100 -Dbatch=50 -Dmode=2 -Doffset=200 -Dsample=2`
* This command will fetch **100 files** in two **50-file batches** from **GitHub** (mode=2), starting at **offset 200** (so files 200 to 300 will be processed), from the **original GitHub collection** (sample=2)
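These `-D` flags are standard JVM system properties. A sketch of how the application could read them follows; the default values shown are illustrative assumptions, not the project's actual defaults:

```java
// Hypothetical sketch: reading the run parameters as JVM system properties.
public class RunConfig {
    public static void main(String[] args) {
        // Defaults below are illustrative assumptions.
        int files  = Integer.parseInt(System.getProperty("files",  "100"));
        int batch  = Integer.parseInt(System.getProperty("batch",  "50"));
        int mode   = Integer.parseInt(System.getProperty("mode",   "1"));
        int offset = Integer.parseInt(System.getProperty("offset", "0"));
        int sample = Integer.parseInt(System.getProperty("sample", "1"));
        System.out.printf("files=%d batch=%d mode=%d offset=%d sample=%d%n",
                files, batch, mode, offset, sample);
    }
}
```

Note that for `./gradlew run` to pass `-D` flags through to the application JVM, the `run` task must forward them (for example, `run { systemProperties System.getProperties() }` in `build.gradle`), which this project presumably already does.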
In the testing process, I found that the most efficient **batch** size is **100**. With this configuration, it takes around **5 minutes** to fetch one batch (100 files) from GitHub. From this we can estimate that **10000 files** will take around **8 hours** (10000 / 100 = 100 batches × 5 minutes ≈ 500 minutes).
Here are some test runs which support the estimates above: