Commit f488b82a authored by samuelvasecka
# Project set up
1. Clone this repo
1. Set up authentication by following the steps on this [page](https://cloud.google.com/bigquery/docs/authentication#client-libs), which sets up Application Default Credentials (ADC) in your local environment:
    * Install the Google Cloud CLI, then initialize it by running the following command:
    `gcloud init`
    * Create local authentication credentials for your Google Account:
    `gcloud auth application-default login`
    * A login screen is displayed. After you log in, your credentials are stored in the local credential file used by ADC.

    For more information about working with ADC in a local environment, see [Local development environment](https://cloud.google.com/docs/authentication/provide-credentials-adc#local-dev).
1. Run `./gradlew build`. You should see `BUILD SUCCESSFUL`.
1. The project is now ready.
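If the build later fails to authenticate, it can help to check where ADC expects the credentials file created by `gcloud auth application-default login`. The Java sketch below covers only the two file-based sources: the `GOOGLE_APPLICATION_CREDENTIALS` environment variable takes precedence, otherwise ADC uses `~/.config/gcloud/application_default_credentials.json` on Linux/macOS or `%APPDATA%\gcloud\application_default_credentials.json` on Windows. The class `AdcLocator` is a hypothetical helper, not part of this project or of the Google client libraries.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: resolves the file ADC would read on a developer machine.
// The metadata-server fallback used on Google Cloud is deliberately not covered.
final class AdcLocator {
    static final String ENV_VAR = "GOOGLE_APPLICATION_CREDENTIALS";
    static final String FILE_NAME = "application_default_credentials.json";

    static Path credentialsPath(Map<String, String> env, String osName, String userHome) {
        String explicit = env.get(ENV_VAR);
        if (explicit != null && !explicit.isEmpty()) {
            return Paths.get(explicit); // an explicitly configured key file wins
        }
        if (osName.toLowerCase(Locale.ROOT).startsWith("windows")) {
            // %APPDATA%\gcloud\application_default_credentials.json
            return Paths.get(env.getOrDefault("APPDATA", ""), "gcloud", FILE_NAME);
        }
        // $HOME/.config/gcloud/application_default_credentials.json
        return Paths.get(userHome, ".config", "gcloud", FILE_NAME);
    }

    public static void main(String[] args) {
        Path p = credentialsPath(System.getenv(), System.getProperty("os.name"),
                System.getProperty("user.home"));
        System.out.println(p + (Files.exists(p)
                ? " (found)"
                : " (missing: run gcloud auth application-default login)"));
    }
}
```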
# Usage
The project can be run with `./gradlew run`, which uses the default configuration.
To change the configuration, use the following command parameters:
1. **files** - number of files to fetch from GitHub
1. **batch** - number of files in one batch
1. **mode**
    * `mode=1` - use example files (for testing purposes)
    * `mode=2` - fetch files from GitHub
    * `mode=3` - delete the local DB with queries (for testing purposes)
1. **offset** - number of already processed files (files from GitHub are ordered by repository name), so that processed files are not fetched again
1. **sample**
    * `sample=1` - use the sample GitHub collection (for testing purposes)
    * `sample=2` - use the original GitHub collection
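The following Java sketch shows one way the five parameters could be read from `-D` system properties and validated against the allowed `mode` and `sample` values. The class `RunParams` and all default values are hypothetical placeholders, not the project's real code or defaults. Note that whether `-D` values passed to `./gradlew run` reach the application JVM at all depends on the project's Gradle configuration, which is not shown here.

```java
import java.util.Properties;

// Hypothetical sketch: reads and validates the run parameters.
// Defaults are placeholders, not the project's actual defaults.
final class RunParams {
    final int files, batch, mode, offset, sample;

    RunParams(int files, int batch, int mode, int offset, int sample) {
        if (files < 0 || batch <= 0 || offset < 0) {
            throw new IllegalArgumentException("files/offset must be >= 0 and batch > 0");
        }
        if (mode < 1 || mode > 3) throw new IllegalArgumentException("mode must be 1, 2 or 3");
        if (sample < 1 || sample > 2) throw new IllegalArgumentException("sample must be 1 or 2");
        this.files = files;
        this.batch = batch;
        this.mode = mode;
        this.offset = offset;
        this.sample = sample;
    }

    // Returns the integer value of a property, or the fallback if it is not set.
    private static int intProp(Properties p, String key, int fallback) {
        String v = p.getProperty(key);
        return v == null ? fallback : Integer.parseInt(v.trim());
    }

    static RunParams from(Properties p) {
        return new RunParams(
                intProp(p, "files", 100),  // placeholder default
                intProp(p, "batch", 100),  // placeholder default
                intProp(p, "mode", 1),     // placeholder default
                intProp(p, "offset", 0),   // placeholder default
                intProp(p, "sample", 1));  // placeholder default
    }

    public static void main(String[] args) {
        RunParams r = from(System.getProperties());
        System.out.printf("files=%d batch=%d mode=%d offset=%d sample=%d%n",
                r.files, r.batch, r.mode, r.offset, r.sample);
    }
}
```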
Parameters can be passed like this:
* `./gradlew run -Dfiles=100 -Dbatch=50 -Dmode=2 -Doffset=200 -Dsample=2`
* This command fetches **100 files** from **GitHub** (mode=2) in two **batches of 50 files**, starting at **offset 200** (so it processes the files from 200 to 300), from the **original GitHub collection** (sample=2).
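A minimal Java sketch of how `files`, `batch` and `offset` could translate into batch ranges, assuming end-exclusive index ranges in repository order. With the example values it yields the ranges 200-250 and 250-300, and the next offset is 300, which matches the "New offset" reported by the test runs. `BatchPlan` is a hypothetical name, not the project's code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: splits [offset, offset + files) into batches of at most `batch` files.
final class BatchPlan {
    // Each element is {start, end} with end exclusive.
    static List<int[]> ranges(int files, int batch, int offset) {
        List<int[]> out = new ArrayList<>();
        for (int start = offset; start < offset + files; start += batch) {
            int end = Math.min(start + batch, offset + files); // last batch may be shorter
            out.add(new int[] {start, end});
        }
        return out;
    }

    // Offset to pass to the next run so already processed files are skipped.
    static int nextOffset(int files, int offset) {
        return offset + files;
    }

    public static void main(String[] args) {
        for (int[] r : ranges(100, 50, 200)) {
            System.out.println("batch: files " + r[0] + " to " + (r[1] - 1));
        }
        System.out.println("next offset: " + nextOffset(100, 200));
    }
}
```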
During testing, I found that the most efficient **batch** size is **100**. With this configuration, fetching one batch (100 files) from GitHub takes around **5 minutes**, so **10000 files** (100 batches) can be expected to take around **8 hours** (100 × 5 min = 500 min).
The following test runs, each processing 100 files, support these numbers:
Batch size: 10, total GitHub request time: 17.4 min
```
Github requests time in ms: 1049982(98,22%)
ANTLR parsing time in ms: 7533(0,70%)
Parsing tree string finding: 5841(0,55%)
Whole time: 1068991
Number of all found queries: 18(from 100 files)
Number of TSql queries: 0
Number of Postgre SQL queries: 15
Number of PlSql queries: 0
Number of MySql queries: 3
New offset to use for query: 100
```
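The percentages in the log appear to be each phase's time divided by `Whole time`; for example, 1049982 / 1068991 ≈ 98.22 %, and the same relation holds for the other runs. A small Java sketch of that calculation, with the hypothetical class name `TimeShare`, using the numbers from the run above:

```java
import java.util.Locale;

// Hypothetical sketch: share of the whole run time spent in one phase.
final class TimeShare {
    static double percent(long partMs, long wholeMs) {
        return partMs * 100.0 / wholeMs;
    }

    public static void main(String[] args) {
        long whole = 1_068_991; // "Whole time" of the batch-size-10 run
        long[] parts = {1_049_982, 7_533, 5_841};
        String[] names = {"GitHub requests", "ANTLR parsing", "Parsing tree string finding"};
        for (int i = 0; i < parts.length; i++) {
            // Locale.ROOT keeps '.' as the decimal separator regardless of system locale
            System.out.printf(Locale.ROOT, "%s: %.2f%%%n", names[i], percent(parts[i], whole));
        }
    }
}
```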
Batch size: 50, total GitHub request time: 6.3 min
```
Github requests time in ms: 376292(98,18%)
ANTLR parsing time in ms: 1688(0,44%)
Parsing tree string finding: 448(0,12%)
Whole time: 383269
Number of all found queries: 2(from 100 files)
Number of TSql queries: 0
Number of Postgre SQL queries: 2
Number of PlSql queries: 0
Number of MySql queries: 0
New offset to use for query: 200
```
Batch size: 100, total GitHub request time: 5.2 min
```
Github requests time in ms: 312213(95,48%)
ANTLR parsing time in ms: 5346(1,63%)
Parsing tree string finding: 5180(1,58%)
Whole time: 326986
Number of all found queries: 36(from 100 files)
Number of TSql queries: 0
Number of Postgre SQL queries: 0
Number of PlSql queries: 0
Number of MySql queries: 36
New offset to use for query: 300
```
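The 10000-file estimate is a linear extrapolation that assumes every 100-file batch takes about as long as the measured one. With the measured 5.2 minutes of GitHub request time per batch this gives about 8.7 hours; using the whole run time of 326986 ms (about 5.45 minutes) per batch gives about 9.1 hours. A Java sketch of the calculation, with the hypothetical class name `RunEstimate`:

```java
import java.util.Locale;

// Hypothetical sketch: estimated total run time = number of batches * time per batch.
final class RunEstimate {
    static double estimateHours(int totalFiles, int batchSize, double minutesPerBatch) {
        int batches = (totalFiles + batchSize - 1) / batchSize; // ceiling division
        return batches * minutesPerBatch / 60.0;
    }

    public static void main(String[] args) {
        // 5.2 min: GitHub request time of one 100-file batch
        System.out.printf(Locale.ROOT, "requests only: %.1f h%n", estimateHours(10_000, 100, 5.2));
        // 326986 ms: whole time of one 100-file batch, converted to minutes
        System.out.printf(Locale.ROOT, "whole run: %.1f h%n",
                estimateHours(10_000, 100, 326_986 / 60_000.0));
    }
}
```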