Commit b21d33de authored by Antony Saba, committed by Alex Hanselka

Howto for log archive GCS to BigQuery

parent 8bf87452
......@@ -118,6 +118,10 @@ longer retention and to preserve logs in object storage for 180days.
They are created by https://github.com/GoogleCloudPlatform/pubsubbeat , I don't see a way we can remove them without forking the project.
#### What if I need to query logs older than 30 days?
See [logging_gcs_archive_bigquery.md](logging_gcs_archive_bigquery.md) for
instructions on loading logs into `BigQuery` from their GCS archive files.
### Configuration
......
# Loading StackDriver archives from Google Cloud Storage (GCS) into BigQuery
## Summary
To load a BigQuery table from a log archive produced by a StackDriver sink,
a schema must be defined that uses `RECORD` types for the nested JSON
elements.
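As a rough illustration of what a nested schema looks like, the sketch below writes a two-level fragment to a file. The field names are taken from the `resource` entry of the schema added in this change; the file name `minimal_schema.json` is an arbitrary placeholder, not something the runbook prescribes.

```shell
# Write a minimal schema: one flat field plus a nested RECORD.
# A log line shaped like
#   {"logName": "...", "resource": {"type": "...", "labels": {"zone": "..."}}}
# needs both `resource` and `resource.labels` declared as RECORD types,
# each carrying its children in a "fields" array.
cat > minimal_schema.json <<'EOF'
[
  {"mode": "NULLABLE", "name": "logName", "type": "STRING"},
  {"mode": "NULLABLE", "name": "resource", "type": "RECORD", "fields": [
    {"mode": "NULLABLE", "name": "type", "type": "STRING"},
    {"mode": "NULLABLE", "name": "labels", "type": "RECORD", "fields": [
      {"mode": "NULLABLE", "name": "zone", "type": "STRING"}
    ]}
  ]}
]
EOF
```

Every level of nesting in the JSON log line adds one more `RECORD` with its own `fields` array; leaf values keep scalar types such as `STRING` or `TIMESTAMP`.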
### Why
* You need to query logs older than 30 days
* You need aggregate operators, and eventually
  * summarized export of results
  * visualization
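For instance, once an archive has been loaded into a table, an aggregate over a nested field could look like the sketch below. It assumes a table that uses the schema from this change (with `httpRequest.status` and `timestamp` columns); the function name `count_by_status` and the table name passed to it are hypothetical.

```shell
# Sketch: count requests per HTTP status for one day of archived logs.
# Set BQ=echo to print the command instead of running it.
count_by_status() {
  local table="$1" day="$2"
  local bq="${BQ:-bq}"

  # Standard SQL; nested RECORD fields are addressed with dotted paths.
  "$bq" query --use_legacy_sql=false \
    "SELECT httpRequest.status AS status, COUNT(*) AS requests
     FROM \`${table}\`
     WHERE DATE(timestamp) = '${day}'
     GROUP BY status
     ORDER BY requests DESC"
}
```

A call might look like `count_by_status myhaproxy.haproxy 2019-01-15`, where both the table and the date are placeholders.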
### What
Logs that come in to StackDriver (see [logging.md](logging.md)) are also sent
to Google Cloud Storage in batches using an export sink. After 30 days, the
log messages are expired in StackDriver, but remain in GCS.
# How
## Using the UI
These instructions are similar in both the new style (within `console.cloud.google.com`)
and the old style (external page), but the screenshots may appear with
differing styles.
1. Create a dataset if necessary.
2. Click on a control to "Add a new table"
3. Choose "Google Cloud Storage" with "JSON (Newline Delimited)" as the `Source data`.
![source data](../img/create_table_source.png)
4. Unselect "Auto detect Schema and input parameters" if selected.
5. Add records for fields, using `RECORD` type for nested fields and adding
subfields using the `+` on the parent record. It should look something like this:
![record type](../img/bigquery_schema_record.png)
6. In `Advanced options`, check `Ignore unknown values`
7. If the data to be imported is large, consider whether partitioning will be necessary.
   1. Add `timestamp` field of type `TIMESTAMP`
   2. In `Advanced options`, select it as the partitioning field:
   ![partition by timestamp](../img/bigquery_table_partition.png)
8. Create the table. If everything is right, a background job will run to
load the data into the new table.
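
The UI steps above can probably be approximated with the `bq` command line. The following is a minimal sketch, not a tested procedure: the function name `load_gcs_archive` and all arguments (dataset, table, GCS URI, schema file) are placeholders, and it assumes the schema file already contains a `timestamp` field of type `TIMESTAMP`.

```shell
# Sketch: load a GCS log archive into a timestamp-partitioned BigQuery table.
# Set BQ=echo to print the commands instead of running them.
load_gcs_archive() {
  local dataset="$1" table="$2" gcs_uri="$3" schema_file="$4"
  local bq="${BQ:-bq}"

  # Step 1: create the dataset if necessary
  # (--force: do not fail if it already exists).
  "$bq" mk --force --dataset "$dataset" || return 1

  # Steps 3-8: newline-delimited JSON from GCS, explicit schema instead of
  # auto-detection, ignore unknown values, partition on `timestamp`.
  "$bq" load \
    --source_format=NEWLINE_DELIMITED_JSON \
    --ignore_unknown_values \
    --time_partitioning_type=DAY \
    --time_partitioning_field=timestamp \
    "${dataset}.${table}" "$gcs_uri" "$schema_file"
}
```

A dry run such as `BQ=echo load_gcs_archive myhaproxy haproxy 'gs://example-bucket/haproxy/*' haproxy_schema.json` prints both commands for review; the bucket path here is made up. Drop the two `--time_partitioning_*` flags if partitioning is not needed.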
## Alternative: Starting from an existing schema
To save time and increase usability, the text version of a table schema can be
dumped with `bq`:
```
$ bq show --schema --format=prettyjson myproject:myhaproxy.haproxy > haproxy_schema.json
```
The result can be copied and pasted into BigQuery by selecting `Edit as text`
when creating the schema.
Contribute changes or new schemas back to [logging_bigquery_schemas](../logging_bigquery_schemas).
# TODO
* It's probably possible to perform the above tasks with the `bq` command line.
\ No newline at end of file
[
{
"mode": "NULLABLE",
"name": "logName",
"type": "STRING"
},
{
"fields": [
{
"mode": "NULLABLE",
"name": "type",
"type": "STRING"
},
{
"fields": [
{
"mode": "NULLABLE",
"name": "instance_id",
"type": "STRING"
},
{
"mode": "NULLABLE",
"name": "zone",
"type": "STRING"
},
{
"mode": "NULLABLE",
"name": "project_id",
"type": "STRING"
}
],
"mode": "NULLABLE",
"name": "labels",
"type": "RECORD"
}
],
"mode": "NULLABLE",
"name": "resource",
"type": "RECORD"
},
{
"mode": "NULLABLE",
"name": "textPayload",
"type": "STRING"
},
{
"fields": [
{
"mode": "NULLABLE",
"name": "host",
"type": "STRING"
},
{
"mode": "NULLABLE",
"name": "pid",
"type": "STRING"
},
{
"mode": "NULLABLE",
"name": "ident",
"type": "STRING"
},
{
"mode": "NULLABLE",
"name": "message",
"type": "STRING"
},
{
"mode": "NULLABLE",
"name": "environment",
"type": "STRING"
},
{
"mode": "NULLABLE",
"name": "fqdn",
"type": "STRING"
},
{
"mode": "NULLABLE",
"name": "hostname",
"type": "STRING"
},
{
"mode": "NULLABLE",
"name": "ssl_version",
"type": "STRING"
}
],
"mode": "NULLABLE",
"name": "jsonPayload",
"type": "RECORD"
},
{
"mode": "NULLABLE",
"name": "timestamp",
"type": "TIMESTAMP"
},
{
"mode": "NULLABLE",
"name": "receiveTimestamp",
"type": "TIMESTAMP"
},
{
"mode": "NULLABLE",
"name": "severity",
"type": "STRING"
},
{
"mode": "NULLABLE",
"name": "insertId",
"type": "STRING"
},
{
"fields": [
{
"mode": "NULLABLE",
"name": "requestMethod",
"type": "STRING"
},
{
"mode": "NULLABLE",
"name": "requestUrl",
"type": "STRING"
},
{
"mode": "NULLABLE",
"name": "requestSize",
"type": "INTEGER"
},
{
"mode": "NULLABLE",
"name": "status",
"type": "INTEGER"
},
{
"mode": "NULLABLE",
"name": "responseSize",
"type": "INTEGER"
},
{
"mode": "NULLABLE",
"name": "userAgent",
"type": "STRING"
},
{
"mode": "NULLABLE",
"name": "remoteIp",
"type": "STRING"
},
{
"mode": "NULLABLE",
"name": "serverIp",
"type": "STRING"
},
{
"mode": "NULLABLE",
"name": "referer",
"type": "STRING"
},
{
"mode": "NULLABLE",
"name": "cacheLookup",
"type": "BOOLEAN"
},
{
"mode": "NULLABLE",
"name": "cacheHit",
"type": "BOOLEAN"
},
{
"mode": "NULLABLE",
"name": "cacheValidatedWithOriginServer",
"type": "BOOLEAN"
},
{
"mode": "NULLABLE",
"name": "cacheFillBytes",
"type": "INTEGER"
},
{
"mode": "NULLABLE",
"name": "protocol",
"type": "STRING"
}
],
"mode": "NULLABLE",
"name": "httpRequest",
"type": "RECORD"
},
{
"fields": [
{
"mode": "NULLABLE",
"name": "tag",
"type": "STRING"
},
{
"mode": "NULLABLE",
"name": "compute_googleapis_com_resource_name",
"type": "STRING"
}
],
"mode": "NULLABLE",
"name": "labels",
"type": "RECORD"
},
{
"fields": [
{
"mode": "NULLABLE",
"name": "id",
"type": "STRING"
},
{
"mode": "NULLABLE",
"name": "producer",
"type": "STRING"
},
{
"mode": "NULLABLE",
"name": "first",
"type": "BOOLEAN"
},
{
"mode": "NULLABLE",
"name": "last",
"type": "BOOLEAN"
}
],
"mode": "NULLABLE",
"name": "operation",
"type": "RECORD"
},
{
"mode": "NULLABLE",
"name": "trace",
"type": "STRING"
},
{
"mode": "NULLABLE",
"name": "spanId",
"type": "STRING"
},
{
"mode": "NULLABLE",
"name": "traceSampled",
"type": "BOOLEAN"
},
{
"fields": [
{
"mode": "NULLABLE",
"name": "file",
"type": "STRING"
},
{
"mode": "NULLABLE",
"name": "line",
"type": "INTEGER"
},
{
"mode": "NULLABLE",
"name": "function",
"type": "STRING"
}
],
"mode": "NULLABLE",
"name": "sourceLocation",
"type": "RECORD"
}
]