Data and Python Related FAQs

Where do I get started?

Our advice is to treat this like any agile project: you have a specified timeline but no fixed requirements.

  1. Understand the scope of the problem: link

  2. Get the data: link

  3. Understand the cloud infrastructure provided: link

  4. Work on a simple solution

  5. Evaluate the current result/model and repeat step 4 until satisfied.

My AWS account / SageMaker doesn't have the tutorial notebooks.

You need to change your AWS region. In the upper right, the region will show as "Ohio." Click the dropdown menu and change it to "N. Virginia."

I am entering SageMaker for the first time and permission is denied.

As of 04/12/21, we have moved our access format to SSO. All participants should have received a guide to accessing their instances through this method.

I am lost. What is the "Big Picture" here?

We understand that this competition poses a challenging problem because of its complexity and multi-disciplinary data environment. Compared to lithology or log prediction, this challenge requires more domain knowledge. Our mentors are ready to help with any of your questions.

The "Big Picture" is: using sparse well logs, interpreted horizons, and seismic amplitude data -> predict rock properties at seismic trace positions.

Results will be analyzed based on the performance of the model on testing wells that have been withheld. You will access the data through the S3 bucket, perform feature engineering, train ML models, and use them to predict rock properties. The annotated notebooks on our GitHub repo here walk through accessing the data.
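
To make that loop concrete, here is a minimal, hypothetical sketch using scikit-learn; the feature and target arrays are placeholders standing in for features you would extract from the seismic at well locations and the impedance values from the logs:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Hypothetical placeholders: X holds features extracted from seismic
# traces at well locations; y holds the corresponding rock property
# (e.g., acoustic impedance) from the well logs.
X = np.random.rand(500, 10)
y = np.random.rand(500)

# Hold out some samples to check generalization before submitting.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
print("Validation R^2:", model.score(X_val, y_val))

# Once satisfied, compute the same features for every seismic trace
# position and call model.predict on them to build the property volume.
```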

Is it better to continue with the competition on my local computer or SageMaker?

A SageMaker notebook is a hosted conda environment (it runs Miniconda). It isn't very convenient to go back and forth between developing in SageMaker and developing on your local machine.

Can I download any of the competition data from AWS S3?

Yes, this is possible. However, due to the sheer size of the seismic volumes, we recommend competitors use the data-querying functionality of the libraries we are providing in our GitHub repo. You can find all sorts of examples here. If you haven't checked our website's "data" section, please take a look there as well; it has a lot of useful information: link

However, if you still want to download all the files, follow these instructions:

Everyone has the right permissions to download the files. A bulk download cannot be done through the S3 GUI; instead, you need to download the AWS CLI and use it from your local machine to copy (i.e., download) the contents of the bucket.

  1. Download the AWS command-line interface (CLI).

  2. Configure the AWS CLI in your terminal with `aws configure`. You will need the access ID and secret that were emailed to you.

  3. Use this command in the terminal to download the entire contents of the bucket: `aws s3 cp s3://geophysics-on-cloud/poseidon/ home --recursive`. This will download EVERYTHING to a folder on your local machine called "home"; you can change that path to something else. If you only want the VDS seismic data, use this command instead: `aws s3 cp s3://geophysics-on-cloud/poseidon/seismic/OpenVDS/ home --recursive`
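
If you prefer to stay in Python instead of the CLI, a boto3 sketch along these lines should also work, assuming your credentials are already configured (e.g., via `aws configure`); the bucket and prefix are copied from the CLI command above:

```python
import os
import boto3

BUCKET = "geophysics-on-cloud"
PREFIX = "poseidon/seismic/OpenVDS/"  # narrow the prefix to download less
DEST = "home"                         # local destination folder

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

# Walk every object under the prefix and mirror it locally.
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        if key.endswith("/"):  # skip folder placeholder objects
            continue
        local_path = os.path.join(DEST, key)
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        s3.download_file(BUCKET, key, local_path)
        print("downloaded", key)
```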


What rock properties am I trying to predict?

GITC 2021 requires that teams develop a machine learning seismic inversion algorithm to predict and generate compressional impedance (AI), shear impedance (SI), and density (RHOB) property volumes.

We provided DTC, DTS, and RHOB curves so you can calculate impedance on your own. The reason we gave these measurements is that the DTS and RHOB logs are not populated to the same extent as the DTC curves; we wanted to give competitors the opportunity to predict missing DTS/RHOB data, or augment it, before generating impedance.

Follow these steps to calculate impedances from DTC, DTS, and RHOB.

Compressional impedance (AI) = Vp * RHOB
Shear impedance (SI) = Vs * RHOB

The conversion from sonic slowness (DTC, in microseconds per foot) to Vp (in feet per second) is:

Vp = 1,000,000 / DTC
Vs = 1,000,000 / DTS

Likewise, from sonic slowness (DTC, in microseconds per foot) to Vp (in meters per second):

Vp = 0.3048 * 1,000,000 / DTC
Vs = 0.3048 * 1,000,000 / DTS

This way, you can generate compressional velocity (Vp) and shear velocity (Vs).

Then it is possible to generate P-wave (compressional) impedance and S-wave (shear) impedance.

This is usually what regular inversions achieve; however, some inversion algorithms estimate Vp, Vs, and density instead of impedances and density. Using the above formulas, you can transform between velocities and impedances.
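
As a quick sanity check, here is a minimal NumPy sketch of those formulas, assuming DTC/DTS are in microseconds per foot and RHOB is in g/cc; the log samples are hypothetical:

```python
import numpy as np

def slowness_to_velocity(dt, metric=True):
    """Convert sonic slowness (microseconds per foot) to velocity.

    metric=True returns m/s; metric=False returns ft/s.
    """
    v = 1_000_000.0 / dt          # ft/s
    return 0.3048 * v if metric else v

# Hypothetical log samples (DTC/DTS in us/ft, RHOB in g/cc).
dtc = np.array([80.0, 90.0, 100.0])
dts = np.array([140.0, 160.0, 180.0])
rhob = np.array([2.40, 2.45, 2.50])

vp = slowness_to_velocity(dtc)    # compressional velocity, m/s
vs = slowness_to_velocity(dts)    # shear velocity, m/s

ai = vp * rhob                    # compressional impedance (AI)
si = vs * rhob                    # shear impedance (SI)
print(ai, si)
```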

Wells have different depth/time ranges. What is the time range for the test data?

The test data is horizon-constrained. The final model will cover:

  1. (top) The Jamieson horizon and

  2. (base) ~200ms beneath the Near Plover

This corresponds to roughly 2,600 to 3,400 milliseconds.

What are the evaluation metrics?

It will most likely be R^2 at the testing locations, plus some metric of total model variance (to measure stability).
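
For reference, R^2 can be computed with scikit-learn; this is a minimal sketch, with placeholder arrays standing in for the withheld test-well values and your predictions:

```python
import numpy as np
from sklearn.metrics import r2_score

# Placeholders: true impedance at withheld test wells vs. predictions.
y_true = np.array([6000.0, 6200.0, 6500.0, 6400.0])
y_pred = np.array([5900.0, 6300.0, 6450.0, 6380.0])

print("R^2:", r2_score(y_true, y_pred))
```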

Why are the well inlines and crosslines floats and not integers?

We calculated exact inline and crossline values from the deviation-corrected X/Y coordinates so that you can extract the proper borehole (composite) traces from the seismic. You could round them to the nearest inline/crossline values, but keep in mind that may cause jumps in your extracted traces. Two hints:

  1. You can extract the nearest trace +/- a few more and do a weighted average, or

  2. You can interpolate the seismic traces to the exact wellbore points using scipy.interpolate (see the sketch below).
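
Here is a minimal sketch of the second hint using scipy.interpolate.RegularGridInterpolator; the seismic cube, its inline/crossline axes, and the well's fractional coordinates are hypothetical placeholders:

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Hypothetical seismic cube: (inline, crossline, time sample).
inlines = np.arange(100, 200)        # integer inline numbers
crosslines = np.arange(300, 400)     # integer crossline numbers
cube = np.random.rand(len(inlines), len(crosslines), 500)

# Linear interpolation over the inline/crossline grid; the time axis
# is carried along as the vector value at each grid node.
interp = RegularGridInterpolator((inlines, crosslines), cube)

# Fractional inline/crossline of one wellbore point, as given in the data.
il, xl = 123.4, 345.7
trace = interp((il, xl))             # interpolated trace, shape (500,)
print(trace.shape)
```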

In SageMaker, there is an instance named "seismic-lambda-vavourak-5". What is it?

There are two notebook instances, named ML-Geo-Teams-Scratch-Area and ML-for-geo. Teams only have write access to the first one. In the scratch area, everyone can see everyone else's files, so please create a folder for your team under 0_team_folders (at the root of the first instance) and keep all your work there.

When I try to read rss files, I get a 'Nothing found at the path' error.

It is related to the Python environment. Please:

  1. Use any Python 3.6+ kernel.

  2. Make sure you `pip install s3fs==0.5.0` and `real-simple-seismic`.

After that configuration, it should work fine. If not, there may be something wrong with your kernel at that time; try another one.
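
In a notebook cell, that installation step looks like this:

```python
# Run in a notebook cell on a Python 3.6+ kernel.
!pip install s3fs==0.5.0 real-simple-seismic
```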


Is there a way to use PyTorch with a GPU? It looks like the available instances don't have GPUs.

For SageMaker (and other ML tools like Databricks), you can declare the compute used for different tasks. This saves a lot of time and cost because you don't need a GPU for most of the notebook; usually you only need it for the training steps. For the competition, the base notebook instance does not have a GPU, but you can now use GPUs as needed for training via `training jobs`.
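
As an illustration, a GPU training job might be launched along these lines with the SageMaker Python SDK; the entry script, IAM role, instance type, and framework versions here are assumptions, not competition-specific values:

```python
from sagemaker.pytorch import PyTorch

# All values below are illustrative; adapt them to your account and code,
# and make sure framework_version/py_version match an available container.
estimator = PyTorch(
    entry_point="train.py",            # your training script (assumption)
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder role
    instance_count=1,
    instance_type="ml.p3.2xlarge",     # a GPU instance type
    framework_version="1.8.1",
    py_version="py3",
)

# Launches the training job on the GPU instance; the notebook itself
# stays on cheap, non-GPU hardware.
estimator.fit({"train": "s3://your-bucket/train-data/"})
```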

Can I run AWS Lambda functions? It says I am not authorized.

Competitors are not allowed to create Lambda functions in these accounts. This may be confusing because Constantine's AWS video utilized Lambdas; however, that video was only designed to showcase some of the cool ways the cloud puts flexible compute within reach of users.