Jekyll feed — 2022-04-06T07:29:42+00:00 — https://www.ericcolvinmorgan.com/feed.xml
Eric Colvin Morgan — Blog and project site for Eric Colvin Morgan. (site@ericcolvinmorgan.com)

OpenGL Moon Simulation — 2021-07-24T22:52:33+00:00 — https://www.ericcolvinmorgan.com/projects/2021/07/24/moonsimulation
<p>NASA provides a number of color and elevation maps sourced from data assembled by the Lunar Reconnaissance Orbiter camera and laser altimeter instrument teams (located <a href="https://svs.gsfc.nasa.gov/4720">here</a>). I thought creating a realistic rendering of the Moon utilizing the texture and height fields available from NASA would be a fun way to dig into OpenGL concepts and GLSL. To start, I created a sphere containing ~2M vertices, which allowed for 4 vertices per degree of latitude and longitude. I then applied the moon texture to this sphere.</p>
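<p>A latitude/longitude sphere at this density can be generated roughly as follows. This is a minimal Python sketch of the tessellation idea, not the project’s actual C++/OpenGL vertex setup; the real vertex count also depends on how seams, poles, and triangle strips are laid out.</p>

```python
import math

def sphere_vertices(radius=1.0, steps_per_degree=4):
    """Generate (x, y, z) points on a latitude/longitude grid over a sphere.

    steps_per_degree=4 matches the 4-vertices-per-degree density
    described above; 1 gives a coarser 1-degree grid."""
    verts = []
    n_lat = 180 * steps_per_degree  # latitude divisions, -90 to +90 inclusive
    n_lon = 360 * steps_per_degree  # longitude divisions, 0 to 360 exclusive (wraps)
    for i in range(n_lat + 1):
        phi = math.radians(-90.0 + 180.0 * i / n_lat)
        for j in range(n_lon):
            theta = math.radians(360.0 * j / n_lon)
            verts.append((radius * math.cos(phi) * math.cos(theta),
                          radius * math.cos(phi) * math.sin(theta),
                          radius * math.sin(phi)))
    return verts
```

<p>Each grid point later doubles as a texture coordinate: longitude maps to u and latitude to v when sampling the moon texture.</p>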
<p>To simulate the surface detail, I allow the user to render using one of two approaches. The first, displacement mapping, reads in the float displacement height map. This data is then used to recalculate both the vertex locations and the surface normals in the vertex and fragment shaders. Because this approach actually edits the surface geometry, it is more computationally expensive to set up.</p>
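<p>The core of displacement mapping is simple: push each vertex along its unit normal by the sampled height, multiplied by an optional exaggeration scale. A minimal sketch (in Python rather than the project’s GLSL, with a hypothetical <code>displace</code> helper):</p>

```python
def displace(vertex, normal, height, scale=1.0):
    """Push a vertex along its unit normal by the sampled height value,
    optionally exaggerated by `scale`."""
    return tuple(v + n * height * scale for v, n in zip(vertex, normal))
```

<p>For a sphere centered at the origin, the unit normal is just the normalized vertex position, so this amounts to moving each point radially in or out.</p>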
<table style="text-align:center;">
<tbody>
<tr>
<td><img style="max-width:275px;" src="/assets/images/2021-07-24-moonsimulation/textureonly.jpg" /></td>
<td><img style="max-width:275px;" src="/assets/images/2021-07-24-moonsimulation/displacementmapping-noex.jpg" /></td>
</tr>
<tr>
<td>Texture Only</td>
<td>Displacement Mapping - No Exaggeration</td>
</tr>
</tbody>
</table>
<p>The images above show, on the left, a rendering of the moon with just the texture applied and no height field mapping, and, on the right, displacement mapping applied at the true scale measured by NASA’s sensors (not particularly exciting given our distance from the surface in these images).</p>
<p>In addition, I also enabled a bump mapping approach. This approach does not edit the surface geometry; instead, it simulates the geometry in the fragment shader. My implementation allows a user to shrink/grow the height fields to show an exaggerated view. You’ll notice there isn’t much of a visual difference between the two approaches from far away when looking at the center of the Moon’s disk. If you look at the edges of the Moon, however, you’ll easily notice the geometric differences.</p>
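<p>Bump mapping leaves the vertices alone and instead perturbs the shading normal from the slope of the height field. A common way to do this — sketched here in Python with a hypothetical <code>bump_normal</code> helper, rather than the project’s actual fragment shader — is central differences on the height map in tangent space:</p>

```python
import math

def bump_normal(height, x, y, strength=1.0):
    """Perturb a flat tangent-space normal (0, 0, 1) using central
    differences on a 2-D height field (list of rows, height[y][x]).
    `strength` plays the role of the exaggeration factor."""
    rows, cols = len(height), len(height[0])
    h = lambda i, j: height[min(max(j, 0), rows - 1)][min(max(i, 0), cols - 1)]
    dhdx = (h(x + 1, y) - h(x - 1, y)) * 0.5
    dhdy = (h(x, y + 1) - h(x, y - 1)) * 0.5
    n = (-dhdx * strength, -dhdy * strength, 1.0)
    length = math.sqrt(sum(c * c for c in n))
    return tuple(c / length for c in n)
```

<p>Since only the normal changes, the silhouette stays a perfect sphere — which is exactly why the edges of the Moon give the technique away.</p>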
<table style="text-align:center;">
<tbody>
<tr>
<td><img style="max-width:275px;" src="/assets/images/2021-07-24-moonsimulation/displacementmapping-largeex.jpg" /></td>
<td><img style="max-width:275px;" src="/assets/images/2021-07-24-moonsimulation/bumpmapping-largeex.jpg" /></td>
</tr>
<tr>
<td>Displacement Mapping - Large Exaggeration</td>
<td>Bump Mapping - Large Exaggeration</td>
</tr>
</tbody>
</table>
<p>This simulation also features a fly-by view. It was created by calculating and placing waypoints over the sphere, then fitting Catmull-Rom curves between the points to produce a smooth animation path. This creates a rail that the camera follows while flying and banking across the moon’s surface. This view also highlights the differences between the displacement and bump mapping approaches.</p>
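<p>A uniform Catmull-Rom segment interpolates between the two middle points of four consecutive waypoints, which is what makes the camera pass smoothly through every waypoint on the rail. A minimal sketch of one segment (the project’s actual curve code lives in the linked repo):</p>

```python
def catmull_rom(p0, p1, p2, p3, t):
    """Uniform Catmull-Rom interpolation between waypoints p1 and p2
    (0 <= t <= 1), using p0 and p3 to shape the tangents."""
    return tuple(
        0.5 * ((2.0 * b)
               + (-a + c) * t
               + (2.0 * a - 5.0 * b + 4.0 * c - d) * t * t
               + (-a + 3.0 * b - 3.0 * c + d) * t * t * t)
        for a, b, c, d in zip(p0, p1, p2, p3))
```

<p>Sliding a window of four waypoints along the whole list chains these segments into one continuous path, since each segment starts exactly where the previous one ends.</p>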
<table style="text-align:center;">
<tbody>
<tr>
<td><img style="max-width:275px;" src="/assets/images/2021-07-24-moonsimulation/flyby-dm-largeex.jpg" /></td>
<td><img style="max-width:275px;" src="/assets/images/2021-07-24-moonsimulation/flyby-bm-largeex.jpg" /></td>
</tr>
<tr>
<td>Displacement Mapping - Fly-by - Large Exaggeration</td>
<td>Bump Mapping - Fly-by - Large Exaggeration</td>
</tr>
</tbody>
</table>
<p>To render the moon I used a combination of BMP textures and TIF displacement maps. I quickly learned that the way this information is read differs greatly between the two formats. For example, BMP files store pixel rows starting from the bottom of the image and moving to the top, while the TIF files utilized here store the displacement rows from the top of the image to the bottom.</p>
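<p>Keeping the two row orders aligned boils down to flipping one index. A small sketch (the <code>aligned_sample</code> helper is illustrative, not from the project) of fetching the texel and height for the same geographic spot:</p>

```python
def aligned_sample(bmp_rows, tif_rows, row, col):
    """Fetch the texel and height value for the same image position.

    bmp_rows: pixel rows as stored in a BMP (bottom-up)
    tif_rows: displacement rows as stored in these TIFFs (top-down)
    row, col: top-down image coordinates
    """
    texel = bmp_rows[len(bmp_rows) - 1 - row][col]  # flip BMP row order
    height = tif_rows[row][col]                     # already top-down
    return texel, height
```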
<p>Incorrectly pulling in file detail can cause obvious disconnects. The left image below shows the result of incorrectly mapping height fields to texture detail. It shows the same geographic location as the images above, but with the height maps not corrected for the file format differences. You can see it leads to a very different scene.</p>
<table style="text-align:center;">
<tbody>
<tr>
<td><img style="max-width:275px;" src="/assets/images/2021-07-24-moonsimulation/incorrectheightfields.jpg" /></td>
<td><img style="max-width:275px;" src="/assets/images/2021-07-24-moonsimulation/flyby-path.jpg" /></td>
</tr>
<tr>
<td>Incorrectly Mapped Height Fields</td>
<td>Fly-by Example</td>
</tr>
</tbody>
</table>
<p>The BMP texture detail was easy to extract, but the TIFF detail for the height field maps was a bit more complicated. I followed the <a href="https://www.adobe.io/content/dam/udp/en/open/standards/tiff/TIFF6.pdf">TIFF specification</a> and built a TIFF reader class. The class allows a user to read in both the UInt and float displacement maps located at the previously mentioned NASA website. The TIFF specification is fairly broad, so no attempt was made to handle features such as compression or tiling, which were outside the scope required for reading the NASA-provided displacement maps.</p>
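<p>To give a feel for what such a reader parses, here is a minimal Python sketch of the TIFF header and first IFD (Image File Directory), following the spec’s layout. This is not the project’s C++ reader; real files also need value-offset dereferencing for entries whose data exceeds 4 bytes, plus the strip-offset bookkeeping to reach the pixel data.</p>

```python
import struct

def read_tiff_tags(data):
    """Parse the header and first IFD of an uncompressed, untiled TIFF,
    returning {tag: (field_type, count, value_or_offset)}."""
    endian = {b'II': '<', b'MM': '>'}[data[:2]]     # byte-order mark
    magic, ifd_offset = struct.unpack(endian + 'HI', data[2:8])
    assert magic == 42, 'not a TIFF file'
    (n_entries,) = struct.unpack_from(endian + 'H', data, ifd_offset)
    tags = {}
    pos = ifd_offset + 2
    for _ in range(n_entries):                      # each entry is 12 bytes
        tag, ftype, count, value = struct.unpack_from(endian + 'HHII', data, pos)
        tags[tag] = (ftype, count, value)
        pos += 12
    return tags
```

<p>Tags 256 (ImageWidth), 257 (ImageLength), 258 (BitsPerSample), and 339 (SampleFormat, which distinguishes the UInt and float maps) are the ones that matter most for a displacement-map reader.</p>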
<p><a href="https://github.com/ericcolvinmorgan/MoonSimulation">OpenGL Moon Simulation</a></p>

Text Extraction Engine — 2021-06-20T20:25:33+00:00 — https://www.ericcolvinmorgan.com/projects/2021/06/20/textextraction
<p>A requirement to extract data from a PDF or image presents itself very commonly. PDF documents arrive in a variety of formats: some are system generated, with text embedded directly in the PDF, while others are scanned documents containing embedded image files of varying quality. While PDF viewers contain some tools to extract text, they do not output text in a consistent way across documents, which makes it difficult to automate the extraction of detail. On the image side, there is generally no structural information available about the content and no embedded text.</p>
<p>The Text Extraction Engine provides users with a screen to upload documents to the service. After uploading, documents will be processed via a number of AWS Lambda services coordinated via AWS Step Functions. PDF documents have text extracted via Poppler, while PDF files with embedded images or image files are processed via Tesseract. Custom Lambda layers have been created containing a build of each of these services (See <a href="/projects/2021/04/27/popplertesseractopencvawslayers.html">here</a> for more information). After the document has been successfully processed the end user is able to download the output as JSON.</p>
<p><a href="/assets/images/2021-06-20-textextraction/validation.jpg"><img align="right" style="max-width:450px; padding:5px;" src="/assets/images/2021-06-20-textextraction/validation.jpg" /></a></p>
<p>In addition, a validation screen has been created allowing users to validate OCR results via an easy-to-use UI. A toggle has been provided so a user can adjust their desired confidence levels, with the associated color coding changing on the fly. The original image has results overlayed and color-coded green, yellow, or red based on whether they are high, medium, or low confidence results. This output is written to a canvas element for rendering.</p>
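<p>The color-coding logic reduces to a threshold check. A sketch of the idea — the 60/85 cut-offs here are illustrative defaults, not the app’s actual values; in the UI the user moves these thresholds with the toggle:</p>

```python
def confidence_color(conf, medium=60.0, high=85.0):
    """Map a word-level OCR confidence (0-100, as reported by Tesseract)
    to an overlay color. Thresholds are adjustable by the user."""
    if conf >= high:
        return 'green'
    if conf >= medium:
        return 'yellow'
    return 'red'
```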
<p>I currently do not have a demo hosted, though everything is available via my <a href="https://github.com/ericcolvinmorgan/TextExtraction">GitHub</a>. I’ve also included a CloudFormation template to ease the creation of the various related AWS services required for hosting. Enjoy!</p>
<p><a href="https://github.com/ericcolvinmorgan/TextExtraction">Text Extraction Engine</a></p>

Calculation Engine — 2021-05-09T17:42:33+00:00 — https://www.ericcolvinmorgan.com/projects/2021/05/09/calculationengine
<p>One of the larger complaints from tax modeling teams and the clients/auditors consuming their models is a lack of transparency into calculations produced by SaaS systems. Coupled with the fact that regulations change frequently and legal positions can differ dramatically depending on the fact pattern, models and calculations need to be updated very often, leading to a proliferation of ad-hoc Excel models tweaked in various ways. While new positions can be added to a development roadmap, the turnaround time to develop and move through the various change management layers is often unacceptable when deadlines are measured in days, not weeks. This project was created in an attempt to address some of these concerns.</p>
<p>Excel is powerful in the sense that it is very easy to use and produces auditable reports, though managing and updating hundreds of files with new positions is tedious. The CalcEngine includes a C#-based service that tokenizes and extracts the calculation tree from a base file produced by subject matter experts. Once a dataset has been produced, an additional C++-based calculation service runs CSV datasets through the calculation trees, producing calculations based on the Excel logic provided by SMEs. The service has no dependencies on Excel, so it can easily be deployed where needed (an EC2 instance, AWS Lambda, etc.). This allows users to quickly update a theoretical SaaS system with new calculations while maintaining the calculation tree, enabling the production of auditable reports downstream.</p>
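<p>To illustrate the first step of such a pipeline, here is a heavily simplified formula tokenizer in Python. The real service is written in C# and also handles ranges, string literals, sheet references, and more; this sketch only covers cells, numbers, function names, and operators:</p>

```python
import re

# One alternative per token class; cell references are tried before
# function names so that "B1" is not mistaken for a function.
TOKEN = re.compile(r"""
      (?P<cell>\$?[A-Z]{1,3}\$?\d+)     # cell reference, e.g. A1 or $B$2
    | (?P<num>\d+(?:\.\d+)?)            # numeric literal
    | (?P<func>[A-Z][A-Z0-9]*)(?=\()    # function name followed by "("
    | (?P<op>[-+*/^%(),])               # operators and punctuation
""", re.VERBOSE)

def tokenize(formula):
    """Tokenize a simplified Excel formula string into a flat token list,
    the raw material for building a calculation tree."""
    return [m.group(0) for m in TOKEN.finditer(formula.lstrip('='))]
```

<p>From a token stream like this, a parser (e.g. shunting-yard) can build the expression tree that the C++ service then evaluates against each CSV row.</p>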
<p><a href="https://github.com/ericcolvinmorgan/CalcEngine">Calculation Engine</a></p>

Creating Poppler, Tesseract, and OpenCV AWS Lambda Layers — 2021-04-28T04:31:56+00:00 — https://www.ericcolvinmorgan.com/projects/2021/04/28/popplertesseractopencvawslayers
<p>The scripts in <a href="https://github.com/ericcolvinmorgan/TextExtraction/tree/master/services/layers/textextraction">this</a> directory provide an example of creating Poppler, Tesseract, and OpenCV layers for AWS Lambda based services. Projects are compiled in the amazon/aws-lambda-python:3.8 image.</p>
<p>To use, first download all targeted dependencies for the subsequent image builds by running the <code class="language-plaintext highlighter-rouge">./fetchDependencies.sh</code> script from the services/layers/textextraction path. You can change versions as desired in this script. The build commands below also redirect all relevant STDOUT and STDERR to an output.txt file for future reference.</p>
<h4 id="base-image">Base Image</h4>
<p>Next, we will build a base image that contains a number of base build tool dependencies for subsequent builds.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cd ./base
docker build . -t textextraction-base --progress=plain 2>&1 | tee output.txt
</code></pre></div></div>
<h4 id="development-dependencies-image">Development Dependencies Image</h4>
<p>We will then build a dev image that contains a number of dev dependencies used by the individual builds.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cd ../dev
docker build . -t textextraction-dev --progress=plain 2>&1 | tee output.txt
</code></pre></div></div>
<h4 id="opencv-tesseract-and-poppler-images">OpenCV, Tesseract, and Poppler Images</h4>
<p>Finally, we will build our individual components. For each component, we’ll build our container, compile the applicable software, and extract our needed files into a layer.zip file.</p>
<p><strong>Poppler</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cd ../poppler
docker build . -t textextraction-poppler:latest --progress=plain 2>&1 | tee output.txt
./extractLayer.sh
</code></pre></div></div>
<p><strong>Tesseract</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cd ../tesseract
docker build . -t textextraction-tesseract:latest --progress=plain 2>&1 | tee output.txt
./extractLayer.sh
</code></pre></div></div>
<p>Depending on how you’re copying files, you may receive the following error when you try to run commands against pytesseract:</p>
<blockquote>
<p>TesseractNotFoundError()</p>
</blockquote>
<p>Assuming everything is in the correct path, running the program directly will likely show a message saying <code class="language-plaintext highlighter-rouge">/opt/bin/tesseract: Permission denied</code>. Simply run <code class="language-plaintext highlighter-rouge">chmod +x /opt/bin/tesseract</code> to address this issue and pytesseract should work again. Another potential issue you’ll see with pytesseract is an error thrown when your language files are not imported correctly. You’ll need to download those and configure them following the instructions <a href="https://tesseract-ocr.github.io/tessdoc/Command-Line-Usage.html">here</a>.</p>
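<p>If you end up applying the fix from a Python deploy or packaging script rather than a shell, the same <code>chmod +x</code> looks like this (a sketch with a hypothetical <code>ensure_executable</code> helper — zip-based copies are a common way the execute bit gets lost in the first place):</p>

```python
import os
import stat

def ensure_executable(path):
    """Add execute permission for user, group, and other —
    the Python equivalent of `chmod +x path`."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
```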
<p><strong>OpenCV/Tesseract</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cd ../opencv
docker build . -t textextraction-opencv:latest --progress=plain 2>&1 | tee output.txt
./extractLayer.sh
</code></pre></div></div>
<p>If you look at the Dockerfile for this layer, you’ll notice I’m pulling from the previous Tesseract image. I’ve included these together for my convenience, so adjust to pull from the dev image if you prefer to build just OpenCV. This folder also contains a file called <code class="language-plaintext highlighter-rouge">aws-cv2-config-3.8.py</code>. You’ll need to update the path in this file for cv2 to be imported correctly; otherwise you’ll likely receive the following error:</p>
<blockquote>
<p>ERROR: recursion is detected during loading of “cv2” binary extensions. Check OpenCV installation.</p>
</blockquote>
<h4 id="testing">Testing</h4>
<p>There are a number of functions in this project that consume these layers. I’ve created Dockerfiles that create containers against the <code class="language-plaintext highlighter-rouge">amazon/aws-lambda-*</code> images and copy the applicable layer and latest code into the container for testing directly against a similar environment to which I am deploying. Testing can then be accomplished as follows:</p>
<p>Command:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>docker build . -t [NEW IMAGE NAME]:latest
docker run -p 9000:8080 [NEW IMAGE NAME]:latest
curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{}'
</code></pre></div></div>
<p>You can also spin up these environments and drop directly into bash using the following:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>docker run -it --entrypoint /bin/bash [IMAGE NAME]:latest
</code></pre></div></div>
<h4 id="helpful-links">Helpful Links</h4>
<p><a href="https://aws.amazon.com/premiumsupport/knowledge-center/lambda-layer-simulated-docker/">AWS - How To Create a Lambda Layer Using a Simulated Environment</a><br />
<a href="https://docs.aws.amazon.com/lambda/latest/dg/images-test.html">AWS - Testing Lambda Container Images Locally</a><br />
<a href="https://tesseract-ocr.github.io/tessdoc/">Tesseract Documentation</a></p>

Tax Form Data Extraction - Environment Setup — 2021-04-14T02:55:33+00:00 — https://www.ericcolvinmorgan.com/projects/2021/04/14/taxformdataextraction
<p>This post details the setup of an environment used for extracting text from tax forms. Today I’ll walk through the steps to install the necessary tools. I will be working in Windows 10 with WSL2 configured and running Ubuntu 20.04 (WSL installation instructions <a href="https://docs.microsoft.com/en-us/windows/wsl/install-win10">here</a>). It is possible to install all of the dependencies mentioned today directly in a Windows environment, but it requires much more setup; I strongly suggest embracing WSL. It shouldn’t take more than a couple of hours to get everything up and running.</p>
<p>We start by updating our package manager:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sudo apt-get update # Update package source list
sudo apt-get upgrade # Update installed packages
</code></pre></div></div>
<p>Next, install Python3. The following command will show you if Python is installed.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>python3 --version
</code></pre></div></div>
<p>If the command was not installed then install Python3 with the following:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sudo apt-get install python3
</code></pre></div></div>
<p>You should also see <code class="language-plaintext highlighter-rouge">/usr/bin/python3</code> as the output if you type <code class="language-plaintext highlighter-rouge">which python3</code>. If you see something different, other versions were installed on your machine at another time, and these instructions may differ.</p>
<p>Next, we’ll install the Python development headers:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sudo apt-get install python3-dev
</code></pre></div></div>
<p>I like to make use of virtual environments to better manage packages while developing, so we will need to set up our dependencies for these:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sudo apt-get install python3-pip # Install pip3
sudo pip3 install virtualenv # Install virtualenv
</code></pre></div></div>
<p>From here, you can create a virtual environment for your project as follows:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cd $PROJECT_DIRECTORY_HERE # Navigate to project directory
virtualenv .venv # Create virtual environment
source .venv/bin/activate # Activate environment
</code></pre></div></div>
<p>You can deactivate your environment by simply running <code class="language-plaintext highlighter-rouge">deactivate</code>. We won’t start our environment now, but we will come back to this.</p>
<h2 id="open-cv">Open CV</h2>
<p>Preinstall Note - There may be packages available via pip that streamline a lot of this installation. However, these most likely target older versions of OpenCV and/or do not include all features.</p>
<p><a href="https://www.opencv.org/">Open Source Computer Vision (Open CV)</a> is a library that contains a large number of computer vision algorithms. Why do we need this to extract detail from a tax form? At the most basic level, we’ll likely need to clean and manipulate a PDF document to get it into a state where it is ready for OCR. OCR works best on documents that are clean, and OpenCV contains a number of tools that will help us with pre-processing. We can also use OpenCV to help us define structure in a document.</p>
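<p>As a taste of what “cleaning” means here, the simplest pre-processing step is global thresholding: every grayscale pixel becomes pure black or white. This pure-Python sketch shows the idea on a nested list of pixel values; OpenCV provides the real thing via <code>cv2.threshold</code> (including Otsu’s method) and <code>cv2.adaptiveThreshold</code> for uneven lighting.</p>

```python
def binarize(gray, threshold=128):
    """Global threshold: map each grayscale pixel (0-255) to pure
    black (0) or white (255), removing faint noise before OCR."""
    return [[255 if px > threshold else 0 for px in row] for row in gray]
```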
<p>We start by updating our package manager:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sudo apt-get update # Update package source list
sudo apt-get upgrade # Update installed packages
</code></pre></div></div>
<p>Next, we start our installation of OpenCV. The instructions below are adapted from the instructions <a href="https://docs.opencv.org/4.4.0/d7/d9f/tutorial_linux_install.html">here</a>.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sudo apt-get install build-essential # Required compiler dependencies
sudo apt-get install cmake git libgtk2.0-dev pkg-config libavcodec-dev libavformat-dev libswscale-dev # Required packages
</code></pre></div></div>
<p>We can install additional optional dependencies next. These are optional for OpenCV, but we’ll be using them for our projects so go ahead and install:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sudo apt-get install python3-dev python3-numpy libtbb2 libtbb-dev libjpeg-dev libpng-dev libtiff-dev # Optional packages
</code></pre></div></div>
<p>These are as follows:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>- python3-dev # Python development headers
- python3-numpy # Numpy for Python3
- libtbb2 libtbb-dev # Parallelism libraries for C++
- libjpeg-dev libpng-dev libtiff-dev # Libraries for various media
</code></pre></div></div>
<p>We will need to build OpenCV, so we’ll be downloading the latest version of OpenCV and OpenCV’s extra modules next:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cd ~
wget -O opencv.zip https://github.com/opencv/opencv/archive/4.5.2.zip
wget -O opencv_contrib.zip https://github.com/opencv/opencv_contrib/archive/4.5.2.zip
unzip opencv.zip
unzip opencv_contrib.zip
</code></pre></div></div>
<p>After everything has been unzipped, we can build OpenCV. This process may take some time. OpenCV is an open source library, and utilizes a number of algorithms that are free for personal or academic purposes, but are not for commercial purposes. Depending on your use case you can turn these patented algorithms off using the <code class="language-plaintext highlighter-rouge">OPENCV_ENABLE_NONFREE</code> flag below.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cd opencv-4.5.2 # Open unzipped directory
mkdir build # Make build directory
cd build # Open build directory
cmake -D CMAKE_BUILD_TYPE=RELEASE \
-D CMAKE_INSTALL_PREFIX=/usr/local \
-D INSTALL_C_EXAMPLES=ON \
-D INSTALL_PYTHON_EXAMPLES=ON \
-D OPENCV_GENERATE_PKGCONFIG=ON \
-D OPENCV_ENABLE_NONFREE=OFF \
-D OPENCV_EXTRA_MODULES_PATH=~/opencv_contrib-4.5.2/modules \
-D BUILD_EXAMPLES=ON ..
</code></pre></div></div>
<p>You should see output similar to the below when successful:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- General configuration for OpenCV 4.5.2 =====================================
-- Version control: unknown
--
-- Extra modules:
-- Location (extra): /home/eric/opencv_contrib-4.5.2/modules
-- Version control (extra): unknown
--
-- Platform:
-- Timestamp: 2021-04-13T21:53:55Z
-- Host: Linux 4.19.104-microsoft-standard x86_64
-- CMake: 3.16.3
-- CMake generator: Unix Makefiles
-- CMake build tool: /usr/bin/make
-- Configuration: RELEASE
--
-- CPU/HW features:
-- Baseline: SSE SSE2 SSE3
-- requested: SSE3
-- Dispatched code generation: SSE4_1 SSE4_2 FP16 AVX AVX2 AVX512_SKX
-- requested: SSE4_1 SSE4_2 AVX FP16 AVX2 AVX512_SKX
-- SSE4_1 (17 files): + SSSE3 SSE4_1
-- SSE4_2 (2 files): + SSSE3 SSE4_1 POPCNT SSE4_2
-- FP16 (1 files): + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 AVX
-- AVX (5 files): + SSSE3 SSE4_1 POPCNT SSE4_2 AVX
-- AVX2 (31 files): + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2
-- AVX512_SKX (7 files): + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2 AVX_512F AVX512_COMMON AVX512_SKX
--
-- C/C++:
-- Built as dynamic libs?: YES
-- C++ standard: 11
-- C++ Compiler: /usr/bin/c++ (ver 9.3.0)
-- C++ flags (Release): -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Wsuggest-override -Wno-delete-non-virtual-dtor -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -fvisibility-inlines-hidden -O3 -DNDEBUG -DNDEBUG
-- C++ flags (Debug): -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Wsuggest-override -Wno-delete-non-virtual-dtor -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -fvisibility-inlines-hidden -g -O0 -DDEBUG -D_DEBUG
-- C Compiler: /usr/bin/cc
-- C flags (Release): -fsigned-char -W -Wall -Werror=return-type -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -O3 -DNDEBUG -DNDEBUG
-- C flags (Debug): -fsigned-char -W -Wall -Werror=return-type -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -g -O0 -DDEBUG -D_DEBUG
-- Linker flags (Release): -Wl,--exclude-libs,libippicv.a -Wl,--exclude-libs,libippiw.a -Wl,--gc-sections -Wl,--as-needed
-- Linker flags (Debug): -Wl,--exclude-libs,libippicv.a -Wl,--exclude-libs,libippiw.a -Wl,--gc-sections -Wl,--as-needed
-- ccache: NO
-- Precompiled headers: NO
-- Extra dependencies: dl m pthread rt
-- 3rdparty dependencies:
--
-- OpenCV modules:
-- To be built: aruco bgsegm bioinspired calib3d ccalib core datasets dnn dnn_objdetect dnn_superres dpm face features2d flann freetype fuzzy gapi hfs highgui img_hash imgcodecs imgproc intensity_transform line_descriptor mcc ml objdetect optflow phase_unwrapping photo plot python3 quality rapid reg rgbd saliency shape stereo stitching structured_light superres surface_matching text tracking ts video videoio videostab wechat_qrcode xfeatures2d ximgproc xobjdetect xphoto
-- Disabled: world
-- Disabled by dependency: -
-- Unavailable: alphamat cnn_3dobj cudaarithm cudabgsegm cudacodec cudafeatures2d cudafilters cudaimgproc cudalegacy cudaobjdetect cudaoptflow cudastereo cudawarping cudev cvv hdf java julia matlab ovis python2 sfm viz
-- Applications: tests perf_tests examples apps
-- Documentation: NO
-- Non-free algorithms: NO
--
-- GUI:
-- GTK+: YES (ver 2.24.32)
-- GThread : YES (ver 2.64.6)
-- GtkGlExt: NO
-- VTK support: NO
--
-- Media I/O:
-- ZLib: /usr/lib/x86_64-linux-gnu/libz.so (ver 1.2.11)
-- JPEG: /usr/lib/x86_64-linux-gnu/libjpeg.so (ver 80)
-- WEBP: build (ver encoder: 0x020f)
-- PNG: /usr/lib/x86_64-linux-gnu/libpng.so (ver 1.6.37)
-- TIFF: /usr/lib/x86_64-linux-gnu/libtiff.so (ver 42 / 4.1.0)
-- JPEG 2000: build (ver 2.4.0)
-- OpenEXR: build (ver 2.3.0)
-- HDR: YES
-- SUNRASTER: YES
-- PXM: YES
-- PFM: YES
--
-- Video I/O:
-- DC1394: NO
-- FFMPEG: YES
-- avcodec: YES (58.54.100)
-- avformat: YES (58.29.100)
-- avutil: YES (56.31.100)
-- swscale: YES (5.5.100)
-- avresample: NO
-- GStreamer: NO
-- v4l/v4l2: YES (linux/videodev2.h)
--
-- Parallel framework: pthreads
--
-- Trace: YES (with Intel ITT)
--
-- Other third-party libraries:
-- Intel IPP: 2020.0.0 Gold [2020.0.0]
-- at: /home/eric/opencv-4.5.2/build/3rdparty/ippicv/ippicv_lnx/icv
-- Intel IPP IW: sources (2020.0.0)
-- at: /home/eric/opencv-4.5.2/build/3rdparty/ippicv/ippicv_lnx/iw
-- VA: NO
-- Lapack: NO
-- Eigen: NO
-- Custom HAL: NO
-- Protobuf: build (3.5.1)
--
-- OpenCL: YES (no extra features)
-- Include path: /home/eric/opencv-4.5.2/3rdparty/include/opencl/1.2
-- Link libraries: Dynamic load
--
-- Python 3:
-- Interpreter: /usr/bin/python3 (ver 3.8.5)
-- Libraries: /usr/lib/x86_64-linux-gnu/libpython3.8.so (ver 3.8.5)
-- numpy: /usr/lib/python3/dist-packages/numpy/core/include (ver 1.17.4)
-- install path: lib/python3.8/dist-packages/cv2/python-3.8
--
-- Python (for build): /usr/bin/python2.7
--
-- Java:
-- ant: NO
-- JNI: NO
-- Java wrappers: NO
-- Java tests: NO
--
-- Install to: /usr/local
-- -----------------------------------------------------------------
--
-- Configuring done
-- Generating done
-- Build files have been written to: /home/eric/opencv-4.5.2/build
</code></pre></div></div>
<p>After you have produced the make files, it’s time to build and install. Type <code class="language-plaintext highlighter-rouge">make -j5</code> (or just <code class="language-plaintext highlighter-rouge">make</code>) into the command line and step away for a while. This process will take 30 minutes to 1 hour.</p>
<p>Finally, run <code class="language-plaintext highlighter-rouge">sudo make install</code> to complete the installation.</p>
<p>Installation can be verified using the following. If successful, you will see the applicable version number for both of the commands below:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>python3 -c "import cv2; print(cv2.__version__)" # Verify Python success
pkg-config --modversion opencv4 # Verify OpenCV is available to pkg-config (not applicable for Python-only users)
</code></pre></div></div>
<p>To enable cv2 inside a Python virtual environment, run the following command to copy the package files into it (note that <code class="language-plaintext highlighter-rouge">cv2</code> is a directory, so the copy is recursive):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cp -r /usr/local/lib/python3.8/dist-packages/cv2 $ENVIRONMENT_PATH/.venv/lib/python3.8/site-packages
</code></pre></div></div>
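<p>One quick way to confirm the copy worked (a suggestion of mine, not part of the original instructions) is to ask Python where it would load <code class="language-plaintext highlighter-rouge">cv2</code> from while the environment is activated:</p>

```python
# Report where Python would load cv2 from; a None result means the copy
# destination is not on sys.path for this interpreter.
import importlib.util

spec = importlib.util.find_spec("cv2")
if spec is None:
    print("cv2 not found - check the copy destination against sys.path")
else:
    print(f"cv2 will load from: {spec.origin}")
```

If the printed path points into your virtual environment’s <code class="language-plaintext highlighter-rouge">site-packages</code>, the copy succeeded.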
<h2 id="tesseract">Tesseract</h2>
<p><a href="https://tesseract-ocr.github.io/tessdoc/Home.html">Tesseract</a> is an open source Optical Character Recognition (OCR) solution. This library will be used to extract text from image files. Installation here will be significantly easier than the previous OpenCV installation.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sudo apt-get install tesseract-ocr # Installs Tesseract
tesseract -v # Verify Tesseract has been installed
</code></pre></div></div>
<p>If successful you should see something similar to the following, listing the Tesseract version and supported image libraries.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>tesseract 4.1.1
leptonica-1.79.0
libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 2.0.3) : libpng 1.6.37 : libtiff 4.1.0 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.1
</code></pre></div></div>
<p>At this point, running <code class="language-plaintext highlighter-rouge">tesseract $IMAGE_PATH stdout</code> should print the text recognized in the image directly to the command line.</p>
<p>You can install a Tesseract wrapper for Python by activating your environment and installing pytesseract. Verify it works by running the command below; you should see the same output as the <code class="language-plaintext highlighter-rouge">tesseract $IMAGE_PATH stdout</code> command run above.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pip3 install pytesseract
python3 -c "import pytesseract; from PIL import Image; print(pytesseract.image_to_string(Image.open('$IMAGE_PATH')))" # Verify Python success
</code></pre></div></div>
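<p>pytesseract also lets you pass Tesseract configuration flags from Python. The sketch below (assuming the standard <code class="language-plaintext highlighter-rouge">image_to_string</code> API and Pillow; the image path in the usage comment is hypothetical) wraps this in a small helper:</p>

```python
def ocr_image(path, psm=6):
    """Run Tesseract over an image file and return the recognized text.

    Page segmentation mode (--psm) 6 tells Tesseract to treat the input
    as a single uniform block of text, which often suits form regions.
    """
    from PIL import Image  # Pillow; imported lazily so this module loads without it
    import pytesseract

    return pytesseract.image_to_string(Image.open(path), config=f"--psm {psm}")

# Usage (hypothetical image path):
#   text = ocr_image("form1040.png")
```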
<h2 id="poppler">Poppler</h2>
<p><a href="https://poppler.freedesktop.org/">Poppler</a> is a utility that can convert PDFs to image files and can also extract text embedded in PDFs directly. As embedded text is far more reliable than text recovered through OCR, we’ll use this as our first pass on a tax form. Install the tools needed using the following:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sudo apt-get install poppler-utils libpoppler-cpp-dev
</code></pre></div></div>
<p>You can confirm installation by typing <code class="language-plaintext highlighter-rouge">pdftotext -v</code> to view version details.</p>
<p>After Poppler has been successfully installed, you can activate your Python environment and install the Python wrapper:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pip install python-poppler
</code></pre></div></div>
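<p>With python-poppler installed you can pull the embedded text from a document without shelling out to <code class="language-plaintext highlighter-rouge">pdftotext</code>. A minimal sketch, assuming the <code class="language-plaintext highlighter-rouge">load_from_file</code>/<code class="language-plaintext highlighter-rouge">create_page</code>/<code class="language-plaintext highlighter-rouge">text()</code> API of python-poppler (check your installed version, as names may differ) and a hypothetical file name:</p>

```python
def pdf_to_text(path):
    """Return the embedded text of every page, joined by form feeds."""
    from poppler import load_from_file  # imported lazily so this loads without poppler

    pdf = load_from_file(path)
    # pdf.pages reports the page count; create_page(i) yields one page at a time.
    return "\f".join(pdf.create_page(i).text() for i in range(pdf.pages))

# Usage (hypothetical file):
#   print(pdf_to_text("form1040.pdf"))
```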
<p>That’s it! You should now be good to go for each of the above libraries.</p>Eric Colvin Morgansite@ericcolvinmorgan.comThis post details steps in setting up an environment used for extracting text from tax forms. Today I’ll be walking through the steps to install the necessary tools needed going forward. The environment in which I will be working will be Windows 10 with WSL2 configured and running Ubuntu 20.04 (WSL installation instructions here). It is possible to install all of the dependencies mentioned today directly in a Windows environment, but it will require a lot more setup; I strongly suggest embracing WSL. It shouldn’t take more than a couple hours to get everything up and running.CHIP-8 Interpreter2021-04-10T18:31:33+00:002021-04-10T18:31:33+00:00https://www.ericcolvinmorgan.com/projects/2021/04/10/chip8<p>Emulation has always been a topic I wanted to explore, so I recently started working through a CHIP-8 interpreter to dive into it. This is less an emulation project than an interpreter project, since CHIP-8 wasn’t an actual chip but an interpreted language originally created for the COSMAC VIP in the 1970s. Nonetheless, it is often a recommended jumping-off project for those interested in the world of emulation.</p>
<p>My specific implementation is C++ based and utilizes SDL2 for graphics, audio output, and keyboard input. While there is a desktop app included in the project, the majority of my development efforts were focused on the browser app, which was compiled to WebAssembly using Emscripten. As part of this project I also wrote some machine code to create a custom title card ROM that’s displayed when the user starts the web app. Overall, this was a very enjoyable project. There is a ton of detailed technical documentation online around the CHIP-8, so you are really able to focus on your specific implementation without spinning your wheels. This definitely laid a great foundation for moving toward emulating something a bit more involved. I may explore emulating the NES or Game Boy as one of my next projects.</p>
<p>You can try out my CHIP-8 Interpreter <a href="https://www.ericcolvinmorgan.com/Chip8Emulation/">here</a>.</p>
<p>The following sources were a great help in this project: <br />
<a href="https://en.wikipedia.org/wiki/CHIP-8">CHIP-8 Wikipedia</a> <br />
<a href="https://tobiasvl.github.io/blog/write-a-chip-8-emulator/">Write a CHIP-8 Emulator</a> <br />
<a href="http://devernay.free.fr/hacks/chip8/C8TECH10.HTM">Cowgod’s Chip-8 Technical Reference</a></p>Eric Colvin Morgansite@ericcolvinmorgan.comWelcome!2021-04-03T19:08:26+00:002021-04-03T19:08:26+00:00https://www.ericcolvinmorgan.com/personal/2021/04/03/welcome<p>Hello! My name is Eric Morgan. I’m a software developer and CPA. I’ve worked as a development resource for multiple Big 4 accounting firms. I’ve served in internal client facing roles doing everything from serving as a development lead of both on-shore and off-shore teams to performing the role of a product owner, and in external client facing roles recommending and implementing technology based solutions to address client pain points and improve engagement efficiency.</p>
<p>This site is intended to be a place for me to express my thoughts and opinions, and to document personal projects.</p>Eric Colvin Morgansite@ericcolvinmorgan.com