Sunday, June 21, 2015

Notes on installing Debian/Ubuntu from another Linux distro with debootstrap

Existing guides are either incomplete, inaccurate or assume that the base system is Debian or Ubuntu: https://help.ubuntu.com/community/Installation/FromLinux#Without_CD

If you want to set it up manually, you can download the latest .tar.gz package from http://ftp.debian.org/debian/pool/main/d/debootstrap/ and run debootstrap straight from the source tree by setting the DEBOOTSTRAP_DIR environment variable. Note that it requires devices.tar.gz to be present in the same directory; you can either build it by running
make devices.tar.gz
or extract a prebuilt one from the debootstrap .deb package. It may also require an explicit architecture and mirror URL on the command line, e.g.:
DEBOOTSTRAP_DIR=. ./debootstrap --arch=i386 vivid /mnt/installroot https://mirrors.kernel.org/ubuntu
Without the explicit URL it may try to fetch the Release file from https://mirrors.kernel.org/debian even if an Ubuntu release (e.g. vivid) is specified.
All external dependencies are usually present on any Linux system, with the possible exception of Perl, which I installed in my live Slax with no effort (you may also find the previous blog post useful: it covers booting Slax from the Windows partition without removable media).
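
The invocation above can also be assembled from a script, which is handy when the same install has to be repeated. This is only a minimal sketch: the function names build_debootstrap_call and run_debootstrap are made up for illustration, and the suite, target and mirror values are the ones from the example command.

```python
import os
import subprocess


def build_debootstrap_call(source_dir, suite, target, mirror, arch):
    """Return (argv, env) for running debootstrap from an unpacked source tree."""
    # DEBOOTSTRAP_DIR points the script at its own source tree.
    env = dict(os.environ, DEBOOTSTRAP_DIR=source_dir)
    argv = [
        os.path.join(source_dir, "debootstrap"),
        "--arch=" + arch,
        suite,
        target,
        mirror,  # passed explicitly so no default mirror is guessed
    ]
    return argv, env


def run_debootstrap(source_dir, suite, target, mirror, arch):
    """Run debootstrap (needs root); raises CalledProcessError on failure."""
    argv, env = build_debootstrap_call(source_dir, suite, target, mirror, arch)
    subprocess.check_call(argv, env=env)


if __name__ == "__main__":
    argv, env = build_debootstrap_call(
        ".", "vivid", "/mnt/installroot",
        "https://mirrors.kernel.org/ubuntu", "i386")
    print("DEBOOTSTRAP_DIR=" + env["DEBOOTSTRAP_DIR"], " ".join(argv))
```

Building the command separately from running it makes it possible to print and check the command line before anything touches the target directory.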

If everything has been set up properly, debootstrap should install the base system successfully, and you can proceed with mounting, chrooting, basic configuration (remember to configure apt sources) and installing packages. You can find the names of the Ubuntu metapackages for installing a complete system at https://help.ubuntu.com/community/MetaPackages (although the page omits several variants such as lubuntu-core).
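
For the apt sources step, a small generator like the following may help. It is a sketch under assumptions: the helper name ubuntu_sources_list is hypothetical, the suites (release, -updates, -security) and the four components are the usual Ubuntu layout rather than something the installer dictates, and it assumes the chosen mirror carries the -security suite. The usage line uses a plain http:// mirror because apt of that era needs the separate apt-transport-https package for https:// sources.

```python
def ubuntu_sources_list(release, mirror,
                        components=("main", "restricted", "universe", "multiverse")):
    """Return sources.list text for the release plus its updates and security suites."""
    suites = [release, release + "-updates", release + "-security"]
    lines = ["deb {0} {1} {2}".format(mirror, suite, " ".join(components))
             for suite in suites]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Redirect this into /mnt/installroot/etc/apt/sources.list before chrooting.
    print(ubuntu_sources_list("vivid", "http://mirrors.kernel.org/ubuntu"), end="")
```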

Saturday, June 20, 2015

If you REALLY need to boot Linux on a Windows machine and don't have any bootable media


You can boot Slax Linux from a Windows NTFS partition using GRUB4DOS without even touching the Windows boot sector. This HOWTO assumes that the machine has a typical Windows 7 installation.

1. Download GRUB4DOS and extract grldr.mbr and grldr to C:\

2. Use the bcdedit command-line tool to add GRUB4DOS to bootmgr:
> BCDEDIT.EXE /store C:\boot\BCD /create /d "Start GRUB4DOS" /application bootsector
< {guid}
> BCDEDIT.EXE /store C:\boot\BCD /set {guid} device boot
> BCDEDIT.EXE /store C:\boot\BCD /set {guid} path \grldr.mbr
> BCDEDIT.EXE /store C:\boot\BCD /displayorder {guid} /addlast

See http://diddy.boot-land.net/grub4dos/files/install_windows.htm for more details.

3. Download Slax and extract the slax directory to C:\

4. Create C:\menu.lst for the GRUB4DOS menu and add a menu item for booting Slax:
title slax
kernel /slax/boot/vmlinuz vga=normal load_ramdisk=1 prompt_ramdisk=0 rw printk.time=0 slax.flags=xmode,toram
initrd /slax/boot/initrfs.img

If you wonder where these boot parameters come from: Slax has a syslinux boot config, C:\slax\boot\syslinux.cfg, which defines a complex boot menu with toggleable options. It contains several 'MENU BEGIN xxxxx' blocks. The first 4 characters of the menu name are 0 or 1: the 1st represents the state of the 'Persistent changes' option, the 2nd 'Graphical desktop', the 3rd 'Copy to RAM' and the 4th 'Act as PXE server'. The 5th character is 0...3 and is nothing more than the number of the highlighted menu item. I chose the variant with graphical desktop and copy-to-RAM, so I'm using the boot parameters from the 'MENU BEGIN 01100' block:
KERNEL /slax/boot/vmlinuz
APPEND vga=normal initrd=/slax/boot/initrfs.img load_ramdisk=1 prompt_ramdisk=0 rw printk.time=0 slax.flags=xmode,toram
Note that GRUB syntax differs from syslinux syntax: kernel arguments go inline on the kernel line, and initrd gets its own line instead of being a kernel pseudo-argument.
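
Both the menu naming scheme and the syntax conversion are mechanical, so they can be written down as a short Python sketch. The function names decode_menu_name and syslinux_to_grub4dos are made up for illustration. The decoder relies only on the naming scheme described above, and the converter handles just the KERNEL/APPEND pair shown here, not syslinux configs in general.

```python
OPTION_NAMES = ("Persistent changes", "Graphical desktop",
                "Copy to RAM", "Act as PXE server")


def decode_menu_name(name):
    """Split a 5-character menu name such as '01100' into option states."""
    if (len(name) != 5 or any(c not in "01" for c in name[:4])
            or name[4] not in "0123"):
        raise ValueError("unexpected menu name: %r" % name)
    options = {label: flag == "1" for label, flag in zip(OPTION_NAMES, name[:4])}
    return options, int(name[4])  # second value: highlighted item number


def syslinux_to_grub4dos(title, kernel, append):
    """Build a menu.lst entry from syslinux KERNEL and APPEND values."""
    args, initrd = [], None
    for word in append.split():
        if word.startswith("initrd="):
            initrd = word[len("initrd="):]  # becomes a separate initrd line
        else:
            args.append(word)  # stays inline on the kernel line
    lines = ["title " + title, " ".join(["kernel", kernel] + args)]
    if initrd is not None:
        lines.append("initrd " + initrd)
    return "\n".join(lines)


if __name__ == "__main__":
    print(decode_menu_name("01100"))
    print(syslinux_to_grub4dos(
        "slax", "/slax/boot/vmlinuz",
        "vga=normal initrd=/slax/boot/initrfs.img load_ramdisk=1 "
        "prompt_ramdisk=0 rw printk.time=0 slax.flags=xmode,toram"))
```

Fed with the KERNEL and APPEND values from the '01100' block, the converter reproduces the three menu.lst lines from step 4.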

Now you can choose 'Start GRUB4DOS' in the Windows boot menu and 'slax' in the GRUB4DOS menu to boot Slax.
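
If step 2 has to be repeated on several machines, the bcdedit calls can be generated from Python. This is a rough sketch, not a tested installer: the helper names are hypothetical, and it assumes the output of the /create call contains the new entry's identifier as a {guid} in the standard 8-4-4-4-12 hexadecimal form.

```python
import re

STORE = r"C:\boot\BCD"
GUID_RE = re.compile(
    r"\{[0-9A-Fa-f]{8}(?:-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}\}")


def create_command(description="Start GRUB4DOS"):
    """Argument list for the /create call from step 2."""
    return ["BCDEDIT.EXE", "/store", STORE, "/create",
            "/d", description, "/application", "bootsector"]


def extract_guid(create_output):
    """Pull the {guid} out of whatever the /create call printed."""
    match = GUID_RE.search(create_output)
    if match is None:
        raise ValueError("no {guid} found in bcdedit output")
    return match.group(0)


def followup_commands(guid, loader_path=r"\grldr.mbr"):
    """Argument lists for the three calls that follow /create."""
    base = ["BCDEDIT.EXE", "/store", STORE]
    return [
        base + ["/set", guid, "device", "boot"],
        base + ["/set", guid, "path", loader_path],
        base + ["/displayorder", guid, "/addlast"],
    ]
```

Each argument list can be passed to subprocess.check_output from an elevated prompt; feeding the output of the first call to extract_guid yields the value to use in the remaining three.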

Sunday, June 7, 2015

Using Selenium+PhantomJS+Browsermob-proxy for AJAX scraping

Recently I needed to write a Python script that obtains data from the AJAX traffic of a website with very good anti-robot protection. Selenium with some browser (I prefer headless PhantomJS) is usually good for such tasks, but in this case it was not enough: I needed the raw AJAX data, not the page contents after JS processing, and the server did not accept direct requests, even with the cookies I had obtained through Selenium. I tried to get a traffic dump from the browser; PhantomJS can generate a HAR dump, but it turned out that it still lacks support for capturing response contents. The next idea was to use a capturing proxy, and Browsermob-proxy came up as a good choice. It supports HAR too, and can be easily controlled from a Python script with the browsermobproxy module, just like a browser with Selenium.
 
Here is the code example:

# Start proxy
from browsermobproxy import Server
server = Server('/path/to/browsermob-proxy')
server.start()
proxy = server.create_proxy()

# Start browser
import selenium.webdriver
browser = selenium.webdriver.PhantomJS('/path/to/phantomjs', service_args=['--proxy={0}'.format(proxy.proxy), '--ignore-ssl-errors=true'])

# Tell browser to open webapp page
browser.get('http://web.app/page/url')

# Tell proxy to start capture
proxy.new_har(options={'captureHeaders':True, 'captureContent':True})

# Tell browser to perform some actions which should cause AJAX requests
browser.find_element_by_id('some-input-id').send_keys('some input')

# Wait for end of transmission
# Ideally, this should be implemented as waiting for some browser event
from time import sleep
sleep(10)

# Process results
for entry in proxy.har['log']['entries']:
    if entry['request']['url'] == 'http://web.app/ajax/endpoint/url':
        print(entry['response']['content']['text'])

# Shutdown
browser.quit()
server.stop()
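
The fixed sleep(10) can be replaced by polling the HAR until the wanted response shows up. The following is only a sketch under assumptions: the helper names are hypothetical, "response arrived" is approximated by the entry having captured content text, and the base64 branch follows the HAR format's optional 'encoding' field, which I have not checked against what Browsermob-proxy actually emits.

```python
import base64
import time


def matching_entries(har, url):
    """Return the HAR entries whose request URL equals url."""
    return [e for e in har["log"]["entries"] if e["request"]["url"] == url]


def wait_for_entries(get_har, url, timeout=10.0, interval=0.5):
    """Poll get_har() until an entry for url has captured text, or time out.

    Returns the list of matching entries (empty if the timeout expired).
    """
    deadline = time.monotonic() + timeout
    while True:
        found = [e for e in matching_entries(get_har(), url)
                 if "text" in e["response"].get("content", {})]
        if found or time.monotonic() >= deadline:
            return found
        time.sleep(interval)


def body_text(entry):
    """Return the response body, decoding it if the HAR marks it as base64."""
    content = entry["response"]["content"]
    text = content.get("text", "")
    if content.get("encoding") == "base64":
        return base64.b64decode(text).decode("utf-8", "replace")
    return text
```

In the script above this would be called as wait_for_entries(lambda: proxy.har, 'http://web.app/ajax/endpoint/url') in place of the sleep. It takes a callable rather than the proxy object so that it can be tried out without a running proxy.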